CN112580625A - Display device and image content identification method - Google Patents


Info

Publication number
CN112580625A
CN112580625A (application CN202011459807.3A)
Authority
CN
China
Prior art keywords
display
content
recognition
identification
local
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011459807.3A
Other languages
Chinese (zh)
Inventor
付延松
穆聪聪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hisense Visual Technology Co Ltd
Original Assignee
Hisense Visual Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hisense Visual Technology Co Ltd filed Critical Hisense Visual Technology Co Ltd
Priority to CN202011459807.3A priority Critical patent/CN112580625A/en
Publication of CN112580625A publication Critical patent/CN112580625A/en
Priority to PCT/CN2021/102287 priority patent/WO2022012299A1/en
Priority to PCT/CN2021/119692 priority patent/WO2022078172A1/en
Priority to US17/950,747 priority patent/US20230018502A1/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/20: Image preprocessing
    • G06V 10/25: Determination of region of interest [ROI] or a volume of interest [VOI]
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/40: Extraction of image or video features
    • G06V 10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components

Abstract

The present application provides a display device and an image content identification method. The display device includes: a display; and a controller communicatively coupled to the display, the controller being configured to: in response to a local recognition instruction, display at least one local recognition frame in a target display interface presented on the display; in response to a confirmation instruction, send at least one selected image to a content recognition server, where the selected image is an image of the area selected by the local recognition frame in the target display interface; and, in response to a recognition result returned by the content recognition server, control the display to display content information corresponding to the recognition result. In this way, the user can select any area of the target display interface for content recognition according to their own needs, yielding a good recognition effect.

Description

Display device and image content identification method
Technical Field
Embodiments of the present application relate to display technology, and more particularly, to a display apparatus and an image content recognition method.
Background
As a common household appliance in daily life, a television can integrate functions such as audio and video playback, social interaction, and information acquisition. To help users search for audio and video resources or obtain desired information, a television generally provides a screenshot recognition function to recognize the content currently being displayed.
In the related art, the content displayed on the current screen of the television is generally used directly as the image for content recognition. One such image may contain many different items of content (for example, multiple character avatars or multiple episode recommendations at the same time) and may also contain content with complicated patterns. Because the whole image covers a wide area and contains a great deal of content, the amount of data to be recognized is large, which increases the time consumed by image content recognition; moreover, too much content may be recognized from the image, so the part the user is interested in cannot be highlighted.
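The benefit of recognizing only a selected region can be quantified with simple arithmetic (the frame and region sizes below are illustrative assumptions, not figures from the patent):

```python
# Compare the pixel data of a full 1080p screenshot with a user-selected
# region of interest (ROI) the size of a plausible local recognition frame.
FULL_W, FULL_H = 1920, 1080   # hypothetical full-screen frame
ROI_W, ROI_H = 480, 270       # hypothetical local recognition frame

full_pixels = FULL_W * FULL_H
roi_pixels = ROI_W * ROI_H
reduction = 1 - roi_pixels / full_pixels

print(f"ROI carries {roi_pixels / full_pixels:.1%} of the full-frame pixels")
print(f"Data to recognize shrinks by {reduction:.1%}")
```

Under these assumed sizes, the selected region carries only a small fraction of the full frame's pixel data, which is the "large amount of data" problem the paragraph above describes.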
Disclosure of Invention
The present application provides a display device and an image content identification method, aiming to solve the problem in the related art that the recognition effect is poor when a television performs content recognition on an image.
In one aspect, the present application provides a display apparatus, including: a display and a controller, the controller communicatively coupled to the display, the controller being configured to perform the following steps: in response to a local recognition instruction, displaying at least one local recognition frame in a target display interface displayed by the display; in response to a confirmation instruction, sending at least one selected image to a content recognition server, where the selected image is an image of the area selected by the local recognition frame in the target display interface; receiving a recognition result returned by the content recognition server after receiving the selected image; and controlling the display to display content information corresponding to the recognition result.
In some implementations, the local identification box is displayed at a predetermined location in the display.
In some implementations, the local identification box is displayed in the display at a location determined based on interface elements included in the target display interface.
In some implementations, the local recognition box is displayed in the display at a location determined based on a location at which a target object recognized by the controller from the target display interface is located.
In some implementations, prior to sending the at least one selected image to the content recognition server, the controller is further configured to: adjusting the number, size, shape or position of the local recognition frames in response to the local recognition frame adjustment instruction.
In some implementations, the controller is further configured to: at least one selected image is sent to a plurality of different types of content recognition servers.
In some implementations, the controller is further configured to: send a plurality of different selected images to a plurality of different types of content recognition servers, respectively.
In some implementations, in the step of receiving the recognition result returned by the content recognition server after receiving the selected image, the controller is further configured to: and receiving the recognition results returned by the plurality of content recognition servers, wherein each recognition result corresponds to one selected image.
In some implementations, in the step of controlling the display to display the content information corresponding to the recognition result, the controller is further configured to: control the display to display content information corresponding to the recognition result of a first content recognition server among the plurality of content recognition servers; and, in response to a switching instruction, control the display to display content information corresponding to the recognition result of a second content recognition server among the plurality of content recognition servers.
In some implementations, each of the recognition results includes a plurality of sets of result information, each set of result information corresponding to a target object recognized from the selected image.
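The device-side steps above can be sketched as follows. All names here are illustrative assumptions, not the patent's actual implementation; the patent specifies only the behavior (display frames, send the selected images, display the results):

```python
# Sketch of the controller flow: crop each region selected by a local
# recognition frame and send it to a content recognition server.
from dataclasses import dataclass

@dataclass
class RecognitionFrame:
    """Position and size of one local recognition frame in the interface."""
    x: int
    y: int
    w: int
    h: int

def crop(screenshot, frame):
    """Extract the region selected by a local recognition frame.

    `screenshot` is a row-major 2-D list of pixels."""
    return [row[frame.x:frame.x + frame.w]
            for row in screenshot[frame.y:frame.y + frame.h]]

def recognize_selection(screenshot, frames, send_to_server):
    """Send each selected image to the server; one result per selected image."""
    results = []
    for frame in frames:
        selected_image = crop(screenshot, frame)
        results.append(send_to_server(selected_image))
    return results
```

Here `send_to_server` stands in for the network call to the content recognition server; the returned results correspond to the "control the display to display content information" step.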
In another aspect, the present application provides a content identification method, including the steps of: responding to the local identification instruction, and displaying at least one local identification frame in the target display interface; responding to a confirmation instruction, and sending at least one selected image to a content identification server, wherein the selected image is an image of an area selected by the local identification frame in the target display interface; receiving an identification result returned by the content identification server after the selected image is received; and displaying the content information corresponding to the identification result.
In some implementations, the local identification box is displayed at a predetermined location in the display.
In some implementations, the local identification box is displayed in the display at a location determined based on interface elements included in the target display interface.
In some implementations, the local recognition box is displayed in the display at a location determined based on a location at which a target object recognized by the controller from the target display interface is located.
In some implementations, before sending the at least one selected image to the content recognition server, further comprising: adjusting the number, size, shape or position of the local recognition frames in response to the local recognition frame adjustment instruction.
In some implementations, sending the at least one selected image to a content recognition server includes: at least one selected image is sent to a plurality of different types of content recognition servers.
In some implementations, sending the at least one selected image to a content recognition server includes: sending a plurality of different selected images to a plurality of different types of content recognition servers, respectively.
In some implementations, receiving the recognition result returned by the content recognition server after receiving the selected image includes: and receiving the recognition results returned by the plurality of content recognition servers, wherein each recognition result corresponds to one selected image.
In some implementations, controlling the display to display content information corresponding to the recognition result includes: controlling the display to display content information corresponding to the recognition result of a first content recognition server among the plurality of content recognition servers; and, in response to a switching instruction, controlling the display to display content information corresponding to the recognition result of a second content recognition server among the plurality of content recognition servers.
In some implementations, each of the recognition results includes a plurality of sets of result information, each set of result information corresponding to a target object recognized from the selected image.
According to the above technical solutions, content recognition is performed on the partial image selected by the user in the display interface, that is, on the part of the content the user is interested in, so that the part of interest to the user can be identified.
Drawings
In order to more clearly illustrate the embodiments of the present application or the implementations in the related art, the drawings required for describing the embodiments or the related art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application, and those skilled in the art can derive other drawings from them.
FIG. 1 illustrates a usage scenario of a display device according to some embodiments;
fig. 2 illustrates a hardware configuration block diagram of the control apparatus 100 according to some embodiments;
fig. 3 illustrates a hardware configuration block diagram of the display apparatus 200 according to some embodiments;
FIG. 4 illustrates a software configuration diagram in the display device 200 according to some embodiments;
FIG. 5 illustrates an icon control interface display of an application in display device 200, in accordance with some embodiments;
FIG. 6 illustrates a network architecture diagram of some embodiments;
fig. 7A to 7J are schematic views illustrating display effects of a local identification box in the embodiment of the present application;
fig. 8A to 8C are schematic views illustrating a display effect of content information according to an embodiment of the present application;
fig. 8D is a schematic diagram of a layer structure in an embodiment of the present application;
fig. 9 is a schematic flowchart of an image content identification method according to an embodiment of the present application.
Detailed Description
To make the purpose and embodiments of the present application clearer, exemplary embodiments of the present application will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described exemplary embodiments are only a part, rather than all, of the embodiments of the present application.
It should be noted that the brief descriptions of the terms in the present application are only for the convenience of understanding the embodiments described below, and are not intended to limit the embodiments of the present application. These terms should be understood in their ordinary and customary meaning unless otherwise indicated.
The terms "first," "second," "third," and the like in the description and claims of this application and in the above-described drawings are used for distinguishing between similar or analogous objects or entities and not necessarily for describing a particular sequential or chronological order, unless otherwise indicated. It is to be understood that the terms so used are interchangeable under appropriate circumstances.
The terms "comprises" and "comprising," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a product or apparatus that comprises a list of elements is not necessarily limited to all elements expressly listed, but may include other elements not expressly listed or inherent to such product or apparatus.
The term "module" refers to any known or later developed hardware, software, firmware, artificial intelligence, fuzzy logic, or combination of hardware and/or software code that is capable of performing the functionality associated with that element.
Fig. 1 is a schematic diagram of a usage scenario of a display device according to an embodiment. As shown in fig. 1, the display apparatus 200 is in data communication with a server 400, and a user can operate the display apparatus 200 through the smart device 300 or the control device 100.
In some embodiments, the control apparatus 100 may be a remote controller. Communication between the remote controller and the display device includes at least one of infrared protocol communication, Bluetooth protocol communication, or other short-distance communication methods, and the remote controller controls the display apparatus 200 wirelessly or by wire. The user may input user instructions through keys on the remote controller, voice input, control panel input, or the like, to control the display apparatus 200.
In some embodiments, the smart device 300 may include any of a mobile terminal, a tablet, a computer, a laptop, an AR/VR device, and the like.
In some embodiments, the smart device 300 may also be used to control the display device 200. For example, the display device 200 is controlled using an application program running on the smart device.
In some embodiments, the smart device 300 and the display device may also be used for communication of data.
In some embodiments, the display apparatus 200 may also be controlled in manners other than through the control apparatus 100 and the smart device 300. For example, a user's voice instruction may be received directly through a module configured inside the display apparatus 200 for obtaining voice instructions, or through a voice control apparatus provided outside the display apparatus 200.
In some embodiments, the display apparatus 200 is also in data communication with the server 400. The display apparatus 200 may be communicatively connected through a local area network (LAN), a wireless local area network (WLAN), or other networks. The server 400 may provide various contents and interactions to the display apparatus 200. The server 400 may be one cluster or a plurality of clusters, and may include one or more types of servers.
In some embodiments, software steps executed by one step execution agent may be migrated on demand to another step execution agent in data communication therewith for execution. Illustratively, software steps performed by the server may be migrated to be performed on a display device in data communication therewith, and vice versa, as desired.
Fig. 2 exemplarily shows a block diagram of a configuration of the control apparatus 100 according to an exemplary embodiment. As shown in fig. 2, the control device 100 includes a controller 110, a communication interface 130, a user input/output interface 140, a memory, and a power supply. The control apparatus 100 may receive an input operation instruction from a user and convert the operation instruction into an instruction that the display apparatus 200 can recognize and respond to, serving as an interaction intermediary between the user and the display apparatus 200.
In some embodiments, the communication interface 130 is used for external communication, and includes at least one of a Wi-Fi chip, a Bluetooth module, an NFC module, or an alternative module.
In some embodiments, the user input/output interface 140 includes at least one of a microphone, a touchpad, a sensor, a key, or an alternative module.
Fig. 3 shows a hardware configuration block diagram of the display apparatus 200 according to an exemplary embodiment.
In some embodiments, the display apparatus 200 includes at least one of a tuner demodulator 210, a communicator 220, a detector 230, an external device interface 240, a controller 250, a display 260, an audio output interface 270, a memory, a power supply, a user interface.
In some embodiments, the controller comprises a central processor, a video processor, an audio processor, a graphics processor, a RAM, a ROM, and first to nth interfaces for input/output.
In some embodiments, the display 260 includes a display screen component for displaying pictures and a driving component for driving image display; it receives image signals output from the controller and displays video content, image content, menu manipulation interfaces, user manipulation UI interfaces, and the like.
In some embodiments, the display 260 may be at least one of a liquid crystal display, an OLED display, and a projection display, and may also be a projection device and a projection screen.
In some embodiments, the tuner demodulator 210 receives broadcast television signals through wired or wireless reception, and demodulates audio/video signals, as well as EPG data signals, from among a plurality of wireless or wired broadcast television signals.
In some embodiments, the communicator 220 is a component for communicating with external devices or servers according to various communication protocol types. For example, the communicator may include at least one of a Wi-Fi module, a Bluetooth module, a wired Ethernet module, other network communication protocol chips or near-field communication protocol chips, and an infrared receiver. The display apparatus 200 may establish transmission and reception of control signals and data signals with the control device 100 or the server 400 through the communicator 220.
In some embodiments, the detector 230 is used to collect signals of the external environment or interaction with the outside. For example, detector 230 includes a light receiver, a sensor for collecting ambient light intensity; alternatively, the detector 230 includes an image collector, such as a camera, which may be used to collect external environment scenes, attributes of the user, or user interaction gestures, or the detector 230 includes a sound collector, such as a microphone, which is used to receive external sounds.
In some embodiments, the external device interface 240 may include, but is not limited to, the following: high Definition Multimedia Interface (HDMI), analog or data high definition component input interface (component), composite video input interface (CVBS), USB input interface (USB), RGB port, and the like. The interface may be a composite input/output interface formed by the plurality of interfaces.
In some embodiments, the controller 250 and the tuner demodulator 210 may be located in different separate devices; that is, the tuner demodulator 210 may also be located in a device external to the main device containing the controller 250, such as an external set-top box.
In some embodiments, the controller 250 controls the operation of the display device and responds to user operations through various software control programs stored in memory. The controller 250 controls the overall operation of the display apparatus 200. For example: in response to receiving a user command for selecting a UI object to be displayed on the display 260, the controller 250 may perform an operation related to the object selected by the user command.
In some embodiments, the object may be any one of selectable objects, such as a hyperlink, an icon, or other actionable control. The operations related to the selected object are: displaying an operation connected to a hyperlink page, document, image, or the like, or performing an operation of a program corresponding to the icon.
In some embodiments, the controller comprises at least one of a central processing unit (CPU), a video processor, an audio processor, a graphics processing unit (GPU), a random access memory (RAM), a read-only memory (ROM), first to nth interfaces for input/output, a communication bus, and the like.
The CPU is configured to execute operating system and application program instructions stored in the memory, and to execute various applications, data, and content according to the various interactive instructions received from external input, so as to finally display and play various audio and video content. The CPU may include a plurality of processors, e.g., a main processor and one or more sub-processors.
In some embodiments, the graphics processor is used for generating various graphics objects, such as at least one of icons, operation menus, and graphics displayed in response to user input instructions. The graphics processor comprises an arithmetic unit, which performs operations by receiving the various interactive instructions input by the user and displays various objects according to their display attributes, and a renderer, which renders the various objects obtained by the arithmetic unit; the rendered objects are then displayed on the display.
In some embodiments, the video processor is configured to receive an external video signal and, according to the standard codec protocol of the input signal, perform at least one kind of video processing such as decompression, decoding, scaling, noise reduction, frame rate conversion, resolution conversion, and image synthesis, so as to obtain a signal that can be displayed or played directly on the display device 200.
In some embodiments, the video processor includes at least one of a demultiplexing module, a video decoding module, an image synthesis module, a frame rate conversion module, a display formatting module, and the like. The demultiplexing module demultiplexes the input audio/video data stream. The video decoding module processes the demultiplexed video signal, including decoding and scaling. The image synthesis module superimposes and mixes the GUI signal, input by the user or generated by the graphics generator, with the scaled video image to produce an image signal for display. The frame rate conversion module converts the frame rate of the input video. The display formatting module converts the received frame-rate-converted video output signal into a signal conforming to the display format, such as an output RGB data signal.
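Of these modules, frame rate conversion is the easiest to illustrate in isolation. The sketch below is a naive nearest-frame resampler, an assumption for illustration only; real display devices typically use motion-compensated interpolation rather than simple frame repetition or dropping:

```python
def convert_frame_rate(frames, src_fps, dst_fps):
    """Resample a frame sequence from src_fps to dst_fps by nearest-frame pick."""
    if not frames:
        return []
    duration = len(frames) / src_fps       # clip length in seconds
    n_out = round(duration * dst_fps)      # frame count at the target rate
    out = []
    for i in range(n_out):
        t = i / dst_fps                    # presentation time of output frame i
        src_index = min(int(t * src_fps), len(frames) - 1)
        out.append(frames[src_index])
    return out
```

For example, converting a 24 fps clip to 48 fps with this scheme simply shows each source frame twice, which is why higher-quality converters synthesize intermediate frames instead.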
In some embodiments, the audio processor is configured to receive an external audio signal, decompress and decode the received audio signal according to a standard codec protocol of the input signal, and perform at least one of noise reduction, digital-to-analog conversion, and amplification processing to obtain a sound signal that can be played in the speaker.
In some embodiments, a user may enter user commands on a Graphical User Interface (GUI) displayed on display 260, and the user input interface receives the user input commands through the Graphical User Interface (GUI). Alternatively, the user may input the user command by inputting a specific sound or gesture, and the user input interface receives the user input command by recognizing the sound or gesture through the sensor.
In some embodiments, a "user interface" is a media interface for interaction and information exchange between an application or operating system and a user that enables conversion between an internal form of information and a form that is acceptable to the user. A commonly used presentation form of the User Interface is a Graphical User Interface (GUI), which refers to a User Interface related to computer operations and displayed in a graphical manner. It may be an interface element such as an icon, a window, a control, etc. displayed in the display screen of the electronic device, where the control may include at least one of an icon, a button, a menu, a tab, a text box, a dialog box, a status bar, a navigation bar, a Widget, etc. visual interface elements.
In some embodiments, user interface 280 is an interface that may be used to receive control inputs (e.g., physical buttons on the body of the display device, or the like).
In some embodiments, the system of a display device may include a kernel, a command parser (shell), a file system, and application programs. The kernel, shell, and file system together make up the basic operating system structure that allows users to manage files, run programs, and use the system. After power-on, the kernel is started, the kernel space is established, hardware is abstracted, hardware parameters are initialized, and virtual memory, the scheduler, signals, and inter-process communication (IPC) are run and maintained. After the kernel is started, the shell and user applications are loaded. An application is compiled into machine code upon startup, forming a process.
Referring to fig. 4, in some embodiments, the system is divided into four layers, which are an Application (Applications) layer (abbreviated as "Application layer"), an Application Framework (Application Framework) layer (abbreviated as "Framework layer"), an Android runtime (Android runtime) and system library layer (abbreviated as "system runtime library layer"), and a kernel layer from top to bottom.
In some embodiments, at least one application program runs in the application program layer, and the application programs may be windows (windows) programs carried by an operating system, system setting programs, clock programs or the like; or an application developed by a third party developer. In particular implementations, the application packages in the application layer are not limited to the above examples.
The framework layer provides an application programming interface (API) and a programming framework for the applications of the application layer, and includes a number of predefined functions. The application framework layer acts as a processing center that decides actions for the applications in the application layer. Through the API interface, an application can access the resources in the system and obtain system services during execution.
As shown in fig. 4, in the embodiment of the present application, the application framework layer includes a manager (Managers), a Content Provider (Content Provider), and the like, where the manager includes at least one of the following modules: an Activity Manager (Activity Manager) is used for interacting with all activities running in the system; the Location Manager (Location Manager) is used for providing the system service or application with the access of the system Location service; a Package Manager (Package Manager) for retrieving various information related to an application Package currently installed on the device; a Notification Manager (Notification Manager) for controlling display and clearing of Notification messages; a Window Manager (Window Manager) is used to manage the icons, windows, toolbars, wallpapers, and desktop components on a user interface.
In some embodiments, the activity manager is used to manage the lifecycle of the various applications as well as general navigational fallback functions, such as controlling exit, opening, fallback, etc. of the applications. The window manager is used for managing all window programs, such as obtaining the size of a display screen, judging whether a status bar exists, locking the screen, intercepting the screen, controlling the change of the display window (for example, reducing the display window, displaying a shake, displaying a distortion deformation, and the like), and the like.
In some embodiments, the system runtime library layer provides support for the upper layer, i.e., the framework layer. When the framework layer is in use, the Android operating system runs the C/C++ libraries included in the system runtime library layer to implement the functions that the framework layer needs to implement.
In some embodiments, the kernel layer is a layer between hardware and software. As shown in fig. 4, the kernel layer includes at least one of the following drivers: an audio driver, a display driver, a Bluetooth driver, a camera driver, a Wi-Fi driver, a USB driver, an HDMI driver, sensor drivers (e.g., for a fingerprint sensor, a temperature sensor, a pressure sensor), a power driver, and the like.
In some embodiments, the display device may directly enter the interface of a preset video-on-demand (VOD) program after startup. As shown in fig. 5, the interface of the VOD program may include at least a navigation bar 510 and a content display area located below the navigation bar 510, where the content displayed in the content display area may change according to the selected control in the navigation bar. Programs in the application layer may be integrated into the VOD program and displayed through a control of the navigation bar, or may be displayed after the application control in the navigation bar is selected.
In some embodiments, the display device may, after startup, directly enter the display interface of the signal source selected last time, or a signal source selection interface, where the signal source may be a preset video-on-demand program, or at least one of an HDMI input, a live TV input, and the like; after the user selects a signal source, the display may display the content obtained from that source.
For clarity of explanation of the embodiments of the present application, a network architecture provided by the embodiments of the present application is described below with reference to fig. 6.
Referring to fig. 6, fig. 6 is a schematic diagram of a network architecture according to an embodiment of the present application. In fig. 6, the smart device is configured to receive input information and output a processing result of the information; the speech recognition service device is an electronic device on which a speech recognition service is deployed, the semantic service device is an electronic device on which a semantic service is deployed, and the business service device is an electronic device on which a business service is deployed. The electronic device may include a server, a computer, and the like. The speech recognition service, the semantic service (also referred to as a semantic engine), and the business service are web services that can be deployed on the electronic device, where the speech recognition service is used for recognizing audio as text, the semantic service is used for semantic parsing of the text, and the business service is used for providing specific services, such as the weather query service of Moji Weather or the music query service of QQ Music. In one embodiment, multiple business service devices deployed with different business services may exist in the architecture shown in fig. 6. Unless otherwise specified, each service device in this embodiment is a server.
The following describes, by way of example, a process for processing information input to the smart device based on the architecture shown in fig. 6. Taking a query sentence input by voice as an example, the process may include the following three stages:
1. Speech recognition stage
After receiving the query sentence input by voice, the smart device may upload the audio of the query sentence to the speech recognition service device, so that the speech recognition service device recognizes the audio as text through the speech recognition service and then returns the text to the smart device.
In one embodiment, before uploading the audio of the query statement to the speech recognition service device, the smart device may perform denoising processing on the audio of the query statement, where the denoising processing may include removing echo and environmental noise.
2. Semantic understanding stage
The smart device uploads the text of the query sentence recognized by the speech recognition service to the semantic service device, and the semantic service device performs semantic parsing on the text through the semantic service to obtain the business field, intent, and the like of the text.
3. Response stage
The semantic service device issues a query instruction to the corresponding business service device according to the semantic parsing result of the text of the query sentence, so as to obtain the query result given by the business service. The smart device may obtain the query result from the semantic service device and output it, for example, to the display device in a wireless or infrared manner. As an embodiment, the semantic service device may further send the semantic parsing result of the query sentence to the smart device, so that the smart device outputs the feedback sentence in the semantic parsing result. The semantic service device may also send the semantic parsing result of the query sentence to the display device, so that the display device outputs the feedback sentence in the semantic parsing result.
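The three stages above form a simple dispatch chain. The following is a minimal, hypothetical Java sketch of that chain; the service interfaces and stub implementations are illustrative assumptions and not part of any real SDK:

```java
// Hypothetical sketch of the three-stage pipeline: speech recognition,
// semantic understanding, and response. All names are illustrative.
import java.util.Map;

public class QueryPipeline {
    // Stage 1: a speech-recognition service turns audio into text.
    interface SpeechService { String recognize(byte[] audio); }
    // Stage 2: a semantic service parses text into a field and intent.
    interface SemanticService { Map<String, String> parse(String text); }
    // Stage 3: a business service answers the parsed query.
    interface BusinessService { String query(Map<String, String> semantics); }

    static String handle(byte[] audio, SpeechService asr,
                         SemanticService nlu, BusinessService biz) {
        String text = asr.recognize(audio);              // speech recognition stage
        Map<String, String> semantics = nlu.parse(text); // semantic understanding stage
        return biz.query(semantics);                     // response stage
    }

    public static void main(String[] args) {
        // Stub services standing in for the remote service devices.
        String result = handle(new byte[0],
                audio -> "weather today",
                text -> Map.of("domain", "weather", "intent", "query"),
                sem -> "domain=" + sem.get("domain"));
        System.out.println(result); // domain=weather
    }
}
```

In the real architecture each stage is a separate networked device; the stubs above only illustrate the order of the calls.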
It should be noted that the architecture shown in fig. 6 is only an example, and is not intended to limit the scope of the present application. In the embodiment of the present application, other architectures may also be used to implement similar functions, which are not described herein.
The display device 200 in the embodiment of the present application allows a user to select, according to the user's own needs, any area in a target display interface for content recognition, with a good recognition effect. The controller 250 in the display device 200 is communicatively connected with the display 275 and is configured to execute the image content recognition process. Unless otherwise stated, the steps performed by the display device in the following embodiments are understood to be performed by the controller 250, or by the controller 250 in cooperation with other components of the display device 200.
The following describes a process of image content identification provided by an embodiment of the present application with reference to the drawings.
The user may send the local identification instruction directly to the display device, or may send the local identification instruction to the display device through another device.
The controller 250 may receive an identification instruction sent by a user: it may receive the identification instruction input by the user directly through the user input/output interface of the display device 200, or receive an identification instruction sent by the user through another device (e.g., a mobile phone or a remote controller).
The manner or approach by which the controller 250 obtains the identification instruction is not limited in this application. For example, the user may send a corresponding identification instruction to the display device by pressing a designated key on the remote controller; alternatively, the user may issue a voice identification instruction to the display device, for example, "Who is this person?", "Where was this piece of clothing bought?", or "What is in the picture?".
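As a rough illustration of how such inputs might be classified, the following hypothetical Java sketch maps a remote-controller key code or a spoken phrase to an identification instruction; the key code and phrase patterns are illustrative assumptions, not values from the original:

```java
// Hypothetical sketch: classifying user input (a remote-controller key code
// or a spoken phrase) as an identification instruction. All constants and
// patterns are assumptions made for illustration.
public class InstructionMapper {
    static final int KEY_RECOGNIZE = 0x52; // assumed key code bound to recognition

    static boolean isRecognitionKey(int keyCode) {
        return keyCode == KEY_RECOGNIZE;
    }

    // Treat an utterance as an identification instruction when it asks
    // about on-screen content, e.g. "Who is this person?".
    static boolean isRecognitionPhrase(String utterance) {
        String u = utterance.toLowerCase();
        return u.contains("who is") || u.contains("where") || u.contains("what is");
    }

    public static void main(String[] args) {
        System.out.println(isRecognitionKey(0x52));                     // true
        System.out.println(isRecognitionPhrase("Who is this person?")); // true
    }
}
```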
Upon receiving a local identification instruction, the controller 250, in response to it, controls the display 275 to display a local identification frame in the target display interface currently displayed by the display.
The target display interface may be the interface currently displayed by the display 275, such as a user menu interface, an audio/video playing interface, or an information display interface, which is not limited in this application. The number of local identification frames 601 may be one or more, and their sizes or shapes may be the same or different; different local identification frames may be non-overlapping, or may completely or partially overlap. The present application does not limit the position of the local identification frame displayed on the display 275, nor the shape, size, number, or display mode of the local identification frame. The target interface may be as shown in the example of fig. 7A.
The local identification frame may be displayed as a wire frame, or may be displayed or embodied through a specific display effect (e.g., a highlighted display, a three-dimensional effect, etc.). For convenience of description, the embodiments of the present application take only the wire frame as an example, which does not limit the scheme of the present application; the display effect of the local identification frame in the form of a wire frame may be as shown in the examples of fig. 7B and 7C.
In some embodiments, the display 275 may display other information in addition to the local identification frame in the target display interface. For example, a prompt information box and prompt information corresponding to the local identification frame may be displayed, where the prompt information may be used to prompt the user about the next operation that may be performed, or to explain the functions that the local identification frame can implement. The prompt information may be displayed in the prompt information box 602, and the prompt information box 602 may be displayed at a predetermined position, or at a position determined based on the local identification frame; the display effect may be as shown in the example of fig. 7D. It should be noted that the local identification frame and the prompt information may be displayed on the same layer or on different layers.
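One way the prompt information box position could be derived from the local identification frame is sketched below; the layout constants and the below-then-above fallback rule are illustrative assumptions, not part of the original:

```java
// Hypothetical sketch: placing the prompt information box just below the
// local identification frame, falling back to above it when it would run
// off the bottom of the screen.
public class PromptPlacement {
    // Returns the y coordinate of the prompt box for a frame starting at
    // frameY with height frameH, on a screen of height screenH; boxH and
    // gap are assumed layout constants.
    static int promptY(int frameY, int frameH, int boxH, int gap, int screenH) {
        int below = frameY + frameH + gap;
        if (below + boxH <= screenH) return below; // fits below the frame
        return frameY - gap - boxH;                // otherwise place above it
    }

    public static void main(String[] args) {
        System.out.println(promptY(100, 200, 40, 8, 1080)); // 308 (below)
        System.out.println(promptY(900, 160, 40, 8, 1080)); // 852 (above)
    }
}
```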
In other embodiments, a "re-capture" button or an "original image recognition" button may be displayed in the prompt information box 602. The user may move the focus to the "re-capture" button by operating the direction keys of the remote controller, and return to the state shown in fig. 7A by clicking it. Alternatively, the user may move the focus to the "original image recognition" button and, by clicking it, request that the full image of the target display interface be sent to the content recognition server for recognition; the controller 250 sends the full image of the target display interface to the content recognition server after receiving the confirmation instruction. In this manner, the user can conveniently choose between the full-screen image recognition function and the local image recognition function.
After the local recognition frame is displayed in the target display interface, the user may also send an adjustment instruction, where the adjustment instruction may be used to adjust the shape, size, position, and number of the local recognition frame. The controller 250, upon receiving the adjustment instruction, adjusts the size, shape, position, number, etc. of the local recognition frame based on the content of the adjustment instruction, and controls the display 275 to display the adjusted local recognition frame.
In some embodiments, the position, size, and the like of the local identification frame may be determined according to the position of the focus in the target display interface and may vary with the focus position; that is, the area selected by the local identification frame is always the area where the focus is located. For example, when the focus is located in a content display window in the target display interface, the area selected by the local identification frame may coincide with or contain that content display window, with an effect as shown in fig. 7D. When the focus moves from one content display window to another, the position and size of the local identification frame change accordingly, with an effect as shown, for example, in fig. 7E. In this manner, the user can adjust the position of the local identification frame by adjusting the position of the focus, which is convenient to use.
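A minimal sketch of this focus-following behavior, assuming the frame is the focused window's rectangle expanded by an optional padding (the rectangle representation and padding are illustrative assumptions):

```java
// Hypothetical sketch: the local identification frame tracks the focused
// content window. Rectangles are represented as {x, y, width, height}.
public class FocusTracker {
    // Returns the frame rectangle for the focused window, expanded by
    // `padding` pixels on every side so the frame contains the window.
    static int[] frameForFocus(int[] focusedWindow, int padding) {
        return new int[] {
            focusedWindow[0] - padding,
            focusedWindow[1] - padding,
            focusedWindow[2] + 2 * padding,
            focusedWindow[3] + 2 * padding
        };
    }

    public static void main(String[] args) {
        int[] frame = frameForFocus(new int[] {300, 120, 400, 225}, 10);
        System.out.println(java.util.Arrays.toString(frame)); // [290, 110, 420, 245]
    }
}
```

With padding 0 the frame coincides with the window; with positive padding it contains it, matching the two cases described above.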
In other embodiments, the local identification frame may be displayed at an initial position in the target display interface with an initial size, and the controller 250 may adjust its position or size in response to an adjustment instruction. For example, upon receiving the user's identification instruction, the display 275 may display the local identification frame at a default position of the target display interface (e.g., at an edge or at the center), with an effect as shown, for example, in fig. 7F. If the position of the local identification frame does not meet the user's requirement, the user may send a position or size adjustment instruction, and the controller 250 adjusts the position and size of the local identification frame in response, so that the local identification frame selects the image content the user wants to search; the effect may be as shown, for example, in fig. 7G.
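The position adjustment could be sketched as a fixed-step move clamped to the screen bounds; the step length, direction names, and clamping rule below are illustrative assumptions:

```java
// Hypothetical sketch: each press of a direction key shifts the local
// identification frame by STEP pixels, clamped so it stays on screen.
public class FrameMover {
    static final int STEP = 20; // assumed fixed step length in pixels

    // dir is "LEFT", "RIGHT", "UP", or "DOWN"; rect is {x, y, w, h}.
    static int[] move(int[] rect, String dir, int screenW, int screenH) {
        int x = rect[0], y = rect[1], w = rect[2], h = rect[3];
        switch (dir) {
            case "LEFT":  x -= STEP; break;
            case "RIGHT": x += STEP; break;
            case "UP":    y -= STEP; break;
            case "DOWN":  y += STEP; break;
        }
        x = Math.max(0, Math.min(x, screenW - w)); // keep the frame on screen
        y = Math.max(0, Math.min(y, screenH - h));
        return new int[] {x, y, w, h};
    }

    public static void main(String[] args) {
        int[] r = move(new int[] {10, 50, 200, 150}, "LEFT", 1920, 1080);
        System.out.println(java.util.Arrays.toString(r)); // [0, 50, 200, 150]
    }
}
```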
In still other embodiments, the position and size of the local identification frame may be determined according to the interface elements included in the target display interface, where an interface element may be a content view, a window, a menu, a picture, a text box, or the like. When a plurality of interface elements are displayed in the target display interface, a local identification frame may be displayed for one or more of them, with an effect as shown in fig. 7H. Each local identification frame corresponds to one interface element, and the area selected by the local identification frame is the area where the corresponding interface element is located. Further, the controller 250 may increase or decrease the number of local identification frames in response to an adjustment instruction sent by the user, so that the image content the user wants to search is selected through the local identification frames.
In still other embodiments, the position and size of the local identification frame may be determined based on the content displayed in the target display interface. For example, the controller 250 may preliminarily recognize the target display interface, and then display local identification frames according to the target objects (e.g., preliminarily recognized person avatars, animal images, or article images) recognized from the target display interface. When a plurality of target objects are displayed in the target display interface, a local identification frame may be displayed for each target object, for example, as shown in fig. 7I. Each local identification frame corresponds to one target object, and the area selected by the local identification frame is the area where the target object is located. Also, the controller 250 may increase or decrease the number of local identification frames, or adjust their shape or position, in response to an adjustment instruction sent by the user, thereby selecting the image content to be searched through the local identification frames, for example, as shown in fig. 7J.
After selecting the image to be searched through the local identification frame, the user may send a confirmation instruction to the controller 250. Upon receiving the confirmation instruction, the controller 250 sends the selected image to the content recognition server.
The selected image refers to the image of the area selected by the local identification frame in the target display interface. The confirmation instruction may be sent directly by the user or indirectly through other equipment (for example, a remote controller), and may be a single instruction or a combination of multiple instructions. The user may press a designated key on the remote controller, or operate the mobile terminal, to send a confirmation instruction to the display device; alternatively, the user may issue a voice confirmation instruction to the display device, for example, "recognize this region" or "confirm". The present application does not limit the specific form or delivery route of the confirmation instruction.
For example, in the state shown in fig. 7D, the user moves the local identification frame to the left, by operating the direction keys of the remote controller, to the position shown in fig. 7E, and may then issue a local identification instruction to the display device by clicking the confirmation key of the remote controller. According to the received local identification instruction, the display device may capture the selected image enclosed by the local identification frame, together with the text and graphics it contains, and send the selected image to the content recognition server.
In some embodiments, there is only one local identification frame. In this case, after receiving the confirmation instruction, the controller 250 may send the image of the area selected by the local identification frame in the target display interface to the content recognition server. For example, when the local identification frame is as shown in fig. 7E, if the controller 250 receives the remote control signal sent when the user presses the "OK" key of the remote controller, it may send the selected image to the content recognition server.
In other embodiments, there may be more than one local identification frame. In this case, after receiving the confirmation instruction, the controller 250 may select at least one of all the local identification frames as the selected local identification frame according to the content of the confirmation instruction, and then send the image of the area selected by the selected local identification frame to the content recognition server. For example, when the local identification frames are as shown in fig. 7I, whenever the controller 250 receives the remote control signal sent when the user briefly presses the "OK" key of the remote controller, it may send the selected image at the current focus to the content recognition server; if it receives the remote control signal sent when the user presses and holds the "OK" key, it may send all the selected images to the content recognition server.
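The short-press/long-press selection logic above can be sketched as follows, assuming frames are addressed by index (an illustrative simplification of the confirmation instruction's content):

```java
// Hypothetical sketch: a short press of "OK" submits only the frame at the
// current focus; a long press submits all frames.
import java.util.Arrays;
import java.util.List;

public class ConfirmHandler {
    // Returns the indices of the local identification frames whose selected
    // images should be sent to the content recognition server.
    static List<Integer> framesToSend(int frameCount, int focused, boolean longPress) {
        if (longPress) {
            Integer[] all = new Integer[frameCount];
            for (int i = 0; i < frameCount; i++) all[i] = i; // every frame
            return Arrays.asList(all);
        }
        return List.of(focused); // only the frame at the current focus
    }

    public static void main(String[] args) {
        System.out.println(framesToSend(3, 1, false)); // [1]
        System.out.println(framesToSend(3, 1, true));  // [0, 1, 2]
    }
}
```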
Depending on the application scenario or preset configuration, a plurality of content recognition servers may exist at the same time. The content recognition servers may be of different types, and servers of different types may be used to recognize different target objects and feed back different recognition results. For example, the content recognition server may be one or more of a character recognition server, an image recognition server, a multimedia server, a media asset server, a search engine server, and the like. The content recognition server may be configured to recognize different types of recognition targets, such as text, images, articles, and people in the selected image, and feed back the corresponding recognition results. For convenience of description, a first content recognition server and a second content recognition server hereinafter refer to content recognition servers of different types.
The selected image may be sent to only one content recognition server, or to two or more content recognition servers simultaneously. When there are multiple selected images, they may be sent to the same content recognition server or to different ones; for example, a first selected image is sent to a first content recognition server and a second selected image is sent to a second content recognition server. When sending the selected image, the controller may send it to multiple content recognition servers at the same time, or, after receiving the recognition result fed back by the first content recognition server, determine the second content recognition server according to that result and then send the selected image to it. For example, the controller 250 may first send the selected image to the person recognition server; if the recognition result fed back by the person recognition server does not include valid content (e.g., does not include personal information), it may then send the selected image to the image recognition server.
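The cascaded sending strategy in the last example might look like the following hypothetical sketch, where the server interface and the use of an empty result to signal "no valid content" are assumptions made for illustration:

```java
// Hypothetical sketch of the cascaded lookup: query the person recognition
// server first, and fall back to the image recognition server when the
// first result contains no valid content.
import java.util.Optional;

public class CascadedRecognizer {
    interface RecognitionServer { Optional<String> recognize(byte[] image); }

    static String recognize(byte[] image, RecognitionServer person,
                            RecognitionServer generic) {
        // Use the person server's result when it has valid content ...
        return person.recognize(image)
                // ... otherwise fall back to the generic image server.
                .orElseGet(() -> generic.recognize(image).orElse("no result"));
    }

    public static void main(String[] args) {
        String r = recognize(new byte[0],
                img -> Optional.empty(),             // no person recognized
                img -> Optional.of("a red jacket")); // image server's answer
        System.out.println(r); // a red jacket
    }
}
```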
After receiving the selected image, the content recognition server may perform recognition or other corresponding processing on it to generate a recognition result, and then send the recognition result to the controller 250, which correspondingly receives the recognition result returned by the content recognition server. In addition to the information recognized from the selected image, the recognition result may include information obtained by further processing or searching based on the recognized information, such as a search result obtained by searching based on text recognized from the selected image, or recommended media determined based on an actor recognized from the selected image.
The same target object may correspond to multiple sets of result information. For example, when the content recognition server recognizes two persons from the selected image, the recognition result may include two sets of personal information, each corresponding to one of the persons; when the content recognition server recognizes a person and an article from the selected image, the recognition result may include a set of personal information, a set of article profile information, and a set of article purchase link information, where the personal information corresponds to the person, and the article profile information and purchase link information correspond to the article.
In some embodiments, the recognition result may include at least one set of personal information, where each set corresponds to one face image in the selected image. Each set of personal information may include information about the area where the face image is located in the selected image (e.g., its coordinates, the height H of the area, the width W of the area, and the like) and the identity information of the recognized person, and may further include other information such as media asset information acquired based on the identity information. Fig. 7A is a schematic diagram of a recognized face region. In fig. 7A, the region where the face is located is a rectangular region surrounding the face, with the coordinates (X0, Y0) of its upper-left corner given in the coordinate system of the display 275, a height H0, and a width W0. The display device can determine the presentation position of the content information according to the coordinates (X0, Y0), height H0, and width W0.
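As a hypothetical illustration of determining the presentation position from (X0, Y0), H0, and W0, the sketch below centers an identity label under the face region; the centering rule and layout constants are assumptions, not part of the original:

```java
// Hypothetical sketch: deriving the presentation position of the identity
// label from the face region (x0, y0, w0, h0) in the recognition result.
public class LabelPlacement {
    // Centers the label horizontally under the face region; returns {x, y}.
    static int[] labelPosition(int x0, int y0, int w0, int h0,
                               int labelW, int gap) {
        int x = x0 + (w0 - labelW) / 2; // center the label under the face
        int y = y0 + h0 + gap;          // just below the face region
        return new int[] {x, y};
    }

    public static void main(String[] args) {
        int[] p = labelPosition(400, 150, 120, 160, 60, 6);
        System.out.println(java.util.Arrays.toString(p)); // [430, 316]
    }
}
```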
In other embodiments, the recognition result may include at least one set of article information, where each set corresponds to an article in the selected image. Each set of article information may include the name of the article in the selected image, and may further include a product profile or purchase link information for the article or for other similar articles. Similarly, it may also include coordinate information about the area in the selected image where the article image is located.
In still other embodiments, the recognition result may include at least one set of media asset recommendation information, where each set corresponds to a selected image and is used to recommend at least one media asset based on that image. For example, recommended media may be determined based on information about the actors contained in the selected image, or from a character's clothing or a scene contained in the selected image.
After receiving the recognition result, the controller 250 may control the display 275 to display the content information corresponding to the recognition result. Parameters related to content information display, such as a display position, a display mode, and a display duration of the content information, may be set in advance, or may be determined according to a type of the selected image, content included in the recognition result, and the like.
In some embodiments, when the recognition result includes the personal information described in the foregoing embodiments, a face recognition frame may be displayed on the selected image according to information such as the coordinates of the area where the face is located, its height H, or its width W, and the identity information of the recognized person may be displayed near the face recognition frame. Fig. 8A is a schematic diagram in which a face recognition frame 701 is displayed on the selected image, with the identity information of the recognized person displayed nearby. In fig. 8A, a face recognition frame is displayed in the selected image, and the identity information "Zhang San" of the recognized person is displayed near the face recognition frame. In fig. 8B, two pieces of identity information, "Zhang San" and "Li Si", are displayed respectively.
In other embodiments, when the recognition result includes at least one set of article information, an article identification frame may be displayed according to the coordinate information of the area where the article image is located, and the product profile or purchase link information of the article may be displayed in a predetermined area. The detailed display mode is omitted here.
In some embodiments, if the recognition result includes multiple sets of result information, the controller 250 may control the display 275 to display the sets simultaneously according to a preset rule or display mode; it may also control the display 275 to display one or more of the sets, and may further switch the display automatically, switch to other sets of result information according to a switching instruction sent by the user, or switch to the result information in recognition results fed back by other servers.
For example, when the recognition result includes two sets of personal information, "Zhang San" and "Li Si", each corresponding to one person, the controller 250 may control the display 275 to display one set of personal information, as shown in the example of fig. 8A; alternatively, the display 275 may be controlled to display only the person information of "Zhang San" in the manner shown in the example of fig. 8B, and, after receiving a switching instruction, display the person information of "Li Si" and no longer display that of "Zhang San", in the manner shown in the example of fig. 8C.
For another example, when the recognition result includes a set of article profile information and a set of article purchase link information corresponding to the same article, the controller 250 may control the display 275 to display the article profile information included in the recognition result fed back by the image recognition server according to a preset rule or display mode; after receiving a switching instruction sent by the user, the controller 250 may control the display 275 to display the recognition result fed back by the shopping search engine according to a preset rule or display mode.
It should be noted that the display of the local identification frame, the content information, and other information may be implemented by adding new layers. For example, as shown in fig. 8D, the layer displaying the target display interface is layer B, the layer displaying the local identification frame is layer M, which is stacked on layer B, and a layer T displaying the content information may be stacked on layer M. Taking the coordinate system convention defined by the Android system as an example, the Z-axis coordinate of layer M is greater than that of layer B, and the Z-axis coordinate of layer T is greater than that of layer M. Different display effects can be achieved by adjusting the display parameters of layer M or layer T. For example, all regions of layer M other than the region where the local identification frame is located may be set to be semi-transparent, so as to highlight the local identification frame.
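The Z-ordering described above can be illustrated with a small sketch that sorts layers by Z coordinate into drawing order; the Layer type and Z values below are illustrative, not taken from any platform API:

```java
// Illustrative sketch: layers sorted by Z coordinate, lowest drawn first,
// mirroring the convention that T covers M, which covers B.
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class LayerStack {
    static class Layer {
        final String name;
        final int z;
        Layer(String name, int z) { this.name = name; this.z = z; }
    }

    // Returns layer names in drawing order (lowest Z coordinate first).
    static List<String> drawOrder(List<Layer> layers) {
        List<Layer> sorted = new ArrayList<>(layers);
        sorted.sort(Comparator.comparingInt((Layer l) -> l.z));
        List<String> names = new ArrayList<>();
        for (Layer l : sorted) names.add(l.name);
        return names;
    }

    public static void main(String[] args) {
        List<String> order = drawOrder(List.of(
                new Layer("T", 2), new Layer("B", 0), new Layer("M", 1)));
        System.out.println(order); // [B, M, T]
    }
}
```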
For example, the layer where the target display interface is located may be layer B, the local identification frame may be located in layer M, and the content information in layer T. The size of the local identification frame may be a fixed size defined by the product, or a size issued by the background. When the user moves the local identification frame via the direction keys in the local identification selection state, the current custom View redraws, refreshes, and displays the local identification frame in its onDraw() method with a fixed step length, so that the local identification frame moves in real time following the user's key operations. Layers M and T may both be layers containing a semi-transparent mask control with a hollowed-out center: through the custom View, the hollowed-out position in the middle is kept consistent with the area of the local identification frame and rendered transparent, while the other positions are filled with a semi-transparent color. Layer P may be the layer where buttons and prompt text are located; it sits on top of all layers and, according to a pre-designed definition, fills designated position areas with the corresponding text, colors, and patterns, while the other positions remain transparent.
In other embodiments, layer M may also be switched from displaying the local identification frame to displaying the content information, so that layer T does not need to be displayed; alternatively, a layer P implementing other functions or effects may be further stacked on layer T. In this manner, the local identification frame, the content information, and other information can be displayed without changing the content displayed on the target display interface, reducing the display complexity of such information.
The display device provided by the embodiments of the present application can capture a part of the displayed image as the selected image, use the content recognition server to perform content recognition on the selected image to obtain a recognition result, and then display the content information corresponding to the recognition result. That is, the user can select any area of the displayed image for content recognition according to the user's own needs, with a good recognition effect. In addition, compared with performing content recognition on the whole image, using the local identification frame to select the local area to be recognized reduces the recognition area, which improves the success rate and accuracy of recognition and adds interest. Furthermore, because the recognition area is reduced, the data transmission amount and bandwidth pressure during content recognition can be reduced, and the return speed of the content recognition result improved.
Corresponding to the embodiments of the display device, the present application also provides embodiments of a display method. The embodiments of the display method of the present application are described below with reference to the drawings.
Referring to fig. 9, a flowchart of an image content identification method provided in the present application is shown. As shown in fig. 9, the method comprises the following steps:
Step 901: in response to a local identification instruction, display a local identification frame in the target display interface displayed by the display.
The display device can receive an identification instruction sent by the user; when the instruction is a local identification instruction, a local identification frame is displayed in the target display interface displayed by the display. For the manner of receiving the local identification instruction, reference may be made to the foregoing embodiments, and details are not repeated here.
The local identification frame can be displayed in various ways: at least one local identification frame may be added to the content included in the target display interface, or a layer carrying a local identification frame may be superimposed on the target display interface displayed by the display. This application is not limited in this respect.
The display position of the local identification frame in the display can vary with the application scenario. For example, the local identification frame may be displayed at a preset position; at a position determined based on an interface element contained in the target display interface; or at a position determined from the position of a target object recognized by the controller from the target display interface. This application is not limited in this respect.
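The three placement strategies above can be sketched as a small selection function. The precedence order, default coordinates, and parameter names here are illustrative assumptions; the patent does not prescribe how the strategies combine:

```python
# Sketch of the three frame-placement strategies: a preset position,
# a position derived from an interface element, or the position of a
# target object detected by the controller. Precedence and all
# coordinate values are illustrative assumptions.

PRESET_FRAME = (760, 390, 400, 300)  # assumed screen-centre default

def initial_frame(interface_element=None, detected_target=None):
    """Pick the initial local identification frame (x, y, w, h)."""
    if detected_target is not None:       # controller recognized a target
        return detected_target
    if interface_element is not None:     # anchor to an interface element
        return interface_element
    return PRESET_FRAME                   # otherwise use the preset position

initial_frame()                                  # preset position
initial_frame(detected_target=(10, 20, 50, 60))  # follows the detected target
```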
After the local identification frame is displayed, a local identification frame adjustment instruction may further be received to adjust the display position, size, display mode, or the like of the local identification frame. For the display mode of the adjusted local identification frame and the manner of implementing the adjustment, reference may be made to the foregoing embodiments, and details are not repeated here.
Step 902: in response to a confirmation instruction, send at least one selected image to a content identification server.
After receiving the confirmation instruction, the display device may send the at least one selected image to the content identification server in response to that instruction.
Neither the number of selected images nor the number of content identification servers is limited: each may be one or more. The contents contained in different selected images may be independent of each other or may overlap, and different identification servers may be used to identify different types of content from the image or to provide different information.
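Fanning one or more selected images out to one or more servers can be sketched as a simple dispatch loop. The server names and callables below are hypothetical stand-ins for network requests (the patent does not name any endpoints):

```python
# Sketch of dispatching selected images to recognition servers.
# Each image is sent to every server; server names and the lambda
# "recognizers" are hypothetical stand-ins for real requests.

def dispatch(selected_images, servers):
    """Send every selected image to every server and collect results."""
    results = []
    for image in selected_images:
        for name, recognize in servers.items():
            results.append({"server": name, "image": image,
                            "result": recognize(image)})
    return results

servers = {
    "object_recognition": lambda img: f"objects in {img}",
    "text_recognition":   lambda img: f"text in {img}",
}
responses = dispatch(["selected_1"], servers)   # one result per server
```

A variant that pairs different selected images with different server types, as claim 4 also allows, would simply zip the two lists instead of taking the product.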
Step 903: receive the identification result returned by the content identification server.
The identification result may be one result returned by one identification server, multiple results returned by one identification server simultaneously or successively, or multiple results returned by multiple identification servers simultaneously or successively. Each identification result may include multiple groups of result information, each group corresponding to a target object recognized from the selected image; multiple groups of result information may correspond to the same target object.
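Because several groups of result information, possibly from different servers, can describe the same target object, the received results can be merged keyed by that object. A sketch under assumed field names ("groups", "target", "info" are not from the patent):

```python
# Sketch of merging recognition results by target object. Each result
# carries groups of result information; groups from different results
# may refer to the same target. All field names are assumptions.
from collections import defaultdict

def merge_results(results):
    merged = defaultdict(list)
    for result in results:               # one result per server/image
        for group in result["groups"]:   # one group per target object
            merged[group["target"]].append(group["info"])
    return dict(merged)

r1 = {"groups": [{"target": "actor_a", "info": "name"},
                 {"target": "shirt", "info": "brand"}]}
r2 = {"groups": [{"target": "actor_a", "info": "filmography"}]}
# merge_results([r1, r2]) -> {"actor_a": ["name", "filmography"],
#                             "shirt": ["brand"]}
```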
Step 904: control the display to display the content information corresponding to the recognition result.
The content information may be the content contained in the result information itself, that is, only the content of the result information is displayed; for example, the text, graphics, or images it contains. Alternatively, the content information may be further generated or acquired based on the result information; for example, a graphic or image generated from a parameter included in the result information, or page content acquired through a link included in the result information. This application is not limited in this respect.
Because the identification result may contain a large amount of content, after the identification result is received, all of the result information it contains may be displayed at once through the content information; alternatively, the result information contained in one part of the recognition results may be displayed first and the result information contained in another part displayed later; or one part of the result information included in a recognition result may be displayed first and another part displayed later.
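Displaying result information in portions rather than all at once amounts to simple batching. A sketch with an assumed batch size (the patent does not fix how the information is partitioned):

```python
# Sketch of showing result information in successive portions.
# The batch size is an illustrative assumption.

def batches(result_info, batch_size=3):
    """Yield successive portions of the result information list."""
    for i in range(0, len(result_info), batch_size):
        yield result_info[i:i + batch_size]

pages = list(batches(["a", "b", "c", "d", "e"], batch_size=2))
# -> [["a", "b"], ["c", "d"], ["e"]]
```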
In some embodiments, the content information corresponding to the recognition result of a first content recognition server among the plurality of content recognition servers may be displayed first; after a switching instruction input by the user is received, the content information corresponding to the recognition result of a second content recognition server among the plurality is displayed in response to the switching instruction.
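The server-switching behavior just described can be sketched as a small state holder that cycles through the received results on each switch instruction. The cycling behavior and class shape are illustrative assumptions:

```python
# Sketch of switching which server's recognition result is shown.
# An index over the received results models the switch instruction;
# cycling back to the first server is an assumed behavior.

class ResultSwitcher:
    def __init__(self, results_by_server):
        self.results = results_by_server   # list of (server, content)
        self.index = 0                     # first server shown initially

    def current(self):
        return self.results[self.index]

    def switch(self):                      # handle a switch instruction
        self.index = (self.index + 1) % len(self.results)
        return self.current()

s = ResultSwitcher([("server_1", "info A"), ("server_2", "info B")])
s.current()   # ("server_1", "info A")
s.switch()    # ("server_2", "info B")
```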
Finally, it should be noted that: the above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present application.
The foregoing description, for purposes of explanation, has been presented in conjunction with specific embodiments. However, the illustrative discussions above are not intended to be exhaustive or to limit the embodiments to the precise forms disclosed above. Many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles and the practical application, to thereby enable others skilled in the art to best utilize the embodiments and various embodiments with various modifications as are suited to the particular use contemplated.

Claims (10)

1. A display device, characterized in that the display device comprises:
a display;
a controller communicatively coupled with the display, the controller configured to:
in response to a local recognition instruction, displaying at least one local recognition frame in a target display interface displayed by the display;
responding to a confirmation instruction, and sending at least one selected image to a content identification server, wherein the selected image is an image of an area selected by the local identification frame in the target display interface;
receiving an identification result returned by the content identification server after the selected image is received;
and controlling the display to display the content information corresponding to the identification result.
2. The display device of claim 1,
the local identification frame is displayed at a preset position in the display; or,
the local identification frame is displayed at a position determined in the display based on an interface element contained in the target display interface; or,
the local recognition frame is displayed in the display at a position determined based on a position where a target object recognized by the controller from the target display interface is located.
3. The display device of claim 1, wherein prior to sending the at least one selected image to the content recognition server, the controller is further configured to:
adjusting the number, size, shape or position of the local recognition frames in response to the local recognition frame adjustment instruction.
4. The display device of claim 1, wherein in the step of sending at least one selected image to a content recognition server, the controller is further configured to:
sending the at least one selected image to a plurality of different types of content recognition servers; or,
the plurality of different selected images are respectively sent to a plurality of different types of content recognition servers.
5. The display device of claim 1, wherein in the step of receiving the recognition result returned by the content recognition server after receiving the selected image, the controller is further configured to:
and receiving the recognition results returned by the plurality of content recognition servers, wherein each recognition result corresponds to one selected image.
6. The display device according to claim 5, wherein in the controlling the display to display the content information corresponding to the recognition result, the controller is further configured to:
controlling the display to display content information corresponding to a first content recognition server recognition result among the plurality of content recognition servers;
and responding to a switching instruction, and controlling the display to display content information corresponding to the identification result of a second content identification server in the plurality of content identification servers.
7. The display device of claim 1,
each recognition result comprises a plurality of groups of result information, and each group of result information corresponds to a target object recognized from the selected image.
8. The display device according to claim 6, wherein in the controlling the display to display the content information corresponding to the recognition result, the controller is further configured to:
controlling the display to display content information corresponding to a first content recognition server recognition result among the plurality of content recognition servers;
and responding to a switching instruction, and controlling the display to display content information corresponding to the identification result of a second content identification server in the plurality of content identification servers.
9. An image content recognition method, comprising:
responding to the local identification instruction, and displaying at least one local identification frame in the target display interface;
responding to a confirmation instruction, and sending at least one selected image to a content identification server, wherein the selected image is an image of an area selected by the local identification frame in the target display interface;
receiving an identification result returned by the content identification server after the selected image is received;
and displaying the content information corresponding to the identification result.
10. The method of claim 9, further comprising:
adjusting the number, size, shape or position of the local recognition frames in response to the local recognition frame adjustment instruction.
CN202011459807.3A 2020-07-14 2020-12-11 Display device and image content identification method Pending CN112580625A (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
CN202011459807.3A CN112580625A (en) 2020-12-11 2020-12-11 Display device and image content identification method
PCT/CN2021/102287 WO2022012299A1 (en) 2020-07-14 2021-06-25 Display device and person recognition and presentation method
PCT/CN2021/119692 WO2022078172A1 (en) 2020-10-16 2021-09-22 Display device and content display method
US17/950,747 US20230018502A1 (en) 2020-07-14 2022-09-22 Display apparatus and method for person recognition and presentation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011459807.3A CN112580625A (en) 2020-12-11 2020-12-11 Display device and image content identification method

Publications (1)

Publication Number Publication Date
CN112580625A true CN112580625A (en) 2021-03-30

Family

ID=75131549

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011459807.3A Pending CN112580625A (en) 2020-07-14 2020-12-11 Display device and image content identification method

Country Status (1)

Country Link
CN (1) CN112580625A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022012299A1 (en) * 2020-07-14 2022-01-20 海信视像科技股份有限公司 Display device and person recognition and presentation method
WO2023169049A1 (en) * 2022-03-09 2023-09-14 聚好看科技股份有限公司 Display device and server

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103051934A (en) * 2011-10-14 2013-04-17 中国科学院计算技术研究所 Intelligent television human-machine interaction method, device and system
CN103051942A (en) * 2011-10-14 2013-04-17 中国科学院计算技术研究所 Smart television human-computer interaction method, device and system based on remote controller
CN103425993A (en) * 2012-05-22 2013-12-04 腾讯科技(深圳)有限公司 Method and system for recognizing images
CN106708823A (en) * 2015-07-20 2017-05-24 阿里巴巴集团控股有限公司 Search processing method, apparatus and system
CN109271982A (en) * 2018-09-20 2019-01-25 西安艾润物联网技术服务有限责任公司 Multiple identification region recognition methods, identification terminal and readable storage medium storing program for executing
CN109299326A (en) * 2018-10-31 2019-02-01 网易(杭州)网络有限公司 Video recommendation method and device, system, electronic equipment and storage medium
US20190138815A1 (en) * 2017-11-06 2019-05-09 Uc Technology Co., Ltd. Method, Apparatus, User Terminal, Electronic Equipment, and Server for Video Recognition
CN109922363A (en) * 2019-03-15 2019-06-21 青岛海信电器股份有限公司 A kind of graphical user interface method and display equipment of display screen shot
CN110245572A (en) * 2019-05-20 2019-09-17 平安科技(深圳)有限公司 Region content identification method, device, computer equipment and storage medium
CN111461097A (en) * 2020-03-18 2020-07-28 北京大米未来科技有限公司 Method, apparatus, electronic device and medium for recognizing image information

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
CHEN Zhenbiao: "Literature Information Retrieval: Analysis and Application", Chongqing University Press, pages: 54 - 55 *


Similar Documents

Publication Publication Date Title
CN112601117B (en) Display device and content presentation method
CN113014939A (en) Display device and playing method
CN113395556A (en) Display device and method for displaying detail page
CN113268199A (en) Display device and function item setting method
CN112580625A (en) Display device and image content identification method
CN112584229B (en) Method for switching channels of display equipment and display equipment
CN113490032A (en) Display device and medium resource display method
CN113453069B (en) Display device and thumbnail generation method
CN113064691B (en) Display method and display equipment for starting user interface
CN113360066B (en) Display device and file display method
CN112911371B (en) Dual-channel video resource playing method and display equipment
CN114390190B (en) Display equipment and method for monitoring application to start camera
CN114390329B (en) Display device and image recognition method
CN112601116A (en) Display device and content display method
CN113573112A (en) Display device and remote controller
CN112199560A (en) Setting item searching method and display device
CN113766164B (en) Display equipment and signal source interface display method
CN113490013B (en) Server and data request method
CN113436564B (en) EPOS display method and display equipment
CN114298119A (en) Display apparatus and image recognition method
CN115705129A (en) Display device and window background display method
CN115866292A (en) Server, display device and screenshot recognition method
CN114302131A (en) Display device and black screen detection method
CN112995734A (en) Display device and channel searching method
CN113672192A (en) Method for prompting message by browser page characters and display equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination