CN112329711A

CN112329711A - Screen content identification method, device and computer readable storage medium

Info

Publication number: CN112329711A
Application number: CN202011335590.5A
Authority: CN
Inventors: 白银波
Original assignee: Shenzhen TCL New Technology Co Ltd
Current assignee: Shenzhen TCL New Technology Co Ltd
Priority date: 2020-11-24
Filing date: 2020-11-24
Publication date: 2021-02-05

Abstract

The invention discloses a screen picture content identification method, which comprises the following steps: collecting eye movement information of a user; determining a visual focus of the user in a preset screen according to the eye movement information; and identifying the focus content corresponding to the position of the visual focus to generate identification information. The invention also discloses a screen picture content identification device, equipment and a computer readable storage medium. Efficient recognition of screen content is achieved.

Description

Screen content identification method, device and computer readable storage medium

Technical Field

The present invention relates to the field of image recognition, and in particular, to a method, an apparatus, a device, and a computer-readable storage medium for screen content recognition.

Background

With the development of smart televisions, they can provide users with increasingly considerate services, such as identifying the content presented on the television screen according to the user's needs.

Existing methods for identifying content presented on a screen generally require the user to capture the current picture and manually select the content to be identified. The identification process is therefore cumbersome to operate and inefficient.

Disclosure of Invention

The main object of the present invention is to provide a screen content identification method, apparatus, device and computer-readable storage medium, so as to solve the technical problems of cumbersome operation and low efficiency in existing processes for identifying content presented on a screen.

In order to achieve the above object, the present invention provides a screen content identification method, including the steps of: collecting eye movement information of a user;

determining a visual focus of the user in a preset screen according to the eye movement information;

and identifying the focus content corresponding to the position of the visual focus to generate identification information.

In an embodiment, the step of identifying the focus content corresponding to the position where the visual focus is located and generating the identification information includes:

identifying the whole content of the screen picture to generate a classification result;

and identifying the focus content of the position of the visual focus according to the classification result to generate identification information.

In an embodiment, the classification result is either a first type or a second type, and the step of identifying the focus content at the position of the visual focus according to the classification result to generate the identification information includes:

when the classification result is the first type, identifying the content contained in an area of a first radius centered on the visual focus as the focus content to generate the identification information; or

when the classification result is the second type, identifying the content contained in an area of a second radius centered on the visual focus as the focus content to generate the identification information.

In an embodiment, before the step of identifying the focus content corresponding to the position of the visual focus to generate the identification information, the method includes:

displaying the visual focus at a corresponding position in the preset screen.

In an embodiment, before the step of identifying the focus content corresponding to the position of the visual focus to generate the identification information, the method further includes:

judging whether an identification signal is received;

and when the identification signal is received, executing the step of identifying the focus content corresponding to the position of the visual focus to generate the identification information.

In one embodiment, before the step of determining whether the identification signal is received, the method includes:

receiving voice information input by a user;

the step of determining whether the identification signal is received includes:

comparing the voice information with a preset voice library;

if target voice information matching the voice information exists in the preset voice library, judging that an identification signal is received; or

and if the target voice information matched with the voice information does not exist in the preset voice library, judging that the identification signal is not received.

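In one embodiment, before the step of determining whether the identification signal is received, the method includes:

capturing blinking motions of a user;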

the step of determining whether the identification signal is received includes:

judging whether the blinking motion meets a preset trigger condition;

if the blinking motion meets the preset trigger condition, judging that an identification signal is received; or

and if the blinking motion does not meet the preset triggering condition, judging that an identification signal is not received.

In addition, to achieve the above object, the present invention also provides a screen content recognition apparatus, including:

the acquisition module is used for acquiring the eye movement information of the user;

the determining module is used for determining the visual focus of the user in a preset screen according to the eye movement information;

and the identification module is used for identifying the focus content corresponding to the position of the visual focus to generate identification information.
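The three modules above can be pictured as a simple composition. The following Python sketch is illustrative only: the class name `ScreenContentRecognizer` and its fields are hypothetical, and each module is modelled as an injected callable because the patent does not specify how the modules are implemented.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple

Point = Tuple[float, float]  # (x, y) position on the preset screen, in pixels


@dataclass
class ScreenContentRecognizer:
    """Hypothetical wiring of the acquisition, determining and identification modules."""

    acquire_eye_movement: Callable[[], Any]         # acquisition module
    determine_focus: Callable[[Any], Point]         # determining module
    identify_focus_content: Callable[[Point], str]  # identification module

    def run_once(self) -> str:
        # Collect eye movement information, map it to a visual focus,
        # then identify the content at that focus.
        eye_info = self.acquire_eye_movement()
        focus = self.determine_focus(eye_info)
        return self.identify_focus_content(focus)


if __name__ == "__main__":
    recognizer = ScreenContentRecognizer(
        acquire_eye_movement=lambda: (0.1, 0.2),              # stub eye tracker
        determine_focus=lambda v: (v[0] * 1000, v[1] * 1000),  # stub gaze mapping
        identify_focus_content=lambda p: f"content@{p[0]:.0f},{p[1]:.0f}",  # stub recognizer
    )
    print(recognizer.run_once())  # prints content@100,200
```

Injecting callables keeps the sketch independent of any particular eye tracker, gaze model or recognition service.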

In addition, in order to achieve the above object, the present invention also provides a screen content recognition device;

the screen content recognition device includes: a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein:

the computer program, when executed by the processor, implements the steps of the screen content recognition method as described above.

In addition, to achieve the above object, the present invention also provides a computer-readable storage medium;

the computer readable storage medium has stored thereon a computer program which, when executed by a processor, implements the steps of the screen content recognition method as described above.

The embodiments of the invention provide a screen content identification method, apparatus, device and computer-readable storage medium. Eye movement information of a user is collected; a visual focus of the user in a preset screen is determined according to the eye movement information; and the focus content corresponding to the position of the visual focus is identified to generate identification information. Because the content to be identified is selected quickly from the user's eye movements, user operation during screen content identification is simplified and identification efficiency is improved.

Drawings

FIG. 1 is a schematic diagram of an apparatus in a hardware operating environment according to an embodiment of the present invention;

FIG. 2 is a flowchart illustrating a method for identifying screen content according to a first embodiment of the present invention;

FIG. 3 is a flowchart illustrating a method for identifying screen content according to a second embodiment of the present invention.

The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.

Detailed Description

It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

The main solution of the embodiment of the invention is as follows:

to address the cumbersome operation and low efficiency of prior-art identification of screen content, eye movement information of a user is collected; a visual focus of the user in a preset screen is determined according to the eye movement information; and the focus content corresponding to the position of the visual focus is identified to generate identification information. Because the content to be identified is selected quickly from the user's eye movements, user operation during screen content identification is simplified and identification efficiency is improved.

As shown in fig. 1, fig. 1 is a schematic structural diagram of a terminal (also called a screen content identification device, where the screen content identification device may be formed by a separate screen content identification device, or may be formed by combining other devices with the screen content identification device) in a hardware operating environment according to an embodiment of the present invention.

The terminal of the embodiment of the invention can be a fixed terminal or a mobile terminal, such as an intelligent air conditioner with a networking function, an intelligent electric lamp, an intelligent power supply, an intelligent sound box, an automatic driving automobile, a Personal Computer (PC), a smart phone, a tablet computer, an electronic book reader, a portable computer and the like.

As shown in fig. 1, the terminal may include: a processor 1001, such as a Central Processing Unit (CPU), a network interface 1004, a user interface 1003, a memory 1005, and a communication bus 1002. The communication bus 1002 is used to implement connection and communication between these components. The user interface 1003 may include a display screen (Display) and an input unit such as a keyboard (Keyboard); optionally, the user interface 1003 may also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wi-Fi (Wireless Fidelity) interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory (e.g., a magnetic disk memory). Optionally, the memory 1005 may also be a storage device separate from the processor 1001.

Optionally, the terminal may further include a camera, a Radio Frequency (RF) circuit, sensors, an audio circuit, and a WiFi module. The input unit may be, for example, a display screen or a touch screen, and besides WiFi the wireless network interface may be Bluetooth, a probe, or the like. The sensors include, for example, light sensors and motion sensors; in particular, the light sensor may include an ambient light sensor and a proximity sensor. Of course, the mobile terminal may also be configured with other sensors such as a gyroscope, a barometer, a hygrometer, a thermometer, and an infrared sensor, which are not described herein again.

Those skilled in the art will appreciate that the terminal structure shown in fig. 1 is not intended to be limiting and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components.

As shown in fig. 1, the memory 1005, as a computer storage medium, may include an operating system, a network communication module, a user interface module, and a computer program. The computer software product is stored in a storage medium (also called a computer storage medium, computer medium, readable storage medium, computer-readable storage medium, or direct storage medium, for example ROM/RAM, a magnetic disk, or an optical disk) and includes several instructions for enabling a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, etc.) to execute the method according to the embodiments of the present invention.

In the terminal shown in fig. 1, the network interface 1004 is mainly used for connecting to a backend server and performing data communication with the backend server; the user interface 1003 is mainly used for connecting a client (user side) and performing data communication with the client; and the processor 1001 may be configured to call a computer program stored in the memory 1005 and execute the steps of the screen content identification method provided by the following embodiment of the present invention.

Referring to fig. 2, in a first embodiment of a method for identifying screen content according to the present invention, the method for identifying screen content includes:

Step S10, collecting eye movement information of the user.

The screen content identification device collects the eye movement information of the user by means of a preset eye movement acquisition device. The screen content identification device may be an intelligent device such as a smart television, a PC, a smartphone, or a PDA (Personal Digital Assistant); this embodiment imposes no particular limitation. To collect the eye movement information, the device is provided with the preset eye movement acquisition device, which uses the pupil-corneal reflection technique: a light source illuminates the eye and produces a distinct reflection, and a camera captures images of the eye showing this reflection. The reflection of the light source on the cornea and the pupil is then identified in the captured images; the eye movement vector is calculated from the angle between the corneal reflection and the pupil, and the direction of this vector is combined with the geometric features of other reflections to calculate the direction of the line of sight, from which the user's viewpoint is obtained. The preset eye movement acquisition device may be arranged at the top or bottom edge of the screen content identification device, or may be an external device connected through an interface. Given this implementation, the eye movement information is the reflection image produced on the cornea and pupil of the eye.
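To make the pupil-corneal reflection step more concrete, the sketch below extracts the feature commonly used in practice: the vector from the corneal reflection (glint) to the pupil center in a grayscale eye image. It is a simplified illustration, not the patent's implementation. Pupil and glint are located with plain intensity thresholds (the values `pupil_max=40` and `glint_min=230` and the function names are hypothetical), whereas real eye trackers typically use ellipse fitting and more robust segmentation.

```python
from typing import Callable, List, Tuple

Image = List[List[int]]  # grayscale eye image, 0 (black) .. 255 (white)


def centroid(img: Image, keep: Callable[[int], bool]) -> Tuple[float, float]:
    """Mean (x, y) of all pixels whose intensity satisfies `keep`."""
    sx = sy = 0.0
    n = 0
    for y, row in enumerate(img):
        for x, value in enumerate(row):
            if keep(value):
                sx += x
                sy += y
                n += 1
    if n == 0:
        raise ValueError("no pixel matched the threshold")
    return sx / n, sy / n


def pupil_glint_vector(img: Image, pupil_max: int = 40, glint_min: int = 230) -> Tuple[float, float]:
    """Return (dx, dy) from the corneal glint to the pupil center, in pixels.

    The pupil appears as the darkest region and the corneal reflection of the
    light source as the brightest spot (hypothetical fixed thresholds).
    """
    pupil = centroid(img, lambda v: v <= pupil_max)
    glint = centroid(img, lambda v: v >= glint_min)
    return pupil[0] - glint[0], pupil[1] - glint[1]
```

This vector changes as the eye rotates, so it can serve as the input to the gaze mapping performed in step S20.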

Step S20, determining the visual focus of the user in a preset screen according to the eye movement information.

The screen content identification device determines the visual focus of the user in the preset screen according to the eye movement information. As described in the previous step, the eye movement information is the reflection image produced on the cornea and pupil of the eye, and the visual focus is the user's line-of-sight position, calculated by the screen content identification device from the eye movement information using a preset image processing algorithm and a preset three-dimensional eyeball model. When the user is using screen content identification, the user's gaze position corresponds to a point on the screen of the screen content identification device. It can be understood that the user's line of sight may change or stay fixed at any moment while watching, so the eye movement information collected by the preset eye movement acquisition device may change or remain unchanged, and the visual focus correspondingly changes or remains unchanged.
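The patent computes the visual focus with a preset image processing algorithm and a three-dimensional eyeball model, neither of which is specified. As a simpler stand-in, many interpolation-based eye trackers fit a regression from the pupil-glint vector to screen coordinates during a short calibration in which the user looks at known points. The sketch below fits an affine mapping per axis by least squares using only the standard library; the class name `GazeMapper` and the calibration layout are hypothetical, and a real system would more likely use a higher-order polynomial or the 3D eyeball model.

```python
from typing import List, Sequence, Tuple

Vec = Tuple[float, float]


def _solve3(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("calibration points are degenerate")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= factor * m[col][c]
    x = [0.0, 0.0, 0.0]
    for r in range(2, -1, -1):  # back substitution
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x


def fit_affine(vectors: Sequence[Vec], targets: Sequence[float]) -> List[float]:
    """Least-squares coefficients c for target ~= c0 + c1*vx + c2*vy (normal equations)."""
    rows = [(1.0, vx, vy) for vx, vy in vectors]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    atb = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(3)]
    return _solve3(ata, atb)


class GazeMapper:
    """Maps a pupil-glint vector to a visual focus on the screen (hypothetical)."""

    def __init__(self, vectors: Sequence[Vec], screen_points: Sequence[Vec]) -> None:
        if len(vectors) != len(screen_points) or len(vectors) < 3:
            raise ValueError("need at least 3 matching calibration samples")
        self.cx = fit_affine(vectors, [p[0] for p in screen_points])
        self.cy = fit_affine(vectors, [p[1] for p in screen_points])

    def focus(self, v: Vec) -> Vec:
        vx, vy = v
        x = self.cx[0] + self.cx[1] * vx + self.cx[2] * vy
        y = self.cy[0] + self.cy[1] * vx + self.cy[2] * vy
        return x, y
```

Five calibration targets (four corners and the center) are enough for an affine fit; each new pupil-glint vector is then passed to `focus()` to obtain the visual focus.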

Step S30, identifying the focus content corresponding to the position of the visual focus, and generating identification information.

The screen content identification device identifies the focus content corresponding to the position of the visual focus to generate identification information. As described above, the visual focus is a point on the screen; to identify the corresponding focus content, an area centered on the visual focus needs to be determined and its content identified as the focus content, thereby generating the identification information.
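As a minimal sketch of "an area centered on the visual focus", the function below crops the pixels lying within a given radius of the focus and masks the rest, clipping at the screen edges. The frame is modelled as a nested list of pixel values and the function name is hypothetical; in practice the cropped region would be passed to whatever recognizer the device uses.

```python
import math
from typing import List, Optional, Sequence, Tuple


def crop_focus_circle(
    frame: Sequence[Sequence[int]], focus: Tuple[float, float], radius: float
) -> List[List[Optional[int]]]:
    """Return the bounding box of the circle around `focus`, clipped to the frame.

    Pixels inside the circle keep their value; pixels outside it become None.
    """
    fx, fy = focus
    height = len(frame)
    width = len(frame[0]) if height else 0
    top = max(0, math.floor(fy - radius))
    bottom = min(height, math.floor(fy + radius) + 1)
    left = max(0, math.floor(fx - radius))
    right = min(width, math.floor(fx + radius) + 1)
    region: List[List[Optional[int]]] = []
    for y in range(top, bottom):
        row: List[Optional[int]] = []
        for x in range(left, right):
            inside = (x - fx) ** 2 + (y - fy) ** 2 <= radius ** 2
            row.append(frame[y][x] if inside else None)
        region.append(row)
    return region
```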

Further, before step S30, step a1 may be further included:

step a1, displaying the visual focus at a corresponding position in the preset screen.

The screen content identification device displays the visual focus at the corresponding position in the preset screen, so that the user can see whether the displayed visual focus deviates from where the user is actually looking, which guides the user toward accurate identification. Displaying the visual focus in the preset screen visualizes the viewpoint and improves the accuracy with which the user selects screen content.

Further, before step S30, steps b1 to b2 may be further included:

step b1, determine whether the identification signal is received.

Step b2, when the identification signal is received, executing the step of identifying the focus content corresponding to the position of the visual focus to generate identification information.

The screen content identification device judges whether an identification signal is received. The identification signal may be a blink of the user, a specific voice command input by the user, and/or the user's visual focus not moving within a preset time. When the identification signal is received, the step of identifying the focus content corresponding to the position of the visual focus is executed, and the screen content identification device generates the identification information.
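Of the identification signals listed above, the dwell condition (the visual focus not moving within a preset time) is the easiest to express directly. The sketch below is one possible reading of it: the signal fires when every focus sample from the last `dwell_s` seconds stays within `max_dev_px` of their mean. The thresholds and the function name are hypothetical values chosen for illustration.

```python
import math
from typing import Sequence, Tuple

Sample = Tuple[float, float, float]  # (timestamp in seconds, x in px, y in px)


def dwell_triggered(samples: Sequence[Sample], dwell_s: float = 1.5, max_dev_px: float = 40.0) -> bool:
    """True if the visual focus stayed (nearly) still for the last `dwell_s` seconds."""
    if not samples:
        return False
    now = samples[-1][0]
    start = now - dwell_s
    if samples[0][0] > start:
        return False  # not enough history to cover the whole dwell window
    window = [s for s in samples if s[0] >= start]
    cx = sum(s[1] for s in window) / len(window)
    cy = sum(s[2] for s in window) / len(window)
    # Every recent focus sample must lie close to the mean focus position.
    return all(math.hypot(x - cx, y - cy) <= max_dev_px for _, x, y in window)
```

Using the deviation from the mean rather than from the first sample tolerates small tracking jitter around a fixed point.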

In this embodiment, eye movement information of the user is collected; a visual focus of the user in a preset screen is determined according to the eye movement information; and the focus content corresponding to the position of the visual focus is identified to generate identification information. Because the content to be identified is selected quickly from the user's eye movements, user operation during screen content identification is simplified and identification efficiency is improved.

Further, referring to fig. 3, a second embodiment of the screen content identification method of the present invention is proposed on the basis of the first embodiment. It differs from the first embodiment in that it provides a further implementation of step S30, in which the focus content at the position of the visual focus is identified according to a classification result to generate the identification information. The screen content identification method includes:

Step S31, identifying the whole content of the screen picture to generate a classification result.

The screen content identification device identifies the whole content of the screen picture to generate a classification result. It can be understood that the screen picture may be a scenery picture without people, animals or other specific objects, or a picture containing people, animals or other specific objects, and the user's identification needs differ for the two. For the first kind, which lacks identifiable entities, the user typically wants to know which scenic spot or location the picture shows; for the second kind, which contains specific entities, the user typically wants to know who the people are, what kind of animals appear, or the names of the objects in the picture. Therefore, to identify screen content efficiently, the whole screen picture is first identified to determine whether entities exist in it. Detecting entities in a screen picture is a mature technique in the prior art, and its specific implementation is not described in detail here.
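Assuming an off-the-shelf object detector has already been run on the whole screen picture (its output is modelled here as hypothetical `(label, confidence)` pairs), the classification in step S31 reduces to checking whether any entity was detected with sufficient confidence. The enum names and the 0.5 threshold are illustrative assumptions.

```python
from enum import Enum
from typing import Iterable, Tuple


class ScreenClass(Enum):
    FIRST_TYPE = "no_entity"    # scenery picture without people, animals or specific objects
    SECOND_TYPE = "has_entity"  # picture containing at least one identifiable entity


def classify_screen(detections: Iterable[Tuple[str, float]], min_confidence: float = 0.5) -> ScreenClass:
    """Classify the whole screen picture from detector output (label, confidence)."""
    has_entity = any(conf >= min_confidence for _label, conf in detections)
    return ScreenClass.SECOND_TYPE if has_entity else ScreenClass.FIRST_TYPE
```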

Step S32, identifying the focus content of the position of the visual focus according to the classification result, and generating identification information.

The screen content identification device identifies the focus content at the position of the visual focus according to the classification result to generate identification information. As described above, the visual focus is a point on the screen; to identify the corresponding focus content, an area centered on the visual focus needs to be determined as the focus content, after which the identification information is generated. The classification result determines how that area around the visual focus is chosen.

Specifically, in step S32, identifying the focus content at the position of the visual focus according to the classification result and generating the identification information may include steps c1 to c2:

Step c1, when the classification result is the first type, identifying the content contained in an area of a first radius centered on the visual focus as the focus content to generate identification information.

Step c2, when the classification result is the second type, identifying the content contained in an area of a second radius centered on the visual focus as the focus content to generate identification information.

When the classification result is the first type, the screen content identification device identifies the content contained in the area of the first radius centered on the visual focus as the focus content to generate identification information; when the classification result is the second type, it identifies the content contained in the area of the second radius centered on the visual focus as the focus content to generate identification information. The first type is a screen picture lacking the entities described in step S31, the second type is a screen picture containing such entities, and the first radius is larger than the second radius. In the identification process, the determined focus content may be searched and identified through a background search engine to generate the identification information.
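Steps c1 and c2 only require that the first radius (pictures without entities) be larger than the second radius (pictures with entities). The sketch below turns the classification into a clipped rectangular focus region that could be sent to a recognizer or background search engine. The radius values of 300 px and 120 px and the function name are hypothetical, and the boolean `has_entity` stands in for the classification result.

```python
from typing import Tuple

# Hypothetical radii in pixels; the text only requires FIRST_RADIUS_PX > SECOND_RADIUS_PX.
FIRST_RADIUS_PX = 300   # first type: no entity, take wider context (e.g. a landscape)
SECOND_RADIUS_PX = 120  # second type: entity present, focus tightly on it


def focus_box(
    screen_w: int, screen_h: int, focus: Tuple[float, float], has_entity: bool
) -> Tuple[int, int, int, int]:
    """Return (left, top, right, bottom) of the square enclosing the focus circle, clipped to the screen."""
    radius = SECOND_RADIUS_PX if has_entity else FIRST_RADIUS_PX
    fx, fy = focus
    left = max(0, round(fx - radius))
    top = max(0, round(fy - radius))
    right = min(screen_w, round(fx + radius))
    bottom = min(screen_h, round(fy + radius))
    if left >= right or top >= bottom:
        raise ValueError("visual focus lies outside the screen")
    return left, top, right, bottom
```

Sending the enclosing rectangle rather than a circular mask is a design choice made here because most recognition services accept rectangular images.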

In this embodiment, the whole content of the screen picture is first identified as a whole to determine preliminarily whether it contains entities, and the focus content to be identified is then determined accordingly. Pictures with and without entities are thus identified with different strategies, which improves identification accuracy and efficiency.

Further, on the basis of the above embodiment of the present invention, a third embodiment of the screen content identification method of the present invention is further proposed, where the screen content identification method includes:

before the step b1, a step b11 may be included:

step b11, receiving the voice information input by the user.

At this time, step b1 can be replaced by steps b12 to b14:

Step b12, comparing the voice information with the preset voice library.

Step b13, if target voice information matching the voice information exists in the preset voice library, judging that an identification signal is received.

Step b14, if no target voice information matching the voice information exists in the preset voice library, judging that the identification signal is not received.

The screen picture content recognition equipment receives voice information input by a user, compares the voice information with a preset voice library, judges that a recognition signal is received if target voice information matched with the voice information exists in the preset voice library, and judges that the recognition signal is not received if the target voice information matched with the voice information does not exist in the preset voice library.
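The patent does not say how the voice information is compared with the preset voice library. One simple possibility, sketched below under the assumption that the audio has already been transcribed to text by a speech recognizer, is to normalize the transcript and look for any preset command phrase as a whole-word substring. The library contents and function names are hypothetical.

```python
import re
from typing import Iterable, Optional

# Hypothetical preset voice library, stored as command phrases.
PRESET_VOICE_LIBRARY = ("what is this", "identify this", "recognize")


def _normalize(text: str) -> str:
    """Lower-case, drop punctuation and collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def match_voice(transcript: str, library: Iterable[str] = PRESET_VOICE_LIBRARY) -> Optional[str]:
    """Return the matching target phrase, or None if the library has no match."""
    padded = f" {_normalize(transcript)} "
    for phrase in library:
        # Pad with spaces so that only whole words match ("recognize" != "recognized").
        if f" {_normalize(phrase)} " in padded:
            return phrase
    return None


def identification_signal_received(transcript: str) -> bool:
    return match_voice(transcript) is not None
```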

In this embodiment, the screen content identification device uses the user's voice input as the identification signal and identifies the screen content at the current visual focus when the identification signal is received, which simplifies the identification operation and further improves identification accuracy and efficiency.

Further, on the basis of the above embodiment of the present invention, a fourth embodiment of the screen content identification method of the present invention is further proposed, where the screen content identification method includes:

before the step b1, a step b15 may be included:

step b15, the blinking motion of the user is captured.

At this time, step b1 can be replaced by steps b16 to b18:

Step b16, judging whether the blinking motion meets a preset trigger condition.

Step b17, if the blinking motion meets the preset triggering condition, determining that an identification signal is received.

Step b18, if the blinking motion does not meet the preset triggering condition, determining that an identification signal is not received.

The screen content identification device captures the blinking motion of the user and judges whether it meets the preset trigger condition. If it does, the device judges that an identification signal is received; if not, the device judges that the identification signal is not received. To prevent the user's ordinary, involuntary blinks from falsely triggering identification, the preset trigger condition may be set as two consecutive blinks being detected; the condition may also be set according to the actual application scenario.
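The double-blink trigger can be sketched from a stream of eye-openness samples (for example an eye-aspect-ratio value normalized to 0..1; this input format is an assumption, not something the patent specifies). A blink is counted when the eye closes briefly and reopens, longer closures are ignored, and the signal fires when two blinks end within `max_gap_s` of each other. All thresholds and function names are hypothetical and would be tuned per scenario.

```python
from typing import List, Optional, Sequence, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, eye openness in 0..1)


def detect_blinks(samples: Sequence[Sample], closed_below: float = 0.2, max_blink_s: float = 0.4) -> List[float]:
    """Return the reopening times of short closures (blinks); long closures are ignored."""
    blinks: List[float] = []
    closed_since: Optional[float] = None
    for t, openness in samples:
        if openness < closed_below:
            if closed_since is None:
                closed_since = t  # eye just closed
        elif closed_since is not None:
            if t - closed_since <= max_blink_s:
                blinks.append(t)  # eye reopened quickly: count as a blink
            closed_since = None
    return blinks


def double_blink_triggered(samples: Sequence[Sample], max_gap_s: float = 0.6) -> bool:
    """Preset trigger condition: two consecutive blinks no more than `max_gap_s` apart."""
    blinks = detect_blinks(samples)
    return any(later - earlier <= max_gap_s for earlier, later in zip(blinks, blinks[1:]))
```

A single natural blink, or two blinks far apart, therefore does not trigger identification.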

In this embodiment, the screen content identification device uses the user's blinking motion as the identification signal and identifies the screen content at the current visual focus when the identification signal is received, which simplifies the identification operation and further improves identification accuracy and efficiency.

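In addition, an embodiment of the present invention further provides a screen content recognition apparatus, where the screen content recognition apparatus includes:

the acquisition module is used for acquiring the eye movement information of the user;

the determining module is used for determining the visual focus of the user in a preset screen according to the eye movement information;

and the identification module is used for identifying the focus content corresponding to the position of the visual focus to generate identification information.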

Further, the identification module further comprises a classification unit and an identification unit;

the classification unit is used for identifying the whole content of the screen picture to generate a classification result;

and the identification unit is used for identifying the focus content at the position of the visual focus according to the classification result to generate identification information.

Further, the identification unit further comprises a first identification subunit and a second identification subunit;

the first identification subunit is configured to, when the classification result is a first type, identify, with content included in a first radius area in which the visual focus is a circle center as focus content, to generate identification information;

and the second identification subunit is used for identifying, when the classification result is of a second type, by using the content contained in a second radius area with the visual focus as a circle center as focus content, and generating identification information.

Furthermore, the screen content recognition device further comprises a display module;

and the display module is used for displaying the visual focus at a corresponding position in the preset screen.

Furthermore, the screen content recognition device further comprises a judgment module;

the judging module is used for judging whether the identification signal is received.

Furthermore, the screen content recognition apparatus further comprises a receiving module, and the judging module further comprises a comparing unit and a first judging unit;

the receiving module is used for receiving voice information input by a user;

the comparison unit is used for comparing the voice information with a preset voice library;

the first judging unit is used for judging that an identification signal is received if target voice information matching the voice information exists in the preset voice library, or that the identification signal is not received if no such target voice information exists.

Further, the screen content recognition apparatus further comprises a capturing module, and the judging module further comprises a judging subunit and a second judging unit;

the capturing module is used for capturing the blinking motion of the user;

the judging subunit is used for judging whether the blinking motion meets the preset trigger condition;

the second judging unit is used for judging that an identification signal is received if the blinking motion meets the preset trigger condition, or that the identification signal is not received if it does not.

In addition, an embodiment of the present invention further provides a screen content recognition device, where the screen content recognition device includes: a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein:

the computer program, when executed by the processor, implements the steps of the screen content identification method as described in the above embodiments.

In addition, the embodiment of the invention also provides a computer readable storage medium.

The computer readable storage medium has stored thereon a computer program which, when executed by a processor, implements the operations in the screen content identification method provided by the above-described embodiments.

It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity/action/object from another entity/action/object without necessarily requiring or implying any actual such relationship or order between such entities/actions/objects; the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of another identical element in a process, method, article, or system that comprises the element.

For the apparatus embodiment, since it is substantially similar to the method embodiment, it is described relatively simply, and reference may be made to some descriptions of the method embodiment for relevant points. The above-described apparatus embodiments are merely illustrative, in that elements described as separate components may or may not be physically separate. Some or all of the modules can be selected according to actual needs to achieve the purpose of the scheme of the invention. One of ordinary skill in the art can understand and implement it without inventive effort.

The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.

Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.

The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims

1. A screen content identification method is characterized by comprising the following steps:

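collecting eye movement information of a user;

determining a visual focus of the user in a preset screen according to the eye movement information;

and identifying the focus content corresponding to the position of the visual focus to generate identification information.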

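2. The method for identifying screen content according to claim 1, wherein the step of identifying the focus content corresponding to the position of the visual focus to generate the identification information comprises:

identifying the whole content of the screen picture to generate a classification result;

and identifying the focus content at the position of the visual focus according to the classification result to generate identification information.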

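3. The method for identifying screen content according to claim 2, wherein the classification result is either a first type or a second type, and the step of identifying the focus content at the position of the visual focus according to the classification result to generate the identification information comprises:

when the classification result is the first type, identifying the content contained in an area of a first radius centered on the visual focus as the focus content to generate the identification information; or

when the classification result is the second type, identifying the content contained in an area of a second radius centered on the visual focus as the focus content to generate the identification information.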

4. The method for identifying screen content according to claim 1, wherein before the step of identifying the focus content corresponding to the position of the visual focus to generate the identification information, the method comprises:

displaying the visual focus at a corresponding position in the preset screen.

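5. The method for identifying screen content according to claim 1, wherein before the step of identifying the focus content corresponding to the position of the visual focus to generate the identification information, the method comprises:

judging whether an identification signal is received;

and when the identification signal is received, executing the step of identifying the focus content corresponding to the position of the visual focus to generate the identification information.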

6. The screen content recognition method of claim 5, wherein said step of determining whether an identification signal is received is preceded by:

receiving voice information input by a user;

the step of determining whether the identification signal is received includes:

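comparing the voice information with a preset voice library;

if target voice information matching the voice information exists in the preset voice library, judging that an identification signal is received; or

if no target voice information matching the voice information exists in the preset voice library, judging that the identification signal is not received.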

7. The screen content recognition method of claim 5, wherein said step of determining whether an identification signal is received is preceded by:

capturing blinking motions of a user;

the step of determining whether the identification signal is received includes:

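judging whether the blinking motion meets a preset trigger condition;

if the blinking motion meets the preset trigger condition, judging that an identification signal is received; or

if the blinking motion does not meet the preset trigger condition, judging that the identification signal is not received.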

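8. A screen content recognition apparatus, comprising:

an acquisition module, configured to acquire eye movement information of a user;

a determining module, configured to determine a visual focus of the user in a preset screen according to the eye movement information;

and an identification module, configured to identify the focus content corresponding to the position of the visual focus to generate identification information.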

9. A screen content recognition device, characterized by comprising: a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein:

the computer program realizing the steps of the screen content recognition method according to any one of claims 1 to 7 when executed by the processor.

10. A computer-readable storage medium, characterized in that a computer program is stored thereon, which computer program, when being executed by a processor, carries out the steps of the screen content identification method according to any one of claims 1 to 7.