CN110502117B - Screenshot method in electronic terminal and electronic terminal - Google Patents

Screenshot method in electronic terminal and electronic terminal

Info

Publication number
CN110502117B
CN110502117B (application CN201910789435.1A)
Authority
CN
China
Prior art keywords
key frame
screenshot
display interface
frame image
video
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910789435.1A
Other languages
Chinese (zh)
Other versions
CN110502117A (en)
Inventor
徐苏琴
杨建军
李斌
俞斐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Samsung Electronics China R&D Center
Samsung Electronics Co Ltd
Original Assignee
Samsung Electronics China R&D Center
Samsung Electronics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Samsung Electronics China R&D Center, Samsung Electronics Co Ltd
Priority to CN201910789435.1A
Publication of CN110502117A
Application granted
Publication of CN110502117B
Legal status: Active
Anticipated expiration


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F3/013Eye tracking input arrangements
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/431Generation of visual interfaces for content selection or interaction; Content or additional data rendering
    • H04N21/4312Generation of visual interfaces for content selection or interaction; Content or additional data rendering involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/431Generation of visual interfaces for content selection or interaction; Content or additional data rendering
    • H04N21/4312Generation of visual interfaces for content selection or interaction; Content or additional data rendering involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations
    • H04N21/4314Generation of visual interfaces for content selection or interaction; Content or additional data rendering involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations for fitting data in a restricted space on the screen, e.g. EPG data in a rectangular grid
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/222Studio circuitry; Studio devices; Studio equipment
    • H04N5/262Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects ; Cameras specially adapted for the electronic generation of special effects
    • H04N5/278Subtitling
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F3/014Hand-worn input/output arrangements, e.g. data gloves
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0487Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser
    • G06F3/0488Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser using a touch-screen or digitiser, e.g. input of commands through traced gestures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16Sound input; Sound output

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The invention provides a screenshot method in an electronic terminal, and an electronic terminal. The screenshot method includes: receiving a predetermined input that triggers a screen capture, and entering a screen-capture state; determining the type of the current display interface in the screen-capture state; tracking the eye gaze position and determining eye gaze position information; and generating a screenshot based on the determined type of the current display interface and the eye gaze position information. The screenshot method and electronic terminal provided by the invention effectively simplify the steps of a screenshot operation.

Description

Screenshot method in electronic terminal and electronic terminal
Technical Field
The present invention relates generally to the field of electronic technology, and more particularly, to a screenshot method in an electronic terminal and the electronic terminal.
Background
Currently, in existing screenshot modes, a screenshot operation is generally triggered by a combination of key presses or gesture inputs. When a user wants to obtain a long screenshot, the user generally has to manually select multiple pictures, or manually scroll the page, to generate the long screenshot, which makes the operation process cumbersome.
When a user wishes to capture a frame of a video, the capture operation is typically triggered while the video is playing. However, this may capture a blurred frame image, so the user has to adjust the video playback position and capture again to obtain a clear frame image, which is very inconvenient.
When a user wants to capture multiple consecutive frame images of a video, the user has to repeatedly perform the screenshot operation, which is tedious. In addition, combining the captured frame images into a single image can only be accomplished with separate image-processing software, which is also cumbersome.
Disclosure of Invention
It is an object of exemplary embodiments of the present invention to provide a screenshot method in an electronic terminal, and an electronic terminal, that overcome at least one of the above-mentioned drawbacks.
According to an aspect of exemplary embodiments of the present invention, there is provided a screenshot method in an electronic terminal, including: receiving a predetermined input for triggering a screen capture, and entering a screen-capture state; determining the type of the current display interface in the screen-capture state; tracking the eye gaze position and determining eye gaze position information; and generating a screenshot based on the determined type of the current display interface and the eye gaze position information.
Optionally, the predetermined input may include at least one of: input of at least one physical key, gesture input, touch input, voice input.
Optionally, the step of determining the type of the current display interface may include: determining the type of the current display interface according to the type of the application to which the current display interface belongs and/or the content layout of the current display interface.
Optionally, the type of the current display interface may include a video class, and the video-class display interface may include a video, wherein the step of generating the screenshot based on the determined type of the current display interface and the eye gaze position information may include: determining, according to the eye gaze position information, the position of interest of the eye gaze in the video; and generating a screenshot based on at least one key frame image in the video that includes content corresponding to the position of interest.
Alternatively, the video may be formed of a plurality of frame images, and the plurality of frame images may include a plurality of key frame images and a plurality of intermediate frame images, where each key frame image is a complete picture and each intermediate frame image encodes only the image changes relative to its corresponding key frame.
Optionally, the step of generating a screenshot based on at least one key frame image in the video that includes content corresponding to the position of interest may include: if it is determined, according to the eye gaze position information, that the position of interest of the eye gaze in the video is the region where the subtitles are located, acquiring the key frame image at the moment the screen capture is triggered; determining the time value corresponding to the last key frame image belonging to the same scene as that key frame image, and taking the determined time value as the cut-off moment; extracting, from the subtitle file of the video, the plurality of subtitles covering the period from the trigger moment to the cut-off moment; screening out, from the plurality of subtitles, the subtitles that express a complete semantic unit; and generating a screenshot by stitching the at least one key frame image corresponding to the screened-out subtitles.
Optionally, the step of screening the subtitles that express a complete semantic unit from the plurality of subtitles may include: performing the following processing for each pair of subtitles adjacent in display time, in reverse order of display time: calculating the time interval between the two adjacent subtitles; if the interval between the later-displayed subtitle and the earlier-displayed subtitle is greater than or equal to a preset time, deleting the later-displayed subtitle and all subtitles displayed after it; and if the interval is smaller than the preset time, keeping both subtitles; and then screening out the subtitles expressing a complete semantic unit by performing semantic analysis on all the remaining subtitles.
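The time-interval screening described here can be sketched as follows. This is a minimal illustration rather than the patented implementation: subtitles are represented only by their display times in milliseconds, `maxGapMs` stands in for the preset time, and the subsequent semantic-analysis step is omitted.

```java
import java.util.ArrayList;
import java.util.List;

public class SubtitleFilter {
    // Walks pairs of subtitles adjacent in display time, in reverse order.
    // Whenever the gap between a subtitle and its predecessor reaches maxGapMs,
    // that subtitle and every later one are dropped; otherwise both are kept.
    public static List<Long> keepContiguous(List<Long> displayTimesMs, long maxGapMs) {
        List<Long> kept = new ArrayList<>(displayTimesMs);
        for (int i = kept.size() - 1; i >= 1; i--) {
            if (kept.get(i) - kept.get(i - 1) >= maxGapMs) {
                // Delete the later-displayed subtitle and all subtitles after it.
                kept.subList(i, kept.size()).clear();
            }
        }
        return kept;
    }
}
```

For example, with display times of 0, 1000, 2000, 9000 and 10000 ms and a preset time of 5000 ms, the subtitles at 9000 and 10000 ms are dropped, leaving the contiguous prefix for semantic analysis.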
Optionally, the at least one key frame image may include one key frame image, wherein the step of generating the screenshot by stitching the at least one key frame image corresponding to the screened-out subtitles may include: generating the screenshot by superimposing the subtitles on that one key frame image.
Optionally, the at least one key frame image may include a plurality of key frame images, wherein the step of generating the screenshot by stitching the at least one key frame image corresponding to the screened-out subtitles may include: stitching the plurality of key frame images in chronological order to generate the screenshot; or selecting a representative key frame image from the plurality of key frame images and generating the screenshot by superimposing the subtitles on the representative key frame image.
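The chronological stitching of key frame images can be sketched with the standard Java imaging API, assuming all frames share the same width; this is an illustrative sketch, not the patented implementation.

```java
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.util.List;

public class LongScreenshot {
    // Stitches same-width images top-to-bottom, in the order they are given.
    public static BufferedImage stitchVertically(List<BufferedImage> images) {
        int width = images.get(0).getWidth();
        int height = images.stream().mapToInt(BufferedImage::getHeight).sum();
        BufferedImage out = new BufferedImage(width, height, BufferedImage.TYPE_INT_ARGB);
        Graphics2D g = out.createGraphics();
        int y = 0;
        for (BufferedImage img : images) {
            g.drawImage(img, 0, y, null); // place each frame below the previous one
            y += img.getHeight();
        }
        g.dispose();
        return out;
    }
}
```

The same routine also covers the text-class long screenshot later in the document, where per-switch interface images are composed in display order.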
Optionally, the step of selecting a representative key frame image from the plurality of key frame images may include: selecting, as the representative key frame image, the key frame image whose time value is closest to the middle of the period from the trigger moment to the cut-off moment; or selecting the key frame image with the smallest time value as the representative key frame image.
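Selecting the key frame closest to the middle of the period can be sketched as follows; representing key frames by their time values in milliseconds is an assumption made for illustration.

```java
import java.util.List;

public class RepresentativeFrame {
    // Returns the key-frame time value closest to the midpoint of
    // [triggerMs, cutoffMs], i.e. the trigger moment and the cut-off moment.
    public static long closestToMidpoint(List<Long> keyFrameTimesMs, long triggerMs, long cutoffMs) {
        long mid = (triggerMs + cutoffMs) / 2;
        long best = keyFrameTimesMs.get(0);
        for (long t : keyFrameTimesMs) {
            if (Math.abs(t - mid) < Math.abs(best - mid)) best = t;
        }
        return best;
    }
}
```

The alternative of taking the key frame with the smallest time value reduces to `Collections.min(keyFrameTimesMs)`.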
Optionally, the screenshot method may further include: if it is determined, according to the eye gaze position information, that the position of interest of the eye gaze in the video is not the region where the subtitles are located, identifying the object corresponding to the position of interest; acquiring at least one key frame image within a preset time period starting from the moment the screen capture is triggered; searching the acquired key frame images for candidate key frame images containing the object; removing highly similar key frame images from the candidate key frame images; and generating a screenshot by stitching the remaining candidate key frame images.
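The removal of highly similar key frame images can be sketched as a generic filter that keeps a frame only when its similarity to the last kept frame falls below a threshold. The similarity function itself (for example a histogram or perceptual-hash comparison) is left abstract here, since the document does not specify one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

public class FrameDedup {
    // Keeps a frame only if its similarity to the most recently kept frame
    // is below the given threshold; consecutive near-duplicates are dropped.
    public static <T> List<T> dropSimilar(List<T> frames,
                                          BiFunction<T, T, Double> similarity,
                                          double threshold) {
        List<T> kept = new ArrayList<>();
        for (T f : frames) {
            if (kept.isEmpty() || similarity.apply(kept.get(kept.size() - 1), f) < threshold) {
                kept.add(f);
            }
        }
        return kept;
    }
}
```

With an exact-equality similarity function, the filter degenerates into removing consecutive duplicates, which is enough to exercise the logic.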
Optionally, the type of the current display interface may include a text class, and the text-class display interface may include text information, wherein the step of generating the screenshot based on the determined type of the current display interface and the eye gaze position information may include: determining the direction of movement of the eye gaze according to the eye gaze position information; switching the display interface according to the direction of movement of the eye gaze, and acquiring an image of the display interface after each switch; and when the switching of the display interface ends, composing the screenshot from the images of the display interfaces in display order.
Optionally, the type of the current display interface may include a control class, and the control-class display interface may include a control, wherein the step of generating the screenshot based on the determined type of the current display interface and the eye gaze position information may include: controlling the electronic terminal, according to the eye gaze position information, to execute at least one operation step, wherein the display interface corresponding to each operation step includes a control and the operation step is executed by operating that control according to the eye gaze position information; acquiring an image of the display interface corresponding to each operation step executed by the electronic terminal; and composing the screenshot from the images of the display interfaces corresponding to the at least one operation step, in the order in which the operation steps were executed.
Optionally, the step of acquiring the image of the display interface corresponding to each operation step executed by the electronic terminal may include: adding an identifier, at the position of the control, to the acquired image of the display interface corresponding to the operation step; and using the image with the added identifier as the image of the display interface corresponding to that operation step.
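The identifier-adding step can be sketched with the standard Java 2D API. The red circular marker, its radius, and the control coordinates are illustrative assumptions, not details taken from the document.

```java
import java.awt.BasicStroke;
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

public class ControlMarker {
    // Draws a circular identifier centered on the control's position (cx, cy)
    // and returns a new image, leaving the captured screen image untouched.
    public static BufferedImage markControl(BufferedImage screen, int cx, int cy, int radius) {
        BufferedImage out = new BufferedImage(screen.getWidth(), screen.getHeight(),
                                              BufferedImage.TYPE_INT_ARGB);
        Graphics2D g = out.createGraphics();
        g.drawImage(screen, 0, 0, null);
        g.setColor(Color.RED);
        g.setStroke(new BasicStroke(3f));
        g.drawOval(cx - radius, cy - radius, 2 * radius, 2 * radius);
        g.dispose();
        return out;
    }
}
```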
According to an aspect of exemplary embodiments of the present invention, there is provided an electronic terminal including an input interface, an eye tracking unit, and a processor, wherein the input interface receives a predetermined input for triggering a screen capture, and the processor is configured to: control the electronic terminal, in response to the received predetermined input, to enter a screen-capture state, and determine the type of the current display interface in the screen-capture state; the eye tracking unit tracks the eye gaze position and provides eye gaze position information; and the processor is further configured to generate a screenshot based on the determined type of the current display interface and the eye gaze position information.
Optionally, the predetermined input may include at least one of: input of at least one physical key, gesture input, touch input, voice input.
Optionally, the process of determining the type of the current display interface may include: determining the type of the current display interface according to the type of the application to which the current display interface belongs and/or the content layout of the current display interface.
Alternatively, the eye tracking unit may include: an infrared light source that emits infrared light toward the user's eyes; and an infrared camera that receives the infrared light reflected by the pupils of the user's eyes and provides the eye gaze position information.
Optionally, the type of the current display interface may include a video class, and the video-class display interface may include a video, wherein generating the screenshot based on the determined type of the current display interface and the eye gaze position information may include: determining, according to the eye gaze position information, the position of interest of the eye gaze in the video; and generating a screenshot based on at least one key frame image in the video that includes content corresponding to the position of interest.
Alternatively, the video may be formed of a plurality of frame images, and the plurality of frame images may include a plurality of key frame images and a plurality of intermediate frame images, where each key frame image is a complete picture and each intermediate frame image encodes only the image changes relative to its corresponding key frame.
Optionally, the process of generating a screenshot based on at least one key frame image in the video that includes content corresponding to the position of interest may include: if it is determined, according to the eye gaze position information, that the position of interest of the eye gaze in the video is the region where the subtitles are located, acquiring the key frame image at the moment the screen capture is triggered; determining the time value corresponding to the last key frame image belonging to the same scene as that key frame image, and taking the determined time value as the cut-off moment; extracting, from the subtitle file of the video, the plurality of subtitles covering the period from the trigger moment to the cut-off moment; screening out, from the plurality of subtitles, the subtitles that express a complete semantic unit; and generating a screenshot by stitching the at least one key frame image corresponding to the screened-out subtitles.
Optionally, the process of screening the subtitles that express a complete semantic unit from the plurality of subtitles may include: performing the following processing for each pair of subtitles adjacent in display time, in reverse order of display time: calculating the time interval between the two adjacent subtitles; if the interval between the later-displayed subtitle and the earlier-displayed subtitle is greater than or equal to a preset time, deleting the later-displayed subtitle and all subtitles displayed after it; and if the interval is smaller than the preset time, keeping both subtitles; and then screening out the subtitles expressing a complete semantic unit by performing semantic analysis on all the remaining subtitles.
Optionally, the at least one key frame image may include one key frame image, where generating the screenshot by stitching the at least one key frame image corresponding to the screened-out subtitles may include: generating the screenshot by superimposing the subtitles on that one key frame image.
Optionally, the at least one key frame image may include a plurality of key frame images, where generating the screenshot by stitching the at least one key frame image corresponding to the screened-out subtitles may include: stitching the plurality of key frame images in chronological order to generate the screenshot; or selecting a representative key frame image from the plurality of key frame images and generating the screenshot by superimposing the subtitles on the representative key frame image.
Optionally, selecting a representative key frame image from the plurality of key frame images may include: selecting, as the representative key frame image, the key frame image whose time value is closest to the middle of the period from the trigger moment to the cut-off moment; or selecting the key frame image with the smallest time value as the representative key frame image.
Optionally, the processor may be further configured to: if it is determined, according to the eye gaze position information, that the position of interest of the eye gaze in the video is not the region where the subtitles are located, identify the object corresponding to the position of interest; acquire at least one key frame image within a preset time period starting from the moment the screen capture is triggered; search the acquired key frame images for candidate key frame images containing the object; remove highly similar key frame images from the candidate key frame images; and generate a screenshot by stitching the remaining candidate key frame images.
Optionally, the type of the current display interface may include a text class, and the text-class display interface may include text information, wherein the electronic terminal may further include a display screen, and generating the screenshot based on the determined type of the current display interface and the eye gaze position information may include: determining the direction of movement of the eye gaze according to the eye gaze position information; controlling the display screen to switch the display interface according to the direction of movement of the eye gaze, and acquiring an image of the display interface after each switch; and when the switching of the display interface ends, composing the screenshot from the images of the display interfaces in display order.
Optionally, the type of the current display interface may include a control class, and the control-class display interface may include a control, where generating the screenshot based on the determined type of the current display interface and the eye gaze position information may include: controlling the electronic terminal, according to the eye gaze position information, to execute at least one operation step, wherein the display interface corresponding to each operation step includes a control and the operation step is executed by operating that control according to the eye gaze position information; acquiring an image of the display interface corresponding to each operation step executed by the electronic terminal; and composing the screenshot from the images of the display interfaces corresponding to the at least one operation step, in the order in which the operation steps were executed.
Optionally, the process of acquiring the image of the display interface corresponding to each operation step executed by the electronic terminal may include: adding an identifier, at the position of the control, to the acquired image of the display interface corresponding to the operation step; and using the image with the added identifier as the image of the display interface corresponding to that operation step.
According to another aspect of exemplary embodiments of the present invention, there is provided a computer-readable storage medium storing a computer program which, when executed by a processor, implements the screenshot method in an electronic terminal described above.
The screenshot method in an electronic terminal and the electronic terminal provided by the invention effectively simplify the steps of a screenshot operation.
Additional aspects and/or advantages of the present general inventive concept will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the general inventive concept.
Drawings
The foregoing and other objects, features, and advantages of exemplary embodiments of the invention will become more apparent from the following detailed description, taken in conjunction with the accompanying drawings that illustrate exemplary embodiments in which:
Fig. 1 shows a flowchart of a screenshot method in an electronic terminal according to an exemplary embodiment of the invention;
Fig. 2 shows a first exemplary flowchart of the step of generating a screenshot based on the determined type of the current display interface and the eye gaze position information, according to an exemplary embodiment of the present invention;
Fig. 3 shows a flowchart of the step of screening, from a plurality of subtitles, the subtitles that express a complete semantic unit, according to an exemplary embodiment of the invention;
Fig. 4 shows an example diagram of generating a screenshot by superimposing subtitles on a key frame image, according to an exemplary embodiment of the present invention;
Fig. 5 shows an example diagram of a plurality of key frame images stitched in chronological order to generate a screenshot, according to an exemplary embodiment of the invention;
Figs. 6 to 8 show diagrams of a first application example of the screenshot method in an electronic terminal according to an exemplary embodiment of the present invention;
Figs. 9 and 10 show diagrams of a second application example of the screenshot method in an electronic terminal according to an exemplary embodiment of the present invention;
Figs. 11 and 12 show diagrams of a third application example of the screenshot method in an electronic terminal according to an exemplary embodiment of the present invention;
Fig. 13 shows a second exemplary flowchart of the step of generating a screenshot based on the determined type of the current display interface and the eye gaze position information, according to an exemplary embodiment of the present invention;
Figs. 14 and 15 show diagrams of a fourth application example of the screenshot method in an electronic terminal according to an exemplary embodiment of the present invention;
Fig. 16 shows a third exemplary flowchart of the step of generating a screenshot based on the determined type of the current display interface and the eye gaze position information, according to an exemplary embodiment of the present invention;
Fig. 17 shows a diagram of a fifth application example of the screenshot method in an electronic terminal according to an exemplary embodiment of the present invention;
Fig. 18 shows a fourth exemplary flowchart of the step of generating a screenshot based on the determined type of the current display interface and the eye gaze position information, according to an exemplary embodiment of the present invention;
Fig. 19 shows a diagram of a sixth application example of the screenshot method in an electronic terminal according to an exemplary embodiment of the present invention;
Fig. 20 shows a block diagram of an electronic terminal according to an exemplary embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to like elements throughout. The embodiments are described below in order to explain the present invention by referring to the figures.
Fig. 1 shows a flowchart of a screenshot method in an electronic terminal according to an exemplary embodiment of the present invention.
Referring to fig. 1, in step S10, a predetermined input for triggering a screen capture is received, and a screen capture state is entered.
As an example, the predetermined input may include at least one of: an input via at least one physical key, a gesture input, a touch input, and a voice input. It should be appreciated that the predetermined input for triggering a screen capture is not limited to the ways listed above; the screen capture may be triggered in other ways as well.
In step S20, in the screen capturing state, the type of the current display interface is determined.
In one example, the type of the current display interface may be determined based on the type of the application to which the current display interface belongs. Here, the type of the application may be determined in various ways; the present invention is not limited in this respect.
In an example, taking an electronic terminal running the Android system as an example, the type of the application to which the current display interface belongs can be determined through the Activity Manager (ActivityManager):
ActivityManager manager = (ActivityManager) getSystemService(Context.ACTIVITY_SERVICE);
ActivityManager.RunningTaskInfo info = manager.getRunningTasks(1).get(0);
String className = info.topActivity.getClassName(); // fully qualified class name
The type of the application can be obtained by parsing the class name. It should be appreciated that the above manner of determining the type of the application is merely an example, and the type of the application may also be determined in other manners.
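As an illustration of the class-name parsing mentioned above, the following minimal sketch maps a fully qualified Activity class name to an application type by package prefix. The prefix table and the helper name are hypothetical examples, not taken from the patent:

```python
# Hypothetical package-prefix table; the actual mapping would be built from
# the applications installed on the terminal.
KNOWN_APP_TYPES = {
    "com.example.videoplayer": "video",
    "com.example.reader": "text",
    "com.example.launcher": "control",
}

def app_type_from_class_name(class_name: str, table=KNOWN_APP_TYPES) -> str:
    """Return the application type whose package prefix matches the
    fully qualified class name of the foreground Activity."""
    for prefix, app_type in table.items():
        if class_name.startswith(prefix):
            return app_type
    return "unknown"
```

The same prefix-matching idea applies whatever mechanism supplies the class name.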
In another example, the type of the current display interface may be determined based on the content layout of the current display interface.
Here, the content layout of the current display interface may be determined in various manners, for example, the content layout of the current display interface may be determined by parsing a description file of the current display interface, which is not limited in the present invention.
Preferably, the type of the current display interface can be determined more accurately by combining the two determination manners described above.
In step S30, the eye gaze position is tracked, and eye gaze position information is determined.
Here, the eye gaze position is tracked in the screen capturing state in step S30. It should be understood that, in the exemplary embodiment of the present invention, step S30 may be performed first and then step S20 may be performed, or step S20 and step S30 may be performed simultaneously in the screen capturing state, and the present invention is not limited to the execution sequence between step S20 and step S30.
For example, eye gaze location information may be determined by tracking eye gaze location in a variety of ways.
In one example, the eye gaze position is tracked with a front facing camera of the electronic terminal.
For example, the electronic terminal may have a front camera, and in a screen capturing state, the front camera of the electronic terminal may be controlled to be turned on, and capture an image of the user's eye, track the eye gaze position by recognizing the captured image of the user's eye, and determine the eye gaze position information.
The method of identifying the image of the eyes of the user to track the eye gaze position is common knowledge in the art, and the disclosure of this part is not repeated here.
In another example, an eye sensor (also may be referred to as an eye tracking sensor) of the electronic terminal is utilized to track eye gaze position.
For example, the electronic terminal may have an eye sensor that may be controlled to be turned on in a screen capturing state, and the eye sensor may track an eye gaze position and provide eye gaze position information.
The method of tracking the eye gaze position by the eye sensor is common knowledge in the art, and the disclosure of this part is not repeated here.
In yet another example, an infrared light source and an infrared camera of the electronic terminal are utilized to track eye gaze location.
For example, an electronic terminal may have an infrared light source and an infrared camera. In this case, the infrared light source may emit infrared light toward the eyes of the user, and the infrared camera receives the infrared light reflected by the pupils of the eyes of the user and provides eye gaze position information.
In step S40, a screenshot is generated based on the determined type of the current display interface and eyeball line-of-sight position information.
Here, the attention position of the eye gaze may be determined according to the eye gaze position information, and in addition, the movement direction of the eye gaze may be determined according to the eye gaze position information, so as to control the electronic terminal to perform the page turning operation or the sliding operation based on the movement direction of the eye gaze.
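The determination of the movement direction from successive gaze positions can be sketched as follows. This is a minimal illustration only; the coordinate convention (screen coordinates with y growing downward), the two-sample approach, and the `min_move` threshold are assumptions, not details from the patent:

```python
def gaze_movement_direction(p0, p1, min_move=10.0):
    """Classify the dominant direction of eye-gaze movement between two
    sampled gaze positions (x, y). Returns 'left', 'right', 'up', 'down',
    or None if the gaze barely moved (below min_move pixels)."""
    dx = p1[0] - p0[0]
    dy = p1[1] - p0[1]
    if max(abs(dx), abs(dy)) < min_move:
        return None
    if abs(dx) >= abs(dy):
        return "right" if dx > 0 else "left"
    return "down" if dy > 0 else "up"
```

The resulting direction can then drive the page-turning or sliding operation described above.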
By way of example, in an exemplary embodiment of the present invention, the types of the current display interface may include a video class, a text class, and a control class. It should be understood that the types of the current display interface may not be limited to the three types listed above, but may also include other types, and the screenshot method in the electronic terminal described above may also be used for other types of current display interfaces to obtain the screenshot.
The process of generating the screenshot is described below for different types of current display interfaces, respectively.
In the first embodiment, the process of generating the screenshot is described taking the type of the current display interface as a video class as an example. Here, the video class display interface includes video.
For the current display interface of the video class, the screenshot may be generated by: according to the eyeball line of sight position information, the attention position of the eyeball line of sight in the video (namely, the coordinate position of the attention point of the eyeball line of sight in the video) is determined, and a screenshot is generated based on at least one key frame image which comprises the content corresponding to the attention position in the video.
It should be appreciated that the video is formed from a plurality of frame images including a plurality of key frame images, any key frame image being a complete picture, and a plurality of intermediate frame images including image change information relative to the corresponding key frame.
For example, an identifier may be set in advance for a key frame image in a video, and the key frame image may be extracted from the video based on the set identifier. In addition, the key frame image may be extracted from the video in other ways, for example, the key frame image may be extracted by: sampling-based methods, shot-based methods, color feature-based methods, motion analysis-based methods, clustering-based methods. It should be appreciated that the manner in which the key frame images are extracted is not limited to the several methods listed above, and that other methods may be employed to extract key frame images from video.
For example, the screenshot may be generated in different ways depending on the location of interest of the eye's gaze in the video.
In the first case, the focus position of the eye sight in the video is determined as the area where the caption is located in the video according to the eye sight position information.
In this case, the screenshot may be generated from at least one key frame image in the video corresponding to the subtitles that, together with the subtitle displayed at the moment of triggering the screenshot, express a complete semantic meaning.
A process of generating a screenshot based on a current display interface of a video class and eye gaze position information when a focus position of an eye gaze in a video is an area where a subtitle is located in the video will be described below with reference to fig. 2.
Fig. 2 shows a first exemplary flowchart of steps for generating a screenshot based on a determined type of current display interface and eye gaze location information, according to an exemplary embodiment of the present invention.
Referring to fig. 2, in step S401, a key frame image at the time of triggering a screen capture is acquired.
In general, one video corresponds to one subtitle file, and one subtitle file includes a plurality of subtitles, each displayed according to its display time. For example, when the video plays to a moment that falls within the display time of a certain subtitle, that subtitle is shown on the display screen; once playback reaches the end time of that subtitle, the subtitle is no longer displayed.
For example, each subtitle may include a start time (beginTime), an end time (endTime), subtitle attributes, and a string body (strBody).
As an example, the format of the subtitle file may be as follows:
00:00:02,000 --> 00:00:07,999
<i>hint subtitle information 1</i>
……
00:20:02,000 --> 00:20:07,999
<i>hint subtitle information 100</i>
……
The format of a single subtitle may be as follows:
00:20:02,000 --> 00:20:07,999
<i>hint subtitle information 100</i>
In the example above, "00:20:02,000" represents the start time of the subtitle and "00:20:07,999" represents its end time; "hint subtitle information 100" is the string, i.e., the displayed subtitle content; "<i>" is the start identifier of the string, and "</i>" is its end identifier.
After the screenshot is triggered, one subtitle displayed at the time of triggering the screenshot may be searched for from a subtitle file corresponding to the video, and a display time of the one subtitle (i.e., a period from a start time to an end time of the one subtitle) may be determined, and a key frame image at the display time of the one subtitle may be searched for from the video as the key frame image at the time of triggering the screenshot.
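The subtitle lookup described above can be sketched as follows, assuming SRT-style entries of the kind shown earlier. This is a minimal illustration; the function names are not from the patent, and times are handled in milliseconds:

```python
import re

# Matches SRT-style timestamps of the form HH:MM:SS,mmm
TIME_RE = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")

def to_ms(stamp):
    """Convert an HH:MM:SS,mmm timestamp to milliseconds."""
    h, m, s, ms = map(int, TIME_RE.match(stamp).groups())
    return ((h * 60 + m) * 60 + s) * 1000 + ms

def parse_subtitles(text):
    """Parse subtitle entries into (start_ms, end_ms, body) tuples.
    Each entry is a 'start --> end' line followed by the subtitle string."""
    entries = []
    lines = [l.strip() for l in text.strip().splitlines() if l.strip()]
    for i, line in enumerate(lines):
        if "-->" in line:
            start, end = [to_ms(p.strip()) for p in line.split("-->")]
            body = lines[i + 1] if i + 1 < len(lines) else ""
            entries.append((start, end, body))
    return entries

def subtitle_at(entries, trigger_ms):
    """Find the subtitle whose display time covers the trigger moment."""
    for start, end, body in entries:
        if start <= trigger_ms <= end:
            return (start, end, body)
    return None
```

Once the subtitle displayed at the trigger moment is found, its display period bounds the search for the corresponding key frame image.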
If only one key frame image is searched, the searched one key frame image is taken as the key frame image at the moment of triggering the screen capturing. If multiple key frame images are searched, a designated key frame image of the multiple key frame images may be used as the key frame image at the time of triggering the screen capture.
As an example, the designated key frame image may be determined by: and selecting one key frame image with the time value closest to the middle time of the display time of the caption from the searched plurality of key frame images as a designated key frame image. Or selecting one key frame image with the smallest time value from the searched plurality of key frame images as the designated key frame image.
If no key frame image is searched, selecting a key frame image with the nearest time value before the display time of the caption as the key frame image at the moment of triggering the screen capturing.
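The selection rules of the preceding paragraphs can be sketched in one helper. This is an illustrative sketch only (names hypothetical, times in milliseconds): one match inside the subtitle's display time is used directly; several matches are resolved by the closest-to-middle or smallest-time-value rule; no match falls back to the nearest key frame before the subtitle:

```python
def key_frame_at_trigger(key_frame_times, sub_start, sub_end, prefer="middle"):
    """Select the key frame image at the moment of triggering the screen
    capture, for a subtitle displayed during [sub_start, sub_end]."""
    inside = [t for t in key_frame_times if sub_start <= t <= sub_end]
    if inside:
        if prefer == "earliest":
            # rule: key frame with the smallest time value
            return min(inside)
        # rule: key frame closest to the middle of the display time
        middle = (sub_start + sub_end) / 2
        return min(inside, key=lambda t: abs(t - middle))
    # rule: nearest key frame before the subtitle's display time
    before = [t for t in key_frame_times if t < sub_start]
    return max(before) if before else None
```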
In step S402, a time value corresponding to the last key frame image of the same scene as the key frame image at the time of triggering the screen capturing is determined, and the determined time value is taken as the cutoff time.
Here, it is considered that a complete semantic is generally expressed in one scene of a video, and thus, it can be considered to search for subtitles expressing a complete semantic from key frame images having the same scene.
For example, the acquired key frame image at the moment of triggering the screen capture may be compared with the key frame images whose time values follow that moment, to determine whether the scene has changed; when a scene change is detected, the time value corresponding to the last key frame image belonging to the same scene is taken as the cutoff time. Various image recognition processing methods may be used to determine whether two images belong to the same scene, and the present invention is not limited in this respect.
In step S403, a plurality of subtitles from the trigger screen capturing time to the off time are extracted from the subtitle file of the video.
In step S404, subtitles that express a complete semantic meaning are screened out from the plurality of subtitles.
Fig. 3 shows a flowchart of the steps of screening subtitles that represent a complete semantic meaning from a plurality of subtitles according to an exemplary embodiment of the invention.
Referring to fig. 3, in step S41, a time interval between the i-th subtitle and the i-1 th subtitle is calculated.
Here, the i-th and (i-1)-th subtitles are two subtitles adjacent in display time: the i-th subtitle is the later of the two, and the (i-1)-th subtitle is the earlier.
In step S42, it is determined whether the time interval between the ith subtitle and the i-1 th subtitle is less than a preset time.
If the time interval is not less than (greater than or equal to) the preset time, step S43 is performed: the ith subtitle is deleted, and at the same time, other subtitles displayed after the ith subtitle are also deleted.
In step S44, i=i-1 is made, and step S41 is executed back.
If the time interval is less than the preset time, step S45 is performed: and carrying out semantic analysis on the ith subtitle and the i-1 th subtitle.
In step S46, it is determined whether the semantic content expressed by the two subtitles is identical.
Here, various semantic analysis methods may be utilized to determine whether the semantic content expressed by the two subtitles is identical. For example, the semantic analysis may be performed using a recurrent neural network, but the present invention is not limited thereto, and the semantic analysis may be performed in other ways.
If the i-th subtitle and the i-1 th subtitle do not express the same semantic content, step S47 is performed: the ith subtitle is deleted, and at the same time, other subtitles displayed after the ith subtitle are also deleted.
If the ith subtitle and the i-1 th subtitle express the same semantic content, step S48 is performed: the ith subtitle is reserved.
In step S49, it is determined whether i is equal to 2. Here, the initial value of i is n, where n is the number of the plurality of subtitles determined in step S403, and for example, the nth subtitle refers to one of the plurality of subtitles whose display time is the last. That is, in the exemplary embodiment of the present invention, the semantic judgment is performed in a reverse order according to the display time of each subtitle.
If i is not equal to 2, step S50 is performed: so that i=i-1, and returns to step S41.
If i is equal to 2, the semantic analysis of a plurality of subtitles is completed, and the reserved subtitles are the selected subtitles expressing a complete semantic.
It should be understood that the subtitle semantic filtering manner shown in fig. 3 is only an example, and the present invention is not limited thereto, and a plurality of subtitles may be analyzed by other manners to filter out subtitles expressing a complete semantic. For example, for the process shown in fig. 3, when the time interval is less than the preset time, the ith subtitle is reserved, and step S49 is directly executed, and when i is equal to 2, the judgment of the time interval of the multiple subtitles is completed, and at this time, the reserved subtitles are subjected to semantic analysis to screen out the subtitles expressing a complete semantic.
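The reverse-order screening of fig. 3 can be sketched as follows. This is a minimal illustration: the `same_topic` predicate stands in for the semantic analysis (e.g. a recurrent neural network) and is injected rather than implemented, and the data layout is an assumption:

```python
def filter_complete_semantic(subs, max_gap_ms, same_topic):
    """Screen out the subtitles expressing one complete semantic meaning.

    `subs` is a list of (start_ms, end_ms, body) ordered by display time,
    covering the period from the trigger moment to the cutoff time.
    Walking from the last subtitle backwards, a subtitle and everything
    after it are dropped when the gap to its predecessor is too long or
    the two subtitles are not semantically related."""
    kept = list(subs)
    i = len(kept) - 1
    while i >= 1:
        gap = kept[i][0] - kept[i - 1][1]
        if gap >= max_gap_ms or not same_topic(kept[i - 1][2], kept[i][2]):
            kept = kept[:i]  # delete the i-th subtitle and all later ones
        i -= 1
    return kept
```

With a real semantic model plugged in as `same_topic`, the returned subtitles are those retained at the end of the fig. 3 flow.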
Returning to fig. 2, in step S405, at least one key frame image corresponding to the screened subtitle is generated as a screenshot in a stitching manner.
For example, a display period corresponding to the selected caption may be determined, and the display period may refer to a period from a start time of a caption whose display time is the forefront among the selected captions to an end time of a caption whose display time is the rearmost among the selected captions. Searching at least one key frame image in the display time period from the video as at least one key frame image corresponding to the screened caption, and splicing the searched at least one key frame image to generate a screenshot.
In one case, the at least one key frame image comprises one key frame image.
In this case, the screenshot may be generated in the form of subtitle superimposition on the basis of one key frame image.
Fig. 4 illustrates an exemplary diagram for generating a screenshot in the form of a subtitle superimposition according to an exemplary embodiment of the present invention.
Taking fig. 4 as an example, the selected subtitles that express a complete semantic meaning may be superimposed sequentially, in display-time order, along the side of the key frame image where subtitles are displayed, with the display areas of the subtitles not overlapping one another. The image formed by the key frame image and the superimposed subtitles is taken as the screenshot. The screenshot thus includes both a key frame image that clearly reflects the video content and the subtitles expressing a complete semantic meaning in the scene.
In another case, the at least one key frame image includes a plurality of key frame images.
In this case, multiple key frame images may be spliced chronologically to generate a screenshot (as shown in fig. 5). As an example, the stitching style may be selected according to the semantic style of the plurality of subtitles.
In addition, for the case where a plurality of key frame images are included, the above-described form of subtitle superimposition may also be employed to generate a screenshot.
In this case, a representative key frame image may be selected from the plurality of key frame images, and the screenshot may be generated in the form of subtitle superimposition on the basis of the representative key frame image.
As an example, a representative key frame image may be selected from a plurality of key frame images by: and selecting one key frame image with the time value closest to the middle time of the time period from the trigger screen capturing time to the cut-off time from the plurality of key frame images as the representative key frame image. Alternatively, one key frame image with the smallest time value is selected from the plurality of key frame images as the representative key frame image.
The above-described screenshot process for a video class display interface is described below by way of several embodiments.
Fig. 6 to 8 illustrate first application example diagrams of a screenshot method in an electronic terminal according to an exemplary embodiment of the present invention.
In this example, assume that the user is currently watching the television drama "All Is Well". When a predetermined input for triggering screen capture is received, the video playback picture is the picture shown in fig. 6, and by tracking the eye gaze position it is determined that the focus position of the eye gaze in the video at this moment is the region where the subtitle is located (for example, the box-selected region shown in fig. 6).
Assume that five subtitles are found in total, which express a complete semantic meaning, as follows:
00:00:02,000 --> 00:00:06,999
<i>Su Mingyu, you are a girl</i>
00:00:08,000 --> 00:00:13,999
<i>How can you compare with your two elder brothers</i>
00:00:14,045 --> 00:00:18,678
<i>We are only responsible for raising you until you are eighteen</i>
00:00:20,045 --> 00:00:24,678
<i>You will marry into another family later anyway</i>
00:00:26,066 --> 00:00:30,678
<i>And we will not ask you to support us in our old age</i>
The key frame images within the display period corresponding to the five subtitles (00:00:02,000 to 00:00:30,678) are then searched for in the video.
Assuming that only one key frame image is searched (as shown in fig. 7), the five selected subtitles are superimposed on the basis of the key frame image to obtain a screenshot (as shown in fig. 8).
Fig. 9 and 10 illustrate second application example diagrams of a screenshot method in an electronic terminal according to an exemplary embodiment of the present invention.
In this example, assuming that a video picture currently being viewed by the user is a picture shown in fig. 9 when a predetermined input for triggering screen capturing is received, by tracking the eye gaze position, it is determined that the focus position of the eye gaze in the video at this time is the region where the subtitle is located in the video.
Assume that two captions are selected that represent a complete semantic meaning, as follows:
00:10:02,000 --> 00:10:12,999
<i>Although we smiled at each other and said "goodbye"</i>
00:10:14,000 --> 00:10:22,999
<i>we both knew that this parting was a farewell forever</i>
Here, the video may be searched for key frame images within the display period corresponding to the two subtitles (00:10:02,000 to 00:10:22,999). In addition, for each selected subtitle, the key frame image corresponding to that subtitle may be searched for; for example, a key frame image within the display time of the subtitle may be searched for in the video as the key frame image corresponding to the subtitle. The method shown in step S401 of fig. 2 may be used to determine the key frame image corresponding to any subtitle, and the details are not repeated here.
Assuming that the two subtitles have corresponding key frame images, respectively, in this case, the two key frame images may be stitched to generate a screenshot (as shown in fig. 10).
Fig. 11 and 12 illustrate third application example diagrams of a screenshot method in an electronic terminal according to an exemplary embodiment of the present invention.
In this example, assuming that a video picture currently being viewed by the user is a picture shown in fig. 11 when a predetermined input for triggering screen capturing is received, by tracking the eye gaze position, it is determined that the focus position of the eye gaze in the video at this time is the region where the subtitle is located in the video.
Assuming that four subtitles expressing a complete semantic meaning are screened out in total, and each subtitle has a corresponding key frame image, the four key frame images can be stitched to generate a screenshot (as shown in fig. 12).
If two or more subtitles correspond to the same key frame image, the screenshot may still be generated in the manner shown in this example, except that the same key frame image appears more than once in the screenshot, i.e., the subtitles differ but the pictures are the same.
It should be understood that the subtitle superimposition manner and the image stitching manner shown in the above examples are only examples, and the present invention is not limited thereto, and that other manners may be adopted to generate the screenshot.
In the second case, the focus position of the eye gaze in the video is determined not to be the region where the subtitle is located in the video, based on the eye gaze position information.
In this case, the eye gaze of the user may be considered to be directed at an object in the video. The object of interest may then be determined by identifying the focus position of the eye gaze in the video, and the screenshot may be generated from at least one key frame image in the video that includes the object of interest.
A procedure of generating a screenshot based on the current display interface of the video class and the eye gaze position information when the eye gaze position of interest in the video is not the region where the subtitle is located in the video will be described below with reference to fig. 13.
Fig. 13 shows a second exemplary flowchart of steps for generating a screenshot based on a determined type of current display interface and eye gaze location information, according to an exemplary embodiment of the present invention.
Referring to fig. 13, in step S411, an object corresponding to a focus position of an eye line is identified.
Here, an object may refer to a person or an item in a video. For example, various image recognition processing methods may be utilized to recognize a video to determine an object corresponding to a position of interest of an eye gaze.
In step S412, at least one key frame image is acquired for a predetermined period of time from the trigger screen capture time.
For example, at least one key frame image may be searched from the video for which the time value is within a predetermined period of time from the moment of triggering the screen capture. Here, the time length of the predetermined time period may be set according to the need, which is not limited by the present invention.
In step S413, candidate key frame images including the identified object are searched for from the acquired at least one key frame image.
For example, at least one key frame image may be identified using various image recognition processing methods to determine whether the identified object is contained in the at least one key frame image, the key frame image containing the identified object being determined as a candidate key frame image. Here, the number of the searched candidate key frame images may be greater than or equal to 1.
In step S414, the key frame image having high similarity is removed from the searched candidate key frame images.
For example, the searched candidate key frame images can be subjected to de-duplication processing, the key frame images with high similarity are removed, and the key frame images of the identified object at different angles or under different scenes are kept as much as possible.
In step S415, the key frame images remaining in the candidate key frame images are spliced to generate a screenshot.
For example, the retained key frame images may be stitched to generate a screenshot, various stitching manners may be used to generate a screenshot, and as an example, stitching may be performed according to a frame style, which is not limited in the present invention.
Here, since the key frame images used for generating the screenshot are all clear images, the definition of the obtained screenshot can be effectively ensured.
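Steps S413 to S415 rely on the de-duplication pass of step S414. One way to sketch it is below; this is illustrative only, with toy feature vectors and a Euclidean distance threshold standing in for whatever image-similarity measure (histograms, embeddings, etc.) is actually used:

```python
def _dist(a, b):
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def deduplicate_key_frames(frames, features, min_distance=0.25):
    """Remove near-duplicate key frames before stitching.
    `frames` are frame identifiers and `features` their feature vectors.
    A frame is kept only if it is at least `min_distance` away from every
    frame already kept, so different angles/scenes of the identified
    object are preserved while near-identical frames are dropped."""
    kept, kept_feats = [], []
    for frame, feat in zip(frames, features):
        if all(_dist(feat, kf) >= min_distance for kf in kept_feats):
            kept.append(frame)
            kept_feats.append(feat)
    return kept
```

The retained frames are then stitched in time order to form the screenshot.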
Fig. 14 and 15 illustrate fourth application example diagrams of a screenshot method in an electronic terminal according to an exemplary embodiment of the present invention.
In this example, assuming that a video screen currently being viewed by a user is a screen shown in fig. 14 upon receiving a predetermined input for triggering screen capturing, by tracking the eye gaze position, it is determined that the focus position of the eye gaze in the video at this time is not the region where the subtitle is located in the video, and by identifying the focus position of the eye gaze, it is determined that the object of interest is a person in the video (for example, a box selection region shown in fig. 14).
At least one key frame image within a predetermined period from the moment of triggering the screen capture is acquired, the key frame images are screened and de-duplicated, and the retained key frame images, which contain the person at different angles and with different expressions, are stitched to generate the screenshot.
It should be understood that, in addition to the above-described manner of generating a screenshot by tracking the eye gaze position, a selection operation input by the user for determining the object may be received in the screen capturing state; the object is determined in response to the selection operation, and the screenshot is obtained by performing steps S412 to S415 described above. As an example, the selection operation may include circling a region on the current display interface to select the object.
In the second embodiment, the process of generating the screenshot is described taking the type of the current display interface as a text class as an example. Here, the text-based display interface includes text information therein. By way of example, the text-based display interface may include, but is not limited to, a chat interface or a teletext browsing interface of a communication application.
The process of generating a screenshot based on the current display interface of the text class and eye gaze position information is described below with reference to fig. 16.
Fig. 16 shows a third exemplary flowchart of steps for generating a screenshot based on a determined type of current display interface and eye gaze location information, according to an exemplary embodiment of the present invention.
Referring to fig. 16, in step S51, the moving direction of the eye gaze is determined based on the eye gaze position information.
Here, the method for determining the moving direction of the eye gaze according to the eye gaze position information is common knowledge in the art, and the disclosure of this part is not repeated herein.
In step S52, the display interface is switched according to the direction of movement of the eye gaze, and an image of the display interface corresponding to each switching is acquired.
If the moving direction of the eye sight line is consistent with the switching mode of the display interface, the display interface can be switched according to the moving direction of the eye sight line.
For example, in the case that the switching mode of the display interface is left-right page turning, if the moving direction of the eye gaze is leftward or rightward, the electronic terminal is controlled to switch left or right from the current display interface. If the switching mode of the display interface is up-down scrolling, and the moving direction of the eye gaze is upward or downward, the current display interface is controlled to scroll up or down.
For example, the image of the display interface corresponding to each switch may be captured directly at the time of the switch. Alternatively, a screen-recording function may be started while the display interface is switched according to the moving direction of the eye gaze, and key frame images may be extracted from the recorded video to obtain the image of the display interface corresponding to each switch.
In step S53, when it is determined that the switching of the display interfaces is completed, the images of the respective display interfaces are formed into a screenshot in the display order.
For example, the end of display interface switching may be determined when one of the following conditions is satisfied: the focus position of the eye gaze remains unchanged for a preset time; semantic analysis of the text information in the display interface corresponding to each switch determines that text information expressing a complete semantic meaning has been displayed; the end of a paragraph of the text information is detected; the spacing between two adjacent paragraphs in the display interface is detected to be larger than a preset distance value; or it is detected that the final interface has been reached.
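The end-of-switching decision above is a disjunction of five signals. A minimal sketch follows; the inputs are assumed to be produced elsewhere by the gaze tracker, the semantic analyser, and the layout inspector, and the parameter names are illustrative:

```python
def switching_finished(gaze_still_ms, preset_still_ms,
                       complete_semantic, paragraph_ended,
                       paragraph_gap_px, preset_gap_px,
                       reached_final_page):
    """Return True when any one of the five stop conditions holds:
    gaze unchanged for the preset time, a complete semantic displayed,
    a paragraph end detected, an oversized paragraph gap detected,
    or the final interface reached."""
    return bool(gaze_still_ms >= preset_still_ms
                or complete_semantic
                or paragraph_ended
                or paragraph_gap_px > preset_gap_px
                or reached_final_page)
```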
Here, for a current display interface of the text class, if the display interface can be switched, the screenshot can be generated according to the moving direction of the eye gaze by the method shown in fig. 16; if the current display interface cannot be switched, the image of the current display interface is directly captured as the final screenshot.
Fig. 17 shows a fifth application example diagram of a screenshot method in an electronic terminal according to an exemplary embodiment of the present invention.
In this example, assume that in the screen capturing state the current display interface is a product introduction interface for a Samsung smart laser projector (such as the leftmost image in fig. 17), and the type of the current display interface is determined, by any of the means described above, to be the text class. Assuming that the display interface is switched by turning pages left and right, when it is determined that the moving direction of the user's eye gaze is rightward, the electronic terminal is controlled to switch rightward from the current display interface, and the image of the display interface corresponding to each switch (such as the middle and rightmost images in fig. 17) is acquired.
By performing semantic analysis on the text information in the display interface corresponding to each switch, it is determined that text information expressing a complete semantic meaning has been displayed; for example, semantic analysis determines that the product introduction of the Samsung smart laser projector has been fully displayed across the three display interfaces shown in fig. 17. At this point, the acquired images of the display interfaces are formed into a screenshot according to the display order (as shown in fig. 17).
In the third embodiment, the process of generating the screenshot is described taking the type of the current display interface as a control class as an example. Here, the control class display interface includes controls therein. As an example, the control may include an icon or menu item on the display interface.
The process of generating a screenshot based on the current display interface of the control class and eye gaze location information is described below with reference to fig. 18.
Fig. 18 shows a fourth exemplary flowchart of steps for generating a screenshot based on a determined type of current display interface and eye gaze location information, according to an exemplary embodiment of the present invention.
Referring to fig. 18, in step S61, the electronic terminal is controlled to perform at least one operation step according to the eye gaze position information. Here, the display interface corresponding to any one of the at least one operation step includes a control, and that operation step is executed by operating the control according to the eye gaze position information.
For example, the process of controlling the electronic terminal to execute any operation step is as follows: when the focus position of the eye gaze in the display interface corresponding to the operation step is determined, according to the eye gaze position information, to be the area where a control is located, the control included in the display interface is automatically operated so that the operation step is executed automatically.
In step S62, an image of a display interface corresponding to the time when the electronic terminal performs each operation step is acquired.
For example, an image of the corresponding display interface may be captured directly when the electronic terminal executes each operation step. In a preferred embodiment, the image of the display interface corresponding to any operation step may also be obtained by adding a marker at the position of the control on the captured image of that display interface and taking the marked image as the image of the display interface for that operation step.
In step S63, the images of the display interfaces corresponding to the at least one operation step are formed into a screenshot according to the execution order of the operation steps.
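Steps S61 to S63 can be sketched together as follows; the interface model, control names, and positions are hypothetical stand-ins for a real UI tree:

```python
def run_gaze_driven_steps(interface, gaze_targets):
    """Follow the eye gaze from control to control: operating a gazed-at
    control switches to the next interface (step S61), an image of each
    interface is captured with a marker at the control's position (step S62),
    and the marked images are returned in execution order (step S63)."""
    shots = []
    for name in gaze_targets:
        control = interface["controls"].get(name)
        if control is None:                        # gaze is not on a control: stop
            break
        shots.append({"image": interface["image"],
                      "mark_at": control["pos"]})  # identifier added at the control
        interface = control["next"]                # operating the control switches UI
    return shots

# Hypothetical three-level UI: desktop -> settings -> display settings
display_iface = {"image": "display.png", "controls": {}}
settings = {"image": "settings.png",
            "controls": {"Display": {"pos": (5, 40), "next": display_iface}}}
desktop = {"image": "desktop.png",
           "controls": {"Settings": {"pos": (10, 20), "next": settings}}}
shots = run_gaze_driven_steps(desktop, ["Settings", "Display"])
```

Each entry in `shots` pairs an interface image with the marker position, ready to be stitched in execution order.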
Fig. 19 shows a sixth application example diagram of a screenshot method in an electronic terminal according to an exemplary embodiment of the present invention.
In this example, assume that in the screen capturing state the current display interface is the desktop of the electronic terminal (the leftmost image in fig. 19), and the type of the current display interface is determined to be the control class in any of the ways described above. When the focus position of the eye gaze in the display interface is determined, according to the eye gaze position information, to be the area where a control (the "Settings" icon) is located, that control is automatically operated and the electronic terminal is controlled to enter the settings interface (the second image from the left in fig. 19). When the focus position of the eye gaze in the settings interface is determined to be the area where a control (the "Display" menu item) is located, that control is automatically operated and the electronic terminal enters the display settings interface (the second image from the right in fig. 19). When the focus position of the eye gaze in the display settings interface is determined to be the area where a control (the "Brightness" menu item) is located, that control is automatically operated and the electronic terminal enters the brightness adjustment interface (the rightmost image in fig. 19).
Because the brightness adjustment interface is the final interface, when it is determined according to the eye gaze position information that the focus position of the eye gaze in the brightness adjustment interface is the area where the control for manually adjusting the brightness is located, a marker may be added at the position of each operated control on the image of each interface, and the images are formed into a screenshot according to the execution order of the operation steps (as shown in fig. 19).
In a preferred embodiment, the collage style of the generated screenshot may also be adjusted. For example, a user selection operation (e.g., a long-press) on the generated screenshot may be received, and at least one option presented in response, each option corresponding to a collage style; the collage style of the generated screenshot is then changed according to the option the user selects, i.e., adjusted to the collage style corresponding to the selected option.
In addition, the collage style selected by the user can be recorded, and the recorded style used by default the next time a screenshot is generated, so as to meet the user's personalized preferences.
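Recording and reusing the user's selected collage style amounts to a small preference store; the file format, key name, and default style here are assumptions:

```python
import json
import pathlib

def save_collage_style(pref_path, style):
    """Record the collage style the user selected."""
    pathlib.Path(pref_path).write_text(json.dumps({"collage_style": style}))

def load_collage_style(pref_path, default="vertical"):
    """Return the recorded style, or the default when none was recorded yet."""
    try:
        return json.loads(pathlib.Path(pref_path).read_text())["collage_style"]
    except (FileNotFoundError, KeyError, ValueError):
        return default
```

On the next screenshot, `load_collage_style` supplies the previously chosen style without asking the user again.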
Fig. 20 shows a block diagram of an electronic terminal according to an exemplary embodiment of the present invention.
As shown in fig. 20, an electronic terminal according to an exemplary embodiment of the present invention includes: an input interface 10, a processor 20 and an eye tracking unit 30.
Specifically, the input interface 10 receives a predetermined input for triggering a screen capture.
As an example, the predetermined input may include, but is not limited to, at least one of: input of at least one physical key, gesture input, touch input, voice input.
The processor 20 controls the electronic terminal to enter a screen capturing state in response to the received predetermined input, and determines the type of the current display interface in the screen capturing state.
For example, the processor 20 may determine the type of the current display interface according to the type of the application to which the current display interface belongs. Alternatively, the processor 20 may determine the type according to the content layout of the current display interface. Preferably, the processor 20 can determine the type of the current display interface more accurately by combining the two determination methods.
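A minimal sketch of combining the two determination methods; the category names, the app-type table, and the 0.5 layout thresholds are all illustrative assumptions, not values from the patent:

```python
def classify_interface(app_type, layout):
    """Classify the current display interface, first by the application type
    and then, when that is inconclusive, by the content layout."""
    by_app = {"video_player": "video", "reader": "text", "launcher": "control"}
    if app_type in by_app:
        return by_app[app_type]
    # fall back to the content layout of the current display interface
    if layout.get("video_area_ratio", 0) > 0.5:
        return "video"
    if layout.get("text_area_ratio", 0) > 0.5:
        return "text"
    return "control"
```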
The eye tracking unit 30 tracks the eye gaze position and provides eye gaze position information.
Here, the eye tracking unit 30 tracks the eye gaze position in the screen capturing state.
In one example, the eye tracking unit 30 may be a front camera of an electronic terminal.
For example, in the screen capturing state, the processor 20 may control the front camera of the electronic terminal to be turned on and control the front camera to capture an image of the eyes of the user, and the processor 20 determines the eye gaze position information by recognizing the captured image of the eyes of the user.
In another example, the eye tracking unit 30 may be an eye sensor of an electronic terminal.
For example, in the screen capturing state, the processor 20 may control an eyeball sensor of the electronic terminal to be turned on, and the eyeball sensor may track the eyeball line-of-sight position and provide eyeball line-of-sight position information.
In yet another example, the eye tracking unit 30 may include an infrared light source and an infrared camera.
In this case, the infrared light source may emit infrared light toward the eyes of the user, and the infrared camera receives the infrared light reflected by the pupils of the eyes of the user and provides eye gaze position information.
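One common way such an infrared setup yields a gaze position is the pupil-center corneal-reflection technique: the vector from the glint (the corneal reflection of the infrared source) to the pupil center is mapped to screen coordinates with a per-user linear calibration. This is a generic sketch of that technique, not the patent's specific model:

```python
def gaze_point(pupil_center, glint, calib):
    """Map the glint-to-pupil vector to a screen coordinate using a linear
    per-user calibration (ax, bx, cx, ay, by, cy)."""
    vx = pupil_center[0] - glint[0]
    vy = pupil_center[1] - glint[1]
    ax, bx, cx, ay, by, cy = calib
    return (ax * vx + bx * vy + cx, ay * vx + by * vy + cy)
```

In practice the six calibration coefficients would be fitted from a short calibration routine in which the user fixates known screen points.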
The processor 20 generates a screenshot based on the determined type of current display interface and eye gaze location information provided by the eye tracking unit 30.
By way of example, in an exemplary embodiment of the present invention, the types of the current display interface may include a video class, a text class, and a control class. The process of generating the screenshot is described below for different types of current display interfaces, respectively.
In the first embodiment, the process of generating the screenshot is described taking the type of the current display interface as a video class as an example. Here, the video class display interface includes video.
In this case, the process of generating a screenshot based on the determined type of the current display interface and eye gaze position information may include: determining, according to the eye gaze position information, the focus position of the eye gaze in the video, and generating a screenshot based on at least one key frame image in the video that contains the content corresponding to the focus position.
It should be appreciated that the video is formed from a plurality of frame images, comprising a plurality of key frame images and a plurality of intermediate frame images; any key frame image is a complete picture, while any intermediate frame image contains only the image change information relative to its corresponding key frame.
For example, the processor 20 may generate the screenshot differently depending on the location of interest of the eye gaze in the video.
In the first case, the focus position of the eye sight in the video is determined as the area where the caption is located in the video according to the eye sight position information.
In this case, the processor 20 may generate a screenshot through the following process.
Acquiring the key frame image at the moment screen capturing is triggered; determining the time value corresponding to the last key frame image belonging to the same scene as that key frame image, and taking the determined time value as the cut-off moment; extracting a plurality of subtitles from the subtitle file of the video between the trigger moment and the cut-off moment; screening out subtitles expressing a complete semantic meaning from the plurality of subtitles; and generating the screenshot by stitching the at least one key frame image corresponding to the screened subtitles.
Preferably, the process of screening out subtitles expressing a complete semantic meaning may include performing the following processing on each pair of subtitles adjacent in display time, in reverse order of display time: calculate the time interval between the two adjacent subtitles; if the interval between the later-displayed subtitle and the earlier-displayed subtitle is greater than or equal to a preset time, delete the later-displayed subtitle and all subtitles displayed after it; if the interval is smaller than the preset time, retain both subtitles. The subtitles expressing a complete semantic meaning are then screened out by performing semantic analysis on all retained subtitles.
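The reverse-order gap rule above can be sketched as follows; the subtitle representation is an assumption, and the final semantic analysis of the survivors is left as a separate step:

```python
def filter_subtitles(subs, preset_gap):
    """Apply the reverse-order gap rule: scanning adjacent pairs from the
    last subtitle backwards, a display-time gap >= preset_gap deletes the
    later subtitle and everything displayed after it. `subs` is a
    time-sorted list of (display_time, text) pairs."""
    kept = list(subs)
    for i in range(len(subs) - 1, 0, -1):
        if subs[i][0] - subs[i - 1][0] >= preset_gap:
            kept = subs[:i]   # delete the later subtitle and all following ones
    return kept
```

For example, with subtitles at 0 s, 2 s, 3 s, 10 s, and 11 s and a 5-second preset time, only the first three are retained for semantic analysis.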
In one case, the at least one key frame image comprises one key frame image.
In this case, the process of generating the screenshot by stitching the at least one key frame image corresponding to the screened subtitles may include: generating the screenshot by superimposing the subtitles on the one key frame image.
In another case, the at least one key frame image includes a plurality of key frame images.
In this case, the process of generating the screenshot by stitching the at least one key frame image corresponding to the screened subtitles may include: stitching the plurality of key frame images in chronological order to generate the screenshot, or selecting a representative key frame image from the plurality of key frame images and generating the screenshot by superimposing the subtitles on the representative key frame image.
For example, the process of selecting a representative key frame image from the plurality of key frame images may include: selecting, as the representative key frame image, the key frame image whose time value is closest to the midpoint of the period from the trigger moment to the cut-off moment, or selecting the key frame image with the smallest time value.
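Both selection rules can be sketched in a few lines; representing a key frame as a (time, image) pair is an assumption:

```python
def pick_representative(frames, trigger_t, cutoff_t, rule="midpoint"):
    """Select a representative key frame from (time, image) pairs: either the
    frame closest to the midpoint of the period from the trigger moment to
    the cut-off moment, or the earliest frame."""
    if rule == "midpoint":
        mid = (trigger_t + cutoff_t) / 2
        return min(frames, key=lambda f: abs(f[0] - mid))
    return min(frames, key=lambda f: f[0])  # smallest time value
```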
In the second case, the focus position of the eye gaze in the video is determined not to be the region where the subtitle is located in the video, based on the eye gaze position information.
In this case, the processor 20 may generate the screenshot by the following process.
Identifying the object corresponding to the focus position of the eye gaze; acquiring at least one key frame image within a preset time period from the moment screen capturing is triggered; searching the acquired key frame images for candidate key frame images containing the object; removing highly similar key frame images from the candidates; and generating the screenshot by stitching the retained candidate key frame images.
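The candidate search and similarity-based removal can be sketched as below; `contains_object` and `too_similar` stand in for the patent's object recognition and image-similarity measure, which are not specified here:

```python
def select_object_frames(frames, contains_object, too_similar):
    """Keep key frames that contain the gazed-at object, then drop any frame
    too similar to the previously kept one; the survivors are stitched into
    the screenshot."""
    candidates = [f for f in frames if contains_object(f)]
    kept = []
    for frame in candidates:
        if not kept or not too_similar(kept[-1], frame):
            kept.append(frame)
    return kept
```

With a simple scalar "signature" as a toy similarity measure, near-duplicate neighbors collapse to a single retained frame.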
In the second embodiment, the process of generating the screenshot is described taking the type of the current display interface as a text class as an example. Here, the text-based display interface includes text information therein. By way of example, the text-based display interface may include, but is not limited to, a chat interface or a teletext browsing interface of a communication application.
Preferably, the electronic terminal according to an exemplary embodiment of the present invention may further include a display screen on which the current display interface may be displayed.
In this case, the processor 20 may generate the screenshot by: determining the moving direction of the eye sight according to the eye sight position information; controlling a display screen to switch a display interface according to the moving direction of the eye sight, and acquiring an image of the display interface corresponding to each switching; and when the display interface switching is finished, forming screenshot of the images of the display interfaces according to the display sequence.
In the third embodiment, the process of generating the screenshot is described taking the type of the current display interface as a control class as an example. Here, the control class display interface includes controls therein. As an example, the control may include an icon or menu item on the display interface.
In this case, the processor 20 may generate the screenshot by: according to the eyeball line-of-sight position information, controlling the electronic terminal to execute at least one operation step, wherein a control is included on a display interface corresponding to any operation step in the at least one operation step, and any operation step is executed by operating the control according to the eyeball line-of-sight position information; acquiring an image of a display interface corresponding to each operation step executed by the electronic terminal; and forming a screenshot of the images of the display interface corresponding to at least one operation step according to the execution sequence of the operation steps.
In a preferred embodiment, the process of obtaining the image of the display interface corresponding to the electronic terminal when executing any operation step may include: adding an identifier at the position of the control on the acquired image of the display interface corresponding to any operation step; and taking the image with the added identifier as the image of the display interface corresponding to any operation step.
There is also provided, in accordance with an exemplary embodiment of the present invention, a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the screenshot method in the electronic terminal described above. The computer-readable recording medium is any data storage device that can store data readable by a computer system. Examples of the computer-readable recording medium include: read-only memory, random access memory, compact disc read-only memory, magnetic tape, floppy disks, optical data storage devices, and carrier waves (such as data transmission through the Internet via wired or wireless transmission paths).
By adopting the screenshot method in an electronic terminal and the electronic terminal provided by the present invention, an accurate long screenshot is obtained through semantic analysis or image recognition, thereby realizing intelligent screen capturing.
The foregoing description of exemplary embodiments of the invention is to be understood as illustrative rather than exhaustive, and the invention is not limited to the exemplary embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the invention. Therefore, the protection scope of the present invention shall be subject to the scope of the claims.

Claims (26)

1. A method for capturing a picture in an electronic terminal, comprising:
receiving a preset input for triggering screen capturing, and entering a screen capturing state;
determining the type of the current display interface in the screen capturing state;
tracking eye gaze position and determining eye gaze position information;
a screenshot is generated based on the determined type of current display interface and eye gaze location information,
wherein the type of the current display interface comprises a video class, the video class display interface comprises video,
wherein the step of generating a screenshot based on the determined type of the current display interface and eye gaze location information comprises:
according to the eye gaze position information, determining the focus position of the eye gaze in the video;
a screenshot is generated based on at least one key frame image in the video that includes content corresponding to the location of interest,
the video is formed by a plurality of frame images, wherein the plurality of frame images comprise a plurality of key frame images and a plurality of intermediate frame images, any key frame image is a complete picture, and any intermediate frame image comprises image change information relative to a corresponding key frame.
2. The screenshot method of claim 1, wherein the predetermined input includes at least one of: input of at least one physical key, gesture input, touch input, voice input.
3. The screenshot method of claim 1, wherein the step of determining the type of currently displayed interface comprises:
and determining the type of the current display interface according to the type of the application to which the current display interface belongs and/or the content layout of the current display interface.
4. The method of screenshot according to claim 1, wherein generating a screenshot based on at least one key frame image in a video that includes content corresponding to the location of interest comprises:
if the focus position of the eye gaze in the video is determined, according to the eye gaze position information, to be the region where the subtitles are located in the video, acquiring a key frame image at the moment screen capturing is triggered;
determining a time value corresponding to the last key frame image of the same scene as the key frame image at the moment of triggering screen capturing, and taking the determined time value as a cut-off moment;
extracting a plurality of subtitles from the subtitle file of the video from the trigger screen capturing moment to the cut-off moment;
screening out subtitles expressing a complete semantic meaning from the plurality of subtitles;
and generating a screenshot by stitching at least one key frame image corresponding to the screened subtitles.
5. The method of claim 4, wherein the step of screening subtitles from the plurality of subtitles for a complete semantic meaning comprises:
according to the display time of each subtitle, performing the following processing on each pair of subtitles adjacent in display time, in reverse order of display time: calculating the time interval between the two adjacent subtitles; if the interval between the later-displayed subtitle and the earlier-displayed subtitle is greater than or equal to a preset time, deleting the later-displayed subtitle and all subtitles displayed after it; and if the interval is smaller than the preset time, retaining both subtitles;
and screening out the subtitles expressing a complete semantic meaning by performing semantic analysis on all retained subtitles.
6. The method of screenshot according to claim 4, wherein the at least one key frame image comprises a key frame image,
the step of generating a screenshot by stitching the at least one key frame image corresponding to the screened subtitles comprises: generating the screenshot by superimposing the subtitles on the one key frame image.
7. The method of screenshot according to claim 4, wherein the at least one key frame image comprises a plurality of key frame images,
the step of generating a screenshot by stitching the at least one key frame image corresponding to the screened subtitles comprises: stitching the plurality of key frame images in chronological order to generate the screenshot,
or selecting a representative key frame image from the plurality of key frame images and generating the screenshot by superimposing the subtitles on the representative key frame image.
8. The method of capturing images of claim 7, wherein selecting a representative key frame image from the plurality of key frame images comprises:
selecting, from the plurality of key frame images, the key frame image whose time value is closest to the midpoint of the period from the trigger screen capturing moment to the cut-off moment as the representative key frame image,
or selecting the key frame image with the smallest time value from the plurality of key frame images as the representative key frame image.
9. The screenshot method of claim 4, wherein the screenshot method further comprises:
if it is determined according to the eye gaze position information that the focus position of the eye gaze in the video is not the area where the subtitles are located in the video, identifying an object corresponding to the focus position of the eye gaze;
acquiring at least one key frame image within a preset time period from the moment screen capturing is triggered;
searching candidate key frame images containing the object from the at least one obtained key frame image;
removing key frame images with high similarity from the searched candidate key frame images;
and generating a screenshot in a splicing mode by using the key frame images which are reserved in the candidate key frame images.
10. The method of claim 1, wherein the type of the currently displayed interface includes a text class, the text class display interface includes text information,
wherein the step of generating a screenshot based on the determined type of the current display interface and eye gaze location information comprises:
determining the moving direction of the eye sight according to the eye sight position information;
switching the display interface according to the moving direction of the eye sight, and acquiring an image of the display interface corresponding to each switching;
and when the display interface switching is finished, forming screenshot of the images of the display interfaces according to the display sequence.
11. The method of claim 1, wherein the type of the current display interface comprises a control class, the control class display interface comprises a control,
wherein the step of generating a screenshot based on the determined type of the current display interface and eye gaze location information comprises:
according to the eyeball line-of-sight position information, controlling the electronic terminal to execute at least one operation step, wherein a control is included on a display interface corresponding to any operation step in the at least one operation step, and any operation step is executed by operating the control according to the eyeball line-of-sight position information;
acquiring an image of a display interface corresponding to each operation step executed by the electronic terminal;
and forming a screenshot of the images of the display interface corresponding to the at least one operation step according to the execution sequence of the operation steps.
12. The method of claim 11, wherein the step of obtaining an image of a corresponding display interface of the electronic terminal when performing each operation step comprises:
adding an identifier at the position of the control on the acquired image of the display interface corresponding to any operation step;
and taking the image added with the identifier as an image of a display interface corresponding to any operation step.
13. An electronic terminal is characterized by comprising an input interface, an eyeball tracking unit and a processor,
wherein the input interface receives a predetermined input for triggering a screen capture,
the processor is configured to: controlling the electronic terminal to enter a screen capturing state in response to the received preset input, determining the type of the current display interface in the screen capturing state,
the eye tracking unit tracks the eye gaze position and provides eye gaze position information,
wherein the processor is further configured to: a screenshot is generated based on the determined type of current display interface and eye gaze location information,
wherein the type of the current display interface comprises a video class, the video class display interface comprises video,
wherein the process of generating a screenshot based on the determined type of the current display interface and eye gaze location information comprises:
according to the eye gaze position information, determining the focus position of the eye gaze in the video;
a screenshot is generated based on at least one key frame image in the video that includes content corresponding to the location of interest,
the video is formed by a plurality of frame images, wherein the plurality of frame images comprise a plurality of key frame images and a plurality of intermediate frame images, any key frame image is a complete picture, and any intermediate frame image comprises image change information relative to a corresponding key frame.
14. The electronic terminal of claim 13, wherein the predetermined input comprises at least one of: input of at least one physical key, gesture input, touch input, voice input.
15. The electronic terminal of claim 13, wherein determining a type of currently displayed interface comprises:
and determining the type of the current display interface according to the type of the application to which the current display interface belongs and/or the content layout of the current display interface.
16. The electronic terminal of claim 13, wherein the eye tracking unit comprises:
an infrared light source that emits infrared light toward the eyes of a user;
the infrared camera receives infrared light reflected by the pupil of the eyeball of the user and provides eyeball line-of-sight position information.
17. The electronic terminal of claim 13, wherein generating a screenshot based on at least one keyframe image in a video that includes content corresponding to the location of interest comprises:
if the focus position of the eye gaze in the video is determined, according to the eye gaze position information, to be the region where the subtitles are located in the video, acquiring a key frame image at the moment screen capturing is triggered;
determining the time value corresponding to the last key frame image belonging to the same scene as the key frame image at the moment screen capturing is triggered, and taking the determined time value as the cut-off moment;
extracting a plurality of subtitles from the subtitle file of the video from the trigger screen capturing moment to the cut-off moment;
screening out subtitles expressing a complete semantic meaning from the plurality of subtitles;
and generating a screenshot by stitching at least one key frame image corresponding to the screened subtitles.
18. The electronic terminal of claim 17, wherein the process of screening subtitles from the plurality of subtitles to represent a complete semantic meaning comprises:
the following processing is performed on each pair of subtitles adjacent in display time, in reverse order of display time: calculating the time interval between the two adjacent subtitles; if the interval between the later-displayed subtitle and the earlier-displayed subtitle is greater than or equal to a preset time, deleting the later-displayed subtitle and all subtitles displayed after it; and if the interval is smaller than the preset time, retaining both subtitles;
and screening out the subtitles expressing a complete semantic meaning by performing semantic analysis on all retained subtitles.
19. The electronic terminal of claim 17, wherein the at least one key frame image comprises a key frame image,
the process of generating the screenshot by stitching the at least one key frame image corresponding to the screened subtitles comprises: generating the screenshot by superimposing the subtitles on the one key frame image.
20. The electronic terminal of claim 17, wherein the at least one key frame image comprises a plurality of key frame images,
the process of generating the screenshot by stitching the at least one key frame image corresponding to the screened subtitles comprises: stitching the plurality of key frame images in chronological order to generate the screenshot,
or selecting a representative key frame image from the plurality of key frame images and generating the screenshot by superimposing the subtitles on the representative key frame image.
21. The electronic terminal of claim 20, wherein the process of selecting a representative key frame image from the plurality of key frame images comprises:
selecting, from the plurality of key frame images, the key frame image whose time value is closest to the midpoint of the period from the trigger screen capturing moment to the cut-off moment as the representative key frame image,
or selecting the key frame image with the smallest time value from the plurality of key frame images as the representative key frame image.
22. The electronic terminal of claim 17, wherein the processor is further configured to:
if it is determined according to the eye gaze position information that the focus position of the eye gaze in the video is not the area where the subtitles are located in the video, identifying an object corresponding to the focus position of the eye gaze;
acquiring at least one key frame image in a preset time period from the moment of triggering screen capturing;
searching candidate key frame images containing the object from the at least one obtained key frame image;
removing key frame images with high similarity from the searched candidate key frame images;
and generating a screenshot in a splicing mode by using the key frame images which are reserved in the candidate key frame images.
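The patent does not specify how "high similarity" between candidate key frames is measured. One common and simple choice, used here purely as an assumption, is the mean absolute pixel difference against the last frame kept:

```python
def mean_abs_diff(a, b):
    """Mean absolute per-pixel difference between two same-sized images,
    each given as a list of pixel rows."""
    flat_a = [p for row in a for p in row]
    flat_b = [p for row in b for p in row]
    return sum(abs(x - y) for x, y in zip(flat_a, flat_b)) / len(flat_a)

def dedup_similar(frames, threshold=10.0):
    """Drop candidate key frames whose difference from the previously kept
    frame falls below the threshold, i.e. frames of high similarity."""
    kept = []
    for img in frames:
        if not kept or mean_abs_diff(img, kept[-1]) >= threshold:
            kept.append(img)
    return kept
```

A production implementation might instead use perceptual hashing or histogram comparison; the control flow of keeping only sufficiently different frames would be the same.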
23. The electronic terminal of claim 13, wherein the type of the current display interface comprises a text class, a text-class display interface containing text information,
wherein the electronic terminal further comprises a display screen, and the process of generating the screenshot based on the determined type of the current display interface and the eyeball gaze position information comprises:
determining the direction of movement of the eye gaze from the eyeball gaze position information;
controlling the display screen to switch the display interface according to the direction of movement of the eye gaze, and acquiring an image of the display interface at each switch;
and, when the display interface switching ends, combining the images of the display interfaces into a screenshot in display order.
24. The electronic terminal of claim 13, wherein the type of the current display interface comprises a control class, a control-class display interface containing controls,
wherein the process of generating the screenshot based on the determined type of the current display interface and the eyeball gaze position information comprises:
controlling, according to the eyeball gaze position information, the electronic terminal to execute at least one operation step, wherein the display interface corresponding to any one of the at least one operation step contains a control, and that operation step is executed by operating the control according to the eyeball gaze position information;
acquiring an image of the corresponding display interface as the electronic terminal executes each operation step;
and combining the images of the display interfaces corresponding to the at least one operation step into a screenshot in the order of execution of the operation steps.
25. The electronic terminal of claim 24, wherein the process of acquiring an image of the corresponding display interface as the electronic terminal executes each operation step comprises:
adding an identifier at the position of the control on the acquired image of the display interface corresponding to the operation step;
and taking the image with the identifier added as the image of the display interface corresponding to that operation step.
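Claim 25 does not define the identifier's appearance. One plausible rendering, assumed here for illustration, is a one-pixel border drawn around the control's bounding box on the captured image:

```python
def mark_control(image, bbox, value=255):
    """Draw a one-pixel border (the 'identifier') around the control's
    bounding box (x0, y0, x1, y1) on a grayscale screenshot given as a
    list of pixel rows. Returns a marked copy; the input is untouched.
    """
    out = [row[:] for row in image]
    x0, y0, x1, y1 = bbox
    for x in range(x0, x1 + 1):   # top and bottom edges
        out[y0][x] = value
        out[y1][x] = value
    for y in range(y0, y1 + 1):   # left and right edges
        out[y][x0] = value
        out[y][x1] = value
    return out
```

An arrow, highlight overlay, or step number would serve the same purpose of showing the reader which control was operated at that step.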
26. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the screenshot method in an electronic terminal according to any one of claims 1 to 12.
CN201910789435.1A 2019-08-26 2019-08-26 Screenshot method in electronic terminal and electronic terminal Active CN110502117B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910789435.1A CN110502117B (en) 2019-08-26 2019-08-26 Screenshot method in electronic terminal and electronic terminal


Publications (2)

Publication Number Publication Date
CN110502117A CN110502117A (en) 2019-11-26
CN110502117B true CN110502117B (en) 2023-04-25

Family

ID=68589432

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910789435.1A Active CN110502117B (en) 2019-08-26 2019-08-26 Screenshot method in electronic terminal and electronic terminal

Country Status (1)

Country Link
CN (1) CN110502117B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111010610B (en) * 2019-12-18 2022-01-28 维沃移动通信有限公司 Video screenshot method and electronic equipment
CN113723397B (en) * 2020-05-26 2023-07-25 华为技术有限公司 Screen capturing method and electronic equipment
CN112261453A (en) * 2020-10-22 2021-01-22 北京小米移动软件有限公司 Method, device and storage medium for transmitting subtitle splicing map
CN113485621B (en) * 2021-07-19 2024-05-28 维沃移动通信有限公司 Image capturing method, device, electronic equipment and storage medium

Citations (3)

Publication number Priority date Publication date Assignee Title
CN105446673A (en) * 2014-07-28 2016-03-30 华为技术有限公司 Screen display method and terminal device
CN107229402A (en) * 2017-05-22 2017-10-03 努比亚技术有限公司 Dynamic screenshotss method, device and the readable storage medium storing program for executing of terminal
CN108664288A (en) * 2018-05-14 2018-10-16 维沃移动通信有限公司 A kind of image interception method and mobile terminal



Similar Documents

Publication Publication Date Title
CN110502117B (en) Screenshot method in electronic terminal and electronic terminal
WO2019174458A1 (en) An intelligent video interaction method
US11317139B2 (en) Control method and apparatus
US9031389B2 (en) Image editing apparatus, image editing method and program
US9959681B2 (en) Augmented reality contents generation and play system and method using the same
US7529772B2 (en) Method and system for associating user comments to a scene captured by a digital imaging device
JP5634111B2 (en) Video editing apparatus, video editing method and program
US9685199B2 (en) Editing apparatus and editing method
US20080079693A1 (en) Apparatus for displaying presentation information
CN112740713A (en) Method for providing key moments in multimedia content and electronic device thereof
US9721613B2 (en) Content management system, management content generation method, management content reproduction method, program and recording medium
CN113064684B (en) Virtual reality equipment and VR scene screen capturing method
JP4536940B2 (en) Image processing apparatus, image processing method, storage medium, and computer program
KR101613438B1 (en) System for augmented reality contents and method of the same
CN113709545A (en) Video processing method and device, computer equipment and storage medium
US20050251741A1 (en) Methods and apparatus for capturing images
US20120229678A1 (en) Image reproducing control apparatus
JP5776471B2 (en) Image display system
CN112822394B (en) Display control method, display control device, electronic equipment and readable storage medium
JP5826513B2 (en) Similar image search system
KR102138835B1 (en) Apparatus and method for providing information exposure protecting image
KR101518696B1 (en) System for augmented reality contents and method of the same
US9122923B2 (en) Image generation apparatus and control method
JP2013121097A (en) Imaging apparatus, imaging method, image generating apparatus, image generating method and program
JP5685958B2 (en) Image display system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant