CN115396652A

CN115396652A - Intelligent image capturing method, system, device and storage medium

Info

Publication number: CN115396652A
Application number: CN202211031001.3A
Authority: CN
Inventors: 李志勇
Original assignee: Hefei Zhiqiyong Technology Development Co ltd
Current assignee: Hefei Zhiqiyong Technology Development Co ltd
Priority date: 2022-08-26
Filing date: 2022-08-26
Publication date: 2022-11-25

Abstract

The embodiment of the specification provides an intelligent image capturing method, an intelligent image capturing system, an intelligent image capturing device and a storage medium, wherein the method comprises the steps of acquiring a fixation point sequence of a target user in a target time period, wherein the fixation point sequence comprises fixation points of the target user at a plurality of time points in the target time period; determining whether a picture needs to be captured based on the gaze point sequence; when a picture needs to be captured, determining a target mode for capturing the picture; and performing picture capture based on the target mode.

Description

Intelligent image capturing method, system, device and storage medium

Technical Field

The present disclosure relates to the field of image capture, and more particularly, to an intelligent image capture method, system, apparatus, and storage medium.

Background

When image acquisition is performed on existing augmented reality glasses, intelligent glasses and the like, active triggering by a user is generally required, namely, corresponding image acquisition operation can be executed only when a photographing or video recording instruction input by the user is received. When the scene of the user changes rapidly or the user is inconvenient to actively trigger the glasses to acquire images, the user cannot send an image acquisition instruction, so that the glasses cannot capture the content of interest of the user in time.

Accordingly, it is desirable to provide an intelligent image capturing method and system for improving picture capturing efficiency and improving user experience.

Disclosure of Invention

One of the embodiments of the present specification provides an intelligent image capturing method, including: acquiring a fixation point sequence of a target user in a target time period, wherein the fixation point sequence comprises fixation points of the target user at a plurality of time points in the target time period; determining whether a picture needs to be captured based on the gaze point sequence; when a picture needs to be captured, determining a target mode for capturing the picture; and performing picture capture based on the target mode.

One of the embodiments of the present specification provides an intelligent image capturing system, including: the acquisition module is used for acquiring a fixation point sequence of a target user in a target time period, wherein the fixation point sequence comprises fixation points of the target user at a plurality of time points in the target time period; a first determining module, configured to determine whether a picture needs to be captured based on the gaze point sequence; the second determining module is used for determining a target mode of capturing the picture when the picture needs to be captured; and the processing module is used for carrying out picture capture based on the target mode.

One of the embodiments of the present specification provides an intelligent image capturing apparatus, comprising at least one processor and at least one memory, the at least one memory storing computer instructions; at least one processor is configured to execute at least some of the computer instructions to implement the intelligent image capture method as in any of the above embodiments.

One of the embodiments of the present specification provides a computer-readable storage medium storing computer instructions, and when the computer instructions in the storage medium are read by a computer, the computer executes the intelligent image capturing method according to any one of the embodiments.

Drawings

The present description will be further explained by way of exemplary embodiments, which will be described in detail by way of the accompanying drawings. These embodiments are not intended to be limiting, and in these embodiments like numerals are used to indicate like structures, wherein:

FIG. 1 is a schematic diagram of an application scenario of an intelligent image capture system according to some embodiments of the present description;

FIG. 2 is an exemplary block diagram of an intelligent image capture system according to some embodiments of the present description;

FIG. 3 is an exemplary flow diagram of a method of intelligent image capture according to some embodiments of the present description;

FIG. 4 is an exemplary flow diagram illustrating a method for determining a target manner in which to capture a visual according to some embodiments of the present description;

FIG. 5A is an exemplary flow diagram illustrating a determination of a target manner in which to capture a picture based on an ambient picture flatness value in accordance with some embodiments of the present description;

fig. 5B is an exemplary diagram illustrating tagging of a current picture and a reference image according to some embodiments of the present description.

Detailed Description

In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings used in the description of the embodiments will be briefly described below. It is obvious that the drawings in the following description are only examples or embodiments of the present description, and that for a person skilled in the art, the present description can also be applied to other similar scenarios on the basis of these drawings without inventive effort. Unless otherwise apparent from the context, or otherwise indicated, like reference numbers in the figures refer to the same structure or operation.

It should be understood that "system", "apparatus", "unit" and/or "module" as used herein is a method for distinguishing different components, elements, parts, portions or assemblies at different levels. However, other words may be substituted by other expressions if they accomplish the same purpose.

As used in this specification and the appended claims, the terms "a," "an," "the," and/or "the" are not intended to be inclusive in the singular, but rather are intended to be inclusive in the plural, unless the context clearly dictates otherwise. In general, the terms "comprises" and "comprising" merely indicate that steps and elements are included which are explicitly identified, that the steps and elements do not form an exclusive list, and that a method or apparatus may include other steps or elements.

Flowcharts are used in this specification to illustrate the operations performed by the system according to embodiments of the present specification. It should be understood that the preceding or following operations are not necessarily performed in the exact order in which they are performed. Rather, the various steps may be processed in reverse order or simultaneously. Meanwhile, other operations may be added to or removed from these processes.

FIG. 1 is a schematic diagram of an application scenario of an intelligent image capture system, shown in some embodiments herein. The intelligent image capture system 100 may perform intelligent image capture by implementing the methods and/or processes disclosed in this specification.

As shown in fig. 1, an application scenario 100 of the intelligent image capturing system may include a target terminal 110, a user 120, a processing device 130, a memory 140, a network 150, and a terminal 160. The intelligent image capture system may determine whether a picture needs to be captured based on a sequence of gaze points of the user 120 wearing the target terminal 110 for a target time period by implementing the methods and/or processes disclosed herein.

In some embodiments, the target terminal 110 may be communicatively coupled to the network 150 and the processing device 130. For example, the target terminal 110 may communicate the gaze point sequence information of the user 120 through the network 150. In some embodiments, the target terminal 110 may be communicatively coupled to other components of the intelligent image capture system via a network 150, for example, the target terminal 110 may be communicatively coupled to the memory 140 via the network 150. The target terminal 110 may include smart glasses, a wearable helmet, and the like. It is understood that the target terminal 110 may be referred to as a user terminal, an image capture terminal, or the like.

User 120 may be a requestor of intelligent image capture. In some embodiments, the user 120 may communicate with the target terminal 110 through the terminal device 160. For example, the user 120 may determine a targeted manner of capturing a visual via the terminal device 160 (e.g., a cell phone).

Processing device 130 may process data and/or information obtained from other devices or system components. Processing device 130 may execute program instructions based on such data, information, and/or processing results to perform one or more of the functions described herein. In some embodiments, the processing device 130 may include one or more sub-processing devices (e.g., a single-core processing device or a multi-core processing device). For example only, the processing device 130 may include a Central Processing Unit (CPU), an Application Specific Integrated Circuit (ASIC), the like, or any combination thereof.

Memory 140 may be a storage device for storing data, instructions, and/or any other information. In some embodiments, memory 140 may store data and/or information obtained from target terminal 110, user 120, or the like. For example, the memory 140 may store the target user's gaze point at a plurality of time points within the target time period, and the like. In some embodiments, the memory 140 may be located in the target terminal 110 or the processing device 130. In some embodiments, memory 140 may include mass storage, removable storage, and the like, or any combination thereof.

The network 150 may connect the components of the application scenario 100 and/or connect the system with external resource components. The network 150 enables communication between the various components and with other components outside the system to facilitate the exchange of data and/or information. In some embodiments, the network 150 may be any one or more of a wired network or a wireless network. For example, the network 120 may include a cable network, a fiber network, and the like, or any combination thereof. The network connection between the parts can be in one way or in multiple ways. In some embodiments, the network may be point-to-point, shared, centralized, etc. or a combination of topologies. In some embodiments, network 150 may include one or more network access points. In some embodiments, a sequence of gaze points, target patterns, etc. data may be communicated over the network 150.

The terminal 160 may be an executive body of the intelligent image capturing system provided by some embodiments of the present specification. Terminal 160 may refer to one or more terminal devices or software used by a user. In some embodiments, the terminal 160 may be a mobile device 160-1, a tablet computer 160-2, a laptop computer 160-3, the like, or any combination thereof. In some embodiments, terminal 160 may interact with other components in the intelligent image capture system over network 150. For example, the terminal 160 may receive a capture instruction issued by the user 120 to the target terminal 110, and forward the capture instruction to the target terminal 110.

FIG. 2 is an exemplary block diagram of an intelligent image capture system, shown in accordance with some embodiments of the present description.

As shown in fig. 2, in some embodiments, the intelligent image capture system 200 may include an acquisition module 210, a first determination module 220, a second determination module 230, a processing module 240. In some embodiments, one or more modules in the intelligent image capture system 200 may be connected to each other, either wirelessly or wired.

In some embodiments, the obtaining module 210 may be configured to obtain a gaze point sequence of the target user within the target time period, the gaze point sequence including gaze points of the target user at a plurality of time points within the target time period. For more on the sequence of gaze points, see fig. 3 and its related description.

In some embodiments, the first determination module 220 may be configured to determine whether a picture needs to be captured based on the gaze point sequence. In some embodiments, the first determining module 220 may be further configured to determine a gaze point plateau value of the target user within the target time period based on the gaze point sequence, the gaze point plateau value characterizing displacement amplitudes of the gaze point of the target user at a plurality of time points; when the stable value of the fixation point meets a first preset condition, determining that a picture needs to be captured; when the gazing point smoothness value does not satisfy the first preset condition, it is determined that a picture does not need to be captured. See fig. 3 and its associated description for further details regarding determining whether a picture needs to be captured.

In some embodiments, the second determination module 230 may be used to determine a target manner in which to capture a picture when it is desired to capture a picture. In some embodiments, the second determining module 230 may be further configured to obtain an environmental picture sequence corresponding to the gaze point sequence, where the environmental picture sequence includes environmental pictures of a gaze point range of the target user at a plurality of time points; determining an environmental picture steady value based on the environmental picture sequence, wherein the environmental picture steady value represents the variation amplitude of the environmental picture at a plurality of time points; based on the ambient picture flatness value, a target manner of capturing the picture is determined. The second determining module 230 may be further configured to obtain a reference image set of the environment picture, where the reference image set includes at least one labeled reference image; acquiring a current picture of a fixation point range of a target user; determining the similarity between each reference image in the reference image set and the current picture; determining a target reference image based on the similarity between each reference image and the current picture; determining a tendency factor of the current picture based on the tendency preset parameter, the similarity between the target reference image and the current picture and the label of the target reference image; and determining a target mode for capturing the picture based on the tendency factor and the ambient picture flatness value. For more on determining the target manner in which to capture a visual, reference may be made to fig. 4, 5A, and related description thereof.

In some embodiments, the processing module 240 may be used for frame capture based on a target approach. For more on-target-based picture capture, reference may be made to fig. 3 and its associated description.

It should be appreciated that the system and its modules illustrated in FIG. 2 may be implemented in a variety of ways.

It should be noted that the above description of the intelligent image capturing system and the modules thereof is merely for convenience of description and should not limit the present disclosure to the scope of the illustrated embodiments. It will be appreciated by those skilled in the art that, given the teachings of the present system, any combination of modules or sub-system configurations may be used to connect to other modules without departing from such teachings. In some embodiments, the obtaining module 210, the first determining module 220, the second determining module 230, and the processing module 240 disclosed in fig. 2 may be different modules in a system, or may be a module that implements the functions of two or more modules described above. For example, each module may share one memory module, and each module may have its own memory module. Such variations are within the scope of the present disclosure.

FIG. 3 is an exemplary flow diagram of a method of intelligent image capture, shown in accordance with some embodiments of the present description. As shown in fig. 3, the process 300 includes the following steps. In some embodiments, flow 300 may be performed by a target terminal.

The target terminal may be an execution subject of the intelligent image capturing method. For example, the target terminal may be a wearable device such as virtual reality/augmented reality glasses, smart glasses, a smart helmet, and the like. The target terminal may include a sensor having an image capturing function. For example, the target terminal may include components such as a camera.

Step 310, a gaze point sequence of the target user in the target time period is obtained, where the gaze point sequence includes gaze points of the target object at multiple time points in the target time period.

The target user refers to a person who uses or wears the target terminal, for example, the target user may be a smart glasses debugging person, a smart glasses exhibition person, or the like.

The target time period refers to a certain time period in which the smart glasses are used, for example, the target time period may be third to seventh minutes in which the smart glasses are used, the length of the target time period is four minutes, and the like. In some embodiments, the length of the target time period may be preset. For example, the length of the target period may be set to four minutes in advance.

In some embodiments, the length of the target time period may be related to the current remaining capacity of the smart glasses. For example, the length of the target time period may be positively correlated with the current remaining capacity of the smart glasses, and the length of the target time period may be determined by a preset rule based on the current remaining capacity of the smart glasses.

The current remaining capacity of the target terminal refers to the remaining capacity of the target terminal in the memory at the current time, for example, the target terminal may include a random access memory therein, and the current remaining capacity may be the remaining capacity of the random access memory. The current remaining capacity may be 20KB, 300MB, 30GB, etc. The larger the current remaining capacity is, the longer the length of the target period may be.

The gaze point refers to the point at which the target user's pupils are aligned during visual perception. For example, if the point of sight line alignment with the object 1 is located on the desktop of the table, the target user with the point of regard 1 is the table; the point of sight at the object 2 is located on the surface of the book and the target user at the point of gaze 2 is the book. In some embodiments, the gaze point may be represented by two-dimensional coordinates, three-dimensional coordinates, or the like.

The gaze point sequence refers to a gaze track formed by a plurality of gaze points of a target user continuously in a period of time. In some embodiments, the sequence of gaze points comprises the gaze points of the target user at a plurality of time points within the target time period. For example, the target user is at 02: the fixation points of 00-02 are tables, dinner plates and forks; the target user is at 04: the fixation points of 00-04 are glass, red wine labels and the like.

In some embodiments, the target terminal may obtain a sequence of gaze points of the user and a time corresponding to the gaze point based on the image recognition positioning. For example, a sequence of gaze points of the user is acquired by techniques such as eye tracking, pupil capture, and the like. In some embodiments, the target terminal may acquire the gaze point once every certain period of time, and arrange the multiple pieces of time-gaze point information in a time axis to form a gaze point sequence. For example, the target terminal may acquire the gazing point once every 0.1 μ sec, and arrange all the gazing points acquired within the time period from 04 to 04 in chronological order to form a gazing point sequence from 04 to 10.

Step 320, based on the gazing point sequence, determining whether a picture needs to be captured.

Capturing a frame refers to the process of capturing all or part of the area from the smart glasses display as an image. Capturing the picture may include capturing a still picture and capturing a moving picture, for example, capturing the still picture may be screen capture, screenshot, an image may be obtained by capturing the still picture, capturing the moving picture may be screen recording, and the like, and a video may be obtained by capturing the moving picture.

In some embodiments, the target terminal may determine whether a picture needs to be captured based on the gaze point sequence through a machine learning model, a preset relationship, or the like.

In some embodiments, the target terminal may determine a gaze point plateau value for the target user over the target time period based on the sequence of gaze points. And the stable fixation value of the fixation point represents the displacement amplitude of the fixation point of the target user at a plurality of time points.

The gazing point smoothness value refers to a relevant numerical value reflecting the stability of the gazing point of the target user, for example, the gazing point smoothness value may be a numerical value between 0 and 1, such as 0.8. In some embodiments, the gaze point plateau value may be related to a dwell time of the gaze point, a displacement amplitude of the gaze point. For example, the longer the dwell time of the point of regard, the greater the point of regard smoothness value; the smaller the displacement amplitude of the fixation point is, the larger the fixation value of the fixation point is. The higher the fixation value of the gaze point is, the more stable the gaze point is, and the higher the possibility that the user gazes at a certain object for a long time is.

In some embodiments, a variance of the sequence of gaze points may be calculated based on the sequence of gaze points of the target user, determining a gaze point plateau value. For example, based on the eye tracking and the positioning manner of judging whether the head rotates or not, the state of the target user is judged, the gaze point sequence is obtained, each gaze point can be used as a three-dimensional coordinate vector, and the steady value of the gaze point is determined by calculating the variance of the gaze point vector sequence.

In some embodiments, when the gazing point smoothness value satisfies a first preset condition, determining that a picture needs to be captured; when the gazing point smoothness value does not satisfy the first preset condition, it is determined that a picture does not need to be captured.

The first preset condition refers to a preset condition that the captured picture needs to satisfy. In some embodiments, the first preset condition may be that the gaze point plateau value is greater than the gaze point plateau value threshold. For example, the first preset condition may be that the gaze point plateau is greater than 0.6. The first preset condition may be determined based on a manual setting.

In some embodiments, the size of the point of regard smoothness threshold is related to the current remaining capacity of the smart glasses. For example, when the current remaining capacity of the smart glasses is small, the greater the threshold value of the plateau value may be set in order to save capacity; as another example, when the current remaining capacity of the smart glasses is larger, the threshold value of the plateau value may be set smaller in order to increase the possibility of performing an action. Illustratively, the point-of-regard smoothness threshold = k/c. And k is a first preset parameter, and c is the current residual capacity of the intelligent glasses. The first preset parameter may be determined based on a manual setting.

Some embodiments of the present description determine, through a gaze point sequence, a gaze point stationary value of a target user within a target time period; whether pictures need to be captured or not is determined by judging whether the stable value of the fixation point meets a first preset condition or not, so that the operation of a target user on the intelligent glasses can be reduced, and the contents which are interested by the user can be captured effectively in time; in addition, image capture can be performed only when the user and the intelligent glasses are stable enough, so that the problem that fuzzy and invalid images are captured when the user moves violently, the intelligent glasses are not properly worn and the like is avoided.

Step 330, when the picture needs to be captured, determining the target mode of capturing the picture.

The target mode refers to a related mode of capturing a picture, and for example, the target mode may be a mode of taking an image, taking a video, or the like.

In some embodiments, the target mode may be determined based on manual settings, or determined by the target terminal based on preset relationship intelligent judgment.

In some embodiments, the target terminal may determine a target manner of capturing the picture based on the gaze point sequence. For example, the target manner of capturing the picture may be determined based on the change amplitude of the environmental picture corresponding to the gazing point sequence. See fig. 4 and its associated description for more details regarding the targeting mode.

And step 340, performing picture capture based on the target mode.

In some embodiments, the target terminal performs picture capture based on the selected target mode. For example, image and video capture is performed based on a camera of the target terminal.

Some embodiments of the present description determine whether a picture needs to be captured by acquiring a gaze point sequence of a target user within a target time period; and when the picture needs to be captured, determining a target mode for capturing the picture, and effectively capturing the content interested by the target user according to the watching point sequence of the target user, so that the picture capturing efficiency is improved, and the user experience is remarkably improved.

FIG. 4 is an exemplary flow diagram illustrating determining a targeted manner of capturing a visual according to some embodiments of the present description. As shown in fig. 4, the process 400 includes the following steps. In some embodiments, flow 400 may be performed by a target terminal.

Step 410, obtaining an environment picture sequence corresponding to the gazing point sequence, where the environment picture sequence includes environment pictures of the gazing point range of the target user at multiple time points.

The environment screen refers to a screen of the surrounding environment corresponding to the gaze point of the target user, for example, a screen that the current user sees through the smart glasses, a screen that the smart glasses can capture, and the like. The environment picture sequence refers to a picture trajectory of the surrounding environment corresponding to the gaze point sequence of the target user. The ambient picture sequence includes ambient pictures of a gaze point range of the target user at a plurality of time points. For example, the target user is at 02: the environment picture of the fixation point range of 00-04; the target user is at 04: the environment picture in the fixation point range of 00 to 06.

In some embodiments, the target terminal may be automatically located based on image recognition, and obtain an environmental picture sequence corresponding to the gaze point sequence and a time corresponding to the environmental picture. For example, the sequence of environmental frames is acquired by techniques such as eye tracking, pupil capture, and the like. The target terminal can acquire the environment picture once every a period of time, and arrange a plurality of time-environment picture information by a time axis to form an environment picture sequence. For example, the target terminal may acquire the environment pictures every 0.1 μ sec, and arrange all the environment pictures acquired in the time period of 06.

And step 420, determining an environment picture stable value based on the environment picture sequence, wherein the environment picture stable value represents the change amplitude of the environment picture at a plurality of time points.

The environmental picture steady value refers to a relevant value reflecting the degree of environmental picture stability, for example, the environmental picture steady value may be a value between 0 and 1, such as 0.8. In some embodiments, the ambient picture flatness value is related to a dwell time of the ambient picture, a magnitude of change of the ambient picture. For example, the longer the dwell time of the ambient picture, the greater the ambient picture plateau value; the smaller the change amplitude of the ambient picture is, the larger the ambient picture plateau value is. Wherein, the more stable the ambient picture value is, the more stable the ambient picture is.

In some embodiments, the ambient picture flatness value characterizes a magnitude of change of the ambient picture at a plurality of points in time.

In some embodiments, an ambient picture flatness value may be determined based on the ambient picture sequence. For example, the target terminal may determine the ambient picture flatness value based on the ambient picture sequence, calculating a variance of the ambient picture sequence. Wherein the variance of the sequence of computing environment pictures may be a variance of a sequence of image feature vectors corresponding to the sequence of computing environment pictures. The image feature vector may be a vector reflecting features of an environment picture, elements of the image feature vector may include coordinates of a gaze point, resolution of the image picture, a gray value, a pixel value of a pixel point, and the like, and a plurality of image feature vectors constitute an image feature vector sequence. In some embodiments, the image feature and the image feature vector may be obtained by passing the sequence of environmental pictures through an image feature extraction layer. For example, the image feature extraction layer may be a neural network model, the input of which may be a sequence of ambient pictures and the output may be a sequence of image feature vectors and corresponding image features. The image feature extraction layer can be obtained through historical environment picture training, wherein image feature vectors and image features corresponding to the historical environment pictures are training labels, and the labels can be determined through manual setting.

Step 430, determining a target mode for capturing the picture based on the ambient picture flatness value.

In some embodiments, when the environment picture flatness value satisfies the second preset condition, the target manner may be determined as capturing a picture by photographing an image; when the ambient-picture stationary value does not satisfy the second preset condition, the target manner may be determined as a video-captured picture by photographing. In some embodiments, the target mode may also be determined by the flow described in fig. 5A, with particular reference to fig. 5A and its associated description.

The second preset condition is a preset condition for determining the type of the target mode, for example, the second preset condition may be that the environmental frame smoothness value is greater than a value B, for example, the value B may be 0.8. The second preset condition may be determined based on a manual setting.

In some embodiments, the second preset condition comprises the ambient picture plateau value being greater than the ambient picture plateau value threshold.

The ambient picture stationarity threshold refers to a critical value for determining whether the ambient picture is stationary, for example, the ambient picture stationarity threshold may be set to 0.5.

In some embodiments, the size of the ambient picture flatness value threshold is related to the current remaining capacity of the target terminal. For example, when the current remaining capacity of the smart glasses is smaller, in order to increase the possibility of performing a photographing action, the environmental picture plateau value threshold may be set smaller; for another example, when the current remaining capacity of the smart glasses is larger, the ambient picture stabilization value threshold may be set larger in order to increase the possibility of performing a photographing action. It should be understood that, when the current remaining capacity of the smart glasses is smaller, the remaining capacity should store an image with higher information value (capable of providing more information), so that the threshold of the environmental frame smoothness value is increased to raise the threshold of storing the image; when the current residual capacity of the intelligent glasses is larger, the residual capacity can store more images to meet the diversity and the possibility of the images, and therefore, the threshold value of the stable value of the environmental picture is reduced to acquire more images. Illustratively, the ambient picture plateau value threshold = m c. Wherein m is a second preset parameter, and c is the current remaining capacity of the smart glasses. The second preset parameter may be determined based on a manual setting.

In some embodiments, the magnitude of the ambient picture stationarity threshold may also be related to a point of regard stationarity value. For example, when the gaze point plateau value is larger, the environmental picture plateau value threshold may be set smaller in order to increase the possibility of performing a photographing action. It will be appreciated that the more stable the user's gaze point, the more likely the user is to be gazed atThe still picture is more inclined to take a picture when viewed, and the smaller the threshold value of the ambient picture smoothness value can be set. Illustratively, the ambient picture plateau value threshold = k ₁ *c-k ₂ * f; wherein k is ₁ 、k ₂ The second preset parameter and the third preset parameter, c the current residual capacity, and f the fixation point stable value. The second preset parameter and the third preset parameter may be determined based on manual settings.

Some embodiments of the present description determine an environmental picture smoothness value by obtaining an environmental picture sequence corresponding to a gaze point sequence; and then, based on the stable value of the environment picture, determining a target picture capturing mode, combining the environment picture according to different fixation point conditions of a target user, and selecting a corresponding picture capturing mode more accurately, so that the picture capturing efficiency is improved, and the user experience is remarkably improved.

FIG. 5A is an exemplary flow diagram illustrating a determination of a target manner to capture a visual based on an ambient visual smoothness value according to some embodiments of the present description. As shown in fig. 5A, the process 500 includes the following steps. In some embodiments, flow 500 may be performed by a target terminal.

Step 510, a reference image set of the environment picture is obtained, wherein the reference image set comprises at least one reference image with a label.

The reference image refers to an image that can reflect a reference of the environmental picture, and the reference image set refers to a set of reference images. For example, the reference image may be an image containing "taihe hall," the reference image set may be a plurality of images containing "taihe hall," and so on.

In some embodiments, the reference image set may include at least one labeled reference image. The label may be information reflecting a certain characteristic of the reference image set. Illustratively, as shown in fig. 5B, the label 512 of the reference image 511 in the reference image set includes the time, weather, etc. when the reference image was captured. For example, the reference image is an image of "taihe hall", and the labels of the reference image may be time "10 am," weather "clear," and the like.

In some embodiments, the target terminal may acquire the reference image set based on a network. For example, popular tourist spots and photo spots can be obtained as the reference image set based on the network.

In some embodiments, the target terminal may determine the reference image set based on current location information. Wherein, different position information may correspond to different reference image sets. For example, the current location information is "Beijing university", and the corresponding reference image set may include "Beijing university canteen, beijing university dormitory, beijing university library, and the like". The current location information may be obtained based on a positioning device such as GPS, beidou, etc., or through a network.

In some embodiments, the target terminal may further determine the reference image set corresponding to the position according to the photographing or photographing frequency of different users at each position, and select an image or video with the highest frequency from the images or videos. For example, the intelligent glasses may update the database offline, record the photographing frequencies of different users in the more popular photographing places, and select the image corresponding to the place with the highest frequency as the reference image set corresponding to the position; for another example, the smart glasses may record the shooting frequencies of different users at the hot shooting places, select a video meeting a preset condition from the videos, and capture a first frame of the video as a reference image set corresponding to the position.

In some embodiments, the target terminal may determine the reference image set based on the target user's selection, sharing, storage, etc. of the captured images or videos. For example, when the target user further operates on one of the captured images/videos, such as uploading a network, sharing, collecting, storing, etc., the corresponding captured image may be added to the reference image set.

Step 520, obtain the current picture of the target user's gaze point range.

The current frame refers to a frame of the current gaze point range of the target user, for example, when the gaze point of the target user is a blackboard, the current frame of the gaze point range may be a platform, a blackboard, or the like. The target terminal may obtain a current picture of the gaze point range of the target user. In some embodiments, the target terminal may add a label to the current picture. For example, as shown in fig. 5B, a label 522 such as a time, weather, or the like is added to the current screen 521.

In step 530, the similarity between each reference image in the reference image set and the current picture is determined.

The similarity refers to the degree of similarity between the reference image and the current picture, for example, the similarity may be a value between 0 and 1, for example, a similarity of 0.1 represents less similarity, a similarity of 0.9 represents more similarity, and the like.

In some embodiments, the similarity between each reference image in the set of reference images and the current picture may be determined based on a machine learning model. For example, for each reference image in the reference image set, a similarity determination model is used, the reference image and the current picture are input, and the similarity between the reference image and the current picture is output. The similarity determination model can be obtained based on historical reference images and historical picture training. The training labels may be similarities between the historical reference images and the historical picture. The training labels may be determined manually.

Step 540, determining a target reference image based on the similarity between each reference image and the current picture.

The target reference image is a reference image confirmed based on similarity, for example, if the similarity between the picture a captured by the current smart glasses of the user and the reference image B in the reference image set is the highest, the reference image B is the target reference image. And when the similarity between the reference image and the current picture is greater than a similarity threshold value, taking the reference image as a target reference image, wherein the similarity threshold value can be determined based on manual setting.

In some embodiments, the target terminal may also determine a similarity between each reference image in the reference image set and the current picture. In some embodiments, the target terminal may also determine a similarity between the label of each reference image and the label of the current picture. Illustratively, similarity = similarity between the current picture and a reference image, similarity between the current picture and a time stamp, and similarity between the current picture and a weather stamp. For example, if the image similarity is 0.7, the time tag similarity is 0.8, and the weather tag similarity is 1, the similarity between the reference image and the current frame is 0.56.

In some embodiments of the present description, similarity matching between a tag of a reference image set and a tag of a current picture is introduced to implement similarity fast matching, and when a tag has a large difference, it can be determined that the reference image is not matched with the current picture only by performing similarity matching between the tags, but not by performing similarity matching between the reference image and the current picture, thereby saving computation power and improving computation efficiency.

Step 550, determining a tendency factor of the current picture based on the tendency preset parameter, the similarity between the target reference image and the current picture and the label of the target reference image.

The tendency preset parameter is a preset parameter reflecting the degree of tendency of the target mode. For example, the tendency preset parameter may be a preset value between 0 and 1, and the larger the tendency preset parameter is, the greater the tendency degree (i.e., the probability) of the target terminal selecting a certain target manner is. The tendency preset parameter may be determined based on manual settings.

The tendency factor refers to a tendency value for correcting the stable value of the environmental picture, wherein the tendency factor may be a value greater than 1 or less than 1.

In some embodiments, the tendency factor may be determined based on the label of the current picture corresponding to the target reference image. In some embodiments, the determination of the propensity factor may vary from tag to tag. For example, the similarity between the current picture a and the reference image B in the reference image set is the highest (e.g., 95%), the label corresponding to the reference image B is a video, and the trend factor =1-m × n, where m is a preset trend parameter (e.g., 20%), and n is the similarity between the current picture a and the reference image B (i.e., the aforementioned 95%). For another example, the similarity between the current picture a and the reference image B in the reference image set is the highest (e.g. 95%), and the label corresponding to the reference image B is a photo, and the trend factor =1+ m × n, where m is a preset trend parameter (e.g. 20%), and n is the similarity between the current picture a and the reference image B (i.e. the aforementioned 95%).

Step 560 determines a target manner of capturing the frame based on the trend factor and the ambient frame plateau value.

In some embodiments, when the tendency factor and the ambient picture smoothness value satisfy a second preset condition, determining that the target mode of capturing the picture is to capture an image; and when the tendency factor and the environmental picture smoothness value do not meet a second preset condition, determining that the target mode for capturing the picture is shooting video. For example, when the product of the tendency factor and the ambient picture stationarity value is greater than the ambient picture stationarity value threshold, determining that a picture is captured by capturing an image; and when the product of the tendency factor and the ambient picture steady value is less than the ambient picture steady value threshold, determining that the picture is captured by shooting the video.

Through the trend factor determination described in some embodiments of the present specification, the environmental picture stability value may be modified, and the target mode may be determined according to the label of the reference image with the highest similarity to the current picture, so that the mode of capturing the current picture may be determined based on the shooting habit, the common shooting mode, and the like, and the realizability of the mode of capturing the current picture and the picture capturing effect may be improved.

Some embodiments of the present description also provide an intelligent image capture device comprising at least one processor and at least one memory, the at least one memory storing computer instructions; at least one processor is configured to execute at least a portion of the computer instructions to implement the intelligent image capture method as in any of the above embodiments.

Some embodiments of the present description also provide a computer-readable storage medium storing computer instructions, and when the computer instructions in the storage medium are read by a computer, the computer executes the intelligent image capturing method according to any one of the above embodiments.

It should be noted that the above description of the flow is for illustration and description only and does not limit the scope of the application of the present specification. Various modifications and alterations to the flow may occur to those skilled in the art, given the benefit of this description. However, such modifications and variations are still within the scope of the present specification.

Having thus described the basic concept, it will be apparent to those skilled in the art that the foregoing detailed disclosure is to be considered as illustrative only and not limiting, of the present invention. Various modifications, improvements and adaptations to the present description may occur to those skilled in the art, though not explicitly described herein. Such modifications, improvements and adaptations are proposed in the present specification and thus fall within the spirit and scope of the exemplary embodiments of the present specification.

Also, the description uses specific words to describe embodiments of the description. Reference throughout this specification to "one embodiment," "an embodiment," and/or "some embodiments" means that a particular feature, structure, or characteristic described in connection with at least one embodiment of the specification is included. Therefore, it is emphasized and should be appreciated that two or more references to "an embodiment" or "one embodiment" or "an alternative embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, certain features, structures, or characteristics may be combined as suitable in one or more embodiments of the specification.

Additionally, the order in which elements and sequences are described in this specification, the use of numerical letters, or other designations are not intended to limit the order of the processes and methods described in this specification, unless explicitly stated in the claims. While various presently contemplated embodiments of the invention have been discussed in the foregoing disclosure by way of example, it is to be understood that such detail is solely for that purpose and that the appended claims are not limited to the disclosed embodiments, but, on the contrary, are intended to cover all modifications and equivalent arrangements that are within the spirit and scope of the embodiments herein. For example, although the system components described above may be implemented by hardware devices, they may also be implemented by software-only solutions, such as installing the described system on an existing server or mobile device.

Similarly, it should be noted that in the foregoing description of embodiments of the present specification, various features are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure aiding in the understanding of one or more of the embodiments. This method of disclosure, however, is not intended to imply that more features than are expressly recited in a claim. Indeed, the embodiments may be characterized as having less than all of the features of a single embodiment disclosed above.

Where numerals describing the number of components, attributes or the like are used in some embodiments, it is to be understood that such numerals used in the description of the embodiments are modified in some instances by the modifier "about", "approximately" or "substantially". Unless otherwise indicated, "about", "approximately" or "substantially" indicates that the number allows a variation of ± 20%. Accordingly, in some embodiments, the numerical parameters set forth in the specification and claims are approximations that may vary depending upon the desired properties sought to be obtained by a particular embodiment. In some embodiments, the numerical parameter should take into account the specified significant digits and employ a general digit preserving approach. Notwithstanding that the numerical ranges and parameters setting forth the broad scope of the range are approximations, in the specific examples, such numerical values are set forth as precisely as possible within the scope of the application.

For each patent, patent application publication, and other material, such as articles, books, specifications, publications, documents, etc., cited in this specification, the entire contents of each are hereby incorporated by reference into this specification. Except where the application history document is inconsistent or contrary to the present specification, and except where the application history document is inconsistent or contrary to the present specification, the application history document is not inconsistent or contrary to the present specification, but is to be read in the broadest scope of the present claims (either currently or hereafter added to the present specification). It is to be understood that the descriptions, definitions and/or uses of terms in the accompanying materials of this specification shall control if they are inconsistent or contrary to the descriptions and/or uses of terms in this specification.

Finally, it should be understood that the embodiments described herein are merely illustrative of the principles of the embodiments of the present disclosure. Other variations are also possible within the scope of the present description. Thus, by way of example, and not limitation, alternative configurations of the embodiments of the specification can be considered consistent with the teachings of the specification. Accordingly, the embodiments of the present description are not limited to only those embodiments explicitly described and depicted herein.

Claims

1. An intelligent image capturing method, which is executed by a target terminal, comprises the following steps:

acquiring a fixation point sequence of a target user in a target time period, wherein the fixation point sequence comprises fixation points of the target user at a plurality of time points in the target time period;

determining whether a picture needs to be captured based on the gaze point sequence;

when a picture needs to be captured, determining a target mode for capturing the picture; and

and performing picture capture based on the target mode.

2. The method of claim 1, wherein the determining whether a picture needs to be captured based on the sequence of gaze points comprises:

determining a fixation point stable value of the target user in the target time period based on the fixation point sequence, wherein the fixation point stable value represents displacement amplitude of the fixation point of the target user at the plurality of time points;

when the stable value of the fixation point meets a first preset condition, determining that a picture needs to be captured;

and when the fixation point stable value does not meet a first preset condition, determining that a picture does not need to be captured.

3. The method of claim 1, wherein determining a target manner of capturing the picture when the picture needs to be captured comprises:

acquiring an environment picture sequence corresponding to the gazing point sequence, wherein the environment picture sequence comprises environment pictures of the gazing point range of the target user at the multiple time points;

determining an ambient picture stationarity value based on the ambient picture sequence, the ambient picture stationarity value characterizing a magnitude of change of the ambient picture at the plurality of time points;

determining the target manner in which to capture the scene based on the ambient scene smoothness value.

4. The method of claim 3, wherein determining the target manner in which to capture the picture based on the ambient picture stationarity value comprises:

acquiring a reference image set of the environment picture, wherein the reference image set comprises at least one reference image with a label;

acquiring a current picture of the fixation point range of the target user;

determining a similarity between each reference image in the reference image set and the current picture;

determining a target reference image based on the similarity between each reference image and the current picture;

determining a tendency factor of the current picture based on a tendency preset parameter, the similarity between the target reference image and the current picture and the label of the target reference image; and

determining the target manner of capturing a picture based on the tendency factor and the ambient picture flatness value.

5. An intelligent image capture system, the system comprising:

the system comprises an acquisition module, a display module and a display module, wherein the acquisition module is used for acquiring a fixation point sequence of a target user in a target time period, and the fixation point sequence comprises fixation points of the target user at a plurality of time points in the target time period;

a first determining module, configured to determine whether a picture needs to be captured based on the gaze point sequence;

the second determining module is used for determining a target mode of capturing the picture when the picture needs to be captured; and

and the processing module is used for carrying out picture capture based on the target mode.

6. The system of claim 5, wherein the first determination module is further configured to:

determining a fixation point smoothness value of the target user within the target time period based on the fixation point sequence, the fixation point smoothness value representing displacement amplitudes of the fixation point of the target user at the plurality of time points;

7. The system of claim 5, wherein the second determination module is further configured to:

determining the target manner of capturing a picture based on the ambient picture flatness value.

8. The system of claim 7, wherein the second determination module is further configured to:

acquiring a current picture of the fixation point range of the target user;

9. An intelligent image capture device, characterized in that the device comprises at least one processor and at least one memory;

the at least one memory is for storing computer instructions;

the at least one processor is configured to execute at least some of the computer instructions to implement the method of claims 1-4.

10. A computer-readable storage medium, wherein the storage medium stores computer instructions, and when the computer instructions in the storage medium are read by a computer, the computer performs the method of claims 1-4.