CN117472256A - Image processing method and electronic equipment - Google Patents


Info

Publication number
CN117472256A
Authority
CN
China
Prior art keywords
image
user
target
electronic device
display screen
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311800464.6A
Other languages
Chinese (zh)
Inventor
刘洋 (Liu Yang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Honor Device Co Ltd
Original Assignee
Honor Device Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Honor Device Co Ltd
Priority to CN202311800464.6A
Publication of CN117472256A
Legal status: Pending


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/048 - Interaction techniques based on graphical user interfaces [GUI]
    • G06F 3/0484 - Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range
    • G06F 3/04845 - Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range for image manipulation, e.g. dragging, rotation, expansion or change of colour
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/011 - Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F 3/012 - Head tracking input arrangements
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/011 - Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F 3/013 - Eye tracking input arrangements
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/048 - Interaction techniques based on graphical user interfaces [GUI]
    • G06F 3/0487 - Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser
    • G06F 3/0488 - Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser using a touch-screen or digitiser, e.g. input of commands through traced gestures
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04M - TELEPHONIC COMMUNICATION
    • H04M 1/00 - Substation equipment, e.g. for use by subscribers
    • H04M 1/72 - Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
    • H04M 1/724 - User interfaces specially adapted for cordless or mobile telephones
    • H04M 1/72403 - User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality
    • H04M 1/7243 - User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality with interactive means for internal management of messages
    • H04M 1/72439 - User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality with interactive means for internal management of messages for image or video messaging

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Business, Economics & Management (AREA)
  • General Business, Economics & Management (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

An embodiment of this application relates to the field of image processing and provides an image processing method and an electronic device. The method is applied to the electronic device and includes: while a display screen of the electronic device displays a first image, acquiring a first detection image in which a user viewing the first image is recorded; processing the first detection image to obtain a first target position in the first image; enlarging the first image centered on the first target position to obtain a second image; and displaying the second image on the display screen. With this technical method, dependence on manual user operations is reduced, the flexibility of controlling image enlargement is improved, and the user experience is improved.

Description

Image processing method and electronic equipment
Technical Field
The present application relates to the field of image processing, and more particularly, to an image processing method and an electronic device.
Background
As terminal devices gain ever more functions, people can capture media data with the photographing, audio recording, and video recording functions of a smart terminal such as a mobile phone, modify and edit that media data with the terminal's editing functions, and display, play, or share it with others through the terminal.
When a user views an image displayed on the display screen of an electronic device and wants to enlarge some area of the image, the user can perform a zoom-in operation on the display screen, typically by sliding two fingers apart on the screen. The electronic device then enlarges the displayed image centered on the midpoint of the line connecting the starting positions of the two fingers. This operation is cumbersome: in a scenario where an image is to be enlarged, the user must perform a relatively complex manual gesture to make the electronic device enlarge the displayed image around the indicated position.
Moreover, when using an electronic device, the user usually holds it firmly with one hand while operating the display screen with the other. When the user performs the complex zoom-in gesture with one hand, attention is focused mainly on the hand performing the gesture and less on the hand holding the device, which increases the risk of dropping the electronic device.
Disclosure of Invention
The present application provides an image processing method and an electronic device that avoid manual operations on the electronic device, such as touch operations, improve the flexibility of image-enlargement control, reduce the risk of dropping the electronic device, and improve the user experience.
In a first aspect, an image processing method applied to an electronic device is provided. The method includes: while a display screen of the electronic device displays a first image, acquiring a first detection image in which a user viewing the first image is recorded; processing the first detection image to obtain a first target position in the first image; enlarging the first image centered on the first target position to obtain a second image; and displaying the second image on the display screen.
According to the image processing method provided in this embodiment, the first target position in the first image is determined from the first detection image, which is acquired while the first image is shown on the display screen and records the user. The first image is enlarged centered on the first target position, and the resulting second image is shown on the display screen. This satisfies the need to enlarge the first image around a chosen center while avoiding manual operations on the electronic device, such as touch operations, thereby making the way the user controls image enlargement more flexible and improving the user experience.
Moreover, because the user does not need to control the electronic device through manual operations such as touch operations, both hands remain free to hold the device steady, which reduces the risk of dropping it.
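As a rough illustration of the flow in the first aspect, the following Python sketch captures a detection frame from a front camera while an image is displayed, estimates a target position from it, and enlarges the displayed image around that position. The position estimator, camera index, file name, and magnification value are assumptions for illustration only; the patent does not prescribe a specific implementation.

```python
import cv2
import numpy as np

def enlarge_around(image: np.ndarray, center_xy, scale: float) -> np.ndarray:
    """Enlarge `image` by `scale`, keeping `center_xy` (x, y) as the zoom center."""
    h, w = image.shape[:2]
    cx, cy = center_xy
    crop_w, crop_h = int(w / scale), int(h / scale)
    # Clamp the crop window so it stays inside the image.
    x0 = min(max(int(cx - crop_w / 2), 0), w - crop_w)
    y0 = min(max(int(cy - crop_h / 2), 0), h - crop_h)
    crop = image[y0:y0 + crop_h, x0:x0 + crop_w]
    return cv2.resize(crop, (w, h), interpolation=cv2.INTER_LINEAR)

def estimate_target_position(detection_image: np.ndarray, display_size) -> tuple:
    """Placeholder (assumption): map the user recorded in the detection image to a
    position in the displayed image, e.g. via a gaze-point estimation model."""
    w, h = display_size
    return (w // 2, h // 2)  # dummy value; a real estimator would go here

# Minimal flow corresponding to the first aspect
# (assumes "first_image.jpg" exists and camera 0 is the front-facing camera).
first_image = cv2.imread("first_image.jpg")          # the image currently displayed
cam = cv2.VideoCapture(0)
ok, first_detection_image = cam.read()               # frame recording the viewing user
cam.release()
if ok and first_image is not None:
    target = estimate_target_position(first_detection_image,
                                      (first_image.shape[1], first_image.shape[0]))
    second_image = enlarge_around(first_image, target, scale=2.0)  # scale is illustrative
    cv2.imshow("display", second_image)
    cv2.waitKey(0)
```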
In one possible implementation, before enlarging the first image centered on the first target position, the method further includes: determining a target distance between a plurality of target nodes of the user's face from a scale indication image, where the scale indication image is an image that records the user and is acquired while the display screen displays the first image; and determining, from a correspondence between distances and magnification ratios, the target magnification ratio corresponding to the target distance. Enlarging the first image centered on the first target position then includes enlarging the first image centered on the first target position by the target magnification ratio to obtain the second image.
Because the target distance between the user's facial nodes is determined from a scale indication image acquired while the first image is displayed, and the first image is enlarged by the target magnification ratio corresponding to that distance, the magnification ratio is derived from the scale indication image and therefore better matches the user's intent. Determining the target distance from the scale indication image also means that it does not depend on manual operations on the electronic device, such as touch operations, which reduces the dependence of image enlargement on manual user operations and improves the user experience.
It should be understood that the scale indication image may be the first detection image or another image. The sensor that acquires the scale indication image may be the same as or different from the sensor that acquires the first detection image.
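One possible way to realize the distance-to-ratio correspondence is sketched below: the distance between two assumed facial target nodes (for example the pupil centers) is measured in the scale indication image and looked up in a small table mapping distance ranges to magnification ratios. The landmark choice and the table values are illustrative assumptions, not values given in the patent.

```python
import math

# Illustrative correspondence between inter-node distance (in pixels) and magnification ratio.
DISTANCE_TO_RATIO = [
    (0,   80,     1.5),   # (min_distance, max_distance, ratio) - assumed values
    (80,  120,    2.0),
    (120, 10_000, 3.0),
]

def target_distance(node_a, node_b) -> float:
    """Euclidean distance between two facial target nodes, e.g. the two pupil centers."""
    return math.dist(node_a, node_b)

def target_magnification(distance: float) -> float:
    """Look up the magnification ratio corresponding to the measured distance."""
    for lo, hi, ratio in DISTANCE_TO_RATIO:
        if lo <= distance < hi:
            return ratio
    return 1.0  # fall back to no enlargement

# Example: pupil positions as returned by some face-landmark detector (assumed values).
ratio = target_magnification(target_distance((420, 310), (510, 312)))
print(ratio)  # -> 2.0 with the assumed table
```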
In one possible implementation, the second image is obtained by enlarging a first target area in the first image by a preset magnification ratio. The method further includes: while the display screen of the electronic device displays the second image, acquiring a second detection image in which the user is recorded; processing the second detection image to obtain a second target position in the second image; enlarging a second target area in the second image centered on the second target position to obtain a third image; and displaying the third image on the display screen.
While the display screen shows the second image obtained by enlarging the first image, a second detection image of the user can be acquired again, a second target position in the second image determined from it, the second image enlarged centered on that position, and the resulting third image displayed. In other words, the center of each successive enlargement of the displayed image is determined from an image that records the user, so the enlargement of the displayed image better matches the user's needs.
In one possible implementation, before enlarging the first image centered on the first target position, the method further includes: determining a target distance between a plurality of target nodes of the user's face from a scale indication image, where the scale indication image is an image that records the user and is acquired while the display screen displays the first image; and determining, from the correspondence between distances and magnification ratios, the target magnification ratio corresponding to the target distance. Enlarging the first image centered on the first target position to obtain the second image then includes: when the target magnification ratio is greater than or equal to the preset magnification ratio, enlarging the first image by the preset magnification ratio to obtain the second image.
Because the first target position is obtained by processing the first detection image, it may differ from the position the user actually intends to indicate. At a small magnification this difference has little effect on how far the enlarged image deviates from what the user wants to see, but the deviation grows as the magnification increases. Therefore, when the target magnification ratio determined from the scale indication image is large, i.e. greater than or equal to the preset magnification ratio, the first image is first enlarged by the preset ratio centered on the first target position to obtain the second image; then, while the second image is displayed, a second target position is determined from a second detection image recording the user, and the second image is enlarged centered on that position. The displayed image thus comes closer to what the user wishes to see, improving the user experience.
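The capping behaviour described above can be sketched as follows. The preset ratio, and the idea of carrying the remaining enlargement over into later re-centered steps, are illustrative assumptions about one way this could behave, not details fixed by the patent.

```python
PRESET_RATIO = 2.0  # assumed preset magnification ratio

def step_ratio(target_ratio: float, preset_ratio: float = PRESET_RATIO) -> float:
    """If the desired ratio exceeds the preset ratio, enlarge only by the preset ratio
    for this step; further enlargement happens in later steps, each re-centered on a
    freshly estimated target position."""
    return preset_ratio if target_ratio >= preset_ratio else target_ratio

# Example: a desired 6x enlargement performed as repeated capped steps.
remaining = 6.0
applied = []
while remaining > 1.0 + 1e-6:
    r = step_ratio(remaining)
    applied.append(r)
    remaining /= r
print(applied)  # -> [2.0, 2.0, 1.5] with the assumed preset ratio
```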
In one possible implementation, the first detection image is acquired by a first camera in the electronic device, the first camera and the display screen face in the same direction, and the first target position is a position where the user gazes.
When the user views the first image, the position at which the user gazes is taken as the first target position. That is, the user can indicate the first target position simply by controlling where they look, which reduces the difficulty of indicating the position and improves the user experience.
When the user views the first image shown on the display screen, the user's eyes are generally within the image-capture range of the first camera; that is, the image captured by the first camera generally records the user's eyes. The first target position can therefore be determined from the first detection image.
In one possible implementation, processing the first detection image to obtain the first target position in the first image includes: processing the first detection image with a gaze point estimation model to obtain a first initial position, where the gaze point estimation model is a trained neural network model; and determining the first target position corresponding to the first initial position according to a correspondence between initial positions and target positions. The correspondence is determined from a plurality of second initial positions and the gaze position corresponding to each second initial position, where the second initial positions are obtained by processing a plurality of user gaze images with the gaze point estimation model, and the gaze position corresponding to each second initial position is the position on the display screen at which the user was gazing when the first camera captured the user gaze image from which that second initial position was obtained.
The correspondence between initial positions and target positions can be determined from the second initial positions, obtained by running the gaze point estimation model on user gaze images captured by the first camera while the user gazed at known positions on the display screen, together with those gaze positions. Determining the first target position corresponding to the first initial position (obtained by processing the first detection image with the gaze point estimation model) through this correspondence makes the first target position more accurate. At the same time, the correspondence is established in a simple and convenient way, so the accuracy of the first target position is improved with little effort.
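One simple way to realize such a correspondence (an assumption for illustration; the patent does not specify the fitting method) is to fit an affine mapping from the model's raw outputs (second initial positions) to the known on-screen gaze positions by least squares, and then apply that mapping to the first initial position.

```python
import numpy as np

def fit_correspondence(initial_positions, gaze_positions):
    """Fit an affine map  target = [x, y, 1] @ M  by least squares.
    initial_positions, gaze_positions: arrays of shape (N, 2)."""
    P = np.asarray(initial_positions, dtype=float)
    G = np.asarray(gaze_positions, dtype=float)
    X = np.hstack([P, np.ones((len(P), 1))])       # (N, 3): [x, y, 1]
    M, *_ = np.linalg.lstsq(X, G, rcond=None)      # (3, 2)
    return M

def apply_correspondence(M, initial_position):
    x, y = initial_position
    return np.array([x, y, 1.0]) @ M

# Calibration pairs: model outputs vs. true on-screen gaze positions (made-up numbers).
second_initial = [(100, 210), (480, 200), (110, 900), (470, 910)]
gaze_targets   = [(90, 180),  (500, 180), (95, 940),  (505, 945)]
M = fit_correspondence(second_initial, gaze_targets)
first_target = apply_correspondence(M, (300, 550))   # first initial position -> first target position
print(first_target)
```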
In one possible implementation, the first image is a preview image acquired in real time by a second camera in the electronic device, and enlarging the first image centered on the first target position to obtain the second image includes: zooming centered on the first target position, the second image being the zoomed preview image.
The method can be applied when a user views, on the electronic device, a preview image acquired in real time by the second camera. In this case, enlarging the first image can be understood as zooming during image acquisition so that the photographed subject occupies a larger proportion of the captured image. That is, the scheme of this application can also be used to control zoom when the user uses the photographing function of the electronic device.
In one possible implementation, the first detection image is acquired by a sensor in the electronic device, a first orientation of the sensor being different from a second orientation of the second camera.
In one possible implementation, enlarging the first image to obtain the second image includes: enlarging the first image to obtain the second image when the user behavior recorded in a user behavior image is a preset enlargement behavior.
The first image is enlarged only when the user behavior recorded in the acquired user behavior image is the preset enlargement behavior, so that the enlargement of the first image better matches the user's intent.
It should be appreciated that the user behavior image and the first detection image may be the same image or different images, and they may be acquired by the same sensor or by different sensors.
In a second aspect, an image processing apparatus is provided, comprising units for performing the method of the first aspect. The apparatus may be a terminal device or a chip in a terminal device.
In a third aspect, there is provided an electronic device comprising a memory for storing a computer program and a processor for calling and running the computer program from the memory, causing the electronic device to perform the method of the first aspect.
In a fourth aspect, there is provided a chip comprising a processor and a data interface, the processor reading instructions stored on a memory via the data interface, the processor performing the method of the first aspect when the processor executes the instructions.
In a fifth aspect, a computer readable storage medium is provided, the computer readable storage medium storing computer program code for implementing the method of the first aspect.
In a sixth aspect, there is provided a computer program product comprising: computer program code for implementing the method of the first aspect.
Drawings
FIG. 1 is a schematic diagram of a hardware system suitable for use in the apparatus of the present application;
FIG. 2 is a schematic diagram of a software system suitable for use with the apparatus of the present application;
FIG. 3 is a schematic flow chart of an image processing method provided by an embodiment of the present application;
FIGS. 4 and 5 are schematic diagrams of graphical user interfaces provided by embodiments of the present application;
FIG. 6 is a schematic flow chart of another image processing method provided by an embodiment of the present application;
FIGS. 7 and 8 are schematic diagrams of graphical user interfaces provided by embodiments of the present application;
FIG. 9 is a schematic structural diagram of an image processing apparatus provided in the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the accompanying drawings.
With the rapid development of electronic technology and image processing technology, the photographing functions of electronic devices such as smartphones, tablet computers, and digital cameras are becoming increasingly powerful. When photographing with an electronic device, a user can control functions such as zooming, focusing, and taking photos by operating the device's touch screen, or its keys, knobs, and the like.
However, when the user controls the photographing function manually, only one hand remains to hold the electronic device while the other performs the control operation, so the risk of dropping the device is high. For an electronic device with a full touch screen, the hand holding the device must also avoid touching the screen while photographing; otherwise the contact may trigger other functions and interfere with photographing. In particular, when the user wants to enlarge the display area of a photographed object by zooming, the user's two fingers must slide apart on the display screen to complete the zoom operation, and the midpoint of the line connecting the fingers' starting positions becomes the center of the enlarged area.
The electronic device may also have functions such as displaying images and playing videos. While viewing an image, a user may want to examine certain areas of it in more detail. Similarly to the photographing case, the user can enlarge part of the image through the zoom-in operation.
However, such manual operations are cumbersome. When using an electronic device, the user usually holds it firmly with one hand and operates the display screen with the other. When the complex zoom-in gesture is performed with one hand, the user's attention is focused mainly on the hand performing the gesture and less on the hand holding the device, which increases the risk of dropping the electronic device.
In order to solve the above problems, an embodiment of the present application provides an image processing method and an electronic device.
Fig. 1 shows a hardware system suitable for use in the electronic device of the present application.
The method provided by the embodiment of the application can be applied to various electronic devices capable of networking communication, such as mobile phones, tablet computers, wearable devices, notebook computers, netbooks, personal digital assistants (personal digital assistant, PDA) and the like, and the embodiment of the application does not limit the specific types of the electronic devices.
Fig. 1 shows a schematic configuration of an electronic device 100. The electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (universal serial bus, USB) interface 130, a charge management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, a sensor module 180, keys 190, a motor 191, an indicator 192, a camera 193, a display 194, and a subscriber identity module (subscriber identification module, SIM) card interface 195, etc. The sensor module 180 may include a pressure sensor 180A, a gyro sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and the like.
It is to be understood that the structure illustrated in the embodiments of the present application does not constitute a specific limitation on the electronic device 100. In other embodiments of the present application, electronic device 100 may include more or fewer components than shown, or certain components may be combined, or certain components may be split, or different arrangements of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The processor 110 may include one or more processing units, such as: the processor 110 may include an application processor (application processor, AP), a modem processor, a graphics processor (graphics processing unit, GPU), an image signal processor (image signal processor, ISP), a controller, a memory, a video codec, a digital signal processor (digital signal processor, DSP), a baseband processor, and/or a neural network processor (neural-network processing unit, NPU), etc. Wherein the different processing units may be separate devices or may be integrated in one or more processors.
The controller may be a neural hub and a command center of the electronic device 100, among others. The controller can generate operation control signals according to the instruction operation codes and the time sequence signals to finish the control of instruction fetching and instruction execution.
A memory may also be provided in the processor 110 for storing instructions and data. In some embodiments, the memory in the processor 110 is a cache memory. The memory may hold instructions or data that the processor 110 has just used or recycled. If the processor 110 needs to reuse the instruction or data, it can be called directly from the memory. Repeated accesses are avoided and the latency of the processor 110 is reduced, thereby improving the efficiency of the system.
The processor 110 may include one or more interfaces.
The electronic device 100 implements display functions through a GPU, a display screen 194, an application processor, and the like.
The GPU is a microprocessor for image processing, and is connected to the display 194 and the application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. Processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
The display screen 194 is used to display images, videos, and the like. The display screen 194 includes a display panel. The display panel may be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini-LED, a Micro-LED, a Micro-OLED, a quantum dot light-emitting diode (QLED), or the like. In some embodiments, the electronic device 100 may include 1 or N display screens 194, where N is a positive integer greater than 1.
The electronic device 100 may implement photographing functions through an ISP, a camera 193, a video codec, a GPU, a display screen 194, an application processor, and the like.
The ISP is used to process data fed back by the camera 193. For example, when photographing, the shutter is opened, light is transmitted to the camera photosensitive element through the lens, the optical signal is converted into an electric signal, and the camera photosensitive element transmits the electric signal to the ISP for processing and is converted into an image visible to naked eyes. ISP can also optimize the noise, brightness and skin color of the image. The ISP can also optimize parameters such as exposure, color temperature and the like of a shooting scene. In some embodiments, the ISP may be provided in the camera 193.
The camera 193 is used to capture still images or video. The object generates an optical image through the lens and projects the optical image onto the photosensitive element. The photosensitive element may be a charge coupled device (charge coupled device, CCD) or a Complementary Metal Oxide Semiconductor (CMOS) phototransistor. The photosensitive element converts the optical signal into an electrical signal, which is then transferred to the ISP to be converted into a digital image signal. The ISP outputs the digital image signal to the DSP for processing. The DSP converts the digital image signal into an image signal in a standard RGB, YUV, or the like format. In some embodiments, electronic device 100 may include 1 or N cameras 193, N being a positive integer greater than 1.
The digital signal processor is used for processing digital signals, and can process other digital signals besides digital image signals. For example, when the electronic device 100 selects a frequency bin, the digital signal processor is used to fourier transform the frequency bin energy, or the like.
Video codecs are used to compress or decompress digital video. The electronic device 100 may support one or more video codecs. In this way, the electronic device 100 may play or record video in a variety of encoding formats, such as: dynamic picture experts group (moving picture experts group, MPEG) 1, MPEG2, MPEG3, MPEG4, etc.
The NPU is a neural-network (NN) computing processor, and can rapidly process input information by referencing a biological neural network structure, for example, referencing a transmission mode between human brain neurons, and can also continuously perform self-learning. Applications such as intelligent awareness of the electronic device 100 may be implemented through the NPU, for example: image recognition, face recognition, speech recognition, text understanding, etc.
The external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, to enable expansion of the memory capabilities of the electronic device 100. The external memory card communicates with the processor 110 through an external memory interface 120 to implement data storage functions. For example, files such as music, video, etc. are stored in an external memory card.
The internal memory 121 may be used to store computer executable program code including instructions. The processor 110 executes various functional applications of the electronic device 100 and data processing by executing instructions stored in the internal memory 121. The internal memory 121 may include a storage program area and a storage data area. The storage program area may store an application program (such as a sound playing function, an image playing function, etc.) required for at least one function of the operating system, etc. The storage data area may store data created during use of the electronic device 100 (e.g., audio data, phonebook, etc.), and so on. In addition, the internal memory 121 may include a high-speed random access memory, and may further include a nonvolatile memory such as at least one magnetic disk storage device, a flash memory device, a universal flash memory (universal flash storage, UFS), and the like.
The pressure sensor 180A is used to sense a pressure signal and may convert the pressure signal into an electrical signal. In some embodiments, the pressure sensor 180A may be disposed on the display screen 194. There are many types of pressure sensor 180A, such as resistive, inductive, and capacitive pressure sensors. A capacitive pressure sensor may comprise at least two parallel plates with a conductive material; when a force is applied to the pressure sensor 180A, the capacitance between the electrodes changes, and the electronic device 100 determines the strength of the pressure from the change in capacitance. When a touch operation acts on the display screen 194, the electronic device 100 detects the intensity of the touch operation through the pressure sensor 180A. The electronic device 100 may also calculate the location of the touch from the detection signal of the pressure sensor 180A. In some embodiments, touch operations that act on the same touch location but with different intensities may correspond to different operation instructions. For example, when a touch operation whose intensity is smaller than a first pressure threshold acts on the short-message application icon, an instruction to view the short message is executed; when a touch operation whose intensity is greater than or equal to the first pressure threshold acts on the short-message application icon, an instruction to create a new short message is executed.
A distance sensor 180F for measuring a distance. The electronic device 100 may measure the distance by infrared or laser. In some embodiments, the electronic device 100 may range using the distance sensor 180F to achieve quick focus.
The touch sensor 180K, also referred to as a "touch panel". The touch sensor 180K may be disposed on the display screen 194, and the touch sensor 180K and the display screen 194 form a touch screen, which is also called a "touch screen". The touch sensor 180K is for detecting a touch operation acting thereon or thereabout. The touch sensor may communicate the detected touch operation to the application processor to determine the touch event type. Visual output related to touch operations may be provided through the display 194. In other embodiments, the touch sensor 180K may also be disposed on the surface of the electronic device 100 at a different location than the display 194.
The keys 190 include a power-on key, a volume key, etc. The keys 190 may be mechanical keys. Or may be a touch key. The electronic device 100 may receive key inputs, generating key signal inputs related to user settings and function controls of the electronic device 100.
The motor 191 may generate vibration. The motor 191 may be used for incoming call alerting as well as for touch feedback. The motor 191 may generate different vibration feedback effects for touch operations acting on different applications. The motor 191 may also produce different vibration feedback effects for touch operations acting on different areas of the display screen 194. Different application scenarios (e.g., time alert, receipt message, alarm clock, and game) may correspond to different vibration feedback effects. The touch vibration feedback effect may also support customization.
The indicator 192 may be an indicator light, which may be used to indicate a change in state of charge and charge, or may be used to indicate a message, missed call, and notification.
The software system of the electronic device 100 may employ a layered architecture, an event driven architecture, a microkernel architecture, a microservice architecture, or a cloud architecture. In this embodiment, taking an Android system with a layered architecture as an example, a software structure of the electronic device 100 is illustrated.
Fig. 2 is a block diagram of the software structure of the electronic device 100 according to the embodiment of the present application. The layered architecture divides the software into several layers, each with a clear role and division of labor. The layers communicate with each other through software interfaces. In some embodiments, the Android system is divided into four layers, from top to bottom: an application layer, an application framework layer, Android runtime (ART) and system libraries, and a kernel layer. The application layer may include a series of application packages.
As shown in fig. 2, the application package may include applications for cameras, gallery, calendar, phone calls, maps, navigation, WLAN, bluetooth, music, video, short messages, etc.
The application framework layer provides an application programming interface (application programming interface, API) and programming framework for application programs of the application layer. The application framework layer includes a number of predefined functions.
As shown in FIG. 2, the application framework layer may include a window manager, a content provider, a view system, a telephony manager, a resource manager, a notification manager, and the like.
The window manager is used for managing window programs. The window manager can acquire the size of the display screen, judge whether a status bar exists, lock the screen, intercept the screen and the like.
The content provider is used to store and retrieve data and make such data accessible to applications. The data may include video, images, audio, calls made and received, browsing history and bookmarks, phonebooks, etc.
The view system includes visual controls, such as controls to display text, controls to display pictures, and the like. The view system may be used to build applications. The display interface may be composed of one or more views. For example, a display interface including a text message notification icon may include a view displaying text and a view displaying a picture.
The telephony manager is used to provide the communication functions of the electronic device 100. Such as the management of call status (including on, hung-up, etc.).
The resource manager provides various resources for the application program, such as localization strings, icons, pictures, layout files, video files, and the like.
The notification manager allows an application to display notification information in the status bar. It can be used to convey notification-type messages, which can disappear automatically after a short stay without user interaction. For example, the notification manager is used to notify that a download is complete, to give message reminders, and so on. The notification manager may also present notifications in the form of a chart or scroll-bar text in the status bar at the top of the system, such as notifications of applications running in the background, or notifications in the form of a dialog window on the screen. For example, text information is prompted in the status bar, an alert tone is played, the electronic device vibrates, or an indicator light blinks.
The Android runtime includes core libraries and a virtual machine. The Android runtime is responsible for scheduling and management of the Android system.
The core libraries consist of two parts: the functions that the Java language needs to call, and the core libraries of Android.
The application layer and the application framework layer run in a virtual machine. The virtual machine executes java files of the application program layer and the application program framework layer as binary files. The virtual machine is used for executing the functions of object life cycle management, stack management, thread management, security and exception management, garbage collection and the like.
The system library may include a plurality of functional modules. For example: surface manager (surface manager), media library (media library), three-dimensional graphics processing library (e.g., openGL ES), 2D graphics engine (e.g., SGL), etc.
The surface manager is used to manage the display subsystem and provides a fusion of 2D and 3D layers for multiple applications.
The media library supports playback and recording of a variety of commonly used audio and video formats, as well as still image files and the like. The media library may support a variety of audio and video encoding formats, such as MPEG4, H.264, MP3, AAC, AMR, JPG, and PNG.
The three-dimensional graphic processing library is used for realizing three-dimensional graphic drawing, image rendering, synthesis, layer processing and the like.
The 2D graphics engine is a drawing engine for 2D drawing.
The kernel layer is a layer between hardware and software. The kernel layer may include a display driver, a camera driver, an audio driver, a sensor driver, and the like.
The camera or the gallery in the application layer can be used for sending acquisition indication information to the camera driver in the kernel layer, so that the camera driver drives the camera to acquire images.
The camera or gallery may be used to process images captured by the camera or images stored in memory. The camera or gallery may also send display indication information to the display driver to cause the display driver to drive the display screen such that the display screen displays an image indicated by the display indication information.
The image processing method provided in the embodiments of this application is described in detail below with reference to fig. 3 to 8. The execution body of the method may be an electronic device, or a software/hardware module in the electronic device capable of performing image processing; for convenience of description, the electronic device is used as an example in the following embodiments.
Fig. 3 is a schematic flowchart of an image processing method provided in an embodiment of the present application. The method may include steps S310 to S320, which are described in detail below, respectively.
In step S310, while the display screen of the electronic device displays the first image, a first detection image is acquired, in which a user viewing the first image is recorded.
The first image may be an image stored in the electronic device, an image obtained by scaling the image stored in the electronic device, or a preview image displayed on a display screen in the electronic device in real time in a photographing or video scene.
Before proceeding to S310, a camera start operation of the user may be acquired. The display screen may display the preview image in real time in response to a camera start operation of the user. The first image may be a preview image displayed in real time.
Fig. 4 (a) shows a graphical user interface (graphical user interface, GUI) of the electronic device, which is the desktop 410 of the electronic device. In the event that a camera start operation is detected in which the user clicks an icon 411 of a camera Application (APP) on the desktop 410, the electronic device may start the camera application, displaying another GUI as shown in (b) of fig. 4, which may be referred to as a photographing interface 420. A viewfinder 422 may be included on the capture interface 420. In the preview state, a preview image can be displayed in real time in the viewfinder 422. The first image may be a preview image displayed by the viewfinder 422.
Alternatively, before proceeding to S310, the image browsing operation of the user may be acquired. The display screen may display an image selected by the user in response to an image browsing operation by the user.
Fig. 5 (a) shows still another GUI of the electronic device, which is a gallery interface 510. The gallery interface includes a plurality of image icons. Different image icons correspond to different images. In the case of detecting an image browsing operation in which the user clicks a certain image icon, the electronic device may display another GUI as in (b) of fig. 5, which may be referred to as an image display interface 520. The image display interface 520 may include an image display area 521, where the image display area 521 is used to display an image corresponding to an image icon clicked by the user. The first image may be an image displayed by the image display area 521.
Acquiring the first detection image may be understood as acquiring the first detection image with a sensor in the electronic device. The sensor that acquires the first detection image may be oriented in the same direction as the display screen, where "the same direction" also covers approximately the same direction. The sensor may be an infrared sensor, a distance sensor, or a camera in the electronic device.
In the case where the sensor is an infrared sensor, the first detection image may be a temperature map determined from temperature information detected by the infrared sensor. In the case where the sensor is a distance sensor, the first detection image may be a depth map determined from distance information detected by the distance sensor. In the case where the sensor is a camera, the first detection image may be a color map determined according to light information collected by the camera.
Step S320, processing the first detection image to obtain a first target position in the first image.
In some embodiments, the first target position in the first image may be determined from a first indication position of the user in the first detection image.
For example, the direction of the first indication position relative to the origin of the first detection image may be taken as the direction of the first target position relative to the origin of the first image. The origin of the first detection image and the origin of the first image may be the same type of point in their respective images; for example, they may be the midpoints of the two images, vertices in the same corner, midpoints of the upper edges, trisection points of the lower edges, and so on. Different sizes of the user in the first detection image may correspond to different distances between the first indication position and the origin.
The size of the user in the first detection image may be the size of the area where the user is located in the first detection image, or may be the size of the area where a certain or some preset limb or organ of the user is located in the first detection image. For example, the size of the user's face or the size of the user's eyes may be used as the size of the user in the first detection image.
The size of an image can be understood as the width and height of a picture, and can be expressed as "width x height". The width of a picture can be understood as the number of pixels in the direction of the width. The height of a picture can be understood as the number of pixels in the direction of the height. The size of a region in an image can be understood as the number of pixels in that region. In the case where the region is rectangular, the dimensions of the region may be expressed as the width and height of the region.
The user may adjust the size of the user in the first detected image by adjusting the distance to a sensor in the electronic device for acquiring the first detected image and/or adjusting the direction relative to the sensor.
The first indication position represents the position of the user in the first detection image. It may be the position of the center of the area where the user is located in the first detection image. When the sensor that acquires the first detection image is a camera, the first indication position may instead represent the position, in the first detection image, of one or more preset limbs or organs of the user; for example, it may represent the face position of the user's face or the pupil position of the user's pupils in the first detection image. The position of a preset limb or organ in the first detection image may be the position of the center of the area that the limb or organ occupies in the first detection image.
The sensor for acquiring the first detection image may be a first camera of the electronic device. The first camera and the display screen face the same direction. That is, the first camera is a front camera.
When the user views the first image displayed on the display screen, the user's eyes are generally within the image-capture range of the first camera, i.e., the image captured by the first camera generally records the user's eyes, whereas the user's face may not lie entirely within that image. Therefore, letting the first indication position represent the pupil position of the pupils in the first detection image makes the first indication position easier to determine and more accurate.
For another example, according to the correspondence between the indication position and the target position, a first target position corresponding to the first indication position of the user in the first detection image may be determined.
Different indicated positions may represent different positions in the image acquired by the first camera.
The first detection image acquired by the sensor may not have the same size as the first image. The correspondence between indication positions and target positions may therefore be determined from the proportional relationship between the size of the image acquired by the sensor and the size of the first image.
For example, an indication position at the upper-left corner of the first detection image may correspond to the target position at the upper-left corner of the first image, and an indication position at a horizontal distance x from the upper-left corner of the first detection image may correspond to a target position whose horizontal distance from the upper-left corner of the first image is the product of x and the horizontal size ratio, where the horizontal size ratio is the ratio of the width of the first image to the width of the first detection image. The width of an image is the length of its side in the horizontal direction.
Determining the first target position through the correspondence between indication positions and target positions is simple and convenient. The user can adjust the first target position merely by adjusting their position in the first detection image, i.e., their position relative to the electronic device, so the operation is also simple for the user.
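As a worked example of the proportional mapping above (the image sizes and coordinates are chosen only for illustration), a small helper could look like this:

```python
def map_indication_to_target(indication_xy, detection_size, image_size):
    """Map a position in the detection image to the corresponding position in the
    displayed image using the ratio of the two image sizes (widths and heights)."""
    ix, iy = indication_xy
    dw, dh = detection_size      # width/height of the first detection image
    iw, ih = image_size          # width/height of the first image
    return (ix * iw / dw, iy * ih / dh)

# Example (made-up sizes): a pupil detected at (160, 90) in a 640x480 detection image
# maps to the corresponding point in a 1920x1080 displayed image.
print(map_indication_to_target((160, 90), (640, 480), (1920, 1080)))  # -> (480.0, 202.5)
```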
In other embodiments, the first target location may be a location at which the user gazes.
To facilitate the determination of the position of the user's gaze, the sensor for acquiring the first detection image may be a first camera facing the same direction as the display screen.
When the user views the first image displayed on the display screen, the user's eyes are generally within the image-capture range of the first camera, i.e., the image captured by the first camera generally records the user's eyes. The position of the user's gaze can therefore be determined from the first detection image acquired by the first camera.
Illustratively, the first detected image is processed using a gaze point estimation model, and a first target location in the first image may be obtained.
The gaze point estimation model may be a trained neural network model. The training process may include training the initial gaze point estimation model with pre-training data.
The neural network may be composed of neural units. A neural unit may be an arithmetic unit that takes x_s and an intercept of 1 as inputs, and the output of the arithmetic unit may be:

h_{W,b}(x) = f(W^{T}x) = f\left(\sum_{s=1}^{n} W_{s}x_{s} + b\right)

where s = 1, 2, …, n, n is a natural number greater than 1, W_s is the weight of x_s, and b is the bias of the neural unit. f is the activation function of the neural unit, used to introduce a nonlinear characteristic into the neural network so as to convert the input signal of the neural unit into an output signal. The output signal of the activation function may serve as the input of the next convolutional layer, and the activation function may be a sigmoid function. A neural network is a network formed by joining many such single neural units together, i.e., the output of one neural unit may be the input of another. The input of each neural unit may be connected to a local receptive field of the previous layer to extract features of that local receptive field; a local receptive field may be an area composed of several neural units.
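For readers who prefer code to notation, a purely illustrative implementation of a single neural unit with a sigmoid activation (not part of the patent itself) is:

```python
import numpy as np

def neural_unit(x: np.ndarray, W: np.ndarray, b: float) -> float:
    """Single neural unit: f(sum_s W_s * x_s + b) with a sigmoid activation f."""
    return 1.0 / (1.0 + np.exp(-(np.dot(W, x) + b)))

# Example with made-up inputs, weights, and bias.
print(neural_unit(np.array([0.2, 0.7, 0.1]), np.array([0.5, -1.0, 2.0]), 0.3))
```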
The gaze point estimation model may be a deep neural network model. Deep neural networks (deep neural network, DNN), also known as multi-layer neural networks, can be understood as neural networks with multiple hidden layers. The DNNs are divided according to the positions of different layers, and the neural networks inside the DNNs can be divided into three types: input layer, hidden layer, output layer. Typically the first layer is the input layer, the last layer is the output layer, and the intermediate layers are all hidden layers. In DNN, the layers may be fully connected, that is, any neuron of the i-th layer must be connected to any neuron of the i+1-th layer.
The pre-training data comprises a pre-training sample and a labeling target position. The labeling target position represents the position, on a training display screen of the training electronic device used to collect the pre-training detection image, gazed at by the person recorded in the pre-training detection image. The pre-training detection image may be an image acquired by a training camera of the training electronic device while the person looks at the labeling target position. The training electronic device may be the same as or different from the electronic device applying the method shown in fig. 3.
In the training process, the initial gaze point estimation model may be used to process the pre-training samples to obtain pre-training target positions. Based on the difference between the pre-trained target position and the annotated target position, parameters of the initial gaze point estimation model may be adjusted to minimize the difference.
The difference between the pre-training target position and the labeling target position may be expressed as a loss value.
In training a neural network model, it is desirable that the output of the model be as close as possible to the value that is actually expected. Therefore, the predicted value of the current network may be compared with the actually expected target value, and the weight vector of each layer of the neural network may be updated according to the difference between the two (of course, there is usually an initialization process before the first update, that is, parameters are preconfigured for each layer of the neural network model). For example, if the predicted value of the model is too high, the weight vectors are adjusted to make the prediction lower, and the adjustment continues until the neural network model can predict the actually expected target value or a value very close to it. Therefore, it is necessary to define in advance "how to compare the difference between the predicted value and the target value", which is the purpose of the loss function (loss function) or the objective function (objective function); these are important equations for measuring the difference between the predicted value and the target value. Taking the loss function as an example, the higher the output value of the loss function, i.e., the loss value (loss), the larger the difference, so the training of the neural network model becomes a process of reducing the loss as much as possible.
An error back propagation (BP) algorithm may be adopted in the training process to correct the values of the parameters in the initial neural network model, so that the error loss of the neural network model becomes smaller and smaller. Specifically, the input signal is propagated forward until the output produces an error loss, and the parameters in the initial neural network model are updated by back-propagating the error loss information, so that the error loss converges. The back propagation algorithm is a back-propagation process dominated by the error loss, and is intended to obtain the parameters of an optimal neural network model, for example, a weight matrix.
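The following is a minimal sketch of a single parameter-update step under a mean-squared-error loss. It uses a single linear layer and hand-written gradients purely to illustrate how the loss value drives the parameter adjustment; the real gaze point estimation model would be a deep network trained with an automatic-differentiation framework.

```python
import numpy as np

def train_step(W, b, sample, annotated_position, lr=0.01):
    # Forward pass: predict an (x, y) target position from the flattened sample.
    pred = W @ sample + b
    # Loss value: mean squared difference between prediction and annotation.
    error = pred - annotated_position
    loss = np.mean(error ** 2)
    # Backward pass: gradients of the loss with respect to the parameters.
    grad_W = np.outer(error, sample)
    grad_b = error
    # Update the parameters so the predicted position moves toward the annotation.
    return W - lr * grad_W, b - lr * grad_b, loss
```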
The pre-training samples may include pre-training detection images and/or pre-training pupil positions, etc. Where the pre-training sample includes pre-training pupil positions, the pre-training sample may also include pre-training face positions and/or pre-training eye images.
Processing the first detection image with the gaze point estimation model may be understood as processing inference data with the gaze point estimation model to obtain the first target position. The inference data may comprise the first detection image, and may also comprise other information determined from the first detection image.
In the case where the pre-training sample comprises a pre-training detection image, the inference data may comprise a first detection image.
Where the pre-training sample includes a pre-training pupil position, the inferential data may include the pupil position. The pre-training pupil position represents the position of the pupil of the person in the pre-training detection image. The pupil position represents the position of the pupil of the user in the first detection image.
In the case where the pre-training sample includes pre-training face locations, the inference data may include face locations. The pretrained face position represents the position where the face of the person is located in the pretrained detection image. The face position represents the position in which the face of the user is located in the first detection image.
The location of a limb or organ such as the pupil, face, etc. is also understood to be the region of the limb or organ.
In the case where the pre-training sample comprises a pre-training eye image, the inference data may comprise an eye image. The pre-training eye image represents an image of the area where the eyes of the person are located in the pre-training detection image. The eye image represents an image of the area where the eyes of the user are located in the first detection image.
The initial gaze point estimation model is subjected to parameter adjustment by using the pre-training data, and the adjusted initial gaze point estimation model may be used as the gaze point estimation model. In this case, the pre-training data may also be referred to as training data, the pre-training sample and the pre-training target position may be referred to as the training sample and the training target position, respectively, and the pre-training detection image, the pre-training pupil position, the pre-training face position, the pre-training eye image, etc. may be referred to as the training detection image, the training pupil position, the training face position, the training eye image, etc., respectively.
The person in the pre-training data may be the user of the electronic device to which the method shown in fig. 3 is applied, or may be another person other than the user of the electronic device.
In the case where the characters in the plurality of pre-training data used in the process of performing parameter adjustment on the initial gaze point estimation model by using the pre-training data include other characters than the user of the electronic device, the adjusted initial gaze point estimation model may also be used as the pre-training gaze point estimation model. The training process of parameter adjustment of the initial gaze point estimation model with pre-training data may be understood as pre-training.
And training the pre-training gaze point estimation model again by utilizing the training data, namely continuously adjusting parameters of the pre-training gaze point estimation model to obtain the gaze point estimation model.
The training data comprises a pre-training sample and a labeling target position, and the labeling target position represents the position of a user watching a display screen in the electronic equipment, which is recorded by the training detection image. The electronic device is an electronic device applying the method shown in fig. 3. The user is a user using the electronic device. The training detection image may be an image acquired by a first camera of the electronic device with the user looking at the tagged target location.
In the retraining process, the pre-training gaze point estimation model may be used to process the training samples to obtain training target positions. Based on the difference between the training target position and the labeling target position in the training data, the parameters of the pre-training gaze point estimation model may be adjusted to minimize the difference.
In the case where the pre-training sample comprises a pre-training detection image, the training sample may comprise a training detection image.
In the case where the pre-training sample comprises a pre-training pupil position, the training sample may comprise a training pupil position. The training pupil position represents the position of the pupil of the person in the training detection image.
In the case where the pre-training samples include pre-training face positions, the training samples may include training face positions. The training face position represents the position where the face of the person in the training detection image is located.
In the case where the pre-training sample comprises a pre-training eye image, the training sample may comprise a training eye image. The training eye image represents an image of the area where the eyes of the person are located in the training detection image.
Retraining of the pre-trained gaze point estimation model may be understood as a calibration of the pre-trained gaze point estimation model for the user.
The first detected image is processed with a gaze point estimation model, the output of which may be the first target position.
Alternatively, in a case where the characters in the plurality of pre-training data used in the process of performing parameter adjustment on the initial gaze point estimation model using the pre-training data include other characters than the user of the electronic device, the adjusted initial gaze point estimation model may be used as the gaze point estimation model.
The first detected image is processed with a gaze point estimation model, the output of which may be referred to as a first initial position.
After the first initial position is determined, a first target position corresponding to the first initial position may be determined according to the correspondence between the initial position and the target position.
The first target position corresponding to the first initial position is determined according to the correspondence between the initial position and the target position, which may be understood as a calibration process for the user.
The correspondence between the initial position and the target position may be determined according to the plurality of second initial positions and gaze positions corresponding to each of the second initial positions. The plurality of second initial positions are obtained by processing a plurality of user gaze images using a gaze point estimation model, respectively. The gaze location corresponding to each second initial location is a location on the display screen at which the user gazes when the first camera captures the user gaze image for obtaining the second initial location.
That is, when the user respectively gazes at a plurality of gaze locations of the display screen, the first camera may acquire a user gaze image corresponding to each gaze location. And processing the user fixation image corresponding to each fixation position by using the fixation point estimation model, so as to obtain a second initial position corresponding to each fixation position. And according to the second initial position corresponding to each gaze position, the corresponding relation between the initial position and the target position can be determined.
In the correspondence, the plurality of initial positions may include the plurality of second initial positions, and the target position corresponding to each of the second initial positions may be a gaze position corresponding to the second initial position. Alternatively, curve fitting may be performed for the second initial position corresponding to each gaze location, and the result of curve fitting, that is, the functional relationship between the second initial position and the gaze location, may represent the corresponding relationship between the initial position and the target position.
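As an illustrative sketch of this calibration step (NumPy, a per-axis polynomial fit, and the function names are assumptions; the embodiment only requires some correspondence between initial positions and target positions):

```python
import numpy as np

def fit_calibration(second_initial_positions, gaze_positions, degree=1):
    # Fit, per axis, a polynomial mapping the model output (second initial
    # position) to the gaze position the user actually looked at.
    init = np.asarray(second_initial_positions, dtype=float)   # shape (k, 2)
    gaze = np.asarray(gaze_positions, dtype=float)             # shape (k, 2)
    coeff_x = np.polyfit(init[:, 0], gaze[:, 0], degree)
    coeff_y = np.polyfit(init[:, 1], gaze[:, 1], degree)
    return coeff_x, coeff_y

def apply_calibration(first_initial_position, coeff_x, coeff_y):
    # Map a first initial position to the first target position.
    x, y = first_initial_position
    return float(np.polyval(coeff_x, x)), float(np.polyval(coeff_y, y))
```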
It should be appreciated that the training of the gaze point estimation model may be performed by the electronic device applying the method shown in fig. 3, or by other electronic devices. For example, training with the pre-training data to obtain the pre-trained gaze point estimation model may be performed by other electronics, and training with the training data to obtain the gaze point estimation model may be performed by the electronic device applying the method shown in fig. 3.
By calibrating the first target position for the user, the obtained first target position is more accurate. Compared with the mode of training the pre-training gaze point estimation model obtained by pre-training, the method for determining the corresponding relation between the initial position and the target position is simpler and more convenient, and the calibration for the user is simpler and more convenient.
Step S330, taking the first target position as the center, amplifying the first image to obtain a second image.
The magnification ratio of the first image may be preset or may be determined according to an instruction of the user.
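A minimal sketch of the centered magnification in step S330, assuming OpenCV is available and the first image is held as a pixel array (both assumptions; near the image border the crop window is clamped, so the result is only approximately centered there):

```python
import cv2

def magnify_centered(first_image, target_xy, ratio):
    # Crop a window of 1/ratio of the original size around the first target
    # position, then resize it back to the original size to obtain the second image.
    h, w = first_image.shape[:2]
    cw, ch = int(w / ratio), int(h / ratio)
    cx, cy = target_xy
    # Clamp the crop window so it stays inside the first image.
    x0 = min(max(int(cx - cw / 2), 0), w - cw)
    y0 = min(max(int(cy - ch / 2), 0), h - ch)
    crop = first_image[y0:y0 + ch, x0:x0 + cw]
    return cv2.resize(crop, (w, h), interpolation=cv2.INTER_LINEAR)
```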
Before proceeding to step S330, it may be determined whether to proceed to step S330 according to the user behavior of the user recorded in the user behavior image.
In case the user behavior is a preset amplifying behavior, step S330 may be performed; otherwise, in the case that the user behavior is not the preset amplifying behavior, step S330 may not be performed.
The user behavior image and the first detection image may be the same or different images.
If the user behavior image is the first detection image, step S320 may also be performed in the case that the user behavior is a preset magnification behavior.
If the user behavior image is not the first detection image, the electronic device may be utilized to perform acquisition of the user behavior image with the same or different sensor as that used to acquire the first detection image in a case where the first image is displayed on a display screen of the electronic device. The user behavior image may be recorded with the user. Thus, in the case where the user behavior of the user recorded in the user behavior image is a preset amplification behavior, step S310 may be performed to acquire a first detection image using a sensor in the electronic device.
The sensor for capturing the user behavior image may be the same as or different from the sensor for capturing the first detection image in the electronic device. The sensor for capturing an image of the user's behavior may be an infrared sensor, a distance sensor or a camera in the electronic device.
If whether the user behavior is the preset amplifying behavior is determined according to the first detection image, and the first target position is also determined from the first detection image when the user behavior is the preset amplifying behavior, the user is required to indicate the first target position in the first image while performing the preset amplifying behavior. This places more restrictions on the preset amplifying behavior and on the manner of indicating the first target position in the first image. For example, the behavior used by the user to indicate the first target position in the first image and the preset amplifying behavior cannot be opposite or contradictory behaviors. Requiring the user to perform the preset amplifying behavior and the behavior of indicating the first target position in the first image at the same time also places a high demand on the user's actions.
And under the condition that the user behavior is determined to be the preset amplifying behavior, the first detection image is acquired, and the first target position is determined according to the first detection image, so that the implementation is simpler and more convenient.
In step S330, the first image may be enlarged according to a target enlargement ratio with the first target position as a center, so as to obtain a second image.
In some embodiments, the second image may be obtained by amplifying the first image according to a target amplification ratio.
Before proceeding to step S330, a distance between the user and the electronic device may be acquired. According to the user distance between the user and the electronic equipment and the corresponding relation between the distance and the amplification proportion, the target amplification proportion corresponding to the user distance can be determined.
The distance between the user and the electronic device may be expressed as the distance between the user's head and the electronic device.
The distance between the user and the electronic device can be determined according to information collected by a distance sensor in the electronic device. Alternatively, the distance between the user and the electronic device may be determined according to the size of the limb or organ of the user in the scale indication image. The distance sensor may be, for example, a time of flight (TOF) sensor or the like.
Alternatively, before proceeding to step S330, a target distance between a plurality of target nodes of the face of the user may be acquired from the scale indication image. According to the correspondence between the distance and the amplification ratio, the target amplification ratio corresponding to the target distance can be determined.
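As a hedged sketch, the correspondence between distance and amplification ratio may be held as a small table and interpolated (the table values and units below are invented for illustration only):

```python
import numpy as np

# Hypothetical correspondence between distance and amplification ratio.
DISTANCES = np.array([20.0, 30.0, 40.0, 60.0])   # e.g. centimetres, or a distance in pixels
RATIOS    = np.array([ 4.0,  3.0,  2.0,  1.5])

def target_ratio_for_distance(distance):
    # Interpolate the target amplification ratio for the measured distance,
    # clamped to the ends of the table.
    return float(np.interp(distance, DISTANCES, RATIOS))
```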
The scale indicating image may be an image recorded with the user acquired in a case where the display screen displays the first image.
The scale indication image may be the first detection image or may be another image. The scale indication image may be the same image as or a different image from the user behavior image. The sensor for acquiring the scale indication image may be the same as or different from the sensor for acquiring the first detection image and the sensor for acquiring the user behavior image.
Under the condition that the sensors for acquiring the proportion indication image, the first detection image and the user behavior image are the same sensor, the requirements on the sensor in the electronic equipment are reduced, so that the image processing method provided by the embodiment of the application has wider applicability.
In order to obtain target distances between a plurality of target nodes of the face of the user, the first detection image may be identified to obtain a plurality of target nodes of the face of the user, thereby determining the target distances.
Alternatively, after determining the first target position according to the first detection image, the image may be acquired again by using the first camera to obtain the scale indication image. The scale indication image has the user viewing the first image recorded therein. According to the target distances between a plurality of target nodes of the face of the user recorded in the scale indication image and the corresponding relation between the distances and the amplification scales, the target amplification scale corresponding to the target distances can be determined.
The target distances between the plurality of target nodes of the face may be expressed as distances of the plurality of target nodes in the image, or may be expressed as a ratio of the distances of the plurality of target nodes in the image to the distances of the plurality of reference nodes in the image.
The target distance may represent a lip distance, a pupil distance, etc. of the user.
In the case where the target distance represents the lip distance of the user, the plurality of target nodes of the face may include the midpoint of the upper lip, the midpoint of the lower lip, and the like. The target distance may be expressed as a distance of the plurality of target nodes in the image captured by the first camera, or as a ratio of the distance of the target nodes in the image captured by the first camera to the distance of the reference node. Where the target distance is expressed as a ratio of the distance of the plurality of target nodes in the image to the distance of the plurality of reference nodes in the image, the plurality of reference nodes may include a left end point and a right end point of the lip.
The user adjusts the distance between the middle point of the upper lip and the middle point of the lower lip through the opening degree of the mouth, so that the distance of the target node in the image acquired by the first camera is adjusted, and the setting of the target magnification can be realized.
In the case where the target distance represents a pupil distance of the user, the plurality of target nodes of the face may include a left eye pupil node and a right eye pupil node. The target distance may be expressed as a distance of the plurality of target nodes in the image acquired by the first camera.
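Assuming the target nodes are available as pixel coordinates from some face-landmark detector (the detector itself and the function names are assumptions), the target distance may be computed roughly as follows:

```python
import math

def node_distance(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def lip_distance_ratio(upper_lip_mid, lower_lip_mid, lip_left, lip_right):
    # Distance between the target nodes (upper/lower lip midpoints), expressed as
    # a ratio to the distance between the reference nodes (the lip corner points),
    # which makes the measure less sensitive to how far the face is from the camera.
    return node_distance(upper_lip_mid, lower_lip_mid) / node_distance(lip_left, lip_right)

def pupil_distance(left_pupil, right_pupil):
    # Distance between the two pupil nodes in the image, in pixels.
    return node_distance(left_pupil, right_pupil)
```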
The user can set the target magnification by adjusting the relative orientation of the face and the camera, or by adjusting the distance between the face and the first camera.
In other embodiments, the second image may be obtained by amplifying the first image by a preset amplification scale.
In the case where the first image is an image stored in the electronic device, or the first image is obtained by scaling an image stored in the electronic device, the magnification ratio of the first image may be understood as the ratio between the size of the first image and the size of the area of the first image to which the second image corresponds, that is, the ratio between the width (or length) of the first image and the width (or length) of the area of the first image to which the second image corresponds. Alternatively, the magnification ratio of the first image may be the ratio of the size of the stored image to the size of the area, in the stored image, to which the second image corresponds.
In the case that the first image is a preview image acquired in real time by the second camera in the electronic device, the magnification ratio of the first image is understood as a ratio of a size of a certain object in the second image to a size of the certain object in the first image, that is, a ratio of the size of the first image to a size of an area where the second image is located in the first image, in the case that a scene acquired by the second camera is unchanged. Alternatively, the magnification ratio at which the first image is magnified may be understood as a ratio between the zoom magnification of the second image and the zoom magnification of the first image. The magnification of the first image may be expressed as a zoom factor of the second image.
In the case where the first image is a preview image acquired by the second camera in real time, zooming is performed with the first target position as the center, or it may be understood that the second image may be a zoomed preview image with the first target position as the center.
In the case of optical zooming, the second image may also be a preview image acquired in real time by the second camera after zooming.
Step S340, displaying the second image by using the display screen.
In the case where the second image is obtained by amplifying the first image according to a preset amplification ratio, the amplification ratio of the first image may not meet the user's requirement for the amplification ratio.
Illustratively, after step S330, the second image may be enlarged again.
That is, after step S340, in the case where the display screen of the electronic device displays the second image, the second detection image may be acquired with the first sensor, the second detection image having the user recorded therein.
And processing the second detection image to obtain a second target position in the second image.
And amplifying a second target area in the second image by taking the second target position as the center, so as to obtain a third image. Thus, the third image can be displayed using the display screen.
In some embodiments, in the case where the target magnification ratio of the instruction of the user is determined before proceeding to step S330, the magnitude relation of the target magnification ratio and the preset magnification ratio may be judged. In the case where the target magnification ratio is greater than or equal to the preset magnification ratio, in step S330, the first image may be magnified according to the preset magnification ratio to obtain the second image. And under the condition that the display screen displays the second image, the first sensor can acquire the second detection image, determine the second target position according to the second detection image, and zoom in the second image again by taking the second target position as the center.
There may be a difference between the first target position obtained by processing the first detection image and the position that the user actually wants to indicate. When the first image is magnified by a small factor, the difference caused thereby between the magnified image and the image that the user wishes to see is small. However, as the magnification of the first image increases, the difference between the magnified image and the image that the user wishes to see also increases. If the target magnification ratio is relatively large, that is, greater than or equal to the preset magnification ratio, and magnification is performed directly according to the target magnification ratio, the position in the first image of the second image obtained by magnification may not meet the user's requirement.
Therefore, when the target magnification ratio is larger than the preset magnification ratio, the first image is magnified according to the preset magnification ratio by taking the first target position as the center, and after the magnified second image is obtained, the first sensor is used for image collection again, the second target position in the second image is determined according to the collected second detection image, and the second image is magnified by taking the second target position as the center, so that the third image obtained by magnifying the second image more accords with the requirement of a user.
The processing after the first image is enlarged to obtain the second image according to the preset ratio can be understood as processing performed again with the second image as the first image.
When the target magnification ratio is greater than the preset magnification ratio and the display screen displays the second image obtained by magnifying according to the preset magnification ratio, the further magnification of the second image may be performed according to the difference between the target magnification ratio and the magnification ratio already applied to obtain the second image. Alternatively, in the case where this difference is still greater than or equal to the preset magnification ratio, the second image may again be magnified according to the preset magnification ratio. After the second image is magnified according to the preset magnification ratio, the third image obtained by magnification may be taken as the second image, and the magnification of the image may be performed again.
Alternatively, when the target magnification ratio is greater than the preset magnification ratio and the display screen displays the second image obtained by magnifying according to the preset magnification ratio, the target magnification ratio may be obtained again, and the second image may be magnified according to the newly obtained target magnification ratio.
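A possible sketch of this staged magnification, reusing the magnify_centered sketch above and a hypothetical locate_target callback that re-estimates the target position from a fresh detection image (all names are assumptions):

```python
def staged_magnify(image, target_ratio, preset_ratio, locate_target):
    # Magnify in steps of at most preset_ratio, re-estimating the target
    # position after every step so that large magnifications stay centred
    # on the position the user actually indicates.
    remaining = target_ratio
    while remaining > preset_ratio:
        target_xy = locate_target(image)              # e.g. gaze position on the displayed image
        image = magnify_centered(image, target_xy, preset_ratio)
        remaining /= preset_ratio
    if remaining > 1.0:
        image = magnify_centered(image, locate_target(image), remaining)
    return image
```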
According to the method provided by the embodiment of the application, the first target position in the first image is determined according to the first detection image acquired by the sensor under the condition that the first image is displayed on the display screen, and the user is recorded in the first detection image. The first image is amplified by taking the first target position as the center, and the second image obtained after the amplification is displayed by the display screen, so that the requirement of the first image on the central position for amplification is met, meanwhile, manual operation such as touch operation and the like of a user on the electronic equipment is avoided, the flexibility of the control mode of the user on the image amplification is improved, and the user experience is improved.
The image processing method shown in fig. 3 will be described below with reference to fig. 6 to 8, taking the first image as a preview image displayed in real time in the photographing scene as an example.
Fig. 6 is a schematic flowchart of an image processing method provided in an embodiment of the present application. The image processing method shown in fig. 6 may include steps S601 to S609, which are described in detail below.
Step S601, displaying the first image in real time by using the display screen.
The first image is an image acquired by the second camera in real time.
The second camera may be a rear camera of the electronic device. The user is taking a picture by using the electronic device, and the first image is a preview image acquired by the second camera in real time. The first image may be an image in a viewfinder 422 in the photographing interface 420 shown in fig. 4 (b).
In the case where the display screen displays the first image, step S602 may be performed.
Step S602, a first camera is used for collecting a user behavior image.
The first camera may be a front camera of the electronic device.
The user behavior image may have recorded therein a user viewing a first image displayed on the display screen.
The number of images in the user behavior image may be one or more. For example, the user behavior image may be a video including a plurality of frame images.
Step S603, determining whether the user behavior of the user recorded in the user behavior image is a preset amplifying behavior.
The preset amplifying behavior can be actions or expressions such as user nodding, shaking head, blinking, opening mouth, closing eyes and the like. Alternatively, the preset amplifying behavior may be that the user keeps a certain action or expression for a preset keeping time.
For example, in the case where the preset magnification behavior is a mouth opening, if the lip distance of the user exceeds the preset lip distance threshold in the user behavior image, it may be determined that the user behavior is the preset magnification behavior; in the case where the preset magnification behavior is blinking, if the time interval in which the pupil of the user is not recorded in the user behavior image satisfies the preset duration range, it may be determined that the user behavior is the preset magnification behavior. For another example, if the user keeps looking at a certain position in the screen for a preset holding period, it may be determined that the user behavior is a preset zooming-in behavior.
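A rough sketch of such a check (the thresholds and the particular measurements passed in are invented for illustration; any one satisfied condition counts as the preset amplifying behavior):

```python
def is_preset_amplify_behavior(lip_ratio=None, blink_interval=None, gaze_hold=None,
                               lip_threshold=0.35, blink_range=(0.1, 0.5), hold_time=1.5):
    # Mouth opening: the lip distance (here a ratio) exceeds a preset threshold.
    if lip_ratio is not None and lip_ratio > lip_threshold:
        return True
    # Blink: the interval during which no pupil is recorded lies in a preset range (seconds).
    if blink_interval is not None and blink_range[0] <= blink_interval <= blink_range[1]:
        return True
    # Gaze hold: the user keeps looking at one position for at least the holding time (seconds).
    if gaze_hold is not None and gaze_hold >= hold_time:
        return True
    return False
```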
In case the user behavior is not a preset amplification behavior, step S602 may be performed again. The capturing of the user behavior image with the first camera may be performed periodically or aperiodically.
In the case where the user behavior is a preset amplification behavior, steps S604 to S609 may be performed.
The determination of whether the user behavior is a preset amplification behavior may be understood as determining input information for the switch-like control performed on the electronic device. The input of the switch class control is either "0" or "1". Inputs "0" and "1" correspond to different operations of the electronic device. Whether the user behavior is a preset amplification behavior, i.e., whether the input of the switch class control is "1" is determined. For example, in the case where the user action is a preset amplification action, it may be determined that the input of the switch-like control is "1", and the operation of the electronic device corresponding to the input of "1" may be step S604; in the case where the user action is not the preset amplification behavior, it may be determined that the input of the switch class control is "0", and the operation of the electronic device corresponding to the input of "0" may be to proceed to step S602.
In step S604, a first detection image is acquired by using a first camera.
The user is recorded in the first detection image.
Before proceeding to step S604, a first alert message may also be output to alert the user to indicate the first target location.
The electronic device can remind the user through vibration generated by the motor, flashing of the indicator light, or sound generated by the loudspeaker. That is, the reminding information output by the electronic device may be by sound, vibration, flashing of an indicator light, or the like.
Step S605 determines a first target position in the first image from the first detection image.
In the case that the user behavior is determined to be the preset amplifying behavior in step S603, image acquisition is performed by using the first camera to obtain a first detection image, and in step S605, a first target position in the first image is determined according to the first detection image.
The first target position may be a position at which the user gazes, or may be a position in the first image corresponding to the position of the user in the first detection image. The position of the user in the first detection image may be the center of the region of the head of the user in the first detection image, or may be the midpoint of the line connecting the positions of the two pupils of the user in the first detection image.
Determining the first target position from the first detection image may be understood as determining input information for coordinate-type control of the electronic device. The input of the coordinate type control is the coordinates of the display screen. The display screen displays a first image. The first target location in the first image may be understood as a location on the display screen based on the area of the first image in the display screen.
In the case where the first target position, that is, the input information of the coordinate type control is determined, steps S606 to S609 may be performed. That is, for the input information of the coordinate class control, the operation of the electronic apparatus may be to proceed to step S606 to step S609.
In the case where the first target position is determined in step S605, the first target position may also be displayed on the first image using the display screen.
In the case where the electronic device determines the first target position, the electronic device may display a position hint interface 720 as shown in (a) of fig. 7. The position presenting interface 720 includes a view box 721, and a first image can be displayed in real time in the view box 721. The location hint interface 720 also includes a location icon 724 that is located within the view box 721. The location icon 724 is used to indicate a first target location in the first image.
The electronic device may display the first target location by highlighting or blinking.
As described in (b) of fig. 7, the first target position may be a position of the user's gaze determined from the first detection image.
Step S606, a first camera is used for acquiring a proportion indication image.
The scale indication image has the user recorded therein.
Before proceeding to step S606, a second alert may also be output to alert the user to indicate the target zoom information.
Step S607, determining target zoom information according to the scale instruction image.
The target zoom information may represent a target zoom factor, and the target zoom information may also represent a ratio of the target zoom factor to the current zoom factor.
Determining the target zoom information from the scale indication image may be understood as determining input information for adjustment type control of the electronic device. The input information of the adjustment class control may be continuously variable information or may be discrete, i.e., discontinuously variable information. The electronic device may determine, according to the input information of the adjustment class control, output information corresponding to the input information of the adjustment class control, where the output information and the input information may have a linear or nonlinear correlation, and the output information may be positively correlated or negatively correlated with the input information.
The input information of the adjustment type control may be, for example, a head distance, a lip distance or a pupil distance of the user in the scale indication image, a state stay duration, and the like.
The user can adjust the head distance, that is, the distance between the head and the electronic device reflected in the scale indication image, by moving the head closer to or farther from the electronic device.
The user can adjust the lip distance of the user recorded in the scale indication image by adjusting the degree to which the mouth is opened, or by adjusting the distance from the electronic device.
The user can adjust the pupil distance of the user recorded in the scale indication image by adjusting the relative orientation with respect to the electronic device, or by adjusting the distance from the electronic device.
The user can adjust by adjusting the duration of holding a certain preset state, i.e. the state stay duration recorded in the scale indication image.
The preset state may be a certain expression or action of the user, etc. For example, the preset state may be that the user's eyes converge at a certain point on the display screen, or may also be that the user opens a mouth, or the like.
From the scale indication image, input information for the adjustment class control can be determined. From the input information of the adjustment class control, the output information can be determined. The target zoom information may be understood as output information determined by the electronic device from input information of the adjustment class control.
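As a sketch of one such positively correlated, linear mapping (the input here is a lip distance ratio; the range limits are invented for illustration):

```python
def zoom_from_lip_ratio(lip_ratio, min_ratio=0.1, max_ratio=0.6,
                        min_zoom=1.0, max_zoom=5.0):
    # Linearly map the adjustment-control input (a lip distance ratio) to an
    # output zoom factor, clamped to the supported zoom range.
    t = (lip_ratio - min_ratio) / (max_ratio - min_ratio)
    t = min(max(t, 0.0), 1.0)
    return min_zoom + t * (max_zoom - min_zoom)
```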
In step S608, zooming is performed based on the target zooming information with the first target position as the center, so as to obtain a second image.
Zooming may be achieved by one or more of optical zoom (optical zoom), digital zoom (digital zoom), hybrid zoom (hybrid zoom), and the like.
Before proceeding to step S608, an image area may also be determined according to the target zoom information and the first target position, and the image area may be displayed using the display screen. The image area may be understood as the area of the first image to which the second image corresponds. The second image is an image obtained by zooming, based on the first image and centered on the first target position in the first image, according to the target zoom information. In the case where the scene acquired by the second camera is unchanged, the second image is an image obtained by amplifying the image area in the first image.
The image area is determined based on the target zoom information and the first target position.
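One way to compute that rectangle, shown only as an assumption-laden sketch (the return convention of left, top, width, height is illustrative):

```python
def image_area(image_size, target_xy, zoom_ratio):
    # Rectangle of the first image to which the second image will correspond:
    # 1/zoom_ratio of the image size, centred on the first target position
    # and clamped to the image borders.
    w, h = image_size
    aw, ah = w / zoom_ratio, h / zoom_ratio
    cx, cy = target_xy
    x0 = min(max(cx - aw / 2, 0), w - aw)
    y0 = min(max(cy - ah / 2, 0), h - ah)
    return x0, y0, aw, ah
```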
In the case where the electronic device determines the image area, the electronic device may display a zoom-prompt interface 810 as shown in (a) of fig. 8. The zoom presentation interface 810 includes a view frame 811, and a first image can be displayed in real time within the view frame 811. The zoom-presentation interface 810 also includes a bezel 812 positioned in the image area within the viewfinder 811.
Step S609, the second image is displayed by using the display screen.
Before proceeding to step S609, it may also be determined whether the user confirms the image region.
In some embodiments, in the case where the display screen displays the first image and the image area in the first image, an area confirmation indication image may be acquired using the first camera. Whether the user confirms the image area may be determined according to the user behavior in the area confirmation indication image.
The user is recorded in the area confirmation indication image. In the case where the user behavior in the area confirmation indication image is a preset confirmation behavior, it may be determined that the user confirms the image area.
Alternatively, in the case where the user behavior in the area confirmation indication image is a preset negation behavior, it may be determined that the user negates the image area, that is, the user does not confirm the image area.
In other embodiments, the user may indicate negation of the image area by a manual negation operation of the electronic device.
For example, a cancel icon may also be displayed while the display screen displays an image area on the first image. The zoom-tip interface 810 as shown in fig. 8 (a) may further include a cancel icon 813. In the case of acquiring a click operation of the cancel icon 813 by the user, the electronic device may determine that the user negates the image area. In a case where the time period for which the zoom-out interface 810 is displayed exceeds the preset time period without acquiring the click operation of the cancel icon 813 by the user, it may be determined that the user has confirmed the image area.
In the case where the user negates the image area, step S604 to step S607 may be performed.
In the case where the user confirms the image area, step S609 may be performed.
Fig. 8 (b) shows still another GUI of the electronic device, which is a zoom image interface 820. A viewfinder 821 may be included in the zoom image interface 820. The view finder 821 is used to display a second image obtained after zooming. The second image may also be an image acquired by the second camera in real time, or the second image may be an image obtained by processing the image acquired by the second camera.
Similar to the determination before step S609 is performed, after step S605, it may also be determined whether the user confirms the first target position.
In the case where the display screen displays the first target position on the first image, the position confirmation instruction image may be acquired with the first camera. The confirmation of the first target location by the user may be determined by the user behavior in the location confirmation indication image. The user is recorded in the position confirmation instruction image.
Alternatively, in the case where the display screen displays the first target position on the first image, the negation of the first target position by the user, that is, the first target position is not determined by the user, may be determined according to a manual negation operation by the user.
After step S609, image acquisition may also be performed by using the first camera to obtain a new user behavior image. In the case where the user behavior in the new user behavior image is a preset photographing behavior, a user photographed image may be stored, the user photographed image being the second image at the time when the user behavior is determined to be the preset photographing behavior. It should be appreciated that the preset photographing behavior and the preset amplifying behavior may be different behaviors.
In the case where the action of the user in the captured image is a preset photographing behavior, the user photographed image may also be displayed in the viewfinder. That is, the second image acquired by the second camera in real time may not be displayed in the viewfinder.
It should be appreciated that in some embodiments, step S606 may also occur before or simultaneously with step S604. That is, the embodiment of the present application does not limit the order in which the electronic device collects the scale indication image for determining the target zoom information and the first detection image for determining the first target position, nor the order in which the target zoom information and the first target position are determined.
Through steps S601 to S609, parameters in the shooting process are adjusted according to behaviors, expressions or actions of the user recorded in the image acquired by the first camera, manual operation of the user in the shooting process is reduced, flexibility of control of the user on the shooting process is improved, possibility of misoperation of the user in the shooting process is reduced, risk of falling of the electronic equipment is reduced, and user experience is improved.
It should be appreciated that the above illustration is to aid one skilled in the art in understanding the embodiments of the application and is not intended to limit the embodiments of the application to the specific numerical values or the specific scenarios illustrated. It will be apparent to those skilled in the art from the foregoing description that various equivalent modifications or variations can be made, and such modifications or variations are intended to be within the scope of the embodiments of the present application.
The image processing method of the embodiment of the present application is described in detail above with reference to fig. 3 to 8, and the device embodiment of the present application will be described in detail below with reference to fig. 9. It should be understood that the image processing apparatus in the embodiment of the present application may perform the foregoing various image processing methods in the embodiment of the present application, that is, specific working procedures of the following various products may refer to corresponding procedures in the foregoing method embodiments.
Fig. 9 is a schematic diagram of an image processing apparatus provided in an embodiment of the present application.
It should be understood that the image processing apparatus 900 may perform the image processing methods shown in fig. 3 and 6. The image processing apparatus 900 may be located in an electronic device. The image processing apparatus 900 includes: an acquisition unit 910, a processing unit 920 and a display unit 930.
The acquisition unit 910 is configured to acquire a first detection image when the display screen of the electronic device displays a first image, where a user watching the first image is recorded.
The processing unit 920 is configured to process the first detected image to obtain a first target position in the first image.
The processing unit 920 is further configured to zoom in on the first image with the first target position as a center, so as to obtain a second image.
The display unit 930 is configured to display the second image using a display screen.
Optionally, the processing unit 920 is further configured to determine a target distance between a plurality of target nodes of the face of the user according to a scale indication image, where the scale indication image is an image captured with the first image displayed on the display screen and recorded with the user.
The processing unit 920 is further configured to determine a target magnification ratio corresponding to the target distance according to a correspondence between the distance and the magnification ratio.
The processing unit 920 is further configured to zoom in on the first image based on the target zoom scale with the first target position as a center, so as to obtain a second image.
Optionally, the second image is obtained by amplifying the first target area in the first image according to a preset amplifying scale.
The acquisition unit 910 is further configured to acquire a second detection image when the display screen of the electronic device displays a second image, where the second detection image records a user.
The processing unit 920 is further configured to process the second detected image to obtain a second target position in the second image.
The processing unit 920 is further configured to zoom in on a second target area in the second image with the second target position as a center, so as to obtain a third image.
The display unit 930 is further configured to display a third image using a display screen.
Optionally, the processing unit 920 is further configured to determine a target distance between a plurality of target nodes of the face of the user according to a scale indication image, where the scale indication image is an image captured with the first image displayed on the display screen and recorded with the user.
The processing unit 920 is further configured to determine a target magnification ratio corresponding to the target distance according to a correspondence between the distance and the magnification ratio.
The processing unit 920 is further configured to, if the target magnification ratio is greater than or equal to the preset magnification ratio, amplify the first image according to the preset magnification ratio to obtain the second image.
Optionally, the first detection image is collected by a first camera in the electronic device, the first camera and the display screen face the same direction, and the first target position is a position where the user gazes.
Optionally, the processing unit 920 is specifically configured to process the first detected image with a gaze point estimation model to obtain a first initial position, where the gaze point estimation model is a neural network model obtained by training.
The processing unit 920 is further configured to determine the first target position corresponding to the first initial position according to a correspondence between an initial position and a target position, where the correspondence is determined by a plurality of second initial positions and gaze positions corresponding to each second initial position, where the plurality of second initial positions are obtained by respectively processing a plurality of user gaze images using the gaze point estimation model, and the gaze position corresponding to each second initial position is a position on the display screen where the user gazes when the first camera collects the user gaze image for obtaining the second initial position.
Optionally, the first image is a preview image acquired in real time by a second camera in the electronic device.
The processing unit 920 is specifically configured to zoom around the first target position, and the second image is a zoomed preview image.
Optionally, the first detection image is acquired by a sensor in the electronic device, the first orientation of the sensor being different from the second orientation of the second camera.
Optionally, the processing unit 920 is specifically configured to zoom in on the first image to obtain the second image when the user behavior of the user recorded in the user behavior image is a preset zoom-in behavior, where the user behavior image is an image recorded with the user and acquired when the display screen displays the first image.
Optionally, the display unit 930 is further configured to display, with a display screen, an image area, where the second image is located in the first image.
The image processing apparatus 900 is embodied in the form of functional units. The term "unit" herein may be implemented in software and/or hardware, without specific limitation.
For example, a "unit" may be a software program, a hardware circuit or a combination of both that implements the functions described above. The hardware circuitry may include application specific integrated circuits (application specific integrated circuit, ASICs), electronic circuits, processors (e.g., shared, proprietary, or group processors, etc.) and memory for executing one or more software or firmware programs, merged logic circuits, and/or other suitable components that support the described functions.
Thus, the elements of the examples described in the embodiments of the present application can be implemented in electronic hardware, or in a combination of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The present application also provides a chip comprising a data interface and one or more processors. The one or more processors read, through the data interface, instructions stored in a memory, and execute the instructions to implement the image processing method described in the above method embodiments.
The one or more processors may be general purpose processors or special purpose processors. For example, the one or more processors may be a central processing unit (central processing unit, CPU), digital signal processor (digital signal processor, DSP), application specific integrated circuit (application specific integrated circuit, ASIC), field programmable gate array (field programmable gate array, FPGA), or other programmable logic device such as discrete gates, transistor logic, or discrete hardware components.
The chip may be part of a terminal device or other electronic device. For example, the chip may be located in the electronic device 100.
The processor and the memory may be provided separately or may be integrated. For example, the processor and memory may be integrated on a System On Chip (SOC) of the terminal device. That is, the chip may also include a memory.
The memory may have a program stored thereon, which is executable by the processor to generate instructions such that the processor performs the image processing method described in the above method embodiments according to the instructions.
Optionally, the memory may also have data stored therein. Alternatively, the processor may also read data stored in the memory, which may be stored at the same memory address as the program, or which may be stored at a different memory address than the program.
For example, the memory may be used to store a related program of the image processing method provided in the embodiment of the present application, and the processor may be used to call the related program of the image processing method stored in the memory to implement the image processing method of the embodiment of the present application. For example, in the case that a first image is displayed on a display screen of the electronic device, a first detection image is acquired, and a user watching the first image is recorded in the first detection image; processing the first detection image to obtain a first target position in the first image; amplifying the first image by taking the first target position as the center to obtain a second image; and displaying the second image by using the display screen.
The chip may be provided in an electronic device. The first detection image is acquired, which is also understood to be acquired by means of a sensor in the electronic device or by controlling the sensor to acquire the first detection image.
The present application also provides a computer program product which, when executed by a processor, implements the image processing method according to any of the method embodiments of the present application.
The computer program product may be stored in a memory, for example, as a program that is ultimately converted into an executable object file that can be executed by a processor through preprocessing, compiling, assembling, and linking processes.
The present application also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a computer, implements the image processing method according to any of the method embodiments of the present application. The computer program may be a high-level language program or an executable object program.
The computer-readable storage medium may be, for example, a memory 1302. The memory 1302 may be volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory. The nonvolatile memory may be a read-only memory (read-only memory, ROM), a programmable ROM (programmable ROM, PROM), an erasable PROM (erasable PROM, EPROM), an electrically erasable PROM (electrically EPROM, EEPROM), or a flash memory. The volatile memory may be a random access memory (random access memory, RAM), which acts as an external cache. By way of example, and not limitation, many forms of RAM are available, such as static RAM (static RAM, SRAM), dynamic RAM (dynamic RAM, DRAM), synchronous DRAM (synchronous DRAM, SDRAM), double data rate SDRAM (double data rate SDRAM, DDR SDRAM), enhanced SDRAM (enhanced SDRAM, ESDRAM), synchlink DRAM (synchlink DRAM, SLDRAM), and direct rambus RAM (direct rambus RAM, DR RAM).
Embodiments of the present application may relate to the use of user data, and in practical applications, user-specific personal data may be used in the schemes described herein within the scope allowed by applicable laws and regulations under conditions that meet applicable legal and regulatory requirements of the country where the application is located (e.g., the user explicitly agrees, practical notification to the user, etc.).
In the description of the present application, the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance, as well as a particular order or sequence. The specific meaning of the terms in this application will be understood by those of ordinary skill in the art in a specific context.
In the present application, "at least one" means one or more, and "a plurality" means two or more. "at least one of" or the like means any combination of these items, including any combination of single item(s) or plural items(s). For example, at least one (one) of a, b, or c may represent: a, b, c, a-b, a-c, b-c, or a-b-c, wherein a, b, c may be single or plural.
It should be understood that, in the various embodiments of the present application, the sequence numbers of the foregoing processes do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, and are not repeated herein.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the device embodiments described above are merely illustrative: the division of the units is only a division by logical function, and other division manners may be used in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections via some interfaces, devices, or units, and may be in electrical, mechanical, or other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purposes of the solutions of the embodiments.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
The foregoing is merely specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily think about changes or substitutions within the technical scope of the present application, and the changes and substitutions are intended to be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (13)

1. An image processing method, applied to an electronic device, comprising:
under the condition that a display screen of the electronic device displays a first image, acquiring a first detection image, wherein a user watching the first image is recorded in the first detection image;
processing the first detection image to obtain a first target position in the first image;
amplifying the first image by taking the first target position as the center to obtain a second image;
and displaying the second image by using the display screen.
2. The method of claim 1, wherein prior to the zooming in on the first image centered on the first target location, the method further comprises:
determining target distances among a plurality of target nodes of the face of the user according to a proportion indication image, wherein the proportion indication image is an image recorded with the user and acquired under the condition that the display screen displays the first image;
determining a target amplification ratio corresponding to the target distance according to the corresponding relation between the distance and the amplification ratio;
the amplifying the first image with the first target position as the center includes: and amplifying the first image by taking the first target position as the center according to the target amplification ratio so as to obtain the second image.
3. The method of claim 1, wherein the second image is obtained by enlarging a first target area in the first image according to a preset enlargement scale; the method further comprises the steps of:
under the condition that the display screen of the electronic device displays the second image, acquiring a second detection image, wherein the second detection image records the user;
processing the second detection image to obtain a second target position in the second image;
amplifying a second target area in the second image by taking the second target position as a center to obtain a third image;
and displaying the third image by using the display screen.
4. A method according to claim 3, wherein, before the zooming in on the first image centered on the first target location, the method further comprises:
determining target distances among a plurality of target nodes of the face of the user according to a proportion indication image, wherein the proportion indication image is an image recorded with the user and acquired under the condition that the display screen displays the first image;
determining a target amplification ratio corresponding to the target distance according to the corresponding relation between the distance and the amplification ratio;
the amplifying the first image with the first target position as the center to obtain a second image includes: and amplifying the first image according to the preset amplification ratio under the condition that the target amplification ratio is larger than or equal to the preset amplification ratio so as to obtain the second image.
5. The method of any of claims 1-4, wherein the first detected image is acquired by a first camera in the electronic device, the first camera being oriented in the same direction as the display screen, the first target location being a location at which the user gazes.
6. The method of claim 5, wherein processing the first detected image to obtain a first target location in the first image comprises:
processing the first detection image by using a gaze point estimation model to obtain a first initial position, wherein the gaze point estimation model is a neural network model obtained through training;
and determining the first target position corresponding to the first initial position according to the corresponding relation between the initial position and the target position, wherein the corresponding relation is determined by a plurality of second initial positions and gaze positions corresponding to each second initial position, the plurality of second initial positions are obtained by respectively processing a plurality of user gaze images by using the gaze point estimation model, and the gaze positions corresponding to each second initial position are positions on the display screen, which are watched by the user when the first camera acquires the user gaze images for obtaining the second initial positions.
7. The method of any of claims 1-4, wherein the first image is a preview image acquired in real time by a second camera in the electronic device;
the amplifying the first image with the first target position as the center to obtain a second image includes: zooming is carried out by taking the first target position as the center, and the second image is a zoomed preview image.
8. The method of claim 7, wherein the first detection image is acquired by a sensor in the electronic device, a first orientation of the sensor being different from a second orientation of the second camera.
9. The method of any of claims 1-4, wherein the zooming in on the first image to obtain a second image comprises: amplifying the first image to obtain the second image under the condition that the user behavior of the user recorded in the user behavior image is preset amplifying behavior, wherein the user behavior image is an image recorded with the user and collected under the condition that the display screen displays the first image.
10. A method according to any one of claims 1-3, wherein prior to said displaying the second image with the display screen, the method further comprises: and displaying an image area by using the display screen, wherein the image area is an area where the second image is located in the first image.
11. An electronic device comprising a processor and a memory, the memory for storing a computer program, the processor for calling and running the computer program from the memory, causing the electronic device to perform the method of any one of claims 1 to 10.
12. A chip comprising a processor and a data interface, the processor being configured to execute instructions read from a memory via the data interface to implement the method of any one of claims 1 to 10.
13. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program for implementing the method of any one of claims 1 to 10.
CN202311800464.6A 2023-12-26 2023-12-26 Image processing method and electronic equipment Pending CN117472256A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311800464.6A CN117472256A (en) 2023-12-26 2023-12-26 Image processing method and electronic equipment

Publications (1)

Publication Number Publication Date
CN117472256A true CN117472256A (en) 2024-01-30

Family

ID=89625932

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311800464.6A Pending CN117472256A (en) 2023-12-26 2023-12-26 Image processing method and electronic equipment

Country Status (1)

Country Link
CN (1) CN117472256A (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103902179A (en) * 2012-12-28 2014-07-02 华为技术有限公司 Interaction method and device
US20150362998A1 (en) * 2014-06-17 2015-12-17 Amazon Technologies, Inc. Motion control for managing content
CN108171155A (en) * 2017-12-26 2018-06-15 上海展扬通信技术有限公司 A kind of image-scaling method and terminal
CN111510630A (en) * 2020-04-24 2020-08-07 Oppo广东移动通信有限公司 Image processing method, device and storage medium
CN112799508A (en) * 2021-01-18 2021-05-14 Oppo广东移动通信有限公司 Display method and device, electronic equipment and storage medium
US20210294413A1 (en) * 2020-03-23 2021-09-23 The Regents Of The University Of California Gaze-contingent screen magnification control
CN113572956A (en) * 2021-06-25 2021-10-29 荣耀终端有限公司 Focusing method and related equipment
CN115209057A (en) * 2022-08-19 2022-10-18 荣耀终端有限公司 Shooting focusing method and related electronic equipment
CN116048244A (en) * 2022-07-29 2023-05-02 荣耀终端有限公司 Gaze point estimation method and related equipment
CN117278839A (en) * 2022-06-13 2023-12-22 荣耀终端有限公司 Shooting method, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
JP7089106B2 (en) Image processing methods and equipment, electronic devices, computer-readable storage media and computer programs
CN113747085B (en) Method and device for shooting video
US11914850B2 (en) User profile picture generation method and electronic device
CN110544272B (en) Face tracking method, device, computer equipment and storage medium
WO2020019873A1 (en) Image processing method and apparatus, terminal and computer-readable storage medium
CN111738122B (en) Image processing method and related device
KR20150003591A (en) Smart glass
CN111553846B (en) Super-resolution processing method and device
CN112036331A (en) Training method, device and equipment of living body detection model and storage medium
CN111400605A (en) Recommendation method and device based on eyeball tracking
CN113938602B (en) Image processing method, electronic device, chip and readable storage medium
US20230224574A1 (en) Photographing method and apparatus
EP4216563A1 (en) Photographing method and electronic device
WO2024021742A9 (en) Fixation point estimation method and related device
CN113536866A (en) Character tracking display method and electronic equipment
CN111432245B (en) Multimedia information playing control method, device, equipment and storage medium
CN113821658A (en) Method, device and equipment for training encoder and storage medium
CN115702443A (en) Applying stored digital makeup enhancements to recognized faces in digital images
CN116433696B (en) Matting method, electronic equipment and computer readable storage medium
CN117472256A (en) Image processing method and electronic equipment
CN115185430B (en) User interface for managing media
CN113743186A (en) Medical image processing method, device, equipment and storage medium
EP4258649A1 (en) Method for determining tracking target, and electronic device
EP4329320A1 (en) Method and apparatus for video playback
CN117764853A (en) Face image enhancement method and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination