CN113452895A - Shooting method and equipment - Google Patents

Shooting method and equipment

Info

Publication number
CN113452895A
Authority
CN
China
Prior art keywords
image
super
electronic device
resolution reconstruction
reconstruction network
Prior art date
Legal status
Pending
Application number
CN202010225415.4A
Other languages
Chinese (zh)
Inventor
邵保泰
刘子鸾
赵智彪
王军
刘星
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN202010225415.4A
Publication of CN113452895A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N 23/60 Control of cameras or camera modules
    • H04N 23/63 Control of cameras or camera modules by using electronic viewfinders
    • H04N 23/631 Graphical user interfaces [GUI] specially adapted for controlling image capture or setting capture parameters
    • H04N 23/632 Graphical user interfaces [GUI] for displaying or modifying preview images prior to image capturing, e.g. variety of image resolutions or capturing parameters
    • H04N 23/667 Camera operation mode switching, e.g. between still and video, sport and normal or high- and low-resolution modes
    • H04N 23/67 Focus control based on electronic image sensor signals
    • H04N 23/80 Camera processing pipelines; Components thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00 Geometric image transformations in the plane of the image
    • G06T 3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T 3/4046 Scaling of whole images or parts thereof using neural networks
    • G06T 3/4053 Scaling of whole images or parts thereof based on super-resolution, i.e. the output image resolution being higher than the sensor resolution

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Telephone Function (AREA)

Abstract

The application provides a shooting method and device, relating to the field of artificial intelligence and, in particular, to computer vision. The method includes: constructing training data sets for different types of subjects, designing and training a super-resolution reconstruction network, and optimizing images with the super-resolution reconstruction network. With the method and device, the resolution of images captured by an electronic device at high magnification (for example, 10x and above) can be improved, improving the imaging effect.

Description

Shooting method and equipment
Technical Field
The present application relates to the field of image capturing technologies, and in particular, to a capturing method and device.
Background
Zoom capability is one measure of an electronic device's shooting capability. Zoom techniques are generally used to photograph a subject that is far from the electronic device, so that the subject appears enlarged or reduced in the image. In the prior art, the imaging quality of an electronic device that uses digital zoom for high-magnification shooting is poor.
Disclosure of Invention
The application aims to provide a shooting method and device that can effectively improve imaging resolution when an electronic device uses digital zoom for high-magnification shooting.
The above and other objects are achieved by the features of the independent claims. Further implementations are presented in the dependent claims, the description and the drawings.
In one possible design, the electronic device opens a first application in response to a first user input; enters a first shooting mode and displays a first image in response to a second user input; detects a first object in the first image; outputs a second image in response to a third user input; and applies a super-resolution reconstruction network to the first object in the second image.
For example, the mobile phone opens the camera application in response to the user tapping the camera icon. In response to the user dragging the magnification control, the phone enters a high-magnification shooting mode and displays a preview image in the viewfinder interface. The phone detects that the preview image contains a portrait. When the phone detects the user tapping the shutter control, it outputs the captured image to the gallery and applies a super-resolution reconstruction network to that image to obtain a high-resolution result. A rough sketch of this flow is given below.
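The following Python-style sketch only illustrates this flow; the callables (detect_objects, select_sr_network, save_to_gallery) and the 10x threshold are assumptions made for the sketch, not APIs or values defined by this application.

```python
def on_shutter_pressed(preview_image, magnification,
                       detect_objects, select_sr_network, save_to_gallery):
    """Illustrative capture-and-enhance flow; all callables are assumed, not specified here."""
    image = save_to_gallery(preview_image)            # output the second image to the gallery
    if magnification < 10:                            # super-resolution targets high magnification
        return image
    for subject in detect_objects(preview_image):     # e.g. "portrait", "text", "building"
        network = select_sr_network(subject)          # dedicated or universal SR network
        image = network.reconstruct(image, region=subject)
    return image
```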
In one possible design, the first shooting mode is a mode in which the electronic device shoots at a magnification of 10x or more.
For example, a mobile phone uses optical zoom at low magnification (5x or less), hybrid zoom at medium magnification (5x to 10x), and digital zoom at high magnification (10x or more), and the imaging quality of digital zoom is poor. The shooting method provided by the application can be applied at any magnification, but the effect is most pronounced at high magnification, where the resolution of the captured image can be noticeably improved.
In one possible design, the first object is at least one of: a portrait, text, a building, the moon, a beach, blue sky, a green plant, or a general object.
In one possible design, the super-resolution reconstruction network includes at least one of: a portrait super-resolution reconstruction network, a text super-resolution reconstruction network, a building super-resolution reconstruction network, a moon super-resolution reconstruction network, a beach super-resolution reconstruction network, a blue sky super-resolution reconstruction network, a green plant super-resolution reconstruction network, or a universal super-resolution reconstruction network.
It can be understood that when shooting, users want common, distinctive subjects to be optimized: portraits, text, buildings, the moon, beaches, blue sky, green plants and the like are common subjects with distinctive characteristics, whereas a general object is a subject without such distinctive characteristics. With this design, distinctive subjects can be optimized specifically, meeting users' needs for imaging optimization of common subjects. Dedicated super-resolution reconstruction networks are therefore built for distinctive subjects, and a universal super-resolution reconstruction network is built for subjects without distinctive characteristics, which improves the processing effect.
In one possible design, the training data set of the super-resolution reconstruction network includes at least one set of training data, the training data includes a first training image and a second training image, the first training image is captured by a single lens reflex camera, and the second training image is captured by a mobile phone.
With this design, the super-resolution reconstruction network trained on this training data set is better suited to images shot by a mobile phone.
In one possible design, processing the training data set further includes: classifying the training data set according to the first object; converting the format of the training data; registering the training data; and using the training data set to train the super-resolution reconstruction network.
By converting the training data from RGB format to YUV format and feeding only the Y-channel content into the network for training, the training efficiency of the super-resolution reconstruction network can be improved.
In one possible design, before the electronic device outputs the second image in response to the third user input, the method further includes: detecting a second object in the first image.
In one possible design, after the electronic device outputs the second image in response to the third user input, the method further includes: applying the super-resolution reconstruction network to the first object of the second image; or applying the super-resolution reconstruction network to the second object of the second image; or applying the super-resolution reconstruction network to both the first object and the second object of the second image.
With this design, the shooting method provided by the application can handle scenes that contain multiple distinctive subjects.
In one possible design, before the electronic device enters the first shooting mode in response to the second user input, the method further includes: entering a second shooting mode and displaying a third image in response to a fourth user input; starting to capture a first video in response to a fifth user input; ending the capture and outputting the first video in response to a sixth user input; detecting a third object in the first video; and applying a super-resolution reconstruction network to the third object in the first video.
For example, the fourth user input is a swipe gesture on the shooting mode control together with an operation that adjusts the magnification control; the phone then enters a high-magnification recording mode and displays a preview image. When the phone detects the user tapping the record control, it starts recording video. After a period of time, the phone detects the user tapping the record control again, stops recording and outputs the video to the gallery. The phone then applies a super-resolution reconstruction network to the video in the gallery.
In one possible design, the present application provides an electronic device comprising: one or more processors; one or more memories; wherein the one or more memories store one or more computer programs comprising instructions that, when executed by the one or more processors, cause the electronic device to perform a photography method provided herein.
In one possible design, the present application provides a computer-readable storage medium comprising a computer program which, when run on an electronic device, causes the electronic device to perform the method according to any one of claims 1 to 9.
In one possible design, the present application provides a program product including instructions that, when executed on a computer, cause the computer to perform a photographing method provided by the present application.
In one possible design, the present application provides a graphical user interface on an electronic device, wherein the electronic device has one or more memories and one or more processors to execute one or more computer programs stored in the one or more memories, and the graphical user interface includes a graphical user interface displayed when the electronic device performs one of the photographing methods provided by the present application.
Drawings
Fig. 1 shows a hardware structure diagram of an electronic device provided in an embodiment of the present application;
fig. 2 shows a software structure diagram of an electronic device provided in an embodiment of the present application;
FIG. 3 is a schematic diagram illustrating a user interface of an electronic device according to an embodiment of the present application;
FIG. 4A is a schematic diagram illustrating a gesture applied to a user interface of an electronic device according to an embodiment of the present application;
FIG. 4B is a schematic view showing an interface of the electronic device provided by the embodiment of the application when shooting;
FIGS. 5A-5D illustrate schematic diagrams of interfaces for processing a portrait using a super-resolution reconstruction network provided by an embodiment of the present application;
FIGS. 6A-6D show schematic diagrams of interfaces for processing text using a super-resolution reconstruction network provided by an embodiment of the application;
FIGS. 7A-7D illustrate schematic diagrams of interfaces for processing a building using a super-resolution reconstruction network according to an embodiment of the application;
FIG. 8 is a diagram illustrating an interface of a gallery provided by an embodiment of the present application;
FIG. 9 is a schematic diagram illustrating an interface of an electronic device using a super-resolution reconstruction network when recording is performed according to an embodiment of the present application;
FIG. 10 shows a training flow diagram of a super-resolution reconstruction network provided by an embodiment of the present application;
FIG. 11 is a flowchart illustrating a super-resolution reconstruction method according to an embodiment of the present application;
FIG. 12 is a flowchart illustrating another super-resolution reconstruction method according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described in detail below with reference to the drawings in the following embodiments of the present application.
Zoom capability is one measure of an electronic device's shooting capability. Zoom techniques are generally used to photograph a subject that is far from the electronic device, so that the subject appears enlarged or reduced in the image. Common zoom techniques include optical zoom, digital zoom and hybrid zoom. Optical zoom enlarges the subject by changing the focal length of the lens, which has little impact on imaging quality. Digital zoom does not change the focal length; instead it enlarges the subject by selecting the pixels of the region to be enlarged and applying methods such as interpolation or pixel magnification, which significantly degrades imaging quality. Hybrid zoom combines the two techniques, with image quality between that of optical zoom and digital zoom.
It can be understood that the distance between the subject and the electronic device, and the magnification required by the electronic device, influence the choice of zoom mode to some extent. In some embodiments, optical zoom is typically used at low magnification (for example, 1-5x), giving better imaging. At higher magnification, optical zoom is unsuitable for some electronic devices such as mobile phones, tablet computers and wearable devices, because the required lens module is too bulky. At medium magnifications (for example, 5-15x), these devices optionally use hybrid zoom. At high magnifications (for example, greater than 15x), these devices optionally use digital zoom. A rough sketch of this selection follows.
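As a rough illustration only, the zoom-path choice described above could be expressed as follows; the thresholds simply mirror the exemplary ranges in this paragraph and are not fixed by the application.

```python
def choose_zoom_mode(magnification: float) -> str:
    """Pick a zoom path from the requested magnification (exemplary thresholds only)."""
    if magnification <= 5:
        return "optical"   # low magnification: optical zoom, best imaging quality
    if magnification <= 15:
        return "hybrid"    # medium magnification: hybrid zoom
    return "digital"       # high magnification: digital zoom, where SR reconstruction helps most
```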
The application provides a shooting method and device that can achieve a good imaging effect when an electronic device performs high-magnification telephoto shooting. The method can be applied to electronic devices such as mobile phones, tablet computers, wearable devices (e.g., watches, bracelets, helmets, earphones, necklaces), vehicle-mounted devices, augmented reality (AR)/virtual reality (VR) devices, notebook computers, ultra-mobile personal computers (UMPCs), netbooks and personal digital assistants (PDAs), where the electronic device includes at least one camera. The embodiments of the present application do not limit the specific type of electronic device.
Fig. 1 shows a schematic structural diagram of an electronic device 100. As shown in fig. 1, the electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a Universal Serial Bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, a sensor module 180, a button 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a Subscriber Identity Module (SIM) card interface 195, and the like.
Processor 110 may include one or more processing units, such as: the processor 110 may include an Application Processor (AP), a modem processor, a Graphics Processing Unit (GPU), an Image Signal Processor (ISP), a controller, a memory, a video codec, a Digital Signal Processor (DSP), a baseband processor, and/or a neural-Network Processing Unit (NPU), etc. The different processing units may be separate devices or may be integrated into one or more processors. The controller may be, among other things, a neural center and a command center of the electronic device 100. The controller can generate an operation control signal according to the instruction operation code and the timing signal to complete the control of instruction fetching and instruction execution. A memory may also be provided in processor 110 for storing instructions and data. In some embodiments, the memory in the processor 110 is a cache memory. The memory may hold instructions or data that have just been used or recycled by the processor 110. If the processor 110 needs to reuse the instruction or data, it can be called directly from the memory. Avoiding repeated accesses reduces the latency of the processor 110, thereby increasing the efficiency of the system.
The USB interface 130 is an interface conforming to the USB standard specification, and may specifically be a Mini USB interface, a Micro USB interface, a USB Type C interface, or the like. The USB interface 130 may be used to connect a charger to charge the electronic device 100, and may also be used to transmit data between the electronic device 100 and a peripheral device. The charging management module 140 is configured to receive charging input from a charger. The power management module 141 is used to connect the battery 142, the charging management module 140 and the processor 110. The power management module 141 receives input from the battery 142 and/or the charge management module 140 and provides power to the processor 110, the internal memory 121, the external memory, the display 194, the camera 193, the wireless communication module 160, and the like.
The wireless communication function of the electronic device 100 may be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, a modem processor, a baseband processor, and the like. The antennas 1 and 2 are used for transmitting and receiving electromagnetic wave signals. Each antenna in the electronic device 100 may be used to cover a single or multiple communication bands. Different antennas can also be multiplexed to improve the utilization of the antennas. For example: the antenna 1 may be multiplexed as a diversity antenna of a wireless local area network. In other embodiments, the antenna may be used in conjunction with a tuning switch.
The mobile communication module 150 may provide a solution including 2G/3G/4G/5G wireless communication applied to the electronic device 100. The mobile communication module 150 may include at least one filter, a switch, a power amplifier, a Low Noise Amplifier (LNA), and the like. The mobile communication module 150 may receive the electromagnetic wave from the antenna 1, filter, amplify, etc. the received electromagnetic wave, and transmit the electromagnetic wave to the modem processor for demodulation. The mobile communication module 150 may also amplify the signal modulated by the modem processor, and convert the signal into electromagnetic wave through the antenna 1 to radiate the electromagnetic wave. In some embodiments, at least some of the functional modules of the mobile communication module 150 may be disposed in the processor 110. In some embodiments, at least some of the functional modules of the mobile communication module 150 may be disposed in the same device as at least some of the modules of the processor 110.
The wireless communication module 160 may provide a solution for wireless communication applied to the electronic device 100, including Wireless Local Area Networks (WLANs) (e.g., wireless fidelity (Wi-Fi) networks), bluetooth (bluetooth, BT), Global Navigation Satellite System (GNSS), Frequency Modulation (FM), Near Field Communication (NFC), Infrared (IR), and the like. The wireless communication module 160 may be one or more devices integrating at least one communication processing module. The wireless communication module 160 receives electromagnetic waves via the antenna 2, performs frequency modulation and filtering processing on electromagnetic wave signals, and transmits the processed signals to the processor 110. The wireless communication module 160 may also receive a signal to be transmitted from the processor 110, perform frequency modulation and amplification on the signal, and convert the signal into electromagnetic waves through the antenna 2 to radiate the electromagnetic waves.
In some embodiments, antenna 1 of electronic device 100 is coupled to mobile communication module 150 and antenna 2 is coupled to wireless communication module 160 so that electronic device 100 can communicate with networks and other devices through wireless communication techniques. The wireless communication technology may include global system for mobile communications (GSM), General Packet Radio Service (GPRS), code division multiple access (code division multiple access, CDMA), Wideband Code Division Multiple Access (WCDMA), time-division code division multiple access (time-division code division multiple access, TD-SCDMA), long term evolution (long term evolution, LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technologies, etc. The GNSS may include a Global Positioning System (GPS), a global navigation satellite system (GLONASS), a beidou navigation satellite system (BDS), a quasi-zenith satellite system (QZSS), and/or a Satellite Based Augmentation System (SBAS).
The display screen 194 is used to display the display interface of an application, such as the viewfinder interface of the camera application. The display screen 194 includes a display panel. The display panel may be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini-LED, a Micro-LED, a Micro-OLED, a quantum dot light-emitting diode (QLED), or the like. In some embodiments, the electronic device 100 may include 1 or N display screens 194, where N is a positive integer greater than 1.
The electronic device 100 may implement a shooting function through the ISP, the camera 193, the video codec, the GPU, the display 194, the application processor, and the like.
The ISP is used to process the data fed back by the camera 193. For example, when a photo is taken, the shutter is opened, light is transmitted to the camera photosensitive element through the lens, the optical signal is converted into an electrical signal, and the camera photosensitive element transmits the electrical signal to the ISP for processing and converting into an image visible to naked eyes. The ISP can also carry out algorithm optimization on the noise, brightness and skin color of the image. The ISP can also optimize parameters such as exposure, color temperature and the like of a shooting scene. In some embodiments, the ISP may be provided in camera 193.
The camera 193 is used to capture still images or video. The object generates an optical image through the lens and projects the optical image to the photosensitive element. The photosensitive element may be a Charge Coupled Device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. The light sensing element converts the optical signal into an electrical signal, which is then passed to the ISP where it is converted into a digital image signal. And the ISP outputs the digital image signal to the DSP for processing. The DSP converts the digital image signal into image signal in standard RGB, YUV and other formats. In some embodiments, the electronic device 100 may include 1 or N cameras 193, N being a positive integer greater than 1.
The digital signal processor is used for processing digital signals, and can process digital image signals and other digital signals. For example, when the electronic device 100 selects a frequency bin, the digital signal processor is used to perform fourier transform or the like on the frequency bin energy.
Video codecs are used to compress or decompress digital video. The electronic device 100 may support one or more video codecs. In this way, the electronic device 100 may play or record video in a variety of encoding formats, such as: moving Picture Experts Group (MPEG) 1, MPEG2, MPEG3, MPEG4, and the like.
The NPU is a neural-network (NN) computing processor that processes input information quickly by using a biological neural network structure, for example, by using a transfer mode between neurons of a human brain, and can also learn by itself continuously. Applications such as intelligent recognition of the electronic device 100 can be realized through the NPU, for example: image recognition, face recognition, speech recognition, text understanding, and the like.
The internal memory 121 may be used to store computer-executable program code, which includes instructions. The processor 110 executes various functional applications and data processing of the electronic device 100 by executing the instructions stored in the internal memory 121. The internal memory 121 may include a program storage area and a data storage area. The program storage area may store an operating system, software code of at least one application program (e.g., Huawei Video, wallet, etc.), and the like. The data storage area may store data (e.g., captured images, recorded videos, etc.) generated during use of the electronic device 100. In addition, the internal memory 121 may include a high-speed random access memory, and may further include a nonvolatile memory, such as at least one magnetic disk storage device, a flash memory device, a universal flash storage (UFS), and the like.
The external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, to extend the memory capability of the electronic device. The external memory card communicates with the processor 110 through the external memory interface 120 to implement a data storage function. For example, files such as pictures, videos, and the like are saved in an external memory card.
The electronic device 100 may implement audio functions via the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the headphone interface 170D, and the application processor. Such as music playing, recording, etc.
The sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and the like.
The pressure sensor 180A is used for sensing a pressure signal, and converting the pressure signal into an electrical signal. In some embodiments, the pressure sensor 180A may be disposed on the display screen 194. The gyro sensor 180B may be used to determine the motion attitude of the electronic device 100. In some embodiments, the angular velocity of electronic device 100 about three axes (i.e., the x, y, and z axes) may be determined by gyroscope sensor 180B.
The gyro sensor 180B may also be used for image stabilization when shooting. The air pressure sensor 180C is used to measure air pressure. In some embodiments, the electronic device 100 calculates altitude from the barometric pressure measured by the air pressure sensor 180C to assist positioning and navigation. The magnetic sensor 180D includes a Hall sensor. The electronic device 100 may use the magnetic sensor 180D to detect the opening and closing of a flip holster. In some embodiments, when the electronic device 100 is a flip phone, it can detect the opening and closing of the flip cover with the magnetic sensor 180D, and then set features such as automatic unlocking of the flip cover according to the detected open or closed state of the holster or flip cover. The acceleration sensor 180E may detect the magnitude of acceleration of the electronic device 100 in various directions (typically along three axes), and can detect the magnitude and direction of gravity when the electronic device 100 is stationary. It can also be used to recognize the posture of the electronic device 100 and is applied to landscape/portrait switching, pedometers and other applications.
The distance sensor 180F is used to measure distance. The electronic device 100 may measure distance by infrared or laser. In some embodiments, in a shooting scene, the electronic device 100 may use the distance sensor 180F to measure distance for fast focusing. The proximity light sensor 180G may include, for example, a light emitting diode (LED) and a light detector such as a photodiode. The light emitting diode may be an infrared LED. The electronic device 100 emits infrared light outward through the LED and uses the photodiode to detect infrared light reflected from nearby objects. When sufficient reflected light is detected, the electronic device 100 can determine that there is an object nearby; when insufficient reflected light is detected, it can determine that there is no object nearby. The electronic device 100 can use the proximity light sensor 180G to detect that the user is holding the phone close to the ear during a call, so that the screen is automatically turned off to save power. The proximity light sensor 180G can also be used in holster mode and pocket mode to automatically unlock and lock the screen.
The ambient light sensor 180L is used to sense the ambient light level. Electronic device 100 may adaptively adjust the brightness of display screen 194 based on the perceived ambient light level. The ambient light sensor 180L may also be used to automatically adjust the white balance when taking a picture. The ambient light sensor 180L may also cooperate with the proximity light sensor 180G to detect whether the electronic device 100 is in a pocket to prevent accidental touches. The fingerprint sensor 180H is used to collect a fingerprint. The electronic device 100 can utilize the collected fingerprint characteristics to unlock the fingerprint, access the application lock, photograph the fingerprint, answer an incoming call with the fingerprint, and so on.
The temperature sensor 180J is used to detect temperature. In some embodiments, electronic device 100 implements a temperature processing strategy using the temperature detected by temperature sensor 180J. For example, when the temperature reported by the temperature sensor 180J exceeds a threshold, the electronic device 100 performs a reduction in performance of a processor located near the temperature sensor 180J, so as to reduce power consumption and implement thermal protection. In other embodiments, the electronic device 100 heats the battery 142 when the temperature is below another threshold to avoid the low temperature causing the electronic device 100 to shut down abnormally. In other embodiments, when the temperature is lower than a further threshold, the electronic device 100 performs boosting on the output voltage of the battery 142 to avoid abnormal shutdown due to low temperature.
The touch sensor 180K is also referred to as a "touch panel". The touch sensor 180K may be disposed on the display screen 194, and the touch sensor 180K and the display screen 194 form a touch screen, which is also called a "touch screen". The touch sensor 180K is used to detect a touch operation applied thereto or nearby. The touch sensor can communicate the detected touch operation to the application processor to determine the touch event type. Visual output associated with the touch operation may be provided through the display screen 194. In other embodiments, the touch sensor 180K may be disposed on a surface of the electronic device 100, different from the position of the display screen 194.
The bone conduction sensor 180M may acquire a vibration signal. In some embodiments, the bone conduction sensor 180M may acquire a vibration signal of the human vocal part vibrating the bone mass. The bone conduction sensor 180M may also contact the human pulse to receive the blood pressure pulsation signal.
The keys 190 include a power-on key, a volume key, and the like. The keys 190 may be mechanical keys. Or may be touch keys. The electronic apparatus 100 may receive a key input, and generate a key signal input related to user setting and function control of the electronic apparatus 100. The motor 191 may generate a vibration cue. The motor 191 may be used for incoming call vibration cues, as well as for touch vibration feedback. For example, touch operations applied to different applications (e.g., photographing, audio playing, etc.) may correspond to different vibration feedback effects. The touch vibration feedback effect may also support customization. Indicator 192 may be an indicator light that may be used to indicate a state of charge, a change in charge, or a message, missed call, notification, etc. The SIM card interface 195 is used to connect a SIM card. The SIM card may be brought into and out of contact with the electronic device 100 by being inserted into the SIM card interface 195 or being pulled out of the SIM card interface 195.
It is to be understood that the components shown in fig. 1 are not to be construed as specifically limiting for electronic device 100, and that electronic device 100 may include more or fewer components than shown, or some components may be combined, or some components may be split, or a different arrangement of components. In addition, the combination/connection relationship between the components in fig. 1 may also be modified.
Fig. 2 shows a block diagram of a software structure of the electronic device 100 according to an embodiment of the present application. As shown in fig. 2, the software structure of the electronic device 100 may be a layered architecture, for example, the software may be divided into several layers, each layer having a clear role and division of labor. The layers communicate with each other through a software interface. In some embodiments, the Android system is divided into five layers, which are an application Layer, an application framework Layer (FWK), a system library and an Android runtime (Android runtime), a Hardware Abstraction Layer (HAL), and a kernel Layer from top to bottom.
The application layer may include a series of application packages. As shown in fig. 2, the application layer may include a camera, settings, a skin module, a user interface (UI), third-party applications, and the like. The third-party applications may include a gallery, a calendar, calls, maps, navigation, WLAN, Bluetooth, Huawei Music, Huawei Video, messaging, and the like.
The application framework layer provides an Application Programming Interface (API) and a programming framework for the application program of the application layer. The application framework layer may include some predefined functions. As shown in FIG. 2, the application framework layers may include a window manager, content provider, view system, phone manager, resource manager, notification manager, and the like.
The window manager is used for managing window programs. The window manager can obtain the size of the display screen, judge whether a status bar exists, lock the screen, intercept the screen and the like. The content provider is used to store and retrieve data and make it accessible to applications. The data may include video, images, audio, calls made and received, browsing history and bookmarks, phone books, etc.
The view system includes visual controls such as controls to display text, controls to display pictures, and the like. The view system may be used to build applications. The display interface may be composed of one or more views. For example, the display interface including the short message notification icon may include a view for displaying text and a view for displaying pictures.
The phone manager is used to provide communication functions of the electronic device 100. Such as management of call status (including on, off, etc.).
The resource manager provides various resources for the application, such as localized strings, icons, pictures, layout files, video files, and the like.
The notification manager enables the application to display notification information in the status bar, can be used to convey notification-type messages, can disappear automatically after a short dwell, and does not require user interaction. Such as a notification manager used to inform download completion, message alerts, etc. The notification manager may also be a notification that appears in the form of a chart or scroll bar text at the top status bar of the system, such as a notification of a background running application, or a notification that appears on the screen in the form of a dialog window. Such as prompting for text information in the status bar, sounding a prompt tone, the electronic device 100 vibrating, flashing an indicator light, etc.
The Android runtime comprises a core library and a virtual machine. The Android runtime is responsible for scheduling and managing an Android system.
The core library consists of two parts: one part is the functions that the java language needs to call, and the other part is the core library of Android. The application layer and the application framework layer run in the virtual machine. The virtual machine executes the java files of the application layer and the application framework layer as binary files. The virtual machine is used to perform functions such as object life cycle management, stack management, thread management, security and exception management, and garbage collection.
The system library may include a plurality of functional modules. For example: surface managers (surface managers), media libraries (media libraries), three-dimensional graphics processing libraries (e.g., OpenGL ES), 2D graphics engines (e.g., SGL), and the like.
The surface manager is used to manage the display subsystem and provide fusion of 2D and 3D layers for multiple applications.
The media library supports a variety of commonly used audio, video format playback and recording, and still image files, among others. The media library may support a variety of audio-video encoding formats, such as MPEG4, h.264, MP3, AAC, AMR, JPG, PNG, and the like.
The three-dimensional graphic processing library is used for realizing three-dimensional graphic drawing, image rendering, composition, layer processing and the like.
The 2D graphics engine is a drawing engine for 2D drawing.
The hardware abstraction layer provides a standard interface for hardware of the hardware layer, and the hardware abstraction layer optionally comprises a display driver, a camera driver, an audio driver and a sensor driver.
The kernel layer is a layer between hardware and software. The kernel layer optionally includes a display driver, a camera driver, an audio driver and a sensor driver.
The hardware layer optionally includes various types of hardware devices, such as sensors for various functions provided in the sensor module 180.
When the sensors of the hardware layer are invoked on the electronic device 100, the sensor interface provided by the hardware abstraction layer is optionally accessed, and the sensor drivers provided by the kernel layer are optionally used.
The following describes exemplary work flows of software and hardware of the electronic device 100, with reference to a shooting method and device according to an embodiment of the present application.
When the touch sensor 180K receives a touch operation, a corresponding hardware interrupt is sent to the kernel layer. The kernel layer processes the touch operation into a raw input event (including touch coordinates, a time stamp of the touch operation, and other information). The raw input event is stored in the kernel layer. The application framework layer obtains the raw input event from the kernel layer and identifies the control corresponding to the event. Taking the example where the touch operation is a tap and the corresponding control is the camera application icon: the camera application calls an interface of the application framework layer to start the camera application, then starts the camera 193 by calling the kernel-layer driver/hardware abstraction layer interface, and captures a still image or video through the camera 193.
The following embodiments will take the electronic device 100 shown in fig. 1 or fig. 2 as an example of a mobile phone, and describe the technical solutions provided by the embodiments of the present application with reference to the drawings.
Fig. 3 illustrates an exemplary user interface 301 of a cell phone, which user interface 301 may be displayed on the display screen 194 of the cell phone, according to some embodiments.
In some embodiments, the user interface 301 optionally includes the following elements, or a subset or superset thereof:
● Elements fixed in the top status bar, used to indicate the device status, including: one or more signal strength indicators 303 (e.g., mobile network, Wi-Fi), the current time 305, and a battery level indicator 307 for indicating the charge level of the battery 142;
● The main interface and the application icons on it, including: clock 309, calendar 311, gallery 313, memo 315, file management 317, email 319, music 321, wallet 323, Huawei Video 325, sports health 327, weather 329, browser 331, wisdom life 333, settings 335, recorder 337, application mall 339;
● Application icons fixed at the bottom: camera 341, address book 343, phone 345, information 347;
● An indicator 349 showing the interface currently displayed.
It is understood that the main interface includes the clock 309, calendar 311, gallery 313, memo 315, file management 317, email 319, music 321, wallet 323, Huawei Video 325, sports health 327, weather 329, browser 331, wisdom life 333, settings 335, recorder 337 and application mall 339. When the user interface 301 displays another (non-main) interface of application icons, the indicator 349 of the current interface optionally points to the switched interface, while the icons fixed in the top status bar and the application icons fixed at the bottom optionally do not change.
It is understood that these icons are only examples of applications; the exemplary applications may be represented differently, and icons corresponding to other applications may also be present.
Fig. 4A shows an operational diagram of activating the camera 341 on a mobile phone. When the phone detects operation 401, in which the user taps the camera 341 in the user interface 301, it displays the user interface 403 shown in fig. 4B after the camera 341 is started.
The user interface 403 in fig. 4B includes a viewfinder interface 405, which is the viewfinder in photographing mode. At this point the camera 193 has been activated and presents the captured picture in the viewfinder interface 405.
The user interface 403 also includes a shooting mode control 407, which optionally includes aperture mode, night mode, portrait mode, photographing mode, video mode, professional mode, and more modes. The phone can receive a swipe gesture from the user on the mode control 407 to switch the shooting mode.
The user interface 403 further includes a shooting mode indicator 409 that indicates the phone's current shooting mode; the indicator 409 is optionally fixed. As indicated by indicator 409 in fig. 4B, the phone is currently in photographing mode.
The user interface 403 further includes a camera switching control 411, and when the mobile phone includes at least two cameras 193, the camera switching control 411 can receive a click operation of a user, and displays pictures acquired by other cameras 193 in the viewfinder interface 405.
The user interface 403 further includes a photographing control 413, and when the mobile phone detects that the user clicks the photographing control 413, the mobile phone generates a final image according to the preview image in the viewing interface 405. The final imaging is optionally viewed by clicking on the gallery control 415 in the user interface 403, and optionally also by clicking on the gallery application 313 of the user interface 301 in FIG. 3.
The user interface 403 also includes a magnification control 417 for displaying the magnification of the subject in the viewfinder interface 405. For example, magnification is denoted by x: 1x for 1-fold magnification, 2x for 2-fold magnification, 30x for 30-fold magnification.
In some embodiments, the magnification control 417 has only a display function and shows the magnification of the subject; the phone detects a pinch gesture applied by the user to the viewfinder interface 405 and adjusts the magnification of the subject accordingly.
In some embodiments, the magnification control 417 can detect a user tap to adjust the magnification; for example, the control displays 1x, and upon detecting a tap it switches to 2x, adjusting the magnification of the subject.
In some embodiments, the magnification control 417 detects a drag gesture by the user, presents a magnification display bar, and adjusts the magnification of the subject based on the position of the user's finger on the magnification display bar.
In some embodiments, the magnification control 417 may combine the above embodiments or adopt another alternative; likewise, magnification adjustment may combine the above embodiments or use an alternative not described here.
The user interface 403 also includes a scene recognition control 419 that applies a scene classification algorithm to identify the type of the subject and automatically adjusts the colour and brightness of the subject in the viewfinder interface 405 according to that type. An exemplary scene recognition control 419 of the present application is the Master AI feature provided by Huawei Technologies Co., Ltd., which can recognise scenes such as stage, beach, blue sky, green plants, text and portraits. In some embodiments, the scene recognition control 419 is always on. In other embodiments, the scene recognition control 419 can receive user operations to be enabled or disabled manually; it is optionally displayed in the user interface 403 (as in fig. 4B), optionally displayed in the settings control 421, and optionally shown in other user interfaces.
The user interface 403 also includes a settings control 421 for configuring the camera 341.
The present application provides a shooting method and device that use super-resolution imaging technology to enable the electronic device 100 to obtain a better imaging effect when performing telephoto (high-magnification) shooting.
1. Constructing the training data set for the super-resolution reconstruction network
In some embodiments, a common way to construct a training data set is to use a first image as the high-resolution image and process it to obtain a second, low-resolution image; the first and second images are then added to the training data set as one group. Optionally, the processing applies a degradation operator to the first image and superimposes noise/blur on the result. Common image degradation models include linear motion degradation, defocus degradation, Gaussian degradation, defocus blur, and the like.
Here the first image is optionally not a real captured photograph. A training data set obtained this way works to some extent for super-resolution reconstruction of non-real-shot images, but because real captured images are more complex, a super-resolution reconstruction network trained on non-real-shot data cannot handle the noise and blur of real captured images well: while it enriches image detail, it also produces more false textures, degrading imaging quality and the user's visual experience.
In other embodiments, the first image is a real shot, for example taken by a digital camera with a telephoto lens; the first image is processed with a degradation operator and noise/blur is superimposed to obtain the second image, and the pair is added to the training data set as one group of training data. This scheme improves, to some extent, the super-resolution reconstruction network's handling of real captured images and the user's visual experience. However, because the first image is captured by a digital camera with a telephoto lens, and the imaging pipeline and characteristics of a digital camera differ from those of a mobile phone, a super-resolution reconstruction network trained on such a data set has a limited effect on the phone's telephoto shots. A sketch of this degradation-based pair synthesis is given below.
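A minimal sketch of the degradation-based pair construction used by the two approaches above, assuming Gaussian blur, bicubic downsampling and additive Gaussian noise as the degradation and noise models (the application does not fix the specific operators):

```python
import cv2
import numpy as np

def synthesize_pair(high_res: np.ndarray, scale: int = 4) -> tuple:
    """Build one (high-res, low-res) training pair by simulated degradation."""
    # Degradation operator: Gaussian blur followed by bicubic downsampling.
    blurred = cv2.GaussianBlur(high_res, (5, 5), sigmaX=1.5)
    h, w = high_res.shape[:2]
    low_res = cv2.resize(blurred, (w // scale, h // scale),
                         interpolation=cv2.INTER_CUBIC)
    # Superimpose noise to mimic sensor noise in the low-resolution image.
    noise = np.random.normal(0.0, 2.0, low_res.shape)
    low_res = np.clip(low_res.astype(np.float32) + noise, 0, 255).astype(np.uint8)
    return high_res, low_res
```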
The present application proposes a method of obtaining a training data set in which the high-resolution image is taken by an electronic device with a large telephoto lens (e.g., a single-lens reflex or mirrorless camera) and the low-resolution image is taken by an electronic device without a large telephoto lens (e.g., a mobile phone) (step 1001). The high-resolution image and the low-resolution image are added to the training data set as one group of training data. A training data set obtained this way makes the trained super-resolution reconstruction network better suited to the phone's telephoto shooting.
The application also proposes using different super-resolution reconstruction networks for different types of subjects: a portrait super-resolution reconstruction network, a text super-resolution reconstruction network, a building super-resolution reconstruction network, a moon super-resolution reconstruction network and so on are used for special subjects such as portraits, text, buildings and the moon, respectively. Building a separate super-resolution reconstruction network for each subject type allows each subject to obtain a better, targeted reconstruction effect. For example, the portrait super-resolution reconstruction network is trained on a portrait data set and can focus on restoring facial features; the text super-resolution reconstruction network can restore and sharpen text and remove noise; and the building super-resolution reconstruction network can be trained well on arcs and straight-line contours. For subjects without a special type, reconstruction is optionally performed using a universal super-resolution reconstruction network, as sketched in the dispatch example below.
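A minimal dispatch sketch, assuming each subject type maps to its own trained model; the model names are illustrative assumptions, not components defined by the application.

```python
# Illustrative mapping from detected subject type to a dedicated SR network.
# The model file names are assumptions made for this sketch only.
SR_MODELS = {
    "portrait": "sr_portrait.pt",
    "text": "sr_text.pt",
    "building": "sr_building.pt",
    "moon": "sr_moon.pt",
}

def select_sr_model(subject_type: str) -> str:
    """Return the model for a distinctive subject, or the universal model otherwise."""
    return SR_MODELS.get(subject_type, "sr_universal.pt")
```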
In some embodiments, when training the text super-resolution reconstruction network, a better training data set can be obtained by using the text displayed on the display screen 194 of the electronic device 100 as the high-resolution image and the image of that text captured by the mobile phone as the low-resolution image. This makes the high-resolution image closer to the true content and can effectively improve the training effect.
The process of processing the training data set will be described below.
First, the images in the training data set are classified by subject type (step 1003): the classified images are placed into the training data set corresponding to their type, the training data are given normalised names, and each type of training data set is assigned a label. One possible layout is sketched below.
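One possible way to organise the classified pairs, assuming a folder-per-label layout and normalised file names (the layout is an assumption, not something the application specifies):

```python
import os
import shutil

def add_pair_to_dataset(root: str, label: str, index: int,
                        high_res_path: str, low_res_path: str) -> None:
    """Place one (high-res, low-res) pair under its subject-type label."""
    label_dir = os.path.join(root, label)          # e.g. dataset/portrait/
    os.makedirs(label_dir, exist_ok=True)
    # Normalised names: <label>_<index>_hr / _lr, keeping the original extension.
    ext = os.path.splitext(high_res_path)[1]
    shutil.copy(high_res_path, os.path.join(label_dir, f"{label}_{index:05d}_hr{ext}"))
    shutil.copy(low_res_path, os.path.join(label_dir, f"{label}_{index:05d}_lr{ext}"))
```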
Next, the image format of the training data is converted (step 1005). The high-resolution and low-resolution images obtained as described above are both in RGB format. To improve the training efficiency of the super-resolution network, the images in the training data set can be converted from RGB to YUV format, the Y-channel content extracted, and the extracted result added to the training data set as training data; the super-resolution reconstruction network is then trained on the Y-channel images. In actual use, the preview image in the viewfinder interface 405 is first converted from RGB to YUV, the Y-channel image is fed to the super-resolution reconstruction network, the network output is merged with the U and V channels to obtain a YUV image, and the YUV image is finally converted back to RGB for output. A minimal sketch of this round trip follows.
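A minimal sketch of the Y-channel round trip using OpenCV colour conversion; the super-resolution network itself is a placeholder callable, and the U/V resizing step is an assumption for the case where the network upscales the Y channel:

```python
import cv2
import numpy as np

def reconstruct_y_channel(rgb_image: np.ndarray, sr_network) -> np.ndarray:
    """Run super-resolution on the Y channel only, then merge back to RGB."""
    yuv = cv2.cvtColor(rgb_image, cv2.COLOR_RGB2YUV)
    y, u, v = cv2.split(yuv)
    y_sr = sr_network(y)                          # network is trained on Y-channel data
    h, w = y_sr.shape[:2]
    # If the network upscales, bring U and V to the same size before merging.
    u = cv2.resize(u, (w, h), interpolation=cv2.INTER_CUBIC)
    v = cv2.resize(v, (w, h), interpolation=cv2.INTER_CUBIC)
    merged = cv2.merge([y_sr, u, v])
    return cv2.cvtColor(merged, cv2.COLOR_YUV2RGB)
```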
The images then need to be registered (step 1007). Owing to factors such as field-of-view changes, drift, exposure time and optical distortion, a low-resolution image shot by a mobile phone and a high-resolution image shot by a digital camera differ considerably in their final imaging, so the images must be registered. For distortion correction, in some embodiments the intrinsic and extrinsic parameter matrices of the mobile phone and the digital camera can be determined with a calibration board, and distortion correction performed on the images through these matrices; in other embodiments, when the subject is text, regular patterns (e.g., rectangles) can be added around the text, a mapping function between the patterns obtained by detecting them in the images, and distortion correction performed through that mapping function. For registration, in some embodiments features of the two images are detected and matched, and the images are cropped according to the matched feature area to obtain a high-resolution image and a low-resolution image of the same size; in other embodiments, since the image shot by the digital camera is larger than the image shot by the mobile phone, the size of the mobile-phone image can be used as the reference, the framing manually adjusted during shooting, and the digital-camera image then cropped to the same size.
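As one possible illustration of feature-based registration, the sketch below aligns the camera image to the phone image with ORB features and a RANSAC homography using OpenCV. It is a simplification: warping the high-resolution image onto the low-resolution grid discards its resolution advantage, and in practice the estimated alignment would instead be used to crop matching regions at each image's native scale.

```python
import cv2
import numpy as np

def register_pair(hr_gray, lr_gray):
    """Warps the high-resolution (camera) image onto the low-resolution (phone)
    image using ORB features + RANSAC homography, so the two views cover the
    same region at the same size. Returns the warped high-resolution image."""
    orb = cv2.ORB_create(4000)
    kp1, des1 = orb.detectAndCompute(hr_gray, None)
    kp2, des2 = orb.detectAndCompute(lr_gray, None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des1, des2)
    matches = sorted(matches, key=lambda m: m.distance)[:500]   # keep best matches
    src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    h, w = lr_gray.shape[:2]
    return cv2.warpPerspective(hr_gray, H, (w, h))
```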
2. Constructing the super-resolution reconstruction network
After the training data set is obtained, the super-resolution reconstruction network is constructed. A super-resolution network can take many structural forms; the present application illustrates a text super-resolution reconstruction network and a portrait super-resolution reconstruction network as examples. It can be understood that the application does not limit the network structure of the super-resolution network, and network structures other than those described here can also be used to build the super-resolution reconstruction network.
The first example is the super-resolution reconstruction network for text. The method adopts a two-stage network design for text super-resolution reconstruction, with an image detail recovery network (such as a U-Net) used in the first stage to recover the details of the characters in the image.
In some embodiments, an image in the training data set of size M×N (M and N denote numbers of pixels and are positive integers greater than or equal to 1) first passes through one convolutional layer that expands the number of channels; the feature map is then downsampled P times (P being a positive integer greater than or equal to 1) using convolutional layers, with the number of channels doubled at each downsampling; the feature map is then upsampled using deconvolution layers symmetric to the downsampling convolutions, with the number of channels halved at each upsampling. After upsampling, two convolutional layers reduce the number of channels to 1 and output the result. The feature map obtained at each downsampling level is added to the upsampled feature map of the corresponding scale, so that the extracted feature information is passed on to the output layer and the preceding layers of the network.
Compared with the commonly used U-Net, the improvements of this algorithm are: the number of convolutional layers after each up/down-sampling is reduced, which improves efficiency while preserving the model's effect; and the feature maps of corresponding scales are combined by addition, which improves compatibility with and computing efficiency on a neural-network processing unit (NPU).
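A minimal PyTorch sketch of such a network is given below, assuming the input is the single-channel Y image. The channel counts, the choice of P = 3 and the activation functions are assumptions for illustration; the defining traits taken from the description are the strided-conv downsampling with channel doubling, the symmetric transposed-conv upsampling with channel halving, addition-based skip connections, and two final convolutions reducing the output to one channel. M and N are assumed divisible by 2**P.

```python
import torch
import torch.nn as nn

class LiteUNet(nn.Module):
    """Sketch of the first-stage detail-recovery network described above."""

    def __init__(self, base=16, depth=3):
        super().__init__()
        self.stem = nn.Conv2d(1, base, 3, padding=1)            # expand channels
        self.downs = nn.ModuleList(
            [nn.Conv2d(base * 2**i, base * 2**(i + 1), 3, stride=2, padding=1)
             for i in range(depth)])                             # channels doubled each time
        self.ups = nn.ModuleList(
            [nn.ConvTranspose2d(base * 2**(i + 1), base * 2**i, 2, stride=2)
             for i in reversed(range(depth))])                   # channels halved each time
        self.head = nn.Sequential(
            nn.Conv2d(base, base, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(base, 1, 3, padding=1))                    # two convs down to 1 channel
        self.act = nn.ReLU(inplace=True)

    def forward(self, y):                        # y: (B, 1, M, N) Y-channel input
        x = self.act(self.stem(y))
        skips = []
        for down in self.downs:
            skips.append(x)                      # keep the feature map at this scale
            x = self.act(down(x))
        for up in self.ups:
            x = self.act(up(x)) + skips.pop()    # addition skip, not concatenation
        return self.head(x)
```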
The second stage uses a multi-layer residual network structure. High-resolution images with superimposed noise and/or blur serve as the low-resolution images, and each such high-/low-resolution pair is added to the training data set as one set of training data. The second-stage network can achieve a certain degree of character sharpening and background denoising.
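The synthetic degradation used to build training pairs for this stage could be sketched as below; the blur and noise parameters are illustrative assumptions, not values specified by the application.

```python
import cv2
import numpy as np

def synthesize_degraded(hr, sigma_blur=1.2, sigma_noise=5.0):
    """Builds a training pair for the second-stage network by superimposing
    Gaussian blur and Gaussian noise on a clean high-resolution image.
    Returns (degraded, hr)."""
    lr = cv2.GaussianBlur(hr, (0, 0), sigma_blur)                 # kernel size derived from sigma
    noise = np.random.normal(0, sigma_noise, lr.shape).astype(np.float32)
    lr = np.clip(lr.astype(np.float32) + noise, 0, 255).astype(np.uint8)
    return lr, hr
```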
Next is the super-resolution reconstruction network for portraits. The network consists of two generators and a plurality of discriminators: the two generators are connected in series and enhance image details at different scales, while several discriminators judge the results at the different scales as real or fake, providing an optimization direction for the generators.
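A highly simplified sketch of one generator update for such a cascaded design is shown below, assuming PyTorch. The module names, the binary cross-entropy adversarial loss and the added L1 term are assumptions for illustration; real training would also update the discriminators and typically adds perceptual losses.

```python
import torch
import torch.nn.functional as F

def portrait_generator_step(g_coarse, g_fine, discriminators, lr_face, hr_face):
    """g_coarse enhances at a lower scale, g_fine refines its output at full
    scale, and one discriminator per scale scores the generated results."""
    coarse = g_coarse(lr_face)                 # first generator
    fine = g_fine(coarse)                      # second generator refines the details
    loss = 0.0
    for d, fake in zip(discriminators, [coarse, fine]):
        score = d(fake)                        # discriminator at this scale
        loss = loss + F.binary_cross_entropy_with_logits(
            score, torch.ones_like(score))     # generators try to be judged "real"
    loss = loss + F.l1_loss(fine, hr_face)     # stay close to the ground truth
    loss.backward()
    return loss.item()
```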
After the structures of the super-resolution reconstruction networks for the different subjects are built, each super-resolution reconstruction network is trained (step 1009) with the corresponding training data set to obtain a super-resolution reconstruction network model.
3. Using super-resolution reconstruction networks
The process of performing super-resolution reconstruction by the electronic device 100 will be described below with reference to fig. 10.
First, the scheme of using the super-resolution reconstruction network in real time is described. This mode requires a large amount of computation, but the user can view the result of the super-resolution reconstruction network in real time, which gives a good user experience.
The electronic device 100 opens (step 1101) the camera application 341 in response to the first user input, at which point the electronic device 100 invokes the camera 193 and displays the picture captured by the camera 193 in the viewfinder interface 405. The first user input is optionally a click operation for opening the camera application.
The electronic device 100 enters (step 1103) a first photographing mode in response to the second user input and displays the first image. The electronic device 100 receives an operation of the user adjusting the magnification control 417 and enters a high-magnification shooting state, for example a magnification of 10 times or more. The image captured by the camera 193 is then presented in the viewing interface 405.
The electronic device 100 detects (step 1105) an object in the first image. The scene recognition control 419 of the electronic device 100 is in an on state, and the processor 110 of the electronic device 100 recognizes the type of subject in the viewing interface 405. It will be appreciated that the scene recognition control 419 is optionally on by default, or is optionally turned on upon receiving a user input. Optionally, the processor 110 identifies the types of all objects among the subjects; for example, when the subjects include a portrait, a building and text, the processor 110 optionally outputs the subjects as portrait, building and text. Optionally, the processor 110 outputs one or more objects according to the number of pixels they occupy; for example, if the portrait occupies the most pixels among the subjects, the processor 110 outputs the subject as the portrait, and if the portrait occupies the most pixels and the building the second most, the processor 110 outputs the subjects as the portrait and the building. Optionally, the processor 110 outputs the subject according to its distance; for example, when the electronic device 100 includes two or more cameras 193, the electronic device measures the distance between a subject and the electronic device 100 from the parallax of the subject between the two cameras, and if the portrait is closest to the electronic device 100, the processor 110 outputs the subject as the portrait. Optionally, the processor 110 outputs the subject according to a set priority; for example, when the subjects include a portrait, a building and text and the output priority of the portrait is higher than that of the building and the text, the processor 110 outputs the subject as the portrait. Optionally, the processor 110 outputs the subject according to the user's focus point; for example, when the subjects include a portrait, a building and text and the user manually sets the focus point on the portrait, the processor 110 outputs the subject as the portrait.
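The selection strategies listed above (pixel count, priority, focus point) can be combined in a simple helper such as the following sketch; the detection dictionary format and the default priority order are assumptions for illustration.

```python
def choose_subjects(detections, priority=("portrait", "building", "text"),
                    focus_label=None, max_outputs=2):
    """detections: list of dicts like {"label": "portrait", "pixels": 120000}.
    An explicit focus point wins; otherwise subjects are ranked by pixel count,
    with the priority order breaking ties, and the top one or two are returned."""
    if focus_label is not None:
        return [focus_label]
    rank = {label: i for i, label in enumerate(priority)}
    ordered = sorted(detections,
                     key=lambda d: (-d["pixels"], rank.get(d["label"], len(rank))))
    return [d["label"] for d in ordered[:max_outputs]]
```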
The electronic device 100 processes (step 1107) the object using the super-resolution reconstruction network and displays a second image. The electronic device 100 first performs a format conversion on the object, converting the image from RGB to YUV format and extracting the Y-channel image. Optionally, the electronic device 100 runs the super-resolution reconstruction network in real time and displays the processed result in the viewing interface 405 in real time. Alternatively, the electronic device runs the super-resolution reconstruction network after the user clicks the photographing control 413 and displays the processed image in the gallery 313. When the processing capability of the electronic device 100 is sufficient, displaying the processed image in real time gives a better user experience. It can be appreciated that when the electronic device 100 uses the super-resolution reconstruction network in real time, the Y-channel output of the super-resolution reconstruction network needs to be merged with the U-channel and V-channel images and the result converted back to RGB format.
In some embodiments, when the electronic device 100 performs high-magnification shooting through the camera 193, the processor 110 detects the type of subject. When the subject includes a special scene, the super-resolution reconstruction network corresponding to that special scene is used; when the subject does not include a special scene, the universal super-resolution reconstruction network is used.
In some embodiments, when the electronic device 100 performs high-magnification shooting through the camera 193, the processor 110 detects the type of subject. When the subject includes a special scene, the area of the special scene is determined and the super-resolution reconstruction network corresponding to that special scene is applied to it; the remaining part of the image is kept unchanged.
In some embodiments, when the electronic device 100 performs high-magnification shooting through the camera 193, the processor 110 detects the type of subject. When the subject includes a special scene, the area of the special scene is determined and the super-resolution reconstruction network corresponding to that special scene is applied to it; the remaining part of the image is input into a super-resolution reconstruction network (for example, the universal super-resolution reconstruction network).
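The variants above differ only in how the area outside the special scene is handled; the sketch below illustrates them, assuming `frame_y` is a NumPy Y-channel image and that each network preserves the size of the crop it is given.

```python
def reconstruct_regions(frame_y, regions, networks, process_background=True):
    """regions: list of (x, y, w, h, label) boxes for detected special scenes.
    Each region goes through its dedicated network; the rest of the frame is
    either processed by the universal network or kept unchanged."""
    if process_background and "universal" in networks:
        out = networks["universal"](frame_y)      # background also reconstructed
    else:
        out = frame_y.copy()                      # background kept unchanged
    for x, y, w, h, label in regions:
        net = networks.get(label)
        if net is None:
            continue
        out[y:y + h, x:x + w] = net(frame_y[y:y + h, x:x + w])
    return out
```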
In some embodiments, the electronic device 100 displays, in the viewing interface 405, the preview image processed by the super-resolution reconstruction network. The electronic device 100 outputs (step 1109) the second image in response to the third user input. The electronic device 100 receives the third user input from the user on the photographing control 413, converts the preview image into an image, and places the image in the path corresponding to the gallery 313, where the user can view it.
Next, the scheme of using the super-resolution reconstruction network in the gallery is described. This scheme requires less computation but does not allow the user to preview the processed result in real time. The specific implementation has already been described in the steps above and is not repeated here.
The electronic device 100 opens (step 1201) a first application in response to the first user input. The electronic device 100 enters (step 1203) a first photographing mode in response to the second user input and displays the first image. The electronic device 100 detects (step 1205) a first object in the first image. In response to the third user input, the electronic device 100 outputs (step 1207) the first image to the path corresponding to the gallery 313. The first object in the first image is then processed (step 1209) using the super-resolution reconstruction network to obtain the final image. The image the user views in the gallery 313 is this final image.
One embodiment of the present application for performing super-resolution reconstruction of a portrait is described below in conjunction with a UI interface.
Fig. 4A shows that the electronic device 100 detects a user operation of clicking the camera application 341, and opens the camera application 341.
Fig. 5A shows that the electronic device 100 now invokes the camera 193 and displays the picture captured by the camera 193 in the viewfinder interface 405 of the user interface 501. As can be seen, the magnification control 417 shows that the current magnification is 30x, which belongs to high-magnification shooting; the scene recognition control 419 is in an on state, and the processor 110 of the mobile phone is recognizing whether a special scene is included in the preview image captured by the camera 193 and displayed in the viewing interface 405.
Fig. 5B shows that the electronic device 100 recognizes that a portrait is included in the subject, and optionally shows the detected portrait on the user interface 503 through an indication box 505. The electronic device 100 inputs the portrait area in the indication box 505 into the portrait super-resolution reconstruction network. When the electronic device 100 obtains the output of the network, the output result of the network is merged with the remaining part and displayed as the processed preview image in the viewing interface 405 of the user interface 507 in fig. 5C. It can be seen that after the super-resolution reconstruction, the resolution of the image is greatly improved and the noise and blur of the local area are removed.
In some embodiments, the electronic device 100 detects that the user clicks the scene recognition control 419, the scene recognition control 419 stops working, the processor 110 stops recognizing the preview image in the viewing interface 405, the indication box 505 stops displaying, and the preview image in the viewing interface 405 returns from the super-resolution reconstruction network processed preview image shown in fig. 5C to the pre-processed preview image in fig. 5B.
Fig. 5D shows that the electronic device 100 receives an operation of the user clicking the photographing control 413, converts the preview image in the viewing interface 405 into an image, places the image in the path corresponding to the gallery 313, and displays a thumbnail of the image in the gallery control 415 of the user interface 509.
An embodiment of the present application for performing super-resolution reconstruction of text is described below with reference to a UI interface.
Fig. 4A shows that the electronic device 100 detects a user operation of clicking the camera application 341, and opens the camera application 341.
Fig. 6A shows that the electronic device 100 now invokes the camera 193 and displays the picture captured by the camera 193 in the viewfinder interface 405 of the user interface 601. As can be seen, the magnification control 417 shows that the current magnification is 30x, which belongs to high-magnification shooting; the scene recognition control 419 is in an on state, and the processor 110 of the mobile phone is recognizing whether a special scene is included in the preview image captured by the camera 193 and displayed in the viewing interface 405.
Fig. 6B shows that the electronic device 100 recognizes that text is included in the subject, and optionally the detected text is shown on the user interface 603 through an indication box 605, and the indication box 605 indicates that the currently recognized special scene is text. The electronic device 100 optionally inputs the preview image in the indication box 605 into a text super-resolution reconstruction network. When the electronic device 100 obtains the output of the text super-resolution reconstruction network, it is displayed as a processed preview image in the viewing interface 405 of the user interface 607 of fig. 6C. It can be seen that after the super-resolution reconstruction is performed, the resolution of the image is greatly improved, and the display effect of the text is enhanced.
In some embodiments, the electronic device 100 detects that the user clicks the scene recognition control 419, the scene recognition control 419 stops working, the processor 110 stops recognizing the preview image in the viewing interface 405, the indication box 605 stops displaying, and the preview image in the viewing interface 405 returns from the super-resolution reconstruction network processed preview image shown in fig. 6C to the pre-processed preview image in fig. 6B. In other embodiments, the electronic device 100 detects a user operation of clicking the "x" icon in the indication box 605, the processor 110 stops recognizing the preview image in the viewing interface 405, the indication box 605 stops displaying, and the preview image in the viewing interface 405 returns from the preview image subjected to the super-resolution reconstruction network processing shown in fig. 6C to the preview image before processing in fig. 6B.
Fig. 6D shows that the electronic device 100 receives an operation of the user clicking the photographing control 413, converts the preview image in the viewing interface 405 into an image, places the image in the path corresponding to the gallery 313, and displays a thumbnail of the image in the gallery control 415 of the user interface 609.
One embodiment of the present application for performing super-resolution reconstruction of a building is described below in conjunction with a UI interface.
Fig. 4A shows that the electronic device 100 detects a user operation of clicking the camera application 341, and opens the camera application 341.
Fig. 7A shows that the electronic device 100 now invokes the camera 193 and displays the picture captured by the camera 193 in the viewfinder interface 405 of the user interface 701. As can be seen, the magnification control 417 shows that the current magnification is 30x, which belongs to high-magnification shooting; the scene recognition control 419 is in an on state, and the processor 110 of the mobile phone is recognizing whether a special scene is included in the preview image captured by the camera 193 and displayed in the viewing interface 405.
Fig. 7B shows that the electronic device 100 recognizes that the building is included in the subject, and optionally the detected building is shown on the user interface 703 through an indication box 705, where the indication box 705 indicates that the currently recognized special scene is a building. The electronic device 100 optionally inputs the preview image in the instruction box 705 into a building super-resolution reconstruction network. When the electronic device 100 obtains the output of the building super-resolution reconstruction network, it is displayed as a processed preview image in the viewing interface 405 of the user interface 707 in fig. 7C. It can be seen that after the super-resolution reconstruction is carried out, the resolution of the image is greatly improved, and the display effect of the shot building is enhanced.
In some embodiments, the electronic device 100 detects that the user clicks the scene recognition control 419, the scene recognition control 419 stops working, the processor 110 stops recognizing the preview image in the viewing interface 405, the indication box 705 stops displaying, and the preview image in the viewing interface 405 returns from the super-resolution reconstruction network processed preview image shown in fig. 7C to the pre-processed preview image in fig. 7B. In other embodiments, the electronic device 100 detects a user operation of clicking the "x" icon in the indication box 705, the processor 110 stops recognizing the preview image in the viewing interface 405, the indication box 705 stops displaying, and the preview image in the viewing interface 405 returns from the preview image subjected to the super-resolution reconstruction network processing shown in fig. 7C to the preview image before processing in fig. 7B.
Fig. 7D shows that the electronic device 100 receives an operation of the user clicking the photographing control 413, converts the preview image in the viewing interface 405 into an image, places the image in the path corresponding to the gallery 313, and displays a thumbnail of the image in the gallery control 415 of the user interface 709.
Fig. 8 shows a schematic diagram 801 of a gallery of the electronic device 100 for storing captured pictures and/or videos. The gallery may be viewed by clicking the gallery control 415 in the user interface 403, and optionally also by clicking the gallery 313 of the user interface 301 in FIG. 3. Fig. 8 shows thumbnails of the portrait image taken through the processes of fig. 5A-5D, the text image taken through the processes of fig. 6A-6D, and the building image taken through the processes of fig. 7A-7D.
Fig. 9 shows an interface schematic diagram of an electronic device performing video recording using a super-resolution reconstruction network according to an embodiment of the present application, which includes a user interface 901. At this time, the shooting mode control 409 shows that the electronic device 100 is currently in the video recording mode, and the magnification control 417 shows that the current magnification is 30x. The processor 110 recognizes the subject as a portrait and marks the portrait with the indication box 903. The viewing interface 905 shows the result of the super-resolution reconstruction network processing.
In some schemes, the super-resolution reconstruction network may not be used in real time during video recording, but may instead be used in the gallery 313 after the video recording is finished. The super-resolution reconstruction network is optionally applied automatically, or optionally applied upon receiving a user instruction.
The terminology used in the above embodiments is for the purpose of describing particular embodiments only and is not intended to limit the application. As used in the specification of this application and the appended claims, the singular forms "a", "an" and "the" are intended to include the plural forms as well, for example "one or more", unless the context clearly indicates otherwise. It should also be understood that in the embodiments of the present application, "one or more" means one, two, or more than two; "and/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may represent: A alone, both A and B, or B alone, where A and B may be singular or plural. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship.
Reference throughout this specification to "one embodiment" or "some embodiments," or the like, means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," or the like, in various places throughout this specification are not necessarily all referring to the same embodiment, but rather "one or more but not all embodiments" unless specifically stated otherwise. The terms "comprising," "including," "having," and variations thereof mean "including, but not limited to," unless expressly specified otherwise.
In the embodiments provided in the present application, the method provided in the embodiments of the present application is described from the perspective of an electronic device (e.g., a mobile phone) as an execution subject. In order to implement the functions in the method provided by the embodiment of the present application, the terminal device may include a hardware structure and/or a software module, and implement the functions in the form of a hardware structure, a software module, or a hardware structure and a software module. Whether any of the above-described functions is implemented as a hardware structure, a software module, or a hardware structure plus a software module depends upon the particular application and design constraints imposed on the technical solution.
In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, it may be realized wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the procedures or functions described in the embodiments of the invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example from one website, computer, server, or data center to another website, computer, server, or data center via wired (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
It is noted that a portion of this patent application contains material which is subject to copyright protection. The copyright owner reserves all copyright rights whatsoever, except for the reproduction of the patent document or the patent disclosure as it appears in the patent office files or records.

Claims (13)

1. A shooting method, applied to an electronic device, characterized in that the method comprises the following steps:
opening a first application in response to the first user input;
responding to the second user input, entering a first shooting mode, and displaying a first image;
detecting a first object in the first image;
outputting the first image in response to a third user input;
processing the first object in the first image using a super resolution reconstruction network.
2. The method of claim 1, wherein the first shooting mode is shooting by the electronic device using a magnification of 10 times or more.
3. The method of claim 1, wherein the first object is at least one of: portrait, text, architecture, moon, beach, blue sky, green plant, or general object.
4. The method of claim 1 or 3, wherein the super-resolution reconstruction network comprises at least one of: a portrait super-resolution reconstruction network, a text super-resolution reconstruction network, a building super-resolution reconstruction network, a moon super-resolution reconstruction network, a beach super-resolution reconstruction network, a blue sky super-resolution reconstruction network, a green plant super-resolution reconstruction network, or a universal super-resolution reconstruction network.
5. The method according to any one of claims 1, 3 or 4, wherein the training data set of the super-resolution reconstruction network comprises at least one set of training data, the training data comprising a first training image and a second training image, the first training image being captured by a single lens reflex camera, and the second training image being captured by a mobile phone.
6. The method of any of claims 1, 3 to 5, wherein the processing of the training data set further comprises:
classifying the training data set according to the first object;
converting the format of the training data;
registering the training data;
training the super-resolution reconstruction network using the training data set.
7. A method according to claim 1, wherein prior to outputting the second image in response to a third user input, further comprising:
a second object in the first image is detected.
8. A method according to claim 1 or 7, wherein, in response to a third user input, after outputting the second image, further comprising:
processing the first object of the second image using the super resolution reconstruction network; or
processing the second object of the second image using the super resolution reconstruction network; or
processing the first object and the second object of the second image using the super resolution reconstruction network.
9. A method as claimed in claim 1, wherein, before entering the first shooting mode in response to the second user input, the method further comprises:
responding to a fourth user input, entering a second shooting mode, and displaying a third image;
in response to a fifth user input, initiating capture of the first video;
in response to a sixth user input, ending capturing and outputting the first video;
detecting a third object in the first video;
using a super-resolution reconstruction network for the third object in the third image.
10. An electronic device, characterized in that the electronic device comprises: one or more processors; one or more memories; wherein the one or more memories store one or more computer programs, the one or more computer programs comprising instructions, which when executed by the one or more processors, cause the electronic device to perform the method of any of claims 1-9.
11. A computer-readable storage medium, comprising a computer program which, when run on an electronic device, causes the electronic device to perform the method of any of claims 1-9.
12. A program product comprising instructions which, when run on a computer, cause the computer to perform the method of any one of claims 1-9.
13. A graphical user interface on an electronic device with one or more memories, and one or more processors to execute one or more computer programs stored in the one or more memories, the graphical user interface comprising a graphical user interface displayed when the electronic device performs the method of any of claims 1-9.
CN202010225415.4A 2020-03-26 2020-03-26 Shooting method and equipment Pending CN113452895A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010225415.4A CN113452895A (en) 2020-03-26 2020-03-26 Shooting method and equipment


Publications (1)

Publication Number Publication Date
CN113452895A true CN113452895A (en) 2021-09-28

Family

ID=77807334

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010225415.4A Pending CN113452895A (en) 2020-03-26 2020-03-26 Shooting method and equipment

Country Status (1)

Country Link
CN (1) CN113452895A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106231182A (en) * 2016-07-29 2016-12-14 维沃移动通信有限公司 A kind of photographic method and mobile terminal
CN109214985A (en) * 2018-05-16 2019-01-15 长沙理工大学 The intensive residual error network of recurrence for image super-resolution reconstruct
WO2019104705A1 (en) * 2017-12-01 2019-06-06 华为技术有限公司 Image processing method and device
CN110060208A (en) * 2019-04-22 2019-07-26 中国科学技术大学 A method of improving super-resolution algorithms reconstruction property
CN110310229A (en) * 2019-06-28 2019-10-08 Oppo广东移动通信有限公司 Image processing method, image processing apparatus, terminal device and readable storage medium storing program for executing
CN110896451A (en) * 2019-11-20 2020-03-20 维沃移动通信有限公司 Preview picture display method and electronic equipment



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20210928