CN115567633A - Photographing method, medium, program product and electronic device

Info

Publication number: CN115567633A
Application number: CN202210173024.1A
Authority: CN (China)
Prior art keywords: video, photos, electronic equipment, interface, information
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 张东, 周建东
Current Assignee: Honor Device Co Ltd
Original Assignee: Honor Device Co Ltd
Application filed by Honor Device Co Ltd
Priority to CN202210173024.1A
Publication of CN115567633A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M 1/00 Substation equipment, e.g. for use by subscribers
    • H04M 1/72 Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
    • H04M 1/724 User interfaces specially adapted for cordless or mobile telephones
    • H04M 1/72403 User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality
    • H04M 1/7243 User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality with interactive means for internal management of messages
    • H04M 1/72439 User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality with interactive means for internal management of messages for image or video messaging

Abstract

The application relates to the technical field of communication and provides a shooting method, a medium, a program product, and an electronic device that can automatically generate a plurality of high-quality photos, or photos matching the user's expectation, in a video recording scenario and return the plurality of photos to a gallery application in batches. The method comprises the following steps: the electronic device starts recording and generates a video stream; the electronic device scores video frames in the video stream; the electronic device selects video frames for algorithm optimization based on the scores and generates a plurality of photos; and the electronic device returns the plurality of photos in batches from the hardware abstraction layer to the application layer through an extended interface. The method is applicable in particular to video recording, preview, or storage scenarios.

Description

Photographing method, medium, program product, and electronic device
Technical Field
The present application relates to the field of communications technologies, and in particular, to a shooting method, a medium, a program product, and an electronic device.
Background
With the continuous progress of communication technology, users have increasingly high requirements on the operation convenience of mobile terminals such as mobile phones and tablet computers. For example, users increasingly expect convenience when taking pictures with mobile terminals.
At present, during video recording on a mobile terminal, a user can only take a snapshot manually; once the opportunity is missed, the snapshot cannot be taken and the wonderful moment is lost. In addition, the user's manual operation may shake the mobile terminal during the snapshot, so that the captured picture is not sharp enough, resulting in a poor user experience.
Disclosure of Invention
In view of the above, embodiments of the present application provide a shooting method, a program product, a medium, and an electronic device. With this technical solution, the content and changes of the video stream can be automatically analyzed while the electronic device records a video, so as to automatically generate a plurality of high-quality photos that meet the user's expectation, and the plurality of photos are returned to the gallery application in batches to provide the user with wonderful moments.
In a first aspect, an embodiment of the present application provides a shooting method applied to an electronic device, where the method includes: the electronic device starts recording and generates a video stream; the electronic device scores video frames in the video stream; the electronic device selects video frames for algorithm optimization based on the scores and generates a plurality of photos; and the electronic device returns the plurality of photos in batches from the hardware abstraction layer to the application layer through an extended interface. In some embodiments, the video frames selected by the electronic device based on the scores may yield photos of higher quality that meet the user's expectation, so as to record the highlight moments.
In addition, the extended interface is an interface between the hardware abstraction layer and the application layer of the electronic device, and specifically an interface between a camera hardware abstraction unit (the camera hardware abstraction unit 406 below) and a camera application (the camera application 401 below) in the electronic device. The extended interface is an interface that does not exist in the native system architecture of the electronic device. Therefore, batch return of photos is supported through the extended interface, so that the user can conveniently view, in a unified manner, the plurality of photos automatically generated during video recording.
The plurality of photos may be N JPEG photos, that is, the number of photos is N, where N may be a preset value such as 5.
In a possible implementation of the first aspect, the returning, by the electronic device, of the plurality of photos in batches from the hardware abstraction layer to the application layer through the extended interface includes: the electronic device stores the plurality of photos in a shared memory of the electronic device and obtains handle information, where the handle information indicates the storage addresses of the plurality of photos in the shared memory; the electronic device returns the handle information from the hardware abstraction layer to the camera application in the application layer of the electronic device through the extended interface; and the electronic device reads the plurality of photos in batches from the shared memory through the camera application according to the handle information, and stores the photos in batches into the gallery application of the electronic device.
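As an illustrative sketch only (the patent publishes no code), the following Java snippet shows one way an application-layer component could consume such handle information on Android: the photos sit in a shared memory region, and the handle information carries the offset and length of each photo. The PhotoHandleInfo class, its field names, and the reading helper are assumptions made for this example.

```java
import android.os.SharedMemory;
import android.system.ErrnoException;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical handle information returned in batch through the extended interface. */
final class PhotoHandleInfo {
    final SharedMemory sharedMemory; // shared memory region holding all N photos
    final long[] offsets;            // start offset of each JPEG photo in the region
    final long[] lengths;            // byte length of each JPEG photo

    PhotoHandleInfo(SharedMemory mem, long[] offsets, long[] lengths) {
        this.sharedMemory = mem;
        this.offsets = offsets;
        this.lengths = lengths;
    }
}

final class PhotoBatchReader {
    /** Reads all photos in one pass from the shared memory described by the handle info. */
    static List<byte[]> readPhotos(PhotoHandleInfo info) throws ErrnoException {
        List<byte[]> photos = new ArrayList<>();
        ByteBuffer region = info.sharedMemory.mapReadOnly(); // map the whole region once
        try {
            for (int i = 0; i < info.offsets.length; i++) {
                byte[] jpeg = new byte[(int) info.lengths[i]];
                region.position((int) info.offsets[i]);
                region.get(jpeg);                      // copy one JPEG photo out of the region
                photos.add(jpeg);
            }
        } finally {
            SharedMemory.unmap(region);                // release the mapping when done
        }
        return photos;
    }
}
```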
In a possible implementation of the first aspect, the electronic device stores the video obtained by recording in the gallery application and associates the video with the plurality of photos. By associating the video with the photos, the user can find the photos from the video or find the video from the photos.
In a possible implementation of the first aspect, the information of the video includes an identifier that uniquely identifies the video, and the information of each of the plurality of photos carries the identifier, where the identifier is used to associate the video with the plurality of photos.
In a possible implementation of the first aspect, the identifier is a universally unique identifier UUID of the video.
In a possible implementation of the first aspect, the method further includes: the method comprises the steps that the electronic equipment detects a first operation of a user on a gallery application in the electronic equipment; the electronic device displays thumbnails of the video and thumbnails corresponding to a plurality of photos associated with the video in an interface of the gallery application based on a first operation. The first operation may be a user clicking on the "smart multi-shot" album shown in fig. 12A, or the like, to trigger the electronic device to provide the user with a viewing portal of the video and the associated multiple photos.
In a possible implementation of the first aspect, the thumbnail of each of the plurality of photos carries a preset mark, and the preset mark indicates that the photo was obtained by automatically performing algorithm optimization on a video frame of the video. For example, the preset mark may be the six-pointed-star mark shown in fig. 12F below.
In a possible implementation of the first aspect, the method further includes: the electronic device detects a second operation of the user on the thumbnail of the video; and the electronic device displays a video playing interface based on the second operation, and displays thumbnails of the plurality of photos on the playing interface. The second operation may include the user clicking the thumbnail of video A shown in fig. 12F and then clicking the original video control shown in fig. 12E.
In a possible implementation of the first aspect, the method further includes: the electronic equipment detects a third operation of the user on the playing interface; and the electronic equipment deletes the video from the gallery or deletes the video and the photos based on the third operation. For example, the third operation is an operation in which the user clicks the delete control shown in fig. 12C.
In a possible implementation of the first aspect, the electronic device stores the plurality of photos into the gallery application based on the detected video recording termination instruction; or the electronic equipment stores the plurality of photos in the gallery application based on the fourth operation of the user on the videos already stored in the gallery application. Therefore, the photos can be saved according to the actual requirements of the user, and resource waste caused by saving unnecessary photos is avoided.
In a possible implementation of the first aspect, the extended interface is a hardware abstraction layer interface definition language HIDL interface.
In a possible implementation of the first aspect, scoring the video frames in the video stream is implemented as follows: the electronic device obtains style information, scene change information, and multi-dimensional preset information of the video stream, where the style information represents the theme and atmosphere of the video, the scene change information divides the video into a plurality of video segments of different categories, and the multi-dimensional preset information is information of multiple dimensions used for scoring the video frames; and the electronic device scores the video frames according to the multi-dimensional preset information.
As an example, the granularity of the style information, the scene change information, and the multi-dimensional preset information decreases progressively, and they may be acquired at the same or different preset intervals. In addition, the score is decided according to the multi-dimensional preset information; specifically, algorithm optimization may be performed on a video frame when its score is greater than a first threshold.
In a possible implementation of the first aspect, selecting a video frame for algorithm optimization based on the score includes: the electronic device selects a plurality of video frames with the highest scores based on the scores and the scene change information. As an example, for each scene transition indicated by the scene change information, the video frame with the highest score may be output.
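The following is a minimal Java sketch of this selection logic, assuming each scored frame is labeled with the scene segment it belongs to; the class and field names are illustrative and are not taken from the patent.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ScoredFrame {
    final long frameId;
    final int segmentId;   // scene segment this frame belongs to (from the scene change information)
    final double score;    // decided from the multi-dimensional preset information

    ScoredFrame(long frameId, int segmentId, double score) {
        this.frameId = frameId;
        this.segmentId = segmentId;
        this.score = score;
    }
}

final class HighlightSelector {
    /** For each scene segment, keep the highest-scoring frame whose score exceeds the first threshold. */
    static Map<Integer, ScoredFrame> selectPerSegment(List<ScoredFrame> frames, double firstThreshold) {
        Map<Integer, ScoredFrame> best = new HashMap<>();
        for (ScoredFrame f : frames) {
            if (f.score <= firstThreshold) {
                continue; // only frames above the first threshold are candidates for optimization
            }
            ScoredFrame current = best.get(f.segmentId);
            if (current == null || f.score > current.score) {
                best.put(f.segmentId, f);
            }
        }
        return best;
    }
}
```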
In a second aspect, an embodiment of the present application provides a computer-readable storage medium, where instructions are stored, and when executed on an electronic device, the instructions cause the electronic device to perform a shooting method as in the first aspect and any possible implementation manner of the first aspect.
In a third aspect, the present application provides a computer program product, where the computer program product includes instructions for implementing the shooting method in the first aspect and any one of the possible implementations thereof.
In a fourth aspect, an embodiment of the present application provides an electronic device, including: a memory for storing instructions for execution by one or more processors of an electronic device, and a processor for performing the photographing method as in the first aspect and any possible implementation thereof when the instructions are executed by the one or more processors.
Drawings
In order to describe the technical solutions in the embodiments of the present application more clearly, the following briefly describes the accompanying drawings needed in the embodiments or in the description of the prior art. Obviously, the accompanying drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings based on these drawings without creative effort.
FIGS. 1A and 1B illustrate interface diagrams in a capture scene, according to some embodiments of the present application;
FIG. 2 illustrates a network architecture diagram of a method of photography, in accordance with some embodiments of the present application;
FIG. 3 illustrates a schematic diagram of a handset configuration, according to some embodiments of the present application;
FIG. 4A illustrates a block diagram of a software architecture of a handset, according to some embodiments of the present application;
FIG. 4B illustrates an architectural framework for a photography method application, according to some embodiments of the present application;
FIG. 5A illustrates a flow diagram of a method of photography, according to some embodiments of the present application;
FIG. 5B illustrates a schematic diagram of a highlight frame identification flow, according to some embodiments of the present application;
FIG. 5C illustrates a schematic diagram of a highlight frame identification flow, according to some embodiments of the present application;
FIG. 6 illustrates an architectural diagram of a photography method application, according to some embodiments of the present application;
FIG. 7 illustrates a flow diagram of a method of photography, in accordance with some embodiments of the present application;
FIG. 8A illustrates a flow diagram of a method of photography, in accordance with some embodiments of the present application;
FIG. 8B illustrates an architectural diagram of a photography method application, according to some embodiments of the present application;
FIG. 8C illustrates a flow diagram of a method of capturing, according to some embodiments of the present application;
FIG. 8D illustrates a flow diagram of a method of capturing, according to some embodiments of the present application;
FIG. 9 illustrates a flow diagram of a method of capturing, according to some embodiments of the present application;
FIGS. 10A-10H illustrate interface diagrams of a photographic process, according to some embodiments of the present application;
FIG. 11 illustrates a schematic diagram of a photo viewing flow, according to some embodiments of the present application;
FIGS. 12A-12I illustrate interface variation diagrams of a photo viewing and operation flow, according to some embodiments of the present application;
FIG. 13 illustrates a schematic diagram of a photo saving process, according to some embodiments of the present application;
FIGS. 14A-14D illustrate interface change diagrams of a photo saving process, according to some embodiments of the present application;
FIGS. 15A-15C illustrate interface change diagrams of a process of generating photos for a stored video, according to some embodiments of the present application.
Detailed Description
Illustrative embodiments of the present application include, but are not limited to, a photographing method, medium, program product, and electronic device.
In order to make the objects, technical solutions and advantages of the present application more clearly understood, the present application is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The user may want to capture some highlights while recording a video or taking pictures. For example, when a user records the sports scene of a player at a sporting event, some wonderful moments of motion may occur. At this time, the user may want to capture these pictures in time with the electronic device. Since photographing in the related art requires manual operation by the user, delays can occur.
In other scenarios, the user may be shooting a home video, in which the expressions and movements of the people are often difficult to predict. During shooting, pictures in which a person's expression is particularly good or which have artistic appeal may appear, and users often want to record these wonderful moments in the form of photos. If the user does not take a picture manually in time, these wonderful pictures may be missed.
Illustratively, referring to FIG. 1A, during the process of capturing a video of a person running, when a user desires to capture a photograph of the person running in a desired pose, the user can click on a snapshot control 203 provided in the video capture interface shown in FIG. 1A, triggering the electronic device to process the current video frame to generate the photograph.
However, the user may operate the snapshot control 203 at the wrong time and thus miss the moment, or the electronic device 10 may shake because of the manual operation, so that the captured photo is blurred and does not meet the user's expectation. For example, the user may want a photo of the person shown in FIG. 1A, with one arm raised and the other arm extended backward, but because the user clicks the snapshot control 203 too late, the manually generated photo shows the person with one arm raised and the other arm hanging down, as in FIG. 1B, which is not what the user wants.
In view of this, an embodiment of the present application provides a shooting method in which the electronic device automatically analyzes the content and changes of the video stream or the preview picture while recording a video or taking photos, obtains multiple high-quality photos that meet the user's expectation, and returns the multiple photos to the gallery application in batches to show them to the user.
It should be understood that different users have different understanding of the highlight and different evaluation criteria, and the embodiment of the present application is not limited thereto. Meanwhile, the solution of the embodiment of the application is to automatically analyze the video frames through an algorithm, so as to determine the video frames with high scores and generate the photos. Therefore, the scheme provided by the embodiment of the application is suitable for any scene shot by the user and is not limited to the dynamic picture. For example, when a user records a video on a stationary object, the scheme provided by the embodiment of the present application may also be applied.
Fig. 2 illustrates a network architecture diagram of a method of capturing, according to some embodiments of the present application. Referring to fig. 2, a plurality of electronic devices, such as the electronic device 10, the electronic device 20, the electronic device 30, and the like, are included therein. The plurality of electronic devices may communicate with each other through a wireless connection or a wired connection, which is not specifically limited in this embodiment of the application.
Specifically, each of the electronic devices may be a desktop computer, a laptop computer, a palmtop computer, a mobile phone, a tablet computer, or the like, which is not limited in this embodiment of the present application. For example, fig. 2 illustrates an example in which the electronic device 10 is a mobile phone, the electronic device 20 is a tablet computer, and the electronic device 30 is a laptop computer.
In some embodiments, any of the plurality of electronic devices may log in to a pre-registered user account to communicate with other electronic devices. As an example, the user accounts logged in by the plurality of electronic devices are the same, that is, the plurality of electronic devices may be different devices of the same user. The user account is an account of a software system or an application such as a camera application/gallery application.
In a possible application scenario, a plurality of electronic devices (e.g., the mobile phone 10) shown in fig. 2 may each execute the shooting method of the present application, and automatically generate a plurality of photos during the process of recording a video.
In one possible multi-device cooperation scenario, some electronic devices in the plurality of electronic devices shown in fig. 2 may invoke cameras of other electronic devices to cooperatively execute the shooting method of the present application. For example, the mobile phone 10 may call a camera of the tablet pc 20 to record a video and obtain multiple photos at the same time.
In another possible multi-device cooperation scenario, after one electronic device of the multiple electronic devices shown in fig. 2 stores multiple photos generated during video recording, the electronic device may cooperate with other electronic devices to synchronize the recorded videos or photos. For example, in a case that the mobile phone 10 and the tablet pc 20 log in the same user account, when the mobile phone 10 stores or deletes files such as photos and videos in the mobile phone, the mobile phone 10 may trigger the tablet pc 20 to synchronously store or delete corresponding files such as photos and videos.
In the following embodiments, the electronic device 10 is mainly taken as an execution subject of the shooting method, and the electronic device 10 is taken as an example of a mobile phone, so as to describe a specific scheme of the shooting method provided in the embodiments of the present application.
Fig. 3 is a schematic structural diagram of a mobile phone. The handset 10 may include a processor 110, an external memory interface 120, an internal memory 121, a sensor module 180, keys 190, a camera 193, a display 194, and the like. The sensor module 180 may include a pressure sensor 180A, a touch sensor 180K, and the like.
It is to be understood that the illustrated structure of the embodiment of the present application does not specifically limit the mobile phone 10. In other embodiments of the present application, the handset 10 may include more or fewer components than shown, or some components may be combined, some components may be separated, or a different arrangement of components may be used. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
Processor 110 may include one or more processing units, such as: the processor 110 may include an Application Processor (AP), a modem processor, a Graphics Processing Unit (GPU), an Image Signal Processor (ISP), a controller, a memory, a video codec, a Digital Signal Processor (DSP), a baseband processor, and/or a neural-Network Processing Unit (NPU), etc. Wherein, the different processing units may be independent devices or may be integrated in one or more processors. For example, the processor 110 is configured to execute the photographing method in the embodiment of the present application.
A memory may also be provided in the processor 110 for storing instructions and data. In some embodiments, the memory in the processor 110 is a cache memory. The memory may hold instructions or data that have just been used or recycled by the processor 110. If the processor 110 needs to reuse the instruction or data, it can be called directly from memory. Avoiding repeated accesses reduces the latency of the processor 110, thereby increasing the efficiency of the system. As an example, the memory may have a storage address of a photo generated during recording of a video, and the like cached therein.
The external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, to extend the memory capability of the mobile phone 10. The external memory card communicates with the processor 110 through the external memory interface 120 to implement a data storage function.
As an example, the handset 10 may be connected to an external memory that may store video taken by the handset and photographs generated during the video taking process. For example, the external memory may be a memory in another handset.
The internal memory 121 may be used to store computer-executable program code, which includes instructions. The processor 110 executes various functional applications of the handset 10 and data processing by executing instructions stored in the internal memory 121. The internal memory 121 may include a program storage area and a data storage area. The storage program area may store an operating system, and an application program required by at least one function (such as a video capture function, a video playback function, and the like).
In addition, the internal memory 121 may include a high speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, a Universal Flash Storage (UFS), and the like.
As an example, the internal memory may store a computer program for implementing the photographing method provided by the embodiment of the present application.
The pressure sensor 180A is used for sensing a pressure signal, and can convert the pressure signal into an electrical signal. In some embodiments, the pressure sensor 180A may be disposed on the display screen 194. The pressure sensor 180A can be of a wide variety, such as a resistive pressure sensor, an inductive pressure sensor, a capacitive pressure sensor, and the like. The capacitive pressure sensor may be a sensor comprising at least two parallel plates having an electrically conductive material. When a force acts on the pressure sensor 180A, the capacitance between the electrodes changes. The handset 10 determines the intensity of the pressure from the change in capacitance. When a touch operation is applied to the display screen 194, the mobile phone 10 detects the intensity of the touch operation based on the pressure sensor 180A. The cellular phone 10 may calculate the touched position based on the detection signal of the pressure sensor 180A.
The touch sensor 180K is also referred to as a "touch panel". The touch sensor 180K may be disposed on the display screen 194, and the touch sensor 180K and the display screen 194 form a touch screen, which is also called a "touch screen". The touch sensor 180K is used to detect a touch operation acting thereon or nearby. The touch sensor may communicate the detected touch operation to the application processor to determine the touch event type. Visual output associated with the touch operation may be provided via the display screen 194. In other embodiments, the touch sensor 180K may be disposed on a surface of the mobile phone 10 different from the position of the display 194.
By way of example, the mobile phone may detect an operation of the user acting on the display screen through the pressure sensor and the touch sensor, for example, an operation of the user clicking a control, and the like.
The keys 190 include a power-on key, a volume key, and the like. The keys 190 may be mechanical keys. Or may be touch keys. The handset 10 may receive key inputs, generating key signal inputs relating to user settings and function controls of the handset 10.
The mobile phone 10 implements display functions through the GPU, the display screen 194, and the application processor. The GPU is a microprocessor for image processing, and is connected to the display screen 194 and an application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. The processor 110 may include one or more GPUs that execute program instructions to generate or alter display information.
The display screen 194 is used to display images, video, and the like. In some embodiments, the cell phone 10 may include 1 or M display screens 194, M being a positive integer greater than 1.
The camera 193 is used to capture still images or video. In some embodiments, the handset 10 may include 1 or P cameras 193, P being a positive integer greater than 1.
In the implementation of the application, the camera can collect video pictures, the GPU processes the video pictures collected by the camera, and the display screen displays an interface processed by the GPU. The specific content displayed on the display screen can refer to the description in the following embodiments.
The embodiment of the present application does not particularly limit the specific structure of the execution subject of the photographing method, as long as the execution subject can run code recording the photographing method provided in the embodiments of the present application and thereby perform the method. For example, the execution subject of the photographing method provided in the embodiment of the present application may be a functional module in the mobile phone 10 that is capable of calling and executing a program, or a processing apparatus, such as a chip, applied to the mobile phone 10.
The software system of the handset 10 is described below. The software system of the handset 10 may employ a layered architecture, an event-driven architecture, a micro-core architecture, a micro-service architecture, or a cloud architecture. The embodiment of the present invention exemplarily illustrates a software structure of the mobile phone 10 by taking an Android system with a layered architecture as an example.
Fig. 4A is a block diagram of the software structure of the mobile phone 10 according to the embodiment of the present invention.
The layered architecture divides the software into several layers, each layer having a clear role and division of labor. The layers communicate with each other through a software interface. In some embodiments, the Android system is divided into four layers, an application layer, an application framework layer, an Android runtime (Android runtime) and system library, and a kernel layer from top to bottom.
The application layer may include a series of application packages.
As shown in fig. 4A, the application packages may include a camera application (Camera App) 401, a gallery application (Gallery App) 402, and applications such as calendar, phone, maps, navigation, WLAN, Bluetooth, music, and video.
The camera application 401 is used to provide video and photo capture functions, and the gallery application 402 is used to provide video and photo storage and display functions.
The application framework layer provides an Application Programming Interface (API) and a programming framework for the application program of the application layer. The application framework layer includes a number of predefined functions.
As shown in fig. 4A, the application framework layer may include a window manager, a content provider, a view system, a phone manager, a resource manager, a notification manager, a camera framework (camera fwk) 403, an extension framework 404, an encoding framework 405, and the like.
The window manager is used for managing window programs. The window manager can obtain the size of the display screen, judge whether a status bar exists, lock the screen, intercept the screen and the like.
The content provider is used to store and retrieve data and make it accessible to applications. The data may include video, images, audio, calls made and answered, browsing history and bookmarks, phone books, etc.
The view system includes visual controls such as controls to display text, controls to display pictures, and the like. The view system may be used to build applications. The display interface may be composed of one or more views. For example, the display interface including the short message notification icon may include a view for displaying text and a view for displaying pictures.
The phone manager is used to provide the communication functions of the handset 10. Such as management of call status (including on, off, etc.).
The resource manager provides various resources for the application, such as localized strings, icons, pictures, layout files, video files, and the like.
The notification manager enables the application to display notification information in the status bar, can be used to convey notification-type messages, can disappear automatically after a short dwell, and does not require user interaction. Such as a notification manager used to notify download completion, message alerts, etc. The notification manager may also be a notification that appears in the form of a chart or scrollbar text in a status bar at the top of the system, such as a notification of a running application in the background, or a notification that appears on the screen in the form of a dialog window (e.g., pop-up window). For example, text information is prompted in the status bar, a prompt tone is given, the mobile phone vibrates, and an indicator light flickers.
The camera framework 403 receives a request such as a video recording request from the camera application 401, maintains service logic for the request such as a video recording request to be internally circulated, and sends a final result of the request to the camera application 401.
The extension framework 404 is used to receive a request from the camera application 401 to query handle information of a storage address of a photograph generated in a video recording.
The encoding framework 405 is used to encode the transmitted video data stream.
The Android Runtime comprises a core library and a virtual machine. The Android runtime is responsible for scheduling and managing an Android system.
The core library comprises two parts: one part is the functions that the java language needs to call, and the other part is the core library of Android.
The application layer and the application framework layer run in the virtual machine. The virtual machine executes the java files of the application layer and the application framework layer as binary files. The virtual machine is used to perform functions such as object life cycle management, stack management, thread management, security and exception management, and garbage collection.
The system library may include a plurality of functional modules, for example: a surface manager (Surface Manager), media libraries (Media Libraries), three-dimensional graphics processing libraries (e.g., OpenGL ES), 2D graphics engines (e.g., SGL), and the like.
The surface manager is used to manage the display subsystem and provide a fusion of the 2D and 3D layers for multiple applications.
The media library supports a variety of commonly used audio, video format playback and recording, and still image files, among others. The media library may support a variety of audio-video encoding formats, such as: MPEG4, H.264, MP3, AAC, AMR, JPG, PNG, etc.
The three-dimensional graphic processing library is used for realizing three-dimensional graphic drawing, image rendering, synthesis, layer processing and the like.
The 2D graphics engine is a drawing engine for 2D drawing.
The kernel layer is a layer between hardware and software. The kernel layer includes at least a display driver, a camera driver 407, an audio driver, and a sensor driver.
In addition, the architecture of the mobile phone 10 further includes a Hardware Abstraction Layer (HAL). The hardware abstraction layer is an interface layer between the kernel layer shown in fig. 4A and the hardware of the handset 10, which is intended to abstract the hardware.
Further, based on the interaction of modules such as the camera application 401, the gallery application 402, the camera framework 403, the extension framework 404, and the encoding framework 405 in the software architecture shown in fig. 4A, referring to fig. 4B, a structural framework applied to the shooting method provided in the embodiment of the present application is shown.
The camera system architecture shown in fig. 4B includes: camera application 401, camera framework 403, extension framework 404, encoding framework 405, camera hardware abstraction unit (CameraHAL) 406, camera driver (CameraDriver) 407, and camera hardware (CameraHardWare) 408. The camera hardware abstraction unit 406 is located in a hardware abstraction layer of the architecture of the mobile phone 10, and the camera hardware 408 is located in a hardware layer of the architecture of the mobile phone 10.
The camera application 401 includes a video recording module (Video Module) 4011 and a photo storage interface 4012.
The video recording module 4011 is configured to initiate a video recording request. For example, the video recording request may correspond to the user's click operation on a video recording control provided by the camera application 401.
The photo storage interface 4012 provides a function of accessing the local storage, for example, a stored data item can be added, modified or deleted. For example, the photo storage interface 4012 may be configured to read and store a photo into the gallery application 402 according to handle information of a photo storage address or the like. As an example, the photo storage interface 4012 may be a JPEG storage interface (JPEGStorage).
The camera framework 403 includes a camera device (CameraDevice) 4031, a configuration unit 4032, and a camera provider (CameraProvider) 4033.
The camera device 4031 represents one opened system camera.
The configuration unit 4032 is used to configure the video data stream, for example to create and delete data streams. Illustratively, the configuration unit 4032 may be a configuration stream (ConfigStream).
The camera provider 4033 is used to forward requests to the camera hardware abstraction unit 406 through the native camera HIDL interface.
Specifically, the camera device 4031, the configuration unit 4032, and the camera provider 4033 cooperate to forward the video recording request from the video recording module 4011 to the camera hardware abstraction unit 406.
The extension framework 404 includes a photo interface processing unit 4041. Illustratively, the photo interface processing unit may be a JPEG interface processing unit (JpegPostProcess). The photo interface processing unit 4041 is configured to provide an extended hardware abstraction layer interface definition language (HIDL) interface, and to return the handle information of the storage addresses of the algorithm-optimized photos from the camera hardware abstraction unit 406 to the camera application 401. It is emphasized that the photo interface processing unit 4041 provides an extended HIDL interface.
The encoding framework 405 includes a media recorder (MediaRecorder) 4051, a media codec (MediaCodec) 4052, and a media merger (MediaMuxer) 4053.
The media recorder 4051 is configured to start recording video.
The media codec 4052 may obtain an underlying media codec library.
The media merger 4053 is used to synthesize recorded audio or video.
Specifically, the encoding framework 405 encodes the video stream into MP4-format video using the media recorder 4051, the media codec 4052, and the media merger 4053; for convenience of description, this video is sometimes referred to directly as the MP4 hereinafter. It is to be understood that the video stream coding format may be the HEVC (High Efficiency Video Coding) standard, the H.264/AVC standard, H.266, VP8, VP9, AV1, or the like, which is not limited in this application.
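For reference, the sketch below shows how encoded video samples can be written into an MP4 container using the standard Android MediaMuxer API; the patent does not disclose its internal encoder code, so this only illustrates the general muxing step under assumed method and class names on the caller's side.

```java
import android.media.MediaCodec;
import android.media.MediaFormat;
import android.media.MediaMuxer;
import java.io.IOException;
import java.nio.ByteBuffer;

final class Mp4Writer {
    private final MediaMuxer muxer;
    private int videoTrack = -1;

    Mp4Writer(String outputPath) throws IOException {
        // MPEG-4 container output, as in the MP4 video described above
        muxer = new MediaMuxer(outputPath, MediaMuxer.OutputFormat.MUXER_OUTPUT_MPEG_4);
    }

    void start(MediaFormat encodedVideoFormat) {
        videoTrack = muxer.addTrack(encodedVideoFormat); // format produced by the video encoder
        muxer.start();
    }

    void writeEncodedFrame(ByteBuffer data, MediaCodec.BufferInfo info) {
        muxer.writeSampleData(videoTrack, data, info);   // one encoded sample (e.g. H.264/HEVC)
    }

    void finish() {
        muxer.stop();
        muxer.release();
    }
}
```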
The camera hardware abstraction unit 406 includes a video pipeline (VideoPipeline) 4061, a highlight frame decision engine 4062, a frame buffer (FrameBuffer) 4063, a photo pipeline (PhotoPipeline) 4064, a JPEG encoding unit 4065, and a photo buffer 4066.
The video pipeline 4061 is used to provide the video recording function and to generate a video stream (i.e., the video stream during the recording process). A pipeline is the set of all resources that provide a single specific function; it maintains all hardware resources and data flows and is responsible for maintaining the software and hardware resources of the whole pipeline and for processing the business logic. That is, the video pipeline 4061 is the set of all resources that provide the video recording function.
Highlight frame decision engine 4062 is configured to identify the video frames that score higher in the video stream produced by video pipeline 4061.
Frame buffer 4063 is used to temporarily store video frames in the video stream generated by video pipeline 4061.
The photo pipeline 4064 is used to perform algorithm optimization on the video frames buffered in the frame buffer 4063, where the photo pipeline 4064 is the set of all resources that provide the algorithm optimization function. The algorithm optimization may be a photo processing algorithm, for example including high dynamic range (HDR) rendering, multi-frame noise reduction (MFNR), single-frame noise reduction (SFNR), detail enhancement (DE), and the like, but is not limited thereto.
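A minimal sketch of such a chained photo pipeline is shown below; the FrameProcessor interface and the idea of running a buffered frame through several optimization stages in order are assumptions used purely for illustration.

```java
import java.util.Arrays;
import java.util.List;

/** One stage of the photo processing algorithm (HDR rendering, MFNR, detail enhancement, ...). */
interface FrameProcessor {
    byte[] process(byte[] rawFrame);
}

final class PhotoPipelineSketch {
    private final List<FrameProcessor> stages;

    PhotoPipelineSketch(FrameProcessor... stages) {
        this.stages = Arrays.asList(stages);
    }

    /** Runs the buffered video frame through every optimization stage in order. */
    byte[] optimize(byte[] rawFrame) {
        byte[] result = rawFrame;
        for (FrameProcessor stage : stages) {
            result = stage.process(result);
        }
        return result;
    }
}
```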
The JPEG encoding unit 4065 is configured to JPEG-encode the video frames generated by the photo pipeline 4064. Specifically, the encoding format is the Joint Photographic Experts Group (JPEG) format, and the file name extension of the encoded photo is .jpg or .jpeg.
The photo buffer 4066 is used to buffer the JPEG-formatted photo encoded by the JPEG encoding unit 4065, and is sometimes directly referred to as JPEG, JPEG photo, or JPEG image hereinafter for convenience of description.
The camera driver 407 is used to drive the units in the camera hardware 408 to operate.
The camera hardware 408 is the physical implementation part of the camera system and includes the three most important modules: a lens (Lens) 4081, an image sensor (Sensor) 4082, and an image signal processor (ISP) 4083, as well as auxiliary modules such as a focus motor, a flash, a filter, and an aperture. The lens is used to converge light: the refraction of light is used to converge the incident light onto the image sensor. The image sensor performs photoelectric conversion: it converts the received optical signal into an electronic signal through its internal photosensitive elements, the electronic signal is then converted into a digital signal through an analog-to-digital conversion module, and the digital signal is finally transmitted to the ISP. The ISP is responsible for algorithmic processing of the digital image, such as white balance, noise reduction, and demosaicing.
Next, based on the architecture diagram shown in fig. 4B, referring to fig. 5A, a specific flow of the shooting method provided in the embodiment of the present application is shown, which includes the following steps:
s501: the handset 10 starts recording.
The handset 10 may initiate recording automatically or in response to user action. In addition, the mobile phone 10 may call the camera hardware 408 through the camera application 401 to start video recording, or call the camera hardware 408 through a third party application to start video recording.
In some embodiments, the user may manually open the camera application 401 of the cell phone 10 and initiate video recording via the camera application 401. Illustratively, referring to FIG. 10A, a desktop home interface of the cell phone 10 is shown, including an icon of the camera application 401. After the user clicks the icon of the camera application 401 shown in fig. 10A, the mobile phone 10 displays the camera main interface of the camera application 401 shown in fig. 10B, and the user clicks the shooting control 303 in the video recording mode in the camera main interface to trigger the mobile phone 10 to start video recording.
In other embodiments, in some voice interaction or gesture interaction scenarios, the user may trigger the mobile phone 10 to open the camera application 401 and start recording through a preset voice instruction (e.g., the voice command "turn on the camera and start recording") or a gesture instruction (e.g., continuously shaking the mobile phone 10 for 5 seconds).
In other embodiments, the user may use third party software, such as a social application, to invoke a camera of the handset 10 to initiate recording.
According to some embodiments of the present application, after the user clicks the shooting control 303 shown in fig. 10B, speaks a preset voice instruction, or makes a video recording gesture instruction, referring to fig. 4B, the camera application 401 may send a video recording request to the camera framework 403 through the video recording module 4011. The camera framework 403 sends a record request to the camera hardware abstraction unit 406. The camera hardware abstraction unit 406 starts recording via the video pipeline 4061, generates a video stream, and buffers video frames in the video stream into the frame buffer 4063. As an example, the video frames may be buffered in a first-in, first-out order into the frame buffer 4063. The capacity of the frame buffer 4063 may be fixed or may be dynamically allocated. In some embodiments, the capacity of the frame buffer 4063 may be 30 frames.
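A simple sketch of such a fixed-capacity, first-in-first-out frame buffer (e.g., 30 frames) is shown below; the class is illustrative and is not the patent's implementation.

```java
import java.util.ArrayDeque;

/** A simple first-in, first-out frame buffer with a fixed capacity (e.g. 30 frames). */
final class FrameRingBuffer<T> {
    private final int capacity;
    private final ArrayDeque<T> frames;

    FrameRingBuffer(int capacity) {
        this.capacity = capacity;
        this.frames = new ArrayDeque<>(capacity);
    }

    /** Adds the newest frame; the oldest frame is evicted once the buffer is full. */
    synchronized void push(T frame) {
        if (frames.size() == capacity) {
            frames.pollFirst(); // drop the oldest frame
        }
        frames.addLast(frame);
    }

    synchronized T oldest() {
        return frames.peekFirst();
    }
}
```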
In addition, in some embodiments, the JPEG photos generated during the video recording process may be associated with the recorded video through the video recording identifier.
As an example, when the mobile phone 10 starts recording, a Universal Unique Identifier (UUID) may be generated, and then the UUID is stored in the corresponding MP4 file.
In some embodiments, when the user clicks and operates the shooting control 303 shown in fig. 10B, the camera application 401 may generate a UUID, and add an extended TAG (TAG) to the video recording request sent by the video recording module 4011 to the camera framework 403, so as to carry the UUID to the camera hardware abstraction unit 406.
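The sketch below illustrates generating the UUID when recording starts and attaching it to the video recording request as an extended tag; the RecordRequest type and the tag key are assumptions, since the patent does not specify the exact tag mechanism used between the camera application and the camera framework.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

final class RecordRequest {
    final Map<String, String> extendedTags = new HashMap<>(); // hypothetical extended TAG container
}

final class RecordingSession {
    /** Hypothetical tag key used to carry the recording UUID down to the hardware abstraction layer. */
    static final String TAG_RECORDING_UUID = "vendor.recording.uuid";

    static RecordRequest buildStartRequest() {
        String recordingUuid = UUID.randomUUID().toString(); // one UUID per recording
        RecordRequest request = new RecordRequest();
        request.extendedTags.put(TAG_RECORDING_UUID, recordingUuid);
        // The same UUID is later written into the MP4 extension information and into
        // the EXIF information of every automatically generated JPEG photo.
        return request;
    }
}
```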
In addition, in some other embodiments, the mobile phone 10 may call camera hardware (e.g., a camera) in another device to record, for example, another mobile phone or a separate camera module.
S502: the handset 10 scores the video frames and performs algorithm optimization on the top scoring highlight frames.
It is understood that the video frame with higher score is referred to as a highlight frame in the present application, but in other embodiments, the video frame with higher score may also be referred to as an optimal frame, a highlight frame, and the like, and the embodiments of the present application are not limited thereto.
In some embodiments, referring to fig. 4B, for the video frames in the frame buffer 4063, the highlight frame decision engine 4062 may score the video frames and determine the video frames with scores greater than a set threshold as highlight frames.
As an example, the threshold may be preset or adjusted as the number of the highlight frames increases, and the setting process will be described in the embodiments of S5021 to S5032, which is not limited herein. In addition, the specific value of the set threshold can be set according to the requirement, and the application is not limited specifically.
As an example, the highlight frame decision engine 4062 works on Tiny stream data and additionally needs to consider the alignment of the Tiny stream with the zero-second delay (ZSL) sequence; therefore, the frame buffer 4063 stores the RAW video frame data of the Tiny stream aligned with the zero-second delay sequence.
In some embodiments, the highlight frame decision engine 4062 may perform highlight frame identification for each frame of the video.
In other embodiments, the highlight frame decision engine 4062 may also score portions of the video frames in the video stream at set frame intervals or time intervals to identify highlight frames. For example, the highlight decision engine 4062 may perform highlight recognition once at a time interval of 10 seconds, or may perform highlight recognition once at a frame interval of 10 frames, such as recognizing whether the video frame at the current time is a highlight.
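A trivial sketch of such interval gating is shown below, assuming the engine is invoked once per incoming frame and only scores frames at the configured frame interval; the class and method names are illustrative.

```java
final class IntervalGate {
    private final int frameInterval; // e.g. evaluate once every 10 frames
    private long frameCounter;

    IntervalGate(int frameInterval) {
        this.frameInterval = frameInterval;
    }

    /** Returns true when the current frame should be scored by the decision engine. */
    boolean shouldEvaluate() {
        return (frameCounter++ % frameInterval) == 0;
    }
}
```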
Specifically, the process of identifying the highlight frame will be described in detail in the following embodiments of S5021 to S5032, and will not be described herein again.
S503: the mobile phone 10 performs JPEG encoding on the highlight frame after algorithm optimization to obtain a JPEG photo carrying the score.
In some embodiments, the photo pipeline 4064 performs algorithm optimization on the highlight frames with higher scores (for example, scores greater than a set threshold), and sends the algorithm-optimized data to the JPEG encoding unit 4065 for encoding to obtain JPEG photos, that is, exchangeable image files (EXIF).
In some embodiments, referring to fig. 4B, when the JPEG encoding unit 4065 performs JPEG encoding, the score of the highlight frame may be saved into EXIF information of the corresponding JPEG so that the JPEG photograph carries the score.
In addition, in some embodiments, the mobile phone 10 may write the UUID of the recorded video into the EXIF information of each automatically generated JPEG photo, respectively, to enable the correlated viewing of the subsequently recorded video and the corresponding JPEG photo.
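As an illustrative sketch, the score and UUID could be written into the EXIF block of a saved JPEG using the androidx ExifInterface, for example in the user-comment tag; the patent does not specify which EXIF field is used, so the field choice and string format below are assumptions.

```java
import androidx.exifinterface.media.ExifInterface;
import java.io.IOException;

final class ExifTagger {
    /**
     * Writes the highlight score and the recording UUID into the EXIF information of a
     * saved JPEG photo. Using the user-comment tag is an illustrative choice only.
     */
    static void tagPhoto(String jpegPath, double score, String recordingUuid) throws IOException {
        ExifInterface exif = new ExifInterface(jpegPath);
        exif.setAttribute(ExifInterface.TAG_USER_COMMENT,
                "score=" + score + ";uuid=" + recordingUuid);
        exif.saveAttributes(); // persist the updated EXIF block into the file
    }
}
```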
Specifically, the method for selecting a video frame to perform algorithm optimization to generate a photo will be described in detail below, and will not be described herein again.
S504: the handset 10 caches N JPEG photos.
As an example, N is a preset value, such as 5, but not limited thereto. It will be appreciated that the N JPEG photographs are the photographs generated for the top N highlight frames.
In some embodiments, the JPEG encoding unit 4065 buffers the N JPEG photos into the photo cache 4066. The capacity of the photo cache 4066 may be fixed or may be dynamically allocated. In some embodiments, the capacity of the photo cache 4066 may be N photos.
It is understood that after the number of the JPEG photos cached in the photo cache 4066 reaches the preset number N, as the video is recorded, when the highlight frame decision engine 4062 recognizes the latest highlight frame and generates the latest JPEG photo through the JPEG encoding unit 4065, the lowest-scoring JPEG photo in the photo cache 4066 is replaced by the higher-scoring JPEG photo, so that the N highest-scoring JPEG photos are finally cached in the photo cache 4066.
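A hedged sketch of such a top-N photo cache, using a min-heap so that the lowest-scoring cached photo is always the one replaced, is shown below; the names and structure are illustrative rather than the patent's implementation.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Keeps at most N JPEG photos, always retaining the N highest-scoring ones. */
final class TopNPhotoCache {
    static final class CachedPhoto {
        final byte[] jpeg;
        final double score;
        CachedPhoto(byte[] jpeg, double score) { this.jpeg = jpeg; this.score = score; }
    }

    private final int capacity;
    // Min-heap ordered by score: the lowest-scoring cached photo is always at the head.
    private final PriorityQueue<CachedPhoto> heap;

    TopNPhotoCache(int capacity) {
        this.capacity = capacity;
        this.heap = new PriorityQueue<>(capacity,
                Comparator.comparingDouble((CachedPhoto p) -> p.score));
    }

    /** Inserts a new photo; if the cache is full, the lowest-scoring photo is replaced. */
    synchronized void offer(CachedPhoto photo) {
        if (heap.size() < capacity) {
            heap.add(photo);
        } else if (heap.peek() != null && photo.score > heap.peek().score) {
            heap.poll();      // evict the lowest-scoring photo
            heap.add(photo);
        }
    }

    synchronized List<CachedPhoto> snapshot() {
        return new ArrayList<>(heap);
    }
}
```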
S505: the mobile phone 10 finishes recording and uniformly transmits back the N JPEG pictures to the gallery application 402.
In the present application, the mobile phone 10 may automatically end the recording, or may end the recording in response to a user operation. In addition, the mobile phone 10 may control the camera hardware 408 to end the video recording through the camera application 401, or may control the camera hardware 408 to end the video recording through a third party application.
In some embodiments, after the recording is complete, the user may manually trigger the storing of the N JPEG photos in the gallery application 402 (i.e., the JPEG photos shown in FIG. 4B). In other embodiments, after the recording is finished, the mobile phone 10 may automatically read the N JPEG photos through the photo storage interface 4012 according to the handle information of the storage addresses of the N JPEG photos to store in the gallery application 402.
In some embodiments, the user may manually control the camera application 401 to end the recording. For example, referring to the video recording interface of the camera application 401 shown in fig. 10D, a user clicking the ending control 202 in the video recording interface may trigger the mobile phone 10 to end recording.
In other embodiments, in some voice interaction scenarios or gesture interaction scenarios, the user may trigger the mobile phone 10 to end the recording through the camera application 401 by another preset voice instruction (e.g. a "stop recording" voice) or gesture instruction (e.g. a shake gesture for the mobile phone 10 for 5 seconds).
In other embodiments, when the user invokes a camera of the mobile phone 10 to start recording using third-party software such as a social application, the user may trigger the third-party software to end recording.
According to some embodiments of the present application, as an example, after the user clicks the end control 202 shown in fig. 10D, speaks a preset voice instruction for ending the video recording, or makes a gesture instruction for ending the video recording, the camera application 401 may send a video recording end request to the camera framework 403 through the video recording module 4011. The camera framework 403 sends an end video recording request to the camera hardware abstraction unit 406, and the camera hardware abstraction unit 406 stops video recording.
In some embodiments, referring to fig. 4B, the photo interface processing unit 4041 in the extension framework 404 uniformly returns the handle information of the storage addresses of the N JPEG photos from the camera hardware abstraction unit 406 to the camera application 401, and reads the N JPEG photos according to the handle information of the storage addresses of the N JPEG photos through the photo storage interface 4012 and stores the N JPEG photos in the gallery application 402, thereby implementing batch return.
It can be understood that the current Android native camera framework does not support batch pass back of photos to applications by the camera hardware abstraction unit. That is, the native system HIDL interface in the framework shown in FIG. 4B does not support batch pass back of N JPEG photos from the camera hardware abstraction unit 406 into the gallery application 402.
In the embodiment of the present application, the photo interface processing unit 4041 in the framework shown in fig. 4B provides an extended HIDL interface, that is, a batch return path for photos from the hardware abstraction layer to the application layer (HAL → APP) is added, and batch return of multiple photos from the camera hardware abstraction unit 406 to the gallery application 402 is supported.
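The sketch below shows one way the application-layer photo storage interface could persist the returned JPEG bytes so that the gallery application can display them, using the standard Android MediaStore API; the naming scheme and the use of MediaStore here are assumptions made for illustration.

```java
import android.content.ContentResolver;
import android.content.ContentValues;
import android.net.Uri;
import android.provider.MediaStore;
import java.io.IOException;
import java.io.OutputStream;
import java.util.List;

final class GallerySaver {
    /** Writes each returned JPEG photo into the media store so the gallery application can show it. */
    static void saveBatch(ContentResolver resolver, List<byte[]> jpegPhotos, String recordingUuid)
            throws IOException {
        int index = 0;
        for (byte[] jpeg : jpegPhotos) {
            ContentValues values = new ContentValues();
            values.put(MediaStore.Images.Media.DISPLAY_NAME,
                    "VID_" + recordingUuid + "_" + (index++) + ".jpg"); // illustrative naming
            values.put(MediaStore.Images.Media.MIME_TYPE, "image/jpeg");
            Uri uri = resolver.insert(MediaStore.Images.Media.EXTERNAL_CONTENT_URI, values);
            if (uri == null) {
                continue; // insertion failed; skip this photo
            }
            try (OutputStream out = resolver.openOutputStream(uri)) {
                if (out != null) {
                    out.write(jpeg);
                }
            }
        }
    }
}
```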
S506: after the mobile phone 10 finishes recording, the recording stream is encoded to obtain an MP4 video, and the MP4 video is stored in the gallery application 402.
In some embodiments, the encoding framework 405 performs video encoding on the video recording stream generated by the video pipeline 4061 in the camera hardware abstraction unit 406 to obtain a video in the MP4 format, and stores the video in the MP4 format (i.e., the MP4 video shown in fig. 4B) in the gallery application 402 after the video recording is finished.
In other embodiments, the encoding format of the photos provided by the present application is not limited to JPEG, and the format of the video is not limited to the above-mentioned MP4 format, but may also be other formats.
As an example, the handset 10 may write a UUID into the extension information of the MP4 video. In some embodiments, when the encoding framework 405 encodes the video stream, the UUID may be written into the extension information of the corresponding MP4 file to associate it with the automatically generated N JPEG photos. The gallery application 402 can then parse the EXIF information of the N JPEG photos to obtain the UUID, and parse the UUID in the extension information of the recorded video, thereby establishing the association between the recorded video and the automatically generated photos and facilitating associated viewing by the user.
Similarly, in some other embodiments, for a scene that is manually photographed, the cell phone 10 may also associate the recorded video with the manually photographed photo through the UUID.
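As a minimal sketch of the photo side of this association, the following Java code writes and reads a UUID using the AndroidX ExifInterface API; the choice of the user-comment EXIF tag is an assumption, since the publication does not specify which EXIF field carries the UUID.

```java
import androidx.exifinterface.media.ExifInterface;
import java.io.IOException;

public final class PhotoUuidTagger {

    /** Writes the association UUID into the photo's EXIF metadata (tag choice is an assumption). */
    public static void writeUuid(String jpegPath, String uuid) throws IOException {
        ExifInterface exif = new ExifInterface(jpegPath);
        exif.setAttribute(ExifInterface.TAG_USER_COMMENT, uuid);
        exif.saveAttributes();
    }

    /** Reads the UUID back, e.g. when the gallery application builds the video/photo association. */
    public static String readUuid(String jpegPath) throws IOException {
        return new ExifInterface(jpegPath).getAttribute(ExifInterface.TAG_USER_COMMENT);
    }
}
```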
Further, in some embodiments, the highlight frame decision engine 4062 may identify highlight frames based on multiple levels of decision information. Specifically, during highlight frame identification, semantic classification can be performed on the video frames in the recording process, and the processing is divided into 4 levels (layers), namely LV0, LV1, LV2, and LV3. The 4 layers are progressively refined at the semantic level, providing decision information of different granularities, from coarse to fine and from highly abstract to concrete.
Referring to fig. 5B, a process related to highlight frame recognition in the above embodiment will be described. As shown in fig. 5B, the above S502 specifically includes the following steps S5021 to S5032, and the main execution body is a highlight frame decision engine 4062 in the handset 10.
S5021: the method comprises the steps of acquiring style information and scene information based on a first preset interval aiming at a video stream in a video recording process, wherein the style information is used for representing the theme and atmosphere of the video, the scene information divides the video into a plurality of video segments, and the category of each video segment is different.
The style information and the scene information are decision information of LV0 and LV1, respectively.
The style information and the scene information may be obtained together based on a first preset interval. It is to be understood that the first preset interval is not specifically limited by the present application. For example, scene information and style information are acquired every 10 frames. The granularity corresponding to the scene information may be finer than the granularity corresponding to the style information.
As an example, the style information may be obtained by LV0 identifying the style of the entire video. In particular, LV0 is used to give the style and atmosphere of the whole video, such as, but not limited to, fun, characters, Spring Festival, Christmas, birthday, wedding, graduation, gourmet food, art, travel, night scene, sports, nature, and atmosphere (joyful / slightly sad / dynamic rhythm / leisurely).
As an example, the LV1 is used for semantic scene recognition, a video is divided into several segments, and a category of each segment is given, so as to obtain scene information, for example: mountains, portraits, etc.
Table 1 below gives an example of the definition of LV0 and LV 1.
Table 1: (the table content is provided as images in the original publication and is not reproduced here; it defines the LV0 style/atmosphere categories and the LV1 scene categories described above)
For example, if the highlight frame decision engine 4062 identifies that the content in the video includes a birthday cake, it can determine that the scene of the video is a birthday.
For example, if the highlight frame decision engine 4062 recognizes that the content in the video includes wedding elements such as a wedding ceremony, a wedding dress, a headdress, a Chinese wedding gown, or a wedding car, it can determine that the scene of the video is a wedding.
It should be understood that the above examples of the style information and the scene information provided by LV0 and LV1 are only exemplary descriptions, and the present application is not limited thereto.
The process of acquiring the style information and the scene information may be performed in real time during the recording process. In addition, after the recording is finished, unique style information and scene information can be generated by counting the acquired style information and scene information so as to represent the style and atmosphere of the whole video. Wherein the overall theme of the video may be determined by voting. It is understood that the style and atmosphere of the whole video also apply to each video clip, for example, if the theme of the whole video is a birthday, the theme corresponding to each video clip in the whole video is also about the birthday, and of course, the specific scene category of each video clip may be different.
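A minimal Java sketch of such a voting step is shown below; it simply takes the style labels sampled during recording and returns the most frequent one as the overall theme. The class and method names are illustrative.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public final class StyleVoter {

    /**
     * Determines the overall style/atmosphere of the video by majority vote over the
     * style labels sampled at the first preset interval during recording.
     */
    public static String voteOverallStyle(List<String> sampledStyles) {
        Map<String, Integer> counts = new HashMap<>();
        for (String style : sampledStyles) {
            counts.merge(style, 1, Integer::sum);
        }
        String best = null;
        int bestCount = -1;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best; // null if no samples were collected
    }
}
```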
S5022: and acquiring scene change information based on a second preset interval for the plurality of video clips.
The granularity corresponding to the scene change information is finer than the granularity corresponding to the style information and the scene information. Whether the scene of the video changes or not can be known through the scene change information. Whether a scene change occurs may also be expressed as whether a transition occurs. The scene change information may assist the highlight frame decision engine 4062 in the refinement of the final highlight.
The scene change information may be obtained by the highlight frame decision engine 4062 at a second preset interval. It is to be understood that the second preset interval is not specifically limited by the present application. For example, scene change information is given every 10 frames. As another example, scene change information is given every 3 frames.
Specifically, the highlight frame decision engine 4062 obtains the scene change information by identifying the content in the plurality of video segments, and can know whether the video segments have a scene change and the frame number of the scene change.
Illustratively, whether a transition occurs or not can be judged through LV2, and scene change information is obtained. I.e. the scene change information is the decision information of LV 2.
In particular, the scene change information may give the transition position in the video (e.g., the frame number at which the transition occurs) and the transition type (e.g., switch of the main character, fast camera movement, change of scene category, or image content change caused by other conditions), so as to prevent too many photos of similar scenes from being recommended. Scene change information includes, but is not limited to, one or more of the following changes: the person subject (or main character) changes, the composition of the image content changes greatly, the scene changes at the semantic level, and the brightness or color of the image changes.
Change of the person subject: when the person subject changes, it is counted as one transition. The person subject may be defined as the person occupying the largest proportion of the image. For example, if the person subject in frame t-1 is A, and in frame t the person subject is still A but a person B also appears, no transition is counted. If the person subject in frame t-1 is A and the person subject in frame t is B, one transition is counted.
A large change in the composition of the picture content is regarded as a transition. For example, if the camera is substantially stationary and there is much object movement in the recorded picture, causing a large change in picture content (e.g., watching a racing event), it is regarded as a transition; the same applies to a fast camera pan (e.g., quickly panning from A to B). As another example, when the camera moves slowly and steadily, the picture content generally has no obvious transition boundary, but the transition detection frame rate is 2 FPS, so when the difference between the content of frame t and that of frame t-16 is large, it is regarded as one transition. As another example, during fast camera motion the picture is severely blurred and the frame-to-frame content change is large, but the whole motion process is regarded as only one transition.
A change in image brightness or color is regarded as a transition. For example, at a concert the scene content changes little, but the color and brightness of the ambient lighting change, and this is regarded as a transition.
It should be understood that the above example of scene change information given with respect to LV2 is only an exemplary description, and the present application is not limited thereto.
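The following Java sketch illustrates, under stated assumptions, how per-frame features could be turned into transition decisions in line with the examples above. The feature fields (person subject id, content difference, brightness change) and the threshold values are assumptions; the actual detection models are not disclosed in the publication.

```java
public final class TransitionDetector {

    /** Minimal per-frame features assumed to be produced by upstream analysis (illustrative only). */
    public static final class FrameInfo {
        public final int frameIndex;
        public final String personSubjectId;   // id of the person occupying the largest image area, or null
        public final float contentDifference;  // difference vs. the previous sampled frame, in [0, 1]
        public final float brightnessChange;   // brightness/color change vs. the previous sample, in [0, 1]

        public FrameInfo(int frameIndex, String personSubjectId,
                         float contentDifference, float brightnessChange) {
            this.frameIndex = frameIndex;
            this.personSubjectId = personSubjectId;
            this.contentDifference = contentDifference;
            this.brightnessChange = brightnessChange;
        }
    }

    private static final float CONTENT_THRESHOLD = 0.6f;     // assumed tuning values
    private static final float BRIGHTNESS_THRESHOLD = 0.5f;

    private FrameInfo previous;

    /** Returns true if the sampled frame should be counted as a transition. */
    public boolean isTransition(FrameInfo current) {
        boolean transition = false;
        if (previous != null) {
            boolean subjectChanged = previous.personSubjectId != null
                    && current.personSubjectId != null
                    && !previous.personSubjectId.equals(current.personSubjectId);
            transition = subjectChanged
                    || current.contentDifference > CONTENT_THRESHOLD
                    || current.brightnessChange > BRIGHTNESS_THRESHOLD;
        }
        previous = current;
        return transition;
    }
}
```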
S5023: and acquiring multi-dimensional preset information based on a third preset interval for each video clip. The multi-dimensional preset information indicates whether the video frame has preset multi-dimensional information. The multi-dimensional preset information is information of multiple dimensions for scoring the video frames.
It should be noted that the video frames containing the information in the above-mentioned multi-dimensional preset information are usually high-quality video frames or video frames containing certain human actions or expressions that are in accordance with the user's expectations.
It is understood that the third preset interval is not specifically limited by the present application. For example, obtaining the multi-dimensional preset information based on the third preset interval means: multi-dimensional preset information is acquired every 3 frames.
For example, the multi-dimensional preset information corresponds to level LV3, and the granularity of the LV3 information is finer than that of LV2. That is, the multi-dimensional preset information is the decision information of LV3.
Table 2 below gives examples of the different categories and different dimensions of information included in the preset multi-dimensional information.
Table 2: (the table content is provided as images in the original publication and is not reproduced here; it lists the categories and dimensions of the multi-dimensional preset information, such as image quality and person-related actions and expressions)
it should be understood that the above examples of the multi-dimensional preset information given with respect to LV3 are only exemplary descriptions, and the present application is not limited thereto.
The processes of S5021 to S5023 can be understood as follows: as the video is captured, the highlight frame decision engine 4062 provides different levels of decision information at a fixed interval (the interval may be preset, and in a specific implementation its value may depend on hardware resources), so as to decide which highlight frames should enter algorithm optimization to generate JPEG photos.
Illustratively, referring to fig. 5C, the decision information is divided, from coarse to fine granularity, into LV0, LV1, LV2, and LV3. LV0 gives the overall atmosphere, i.e., the profile, of the entire video. On the basis of LV0, LV1 divides the video into video segments of 3 categories (i.e., category 1, category 2, and category 3) to obtain scene information; for example, the categories are portrait (Portrait), landscape (Landscape), and building (Building). On the basis of LV1, LV2 obtains scene change information (such as the frame numbers at which transitions occur), here scene change information for 3 transitions. On the basis of LV2, LV3 obtains the following highlight frames: highlight frame 1 (between the first and second transitions), highlight frame 2 (between the first and second transitions), highlight frame 3 (between the second and third transitions), and highlight frame 4 (after the third transition). It can be seen that two frames in which the expressions, actions, and the like set in the multi-dimensional preset information may appear occur between the first and second transitions. Of course, for highlight frame 1 and highlight frame 2 in the same scene, in order to avoid recommending too many photos of similar scenes, their scores can be compared during decision making, the highlight frame with the higher score is kept, and subsequent algorithm optimization is performed on it to generate the JPEG photo.
The highlight frame decision engine 4062 decides which video frames enter algorithm optimization (i.e., the photographing algorithm) based on the above different levels of decision information, as follows.
S5024: and acquiring the key frames with the multi-dimensional preset information, and judging whether the scores of the key frames are greater than a first threshold value or not so as to identify the wonderful frames. If yes, entering S5025 by taking the key frame as a highlight frame; if not, the process proceeds to S5026.
In some embodiments, the multiple dimensions involved in the multi-dimensional preset information include, but are not limited to, the basic image quality, image evaluation, and person-related features of the video frame. It will be appreciated that, to enhance the robustness and versatility of the algorithm, the analyzed dimensions should be defined as precisely as possible. Each video frame is then scored in each of the set dimensions, a weight is set for the score of each dimension, and the per-dimension scores of each video frame are weighted to obtain its final score. Whether the frame is a highlight frame is then judged according to the final score.
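A minimal Java sketch of this weighted scoring is given below; the dimension names and weights are illustrative assumptions, not values disclosed by the publication.

```java
import java.util.Map;

public final class FrameScorer {

    // Example dimension weights; the real weights and dimensions are not disclosed here.
    private final Map<String, Double> weights;

    public FrameScorer(Map<String, Double> weights) {
        this.weights = weights;
    }

    /**
     * Computes the final score of a video frame as a weighted sum of its per-dimension
     * scores (e.g. "imageQuality", "composition", "faceExpression", "action").
     * Dimensions without a configured weight are ignored.
     */
    public double finalScore(Map<String, Double> dimensionScores) {
        double total = 0.0;
        for (Map.Entry<String, Double> e : dimensionScores.entrySet()) {
            Double w = weights.get(e.getKey());
            if (w != null) {
                total += w * e.getValue();
            }
        }
        return total;
    }
}
```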
It can be understood that based on the decision information of the different levels, it is helpful to more accurately identify the video frame where the picture containing the multi-dimensional preset information is located, so as to obtain a high-quality photo meeting the user's expectation.
The first threshold is configurable. As an example, exceeding the first threshold indicates that an action in one of the multiple dimensions has been detected; that is, the first threshold is used to single out predefined actions, such as the running and jumping actions shown in Table 2.
S5025: and performing algorithm optimization on the key frame.
That is, when the score of a key frame is greater than a first threshold, the algorithm is optimized for the key frame.
It will be appreciated that key frames with scores greater than the first threshold are highlight frames.
For the case where the score of a key frame is less than or equal to the first threshold, the relationship between the score of the key frame and a second threshold can also be judged. The second threshold is introduced to ensure that at least one photo can be generated.
S5026: and judging whether the score of the key frame is larger than a second threshold value. If yes, the process goes to S5027; if not, the process is ended.
That is, when the score of the key frame is less than or equal to the first threshold, it is determined whether the score of the key frame is greater than the second threshold.
As an example, the second threshold may be initialized when recording starts. The initial threshold of the second threshold may be set to 0 in order to ensure that at least one photograph can be generated.
S5027: the second threshold is updated to the score of the keyframes. The purpose of this is to keep the second threshold value always the latest highest value.
Optionally, the highlight frame decision engine 4062 may further determine whether to trigger algorithm optimization on the video frame based on the scene change information, and perform the following S5028.
S5028: and judging whether the current frame has transition or not based on the scene change information of the current frame.
S5029: if the transition occurs, judging whether the time of the current transition from the last transition is greater than a time threshold value. The time threshold can be expressed as a shortest transition time limit threshold, and can be used for avoiding the generation of too frequent photos in the video recording process caused by frequent transition.
S5030: and when the time of the current transition distance from the last transition is greater than a time threshold, judging that the current transition is the transition which can trigger the algorithm optimization of the video frame. If so, the process proceeds to S5031, otherwise, the process ends.
S5031: when transition occurs, whether a key frame exists in a current transition fragment or not and whether a second threshold value is smaller than a first threshold value or not are judged. If yes, go to S5031; if not, the process is ended.
As an example, if the second threshold is smaller than the first threshold, it indicates that the algorithm optimization of the video frame has not been triggered in the transition segment, and there are key frames in the current transition segment, at this time, the key frame with higher score in these key frames may be used as a highlight frame, and the algorithm optimization is entered, that is, step S5025 is executed. This is done in order to enable at least one picture to be output at the transition.
S5032: and after the video recording is finished, outputting N wonderful frames based on the order of scores from large to small.
It can be understood that after the recording is finished, the N JPEG photos are determined in the order of the scores from large to small, so that the photos with high quality or meeting the user expectation are recommended to the user, and the user experience is greatly improved.
Based on the above process, the wonderful moment can be identified more accurately, so that a wonderful photo with higher image quality can be obtained.
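The following Java sketch is one simplified reading of the decision flow S5024 to S5032 described above (first threshold, second threshold that tracks the latest highest score, transition-gated fallback, and final top-N selection). It is illustrative only; threshold values, timing, and the interaction with the photo pipeline are assumptions.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public final class HighlightSelector {

    public static final class KeyFrame {
        public final int frameIndex;
        public final double score;
        public KeyFrame(int frameIndex, double score) {
            this.frameIndex = frameIndex;
            this.score = score;
        }
    }

    private final double firstThreshold;        // screens for predefined actions/expressions
    private final long minTransitionGapMs;      // shortest allowed time between transitions
    private double secondThreshold = 0.0;       // initialised to 0 so at least one photo is produced
    private KeyFrame segmentCandidate;          // best sub-threshold key frame in the current segment
    private boolean segmentTriggered;           // whether optimisation already ran in this segment
    private long lastTransitionMs = Long.MIN_VALUE;
    private final List<KeyFrame> selected = new ArrayList<>();

    public HighlightSelector(double firstThreshold, long minTransitionGapMs) {
        this.firstThreshold = firstThreshold;
        this.minTransitionGapMs = minTransitionGapMs;
    }

    /** S5024-S5027: handle one scored key frame. */
    public void onKeyFrame(KeyFrame frame) {
        if (frame.score > firstThreshold) {                  // S5024 -> S5025
            selected.add(frame);
            segmentTriggered = true;
        } else if (frame.score > secondThreshold) {          // S5026 -> S5027
            secondThreshold = frame.score;                   // keep the latest highest score
            segmentCandidate = frame;
        }
    }

    /** S5028-S5031: handle a detected transition at the given timestamp. */
    public void onTransition(long timestampMs) {
        if (lastTransitionMs != Long.MIN_VALUE
                && timestampMs - lastTransitionMs <= minTransitionGapMs) {
            return;                                          // too close to the previous transition
        }
        lastTransitionMs = timestampMs;
        if (!segmentTriggered && segmentCandidate != null && secondThreshold < firstThreshold) {
            selected.add(segmentCandidate);                  // ensure at least one photo per segment
        }
        segmentCandidate = null;
        segmentTriggered = false;
    }

    /** S5032: after recording ends, return the top N highlight frames by descending score. */
    public List<KeyFrame> topN(int n) {
        List<KeyFrame> sorted = new ArrayList<>(selected);
        sorted.sort(Comparator.comparingDouble((KeyFrame f) -> f.score).reversed());
        return sorted.subList(0, Math.min(n, sorted.size()));
    }
}
```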
Further, in some other embodiments, the handset 10 may record highlight moments in a stored video by identifying highlight frames in that video through the highlight frame decision engine 4062 and performing algorithm optimization on the highlight frames to generate photos. This is equivalent to automatically generating photos from the video at a later stage, which further helps the user record some highlight moments in the video. For the method of recognizing highlight frames and generating photos for a stored video, reference may be made to the embodiments of fig. 5A and fig. 5B above, and details are not repeated here.
Additionally, in some other embodiments, when the handset 10 is previewing before shooting, for example displaying the shooting interface shown in fig. 10B, highlight frames in the preview video frames may also be identified by the highlight frame decision engine 4062 and optimized by the algorithm to generate photos. In this way, highlight moments are recorded without recording a video. For the method of recognizing highlight frames and generating photos for the preview video frames, reference may likewise be made to the embodiments of fig. 5A and fig. 5B above, and details are not repeated here.
Next, based on the architecture diagram shown in fig. 4B and with reference to the architecture diagram of the shooting method in fig. 6, the relevant content of the extended HIDL interface in the shooting method provided by the embodiment of the present application is described. Compared with fig. 4B, fig. 6 does not show some of the modules and adds the shared memory 409.
The camera framework 403 interacts with the camera hardware abstraction unit 406 using the Android standard video recording HIDL interface (i.e., the native camera system HIDL interface). Specifically, after the camera application 401 instructs the camera framework 403 to start recording, the camera framework 403 instructs the video pipeline 4061 in the camera hardware abstraction unit 406 to start recording through the Android standard video recording HIDL interface.
In addition, based on a photo interface processing unit (jpegpos process) 4041 in the extension frame 404, a photo interface processing client (jpegpos process client) 4041a is provided in the camera application 401, and a photo interface processing server (jpegpos process server) 4041b is provided in the camera hardware abstraction unit 406. Specifically, a path for returning handle information of the batch of JPEG photos is formed between the photo interface processing client 4041a and the photo interface processing server 4041b.
The shared memory 409 is used for storing N JPEG photos generated in the video recording, so that the camera application 401 can read the N JPEG photos therefrom.
Next, based on the architecture diagram shown in fig. 6 and referring to the method flow shown in fig. 7, the process of returning handle information of automatically generated photos through the extended HIDL interface in the shooting method provided in the embodiment of the present application is described, with the mobile phone 10 as the execution subject. The method shown in fig. 7 includes the following steps:
S701: The handset 10 begins recording.
In some embodiments, the camera application 401 instructs the camera framework 403 to start recording through the video recording module 4011, and the camera framework 403 instructs the video pipeline 4061 in the camera hardware abstraction unit 406 to start recording based on the Android standard video recording HIDL interface. For S701, reference may be made to the description of S501 above.
S702: the handset 10 identifies the highlight frame.
S702 is described with reference to the description above regarding S502.
S703: the mobile phone 10 selects N highlight frames from the identified highlight frames to perform algorithm optimization, and performs JPEG encoding to obtain N JPEG photos.
In some embodiments, although the buffering of the identified higher-scored video frames in the frame buffer 4063 and the JPEG encoding are not shown between S702 and S703, these processes still exist in the actual implementation.
S704: the handset 10 writes the N JPEG photos into the shared memory 409.
In some embodiments, the photo pipeline 4064 in the camera hardware abstraction unit 406 writes the N generated JPEG photos (e.g., photos A1-A5 shown in fig. 6) into the shared memory 409 through the photo interface processing server 4041b.
It can be understood that, when the N JPEG photos are written into the shared memory 409, the photo interface processing server 4041b can obtain the handle information of the N JPEG photos written into the shared memory 409, that is, the handle information of the storage addresses of the N JPEG photos.
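As a conceptual Java sketch only (in the described architecture the writer sits in the hardware abstraction layer), the following code shows how generated JPEG byte arrays could be packed sequentially into an android.os.SharedMemory region while recording the (offset, length) handle of each photo; the Handle class is an assumption about what the "handle information" contains.

```java
import android.os.SharedMemory;
import android.system.ErrnoException;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public final class PhotoSharedMemoryWriter {

    public static final class Handle {
        public final int offset;
        public final int length;
        public Handle(int offset, int length) {
            this.offset = offset;
            this.length = length;
        }
    }

    /**
     * Writes the generated JPEG byte arrays sequentially into one shared memory region
     * and returns the (offset, length) handle of each photo. The region is assumed to
     * be large enough to hold all photos.
     */
    public static List<Handle> write(SharedMemory region, List<byte[]> jpegPhotos)
            throws ErrnoException {
        ByteBuffer buffer = region.mapReadWrite();
        List<Handle> handles = new ArrayList<>();
        try {
            for (byte[] jpeg : jpegPhotos) {
                handles.add(new Handle(buffer.position(), jpeg.length));
                buffer.put(jpeg);
            }
        } finally {
            SharedMemory.unmap(buffer);
        }
        return handles;
    }
}
```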
S705: after the video recording is finished, the mobile phone 10 returns the handle information of the N JPEG photos from the hardware abstraction layer to the application layer based on the extended HIDL interface.
In some embodiments, the camera hardware abstraction unit 406 batch returns handle information for the automatically generated N JPEG photos to the photo interface processing client 4041a through the photo interface processing server 4041b.
S706: the mobile phone 10 reads the N JPEG photos according to the handle information of the N JPEG photos and stores the N JPEG photos in the gallery application 402.
In some embodiments, the camera application 401 reads the N JPEG photos from the shared memory 409 according to the handle information of the N JPEG photos through the photo interface processing client 4041 a.
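Continuing the writer sketch above, the following Java code shows the complementary application-side read: each JPEG is recovered from the shared memory region using its (offset, length) handle. It reuses the hypothetical Handle class from the previous sketch and is likewise illustrative only.

```java
import android.os.SharedMemory;
import android.system.ErrnoException;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public final class PhotoSharedMemoryReader {

    /**
     * Reads each JPEG out of the shared memory region using the returned
     * (offset, length) handle information.
     */
    public static List<byte[]> read(SharedMemory region,
                                    List<PhotoSharedMemoryWriter.Handle> handles)
            throws ErrnoException {
        ByteBuffer buffer = region.mapReadOnly();
        List<byte[]> photos = new ArrayList<>();
        try {
            for (PhotoSharedMemoryWriter.Handle h : handles) {
                byte[] jpeg = new byte[h.length];
                buffer.position(h.offset);
                buffer.get(jpeg);
                photos.add(jpeg);
            }
        } finally {
            SharedMemory.unmap(buffer);
        }
        return photos;
    }
}
```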
S707: the mobile phone 10 processes the video stream generated by the video pipeline 4061 to obtain MP4 with fragmented video.
In some embodiments, although the video stream encoding process and the like are not shown in S707, these processes still exist in the actual implementation process.
It is to be understood that the execution sequence of S701-S707 shown in fig. 7 includes, but is not limited to, the above example, and other execution sequences may also be used, and the execution sequence is not limited herein.
Further, based on the structural diagrams shown in fig. 4B and fig. 6, fig. 8A is a schematic flowchart of a shooting method provided in an embodiment of the present application, where the main execution bodies of the method are respective components in the structural diagrams shown in fig. 4B and fig. 6. Specifically, the method shown in fig. 8A includes:
s801: the record module in the camera application 401 the camera application 4011 instructs the camera framework 403 to start recording.
The recording module 4011 may start to execute step S801, that is, start recording after receiving a user' S click operation on the shooting control 302 when the mobile phone 10 shown in fig. 10A is in the shooting mode and in the smart multi-shot mode. As an example, the record module 4011 may initiate a record request to the camera framework 403, which is not described in detail herein.
S802: the camera framework 403 instructs the camera hardware abstraction unit 406 to start the video pipeline 4061 to start recording.
S803: a video pipeline 4061 in the camera hardware abstraction unit 406 instructs a highlight frame decision engine 4062 to analyze the video frames in real-time.
S804: the highlight frame decision engine 4062 in the camera hardware abstraction unit 406 identifies the highlight frame.
The process of identifying the highlight frame may refer to the related content in the above embodiments, which is not described herein again.
S805: a highlight frame decision engine 4062 in the camera hardware abstraction unit 406 instructs the photo pipeline 4064 to select a highlight frame for algorithm optimization.
S806: the photo pipeline 4064 in the camera hardware abstraction unit 406 generates N JPEG photos that are obtained by performing algorithm optimization on the selected highlight frames.
S807: the photo pipeline 4064 in the camera hardware abstraction unit 406 stores the N JPEG photos into the shared memory 409 via the photo interface processing server 4041b.
S808: the record module 4011 in the camera application 401 instructs the camera framework 403 to end recording.
The video recording module 4011 may start to execute S808, that is, end recording, after receiving the user's click operation on the end control 202 when the mobile phone 10 shown in fig. 10D is in the shooting mode with the smart multi-shot mode enabled.
S809: the record module 4011 in the camera application 401 instructs the video pipeline 4061 in the camera hardware abstraction unit 406 to stop recording.
S810: the camera hardware abstraction unit 406 uploads handle information of the N JPEG photos to the camera application 401 through the photo interface processing server 4041b and the photo interface processing client 4041 a.
It is understood that the camera hardware abstraction unit 406 actively uploads the handle information of the N JPEG photos generated in the video recording to the camera application 401 in S810 described above.
S811: the photo interface processing client 4041a in the camera application 401 reads the N JPEG photos from the shared memory 409 according to the handle information of the N JPEG photos.
The above steps S810, S811, and S812 are parallel steps.
It is understood that, as an alternative to the active upload in S810, the camera hardware abstraction unit 406 may also upload the handle information of the N JPEG photos to the camera application 401 under the trigger of the user.
In some embodiments, after the photo interface processing client 4041a in the camera application 401 receives the handle information of the N JPEG photos, it may read the JPEG photo files from the shared memory 409 according to the handle information of the N JPEG photos, and then generate and display thumbnails of the photos.
S812: the photo interface processing client 4041a in the camera application 401 saves the N JPEG photos into the gallery application 402.
In some embodiments, the photo storage interface 4012 in the camera application 401 calls the photo interface processing client 4041a, reads the N JPEG photos from the shared memory 409 according to the handle information of the N JPEG photos, and stores the N JPEG photos in the gallery application 402.
In some embodiments, the camera application 401 may automatically save the N JPEG photos to the gallery application 402; in other embodiments, the user may manually trigger the saving of the N JPEG photos in the gallery application 402. The specific process can be seen in the embodiments shown in fig. 14A to 14D below. For example, the mobile phone 10 in the above embodiments automatically saves the photos A1-A5 in the "all photos" album, or the user's operation on the thumbnails of the photos A1-A5 in the video playing interface triggers the saving of the photos A1-A5.
It is to be understood that the execution sequence of S801 to S812 is only an illustration, and in other embodiments, other execution sequences may be adopted, and partial steps may be split or combined, which is not limited herein.
Next, referring to fig. 8B, another framework diagram of the mobile phone 10 is shown, in which the camera application 401 further includes a manual Capture Module (snapshot module) 4013 for manually triggering the generation of a photo during video recording. The framework shown in fig. 8B differs from the framework shown in fig. 6 in that some unit modules used for automatic photo generation during recording are omitted, and the manual snapshot module 4013 is additionally shown.
Based on the architecture diagram shown in fig. 8B, a process of associating an MP4 video with a JPEG photo in the shooting method provided by the embodiment of the present application is described with reference to the method flow shown in fig. 8C, and the execution subject is the mobile phone 10. The method shown in fig. 8C includes the following flow of steps:
s801a: the handset 10 starts recording and generates a UUID.
The detailed description of S801a may refer to the related description of S501 above.
S802a: the mobile phone 10 performs video stream coding and writes the video stream coding into the UUID to generate an MP4 video. That is, the UUID is carried in the MP4 video.
For a detailed description of S802a, reference may be made to the related description in S506 above.
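The publication does not specify how the UUID is physically stored as "extension information" of the MP4 file. One possible realization, sketched below in Java, appends a top-level ISO BMFF 'uuid' box to the finished file, carrying the association UUID as the box's 16-byte extended type; standard players ignore unknown top-level boxes. In the described architecture this writing would be performed by the encoding framework 405 rather than by such a standalone helper.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.UUID;

public final class Mp4UuidWriter {

    /**
     * Appends a top-level 'uuid' box to an already-finalised MP4 file, carrying the
     * association UUID as the box's 16-byte extended type. Unknown top-level boxes
     * are ignored by standard players, so the video remains playable.
     */
    public static void appendUuidBox(String mp4Path, UUID uuid) throws IOException {
        ByteBuffer box = ByteBuffer.allocate(24);
        box.putInt(24);                                        // box size (big-endian): 4 + 4 + 16
        box.put("uuid".getBytes(StandardCharsets.US_ASCII));   // box type
        box.putLong(uuid.getMostSignificantBits());            // 16-byte extended type = the UUID
        box.putLong(uuid.getLeastSignificantBits());
        try (RandomAccessFile file = new RandomAccessFile(mp4Path, "rw")) {
            file.seek(file.length());
            file.write(box.array());
        }
    }
}
```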
S803a: during the recording process, the mobile phone 10 starts manual photographing.
In some embodiments, the manual snapshot module 4013 in the camera application 401 may send a photographing request to the camera framework 403, and the camera framework 403 sends the photographing request to the photo pipeline 4064 in the camera hardware abstraction unit 406.
S804a: the mobile phone 10 performs photographing coding and writes the UUID into EXIF information of the JPEG photograph.
In some embodiments, the photo pipeline 4064 performs algorithm optimization on the current frame, encodes it by the JPEG encoding unit 4065, and writes the UUID into the EXIF information of the JPEG photo.
S805a: the handset 10 passes the JPEG photo back to the gallery application 402.
In some embodiments, the camera hardware abstraction unit 406 may return the JPEG photo through the native camera system HIDL interface to the photo storage interface 4012, which stores it in the gallery application 402.
S806a: the mobile phone 10 analyzes the UUID in the MP4 video and the JPEG photo, and realizes the related viewing of the MP4 video and the JPEG photo.
Specifically, the gallery application 402 parses the UUID in the MP4 video and JPEG photo, and displays the associated MP4 video and JPEG photo.
Further, the method for manually taking a picture in a video recording provided by the present application is described with reference to fig. 8D, and the execution main body is each unit in the mobile phone 10, including the following steps:
s801b: the record module 4011 instructs the coding framework 405 to construct an encoder to start recording.
S802b: the recording module 4011 generates a UUID.
S803b: the video recording module 4011 instructs the camera framework 403 to start video recording and streaming, and transmits the UUID as extension information.
S804b: the camera framework 403 instructs the camera hardware abstraction unit 406 to start recording and streaming, passing the UUID as extension information.
S805b: the camera hardware abstraction unit 406 indicates that the dubbing is complete to the record module 4011.
S806b: the camera hardware abstraction unit 406 feeds the real-time video frames into the encoding framework 405.
S807b: the encoding framework 405 encodes the video frames and writes to the MP4 file.
S808b: the record module 4011 instructs the coding framework 405 to start the encoder.
S809b: the recording module 4011 instructs the coding framework 405 to write the UUID as extension information into the MP4 file.
S810b: during the recording process, the manual snapshot module 4013 instructs the camera hardware abstraction unit 406 to start taking a picture.
In some embodiments, for the current frame, the camera hardware abstraction unit 406 initiates a photograph through the photo pipeline 4064.
S811b: the camera hardware abstraction unit 406 instructs the JPEG encoding unit 4065 to JPEG-encode the current frame and write the UUID into EXIF information of the JPEG file.
S812b: the camera hardware abstraction unit 406 passes the JPEG photo back through the photo storage interface 4012 in the camera application 401.
S813b: the photo storage interface 4012 saves the JPEG photo to a JPEG file, for example, to the shared memory 409.
S814b: the gallery application 402 parses the extension information of the MP4 file to extract the UUID.
S815b: the gallery application 402 parses the EXIF information of the JPEG file to extract the UUID.
S816b: the gallery application 402 enables correlated viewing of MP4 video and JPEG photos according to UUIDs.
The shooting method provided by the embodiment of the application is described in terms of human-computer interaction with reference to the accompanying drawings. It should be noted that, the size, the position, and the style of the control icon in the interface schematic diagram shown below in the embodiment of the present application are only used for example, and do not cause any limitation to the present application.
In some embodiments, the camera application of the handset 10 may default to an intelligent multi-shot mode. In other embodiments, the user may be prompted and guided to turn on the smart multi-shot mode in the case that the camera application of the mobile phone 10 does not turn on the smart multi-shot mode, or the smart multi-shot mode may be turned on manually by the user in the settings of the camera application.
It should be noted that the above-mentioned smart multi-shot means: the mobile phone 10 automatically generates a plurality of photos during the recording process to record highlight moments. In other embodiments, the smart multi-shot may be given other names, such as one-shot-more, automatic snapshot, or excellent instant photo, without affecting the essence of the smart multi-shot function.
It is understood that, when the smart multi-shot mode is turned on, the mobile phone 10 executes the methods of the embodiments shown in fig. 5A and 5B in a video recording scenario to record highlight moments. Conversely, when the smart multi-shot mode is turned off, the mobile phone 10 will not execute the methods of the embodiments of fig. 5A and 5B in a video recording scenario and will not automatically record highlight moments.
Referring to fig. 9, a flowchart of a photographing method is shown, in which the mobile phone 10 is used as an execution subject. In the following embodiments, for convenience of description, the camera application 401 is referred to as a camera application, and the gallery application 402 is referred to as a gallery application. The method comprises the following steps:
s901: the mobile phone 10 opens the camera application to display a main photographing interface of the camera application based on the operation 1 of the user opening the camera application.
Referring to fig. 10A, a desktop home interface of the cell phone 10 is shown, including an icon corresponding to the camera and an icon of the gallery application. After the user clicks the corresponding icon 401 shown in fig. 10A (i.e., operation 1), the mobile phone 10 displays a main camera interface of the camera application shown in fig. 10B, where the main camera interface includes an intelligent multi-shot mode control 301, a video recording mode control 302, a shooting control 303, and the like, the video recording mode control 302 is in a selected state, and the intelligent multi-shot control 301 sets the intelligent multi-shot mode to be in the selected state (i.e., activates the intelligent multi-shot mode).
In some embodiments, the camera application of the cell phone 10 may default to the smart multi-shot mode.
In other embodiments, the mobile phone 10 may prompt and guide the user to turn on the smart multi-shot mode in case the camera application of the mobile phone 10 does not turn on the smart multi-shot mode, or the smart multi-shot mode may be turned on manually by the user in the setup interface of the camera application autonomously.
As an example, before the mobile phone 10 starts to shoot a video, the user opens the main photographing interface of the camera application as shown in fig. 10C, in which the smart multi-shot control 301 indicates that the smart multi-shot mode is in the unselected state (i.e., the smart multi-shot mode is not turned on). At this time, if the user clicks the smart multi-shot control 301 shown in fig. 10C, the mobile phone 10 may display the main photographing interface shown in fig. 10B, in which the smart multi-shot mode control 301 is in the selected state (i.e., the smart multi-shot mode is turned on), to prompt the user that the smart multi-shot mode is now enabled.
S902: the mobile phone 10 starts recording and automatically generates N pictures (e.g., N JPEG pictures above) based on the user's operation 2 of starting recording.
As an example, after the user clicks the shooting control 303 shown in fig. 10B (i.e., operation 2), the mobile phone 10 may display a video shooting interface as shown in fig. 10D. The video shooting interface shown in fig. 10D differs from the video shooting interface shown in fig. 1 in that it additionally displays a smart multi-shot mode control 304 in the selected state, to prompt the user that the smart multi-shot mode is currently turned on. The mobile phone 10 can then identify highlight frames while shooting the video and automatically generate the N photos with the highest scores, realizing smart multi-shot without manual snapshot. Meanwhile, the mobile phone 10 also supports the user manually triggering the mobile phone 10, via the snapshot control 203, to process the current frame and generate an image, realizing manual snapshot.
In addition, the video shooting interface shown in fig. 10D may include functional controls such as a pause/continue control 201, an end control 202, and a snapshot control 203. Of course, in practical applications, other controls may also be included, such as the zoom control 204 shown in fig. 10D. The end control 202 is used to end the current video shooting process. The pause/continue control 201 displays a shooting icon when video shooting is paused, and the user clicks the shooting icon to continue the current video shooting process. The snapshot control 203 is used to generate a photo without pausing or ending the current video shooting process.
In some embodiments, in the smart multi-shot mode, the number N is predefined (e.g., N = 5). In other embodiments, the number N may also be customized by the user in a settings menu of the camera, which is not specifically limited in the embodiments of the present application.
S903: the handset 10 displays automatically generated thumbnails of the photos during the video recording process.
It can be understood that, in order to help the user know in time whether the automatically generated photos meet expectations, the mobile phone 10 can display the automatically generated photos to the user in time during recording.
Additionally, in other embodiments, the cell phone 10 may not display automatically generated photographs during the video recording process.
Referring to fig. 10E, after a photo is automatically generated in the smart multi-shot mode during the process of shooting a video by the mobile phone 10, the mobile phone 10 may float and display the thumbnail of the photo A1 that is just automatically generated in the video shooting interface shown in fig. 10E.
S904: the mobile phone 10 executes the operation instructed by operation 3 based on operation 3 of the user for the thumbnail of the photograph on the video capture interface.
In some embodiments, when the mobile phone 10 displays the photo A1 in a floating manner, the user may slide and operate the thumbnail of the photo A1 to the right (operation 3) as shown in fig. 10F to cancel the floating display of the photo A1, continue the current shooting and the subsequent generation of the photo, and save the photo A1 (i.e., the action indicated by operation 3).
In addition, in some embodiments, while the thumbnail of the photo A1 is displayed in a floating manner, if the user is dissatisfied with the photo A1, the user may slide the thumbnail of the photo A1 upward (i.e., operation 3) as shown in fig. 10G to delete the photo A1 and cancel the display of its thumbnail, and the mobile phone 10 will not save the photo A1 (i.e., the action indicated by operation 3).
It is understood that the specific operation of the user on the photo A1 is only an example, and may also be any other operation that can be implemented, and this is not specifically limited by the embodiment of the present application.
In some embodiments, the user can end the video shooting by clicking the end control 202 in any of the video shooting interfaces described above in fig. 10D to 10G, and the cell phone 10 can save the video (e.g., video A) and save a preset number of photos generated during the shooting (e.g., the five photos A1 to A5). Furthermore, if the user wants to view the shot video A and the generated photos, the user may exit the camera application and return to the desktop main interface of the mobile phone 10; for the schematic diagram of the main interface, reference may be made to the interface schematic diagram shown in fig. 10A.
In other embodiments, instead of clicking the end control 202 in any of the video shooting interfaces in fig. 10D to fig. 10G, the user may perform an upward swipe from the bottom of the video shooting interface as shown in fig. 10H, which automatically stops the current video shooting and exits the camera application, returning to the desktop main interface of the mobile phone 10; for the schematic diagram of the main interface, reference may be made to the interface schematic diagram shown in fig. 10A.
S905: the cell phone 10 stores the recorded video (e.g., video a) in the gallery application and the automatically generated N photographs (e.g., photographs A1-A5) in the gallery application. In this manner, a user may view recorded videos and automatically generated photographs in a gallery application.
Next, for the case in which the mobile phone 10 has finished recording and the automatically generated N photos are viewed in the gallery application, referring to fig. 11, a method for the user to view and operate on the automatically generated photos is provided, with the mobile phone 10 still as the execution subject, including the following steps:
S1101: The mobile phone 10 opens the gallery application and displays the gallery main interface based on operation 4 of the user opening the gallery application.
It will be appreciated that, in some embodiments, after automatically generating the photos, the cell phone 10 may automatically store them in the gallery application, for example in an "all photos" album and a separate "smart multi-shot" album, to facilitate the user's viewing of the automatically generated photos.
As an example, after the user clicks on the icon of the gallery application shown in fig. 10A, as shown in fig. 12A, the mobile phone 10 displays a gallery home interface including a "camera" album, an "all photos" album, a "video" album, and a "smart multi-shot" album.
The "camera" album stores photos taken by the mobile phone 10 in the photographing mode; its preview interface displays thumbnails of these photos in reverse chronological order, the first thumbnail being that of the most recently taken photo. The "video" album stores, by default, videos shot by the mobile phone 10 in the video shooting mode; its preview interface likewise displays thumbnails of the videos in reverse chronological order, the first thumbnail being that of the most recently shot video. The "all photos" album stores all videos and photos on the mobile phone 10; its preview interface displays thumbnails of the videos and generated photos in reverse chronological order, the first thumbnail being that of the most recently shot video or photo. The "smart multi-shot" album stores the photos automatically generated by the mobile phone 10; its preview interface displays thumbnails of the generated photos in reverse chronological order, the first thumbnail being that of the most recently generated photo.
S1102: the mobile phone 10 displays the stored automatically generated photo A1 based on the user operation 5 to thumbnail the photo A1.
As an example, when the user clicks on the "smart multi-shot" album shown in fig. 12A (i.e., the first operation), the mobile phone 10 may display a preview interface of the "smart multi-shot" album shown in fig. 12B, where the preview interface includes thumbnails of the photos A1-A5. Further, after the user clicks on the thumbnail of the photo A1 shown in fig. 12B (i.e., operation 5), the mobile phone 10 may display a preview interface of the photo A1 shown in fig. 12C, where the preview interface further includes controls for sharing, deleting, editing, and more.
Additionally, as an example, after the user clicks the "more" control shown in fig. 12C, the cell phone 10 may display, on the preview interface of photo A1, an original video control linking to the original video, as in fig. 12D. After the user clicks the original video control, the mobile phone 10 may parse the UUID in the EXIF information of photo A1, search for the video whose extension information contains the UUID, and then display the playing interface of video A shown in fig. 12E below. That is, the user can jump from the preview interface of an automatically generated photo to the playing interface of the related video, which makes it convenient to view the related video together with the automatically generated photos.
As an example, when the user clicks on the "all photos" album or the "videos" album shown in fig. 12A, the first thumbnail displayed in the preview interface displayed on the mobile phone 10 is the thumbnail of the latest captured video.
For example, the user clicks on the "all photos" album shown in fig. 12A, and the cell phone 10 may display a preview interface as shown in fig. 12E, the first thumbnail of which is the thumbnail of the most recently taken video a. After the user clicks the thumbnail of the video a shown in fig. 12F, the mobile phone 10 displays a video playing interface of the video a as shown in fig. 12E, where the video playing interface further includes an original video extension control. After the user clicks the original video control shown in fig. 12E, the mobile phone 10 may analyze the UUID in the extension information of the video, and search for the photos (i.e., photos A1 to A5) containing the UUID in the EXIF information according to the UUID, and may further display a video playing interface shown in fig. 12G, and display respective thumbnails of the photos A1 to A5 related to the video a. Further, after the user clicks on the thumbnail of the photograph A1 shown in fig. 12G (i.e., operation 5), the preview interface of the photograph A1 shown in fig. 12C may be displayed.
In addition, in the preview interface of the "all photos" photo album shown in fig. 12F, the second thumbnail to the sixth thumbnail in the thumbnails displayed in reverse order of time are thumbnails of the photos A1 to A5, respectively, and the upper right corners of the thumbnails of the photos A1 to A5 are displayed with a star-shaped identifier respectively, which is used to identify that the photos are automatically generated in the video recording process and are different from the photos obtained in the normal photographing mode. Of course, the identifier of the automatically generated photo is not limited to the above example, and may also be in other forms, which is not specifically limited in this embodiment of the application.
S1103: the mobile phone 10 deletes the photo A1 based on the operation 6 of the user on the automatically generated photo A1.
As an example, after the user clicks the delete control shown in fig. 12C, the mobile phone 10 may display a pop-up box as shown in fig. 12H, where the pop-up box includes a prompt message "whether to delete the video", a selection box "delete related smart multi-shot photos", a confirm control, and a cancel control. That is, the associated option in the pop-up box controls whether the related automatically generated photos are deleted together with the video.
If the user does not select the "delete related smart multi-shot photos" selection box and clicks the confirm control, the cell phone 10 deletes only video A from the gallery application and does not delete the related photos (i.e., photos A1 to A5). If the user selects the "delete related smart multi-shot photos" selection box and clicks the confirm control, the cell phone 10 deletes video A and all related photos (i.e., photos A1 to A5) from the gallery application. If the user clicks the cancel control, video A is not deleted.
S1104: the mobile phone 10 locally deletes the photo A1 based on the operation 7 of the user on the automatically generated photo A1, and sends an instruction message to the tablet computer 20, where the instruction message is used to instruct the synchronous tablet computer 20 to delete the photo A1.
As another example, after the user clicks the delete control shown in fig. 12C, as shown in fig. 12I, the mobile phone 10 may display a pop-up box, which includes a prompt message of "whether to delete the video", and further includes selection boxes "delete the local related smart multi-shot", selection boxes "delete the cloud related smart multi-shot", and a determination control and a cancellation control. If the user selects the selection box "delete native related wisdom multi-shot" and clicks on the decision control, the cell phone 10 deletes the video A and all related photos (i.e., photos A1-A5) from the gallery application of the cell phone 10. If the user selects the selection box "delete the relevant smart multi-shot of the mobile phone", selects the selection box "delete the relevant smart multi-shot of the cloud" and clicks the determination control, the mobile phone 10 deletes the video a and all the relevant photos (i.e., the photos A1 to A5) from the gallery application of the mobile phone 10, and initiates an instruction to the tablet computer 20, so that the tablet computer 20 deletes the video a and all the relevant photos (i.e., the photos A1 to A5) from the gallery application.
It will be appreciated that after the cell phone 10 deletes the video a and photos A1-A5 from the gallery application, thumbnails of these videos and photos will no longer be displayed in the preview interface of the photo album of the cell phone 10.
Next, referring to fig. 13, for the case in which the mobile phone 10 has finished recording and stores the automatically generated N photos in the gallery application under the trigger of the user, a method for the user to store and operate on the automatically generated photos is provided, with the mobile phone 10 still as the execution subject, including the following steps:
S1301: The cell phone 10 displays a video preview interface of the video, including thumbnails of the automatically generated photos, based on operation 8 of the user on the video (e.g., video A).
S1302: the cell phone 10 saves the automatically generated photo in the gallery application based on the user operation 9 to thumbnail the automatically generated photo.
As an example, as shown in fig. 14A, the mobile phone 10 displays a preview interface of an album, that is, a preview interface of the "all photos" album. This preview interface differs from the preview interface of the "all photos" album shown in fig. 10A described above in that it does not include thumbnails of the photos related to video A.
Further, after the user clicks the first thumbnail (i.e., the thumbnail of video A) in the preview interface shown in fig. 14A, the video playing interface displayed by the mobile phone 10 is shown in fig. 14B. This video playing interface is similar to the video playing interface shown in fig. 10B, except that the user's operation on the thumbnails of photos A1-A5 in the video playing interface shown in fig. 14B can guide the user in choosing whether to save the automatically generated photos in the gallery application.
In some embodiments, after the user performs a long press operation on any one of the thumbnails in the photos A1-A5 in the video playing interface shown in fig. 14B, the mobile phone 10 may display the interface shown in fig. 14C. Fig. 14C shows an interface including a selection box corresponding to each thumbnail, and a save control and a cancel control. After the user selects all of the thumbnails corresponding to the automatically generated pictures A1-A5 via these selection controls, the cell phone 10 may save the pictures in a gallery, such as an "all pictures" album and an "intelligent multiple shot" album in the gallery. Similarly, after the user manually saves the automatically generated pictures A1 to A5 in the gallery, the thumbnails corresponding to the automatically generated pictures A1 to A5 may be displayed on both the preview interface of the "all photos" album and the preview interface of the "smart multi-shot" album, specifically referring to the preview interface of the "all photos" album shown in fig. 10A and the preview interface of the "smart multi-shot" album shown in fig. 10A.
In other embodiments, after the user clicks on the thumbnail of photo A1 in the video playback interface shown in fig. 14B, the cell phone 10 may display a preview interface of photo A1 as shown in fig. 14D. The preview interface for this photo A1 is similar to the preview interface shown in FIG. 12C, described above, except that a save control has been added. After the user clicks on the save control, the cell phone 10 may save the photo A1 in the gallery application. Similarly, the user can operate the thumbnails corresponding to the photos A4, A2, and A3 shown in fig. 14B to save the photos. In this way, automatically generated photos can be stored according to the actual needs of the user.
Further, in some possible implementations, the mobile phone 10 may later generate the related photos for existing videos in the gallery application, for example videos that were recorded without automatic photo generation in the smart multi-shot mode.
In some embodiments, the mobile phone 10 may capture a video B without an automatically generated photo during the capturing process, and after the user clicks on the thumbnail of the video B shown in fig. 10A, the mobile phone 10 may display a video playing interface of the video B as shown in fig. 15A. The video playing interface comprises sharing, editing, more controls and the like. After the user clicks on the more controls shown in fig. 15A, the cell phone 10 may display a smart multi-shot control as shown in fig. 15B. Further, after the user clicks the smart multi-shot control, the mobile phone 10 may identify the video frame with higher score for algorithm optimization to generate a photo of the video B, and display the generated photos below the video B as shown in fig. 15C. Therefore, the user can conveniently record the wonderful moment in the existing video.
Embodiments of the present application further provide a computer-readable storage medium including computer instructions. When the computer instructions are run on the above electronic device, the electronic device is caused to perform the functions or steps of the above method embodiments (for example, the functions or steps performed by the devices in the mobile phone 10).
Embodiments of the present application also provide a computer program product which, when run on a computer, causes the computer to perform the functions or steps performed by the mobile phone 10 (e.g., the devices in the mobile phone 10) in the above method embodiments.
Embodiments of the mechanisms disclosed herein may be implemented in hardware, software, firmware, or a combination of these implementations. Embodiments of the application may be implemented as computer programs or program code executing on programmable systems comprising at least one processor, a storage system (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device.
Program code may be applied to input instructions to perform the functions described herein and generate output information. The output information may be applied to one or more output devices in a known manner. For purposes of this application, a processing system includes any system having a processor, such as a digital signal processor (DSP), a microcontroller, an application-specific integrated circuit (ASIC), or a microprocessor.
The program code may be implemented in a high level procedural or object oriented programming language to communicate with a processing system. The program code can also be implemented in assembly or machine language, if desired. Indeed, the mechanisms described in this application are not limited in scope to any particular programming language. In any case, the language may be a compiled or interpreted language.
In some cases, the disclosed embodiments may be implemented in hardware, firmware, software, or any combination thereof. The disclosed embodiments may also be implemented as instructions carried by or stored on one or more transitory or non-transitory machine-readable (e.g., computer-readable) storage media, which may be read and executed by one or more processors. For example, the instructions may be distributed via a network or via other computer-readable media. Thus, a machine-readable medium may include any mechanism for storing or transmitting information in a form readable by a machine (e.g., a computer), including, but not limited to, floppy diskettes, optical disks, compact disc read-only memories (CD-ROMs), magneto-optical disks, read-only memories (ROMs), random access memories (RAMs), erasable programmable read-only memories (EPROMs), electrically erasable programmable read-only memories (EEPROMs), magnetic or optical cards, flash memory, or tangible machine-readable memories used for transmitting information over the Internet in the form of electrical, optical, acoustical, or other propagated signals (e.g., carrier waves, infrared signals, digital signals, etc.). Thus, a machine-readable medium includes any type of machine-readable medium suitable for storing or transmitting electronic instructions or information in a form readable by a machine (e.g., a computer).
In the drawings, some features of the structures or methods may be shown in a particular arrangement and/or order. However, it is to be understood that such a specific arrangement and/or order may not be required. Rather, in some embodiments, these features may be arranged in a manner and/or order different from that shown in the illustrative figures. In addition, the inclusion of a structural or methodical feature in a particular figure is not meant to imply that such a feature is required in all embodiments; in some embodiments, the feature may not be included or may be combined with other features.
It should be noted that, in the apparatus embodiments of the present application, each unit/module is a logical unit/module. Physically, one logical unit/module may be one physical unit/module, may be a part of one physical unit/module, or may be implemented by a combination of multiple physical units/modules; the physical implementation of these logical units/modules is not the most important consideration, and the combination of functions implemented by these logical units/modules is the key to solving the technical problem addressed by the present application. Furthermore, in order to highlight the innovative part of the present application, the above apparatus embodiments do not introduce units/modules that are not closely related to solving the technical problem addressed by the present application, which does not indicate that no other units/modules exist in the above apparatus embodiments.
It is noted that, in the examples and descriptions of this patent, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
While the present application has been shown and described with reference to certain preferred embodiments thereof, it will be understood by those of ordinary skill in the art that various changes in form and details may be made therein without departing from the spirit and scope of the present application.

Claims (16)

1. A shooting method, applied to electronic equipment, characterized in that the method comprises:
the electronic equipment starts to record video and generates a video stream;
the electronic equipment scores video frames in the video stream;
the electronic equipment selects, based on the scores, video frames for algorithm optimization and generates a plurality of photos;
and the electronic equipment transmits the plurality of photos from a hardware abstraction layer to an application layer in batches through an extended interface.
2. The method of claim 1, wherein the electronic equipment transmitting the plurality of photos from the hardware abstraction layer to the application layer in batches through the extended interface comprises:
the electronic equipment stores the plurality of photos in a shared memory of the electronic equipment and obtains handle information, wherein the handle information is used for indicating storage addresses of the plurality of photos in the shared memory;
the electronic equipment returns the handle information from the hardware abstraction layer to a camera application of the electronic equipment in the application layer through the extended interface;
and the electronic equipment reads the plurality of photos in batches from the shared memory through the camera application according to the handle information, and stores the plurality of photos in batches into a gallery application of the electronic equipment.
3. The method of claim 2, further comprising:
the electronic equipment stores a video obtained by the video recording into the gallery application and associates the video with the plurality of photos.
4. The method according to claim 3, wherein information of the video includes an identifier that uniquely identifies the video, information of each of the plurality of photos carries the identifier, and the identifier is used for associating the video with the plurality of photos.
5. The method of claim 4, wherein the identifier is a Universally Unique Identifier (UUID) of the video.
6. The method according to any one of claims 3 to 5, further comprising:
the electronic equipment detects a first operation of a user on the gallery application in the electronic equipment;
and the electronic equipment displays, based on the first operation, a thumbnail of the video and thumbnails corresponding to the plurality of photos associated with the video in an interface of the gallery application.
7. The method of claim 6, wherein the thumbnail of each of the plurality of photos carries a preset identifier, and the preset identifier is used to indicate that the photo is obtained by performing automatic algorithm optimization on video frames in the video.
8. The method of claim 7, further comprising:
the electronic equipment detects a second operation of the user on the thumbnail of the video;
and the electronic equipment displays a playing interface of the video based on the second operation, and displays thumbnails of the photos on the playing interface.
9. The method of claim 8, further comprising:
the electronic equipment detects a third operation of the user on the playing interface;
and the electronic equipment, based on the third operation, deletes the video from the gallery application, or deletes both the video and the plurality of photos.
10. The method of claim 6, wherein the electronic equipment saves the plurality of photos to the gallery application based on a detected instruction to end video recording;
or the electronic equipment saves the plurality of photos to the gallery application based on a fourth operation performed by the user on the video stored in the gallery application.
11. The method according to any one of claims 1 to 10, wherein the extended interface is a hardware abstraction layer interface definition language (HIDL) interface.
12. The method of claim 11, wherein the scoring of the video frames in the video stream comprises:
the electronic equipment acquires style information, scene change information, and multi-dimensional preset information of the video stream, wherein the style information is used for representing the theme and atmosphere of the video, the scene change information divides the video into a plurality of video clips of different categories, and the multi-dimensional preset information is information of a plurality of dimensions used for scoring the video frames;
and the electronic equipment scores the video frames according to the multi-dimensional preset information.
13. The method of claim 12, wherein the selecting, based on the scores, of video frames for algorithm optimization comprises:
the electronic equipment selects a plurality of video frames with the highest scores based on the scores and the scene change information.
14. A computer-readable storage medium having stored thereon instructions that, when executed on an electronic device, cause the electronic device to perform the shooting method of any one of claims 1 to 13.
15. A computer program product, characterized in that it comprises instructions for implementing a shooting method according to any one of claims 1 to 13.
16. An electronic device, comprising:
a memory for storing instructions to be executed by one or more processors of the electronic device; and
a processor configured to perform the shooting method according to any one of claims 1 to 13 when the instructions are executed by the one or more processors.
CN202210173024.1A 2022-02-24 2022-02-24 Photographing method, medium, program product and electronic device Pending CN115567633A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210173024.1A CN115567633A (en) 2022-02-24 2022-02-24 Photographing method, medium, program product and electronic device

Publications (1)

Publication Number Publication Date
CN115567633A true CN115567633A (en) 2023-01-03

Family

ID=84737032

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210173024.1A Pending CN115567633A (en) 2022-02-24 2022-02-24 Photographing method, medium, program product and electronic device

Country Status (1)

Country Link
CN (1) CN115567633A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130104080A1 (en) * 2011-10-19 2013-04-25 Andrew Garrod Bosworth Automatic Photo Capture Based on Social Components and Identity Recognition
US20160071549A1 (en) * 2014-02-24 2016-03-10 Lyve Minds, Inc. Synopsis video creation based on relevance score
CN111061912A (en) * 2018-10-16 2020-04-24 华为技术有限公司 Method for processing video file and electronic equipment
WO2021238325A1 (en) * 2020-05-29 2021-12-02 华为技术有限公司 Image processing method and apparatus

Similar Documents

Publication Publication Date Title
US20190026313A1 (en) Device, method, and user interface for managing and interacting with media content
WO2023065884A1 (en) Video processing method and electronic device
US9369662B2 (en) Smart gallery and automatic music video creation from a set of photos
WO2021052414A1 (en) Slow-motion video filming method and electronic device
CN113475092A (en) Video processing method and mobile device
WO2021190078A1 (en) Method and apparatus for generating short video, and related device and medium
WO2014176139A1 (en) Automatic music video creation from a set of photos
WO2023173850A1 (en) Video processing method, electronic device and readable medium
US20160050355A1 (en) Storyboards for capturing images
CN115689963B (en) Image processing method and electronic equipment
TW201503675A (en) Media file management method and system
WO2024055797A1 (en) Method for capturing images in video, and electronic device
WO2023036007A1 (en) Method for acquiring image, and electronic device
WO2023035921A1 (en) Method for image snapshot in video recording, and electronic device
CN115567633A (en) Photographing method, medium, program product and electronic device
CN116828099B (en) Shooting method, medium and electronic equipment
US20180356960A1 (en) Actionable image representation of video project in a video creation system
CN116033261B (en) Video processing method, electronic equipment, storage medium and chip
WO2023065885A1 (en) Video processing method and electronic device
JP2008311847A (en) Display controller, display controlling method, and program
WO2023231698A9 (en) Photographing method and related device
WO2023142690A1 (en) Restorative shooting method and electronic device
JP6089892B2 (en) Content acquisition apparatus, information processing apparatus, content management method, and content management program
CN116700550A (en) Video processing method, electronic device, and computer-readable storage medium
CN117692762A (en) Shooting method and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination