CN112771612B - Method and device for shooting image


Info

Publication number
CN112771612B
CN112771612B
Authority
CN
China
Prior art keywords
mode
snapshot
image
capturing
images
Prior art date
Legal status
Active
Application number
CN201980012490.8A
Other languages
Chinese (zh)
Other versions
CN112771612A
Inventor
赵杨
那柏林
孙新江
刘理
仇芳
黄枭
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Publication of CN112771612A
Application granted
Publication of CN112771612B

Classifications

    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B 27/00 Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B 27/02 Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
    • G11B 27/031 Electronic editing of digitised analogue information signals, e.g. audio or video signals
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W 88/00 Devices specially adapted for wireless communication networks, e.g. terminals, base stations or access point devices
    • H04W 88/02 Terminal devices

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Studio Devices (AREA)

Abstract

The application provides a method and a device for capturing images. The method includes: determining a first snapshot mode among a plurality of preset snapshot modes according to captured multi-frame images; and determining, among captured multiple frames of images to be evaluated, a snapshot frame image corresponding to the first snapshot mode, using an evaluation strategy corresponding to the first snapshot mode or to a first snapshot category within the first snapshot mode, where the evaluation strategy used is one of a plurality of preset evaluation strategies. The output snapshot frame image therefore better matches the actual shooting scene, and an ideal capture effect is obtained.

Description

Method and device for shooting image
Technical Field
The present application relates to the field of image processing technology, and more particularly, to a method and apparatus for capturing an image.
Background
Intelligent snapshot is an important photographing function of current intelligent terminals. With the intelligent snapshot function enabled, an intelligent terminal can score multiple candidate frames based on a scoring rule, select the highest-scoring image, and recommend it to the user. However, existing scoring rules are single-purpose: they usually focus on face information and ignore other information, and when no face is detected in the image to be evaluated, only the optical flow information between adjacent frames is used as the scoring basis. Such an evaluation system lacks universality, so the recommended optimal frame may not be ideal. For example, for a picture containing an object moving at high speed, this approach cannot recommend the highlight image at the moment of motion to the user, and the capturing effect is unsatisfactory. A snapshot scheme that is flexible and suitable for more scenes is therefore highly desirable.
Disclosure of Invention
The application provides a method and a device for capturing images, so that captured images better match actual shooting scenes.
In a first aspect, a method of capturing an image is provided. The method includes: determining a first snapshot mode among a plurality of preset snapshot modes according to captured multi-frame images; and determining, among captured multiple frames of images to be evaluated, a snapshot frame image corresponding to the first snapshot mode using an evaluation strategy corresponding to the first snapshot mode, the evaluation strategy being one of a plurality of preset evaluation strategies.
Therefore, according to the embodiments of the application, by presetting a plurality of different snapshot modes and corresponding evaluation strategies, the snapshot mode can be determined according to the actual shooting scene, and the evaluation strategy corresponding to the first snapshot mode can be selected from the preset evaluation strategies to determine the snapshot frame image. The resulting snapshot image is thus better suited to the actual shooting scene, which helps achieve an ideal capture effect, improves flexibility, and makes the method applicable to more scenes.
With reference to the first aspect, in certain implementations of the first aspect, the plurality of snapshot modes include one or more of: an expression snapshot mode, a group-photo snapshot mode, a motion snapshot mode, a multi-person motion snapshot mode, a pet snapshot mode, and a landscape snapshot mode.
It can be seen that a plurality of different snapshot modes and corresponding evaluation strategies are preset to be suitable for different shooting scenes, so that the obtained snapshot image conforms to the actual shooting scene.
It should be understood that the above-listed snapshot modes are only examples and should not constitute any limitation to the present application. The present application is not limited to what the above-described plurality of capturing modes specifically include.
With reference to the first aspect, in certain implementations of the first aspect, each of the plurality of capturing modes corresponds to at least one evaluation strategy among the preset plurality of evaluation strategies, and each evaluation strategy includes one or more scoring parameters for image scoring and a mode weight for each scoring parameter. Determining, among the captured multiple frames of images to be evaluated, the snapshot frame image corresponding to the first snapshot mode using the evaluation strategy corresponding to the first snapshot mode includes: calculating a score for each frame of the multiple frames of images to be evaluated using the one or more scoring parameters, and the mode weight of each scoring parameter, of one of the at least one evaluation strategy corresponding to the first snapshot mode; and determining, among the multiple frames of images to be evaluated, the snapshot frame image corresponding to the first snapshot mode according to the scores of the multiple frames of images to be evaluated.
Therefore, different scoring parameters can be assigned to different snapshot modes, and each scoring parameter can be weighted differently, so that the same image scored under different snapshot modes yields different results. After the first snapshot mode is determined, the evaluation strategy corresponding to it is selected to score the multiple frames of images to be evaluated and thereby determine the snapshot frame image. Because the snapshot frame image is obtained using the evaluation strategy of the first snapshot mode, it meets the requirements of that mode and matches the actual shooting scene.
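By way of illustration only, the following Python sketch shows one way such per-mode weighted scoring could be realized, assuming a frame's score is simply the weighted sum of its scoring-parameter values; the mode names, parameter names, and weight values are all hypothetical, since the application does not prescribe a concrete formula.

# Hypothetical sketch: per-mode weighted scoring (the application does not
# specify a concrete formula; a weighted sum is assumed here).

EVALUATION_STRATEGIES = {
    # scoring parameter -> mode weight
    "motion":     {"expression_intensity": 0.2, "pose_height": 0.8},
    "expression": {"expression_intensity": 0.8, "pose_height": 0.2},
}

def score_frame(params, strategy):
    """Score one candidate frame as the weighted sum of its parameter values."""
    return sum(weight * params.get(name, 0.0)
               for name, weight in strategy.items())

def pick_snapshot_frame(frames, mode):
    """Return the candidate frame that scores highest under the given mode."""
    strategy = EVALUATION_STRATEGIES[mode]
    return max(frames, key=lambda f: score_frame(f["params"], strategy))

# Usage: the same two frames rank differently under different modes.
candidates = [
    {"frame_id": 0, "params": {"expression_intensity": 0.9, "pose_height": 0.1}},
    {"frame_id": 1, "params": {"expression_intensity": 0.2, "pose_height": 0.9}},
]
print(pick_snapshot_frame(candidates, "motion")["frame_id"])      # -> 1
print(pick_snapshot_frame(candidates, "expression")["frame_id"])  # -> 0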
With reference to the first aspect, in certain implementations of the first aspect, the snapshot frame image has the highest score among the multiple frames of images to be evaluated.
After the images to be evaluated are scored based on the evaluation strategy corresponding to the first snapshot mode, the highest-scoring image is the one, among the multiple frames of images to be evaluated, that best meets the requirements of the first snapshot mode, and is therefore the image that best matches the actual shooting scene.
With reference to the first aspect, in some implementations of the first aspect, the scoring parameters included in different evaluation strategies corresponding to different snapshot modes are the same, while the mode weights included in those evaluation strategies are different.
That different evaluation strategies include different mode weights may specifically mean that the mode weights applied to the same scoring parameter differ between the evaluation strategies, and, where an evaluation strategy includes multiple scoring parameters, that different evaluation strategies apply different mode weights to at least one scoring parameter.
That is, in different capturing modes, different weights can be applied to the same scoring parameter according to the different concerns of each capturing mode. For example, the motion capture mode and the expression capture mode may weight the expression-intensity scoring parameter differently: a lower weight may be applied in the motion capture mode and a higher weight in the expression capture mode. Conversely, for the pose-height scoring parameter, a higher weight may be applied in the motion capture mode and a lower weight in the expression capture mode.
Thus, the different evaluation strategies for different snap-shot modes may each comprise different mode weights corresponding to the same scoring parameter.
It should be understood that the above examples are for illustrative purposes only and should not be construed as limiting the present application in any way.
With reference to the first aspect, in certain implementations of the first aspect, each snapshot mode includes one or more snapshot categories, and each snapshot category corresponds to one evaluation strategy; among the at least one evaluation strategy corresponding to the first capturing mode, each evaluation strategy includes one or more scoring parameters corresponding to the first capturing mode, a mode weight for each scoring parameter, and a category weight corresponding to one capturing category.
The determining of the first snapshot mode among a plurality of preset snapshot modes further includes: determining a first snapshot category within the first snapshot mode according to the multi-frame images.
The calculating of the score of each frame of the multiple frames of images to be evaluated using the one or more scoring parameters, and the mode weight of each scoring parameter, of one of the at least one evaluation strategy corresponding to the first snapshot mode includes: calculating the score of each frame of the multiple frames of images to be evaluated using the one or more scoring parameters corresponding to the first snapshot mode, the mode weight of each scoring parameter, and the category weight of each scoring parameter corresponding to the first snapshot category.
To find the snapshot frame image that best fits the shooting scene, the application not only provides an evaluation strategy per snapshot mode, assigning different scoring parameters and mode weights to different snapshot modes, but also further provides category weights corresponding to the snapshot categories within a snapshot mode. That is, the snapshot categories within a snapshot mode are further refined, and the details each category cares about are weighted more heavily. The selected snapshot frame image can therefore meet the requirements of the first snapshot mode while also taking the snapshot category of the photographed subject into account, better presenting the image at the captured highlight moment.
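Continuing the jumping/shooting example, a hypothetical sketch of this category refinement follows; the application does not state how mode weights and category weights are combined, so simple multiplication is assumed here, and all names and values are illustrative.

# Hypothetical sketch: category weights refining mode weights within the
# motion capture mode (multiplicative combination is an assumption).

MODE_WEIGHTS = {"leg_bend_angle": 0.5, "arm_bend_angle": 0.5}

CATEGORY_WEIGHTS = {
    "jumping":  {"leg_bend_angle": 1.5, "arm_bend_angle": 0.5},
    "shooting": {"leg_bend_angle": 0.5, "arm_bend_angle": 1.5},
}

def score_frame(params, category):
    """Weighted sum where each parameter uses mode weight x category weight."""
    cat_w = CATEGORY_WEIGHTS[category]
    return sum(MODE_WEIGHTS[name] * cat_w[name] * value
               for name, value in params.items())

params = {"leg_bend_angle": 0.7, "arm_bend_angle": 0.3}
print(score_frame(params, "jumping"))   # 0.6: the leg-bend angle dominates
print(score_frame(params, "shooting"))  # 0.4: the arm-bend angle dominates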
With reference to the first aspect, in some implementations of the first aspect, the scoring parameters included in different evaluation strategies corresponding to different snapshot categories are the same, while the category weights included in those evaluation strategies are different.
That different evaluation strategies include different category weights may specifically mean that the category weights applied to the same scoring parameter differ between the evaluation strategies, and, where an evaluation strategy includes multiple scoring parameters, that different evaluation strategies apply different category weights to at least one scoring parameter.
That is to say, for different snapshot categories within the same snapshot mode, different weights can be applied to the same scoring parameter according to the different concerns of each snapshot category. For example, in the motion capture mode, jumping and shooting focus on different details of the motion: jumping is more concerned with the leg-bend angle, so when the snapshot category is jumping, a higher weight is applied to the leg-bend angle; shooting is more concerned with the arm-bend angle, so when the snapshot category is shooting, a higher weight is applied to the arm-bend angle.
Thus, different evaluation strategies for different snap-shot categories may each include different category weights corresponding to the same scoring parameter.
It should be understood that the above examples are for illustrative purposes only and should not be construed as limiting the present application in any way.
With reference to the first aspect, in certain implementations of the first aspect, the method further includes: invoking at least one detection model corresponding to the first snapshot mode to perform image recognition on multiple candidate frames and output recognition results; and determining the values of the one or more scoring parameters based on the recognition results.
In an embodiment of the application, the images to be evaluated are subjected to image recognition by at least one detection model. The at least one detection model may include, for example, one or more of a face attribute detection model, a body frame detection model, a scene recognition model, a pose estimation model, and a motion detection model. These detection models can detect different points of interest in the image, so that the value of each scoring parameter is determined according to the recognition results.
The detection model may be obtained by machine learning training, for example. In one possible design, the detection model may be a model embedded in a neural-Network Processing Unit (NPU). This is not a limitation of the present application.
Optionally, when the first capturing mode is the motion capture mode or the multi-person motion capture mode, the at least one detection model includes a pose estimation model and a motion detection model.
Optionally, when the first snapshot mode is the expression snapshot mode or the group-photo snapshot mode, the at least one detection model includes a face attribute detection model.
It should be understood that the above-listed detection models corresponding to different capturing modes are only examples, and should not constitute any limitation to the present application. For example, when the capturing modes are different, the same plurality of detection models may be called to perform image recognition on the image to be evaluated. In the scoring process, different weights can be applied to each scoring parameter according to different snapshot modes, so that similar effects are achieved.
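A minimal sketch of such per-mode model dispatch follows; the registry, the model names, and the stubbed inference call are all assumptions (a real device would run the embedded NPU models instead).

# Hypothetical sketch: dispatching detection models per snapshot mode.

MODELS_BY_MODE = {
    "motion":              ["pose_estimation", "action_detection"],
    "multi_person_motion": ["pose_estimation", "action_detection"],
    "expression":          ["face_attribute"],
    "group_photo":         ["face_attribute"],
}

def run_model(model_name, frame):
    # Placeholder for NPU inference on one frame.
    return {"model": model_name, "frame": frame}

def recognize(frames, mode):
    """Run every detection model registered for the mode on each frame."""
    return [{m: run_model(m, f) for m in MODELS_BY_MODE[mode]} for f in frames]

results = recognize(["frame_0", "frame_1"], "motion")
# The values of the scoring parameters would then be derived from `results`.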
With reference to the first aspect, in certain implementations of the first aspect, the determining, from the captured multi-frame images, of the first capturing mode among a plurality of preset capturing modes includes: in a video recording mode or a preview mode, determining the first snapshot mode among the plurality of preset snapshot modes according to the multi-frame images.
That is to say, the method provided by the application can be applied in the smart snapshot mode and can also run alongside other modes. For example, in the video recording mode, if the images are detected to satisfy the trigger condition of the first snapshot mode, the smart snapshot mode may be run simultaneously in the background and the first snapshot mode entered. For another example, in the preview mode, if the images are detected to satisfy the trigger condition of the first capture mode, the first capture mode may be enabled automatically. The device can thus switch automatically among multiple modes, which helps obtain an ideal snapshot frame image.
With reference to the first aspect, in certain implementations of the first aspect, the determining, from the captured multi-frame images, of the first capturing mode among a plurality of preset capturing modes includes: performing mode detection on the captured multi-frame images at a first frame rate to determine the first snapshot mode among the plurality of preset snapshot modes. The invoking of at least one detection model corresponding to the first snapshot mode to perform image recognition on multiple candidate frames includes: invoking the at least one detection model corresponding to the first snapshot mode to perform image recognition on the multiple candidate frames at a second frame rate, where the first frame rate is less than the second frame rate.
That is, before the first snapshot mode is determined, mode detection may be performed at a lower frame rate. This approach may be applied in the video recording mode or the preview mode described above: a lower frame rate can be used for mode detection before the smart snapshot mode is entered, and once the first snapshot mode is determined, that is, the smart snapshot mode is entered, a higher frame rate can be used for image recognition. Performing mode detection at a low frame rate before entering the smart capture mode thus saves the power consumption that a high frame rate would incur.
Of course, in the embodiment of the present application, mode detection and image recognition may also be performed at the same frame rate. For example, in the smart snapshot mode, the mode determination is made at a higher frame rate, and after the first capturing mode is entered, image recognition is still performed at that higher frame rate. This is not a limitation of the present application.
In addition, the specific value of the frame rate is not limited in the present application.
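For illustration, here is a sketch of the two-rate pipeline under assumed frame-rate values (10 fps for mode detection, 30 fps for recognition; the application fixes neither):

# Hypothetical sketch: low-rate mode detection, then high-rate recognition.
import itertools

MODE_DETECT_FPS = 10   # first (lower) frame rate; assumed value
RECOGNITION_FPS = 30   # second (higher) frame rate; assumed value

def subsample(frames, source_fps, target_fps):
    """Yield every k-th frame so the stream drops to roughly target_fps."""
    step = max(1, source_fps // target_fps)
    yield from itertools.islice(frames, 0, None, step)

def detect_mode(frame):
    """Placeholder mode detector: triggers once motion appears at frame 30."""
    return "motion" if frame >= 30 else None

stream = iter(range(90))  # stand-in for 3 s of a 30 fps camera feed

# Phase 1: sparse mode detection before the smart snapshot mode is entered.
mode = None
for frame in subsample(stream, RECOGNITION_FPS, MODE_DETECT_FPS):
    mode = detect_mode(frame)
    if mode:
        break

# Phase 2: once the first snapshot mode is determined, consume the remaining
# frames at the full rate and run that mode's detection models on each one.
if mode:
    for frame in stream:
        pass  # image recognition at RECOGNITION_FPS would happen here

print(mode)  # -> "motion"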
With reference to the first aspect, in certain implementations of the first aspect, after determining the first snapshot mode, the method further includes: determining, based on newly captured multi-frame images, a second capturing mode that is a capturing mode, among the plurality of capturing modes, different from the first capturing mode; and switching to the second snapshot mode.
Detection of newly captured images may continue after the first snapshot mode is entered. The switch to the second snapshot mode may be automatic upon detecting that the newly captured images satisfy the trigger condition of another snapshot mode, such as the second snapshot mode. Optionally, the switching to the second capturing mode includes: switching to the second snapshot mode when the running time of the first snapshot mode exceeds a preset protection period.
To avoid switching back and forth among multiple snapshot modes and to avoid false triggering, a protection period can be preset for each snapshot mode. Within the protection period, even if the newly captured images are detected to satisfy the trigger condition of another snapshot mode, no mode switch is performed. After the protection period has elapsed, a mode switch may be performed if the newly captured images are detected to satisfy the trigger condition of another snapshot mode.
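A minimal sketch of mode switching with such a protection period (the 2-second duration is an assumption; the application does not specify one):

# Hypothetical sketch: mode switching gated by a protection (guard) period.
import time

GUARD_PERIOD_S = 2.0  # assumed duration

class ModeSwitcher:
    def __init__(self):
        self.mode = None
        self.entered_at = 0.0

    def request_switch(self, new_mode, now=None):
        """Switch only if the current mode has run past the guard period."""
        now = time.monotonic() if now is None else now
        if self.mode is not None and now - self.entered_at < GUARD_PERIOD_S:
            return False  # trigger ignored inside the guard period
        self.mode, self.entered_at = new_mode, now
        return True

sw = ModeSwitcher()
print(sw.request_switch("motion", now=0.0))      # True: first mode entered
print(sw.request_switch("expression", now=1.0))  # False: inside guard period
print(sw.request_switch("expression", now=3.0))  # True: guard period elapsed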
In a second aspect, an apparatus for capturing an image is provided, which includes various modules or units for performing the method of any one of the possible implementations of the first aspect.
In a third aspect, there is provided an apparatus for capturing images, including a processor and a memory, the memory being configured to store a computer program, and the processor being configured to call and run the computer program from the memory, so that the apparatus for capturing images performs the method of capturing images in the first aspect and its various possible implementations.
Optionally, the number of the processors is one or more, and the number of the memories is one or more.
Alternatively, the memory may be integral to the processor or provided separately from the processor.
In a fourth aspect, an electronic device is provided, which comprises the apparatus for capturing an image according to the second or third aspect.
In a fifth aspect, there is provided a computer program product comprising: computer program (also called code, or instructions), which when executed, causes a computer to perform the method of any of the possible implementations of the first aspect described above.
In a sixth aspect, a computer-readable medium is provided, which stores a computer program (which may also be referred to as code or instructions) that, when executed on a computer or at least one processor, causes the computer or the at least one processor to perform the method of any one of the possible implementations of the first aspect.
In a seventh aspect, a chip system is provided, where the chip system includes a processor, and is configured to support the chip system to implement the functions according to any one of the possible implementations of the first aspect.
Drawings
Fig. 1 is a schematic diagram of an electronic device provided by an embodiment of the present application;
Fig. 2 is a schematic flow chart of a method of capturing images provided by an embodiment of the present application;
Fig. 3 is a schematic diagram of a cell phone interface provided by an embodiment of the present application;
Fig. 4 is a schematic flow chart of a method of capturing images provided by another embodiment of the present application;
Fig. 5 is a schematic flow chart of a method of capturing images provided by yet another embodiment of the present application;
Fig. 6 is a schematic flow chart of a method of capturing images provided by yet another embodiment of the present application;
Fig. 7 is a schematic block diagram of an apparatus for capturing images provided by an embodiment of the present application.
Detailed Description
The technical solutions in the present application will be described below with reference to the accompanying drawings. The method of capturing images provided by the embodiments of the application can be applied to electronic devices such as a mobile phone, a tablet computer, a wearable device, a vehicle-mounted device, an augmented reality (AR)/virtual reality (VR) device, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, and a personal digital assistant (PDA); the embodiments of the application place no limitation on the specific type of electronic device. The apparatus for capturing images according to the embodiments of the present application may be one of the above-listed electronic devices, or may be configured within one of the above-listed electronic devices. This is not a limitation of the present application.
Fig. 1 shows a schematic structural diagram of an electronic device 100. The electronic device 100 may include a processor 110. The processor 110 may include one or more processing units. For example: the processor 110 may include one or more of a Central Processing Unit (CPU), a neural Network Processing Unit (NPU), an Application Processor (AP), a modem processor, a Graphics Processing Unit (GPU), an Image Signal Processor (ISP), a controller, a video codec, a Digital Signal Processor (DSP), and a baseband processor. The different processing units may be separate devices or may be integrated into one or more processors.
For example, the controller may be a neural center and a command center of the electronic device 100. The controller can generate an operation control signal according to the instruction operation code and the timing signal to complete the control of instruction fetching and instruction execution.
The NPU is a neural-network (NN) processor. By drawing on the structure of biological neural networks, for example the signal-transfer mode between neurons of a human brain, it processes input information quickly and can also continuously learn by itself. Applications such as intelligent recognition of the electronic device 100 can be realized through the NPU, for example image recognition, face detection, body frame detection, scene detection, pose point detection, and motion detection.
In the embodiments of the present application, one or more detection models, such as one or more of the face attribute detection model, pose estimation model, motion detection model, body frame detection model, and scene detection model described below, may be embedded in the NPU. Each detection model can be obtained by training with a machine learning algorithm, for example a support vector machine (SVM), a convolutional neural network (CNN), or a recurrent neural network (RNN). It should be understood that the present application is not limited to a particular manner of training.
Each detection model may correspond to a processor in the NPU; alternatively, each detection model may correspond to a processing unit in the NPU, and the functions of multiple detection models may be implemented by multiple processing units integrated in a processor. This is not a limitation of the present application.
The NPU may also be communicatively coupled with one or more other processors in the processor 110. For example, the NPU may have a communication connection with the GPU, the ISP, the application processor, and the like. This is not a limitation of the present application.
Optionally, the electronic device 100 further comprises a memory 120. The memory 120 may be used to store computer-executable program code, which includes instructions. The processor 110 executes various functional applications of the electronic device 100 and data processing by executing instructions stored in the memory 120. The memory 120 may include a program storage area and a data storage area. The storage program area may store an operating system, an application program (such as a sound playing function, an image playing function, etc.) required by at least one function, and the like. The storage data area may store data (such as audio data, phone book, etc.) created during use of the electronic device 100, and the like. Further, the memory 120 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, a universal flash memory (UFS), and the like.
In some possible embodiments, a memory may be provided in the processor 110. For example, the memory in the processor 110 is a cache memory. The memory may hold instructions or data that have just been used or recycled by the processor 110. If the processor 110 needs to reuse the instruction or data, it can be called directly from the memory. Avoiding repeated accesses reduces the latency of the processor 110, thereby increasing the efficiency of the system.
Of course, the memory may also exist separately from the processor 110, as shown by memory 120. This is not a limitation of the present application.
Optionally, the electronic device 100 further comprises a transceiver 130. In addition, to further improve the functionality of the electronic device 100, it may further include one or more of an input unit 160, a display unit 170, an audio circuit 180, a camera 190, a sensor 101, and the like; the audio circuit 180 may further be coupled to a speaker 182, a microphone 184, and the like.
The electronic apparatus 100 implements a display function by the GPU, the display unit 170, and the application processor, etc. The GPU is a microprocessor for image processing, and is connected to the display unit 170 and the application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. The processor 110 may include one or more GPUs that execute program instructions to generate or alter display information.
The display unit 170 is used to display images, videos, and the like. The display unit 170 includes a display panel. The display panel may be a Liquid Crystal Display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a mini led (miniled), a Micro led (Micro led), a Micro OLED (Micro-OLED), a quantum dot light-emitting diode (QLED), or the like. In some embodiments, the electronic device 100 may include one or more display units 170.
The electronic device 100 may implement a photographing function through the ISP, the camera 190, the video codec, the GPU, the display unit 170, the application processor, and the like.
The ISP is used to process the data fed back by the camera 190. For example, when a photo is taken, the shutter is opened, light is transmitted to the camera photosensitive element through the lens, the optical signal is converted into an electrical signal, and the camera photosensitive element transmits the electrical signal to the ISP for processing and converting into an image visible to naked eyes. The ISP can also carry out algorithm optimization on the noise, brightness and skin color of the image. The ISP can also optimize parameters such as exposure, color temperature and the like of a shooting scene. In some embodiments, the ISP may be provided in camera 190.
The camera 190 is used to capture still images or moving video. The object generates an optical image through the lens and projects the optical image to the photosensitive element. The photosensitive element may be a Charge Coupled Device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. The light sensing element converts the optical signal into an electrical signal, which is then passed to the ISP where it is converted into a digital image signal. And the ISP outputs the digital image signal to the DSP for processing. The DSP converts the digital image signal into an image signal in a standard RGB, YUV or other format. In some embodiments, electronic device 100 may include one or more cameras 190.
For example, in the method of capturing images provided herein, the camera 190 may be used to capture images and display the captured images in the capture interface. The photosensitive element converts the collected optical signal into an electrical signal, and then transmits the electrical signal to the ISP to be converted into a digital image signal. And the ISP outputs the digital image signal to the DSP for relevant image processing.
Video codecs are used to compress or decompress digital video. The electronic device 100 may support one or more video codecs. In this way, the electronic device 100 may play or record video in a variety of encoding formats, such as: moving Picture Experts Group (MPEG) 1, MPEG2, MPEG3, MPEG4, and the like.
The application processor outputs a sound signal through an audio device such as a speaker 182, etc., or displays an image or video through the display unit 170. Optionally, the electronic device 100 may further include a power supply 150 for supplying power to various devices or circuits in the terminal device.
It should be understood that the electronic device 100 shown in fig. 1 is capable of implementing the processes of the method embodiments shown in fig. 2 and 4-6. The operations and/or functions of the respective modules or units in the electronic device 100 are respectively for implementing the corresponding flows in the above-described method embodiments. Reference may be made specifically to the description of the above method embodiments, and a detailed description is appropriately omitted herein to avoid redundancy.
It should be understood that fig. 1 exemplarily shows each module or unit in the electronic device and a connection relationship between the modules or units for convenience of understanding only, but this should not constitute any limitation to the present application. The present application is not limited to the specific modules, units and their connection relationships included in the electronic device.
The method provided by the embodiment of the application is described in detail below with reference to the accompanying drawings. Fig. 2 is a schematic flowchart of a method 200 for capturing an image according to an embodiment of the present disclosure. The method 200 provided in fig. 2 may be performed by an electronic device or a processor in an electronic device. Hereinafter, for convenience of description, embodiments of the present application will be described with an electronic device as an execution subject.
The various steps in the method 200 shown in fig. 2 are described in detail below. As shown, method 200 may include steps 210 through 240. In step 210, a first snap-shot mode is determined among a plurality of preset snap-shot modes based on a captured multi-frame image.
Specifically, the electronic device may periodically detect the captured image, and determine a snapshot mode suitable for the current shooting according to the detection result. For the sake of distinction and explanation, the snapshot mode applicable to the current shooting is taken as the first snapshot mode. It should be understood that the images captured by the electronic device may be stored in a cache. The cache may be, for example, a part of a storage space in a camera module in the electronic device, or may exist independently of the camera module, which is not limited in this application.
The electronic device may continuously retrieve a plurality of frames of images from the buffer. The plurality of frames of images may be input to one or more detection models. The electronic equipment can call one or more detection models to detect the multi-frame images, and determines the first snapshot mode from a plurality of preset snapshot modes based on the detection result output by the detection models. Based on the determination of the first snap-shot mode, the electronic device may enable the first snap-shot mode.
The preset plurality of snapshot modes include, by way of example and not limitation, one or more of: an expression snapshot mode, a group-photo snapshot mode, a motion snapshot mode, a multi-person motion snapshot mode, a pet snapshot mode, and a landscape snapshot mode.
The above-described plurality of capturing modes are defined as smart snapshot modes in the embodiments of the present application; that is, the first snapshot mode is a smart snapshot mode. The electronic device can enter the smart snapshot mode in advance and then determine the first snapshot mode among the plurality of preset snapshot modes according to captured multi-frame images; alternatively, the first snapshot mode may be determined automatically based on detection of captured images in a photographing mode or a video recording mode. This is not a limitation of the present application.
That the multiple capturing modes are preset may specifically mean that the electronic device pre-stores an evaluation strategy corresponding to each of the multiple capturing modes. When the electronic device determines to use one of the capturing modes, it can invoke the corresponding evaluation strategy to evaluate the captured multiple frames of images to be evaluated. The process of evaluating the multiple frames of images to be evaluated using an evaluation strategy will be described in detail in connection with step 220 and is omitted here for brevity.
The specific process of determining the first capture mode from the multiple frame images is described in detail below by way of specific examples.
For example, in a photographing mode or a video recording mode, the electronic device may call one or more detection models of a face attribute detection model, a body frame detection model, and a scene recognition model, detect captured multi-frame images, and determine a first snapshot mode according to detection results output by the detection models.
Current intelligent terminals generally have photographing and video recording functions, and the face attribute detection model, the body frame detection model, and the scene recognition model are already configured in such terminals. Therefore, in the photographing mode or the video recording mode, the electronic device can detect images by invoking these existing models to determine the first snapshot mode.
For another example, in the smart snap-shot mode, the electronic device may detect the captured multi-frame image by calling one or more of the face attribute detection model, the pose estimation model, and the motion detection model, and determine the first snap-shot mode according to the detection result output by each detection model.
The above-listed face attribute detection model, body frame detection model, scene recognition model, pose estimation model, and motion detection model may all be models trained with machine learning algorithms, with different models defined by different functions. Based on function, the face attribute detection model can be further divided into a face feature point detection model, an open/closed-eye detection model, and the like. This is not a limitation of the present application. The names of the detection models are merely examples to aid understanding, and the application does not exclude the possibility of using other names for detection models that achieve the same or similar functions.
It should be understood that the specific functions of these detection models are also implemented by a processor executing its corresponding computer instructions. The number and the form of the processors for realizing the detection models are not limited. In one possible design, the face attribute detection model, the body frame detection model, the scene recognition model, the pose estimation model, and the motion detection model may be embedded in the NPU.
The first capturing mode may be determined based on the detection result of a single detection model on the image, or based on the detection results of a plurality of detection models. When the electronic device invokes multiple detection models to determine the first snapshot mode, those models may run simultaneously or alternately, and the electronic device can weigh their detection results together. When the detection result of one or more detection models on the image meets the trigger condition of a certain capturing mode, that capturing mode can be determined as the first capturing mode. The specific process by which the electronic device determines the first snapshot mode by invoking one or more detection models is described below through a number of examples.
Optionally, the electronic device may invoke the face attribute detection model to detect a face in the image and determine the first snapshot mode according to the detection result. The face attribute detection model can be obtained by training with a machine learning algorithm. When the face attribute detection model detects one face in the image, it can detect each feature point of the face. The electronic device can then exclude scenes where a passer-by enters the frame, as well as motion scenes, according to the detected face position and depth information. In this case, the detection result of the face attribute detection model meets the trigger condition of the expression snapshot mode, and the first snapshot mode can be determined to be the expression snapshot mode.
When the face attribute detection model detects a plurality of faces in the image, the electronic device can likewise exclude scenes where passers-by enter the frame, as well as motion scenes, according to the positions and depth information of the faces. In this case, the detection result of the face attribute detection model meets the trigger condition of the group-photo snapshot mode, and the first snapshot mode can be determined to be the group-photo snapshot mode.
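A hypothetical sketch of this face-count trigger follows, simplifying the passer-by exclusion to a depth cutoff; the cutoff value and the data layout are assumptions.

# Hypothetical sketch: face-based mode triggering with passer-by exclusion.

MAX_SUBJECT_DEPTH_M = 3.0  # faces farther than this are treated as passers-by

def pick_face_mode(faces):
    """faces: list of dicts with a 'depth' distance-from-camera estimate."""
    subjects = [f for f in faces if f["depth"] <= MAX_SUBJECT_DEPTH_M]
    if not subjects:
        return None
    return "expression" if len(subjects) == 1 else "group_photo"

detections = [{"depth": 1.2}, {"depth": 1.5}, {"depth": 8.0}]  # third: passer-by
print(pick_face_mode(detections))  # -> "group_photo"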
Optionally, the electronic device may invoke a scene recognition model to detect a shooting scene of the image, and determine the first capturing mode according to a detection result. The scene recognition model can be obtained by training based on a plurality of predefined scenes through a machine learning algorithm. When the scene detected in the image by the scene recognition model is one of a plurality of predefined motion scenes, the motion scene is output. The electronic equipment can determine that the shooting object is in a motion state according to the motion scene detected by the scene recognition model. In this case, the detection result of the scene recognition model on the image satisfies the trigger condition of the motion capture mode, and it can be determined that the first capture mode is the motion capture mode. By way of example and not limitation, the above-described motion scene may include: a court (e.g., including a basketball court, a soccer field, etc.), a swimming pool, or a race track, etc.
Further optionally, the electronic device may also invoke the scene recognition model and the human body frame detection model to detect the image, and synthesize results output by the scene recognition model and the human body frame detection model to determine the first snapshot mode. For example, when the scene in the image is detected to be a predefined motion scene through the scene recognition model and a plurality of human frames are detected in the image through the human frame detection model, the detection of the image by the scene recognition model and the human frame detection model meets the triggering condition of the multi-person motion snapshot mode, and the first snapshot mode can be determined to be the multi-person motion snapshot mode.
Optionally, the electronic device may invoke the body frame detection model to detect a body frame in the image. The body frame detection model can be obtained by machine learning training and is used to detect the body frame in the image. The electronic device may also invoke another motion-region detection algorithm to determine the motion region in the image; for example, the motion region may be determined based on optical flow information. This is not a limitation of the present application. When the overlap between the motion regions in consecutive frames and the body frame is large, that is, when the proportion of the overlapping area within the whole body frame is higher than a certain preset threshold, the motion region in the image can be determined to be foreground motion rather than background motion or relative motion of the camera. In this case, the degree of coincidence between the body frame and the motion region meets the trigger condition of the motion capture mode, and the first capture mode can be determined to be the motion capture mode. Further, the electronic device may determine that the first capturing mode is the multi-person motion capture mode when the body frame detection model detects a plurality of body frames.
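A sketch of this foreground-motion test, assuming axis-aligned boxes and an illustrative threshold (the application only says "a certain preset threshold"):

# Hypothetical sketch: motion region counts as foreground motion when its
# overlap covers a large enough share of the detected body frame.

OVERLAP_THRESHOLD = 0.5  # assumed value

def overlap_ratio(body, motion):
    """Share of the body frame covered by the motion region.

    Boxes are (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
    """
    ix = max(0, min(body[2], motion[2]) - max(body[0], motion[0]))
    iy = max(0, min(body[3], motion[3]) - max(body[1], motion[1]))
    body_area = (body[2] - body[0]) * (body[3] - body[1])
    return (ix * iy) / body_area if body_area else 0.0

body_frame = (100, 50, 200, 300)
motion_region = (120, 60, 220, 310)
if overlap_ratio(body_frame, motion_region) > OVERLAP_THRESHOLD:
    print("foreground motion: motion capture mode triggered")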
Optionally, the electronic device may invoke the pose estimation model to detect a plurality of pose points of the human body in the image. The pose estimation model can be trained with a machine learning algorithm based on a plurality of predefined pose points (or feature points), including, for example: head, shoulder, neck, elbow, crotch, leg, knee, and ankle; for brevity, they are not enumerated exhaustively here. The pose estimation model may be used to detect a plurality of pose points in an image and determine the coordinate information of each pose point.
In one implementation, the coordinate information of each pose point may be represented by two-dimensional coordinates of a corresponding pixel point in the image. For example, a pixel (u, v) represents the pixel in the u-th row and v-th column of the two-dimensional image. In another implementation, the coordinate information of each pose point may be represented by the three-dimensional coordinates of the corresponding pixel point in the image. For example, the pixel point (u, v) may further carry depth information d, and the three-dimensional coordinate of the pixel point may be represented as (u, v, d). Wherein the depth information is used to indicate the distance of the pixel point from the camera. It should be understood that the representation of the coordinate information of the pose point by the two-dimensional coordinates (u, v) or the three-dimensional coordinates (u, v, d) of the pixel point is only one possible implementation manner, and should not constitute any limitation to the present application.
The pose estimation model can further estimate a body frame based on the plurality of pose points of the human body: connecting the pose points on the same frame of image yields the skeleton of the human body, and from the coordinate information of the pose points, the position and size of the body frame can be estimated. The electronic device can then check the degree of coincidence between the body frame estimated by the pose estimation model and the motion region determined by the motion-region algorithm, so as to decide whether the trigger condition of the motion snapshot mode is met. The specific method is similar to that described above for the body frame detection model and, for brevity, is not repeated here.
In another implementation, the electronic device may determine whether the photographed subject is in motion according to the coordinate information of each pose point across the preceding and following multi-frame images. When the subject is moving, the coordinates of some or all of the pose points change between the preceding and following frames, and the movement of each pose point of a human body in motion can be derived from these relative coordinate changes. Therefore, when the electronic device determines, from the coordinate information of the pose points detected by the pose estimation model in each frame, that the trigger condition of the motion capture mode is met, the first capture mode can be determined to be the motion capture mode.
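For illustration, a sketch deciding motion state from the displacement of pose points between two frames; the pose-point names, coordinates, and threshold are hypothetical.

# Hypothetical sketch: motion state from pose-point displacement.

MOTION_THRESHOLD_PX = 15.0  # assumed mean displacement, in pixels

def mean_displacement(prev, curr):
    """Mean Euclidean movement of pose points shared by two frames."""
    common = prev.keys() & curr.keys()
    if not common:
        return 0.0
    total = sum(((curr[p][0] - prev[p][0]) ** 2 +
                 (curr[p][1] - prev[p][1]) ** 2) ** 0.5 for p in common)
    return total / len(common)

frame_t0 = {"head": (320, 100), "knee": (330, 400), "ankle": (335, 480)}
frame_t1 = {"head": (322, 80),  "knee": (360, 370), "ankle": (370, 440)}

if mean_displacement(frame_t0, frame_t1) > MOTION_THRESHOLD_PX:
    print("subject in motion: motion capture mode may be triggered")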
Further optionally, the electronic device may invoke the pose estimation model and the motion detection model to identify the action category of the photographed subject in the image. As described above, the pose estimation model may be used to determine a plurality of pose points in an image and the coordinate information of each pose point. The coordinate information of the pose points in each frame can be input to the motion detection model to determine the action category of the subject. The motion detection model may be obtained by training a machine learning algorithm on a plurality of predefined action categories, and determines the action category of the subject based on the training samples and the coordinate changes of the pose points. The action categories include, for example: running, jumping, shooting, kicking, rock climbing, swimming, diving, and skating.
If the coordinate changes of the pose points determined by the motion detection model are the same as, or approximately the same as, the coordinate changes of the pose points in a predefined action category, that category can be determined as the action category of the photographed subject, and the motion detection model may output it. For example, when the motion detection model detects that the human body in the image performs a specific action (such as one of the action categories listed above), the electronic device may determine that the image satisfies the trigger condition of the motion capture mode, and thus determine that the first capture mode is the motion capture mode.
A number of examples of the electronic device determining the first snap-shot mode have been listed above in connection with the functionality of the models, but it should be understood that these examples should not constitute any limitation to the present application. The multiple models can be used in combination, and the first snapshot mode suitable for the current shooting is determined based on the trigger conditions predefined for various snapshot modes.
The electronic equipment can also sequentially call the corresponding models according to the priorities of the plurality of snapshot modes. By way of example and not limitation, a motion capture mode is prioritized over an expression capture mode. Based on the above listed priority ordering of the plurality of capturing modes, the electronic device may sequentially invoke the detection models to determine the first capturing mode based on the detection models corresponding to the different capturing modes.
For example, the motion capture mode may be determined by invoking the body frame detection model, the scene recognition model, or the pose estimation model together with the motion detection model to detect the captured multi-frame images, and the expression snapshot mode may be determined by invoking the face attribute detection model to detect the captured multi-frame images. It should be understood that the relationships between modes and models listed here are only examples and should not limit the present application in any way. As described above, one snapshot mode may be determined jointly by the detection results of several models, and several models may be invoked to detect the captured multi-frame images. The application does not limit which models correspond to each snapshot mode.
Since the motion capture mode has a higher priority than the expression capture mode, the electronic device can first invoke the body frame detection model or the scene recognition model. When the captured multi-frame images are determined to meet the trigger condition of the motion capture mode, the first capture mode can be directly determined to be the motion capture mode, and the face attribute detection model need not be invoked at all, which saves both the time for determining the snapshot mode and the power consumed by running that model.
It should be understood that the process of the electronic device sequentially invoking the models according to the prioritization of the plurality of snapshot modes is described in detail herein in connection with a prioritization example. However, this is merely an example for ease of understanding and should not be construed as limiting the present application in any way. The priority ordering among the plurality of snapshot modes is not limited in the present application.
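A minimal sketch of priority-ordered mode determination with early exit (the ordering and the trigger functions are illustrative assumptions):

# Hypothetical sketch: try modes from highest to lowest priority; stop at
# the first triggered mode so lower-priority models never run.

def motion_triggered(frames):      # e.g. scene recognition / body frames
    return False

def expression_triggered(frames):  # e.g. face attribute detection
    return True

MODE_PRIORITY = [
    ("motion", motion_triggered),
    ("expression", expression_triggered),
]

def determine_mode(frames):
    """Call detectors in priority order and return the first triggered mode."""
    for mode, triggered in MODE_PRIORITY:
        if triggered(frames):
            return mode
    return None

print(determine_mode(["f0", "f1"]))  # -> "expression"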
In step 220, a snapshot frame image corresponding to the first snapshot mode is determined among the captured plurality of frames of images to be evaluated, using the evaluation policy corresponding to the first snapshot mode.
The evaluation strategy is one of a plurality of preset evaluation strategies, and each evaluation strategy may be used to define a rule or manner for determining the snapshot frame image among multiple frames of images to be evaluated. The snapshot frame image corresponding to the first snapshot mode may specifically be the image, determined based on the first snapshot mode, that best presents the highlight of the first snapshot mode. For example, if the first capture mode is the motion capture mode, the snapshot frame image may be the image, among the captured multiple frames of images to be evaluated, that best reflects the highlight moment of the photographed subject. For another example, if the first snapshot mode is the expression snapshot mode, such as smile capture, the snapshot frame image may be the image at the moment the subject's smile is most brilliant. For another example, if the first capture mode is the group-photo snapshot mode, the snapshot frame image may be the image in which the expression or composition of each photographed subject is best.
It should be noted that the multiple frames of images to be evaluated described here and the multi-frame images described in step 210 above may be non-overlapping or partially overlapping. Overlapping may specifically mean that a certain frame image in step 210 and an image to be evaluated in step 220 are the same frame, or were captured at the same time point, for example carrying the same timestamp. For example, the multiple frames of images to be evaluated may be consecutive frames captured after the multi-frame images captured in step 210, or may further include at least some of the multi-frame images captured in step 210. For another example, the multiple frames of images to be evaluated may be images that follow, and are discontinuous with, the multi-frame images captured in step 210.
Specifically, the captured multi-frame image in step 210 may include, for example, a preview image captured before the user performs a photographing operation or a video recording operation. Typically, after turning on the camera, the electronic device defaults to a photographing mode. Although the user does not perform the photographing operation, the preview image can still be seen through the photographing interface. The plurality of frames of images may include a plurality of frames of preview images captured in a photographing mode. Thereafter, the electronic device may trigger the first snap mode based on a manual adjustment of the user or a detection result of the detection model. That is, the electronic device enters the smart snap mode. In the smart snap mode, although the user may not perform a photographing operation, a plurality of frames of preview images may still be captured. The multi-frame image may also include a multi-frame preview image captured after entering the smart snap mode. The electronic device may store the plurality of frames of images in a buffer for subsequent use, such as to determine the first snap-shot mode in step 210.
Further, in the photographing mode or the smart snapshot mode, preview images may still be captured and used to determine the first snapshot mode even though the user has not performed a photographing operation. In the embodiments of the present application, whether in the photographing mode, the smart snapshot mode, or another shooting mode, the mode before a photographing operation is performed may be referred to as a preview mode. The user can observe the photographed subject in the preview mode in order to select an appropriate moment to perform the photographing operation. It should be understood that the preview mode is not necessarily only the mode before the first photographing operation; the mode between two consecutive photographing operations may also be referred to as a preview mode.
The multiple frames of images to be evaluated in step 220 may be, for example, the N frames before and after the moment the shutter is pressed, where N is a positive integer. For example, if N is 10, the images to be evaluated may be the 10 frames before and the 10 frames after the moment the shutter is pressed, 20 frames in total. In this case, the images to be evaluated may be non-overlapping or partially overlapping with the images in step 210. This may depend on, among other things, the length of time between the moment the shutter is pressed and the moment the electronic device enters the first snapshot mode, and the value of N. For example, if the value of N is large and the user takes a picture shortly after entering the first snapshot mode, the images to be evaluated may include some or all of the images described in step 210. The present application does not limit the relationship between the images in step 210 and the images to be evaluated in step 220. Alternatively, the images to be evaluated may also be only the N frames before, or only the N frames after, the moment the shutter is pressed, which is not limited in the embodiments of the present application.
It should be understood that, during the photographing process, the user may press the shutter by clicking a photographing control in the user interface or another button for controlling photographing, for example, and the application is not limited to the specific operation of pressing the shutter by the user. It should also be understood that the above values of N are merely examples for ease of understanding and should not be construed as limiting the present application in any way. The specific value of N is not limited in the present application.
In addition, the multiple frames of images in step 210 may further include video images saved in a video recording mode. For example, the user may turn on the camera and use the video recording mode to record video. While the user records video with the camera, no photographing operation may be performed, yet the video is in fact a sequence of consecutive image frames. The electronic device may save these images in a buffer for subsequent use, such as determining the first snapshot mode in step 210 and determining the snapshot frame image corresponding to the first snapshot mode in step 220.
In the video recording mode, the multiple images to be evaluated in step 220 may be, for example, the first N frames of images, the last N frames of images, or the front and back N frames of images at the moment of pressing the shutter, or an image with a scoring result exceeding a preset threshold that is evaluated based on the evaluation policy provided in the embodiment of the present application. This is not a limitation of the present application.
In one implementation, the snap-shot frame image corresponding to the first snap-shot mode may be an optimal frame image determined from a plurality of captured images to be evaluated. For example, the captured multiple frames of images to be evaluated may be scored based on the evaluation policy corresponding to the first capturing mode, and one frame of image with the highest scoring result may be selected as the optimal frame of image.
Optionally, before step 220, the method 200 further comprises:
step 230, performing image recognition on the captured multiple frames of images to be evaluated to output recognition results;
and 240, determining the numerical value of each scoring parameter based on the identification result.
Specifically, the electronic device may invoke one or more of the detection models described above to perform image recognition on the captured multiple frames of images to be evaluated. The electronic device may then use the evaluation policy corresponding to the first snapshot mode to score the images to be evaluated according to the recognition results, so that the snapshot frame image corresponding to the first snapshot mode can be determined according to the scoring results.
Optionally, step 210 above may specifically include: a first capturing mode is determined among a plurality of preset capturing modes according to a captured multi-frame image based on a first frame rate. Optionally, step 230 may specifically include: and performing image recognition on the captured multiple frames of images to be evaluated based on the second frame rate to output a recognition result. The first frame rate may be equal to the second frame rate, or may be less than the second frame rate. This is related to the shooting mode used by the electronic device. The embodiments shown in fig. 4 to 6 will be described in detail later.
The specific process of determining a snap-shot frame image corresponding to the first snap-shot mode in a captured plurality of frames of images to be evaluated by using the evaluation policy corresponding to the first snap-shot mode is described in detail below.
Optionally, each of the plurality of capturing modes corresponds to at least one evaluation strategy of a plurality of preset evaluation strategies, each evaluation strategy comprising one or more scoring parameters for image scoring and a mode weight of each scoring parameter. Step 220 may specifically include: calculating the grade of each frame of image to be evaluated in a plurality of captured frames of images to be evaluated by using one or more grade parameters in one of at least one evaluation strategy corresponding to the first snapshot mode and the mode weight of each grade parameter; and determining a snapshot frame image corresponding to the first snapshot mode in the multiple frames of images to be evaluated according to the multiple scores of the multiple frames of images to be evaluated.
Specifically, the plurality of capturing modes may correspond to a plurality of preset evaluation strategies. Each snapshot mode may correspond to one or more of a plurality of preset evaluation policies. Wherein each evaluation strategy may comprise a set of scoring parameters corresponding to the first snap-shot mode. Each set of scoring parameters may include one or more scoring parameters. For example, scoring parameters corresponding to a motion capture mode may include one or more of: the posture extension degree, the posture height and the like. For another example, the scoring parameters corresponding to the expression snapshot mode may include: expression intensity, facial occlusion, open and closed eyes, facial angles, etc.
The posture extension degree (also referred to as pose extension) may specifically refer to the degree of bending of the limbs and their relative distance from the trunk. The posture extension degree can be obtained by weighting the included angles of the joints of the body; the included-angle parameters of the joints strongly related to human motion can be preset. The pose extension may be determined, for example, by a pose estimation model together with a motion detection model. The pose extension may further include the included angles of individual joints, such as, but not limited to, the wrist joint angle, elbow joint angle, arm bending angle, leg bending angle, knee joint angle, and ankle joint angle. For brevity, these are not enumerated exhaustively here. In this embodiment, each joint angle may itself exist as a scoring parameter; the pose extension can be understood as a higher-level generalization over the joint angles. The posture height may specifically refer to the height position of the center of the body in the image. The expression intensity may specifically indicate how strongly the photographed subject exhibits a certain expression. The expression intensity can be determined by a face attribute detection model, or calculated from facial feature points, and can be obtained by weighting local features of the face, such as the magnitude of the grin, the degree to which the mouth corners rise, and the degree to which the eyes are open. For brevity, these are not enumerated exhaustively here. In the embodiments of the present application, each of the local features included in the expression intensity listed above may itself exist as a scoring parameter; the expression intensity can be understood as a generalization over these local features. Open/closed eyes specifically refers to whether the photographed subject has closed eyes, and may be determined, for example, by a face attribute detection model. Facial occlusion specifically refers to whether, and to what degree, the face of the photographed subject is occluded, and can be calculated, for example, from the feature points. The face angle specifically refers to whether, and by what angle, the face of the photographed subject is tilted, and may be determined, for example, by a face attribute detection model.
In addition to the scoring parameters listed above, the scoring parameters in different capture modes may include sharpness, exposure, composition, and the like. Sharpness may specifically refer to the clarity of each detail and its boundary in the image, and is a parameter used to describe image quality. Exposure may specifically refer to the process in which the camera's photosensitive element receives external light and forms an image; the amount of external light received by the photosensitive element directly influences the brightness of the picture. Depending on the degree of light received by the photosensitive element, there are roughly three cases: underexposure, correct exposure, and overexposure. Composition may particularly refer to the process of arranging and organizing elements to produce a harmonious photo. The composition may specifically include, but is not limited to, rule-of-thirds composition (also referred to as the nine-square grid), symmetric composition, frame-type composition, and the like, which is not limited in this application.
It should be understood that the scoring parameters corresponding to the above-listed snapping modes are only examples, and should not limit the application in any way. The specific content and name of the scoring parameter corresponding to each snapshot mode are not limited in the application. Alternatively, the scoring parameters included in different evaluation strategies corresponding to the same snapshot mode may be the same. However, the scoring parameters included in different evaluation strategies are not necessarily the same in different capture modes.
Optionally, scoring parameters included in different evaluation strategies corresponding to different snapshot modes are the same, and mode weights included in the different evaluation strategies are different.
The different evaluation strategies described herein include different mode weights, which may specifically mean that the mode weights applied to the same scoring parameter in the different evaluation strategies are different. And, under the condition that the evaluation strategy comprises a plurality of scoring parameters, the mode weight applied by different evaluation strategies to at least one scoring parameter is different.
In other words, the mode weights corresponding to the same scoring parameter may be different in different evaluation strategies under different capture modes. Alternatively, the different evaluation strategies for different capture modes may each include different mode weights corresponding to the same scoring parameter.
For example, the scoring parameters included in the evaluation strategy corresponding to the motion capture mode may include pose height, pose extension, sharpness, exposure, and composition; alternatively, the scoring parameters included in the evaluation strategy corresponding to the motion capture mode may also include a posture height, a posture extension, an expression intensity, an eye opening and closing, a facial occlusion, a face angle, a sharpness, an exposure, and a composition, but the mode weight applied to the expression intensity, the eye opening and closing, the facial occlusion, and the face angle is small, for example, zero or close to zero. Thus, the substance of both statements is the same.
It should be understood that the above-listed scoring parameters are only examples and should not be construed as limiting the present application in any way. As long as the scoring parameter corresponding to the motion capture mode includes any one of the gesture height and the gesture stretching degree, the scoring parameter should fall within the protection scope of the embodiment of the present application. The scoring parameters corresponding to the motion capture mode may further include, for example, rotation, and the like.
For another example, the scoring parameters included in the evaluation strategy corresponding to the expression snapshot mode may include expression intensity, open/close eyes, facial occlusion, face angle, sharpness, exposure, and composition; alternatively, the scoring parameters included in the evaluation strategy corresponding to the expression snapshot mode may also include expression intensity, eye opening and closing, facial occlusion, face angle, pose height, pose extension, sharpness, exposure, and composition, but the mode weight applied to the pose height and the pose extension respectively is smaller, for example, zero or close to zero. Thus, the substance of both these terms is the same.
It should be understood that the above-listed scoring parameters are only examples and should not be construed as limiting the present application in any way. As long as the scoring parameters corresponding to the expression snapshot mode include any one of the scoring parameters of expression intensity, eyes open and closed, facial occlusion and face angle, the scoring parameters are all within the protection scope of the present application. The scoring parameters corresponding to the expression snapshot modes and the mode weights of the scoring parameters are not limited.
As can be seen from the above listing of the scoring parameters and their mode weights in different capturing modes, the mode weights of the same scoring parameter are different corresponding to the different capturing modes. For example, the height of the gesture and the stretching of the gesture listed above are weighted higher in the evaluation strategy corresponding to the motion capture mode, and are weighted lower or not weighted (weight 0) in the evaluation strategy corresponding to the expression capture mode; for another example, the expression intensity, the eyes opened and closed, the facial occlusion, and the face angle listed above are weighted higher in the evaluation policy corresponding to the expression snapshot mode, and are weighted lower or are not weighted (weight 0) in the evaluation policy corresponding to the motion snapshot mode.
In addition, the mode weights of sharpness, composition, and exposure are not strongly related to the snapshot mode, and therefore the same mode weights can be defined for these parameters across different snapshot modes. It should be understood that the above description merely combines different snapshot modes with different scoring parameters and mode weights for ease of understanding, and should not be construed as limiting the present application in any way.
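To make the structure of the preset strategies concrete, the following minimal sketch (in Python) shows how two modes can share one set of scoring parameters while applying different mode weights, with irrelevant parameters weighted at zero. All parameter names and weight values here are illustrative assumptions; the application does not fix any of them.

```python
# Hypothetical preset evaluation strategies: one weight table per snapshot
# mode, over a shared set of scoring parameters.
SCORING_PARAMS = [
    "pose_height", "pose_extension",          # motion-related
    "expression_intensity", "eyes_open",      # expression-related
    "face_occlusion", "face_angle",
    "sharpness", "exposure", "composition",   # weakly mode-dependent
]

MODE_WEIGHTS = {
    "motion": {
        "pose_height": 0.3, "pose_extension": 0.3,
        "expression_intensity": 0.0, "eyes_open": 0.0,
        "face_occlusion": 0.0, "face_angle": 0.0,
        "sharpness": 0.15, "exposure": 0.1, "composition": 0.15,
    },
    "expression": {
        "pose_height": 0.0, "pose_extension": 0.0,
        "expression_intensity": 0.3, "eyes_open": 0.15,
        "face_occlusion": 0.1, "face_angle": 0.05,
        "sharpness": 0.15, "exposure": 0.1, "composition": 0.15,
    },
}
```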
It should be noted that the evaluation policy corresponding to each snapshot mode may be predefined. Once the evaluation strategy corresponding to the first snapshot mode is determined, the scoring parameters and the mode weights used for scoring the images to be evaluated are determined as well.
For example, the score of an image to be evaluated may be determined by the formula

$$G = \sum_{i=1}^{I} \alpha_i T_i,$$

where $G$ denotes the scoring result, $G > 0$; $I$ denotes the number of scoring parameters, an integer greater than or equal to 1; $i$ denotes the $i$-th of the $I$ scoring parameters, $1 \le i \le I$, $i$ an integer; $T_i$ denotes the value of the $i$-th scoring parameter, $T_i \ge 0$; and $\alpha_i$ denotes the mode weight of the $i$-th scoring parameter, $\alpha_i \ge 0$.
In the first capturing mode, the score of each frame image may be obtained by weighting the values $T_i$ of the scoring parameters corresponding to the first capturing mode by their respective mode weights $\alpha_i$ and summing the results. When a snapshot mode corresponds to multiple evaluation strategies, the electronic device may select one of them for scoring, for example a default or general evaluation strategy; alternatively, the multiple evaluation strategies may correspond to multiple capturing categories, and the electronic device may select the corresponding evaluation strategy according to the capturing category determined by the previously described detection models applied to the multiple frames of images.
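As an illustration, the weighted sum above might be computed as in the following sketch, which assumes the hypothetical MODE_WEIGHTS table from the earlier example and parameter values already normalized to the same magnitude.

```python
def score_frame(values, mode):
    """Score one frame to be evaluated: G = sum_i(alpha_i * T_i).

    `values` maps each scoring parameter to its value T_i for this frame;
    `mode` selects the preset mode weights alpha_i (hypothetical table above).
    """
    weights = MODE_WEIGHTS[mode]
    return sum(alpha * values.get(param, 0.0)
               for param, alpha in weights.items())
```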
For ease of understanding, the process of scoring the images to be evaluated using one of the evaluation strategies corresponding to the first snapshot mode is first described in detail below. That evaluation strategy may be, for example, the default evaluation strategy described above.
The following describes in detail a specific process of determining, by the electronic device, a snapshot frame image corresponding to the first snapshot mode from multiple frames of images to be evaluated, taking different snapshot modes as examples.
Firstly, the electronic device can call a detection model corresponding to the first snapshot mode to perform image recognition on an image to be evaluated so as to obtain a recognition result.
For example, if the first capturing mode is a motion capturing mode, the model corresponding to the first capturing mode may include a posture estimation model and a motion detection model, for example.
The electronic equipment can call the posture estimation model to perform image recognition on the captured multiple frames of images to be evaluated. The posture estimation model can perform image recognition on each frame of image to be evaluated to obtain coordinate information of a plurality of posture points in each frame of image to be evaluated. And the coordinate information of each attitude point in each frame of image to be evaluated is the recognition result output by the attitude estimation model. Since the specific process of determining the coordinate information of each pose point by the pose estimation model has been described in step 210, details are not described herein for brevity.
Optionally, the electronic device may further invoke a motion detection model to perform image recognition on multiple frames of images to be evaluated. The motion detection model may be combined with a pose estimation model for identifying a motion category of the photographic subject. The action category identified according to each frame of image to be evaluated may be an identification result output by the action detection model. Since the specific process of determining the motion category by the motion detection model has been described in step 210, it is not described herein again for brevity.
The motion detection model may specifically indicate the identified motion category by an index of motion types or other information that may be used to uniquely indicate one motion category. The present application is not limited to a specific form of the recognition result output by the motion detection model.
It should be understood that, in this embodiment, the evaluation policy does not necessarily correspond to an action category. Therefore, the electronic device may invoke the motion detection model to identify the motion type, or may choose not to invoke it, which is not limited in the present application.
For another example, if the first snapshot mode is an expression snapshot mode, the model corresponding to the first snapshot mode is a face attribute detection model. The electronic device can call the face attribute detection model to perform image recognition on the captured multiple frames of images to be evaluated. The face attribute detection model may establish classification models of facial attributes, such as the facial feature point detection model and the open/closed-eye detection model described above, based on facial attributes including, for example, expression category (such as happy, angry, or sad), open/closed eyes, feature points, and age, to perform image recognition on each frame of the image to be evaluated and output information such as the expression category of the subject, whether the eyes are closed, whether the face is occluded, and age. In other words, such information for each frame of the image to be evaluated is the recognition result output by the face attribute detection model based on its recognition of that frame.
The face attribute detection model can detect the expression of the shot object based on a plurality of pre-trained expression categories. When it is determined that the expression of the photographic subject belongs to any one of a plurality of expression categories trained in advance, the expression category may be determined as the expression category of the photographic subject. Furthermore, different expression categories may correspond to different priorities. When the expression categories determined by the face attribute detection model are multiple, the expression categories with higher priority can be determined as the expression categories of the shot object according to the predefined priority sequence.
After the face attribute detection model completes the image recognition of the image to be selected, the recognition result can be output. Specifically, for the expression categories, the expression categories may be indicated by indexes of expression types or other information that can be used to uniquely indicate one expression category; for open and closed eyes, "closed eye" and "open eye" may be indicated by, for example, binary values "0" and "1", respectively; for the feature points, the positions of the respective feature points may be indicated by coordinate information; for age, it may be indicated by a specific numerical value.
It should be understood that, in the present embodiment, since the evaluation policy does not necessarily correspond to the expression category, the face attribute detection model may not output the expression category. The electronic equipment can determine information such as expression intensity, whether eyes are closed and whether the face is shielded according to information such as the position and age of each feature point in each frame of image to be evaluated, and uses an evaluation strategy corresponding to the first snapshot mode to grade each frame of image to be evaluated.
It should also be understood that the above illustrates just a few possible implementations for indicating the detection result for ease of understanding, and should not constitute any limitation to the present application. The specific indication mode of the detection result is not limited in the application. It should also be understood that the scoring parameters corresponding to the above-listed expression snapshot modes are only examples, and should not limit the application in any way. As long as the scoring parameters corresponding to the expression snapshot mode include any one of the parameters of expression intensity, facial occlusion, eyes opening and closing, and face angle, the scoring parameters are all within the protection scope of the present application. Of course, when the first snapshot mode is a group-shot snapshot mode, the scoring parameters may also include one or more of expression intensity, facial occlusion, open and closed eyes, and facial angle. For brevity, this is not repeated hereafter.
It should be noted that, when the electronic device invokes one or more detection models for image recognition, the electronic device may invoke the detection model corresponding to the first snapshot category for image recognition, as exemplified above. The snapshot category may be a specific sub-pattern or classification in the snapshot mode obtained by further dividing the snapshot mode. The electronic equipment can also call a plurality of predefined detection models for image recognition, such as a face attribute detection model, a posture estimation model and an action detection model for image recognition. Since the pattern weight applied to each scoring parameter is different depending on the different snapshot patterns, the electronic device applies different pattern weights depending on the snapshot patterns when scoring based on the result of image recognition although a plurality of detection models are called. Therefore, the final evaluation result is not influenced. Therefore, the present application is not limited to a specific model used for image recognition.
After obtaining the recognition result of the image to be evaluated, the electronic device may obtain the numerical value of each scoring parameter based on the recognition result. For example, if the first snapshot mode is a motion snapshot mode, the electronic device may determine parameters such as a height of a human skeleton and an included angle of each joint according to coordinate information of the gesture point.
The electronic device may determine a value for a scoring parameter of pose height based on human bone height. For example, the center point of the human bone, the highest point of the human bone, and the like may be used as the numerical values of the posture height. It should be appreciated that when the electronic device selects a point (e.g., the center point of a human bone) as the value of the pose height, the same point is selected for all images to be evaluated to determine the value of the pose height.
The electronic device may determine a pose-extension based on the angle of each joint point. Specifically, the posture extension degree may be determined by the bending degree of each joint point of the human body, and thus the posture extension degree may include a numerical value of the bending degree of each joint point. As mentioned above, the value of the pose extension can be weighted by the included angle of each joint. The weight of the angle of each joint point may be predefined, i.e. the mode weight described above. Of course, the determination of the numerical value of each scoring parameter may be determined in other ways, which are not specifically defined herein. For example, an existing algorithm, such as an action intensity algorithm, may be invoked to determine a value for each scoring parameter.
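By way of illustration only, the joint included angles and the weighted pose extension described above might be computed as in the following sketch; the joint names and weight values are assumptions, not values fixed by this application.

```python
import math

def joint_angle(a, b, c):
    """Included angle (degrees) at joint b formed by points a-b-c (x, y)."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    norm = math.hypot(*v1) * math.hypot(*v2)
    if norm == 0:
        return 0.0
    cos = (v1[0] * v2[0] + v1[1] * v2[1]) / norm
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))

# Hypothetical per-joint weights; the application presets which joint angles
# are strongly related to human motion but does not fix the weight values.
JOINT_WEIGHTS = {"elbow": 0.3, "knee": 0.3, "wrist": 0.2, "ankle": 0.2}

def pose_extension(angles):
    """Pose extension as the weighted sum of joint included angles."""
    return sum(w * angles.get(j, 0.0) for j, w in JOINT_WEIGHTS.items())
```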
It should be understood that the height and the extension of the posture are just one possible way of expression, and the application does not exclude the possibility of expressing the same or similar meaning by other possible ways of expression. For example, the gesture extension degree can be replaced by the action amplitude. For another example, if the first snapshot mode is an expression snapshot mode, the electronic device may determine a numerical value of each scoring parameter according to a plurality of feature points of the face.
Taking the scoring parameter of the expression intensity as an example, the electronic device may determine a numerical value of the expression intensity according to the plurality of feature points. Specifically, the expression intensity may be determined by various local features of the human face, such as, but not limited to, features including the size of the grin, the degree of the mouth opening, the degree of the eyes opening, and the like. The numerical value of the expression intensity may include the numerical value of each local feature. As mentioned above, the expression intensity can be obtained by weighting the numerical values of the respective local features of the face. The weight of each local feature may be predefined, i.e. the pattern weight described above.
The degree to which the eyes are open or closed can be determined, for example, by the ratio of the vertical distance to the horizontal distance of the eye. The magnitude of the grin can be determined, for example, by the ratio of the sum of the distance between the upper and lower lips and the horizontal distance between the mouth corners to the eye spacing. The degree to which the mouth corners rise can be determined, for example, by the distance between the horizontal line connecting the mouth corners and the lower lip.
Taking the scoring parameter of facial occlusion as an example, the electronic device may determine whether there is a feature point missing according to the detected feature points, and if so, may determine that the face of the photographic object is occluded. For facial occlusion, its value may be determined, for example, by the ratio of the detected feature points to the predefined feature points.
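A minimal sketch of these feature-point ratios follows, assuming 2-D feature point coordinates are available; the exact point sets are not specified by the application.

```python
import math

def eye_openness(upper, lower, left_corner, right_corner):
    """Degree the eye is open: ratio of vertical to horizontal eye distance."""
    horizontal = math.dist(left_corner, right_corner)
    return math.dist(upper, lower) / horizontal if horizontal else 0.0

def grin_magnitude(upper_lip, lower_lip, left_mouth, right_mouth, eye_spacing):
    """Grin magnitude: (lip distance + mouth-corner width) over eye spacing."""
    width = math.dist(left_mouth, right_mouth)
    return (math.dist(upper_lip, lower_lip) + width) / eye_spacing

def occlusion_ratio(detected_points, predefined_points):
    """Fraction of the predefined facial feature points actually detected."""
    return len(detected_points) / len(predefined_points)
```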
In addition to the expression intensity, facial mask listed above, the electronic device may also determine values for scoring parameters such as open and closed eyes, face angle, etc. For example, the electronic device may also invoke an existing algorithm, such as an expression intensity algorithm, to determine a value for each of the scoring parameters.
In addition, in different snapshot modes, the electronic device can further load scoring parameters such as sharpness, exposure, and composition, and determine their values. For different snapshot modes, the value of the composition scoring parameter can be computed based on different composition methods. For example, in the expression snapshot mode, a rule-of-thirds (nine-square grid) composition may be loaded; in the group-photo snapshot mode, a symmetric composition may be loaded; in a landscape capture mode, a horizontal-line composition may be loaded. The mode weight of the composition scoring parameter can likewise be defined based on the snapshot mode.
Taking the group-photo snapshot mode as an example, since the symmetric composition can be loaded, the distance from the center of all people in the image to the center of the picture and the distances between adjacent people can be calculated. The two distances are weighted respectively to obtain a weighted sum, which may be used as the value of the composition scoring parameter.
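A sketch of this symmetric-composition value, assuming each person is represented by a 2-D center point; the weights are hypothetical, not values fixed by this application.

```python
import math

def composition_value(person_centers, picture_center,
                      w_center=0.6, w_spacing=0.4):
    """Symmetric-composition value for the group-photo snapshot mode:
    a weighted sum of (a) the distance from the center of all people to
    the picture center and (b) the mean distance between adjacent people.
    """
    xs = [p[0] for p in person_centers]
    ys = [p[1] for p in person_centers]
    group_center = (sum(xs) / len(xs), sum(ys) / len(ys))
    d_center = math.dist(group_center, picture_center)
    ordered = sorted(person_centers)                  # left to right
    gaps = [math.dist(a, b) for a, b in zip(ordered, ordered[1:])]
    d_spacing = sum(gaps) / len(gaps) if gaps else 0.0
    return w_center * d_center + w_spacing * d_spacing
```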
It should be understood that the specific manner for determining the numerical value of each scoring parameter listed above is merely an example and should not be construed as limiting the application in any way. Since the determination of the numerical value of each scoring parameter can be made by referring to the prior art, it is not illustrated here for the sake of brevity.
It should be noted that, in the case that the scoring parameter is obtained by weighting multiple parameter values, the parameter values may be normalized to the same magnitude, and then weighted. After determining the values of the scoring parameters, the electronic device may determine the score of each frame of the image to be evaluated using the evaluation strategy corresponding to the first capturing mode.
As described above, each snapshot mode may correspond to one or more evaluation strategies, and each evaluation strategy has a scoring parameter and a mode weight defined therein. In other words, in each evaluation strategy, the scoring parameters and their pattern weights are preset. After the evaluation strategy is determined, both the scoring parameters and their pattern weights may be determined. The electronic equipment can obtain the score of each frame of image to be evaluated by substituting the numerical value of the score parameter determined for each frame of image to be evaluated.
In this embodiment, the electronic device may, using the evaluation policy corresponding to the first capturing mode, substitute the values of the scoring parameters previously determined for each frame of the image to be evaluated to calculate its score, for example, by the formula listed above, $G = \sum_{i=1}^{I} \alpha_i T_i$. The meaning of the parameters in the formula has already been explained above and, for brevity, is not repeated here. When the values of the scoring parameters are at different magnitudes, they may also be normalized to the same magnitude.
It should be understood that the formulas for calculating scores listed herein are only examples and should not be construed as limiting the application in any way. The application does not exclude the possibility of calculating the score of the image to be selected by other calculation means. After determining the score of each frame of image to be evaluated based on the above formula, the electronic device may determine a snapshot frame image corresponding to the first snapshot mode according to the score of each frame of image to be evaluated. Here, the snap-shot frame image corresponding to the first snap-shot mode may be, for example, one frame image having the highest score determined based on the scores of a plurality of frames of images to be evaluated. In other words, in the plurality of frames of images to be evaluated, the score of the snap-shot frame image corresponding to the first snap-shot mode is higher than the score of any one frame of images to be evaluated except the snap-shot frame image. The electronic device may store the snapshot frame image in the electronic device after determining the snapshot frame image corresponding to the first snapshot mode, or output the snapshot frame image to a display unit or the like of the electronic device. This is not a limitation of the present application.
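Putting the pieces together, frame selection might look like the following sketch, which reuses the hypothetical score_frame() from the earlier example; timestamps are carried along so the chosen frame can later be matched against the buffer.

```python
def select_snapshot_frame(candidates, mode):
    """Return (timestamp, values) of the highest-scoring frame among the
    frames to be evaluated; `candidates` is a list of (timestamp, values).
    """
    return max(candidates, key=lambda c: score_frame(c[1], mode))
```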
In another implementation, each snap-shot mode may correspond to multiple evaluation strategies. The plurality of evaluation strategies in each of the plurality of capture modes may further correspond to a plurality of capture categories. For example, the one or more snapshot categories to which the motion snapshot mode corresponds may include at least one of: shooting, running, jumping, swimming, kicking, etc. The one or more snapshot categories corresponding to the expression snapshot modes comprise at least one of the following: happy, angry, heart hurt, fun, etc. It should be understood that the above list of the corresponding capturing categories of each capturing mode is only an example, and should not limit the present application in any way. The specific snapshot category corresponding to each snapshot mode is not limited in the present application.
In the embodiment of the application, each snapshot category may correspond to one evaluation policy. In at least one evaluation strategy corresponding to the first capturing mode, each evaluation strategy includes one or more scoring parameters corresponding to the first capturing mode, a mode weight of each scoring parameter, and a category weight corresponding to one capturing category. In other words, in the first snap-shot mode, each scoring parameter may be defined with a mode weight. Each scoring parameter may further be defined with a category weight under each of the capturing categories corresponding to the first capturing mode.
Optionally, scoring parameters included in different evaluation strategies corresponding to different snapshot categories are the same, and category weights included in the different evaluation strategies are different.
The different evaluation strategies described herein include different category weights, which may specifically mean that the category weights applied to the same scoring parameter in the different evaluation strategies are different. And, under the condition that the evaluation strategy comprises a plurality of scoring parameters, the different evaluation strategies have different category weights applied to at least one scoring parameter. In other words, different evaluation strategies for different snap-shot categories may each include different category weights corresponding to the same scoring parameter.
For convenience of explanation, it is assumed that the snapshot category determined by the electronic device according to the captured multiple frames of images is the first snapshot category. Depending on the first snapshot category, the category weights corresponding to the same scoring parameter in the same snapshot mode also differ.
For example, the score of an image to be evaluated may be determined by the formula

$$G = \sum_{i=1}^{I} \beta_i \alpha_i T_i,$$

where $\beta_i$ denotes the category weight of the $i$-th scoring parameter, $\beta_i \ge 0$. The meanings of $T_i$ and $\alpha_i$ have been explained above and are not repeated here for brevity.
In the same snapshot mode, the category weights may be different for the same scoring parameter even though the mode weights are the same. For example, the first capturing mode is a motion capturing mode. The corresponding scoring parameters in the motion capture mode may include: attitude height, attitude extension, etc. The sport snapping mode can comprise various motion categories of shooting, diving, swimming, running and the like.
When the first action category is shooting, the category weight of the posture height is higher than that of the posture extension degree; within the posture extension degree, the category weights of the elbow joint angle and the arm bending angle are higher than those of the other angles (such as the knee joint angle and the ankle joint angle). When the first action category is diving, the category weight of the knee joint angle is higher than that of the posture height, and the category weight of the posture height is higher than that of the arm bending angle. When the first action category is swimming, the category weight of the leg bending angle is higher than that of the arm bending angle, and the category weight of the arm bending angle is higher than that of the posture height. When the first action category is the default category, it can be assumed that the larger the joint angles, the more vigorous the human motion, and that moment can be considered a highlight; the category weights of the leg bending angle and the arm bending angle are therefore higher than that of the posture height. It should be understood that these examples are for ease of understanding only and are not intended to limit the present application in any way. The present application does not limit the assignment of the category weight of each scoring parameter under each snapshot category.
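The orderings above can be sketched as a category-weight table; all numeric values here are invented purely to respect the stated orderings, and the scoring function applies the formula $G = \sum_i \beta_i \alpha_i T_i$.

```python
# Hypothetical category weights (beta_i) for the motion snapshot mode; the
# numbers only encode the orderings described above and are not specified
# by this application.
CATEGORY_WEIGHTS = {
    "shooting": {"pose_height": 0.9, "elbow_angle": 0.8, "arm_bend": 0.8,
                 "knee_angle": 0.2, "ankle_angle": 0.2},
    "diving":   {"knee_angle": 0.9, "pose_height": 0.6, "arm_bend": 0.3},
    "swimming": {"leg_bend": 0.9, "arm_bend": 0.6, "pose_height": 0.3},
    "default":  {"leg_bend": 0.8, "arm_bend": 0.8, "pose_height": 0.4},
}

def score_frame_by_category(values, mode_weights, category):
    """G = sum_i(beta_i * alpha_i * T_i); parameters without an explicit
    category weight default to beta = 1."""
    beta = CATEGORY_WEIGHTS.get(category, CATEGORY_WEIGHTS["default"])
    return sum(beta.get(p, 1.0) * alpha * values.get(p, 0.0)
               for p, alpha in mode_weights.items())
```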
Optionally, the step 210 may further include: a first snapshot category in a first snapshot mode is determined from a captured multi-frame image. The step 220 may further include: and calculating the grade of each frame of image to be evaluated in the plurality of frames of images to be evaluated by using one or more grading parameters corresponding to the first snapshot mode and the mode weight of each grading parameter and the class weight of each grading parameter corresponding to the first snapshot class.
That is, the electronic device may further determine the first snapshot category by calling a motion detection model or a face attribute detection model, in addition to determining the first snapshot mode. The following exemplifies a specific process of determining a first capturing category based on the first capturing mode and scoring multiple frames of images to be evaluated according to an evaluation strategy corresponding to the first capturing category.
First, the electronic device may determine a first snap shot category based on a first snap shot mode.
For example, if the first capturing mode is a motion capturing mode, the electronic device may invoke at least a pose estimation model and a motion detection model to perform image recognition on the image to be selected. As described above, the motion detection model may perform the identification of the motion category based on the coordinate information of the plurality of pose points in each frame of the image to be evaluated input by the pose estimation model. The action detection model can construct coordinate changes of all the attitude points in the motion process based on the coordinate information of the plurality of attitude points in each frame of image to be evaluated, and determines a first action category.
Since the specific process of determining the motion category by the motion detection model has been described in step 210, it is not described herein again for brevity. It should be noted that, if the motion type determined by the motion detection model according to the coordinate information of the pose point in the image to be evaluated does not belong to the predefined motion type, the motion type may be determined as a default type, or called as a default type. In the embodiment of the present application, when the motion detection model does not detect a specific motion category, the motion category may be classified as a default category. The motion detection model may output the motion category of the photographic subject as a default category, more specifically, a default category in the motion capture mode.
If the first snapshot mode is an expression snapshot mode, the electronic equipment can at least call the face attribute detection model to perform image recognition on the image to be selected. As previously described, the facial attribute detection model may determine the expression category based on an analysis of feature points of each frame of the image to be reviewed.
Since the specific process of determining the expression category by the face attribute detection model has been described in step 210, details are not repeated here for brevity. It should be noted that, if the expression category determined by the face attribute detection model according to the feature points in the image to be evaluated does not belong to any predefined expression category, the expression category may be determined as a default category. In this embodiment of the application, when the face attribute detection model does not detect a specific expression category, the expression category may be classified as the default category, and the model may output the expression category of the photographed subject as the default category, more specifically, the default category in the expression snapshot mode. The manner in which the electronic device determines the values of the scoring parameters has been described in detail with examples above and, for brevity, is not repeated here.
After determining the first snapshot category in the first snapshot mode, the electronic device may substitute the respective scoring parameters using the corresponding evaluation policy to determine a score for each frame of the image to be evaluated. In this embodiment, as the first snapshot mode is further refined, when the electronic device weights the values of the scoring parameters, the electronic device may further apply a category weight to each scoring parameter in combination with the attention of the snapshot category to different details to calculate a score more suitable for the first snapshot category in the first snapshot mode.
For example, if the first capturing mode is a motion capturing mode and the first capturing category is shooting, the elbow joint angle and the arm bending angle may be weighted higher than the other scoring parameters (such as knee joint angle and ankle joint angle), and the posture height may be weighted higher than the posture extension degree.
In this embodiment, the electronic device may, using the evaluation policy corresponding to the first snapshot category in the first snapshot mode, substitute the values of the scoring parameters previously determined for each frame of the image to be evaluated to calculate its score, for example, by the formula listed above, $G = \sum_{i=1}^{I} \beta_i \alpha_i T_i$. The meaning of the parameters in the formula has already been explained above and, for brevity, is not repeated here. When the values of the scoring parameters are at different magnitudes, they may also be normalized to the same magnitude.
It should be understood that the formulas for calculating scores listed herein are only examples and should not be construed as limiting the application in any way. The application does not limit the possibility of calculating the score of the image to be selected by other calculation methods.
After determining the score of each frame of image to be evaluated based on the above formula, the electronic device may determine a snapshot frame image corresponding to the first snapshot mode according to the score of each frame of image to be evaluated. Here, the snap-shot frame image corresponding to the first snap-shot mode may be, for example, one frame image having the highest score determined based on the scores of a plurality of frames of images to be evaluated. In other words, in the plurality of frames of images to be evaluated, the score of the snap-shot frame image corresponding to the first snap-shot mode is higher than the score of any one frame of images to be evaluated except the snap-shot frame image.
After determining the snapshot frame image corresponding to the first snapshot mode, the electronic device may store the snapshot frame image, or output it to a display unit or the like of the electronic device. This is not a limitation of the present application.

It should be noted that, in the same snapshot mode, the category weights of some scoring parameters may also be defined as zero or close to zero for different snapshot categories. When the category weight of a certain scoring parameter is defined as zero or close to zero, it can also be understood that the scoring parameter is not included in the scoring parameters corresponding to that action category. From this point of view, the scoring parameters included in the plurality of evaluation strategies corresponding to the same snapshot mode are not necessarily the same.

It should be understood that the above-listed examples for determining the score of each frame of the image to be evaluated are only for ease of understanding and should not constitute any limitation to the present application. Those skilled in the art can change or replace the above steps based on the same concept to achieve the same effect. The specific process by which the electronic device calculates the score of each frame of image to be evaluated is not limited in this application.
As mentioned above, the multiple frames of images to be evaluated are the first N frames of images, the last N frames of images, or the front and back N frames of images at the moment when the shutter is pressed by the electronic device, or one or more frames of images with scores exceeding a preset threshold in the recording process. These images are stored in the electronic device and may also be sent to a display unit for presentation to the user. This is not a limitation of the present application.
In one implementation, each frame of image may correspond to a timestamp. The electronic device, after determining the snap-shot frame image corresponding to the first snap-shot mode, may search for and acquire an image matching a timestamp of the snap-shot frame image from the camera module. The camera module can push the image to the display unit after encoding the image, and the image is presented to a user. Meanwhile, the image captured based on the photographing operation of the user can be stored in the user album after being encoded. For the convenience of user distinction, the snap frame image and/or the frame image actually photographed by the user may be distinguished by marks, for example, a "best moment" or other similar mark is left on the snap frame image, or different marks are left on the two frame images to indicate distinction.
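A minimal sketch of the timestamp matching follows, assuming the camera module keeps a buffer keyed by capture timestamp; the buffer layout and tolerance are assumptions for illustration.

```python
def fetch_by_timestamp(buffer, snapshot_ts, tolerance_ms=1):
    """Find the buffered image whose timestamp matches the chosen snapshot
    frame; `buffer` maps capture timestamp (ms) to a raw image.
    """
    for ts, image in buffer.items():
        if abs(ts - snapshot_ts) <= tolerance_ms:
            return image  # to be encoded and pushed to the display unit
    return None
```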
In fact, the electronic device may also pre-process the captured multi-frame image before determining the first snap-shot mode from the captured multi-frame image, so as to obtain a more accurate estimation result. That is, the image input to the one or more models may be an original image or an image after being subjected to preprocessing, and the present application is not limited thereto.
Optionally, the method 200 further comprises: performing image preprocessing on the captured multiple frames of images. Specifically, an image processing module in the electronic device, such as an ISP, may perform the preprocessing. Image preprocessing may include, for example, cropping or resizing each frame to match the input size of the pose estimation model; it may also include, for example, mean subtraction, normalization, and data augmentation (such as rotation). The present application does not limit the specific content or implementation of the image preprocessing. The preprocessed images can conform to the input dimensions of the aforementioned models, while the diversity of the data can be enhanced to prevent over-fitting of the models.
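A sketch of such preprocessing, assuming OpenCV and NumPy are available and using an illustrative model input size; neither the size nor the normalization statistics are fixed by this application.

```python
import cv2          # OpenCV, assumed available
import numpy as np

def preprocess(frame: np.ndarray, input_size=(256, 256)) -> np.ndarray:
    """Resize to the model input size, then apply zero-mean, unit-variance
    normalization; augmentation such as rotation could be added here.
    """
    resized = cv2.resize(frame, input_size)
    x = resized.astype(np.float32) / 255.0      # scale to [0, 1]
    return (x - x.mean()) / (x.std() + 1e-6)    # mean subtraction, normalize
```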
Based on the technical scheme, the snapshot mode can be determined according to the actual shooting scene by presetting a plurality of different snapshot modes and corresponding evaluation strategies. And may select one evaluation policy corresponding to the first snap-shot mode from among a plurality of preset evaluation policies and determine a snap-shot frame image using the evaluation policy. For example, scoring parameters such as pose extension and pose height are introduced in a motion snapshot mode and a multi-person motion snapshot mode, scoring parameters such as expression intensity, eye opening and closing, facial shielding and human face angles are introduced in an expression snapshot mode and a multi-person close-up mode, so that the determined snapshot frame image corresponding to the first snapshot mode can be selected based on a corresponding evaluation strategy, and higher mode weight is applied to different scoring parameters concerned by different snapshot modes, so that the snapshot image is more consistent with an actual shooting scene, an ideal snapshot effect is favorably obtained, and flexibility is improved and is suitable for more scenes.
In addition, the technical solution provided by the present application can further determine the weight of each scoring parameter based on different snapshot categories. For example, since different motion types focus on different points, the scoring parameters corresponding to the motion capture mode are given different category weights for different motion types. In the motion capture mode, scoring the multiple frames of images to be evaluated based on the category weight of each scoring parameter corresponding to the motion category is conducive to obtaining an ideal snapshot frame image. Compared with recommending snapshot frame images based on optical flow information, the method provided by the present application focuses more on the action itself and can therefore achieve a better capture effect. Furthermore, the electronic device continuously detects the captured multiple frames of images; as the captured images change over the camera's running time, the electronic device may switch to another snapshot mode upon detecting that the images satisfy the trigger condition of a snapshot mode different from the first.
Optionally, the method further comprises: determining, based on a plurality of newly captured images, a second capturing mode that is different from the first capturing mode among the plurality of preset capturing modes; and switching to the second capturing mode. To avoid a ping-pong effect caused by frequent switching between the snapshot modes, a guard period may be set for each snapshot mode. The duration of the guard period may be a predefined value; the guard periods of the snapshot modes may be of the same or different durations, which is not limited in the present application. Accordingly, before switching to the second capturing mode, the electronic device may judge whether the running time of the first capturing mode exceeds the preset guard period. If the running time of the first snapshot mode is still within the guard period, the mode is not switched and the first snapshot mode keeps running, regardless of whether detection on the newly captured images satisfies the trigger condition of the second snapshot mode. If, however, the running time of the first capturing mode exceeds the guard period, switching to the second capturing mode can be performed based on the detection of the newly captured images.
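The guard-period check might look like the following sketch; the duration is a hypothetical placeholder for the predefined value mentioned above.

```python
import time

GUARD_PERIOD_S = 3.0  # hypothetical; the application only says it is predefined

class SnapModeController:
    def __init__(self, first_mode):
        self.mode = first_mode
        self.entered_at = time.monotonic()

    def try_switch(self, second_mode):
        """Switch modes only after the current mode has run past its guard
        period, avoiding ping-pong between snapshot modes."""
        if time.monotonic() - self.entered_at < GUARD_PERIOD_S:
            return False                      # keep running the first mode
        self.mode = second_mode
        self.entered_at = time.monotonic()
        return True
```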
When a user uses the electronic device to take a picture, the camera in the electronic device can be turned on first and the shooting mode selected manually. Fig. 3 shows an example of a mobile phone interface. Fig. 3 (a) shows the interface content 301 displayed by the screen display system of the mobile phone in the unlocked state. The interface content 301 may include a plurality of icons corresponding to a plurality of applications (apps), such as Alipay, Weibo, Album, Camera, and WeChat, to name but a few.
If the user wishes to take a picture or record a video via the mobile phone, the camera application may first be launched by clicking the "camera" icon 302 on the user interface. The interface displayed by the mobile phone after the camera application is started may be as shown in (b) of fig. 3, and may be referred to as the shooting interface of the camera. The shooting interface may include a viewfinder 303, an album icon 304, a shooting control 305, and the like. The viewfinder 303 is used to capture images for shooting preview and can display the preview image in real time. It should be understood that the preview images described above are not necessarily saved in the album; in this embodiment of the present application, the preview images may be stored in a cache of the mobile phone or in another storage unit, which is not limited in this application.
The album icon 304 is used to quickly enter the album. After the mobile phone detects that the user clicks the album icon, the photos or videos already taken can be displayed on the screen. The shooting control 305 may be used to capture a photo or a video. If the camera is in the photographing mode, when the mobile phone detects that the user clicks the shooting control, the mobile phone executes the photographing operation and stores the photographed picture, i.e., an image from the photographing stream described above. If the camera is in the video recording mode, when the mobile phone detects that the user clicks the shooting control, the mobile phone starts video shooting; when the mobile phone detects that the user clicks the shooting control again, the video shooting is finished. The mobile phone can store the recorded video. In one implementation, the video may be stored as successive frames of images, i.e., the video stream described above.
Further, the shooting interface may include a function control 306 for setting the shooting mode, such as a portrait mode, a photographing mode, a video recording mode, and a panorama mode, as illustrated in (b) of fig. 3. The user can switch the shooting mode by clicking the function control. Optionally, the shooting interface may further include a camera rotation control 307, as shown in fig. 3 (b). The camera rotation control 307 may be used to switch between the front camera and the rear camera.
It should be understood that fig. 3 is provided only for ease of understanding, and describes in detail, taking a mobile phone as an example, the process by which a user opens the photographing function or other functions through operations. However, the mobile phone interface shown in fig. 3 is only an example and should not limit the present application in any way. Mobile phones with different operating systems or from different brands may present different interfaces. Moreover, the embodiments of this application can also be applied to electronic devices other than mobile phones that can be used for photographing. The interfaces shown in the drawings are only examples and should not be construed as limiting the application in any way.
For a better understanding of the methods provided herein, reference will now be made in detail to several specific examples. Fig. 4 is a schematic flowchart of a method for capturing an image according to another embodiment of the present application. Fig. 4 illustrates a method in which a user may manually adjust a photographing mode to a smart snap mode. In response to a user's operation, the electronic device enters a smart snap mode. In other words, the embodiment shown in fig. 4 mainly describes a method for the electronic device to capture an image in the smart snap mode.
It should be understood that the method illustrated in fig. 4 may be performed by an electronic device or a processor in an electronic device. In step 401, the captured multi-frame images are periodically detected at a high frame rate. Specifically, the electronic device may continuously acquire multiple frames of captured images from the cache, and call the detection models (such as a face attribute detection model, a pose estimation model, and a motion detection model) to periodically detect the captured images to determine a first snapshot mode suitable for the current shooting. In the smart snapshot mode, the electronic device may detect at a higher frame rate, such as 30 frames per second. The detection models may be run alternately to reduce power consumption. It should be appreciated that this step 401 may continue throughout the entire flow of steps 401 to 410, until the smart snapshot mode is exited. It should also be understood that the detection of multiple frames of images by the detection models called by the electronic device is still performed by the electronic device; therefore, in this embodiment, for brevity, the process of calling the detection models to detect images is not described separately.
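As one possible reading of "run alternately", each incoming frame could be handed to only one detection model in rotation. The sketch below assumes stub models exposing a detect(frame) method; the class names and interface are invented for illustration:

    from itertools import cycle

    class FaceAttributeModel:
        def detect(self, frame): return {"faces": []}        # stub result

    class PoseEstimationModel:
        def detect(self, frame): return {"pose_points": []}  # stub result

    class MotionDetectionModel:
        def detect(self, frame): return {"motion": None}     # stub result

    def periodic_detection(frames):
        """Alternate the detection models over the frame stream to cut power use."""
        models = cycle([FaceAttributeModel(), PoseEstimationModel(), MotionDetectionModel()])
        for frame in frames:      # frames arrive at ~30 fps in the smart snapshot mode
            model = next(models)  # only one model runs per frame
            yield model.detect(frame)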
In step 402, it is determined that a trigger condition of the first snap-shot mode is satisfied. Specifically, the electronic device may determine that a trigger condition of a certain snapshot mode is satisfied according to detection of multiple frames of images. For example, the trigger condition of the first snapshot mode is satisfied. At this point, step 403 may be performed to enable the first capture mode. When the image does not satisfy the trigger condition of any one of the capturing modes, the step 401 may be continuously executed to perform high frame rate periodic detection on the captured multi-frame images.
Different trigger conditions for the different snapshot modes have already been listed in the above method 200, where the determination of whether the trigger condition of a certain snapshot mode is satisfied is described in detail in conjunction with the face attribute detection model, the pose estimation model, and the motion detection model. For brevity, details are not repeated here. It should be noted that which snapshot mode the electronic device enables has no influence on the captured images themselves. The electronic device simply evaluates and recommends images using the evaluation strategy corresponding to the enabled snapshot mode.
In step 404, the electronic device may keep running in the first snapshot mode and perform high frame rate periodic detection of newly captured images. The electronic device may keep running in the first snapshot mode until it determines, from the detection of newly captured images, that a trigger condition of another snapshot mode (e.g., denoted as a second snapshot mode) is satisfied. It should be noted that step 404 is provided for convenience of describing the following embodiments and does not indicate that the electronic device performs a new operation. After the first snapshot mode is enabled in step 403, the electronic device may remain in the first snapshot mode and continue to periodically detect newly captured images at a high frame rate. It should further be noted that, after enabling the first snapshot mode, the electronic device may run it in the background without prompting the user through the shooting interface, so that the user is unaware of it; the first snapshot mode may also run in the foreground, where the user can perceive it through the shooting interface. This is not limited in the present application.
In step 405, it is determined whether the trigger condition of the second snapshot mode is satisfied. As the camera continues to run, the electronic device may continue to detect newly captured images. If it is not detected that the newly captured multi-frame images satisfy the trigger condition of another snapshot mode (e.g., the second snapshot mode), i.e., it is determined that the trigger condition of the second snapshot mode is not satisfied, step 404 may be executed to remain in the first snapshot mode while continuously performing high frame rate periodic detection on newly captured images.
If it is detected that the newly captured multi-frame images meet the trigger condition of the second snapshot mode, step 406 may be executed to determine whether the running duration of the first snapshot mode exceeds the preset protection period. If the running duration of the first snapshot mode does not exceed the preset protection period, step 404 may be executed. If the running duration of the first snapshot mode exceeds the preset protection period, step 407 may be executed to enable the second snapshot mode. That is, the electronic device switches to the second snapshot mode. After the second snapshot mode is enabled, the electronic device keeps running in the second snapshot mode until it determines, from newly captured images, that a trigger condition of another snapshot mode (e.g., denoted as a third snapshot mode) is satisfied. Meanwhile, high frame rate periodic detection of newly captured images is maintained. For simplicity, the steps of the electronic device determining that the trigger condition of the third snapshot mode is satisfied are not shown. However, it can be understood that, when the trigger condition of the third snapshot mode is satisfied, the operations performed by the electronic device may be similar to those shown in the figure for the case where the trigger condition of the second snapshot mode is satisfied, and for brevity, the description is not repeated here.
It should be understood that the first and second snapshot modes are different snapshot modes, and the third and second snapshot modes are different snapshot modes, but the first and third snapshot modes may be the same snapshot mode or different snapshot modes, which is not limited in this application. Regardless of whether it switches to the second snapshot mode, the electronic device running in the smart snapshot mode continuously detects captured images at a high frame rate.
If the electronic device detects a photographing operation of the user, for example, the user clicks the photographing control to perform the photographing operation, step 408 may be executed to photograph and store the image in response to the photographing operation of the user. The saved image can be presented to the user through the display unit after being subsequently processed by encoding and the like. If the electronic device does not detect the photographing operation of the user, the electronic device can continue to operate in the intelligent capturing mode, and continuously perform high-frame-rate periodic detection on more newly captured images. It is to be understood that the photographing operation may be an operation performed in the first capturing mode or an operation performed in the second capturing mode depending on whether the electronic apparatus is switched to the second capturing mode before the photographing operation is performed. This is not a limitation of the present application.
In step 409, the multiple frames of images to be evaluated are scored using the corresponding evaluation strategy based on the currently running snapshot mode. Specifically, based on the photographing operation of the user, the electronic device may score the N frames of images before, the N frames of images after, or the N frames before and after the instant the shutter is pressed. The value of N may be predefined; the specific value of N is not limited in the present application. For example, if N is 20, the 20 frames before and the 20 frames after the shutter instant (i.e., 40 frames in total) may be scored.
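A minimal sketch of selecting that window from a frame cache, assuming frames are buffered with timestamps; the FrameCache name, capacity, and interface are illustrative assumptions:

    from collections import deque

    N = 20  # example window size; the patent leaves the exact value predefined

    class FrameCache:
        """Ring buffer of (timestamp, frame) pairs kept by the camera pipeline."""
        def __init__(self, capacity=120):
            self.buf = deque(maxlen=capacity)

        def push(self, ts, frame):
            self.buf.append((ts, frame))

        def window_around(self, shutter_ts):
            """Return up to N frames before and N frames after the shutter instant."""
            before = [f for ts, f in self.buf if ts <= shutter_ts][-N:]
            after = [f for ts, f in self.buf if ts > shutter_ts][:N]
            return before + after  # up to 2*N candidate frames to be scored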
The electronic device can call the corresponding detection models to perform image recognition on the multiple frames of images according to the currently running snapshot mode, and output the recognition results. The electronic device can then score each frame of image to be evaluated according to the detection results. Optionally, the detection models related to the motion snapshot mode and the multi-person motion snapshot mode may include, for example, a pose estimation model and a motion detection model. Optionally, the detection models related to the expression snapshot mode and the group photo snapshot mode may include, for example, a face attribute detection model. The face attribute detection model may include, for example but without limitation, a face feature point detection model, an open-closed eye model, and the like. This is not limited in the present application.
Assuming that the currently running snapshot mode is the motion snapshot mode, the electronic device may call the pose estimation model and the motion detection model to perform image recognition on the captured images to be evaluated, so as to obtain the coordinate information of the pose points and the motion category. The electronic device can determine an evaluation strategy according to the motion snapshot mode and the motion category, and use the evaluation strategy to score each captured frame of image to be evaluated.
Assuming that the currently running snapshot mode is the expression snapshot mode, the electronic device may invoke the face attribute detection model to perform image recognition on the captured images to be evaluated, so as to obtain recognition results for one or more of expression intensity, open and closed eyes, facial occlusion, and facial angle. The recognition results may include the expression category and associated parameters that can be used to characterize one or more of expression intensity, open and closed eyes, facial occlusion, facial angle, and the like. The electronic device can determine an evaluation strategy according to the expression snapshot mode and the expression category, and use the evaluation strategy to score each captured image to be evaluated.
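As a hedged illustration of such an evaluation strategy, the sketch below combines a mode weight and a category weight per scoring parameter and picks the best-scoring candidate; all parameter names and weight values are invented for the example, and the candidate's parameter keys are assumed to come from the configured parameter set:

    # Invented weights for a motion snapshot mode / "jump" category example.
    MODE_WEIGHTS = {"pose_height": 0.4, "pose_extension": 0.4, "body_occlusion": 0.2}
    CATEGORY_WEIGHTS = {"pose_height": 1.5, "pose_extension": 1.0, "body_occlusion": 0.8}

    def score_frame(params):
        """Weighted sum: mode weight x category weight x recognized parameter value."""
        return sum(MODE_WEIGHTS[k] * CATEGORY_WEIGHTS.get(k, 1.0) * v
                   for k, v in params.items())

    def best_frame(candidates):
        """Pick the highest-scoring candidate as the snapshot frame image."""
        return max(candidates, key=lambda c: score_frame(c["params"]))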
In step 410, the snapshot frame image corresponding to the currently running snapshot mode is determined. The electronic device can determine the snapshot frame image corresponding to the currently running snapshot mode according to the score of each frame of image to be evaluated. The electronic device can further acquire the image matching the timestamp from the cache, process it, and present it to the user through the display unit.
After that, the electronic device can clear and release the cache space occupied during scoring. Afterwards, the user may exit the smart snapshot mode by manual adjustment, or may directly exit the camera function. The electronic device exits the smart snapshot mode in response to the user's operation.
Optionally, the electronic device exiting the smart snap mode may still remain in the photo mode, and the images may be periodically detected at a low frame rate. Alternatively, the electronic device exiting the camera function may stop acquiring images and stop detecting. The detection models may also be shut down.
It should be understood that the above only illustrates the first and second snapshot modes for ease of understanding, but this should not constitute any limitation on the present application. As the camera running duration increases, the electronic device can continuously acquire and continuously detect newly captured images. Therefore, as long as the electronic device does not exit the camera function, some or all of the above steps 404 to 410 may be performed in a loop. Note that after the first snapshot mode is switched to the second snapshot mode, the second snapshot mode becomes the new first snapshot mode.
It should be further understood that fig. 4 illustrates an example of the application of the method for capturing images provided by the embodiment of the present application to a specific scene. The steps in the figures are shown only for ease of understanding. It is not necessary that every step in the flowchart be performed, for example, some steps may be skipped or some steps may be combined. The order of execution of the steps is not fixed or limited to that shown in fig. 4. The order of execution of the various steps should be determined by their function and inherent logic.
Fig. 5 is a schematic flowchart of a method for capturing an image according to another embodiment of the present application. Fig. 5 illustrates a method in which the user does not need to turn on the smart snapshot mode manually; for example, the shooting mode may be set to the photographing mode. Based on periodic detection, the electronic device can automatically start the smart snapshot mode. In other words, the embodiment shown in fig. 5 mainly describes a method for the electronic device to capture an image in the photographing mode.
It should be understood that the method illustrated in fig. 5 may be performed by an electronic device or a processor in an electronic device. In step 501, the captured images are periodically detected at a low frame rate. Specifically, the electronic device may continuously obtain multiple frames of captured images from the cache, and call the detection models (e.g., the face attribute detection model, the body frame detection model, and the scene recognition model) to perform periodic detection to determine whether a trigger condition for entering a certain snapshot mode is satisfied. In the photographing mode, the electronic device may detect at a lower frame rate, such as 15 frames per second, to save power. It is to be understood that step 501 may continue until the smart snapshot mode is entered in step 503, and may resume after the smart snapshot mode is exited, as after step 509. It should also be understood that the detection of multiple frames of images by the detection models called by the electronic device is still performed by the electronic device; therefore, in this embodiment, for brevity, the process of calling the detection models to detect images is not described separately.
In step 502, it is determined that a trigger condition of the first snap-shot mode is satisfied. Specifically, the electronic device may determine that a trigger condition of a certain snapshot mode is satisfied according to detection of multiple frames of images. For example, the trigger condition of the first snapshot mode is satisfied. In the above method 200, different triggering conditions have been listed for different capturing modes, and the triggering conditions for determining whether a certain capturing mode is satisfied are described in detail in combination with the face attribute detection model, the body frame detection model, and the scene recognition model. For brevity, no further description is provided herein.
If it is determined that the detected multi-frame images do not satisfy the trigger condition of any snapshot mode, the electronic device may continue operating in the photographing mode, that is, continue performing step 501. If it is determined that the detected multi-frame images meet the trigger condition of the first snapshot mode, step 503 may be executed: the smart snapshot mode is entered and the first snapshot mode is enabled. Enabling the first snapshot mode means that the electronic device switches to the smart snapshot mode; thus, entering the smart snapshot mode and enabling the first snapshot mode refer to the same operation. Furthermore, in step 503 the electronic device may also perform high frame rate periodic detection of newly captured images. In other words, the electronic device switches the detection of images from the low frame rate to the high frame rate. For example, the electronic device may detect images at a frame rate of 30 frames per second. It should be appreciated that this high frame rate periodic detection of newly captured images may continue until the electronic device exits the smart snapshot mode.
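The frame rate switch can be kept in one small piece of state. A sketch, assuming the 15/30 fps figures used as examples in this text; the class name is illustrative:

    LOW_FPS, HIGH_FPS = 15, 30  # example detection frame rates from this embodiment

    class DetectionScheduler:
        """Tracks which detection frame rate applies, per the mode transitions above."""
        def __init__(self):
            self.fps = LOW_FPS       # photographing mode: low frame rate detection

        def enter_smart_snap(self):
            self.fps = HIGH_FPS      # smart snapshot mode: high frame rate detection

        def exit_smart_snap(self):
            self.fps = LOW_FPS       # back to photographing-mode detection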
In step 504, the electronic device remains operating in the first snap-shot mode. The electronic device may remain operating in the first snap-shot mode until it is determined from the detection of a newly captured image that a trigger condition for another snap-shot mode (e.g., noted as a second snap-shot mode) is satisfied. It should be noted that step 504 is provided for convenience of describing the following embodiments, and does not indicate that the electronic device performs a new operation. After the first capture mode is enabled in step 503, the electronic device may remain in the first capture mode and continue to periodically detect newly captured images at a high frame rate.
In step 505, it is determined whether the trigger condition of the second snapshot mode is satisfied. As the camera continues to run, the electronic device may continue to detect newly captured images. If it is not detected that the newly captured images satisfy the trigger condition of another snapshot mode (e.g., the second snapshot mode), i.e., it is determined that the trigger condition of the second snapshot mode is not satisfied, step 504 may be performed to remain in the first snapshot mode while continuously performing high frame rate periodic detection on the images. If it is detected that the newly captured images satisfy the trigger condition of the second snapshot mode, whether to switch to the second snapshot mode may be considered.
In order to avoid ping-pong effect caused by frequent switching of the electronic device between the plurality of snapshot modes, a guard period may be set for each of the snapshot modes. The duration of the guard period may be a predefined value and step 506 is performed to determine if the duration of operation of the first snap-shot mode exceeds the guard period. If the running time of the first capturing mode does not exceed the protection period, that is, the electronic device detects that the image meets the triggering condition of the second capturing mode in the protection period, step 504 may be executed to keep in the first capturing mode. The electronic device remaining in the first snap-shot mode may still continue to periodically detect images at a high frame rate.
If the running duration of the first snapshot mode exceeds the protection period, that is, the electronic device detects outside the protection period that the newly captured images meet the trigger condition of the second snapshot mode, step 507 may be executed to enable the second snapshot mode, i.e., to switch from the first snapshot mode to the second snapshot mode. After enabling the second snapshot mode, the electronic device may also continuously perform high frame rate periodic detection of newly captured images. The electronic device may keep running in the second snapshot mode after it is enabled, until it determines from newly captured images that a trigger condition of another snapshot mode (e.g., denoted as a third snapshot mode) is satisfied. For simplicity, the steps of determining that the trigger condition of the third snapshot mode is satisfied are not shown in the figure. However, it can be understood that, when the trigger condition of the third snapshot mode is satisfied, the operations performed by the electronic device may be similar to those shown in the figure for the case where the trigger condition of the second snapshot mode is satisfied, and for brevity, the description is not repeated here. Regardless of whether it switches to the second snapshot mode, the electronic device continuously detects captured images at a high frame rate as long as it runs in the smart snapshot mode.
In step 508, it is determined whether a photographing operation is detected within a preset period. Since the detection of the newly captured image in the smart snap mode is a high frame rate detection, power consumption is large. In order to reduce power consumption, the electronic device can automatically exit the intelligent snapshot mode when the photographing operation is not detected for a long time. The electronic device exiting the smart snap mode may revert to the photo mode. The user may not perceive that the electronic device is exiting the smart snap mode.
Specifically, if no photographing operation is detected within a preset time period after the first snapshot mode or the second snapshot mode is enabled, step 509 may be executed to exit the smart snapshot mode and return to the photographing mode. Step 501 and the following steps may then be performed repeatedly until the user exits the camera function. The duration of the preset time period may be a predefined value. For example, timing may start when the electronic device enables the first snapshot mode in step 503 or the second snapshot mode in step 507: a timer is started whose running duration is the preset time period. If no photographing operation of the user is detected within the preset time period, for example, the timer expires, step 509 may be executed to exit the smart snapshot mode and return to the photographing mode. If the timer has not expired, the smart snapshot mode keeps running. If a photographing operation of the user is detected within the preset time period, step 510 may be executed to photograph and store the image in response to the photographing operation of the user.
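A minimal sketch of that inactivity timer, assuming a monotonic clock; PRESET_PERIOD_S is an invented value, since the patent only says the duration is predefined:

    import time

    PRESET_PERIOD_S = 30.0  # assumed inactivity window

    class SmartSnapTimer:
        def __init__(self):
            self.deadline = None

        def arm(self):
            """Start (or restart) timing when a snapshot mode is enabled."""
            self.deadline = time.monotonic() + PRESET_PERIOD_S

        def expired(self):
            """True once no photographing operation arrived within the preset period."""
            return self.deadline is not None and time.monotonic() >= self.deadline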
In step 511, based on the currently running snapshot mode, the multiple frames of images to be evaluated are scored using the corresponding evaluation strategy. In step 512, the snapshot frame image corresponding to the currently running snapshot mode is determined. The specific process of steps 510 to 512 is the same as that of steps 408 to 410 in the above embodiment; since steps 408 to 410 have already been described in detail above, it is not repeated here for brevity. Thereafter, if the user performs an operation to exit the camera, the electronic device may exit the camera function in response to the user's operation. After exiting the camera function, the electronic device stops acquiring images and stops detection. The detection models may also be shut down.
It should be understood that the duration of the guard period in step 506 may be the same as or different from the duration of the preset period in step 508, and this application does not limit this. If the two are the same, one timer can be shared; if the two are different, separate timers may be used. Of course, the timing by the timer is only one possible implementation manner, and should not constitute any limitation to the present application.
It should be understood that some steps and descriptions in fig. 5 may refer to the relevant descriptions in fig. 4, and are not repeated here for brevity. It should be further understood that fig. 5 illustrates an example of the application of the method for capturing images provided by the embodiment of the present application to a specific scene. The steps in the figures are shown only for ease of understanding. It is not necessary that every step in the flowchart be performed, for example, some steps may be skipped, for example, steps 505 to 507, or some steps may be combined, for example, steps 503 and 504. The order of execution of the steps is not fixed or limited to that shown in fig. 5. The order of execution of the various steps should be determined by their function and inherent logic.
Fig. 6 is a schematic flowchart of a method for capturing an image according to another embodiment of the present application. In the method shown in fig. 6, the user does not need to turn on the smart snapshot mode manually; for example, the shooting mode may be set to the video recording mode. Based on periodic detection, the electronic device can automatically start the smart snapshot mode. In other words, the embodiment shown in fig. 6 mainly describes a method for the electronic device to capture an image in the video recording mode.
It should be understood that the method illustrated in fig. 6 may be performed by an electronic device or a processor in an electronic device. In step 601, low frame rate periodic detection is performed on the captured multi-frame images. In step 602, it is determined that the trigger condition of the first snapshot mode is satisfied. If the electronic device determines, from the detection of the captured multi-frame images, that the trigger condition of the first snapshot mode is satisfied, step 603 may be executed to enable the first snapshot mode while recording the video. The first snapshot mode may run in the background; from the point of view of the shooting interface, the electronic device continues to record video. Enabling the first snapshot mode means that the smart snapshot mode is enabled on the electronic device. In other words, the video recording mode and the smart snapshot mode run in parallel. Further, the electronic device starts high frame rate periodic detection of newly captured images; that is, it switches the detection of images from the low frame rate to the high frame rate. For example, the electronic device may detect the preview images at a frame rate of 30 frames per second. The newly captured images may continue to be detected periodically at the high frame rate until the electronic device exits the smart snapshot mode.
It should also be understood that the detection of multiple frames of images by the electronic device calling the detection model is still performed by the electronic device, and therefore, in this embodiment, for brevity, no particular description is made on the process of calling the detection model to detect images. In step 604, the electronic device remains running the first snap shot mode in the background. The electronic device may keep running the first snap-shot mode in the background until it is determined from the detection of a newly captured image that a trigger condition for another snap-shot mode (e.g., noted as a second snap-shot mode) is satisfied. It should be noted that step 604 is provided for convenience of describing the following embodiments, and does not indicate that the electronic device performs a new operation. After the first capture mode is enabled in step 603, the electronic device may keep running the first capture mode in the background and continuously perform high frame rate periodic detection on the newly captured image.
In step 605, it is determined whether the trigger condition of the second snapshot mode is satisfied. As the camera continues to run, the electronic device may continue to detect newly captured images. If it is not detected that the newly captured images satisfy the trigger condition of another snapshot mode (e.g., the second snapshot mode), i.e., it is determined that the trigger condition of the second snapshot mode is not satisfied, step 604 may be executed to keep the first snapshot mode running in the background while continuously performing high frame rate periodic detection on newly captured images. If it is detected that the newly captured images satisfy the trigger condition of the second snapshot mode, whether to switch to the second snapshot mode may be considered.
In order to avoid a ping-pong effect caused by the electronic device frequently switching between the plurality of snapshot modes, a guard period may be set for each snapshot mode. The duration of the guard period may be a predefined value, and step 606 is performed to determine whether the running duration of the first snapshot mode exceeds the guard period. If the running duration of the first snapshot mode does not exceed the guard period, that is, the electronic device detects within the guard period that the newly captured images meet the trigger condition of the second snapshot mode, step 604 may be executed to remain in the first snapshot mode. The electronic device remaining in the first snapshot mode may continue to periodically detect newly captured images at a high frame rate. If the running duration of the first snapshot mode exceeds the guard period, that is, the electronic device detects outside the guard period that the images meet the trigger condition of the second snapshot mode, step 607 may be executed to enable the second snapshot mode, i.e., to switch from the first snapshot mode to the second snapshot mode. After the second snapshot mode is enabled, the electronic device may also continuously perform high frame rate periodic detection of newly captured images, and may keep running the second snapshot mode in the background until it determines from newly captured images that a trigger condition of another snapshot mode (e.g., denoted as a third snapshot mode) is satisfied. For simplicity, the steps of determining that the trigger condition of the third snapshot mode is satisfied are not shown in the figure. However, it can be understood that, when the trigger condition of the third snapshot mode is satisfied, the operations performed by the electronic device may be similar to those shown in the figure for the case where the trigger condition of the second snapshot mode is satisfied, and for brevity, the description is not repeated here. Regardless of whether the second snapshot mode is switched to, as long as the smart snapshot mode is still running in the background of the electronic device, the captured images can be continuously detected at a high frame rate.
In step 608, it is determined whether a photographing operation is detected within a preset time period. Since the detection of newly captured images in the smart snapshot mode is high frame rate detection, its power consumption is large. To reduce power consumption, the electronic device can automatically exit the smart snapshot mode when no photographing operation is detected for a long time. The electronic device that exits the smart snapshot mode may still continue recording, and the user may not perceive that the electronic device has exited the smart snapshot mode. If no photographing operation is detected within a preset time period after the first snapshot mode or the second snapshot mode is enabled, step 609 can be executed: the smart snapshot mode is exited and the video recording mode keeps running. Step 601 and the following steps may then be performed repeatedly until the user exits the camera function. If a photographing operation is detected within the preset time period after the first snapshot mode or the second snapshot mode is enabled, step 610 may be executed to photograph and save the image in response to the photographing operation of the user.
In step 611, based on the currently running snapshot mode, the multiple frames of images to be evaluated are scored using the corresponding evaluation strategy. In step 612, the snapshot frame image corresponding to the currently running snapshot mode is determined.
In another implementation, after the first capturing mode is enabled in step 603 or the second capturing mode is enabled in step 607, the electronic device may continuously perform image recognition and scoring on each frame of image, and recommend the frame of image to the user when the scoring result exceeds a preset threshold. When the image exceeding the preset threshold is more than one frame, the image with the highest score can be recommended to the user. In this case, steps 610 to 612 may not be performed. In this case, the image to be evaluated may refer to all images acquired by the electronic device during the video recording process.
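A minimal sketch of that threshold-based recommendation during recording; SCORE_THRESHOLD and the data layout are invented for illustration:

    SCORE_THRESHOLD = 0.8  # assumed threshold; the patent only says it is preset

    def recommend_during_recording(scored_frames):
        """From (score, frame) pairs, recommend the highest-scoring frame above threshold."""
        above = [(s, f) for s, f in scored_frames if s > SCORE_THRESHOLD]
        if not above:
            return None  # nothing worth recommending yet
        return max(above, key=lambda sf: sf[0])[1]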
It should be understood that the steps in fig. 6 are similar to the steps described above in conjunction with fig. 5, and the specific processes thereof may refer to the above description, and are not repeated here for brevity. It should be further understood that fig. 6 illustrates an example of the application of the method for capturing images provided by the embodiment of the present application to a specific scene. The steps in the figures are shown only for ease of understanding. It is not necessary that every step in the flowchart be performed, for example, some steps may be skipped, for example, steps 605 to 607, 610 to 612, or some steps may be combined, for example, steps 603 and 604. The order of execution of the steps is not fixed or limited to that shown in fig. 6. The order of execution of the various steps should be determined by their function and inherent logic.
In the above, the method for capturing an image provided in the embodiments of the present application has been described with reference to various possible scenes. It should be understood that these scenes do not constitute any limitation on the scenes to which this application is applicable. If the electronic device supports selecting a specific snapshot mode through manual adjustment, the process of scoring preview images and recommending an optimal frame based on the scoring parameters corresponding to snapshot modes such as the motion snapshot mode, the expression snapshot mode, the multi-person motion snapshot mode, and the close-up snapshot mode provided in the embodiments of this application can also be used independently.
For example, in the motion snapshot mode, the images to be evaluated are scored and recommended based on the scoring parameters and mode weights corresponding to the motion snapshot mode, such as the pose height and the pose extension degree. The category weight of each scoring parameter is further determined according to the motion category, so as to determine snapshot frame images matched to different motion categories. Because the evaluation pays more attention to motion details, the determined snapshot frame image is more likely to capture a highlight at the moment of motion, and this highlight image is recommended to the user. Therefore, the captured image better matches the shooting scene, and the snapshot effect is better.
Based on the technical scheme, by presetting a plurality of different snapshot modes and one or more corresponding evaluation strategies, the images to be evaluated can be scored by adopting different scoring parameters and mode weights according to the different snapshot modes. For example, scoring parameters such as gesture extension degree, gesture height and body shielding are introduced in a motion snapshot mode and a multi-person motion snapshot mode, and scoring parameters such as expression intensity, eye opening and closing, facial shielding and face angle are introduced in an expression snapshot mode and a multi-person close-up mode, so that snapshot frame images recommended to a user can be selected based on evaluation strategies corresponding to different snapshot modes.
In addition, the technical solution provided in this application can further determine the category weight of each scoring parameter based on different snapshot categories. For example, since different snapshot categories have different emphases, among the plurality of scoring parameters corresponding to a snapshot mode, the category weights configured for the same scoring parameter differ across snapshot categories. This is advantageous for obtaining an ideal snapshot frame image.
Particularly, in the motion snapshot mode, higher mode weight is given to scoring parameters such as the posture height and the posture extension degree for expressing motion details. And the category weights of the same scoring parameter configured for different action categories can be different according to different attention details of different action categories. Compared with the method for recommending the optimal frame based on the optical flow information, the method provided by the application focuses more on the action per se, so that a better motion snapshot effect can be obtained.
The method for capturing images according to the embodiment of the present application is described in detail with reference to fig. 2 to 6. Hereinafter, an apparatus for capturing an image according to an embodiment of the present application will be described in detail with reference to fig. 7.
Fig. 7 is a schematic block diagram of an apparatus 700 for capturing an image according to an embodiment of the present application. As shown in fig. 7, the apparatus 700 may include a mode determination unit 710 and a snap frame determination unit 720.
Specifically, the mode determination unit 710 is configured to determine a first snap-shot mode among a plurality of preset snap-shot modes based on a captured multi-frame image; the snap-shot frame determination unit 720 is configured to determine a snap-shot frame image corresponding to the first snap-shot mode from among the captured multiple frames of images to be evaluated, using the evaluation policy corresponding to the first snap-shot mode; the evaluation policy is one of a plurality of preset evaluation policies.
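As a structural sketch of apparatus 700, the two units can be modeled as cooperating objects; the Python classes, method names, and wiring below are illustrative assumptions, not the patent's implementation:

    class ModeDeterminationUnit:
        """Plays the role of mode determination unit 710."""
        def determine_mode(self, frames):
            raise NotImplementedError  # run detection models, check trigger conditions

    class SnapFrameDeterminationUnit:
        """Plays the role of snap frame determination unit 720."""
        def determine_snap_frame(self, mode, candidates):
            raise NotImplementedError  # apply the evaluation strategy for `mode`

    class Apparatus700:
        """Wires the two units together, as described for apparatus 700."""
        def __init__(self, mode_unit, frame_unit):
            self.mode_unit = mode_unit
            self.frame_unit = frame_unit

        def capture(self, frames, candidates):
            mode = self.mode_unit.determine_mode(frames)
            return self.frame_unit.determine_snap_frame(mode, candidates)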
Optionally, the plurality of snapshot modes include one or more of: an expression snapshot mode, a group photo snapshot mode, a motion snapshot mode, a multi-person motion snapshot mode, a pet snapshot mode, and a landscape snapshot mode. Optionally, each of the plurality of snapshot modes corresponds to at least one evaluation strategy, each evaluation strategy comprising one or more scoring parameters for image scoring and a mode weight for each scoring parameter.
Optionally, the snap-shot frame determination unit 720 is configured to calculate a score of each frame of the image to be evaluated in the multiple frames of images to be evaluated, using one or more scoring parameters in at least one evaluation policy of multiple preset evaluation policies corresponding to the first snap-shot mode and a mode weight of each scoring parameter; and the snapshot frame image corresponding to the first snapshot mode is determined in the multiple frames of images to be evaluated according to the multiple scores of the multiple frames of images to be evaluated.
Optionally, the snap frame image has the highest score among the plurality of images to be evaluated. Optionally, scoring parameters included in different evaluation strategies corresponding to different snapshot modes are the same, and mode weights included in the different evaluation strategies are different.
Optionally, each snapshot mode includes one or more snapshot categories, and each snapshot category corresponds to one evaluation policy; in at least one evaluation policy corresponding to the first capturing mode, each evaluation policy includes one or more scoring parameters corresponding to the first capturing mode and a mode weight of each scoring parameter and a category weight corresponding to one capturing category. Optionally, the mode determining unit 710 is further configured to determine a first capturing category in the first capturing mode according to the multi-frame image; the snap-shot frame determination unit 720 is configured to calculate a score of each frame of the image to be evaluated in the plurality of frames of images to be evaluated, using one or more scoring parameters corresponding to the first snap-shot mode and the mode weight of each scoring parameter, and the category weight of each scoring parameter corresponding to the first snap-shot category. Optionally, scoring parameters included in different evaluation strategies corresponding to different snapshot categories are the same, and category weights included in the different evaluation strategies are different.
Optionally, the snapshot frame determining unit 720 is further configured to invoke at least one detection model corresponding to the first snapshot mode to perform image recognition on the multiple frames of images to be evaluated, so as to output a recognition result; and for determining a value of the one or more scoring parameters based on the recognition result.
Optionally, when the first snapshot mode is a motion snapshot mode or a multi-person motion snapshot mode, the at least one detection model includes a pose estimation model and a motion detection model. Optionally, when the first snapshot mode is an expression snapshot mode or a group photo snapshot mode, the at least one detection model includes a face attribute detection model.
Optionally, the mode determining unit 710 is further configured to perform mode detection on the captured multi-frame image based on the first frame rate to determine a first capturing mode among a plurality of preset capturing modes; the snapshot frame determining unit 720 is further configured to invoke at least one detection model corresponding to the first snapshot mode, and perform image recognition on the multiple frames of images to be evaluated at a second frame rate; wherein the first frame rate is less than the second frame rate.
In particular, the apparatus 700 may include means for performing the method performed by the electronic device in the embodiment of the method 200 in fig. 2. The mode determination unit 710 may be configured to perform step 210 of the method 200, and the snap frame determination unit 720 may be configured to perform steps 220 to 240 of the method 200. The apparatus 700 may also include one or more detection models. In a specific implementation process, the mode determining unit 710 may invoke the one or more detection models to perform image detection; the snap frame determination unit 720 may also invoke the one or more detection models for image recognition.
The apparatus 700 may also be used to perform the methods performed by the electronic device in the embodiments of fig. 4-6. Also, the units and other operations and/or functions described above in the apparatus 700 are respectively for implementing the corresponding flows of the embodiments in fig. 4 to fig. 6. For the sake of brevity, no further details are provided here.
It should be understood that the apparatus 700 for capturing an image may correspond to at least part of an electronic device in method embodiments according to embodiments of the present application. For example, the apparatus 700 may be the electronic device, or a component in the electronic device, such as a chip or a system of chips. Specifically, the functions implemented by the apparatus for capturing images 700 may be implemented by one or more processors executing corresponding programs.
The application also provides an electronic device or an apparatus 700 therein. The electronic device or apparatus 700 may include one or more processors for implementing the functions of the apparatus 700 for capturing images described above. The one or more processors may include or execute, for example, the mode determination unit, the snap frame determination unit, and the one or more detection models, etc., described in the above embodiments. The one or more processors may correspond to, for example, processor 110 in electronic device 100 illustrated in FIG. 1. The mode determination unit, the snap frame determination unit and the one or more detection models may be software, hardware or a combination thereof. The software may be executed by a processor and the hardware may be embedded in the processor.
Optionally, the electronic device or apparatus 700 also includes one or more memories. The one or more memories are used to store computer programs and/or data, such as images captured by a camera. The one or more memories may correspond, for example, to memory 120 in electronic device 100 shown in fig. 1. Optionally, the electronic device may further include a camera, a display unit, and the like. The camera may correspond, for example, to camera 190 in electronic device 100 shown in fig. 1. The display unit may correspond, for example, to display unit 170 in electronic device 100 shown in fig. 1. The processor may retrieve a computer program stored in the memory to perform the method flows referred to in the above embodiments. The memory further stores the one or more preset detection models, so that the processor can retrieve the one or more detection models from the memory.
The present application further provides a computer storage medium, in which computer instructions are stored, and when the computer instructions are run on an electronic device, the electronic device executes the above related method steps to implement the method for capturing an image in the above embodiment. The computer storage medium may correspond to, for example, memory 120 in electronic device 100 shown in FIG. 1. The mode determination unit, the snap frame determination unit, and the one or more detection models according to the embodiment of fig. 7 may be in the form of software and stored in the computer storage medium.
The present application further provides a computer program product, which can be stored in the computer storage medium, and when the computer program product runs on a computer, the computer executes the above related steps to implement the method for capturing images in the above embodiments.
The apparatus, the electronic device, the computer storage medium, the computer program product, or the chip for capturing an image provided in the embodiment of the present application are all configured to execute the corresponding method provided above, and therefore, beneficial effects achieved by the apparatus, the electronic device, the computer storage medium, the computer program product, or the chip can refer to the beneficial effects in the corresponding method provided above, and are not described herein again.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as computer software, electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again. In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment. In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (20)

1. A method of capturing an image, comprising:
determining a first snapshot mode in a plurality of preset snapshot modes according to the captured multi-frame images;
determining a snapshot frame image corresponding to the first snapshot mode in a plurality of captured images to be evaluated by using an evaluation strategy corresponding to the first snapshot mode, wherein the evaluation strategy is one of a plurality of preset evaluation strategies;
each of the plurality of capturing modes corresponds to at least one evaluation strategy in the plurality of preset evaluation strategies, and each evaluation strategy comprises one or more scoring parameters for image scoring and a mode weight of each scoring parameter; and
the determining, by using an evaluation policy corresponding to the first capturing mode, a capturing frame image corresponding to the first capturing mode from among a plurality of captured frames of images to be evaluated includes:
calculating the grade of each frame of image to be evaluated in the plurality of frames of images to be evaluated by using one or more grade parameters in one of at least one evaluation strategy corresponding to the first snapshot mode and the mode weight of each grade parameter;
and determining the snapshot frame image corresponding to the first snapshot mode in the multiple frames of images to be evaluated according to the multiple scores of the multiple frames of images to be evaluated.
2. The method of claim 1, wherein the plurality of snapshot modes comprise one or more of: an expression snapshot mode, a group photo snapshot mode, a motion snapshot mode, a multi-person motion snapshot mode, a pet snapshot mode, and a landscape snapshot mode.
3. The method of claim 2, wherein the capture frame image has a highest score among the plurality of images to be evaluated.
4. The method of claim 3, wherein scoring parameters included in different evaluation strategies corresponding to different snapshot modes are the same, and mode weights included in the different evaluation strategies are different.
5. The method of claim 3, wherein each snapshot mode includes one or more snapshot categories, each snapshot category corresponding to an evaluation policy; in at least one evaluation policy corresponding to the first capturing mode, each evaluation policy including one or more scoring parameters corresponding to the first capturing mode, a mode weight for each scoring parameter, and a category weight corresponding to one capturing category; and
the determining a first snap-shot mode among a plurality of preset snap-shot modes further comprises: determining a first snapshot category in the first snapshot mode according to the multi-frame image;
the calculating the score of each frame of image to be evaluated in the plurality of frames of images to be evaluated by using one or more scoring parameters in one of at least one evaluation strategy corresponding to the first snapshot mode and the mode weight of each scoring parameter comprises:
and calculating the grade of each frame of image to be evaluated in the plurality of frames of images to be evaluated by using one or more grading parameters corresponding to the first snapshot mode, the mode weight of each grading parameter and the category weight of each grading parameter corresponding to the first snapshot category.
6. The method of claim 5, wherein scoring parameters included in different evaluation strategies for different snapshot categories are the same, and category weights included in the different evaluation strategies are different.
7. The method of claim 1, wherein the method further comprises:
calling at least one detection model corresponding to the first snapshot mode to perform image recognition on the multiple frames of images to be evaluated so as to output recognition results;
determining a value of the one or more scoring parameters based on the recognition result.
8. The method of claim 7, wherein when the first capture mode is a motion capture mode or a multi-person motion capture mode, the at least one detection model comprises a pose estimation model and a motion detection model; or
The first snapshot mode is an expression snapshot mode or a group photo snapshot mode, and the at least one detection model comprises a face attribute detection model.
9. The method of claim 7 or 8, wherein the determining of a first snapshot mode among a plurality of preset snapshot modes according to the captured multi-frame images comprises:
performing mode detection on the captured multi-frame images at a first frame rate to determine the first snapshot mode among the plurality of preset snapshot modes; and
the invoking of at least one detection model corresponding to the first snapshot mode to perform image recognition on the multiple frames of candidate images comprises:
invoking the at least one detection model corresponding to the first snapshot mode to perform image recognition on the multiple frames of candidate images at a second frame rate;
wherein the first frame rate is lower than the second frame rate.
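
Claim 9 runs the cheap mode detection sparsely and the heavier per-frame recognition densely. A minimal sketch of that two-rate loop; the stride constant and the callback names are assumptions:

# Sketch of the two frame rates in claim 9: mode detection samples the
# stream at a lower first frame rate; once a mode is found, recognition
# runs at the higher second frame rate. Stride and callbacks are assumed.
MODE_DETECT_STRIDE = 4  # mode detection inspects every 4th frame only

def process_stream(frames, detect_mode, recognize):
    mode = None
    for i, frame in enumerate(frames):
        if mode is None:
            if i % MODE_DETECT_STRIDE == 0:   # first, lower frame rate
                mode = detect_mode(frame)     # returns a mode or None
        else:
            recognize(frame, mode)            # second, higher frame rate

Keeping mode detection at the lower rate bounds the cost of the always-on classifier, while the heavier detection models of claim 8 only run once a mode is locked in.
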
10. An apparatus for capturing an image, comprising:
a mode determining unit, configured to determine a first snapshot mode among a plurality of preset snapshot modes according to captured multi-frame images; and
a snapshot frame determining unit, configured to determine, by using an evaluation policy corresponding to the first snapshot mode, a snapshot frame image corresponding to the first snapshot mode from among a plurality of captured frames of images to be evaluated, wherein the evaluation policy is one of a plurality of preset evaluation policies;
wherein each of the plurality of snapshot modes corresponds to at least one of the plurality of preset evaluation policies, and each evaluation policy includes one or more scoring parameters for image scoring and a mode weight for each scoring parameter; and
the snapshot frame determining unit is specifically configured to calculate a score for each frame of the images to be evaluated by using one or more scoring parameters in one of the at least one evaluation policy corresponding to the first snapshot mode and the mode weight of each scoring parameter, and to determine, according to the scores of the frames to be evaluated, the snapshot frame image corresponding to the first snapshot mode from among those frames.
11. The apparatus of claim 10, wherein the plurality of snapshot modes comprises one or more of: an expression snapshot mode, a group photo snapshot mode, a motion snapshot mode, a multi-person motion snapshot mode, a pet snapshot mode, and a landscape snapshot mode.
12. The apparatus of claim 10, wherein the snapshot frame image has the highest score among the plurality of frames to be evaluated.
13. The apparatus of claim 10, wherein the evaluation policies corresponding to different snapshot modes include the same scoring parameters but different mode weights.
14. The apparatus of claim 10, wherein each snapshot mode includes one or more snapshot categories, and each snapshot category corresponds to one evaluation policy; each of the at least one evaluation policy corresponding to the first snapshot mode includes one or more scoring parameters corresponding to the first snapshot mode, a mode weight for each scoring parameter, and, for each scoring parameter, a category weight corresponding to one snapshot category;
the mode determining unit is further configured to determine a first snapshot category in the first snapshot mode according to the captured multi-frame images; and
the snapshot frame determining unit is further configured to calculate the score for each frame of the images to be evaluated by using the one or more scoring parameters corresponding to the first snapshot mode, the mode weight of each scoring parameter, and the category weight of each scoring parameter corresponding to the first snapshot category.
15. The apparatus of claim 14, wherein the evaluation policies corresponding to different snapshot categories include the same scoring parameters but different category weights.
16. The apparatus of claim 10, wherein the apparatus further comprises one or more detection models; and
the snapshot frame determining unit is further configured to invoke at least one detection model corresponding to the first snapshot mode to perform image recognition on the multiple frames of candidate images and output recognition results, and to determine values of the one or more scoring parameters according to the recognition results.
17. The apparatus of claim 16, wherein, when the first snapshot mode is the motion snapshot mode or the multi-person motion snapshot mode, the at least one detection model comprises a pose estimation model and a motion detection model; or
when the first snapshot mode is the expression snapshot mode or the group photo snapshot mode, the at least one detection model comprises a face attribute detection model.
18. The apparatus of claim 16 or 17, wherein the mode determining unit is specifically configured to perform mode detection on the captured multi-frame images at a first frame rate to determine the first snapshot mode among the plurality of preset snapshot modes;
the snapshot frame determining unit is specifically configured to invoke the at least one detection model corresponding to the first snapshot mode and perform image recognition on the multiple frames of candidate images at a second frame rate;
wherein the first frame rate is lower than the second frame rate.
19. An apparatus for capturing images, comprising a processor for retrieving and executing a computer program from a memory to perform the method of any one of claims 1 to 9.
20. A computer-readable storage medium, comprising a computer program which, when run on an electronic device or a processor, causes the electronic device or the processor to perform the method of any of claims 1 to 9.
CN201980012490.8A 2019-09-06 2019-09-06 Method and device for shooting image Active CN112771612B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/104674 WO2021042364A1 (en) 2019-09-06 2019-09-06 Method and device for taking picture

Publications (2)

Publication Number Publication Date
CN112771612A (en) 2021-05-07
CN112771612B (en) 2022-04-05

Family

ID=74852969

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201980012490.8A Active CN112771612B (en) 2019-09-06 2019-09-06 Method and device for shooting image

Country Status (2)

Country Link
CN (1) CN112771612B (en)
WO (1) WO2021042364A1 (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113313009A (en) * 2021-05-26 2021-08-27 Oppo广东移动通信有限公司 Method, device and terminal for continuously shooting output image and readable storage medium
CN113239220A (en) * 2021-05-26 2021-08-10 Oppo广东移动通信有限公司 Image recommendation method and device, terminal and readable storage medium
CN113326775B (en) * 2021-05-31 2023-12-29 Oppo广东移动通信有限公司 Image processing method and device, terminal and readable storage medium
CN113873144B (en) * 2021-08-25 2023-03-24 浙江大华技术股份有限公司 Image capturing method, image capturing apparatus, and computer-readable storage medium
WO2023065885A1 (en) * 2021-10-22 2023-04-27 荣耀终端有限公司 Video processing method and electronic device
CN117692791A (en) * 2023-07-27 2024-03-12 荣耀终端有限公司 Image capturing method, terminal, storage medium and program product
CN117692792A (en) * 2023-07-28 2024-03-12 荣耀终端有限公司 Image capturing method, terminal, storage medium and program product

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3557481B2 (en) * 1996-08-28 2004-08-25 カシオ計算機株式会社 Color gradation display device
US20030063322A1 (en) * 2001-10-01 2003-04-03 Ayumi Itoh Image taking apparatus
WO2015117681A1 (en) * 2014-02-07 2015-08-13 Euclid Vision Technologies B.V. Live scene recognition allowing scene dependent image modification before image recording or display
JP2015187641A (en) * 2014-03-26 2015-10-29 三星ディスプレイ株式會社Samsung Display Co.,Ltd. Display device and method for driving display device
CN105635567A (en) * 2015-12-24 2016-06-01 小米科技有限责任公司 Shooting method and device
CN106358036B (en) * 2016-08-31 2018-05-08 杭州当虹科技有限公司 A kind of method that virtual reality video is watched with default visual angle
CN106603917A (en) * 2016-12-16 2017-04-26 努比亚技术有限公司 Shooting device and method
CN107295236A (en) * 2017-08-11 2017-10-24 深圳市唯特视科技有限公司 A kind of snapshot Difference Imaging method based on time-of-flight sensor
CN108234870B (en) * 2017-12-27 2019-10-11 Oppo广东移动通信有限公司 Image processing method, device, terminal and storage medium
CN108198177A (en) * 2017-12-29 2018-06-22 广东欧珀移动通信有限公司 Image acquiring method, device, terminal and storage medium
CN108419019A (en) * 2018-05-08 2018-08-17 Oppo广东移动通信有限公司 It takes pictures reminding method, device, storage medium and mobile terminal

Also Published As

Publication number Publication date
WO2021042364A1 (en) 2021-03-11
CN112771612A (en) 2021-05-07


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant