CN117336597A - Video shooting method and related equipment - Google Patents

Video shooting method and related equipment

Info

Publication number
CN117336597A
Authority
CN
China
Prior art keywords
video
video frame
mode
frame data
scene
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311128121.XA
Other languages
Chinese (zh)
Inventor
常玲丽
杜远超
张博
崔瀚涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Honor Device Co Ltd
Original Assignee
Honor Device Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Honor Device Co Ltd filed Critical Honor Device Co Ltd
Priority to CN202311128121.XA priority Critical patent/CN117336597A/en
Publication of CN117336597A publication Critical patent/CN117336597A/en
Pending legal-status Critical Current

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60 Control of cameras or camera modules
    • H04N23/61 Control of cameras or camera modules based on recognised objects
    • H04N23/611 Control of cameras or camera modules based on recognised objects where the recognised objects include parts of the human body
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60 Control of cameras or camera modules
    • H04N23/667 Camera operation mode switching, e.g. between still and video, sport and normal or high- and low-resolution modes

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Studio Devices (AREA)
  • Image Processing (AREA)

Abstract

The application provides a video shooting method and related equipment. The method includes: acquiring video frame data shot by a camera and identifying whether the video frame data contains a human face; if the video frame data contains a human face, calculating the proportion of the size of the human face region in the video frame data; judging whether the proportion is greater than or equal to a first preset value; if the proportion is greater than or equal to the first preset value, determining that the recommended video mode is a portrait mode and performing video shooting based on the portrait mode; if the proportion is smaller than the first preset value, judging whether the proportion is smaller than or equal to a second preset value, the second preset value being smaller than the first preset value; if the proportion is smaller than or equal to the second preset value, determining that the recommended video mode is a protagonist mode and performing video shooting based on the protagonist mode. According to the embodiments of the application, when a portrait video is shot, a video mode can be recommended based on the size proportion of the face region, effectively optimizing the display effect of the portrait video.

Description

Video shooting method and related equipment
Technical Field
The application relates to the technical field of intelligent terminals, in particular to a video shooting method and related equipment.
Background
With the development of terminal technology, users place ever higher demands on the video shooting functions of electronic devices. Currently, an electronic device may provide multiple video modes for different shooting scenes, from which the user selects one for video shooting. In actual use, however, the user may be unable to determine the current shooting scene and thus not know which video mode to select, or, even having determined the scene, may not know which video mode corresponds to it. For example, when the shot content includes a portrait, the user may select a portrait mode so that the electronic device optimizes the portrait effect in the video. When the person is far from the electronic device, however, the face occupies only a small proportion of the shooting interface; even after processing, the portrait optimization effect may be poor, so the resulting portrait video looks poor and the user experience suffers.
Disclosure of Invention
In view of the foregoing, it is necessary to provide a video shooting method and related apparatus that solve the technical problem of poor portrait-video quality caused by a mismatch between the video mode selected by the user and the portrait when shooting a portrait video scene.
In a first aspect, the present application provides a video shooting method, the method including: acquiring video frame data shot by a camera, and identifying whether the video frame data contains a human face; if the video frame data contains a human face, calculating the proportion of the size of a human face region in the video frame data; judging whether the proportion of the size of the face region in the video frame data is greater than or equal to a first preset value; if the proportion of the size of the face region in the video frame data is greater than or equal to the first preset value, determining that the recommended video mode is a portrait mode, and performing video shooting based on the portrait mode; if the proportion of the size of the face region in the video frame data is smaller than the first preset value, judging whether the proportion of the size of the face region in the video frame data is smaller than or equal to a second preset value, the second preset value being smaller than the first preset value; if the proportion of the size of the face region in the video frame data is smaller than or equal to the second preset value, determining that the recommended video mode is a protagonist mode, and performing video shooting based on the protagonist mode. Through this technical solution, when a portrait video is shot, a video mode can be recommended based on the face proportion: when the face proportion is large, video is shot in the portrait mode, effectively optimizing the display effect of the portrait; when the face proportion is small, video is shot in the protagonist mode, effectively optimizing the display effect of the protagonist's portrait.
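The decision flow above can be condensed into a short sketch. This is a minimal illustration, assuming the example preset values of 1/3 and 1/5 used later in the detailed description; all function and constant names are illustrative, not taken from the disclosure.

```python
from typing import Optional

# Minimal sketch of the mode-recommendation decision. The preset values
# follow the examples given in the detailed description; names are
# illustrative assumptions.
FIRST_PRESET = 1 / 3    # portrait-mode threshold
SECOND_PRESET = 1 / 5   # protagonist-mode threshold, smaller than FIRST_PRESET

def recommend_mode(face_ratio: Optional[float]) -> Optional[str]:
    """face_ratio: proportion of the face region in the frame, None if no face."""
    if face_ratio is None:           # no face recognised: nothing to recommend
        return None
    if face_ratio >= FIRST_PRESET:   # face fills a large share of the frame
        return "portrait"
    if face_ratio <= SECOND_PRESET:  # face is small, the subject is far away
        return "protagonist"
    return None                      # between the presets: keep the current mode
```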
In one possible implementation manner, the calculating of the proportion of the size of the face region in the video frame data includes: marking the face region recognized in the video frame data with a rectangular frame; determining the size of the face region based on the rectangular frame; and calculating the ratio between the size of the face region and the size of the video frame image to obtain the proportion of the size of the face region in the video frame data. Through this technical solution, the size proportion of the face region can be accurately calculated based on the rectangular frame that marks the face region.
In one possible implementation manner, the determining of the size of the face region based on the rectangular frame includes: determining the width value of the rectangular frame that marks the face region in the video frame image as the width value of the face region, and determining the height value of the rectangular frame as the height value of the face region. Through this technical solution, the width value and the height value of the face region can be accurately determined based on the size of the rectangular frame that marks the face region.
In one possible implementation, the calculating of the ratio between the size of the face region and the size of the video frame image includes: calculating the ratio between the width value of the face region and the width value of the video frame image. Through this technical solution, the size proportion of the face region can be calculated quickly and accurately.

In one possible implementation, the calculating of the ratio between the size of the face region and the size of the video frame image includes: calculating the ratio between the height value of the face region and the height value of the video frame image. Through this technical solution, the size proportion of the face region can be calculated quickly and accurately.

In one possible implementation, the calculating of the ratio between the size of the face region and the size of the video frame image includes: calculating the ratio between the area of the face region and the area of the video frame image. Through this technical solution, the size proportion of the face region can be calculated quickly and accurately.
In one possible implementation, the identifying of whether the video frame data contains a face includes: performing format conversion on each video frame image in the video frame data to obtain a video stream; performing face recognition on each video frame image in the video stream to judge whether the video frame data contains a face; and if a preset number of consecutive video frame images are recognized as containing a face, determining that the video frame data contains a face. Through this technical solution, whether the video frame data contains a human face can be accurately identified.
In one possible implementation manner, the video shooting based on the portrait mode includes: blurring the video frame data shot by the camera. Through this technical solution, when the size proportion of the face region is large, the portrait subject is highlighted and its display effect is optimized.

In one possible implementation manner, the blurring of the video frame data shot by the camera includes: performing portrait matting on the video frame image and extracting the portrait region in the video frame image; blurring the background region of the video frame image; and fusing the extracted portrait region with the blurred background region. Through this technical solution, background blurring can be accurately applied to the portrait video, so that the portrait is highlighted and its display effect is optimized.

In one possible implementation manner, the blurring of the background region of the video frame image includes: performing Gaussian blur processing on the background region to obtain the blurred background region. Through this technical solution, the efficiency of background blurring can be improved.
In one possible implementation manner, the video shooting based on the protagonist mode includes: shooting a panoramic video and a protagonist portrait video; and displaying the video frame data of the protagonist portrait video within the video frame data of the panoramic video in picture-in-picture form. Through this technical solution, when the size proportion of the face region in the portrait video is small, the protagonist's portrait can be enlarged for display, effectively optimizing the display effect of the portrait video.
In a second aspect, the present application provides an electronic device comprising a memory and a processor, wherein the memory is used for storing program instructions, and the processor is configured to read and execute the program instructions stored in the memory; when the program instructions are executed by the processor, the electronic device is caused to execute the video shooting method described above.
In a third aspect, the present application provides a chip coupled to a memory in an electronic device, where the chip is configured to control the electronic device to perform the video capturing method described above.
In a fourth aspect, the present application provides a computer storage medium storing program instructions that, when executed on an electronic device, cause the electronic device to perform the video capturing method described above.
In addition, for the technical effects of the second to fourth aspects, reference may be made to the description of the method in the first aspect; they are not repeated here.
Drawings
Fig. 1 is an interface schematic diagram of a camera application of an electronic device according to an embodiment of the present application.
Fig. 2 is another interface schematic diagram of a camera application of the electronic device according to an embodiment of the present application.
Fig. 3 is a software architecture diagram of an electronic device according to an embodiment of the present application.
Fig. 4 is a flowchart of a video capturing method according to an embodiment of the present application.
Fig. 5 is a schematic architecture diagram of a video capturing system according to an embodiment of the present application.
Fig. 6 is an effect diagram of recognizing a face image using a cascade classifier according to an embodiment of the present application.
Fig. 7 is a flowchart of calculating a proportion of a size of a face area in video frame data according to an embodiment of the present application.
Fig. 8 is a flowchart of blurring processing on video frame data shot by a camera according to an embodiment of the present application.
Fig. 9 is a schematic architecture diagram of a video capturing system according to another embodiment of the present application.
Fig. 10 is a flowchart of a video capturing method according to another embodiment of the present application.
Fig. 11 is a flowchart of a video capturing method according to another embodiment of the present application.
Fig. 12 is a partial flowchart of a video capturing method according to another embodiment of the present application.
Fig. 13 is a partial flowchart of a video capturing method according to another embodiment of the present application.
Fig. 14 is a partial flowchart of a video capturing method according to another embodiment of the present application.
Fig. 15 is a partial flowchart of a video capturing method according to another embodiment of the present application.
Fig. 16 is a flowchart of a video capturing method according to another embodiment of the present application.
Fig. 17 is a schematic diagram of decision factors of intelligent scene detection according to an embodiment of the present application.
Fig. 18 is a schematic diagram of video specification of a video mode according to an embodiment of the present application.
Fig. 19 is a flowchart of a video capturing method according to another embodiment of the present application.
Fig. 20 is a hardware architecture diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The terms "first" and "second" in the embodiments of the present application are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include one or more such feature. In the description of embodiments of the present application, words such as "exemplary" or "such as" are used to mean serving as an example, instance, or illustration. Any embodiment or design described herein as "exemplary" or "for example" should not be construed as preferred or advantageous over other embodiments or designs. Rather, the use of words such as "exemplary" or "such as" is intended to present related concepts in a concrete fashion.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used in the description of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. It should be understood that, "/" means or, unless otherwise indicated herein. For example, A/B may represent A or B. The term "and/or" in this application is merely an association relationship describing an association object, and means that three relationships may exist. For example, a and/or B may represent: a exists alone, A and B exist simultaneously, and B exists alone. "at least one" means one or more. "plurality" means two or more than two. For example, at least one of a, b or c may represent: seven cases of a, b, c, a and b, a and c, b and c, a, b and c.
The user interface (UI) in the embodiments of the present application is a medium for interaction and information exchange between an application program or an operating system and a user; it converts between the internal form of information and a form acceptable to the user. The user interface of an application program is source code written in a specific computer language such as Java or extensible markup language (XML); the interface source code is parsed and rendered on the electronic device and finally presented as content the user can recognize, such as controls including pictures, text, and buttons. Controls, the basic elements of a user interface, typically include buttons, widgets, toolbars, menu bars, text boxes, scroll bars, pictures, and text. The properties and content of controls in an interface are defined by tags or nodes; for example, XML specifies the controls contained in an interface through nodes such as <TextView>, <ImgView>, and <VideoView>. A node corresponds to a control or property in the interface, and after parsing and rendering the node is presented as content visible to the user. In addition, the interfaces of many applications, such as hybrid applications, typically include web pages. A web page, also called a page, can be understood as a special control embedded in an application interface. It is source code written in a specific computer language, such as hypertext markup language (HTML), cascading style sheets (CSS), or JavaScript (JS), and can be loaded and displayed as user-recognizable content by a browser or a web-page display component with browser-like functionality. The specific content of a web page is likewise defined by tags or nodes in its source code; for example, HTML defines the elements and properties of a web page through <p>, <img>, <video>, and <canvas>.
A commonly used presentation form of the user interface is a graphical user interface (graphic user interface, GUI), which refers to a user interface related to computer operations that is displayed in a graphical manner. It may be an interface element such as an icon, window, control, etc. displayed in a display screen of the electronic device.
The following embodiments and features of the embodiments may be combined with each other without conflict.
With the development of terminal technology, users place ever higher demands on the video shooting functions of electronic devices. Currently, an electronic device may provide multiple video modes for different shooting scenes, from which the user selects one for video shooting. In actual use, however, the user may be unable to determine the current shooting scene and thus not know which video mode to select, or, even having determined the scene, may not know which video mode corresponds to it. For example, when the shot content includes a portrait, the user may select a portrait mode so that the electronic device processes the portrait in the video. When the person is far from the electronic device, however, the face occupies only a small proportion of the shooting interface; even after processing, the portrait optimization effect may be poor, so the resulting portrait video looks poor and the user experience suffers.
To avoid a poor portrait-video result caused by a mismatch between the user-selected video mode and the portrait when shooting a portrait video scene, the embodiments of the present application provide a video shooting method: when a portrait video is shot, a video mode can be recommended based on the proportion of the face, so that a portrait video with a better effect is generated automatically, meeting the user's portrait-video shooting needs and effectively improving the user experience.
In order to better understand the video capturing method provided in the embodiment of the present application, an application scenario of the video capturing method in the embodiment of the present application is described below with reference to fig. 1 and fig. 2.
Referring to fig. 1, when a user opens the camera application of the electronic device and selects video recording, that is, prepares to shoot video, the electronic device displays preview video frame data on the shooting interface of the camera application. When a person is within the shooting range of the camera, the preview video frame data also contains the person's image; in fig. 1, the proportion of the portrait in the preview interface is small.
The electronic device provides a video-mode selection control on the preview interface of the camera application. When the user triggers the video-mode selection control, the electronic device presents controls corresponding to multiple video modes for the user to choose from; the user can trigger one of these controls to select the corresponding video mode, so that the preview video frame data and the shot video are optimized accordingly.
Referring to fig. 2, when the user triggers the control corresponding to the portrait mode, the electronic device automatically processes the portrait in the preview video frame data, for example by beautification and background blurring. However, because the proportion of the portrait in the preview interface is small, the beautification effect is hard to see and background blurring cannot effectively highlight the portrait subject, so the shooting effect of the portrait video is poor, affecting the user experience.
Referring to fig. 3, a software architecture diagram of an electronic device according to an embodiment of the present application is shown. The layered architecture divides the software into several layers, each with a distinct role and division of labor, and the layers communicate with each other through software interfaces. For example, the Android system is divided, from top to bottom, into an application layer 101, a framework layer 102, an Android runtime and system library 103, a hardware abstraction layer 104, a kernel layer 105, and a hardware layer 106.
The application layer may include a series of application packages. For example, the application package may include applications for cameras, gallery, calendar, phone calls, maps, navigation, WLAN, bluetooth, music, video, short messages, device control services, etc.
The framework layer provides an application programming interface (Application Programming Interface, API) and programming framework for application programs of the application layer. The application framework layer includes a number of predefined functions. For example, the application framework layer may include a window manager, a content provider, a view system, a telephony manager, a resource manager, a notification manager, and the like.
The window manager is used for managing window programs. It can acquire the size of the display screen, judge whether there is a status bar, lock the screen, capture the screen, and so on. The content provider is used to store and retrieve data and make the data accessible to applications; the data may include videos, images, audio, calls made and received, browsing history and bookmarks, phone books, and the like. The view system includes visual controls, such as controls for displaying text and controls for displaying pictures, and may be used to build applications. A display interface may be composed of one or more views; for example, a display interface including a short-message notification icon may include a view displaying text and a view displaying a picture. The telephony manager provides the communication functions of the electronic device, such as management of call status (including connected, hung up, and so on). The resource manager provides various resources for applications, such as localized strings, icons, pictures, layout files, and video files. The notification manager allows an application to display notification information in the status bar; it can be used to convey notification-type messages that disappear automatically after a short stay without user interaction, for example to announce download completion or provide message alerts. The notification manager may also present notifications in the top status bar of the system in the form of charts or scroll-bar text, such as notifications from applications running in the background, or notifications on the screen in the form of dialog windows. For example, text information is prompted in the status bar, a prompt tone sounds, the electronic device vibrates, or an indicator light blinks.
The Android runtime includes a core library and virtual machines, and is responsible for scheduling and management of the Android system. The core library consists of two parts: one part comprises the functions that the Java language needs to call, and the other part is the core library of Android.

The application layer and the framework layer run in virtual machines. A virtual machine executes the Java files of the application layer and the framework layer as binary files, and performs functions such as object life-cycle management, stack management, thread management, security and exception management, and garbage collection.
The system library may include a plurality of functional modules. Such as surface manager (surface manager), media library (Media Libraries), three-dimensional graphics processing library (e.g., openGL ES), 2D graphics engine (e.g., SGL), etc.
The surface manager is used for managing the display subsystem and providing fusion of 2D and 3D layers for a plurality of application programs. Media libraries support a variety of commonly used audio, video format playback and recording, still image files, and the like. The media library may support a variety of audio and video encoding formats, such as MPEG4, h.264, MP3, AAC, AMR, JPG, PNG, etc. The three-dimensional graphic processing library is used for realizing three-dimensional graphic drawing, image rendering, synthesis, layer processing and the like. The 2D graphics engine is a drawing engine for 2D drawing.
The kernel layer is a layer between hardware and software. The kernel layer contains at least a display driver, a camera driver, an audio driver, and a sensor driver.
The kernel layer is the core of the operating system of the electronic device, is the first layer of software expansion based on hardware, provides the most basic function of the operating system, is the basis of the operating system, is responsible for managing the processes, the memory, the device driver, the files and the network system of the system, and determines the performance and the stability of the system. For example, the kernel may determine the time an application is operating on a certain portion of hardware.
The kernel layer includes programs closely related to hardware, such as interrupt handlers and device drivers; basic, common modules with high operating frequency, such as clock management and process scheduling modules; and critical data structures. The kernel layer may be provided in the processor or fixed in internal memory.
The hardware layer includes a plurality of hardware devices of the electronic device, such as cameras, display screens, and the like.
Referring to fig. 4, a flowchart of a video shooting method according to an embodiment of the present application is shown. The method is applied to the electronic equipment, and the video shooting method comprises the following steps:
S101, acquiring video frame data shot by a camera, and identifying whether the video frame data contains a human face. If the video frame data contains a face, the flow proceeds to S102; if the video frame data does not contain a face, the flow returns to S101.
As shown in fig. 3, in an embodiment of the present application, the hardware layer 106 includes an Image processor 1061, where the Image processor 1061 includes, but is not limited to, an Image Front End (IFE) 1062 and an Image processing engine (Image Processing Engine, IPE) 1063. The image processor 1061 communicates with the camera 193 through a mobile industry processor interface (Mobile Industry Processor Interface, MIPI). The camera 193 includes, but is not limited to, a lens and an image sensor. The lens is used for collecting optical signals in the shooting range of the camera, and the image sensor is used for converting the optical signals collected by the lens into electric signals to obtain image data or video frame data. The image data obtained by the image sensor is a RAW image, and the video frame data obtained by the image sensor is a RAW video frame image.
Fig. 5 is a schematic diagram of a video capturing system according to an embodiment of the present application. The video capturing system 10 includes, but is not limited to, an image front end 1062, an image processing engine 1063, a mode switching module 11, and an out-of-focus processing module 12.
In an embodiment of the present application, obtaining video frame data captured by a camera, and identifying whether the video frame data includes a face includes: responding to preset operation of a user, acquiring video frame data shot by a camera, performing format conversion on the video frame data shot by the camera to obtain a first video stream, performing scene analysis on the first video stream, and identifying whether the video frame data contains a human face or not. The preset operation of the user may be an operation of starting the camera application program, an operation of starting the camera application program and starting the video function, or an operation of starting the camera application program and triggering video shooting.
Specifically, format conversion is performed on video frame data shot by a camera through an image front end, so that a first video stream is obtained. In an embodiment of the present application, video frame data obtained by capturing by a camera includes a plurality of video frame images, the video frame images obtained by capturing by the camera are in a RAW format, format conversion is performed on each RAW video frame image in the video frame data by an image front end to obtain a video frame image in BMP (Bitmap) format, a first video stream is obtained, and the first video stream is transmitted to an image processing engine. Wherein the first video stream is a tiny (tiny) stream.
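As a rough sketch of this per-frame conversion, one might demosaic each RAW frame and downscale it into a small analysis frame for the tiny stream; the BGGR Bayer layout, the 640x480 analysis size, and the helper name below are assumptions, since the description fixes none of them.

```python
import cv2

def raw_to_tiny(raw_bayer):
    # Demosaic the RAW (Bayer) frame; the BGGR layout is an assumption.
    bgr = cv2.cvtColor(raw_bayer, cv2.COLOR_BayerBG2BGR)
    # Downscale to a low-resolution analysis frame for the tiny stream.
    tiny = cv2.resize(bgr, (640, 480))
    cv2.imwrite("frame.bmp", tiny)  # BMP output, as in the description
    return tiny
```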
In an embodiment of the present application, a scene recognition module is operated in the image processing engine, and after the image processing engine receives the first video stream, the scene recognition module recognizes a scene in the first video stream by using an AI scene detection algorithm, and determines whether the video frame data includes a face. Specifically, the scene recognition module performs face recognition on each video frame image in the first video stream, judges whether the video frame data contain faces, and determines that the video frame data contain faces if a continuous preset number of video frame images contain faces; if the continuous preset number of video frame images are not recognized to contain the human faces, determining that the video frame data do not contain the human faces. Wherein the preset number is 5. In other embodiments, the preset number may be set to other values as desired.
In an embodiment of the present application, the scene recognition module recognizing a scene in the first video stream by using an AI scene detection algorithm and determining whether the video frame data contains a face includes: the scene recognition module recognizes the scene in the first video stream by adopting a cascade classifier and judges whether the video frame data contains a human face. The cascade classifier is formed by cascading several strong classifiers, and each strong classifier is formed from a certain number of weak classifiers through the AdaBoost algorithm (an algorithm that iterates weak classifiers to generate a final strong classifier). A weak classifier is used to extract Haar-like rectangular features of an image, where a rectangular feature is a rectangle containing black regions and white regions, and may include the original rectangular features and extended rectangular features. Specifically, a rectangle is selected and placed on the video frame image, and the pixel sum of the black regions is subtracted from the pixel sum of the white regions; the resulting value is the feature value of the rectangular feature. If the rectangular feature is placed in a face region and in a non-face region of the video frame image, the calculated feature values differ, so whether the region of the video frame image in which the rectangular feature is placed is a face region can be judged based on the feature value of the rectangular feature, and thus whether the video frame image contains a face can be judged.
If the face area can be identified in the video frame image through the cascade classifier, the video frame image is determined to contain the face, and if the face area is not identified in the video frame image through the cascade classifier, the video frame image is determined to not contain the face. If the continuous preset number of video frame images contain faces, determining that video frame data shot by the camera contain faces. Referring to fig. 6, an effect diagram of recognizing a face image by using a cascade classifier according to an embodiment of the present application is shown, where a rectangular frame portion is a face area.
In other embodiments, other face detection methods, such as template-matching-based, appearance-and-shape-based, neural-network-based, feature-based, or skin-color-based face detection, may also be used to detect whether a video frame image contains a face.
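The flow described above (Haar-like features evaluated by a cascade classifier, confirmed over 5 consecutive frames) can be sketched with OpenCV's stock frontal-face cascade; the patent does not name a specific classifier file, so the model choice here is an assumption.

```python
import cv2

# OpenCV's bundled Haar cascade for frontal faces (an assumption; the
# description only specifies "a cascade classifier").
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

CONSECUTIVE_REQUIRED = 5  # preset number of consecutive frames

def frame_has_face(frame_bgr):
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return len(faces) > 0

def stream_contains_face(frames):
    """True once CONSECUTIVE_REQUIRED successive frames each contain a face."""
    streak = 0
    for frame in frames:
        streak = streak + 1 if frame_has_face(frame) else 0
        if streak >= CONSECUTIVE_REQUIRED:
            return True
    return False
```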
S102, calculating the proportion of the size of the face area in the video frame data.
In an embodiment of the present application, if the video frame data contains one face, the proportion of the size of the face region corresponding to that face in the video frame data is calculated; if the video frame data contains multiple faces, the proportion of the size of the face regions corresponding to all the faces in the video frame data is calculated.
In an embodiment of the present application, a refinement flow for calculating a proportion of a size of a face region in video frame data is shown in fig. 7, and specifically includes:
S1021, identifying the face region recognized in the video frame data with a rectangular frame.
In an embodiment of the present application, after recognizing that the video frame data shot by the camera contains a face, the scene recognition module further marks the recognized face region in the video frame data with a rectangular frame. The rectangular frame may be the smallest rectangle that encloses the face region.

Specifically, during face detection the scene recognition module also calculates the face coordinates, and determines the smallest rectangular frame containing the face region based on the coordinates of the top, bottom, left, and right extreme points of the face region; the transverse edge of this smallest rectangular frame extends along the horizontal direction and its longitudinal edge extends along the vertical direction.
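A minimal sketch of this smallest axis-aligned rectangle, assuming the face coordinates are available as a list of (x, y) points; the (x, y, w, h) return convention is an illustrative choice.

```python
def min_face_rect(face_points):
    # Smallest axis-aligned rectangle enclosing all face coordinate points.
    xs = [x for x, _ in face_points]
    ys = [y for _, y in face_points]
    left, top = min(xs), min(ys)
    return left, top, max(xs) - left, max(ys) - top  # x, y, width, height
```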
And S1022, determining the size of the face area based on the rectangular frame.
In an embodiment of the present application, the size of the face region may include a width value w, a height value h, and an area s of the face region, where the area s is the product of the width value w and the height value h, and the width value, the height value, and the area are all measured in numbers of pixels. Determining the size of the face region based on the rectangular frame includes: determining the width value of the rectangular frame that marks the face region in the video frame image as the width value of the face region, and determining the height value of the rectangular frame as the height value of the face region.
S1023, calculating the proportion between the size of the face area and the size of the video frame image to obtain the proportion of the size of the face area in the video frame data.
In an embodiment of the present application, the dimensions of the video frame image may include a width value W, a height value H, and an area S of the video frame image, where the area S is a product of the width value W and the height value H, the width value W, the height value H, and the area S of the video frame image are all preset values, the width value W is a number of pixels at a lateral edge of the video frame image, the height value H is a number of pixels at a longitudinal edge of the video frame image, and the area S is a number of pixels of the video frame image. For example, the resolution of the video frame image is 640×480, the number of pixels at the lateral edge of the video frame image is 640, and the number of pixels at the longitudinal edge of the video frame image is 480, and the number of pixels of the video frame image is 307200.
In an embodiment of the present application, calculating the ratio between the size of the face region and the size of the video frame image includes: calculating the ratio between the width value of the face region and the width value of the video frame image. For example, if the width value of the face region is 400 and the width value of the video frame image is 640, the ratio between them is R_w = 400/640 = 0.625.
In another embodiment of the present application, calculating the ratio between the size of the face region and the size of the video frame image includes: calculating the ratio between the height value of the face region and the height value of the video frame image. For example, if the height value of the face region is 240 and the height value of the video frame image is 480, the ratio between them is R_h = 240/480 = 0.5.
In another embodiment of the present application, calculating the ratio between the size of the face region and the size of the video frame image includes: calculating the ratio between the area of the face region and the area of the video frame image. For example, if the area of the face region is 10500 and the area of the video frame image is 307200, the ratio between them is R_s = 10500/307200 ≈ 0.034.
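The three ratio variants, with the frame dimensions from the 640x480 example above, reduce to the following sketch; the function names are illustrative.

```python
FRAME_W, FRAME_H = 640, 480
FRAME_AREA = FRAME_W * FRAME_H  # 307200 pixels

def width_ratio(face_w):
    return face_w / FRAME_W                # e.g. 400 / 640 = 0.625

def height_ratio(face_h):
    return face_h / FRAME_H                # e.g. 240 / 480 = 0.5

def area_ratio(face_w, face_h):
    return (face_w * face_h) / FRAME_AREA  # e.g. 10500 / 307200 ≈ 0.034
```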
S103, judging whether the proportion of the size of the face area in the video frame data is larger than or equal to a first preset value. If the proportion of the size of the face area in the video frame data is greater than or equal to the first preset value, the process enters S104; if the proportion of the size of the face region in the video frame data is smaller than the first preset value, the process proceeds to S105.
In an embodiment of the present application, the first preset value is 1/3. In other embodiments, the first preset value may be set to other values according to the requirement.
In another embodiment of the present application, it is judged whether the proportion of the size of the face region in the video frame data is greater than or equal to the first preset value in a preset number of consecutive video frame images; if so, the flow proceeds to S104.

In another embodiment of the present application, it is judged whether the number of faces in the preset number of consecutive video frame images is one and whether the proportion of the size of the face region in the video frame data in those images is greater than or equal to the first preset value; if both conditions hold, the flow proceeds to S104.
S104, determining that the recommended video mode is a portrait mode, and performing video shooting based on the portrait mode.
As shown in fig. 5, in an embodiment of the present application, after a recommendation decision is made and the recommended video mode is determined to be the portrait mode, a prompt control is displayed on the camera application interface showing the text "Portrait mode recommended". In response to the user's operation triggering the prompt control, the mode switching module switches the current video mode to the portrait mode, and video shooting is performed based on the portrait mode.
In an embodiment of the present application, performing video shooting based on the portrait mode includes: switching to the lens with the largest aperture for video shooting, or increasing the aperture of the lens currently shooting the video to its maximum value.
In another embodiment of the present application, performing video shooting based on the portrait mode includes: blurring the video frame data shot by the camera, and displaying the video frame data after portrait blurring on the display screen. Specifically, the camera transmits the captured video frame data to the image front end. The image front end performs format conversion on the video frame data, converting RAW-format video frame data into YUV-format video frame data, and generates a second video stream, which is a preview stream, transmitted to the image processing engine. The image processing engine blurs the video frame images in the second video stream through the out-of-focus processing module, and the blurred video frame data is displayed on the display screen.
In an embodiment of the present application, a refinement flow for blurring video frame data captured by a camera is shown in fig. 8, and specifically includes:
S1041, performing portrait matting on the video frame image, and extracting the portrait region in the video frame image.

In an embodiment of the present application, the blurring processing is implemented by the out-of-focus processing module based on a bokeh (background blurring) algorithm. Performing portrait matting on the video frame image and extracting the portrait region in the video frame image includes: inputting the video frame image into a portrait matting model, and extracting the portrait region in the video frame image through the portrait matting model. The portrait matting model may be a fully convolutional network (FCN), the semantic segmentation network SegNet, or the dense prediction network U-Net.
S1042, blurring the background area of the video frame image.
In an embodiment of the present application, the background region of the video frame image is the region of the video frame image other than the portrait region. Blurring the background region of the video frame image includes: performing Gaussian blur processing on the background region to obtain the blurred background region.
Specifically, performing Gaussian blur processing on the background region includes: presetting the mean and standard deviation of a two-dimensional Gaussian distribution function, and dividing the background region into several n-by-n preset regions. For each n-by-n preset region, the coordinates of each pixel are input into the two-dimensional Gaussian distribution function to obtain an output value; the output value corresponding to each pixel is divided by the sum of the output values of all pixels in the preset region to obtain the weight of that pixel. The RGB three-channel pixel values of the pixel are multiplied by its weight to obtain the pixel value after Gaussian blur processing, which replaces the pixel's initial value. The image formed by the Gaussian-blurred pixels of the n-by-n preset regions is determined as the blurred video frame image. Here n is the blur radius and can be any positive integer. Optionally, the two-dimensional Gaussian distribution function has a mean of 0 and a standard deviation of 1.5.
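The normalized-weight construction just described can be sketched in NumPy as follows; n = 5 is an illustrative radius, and the mean 0 and standard deviation 1.5 follow the optional values above.

```python
import numpy as np

def gaussian_weights(n=5, sigma=1.5):
    # Evaluate the two-dimensional Gaussian at each pixel offset of an
    # n-by-n window, then normalize so the weights sum to 1.
    half = n // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    g = np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2))
    return g / g.sum()

# Each RGB channel of a pixel's n-by-n neighbourhood is multiplied by these
# weights and summed, which amounts to convolving with this kernel.
```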
S1043, fusing the extracted portrait region with the blurred background region.

In an embodiment of the present application, the extracted portrait region is placed at its initial position and merged with the blurred background region, so that the portrait region and the blurred background region are fused.
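Taken together, S1041 to S1043 amount to an alpha-composite of the original frame over its blurred copy. A minimal sketch, assuming a matte in [0, 1] already produced by a segmentation model (FCN, SegNet, or U-Net, as mentioned above):

```python
import cv2
import numpy as np

def bokeh(frame_bgr, matte):
    """matte: float array in [0, 1], 1 inside the portrait region."""
    # S1042: blur the whole frame; only background pixels are kept from it.
    blurred = cv2.GaussianBlur(frame_bgr, (0, 0), sigmaX=1.5)
    alpha = matte[..., None]  # broadcast the matte over the colour channels
    # S1041 + S1043: keep the portrait where matte = 1, blurred background
    # elsewhere, i.e. fuse the extracted portrait with the blurred background.
    fused = alpha * frame_bgr + (1 - alpha) * blurred
    return fused.astype(frame_bgr.dtype)
```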
S105, judging whether the proportion of the size of the face area in the video frame data is smaller than or equal to a second preset value, wherein the second preset value is smaller than the first preset value. If the proportion of the size of the face area in the video frame data is smaller than or equal to the second preset value, the process enters S106; if the proportion of the size of the face area in the video frame data is greater than the second preset value, the process returns to S101. Wherein the second preset value is 1/5. In other embodiments, the second preset value may be set to other values according to the requirement.
In another embodiment of the present application, it is judged whether the proportion of the size of the face region in the video frame data is less than or equal to the second preset value in a preset number of consecutive video frame images; if so, the flow proceeds to S106.

In another embodiment of the present application, it is judged whether the number of faces in the preset number of consecutive video frame images is one and whether the proportion of the size of the face region in the video frame data in those images is less than or equal to the second preset value; if the number of faces in the consecutive video frame images is multiple and the proportion is less than or equal to the second preset value, the flow proceeds to S106.
S106, determining that the recommended video mode is the protagonist mode, and performing video shooting based on the protagonist mode.
In an embodiment of the present application, after a recommendation decision is made and the recommended video mode is determined to be the protagonist mode, a prompt control is displayed on the camera application interface showing the text "Protagonist mode recommended". In response to the user's operation triggering the prompt control, the mode switching module switches the current video mode to the protagonist mode, and video shooting is performed based on the protagonist mode. The protagonist is the portrait corresponding to the focus position, and the focus position is determined based on the user's touch selection.
In the protagonist mode, the shooting interface displays the portrait video frame data and the panoramic video frame data simultaneously, with the portrait video frame data superimposed on the panoramic video frame data in picture-in-picture form. In an embodiment of the present application, performing video shooting based on the protagonist mode includes: controlling one camera to shoot the protagonist portrait video and another camera to shoot the panoramic video, displaying the video frame data of the panoramic video on the shooting interface, and superimposing the video frame data of the portrait video on the video frame data of the panoramic video in picture-in-picture form. The camera shooting the protagonist portrait video is the main camera or a telephoto camera (the camera with the longest focal length), which can track and shoot the protagonist; the camera shooting the panoramic video is a wide-angle camera. In other embodiments, the portrait video and the panoramic video may be shot by the same camera, in which case the protagonist portrait portion of the panoramic video frame data shot by that camera is cropped, enlarged, and displayed in picture-in-picture form.
Specifically, referring to fig. 9, the camera transmits the captured video frame data to the image front end. The image front end first converts the video frame data into a first video stream (a tiny stream), and scene analysis is performed on the first video stream by the AI scene detection algorithm. If the scene analysis result is that the proportion of the size of the face region in the video frame data is less than or equal to the second preset value, a recommendation decision for the protagonist mode is made according to the scene analysis result, and the video mode is switched to the protagonist mode in response to a user operation. After the video mode is switched to the protagonist mode, the image front end converts the video frame data shot by the camera into a second video stream (a preview stream) and transmits it to the image processing engine, which performs optimization processing on the second video stream, such as stabilization, noise reduction, and color correction. After format conversion, the image front end also crops the protagonist portrait video frame data out of the video frame data and enlarges it, for example by a factor of two, to generate a third video stream, the video stream of the enlarged tracked subject. The image front end transmits the third video stream to the image processing engine, which likewise performs optimization processing on it, such as stabilization, noise reduction, and color correction. The image processing engine splices the processed second video stream and third video stream and displays the spliced stream on the display screen, so that the second video stream is displayed in full and the third video stream is displayed in picture-in-picture form.
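For the single-camera variant, the crop-enlarge-overlay step can be sketched as follows; the protagonist box, the doubling factor (the example above), and the corner margin are illustrative, and the crop is assumed to fit inside the frame.

```python
import cv2

def compose_pip(panorama, box, margin=20):
    """Crop the protagonist region, double it, and inset it picture-in-picture."""
    x, y, w, h = box                         # protagonist region in the panorama
    crop = panorama[y:y + h, x:x + w]
    crop = cv2.resize(crop, (2 * w, 2 * h))  # enlarge by a factor of two
    out = panorama.copy()
    out[margin:margin + 2 * h, margin:margin + 2 * w] = crop  # top-left inset
    return out
```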
Referring to fig. 10, a flowchart of a video capturing method according to another embodiment of the present application is shown. The method is applied to the electronic equipment, and the video shooting method comprises the following steps:
S201, acquiring video frame data shot by a camera, and identifying whether the video frame data contains a human face. If the video frame data contains a face, the flow proceeds to S202; if the video frame data does not contain a face, the flow proceeds to S207.
S202, calculating the proportion of the size of the face area in the video frame data.
S203, judging whether the proportion of the size of the face area in the video frame data is larger than or equal to a first preset value. If the proportion of the size of the face area in the video frame data is greater than or equal to the first preset value, the process enters S204; if the proportion of the size of the face region in the video frame data is smaller than the first preset value, the process proceeds to S205.
S204, determining that the recommended video mode is a portrait mode, and performing video shooting based on the portrait mode.
S205, judging whether the proportion of the size of the face region in the video frame data is less than or equal to a second preset value, the second preset value being smaller than the first preset value. If the proportion of the size of the face region in the video frame data is less than or equal to the second preset value, the flow proceeds to S206; if the proportion of the size of the face region in the video frame data is greater than the second preset value, the flow proceeds to S207.
S206, determining that the recommended video mode is the protagonist mode, and performing video shooting based on the protagonist mode.
S207, identifying whether the scene information of the video frame data is a night scene. If the scene information of the video frame data is a night scene, the process proceeds to S208; if the scene information of the video frame data is not a night scene, the flow returns to S201.
In an embodiment of the present application, identifying whether the scene information of the video frame data is a night scene includes: the scene recognition module acquires a preset number of consecutive video frame images and obtains the brightness information luxIndex of each video frame image; it then judges whether the brightness information luxIndex of the preset number of consecutive video frame images is less than or equal to a preset brightness. If the luxIndex of the preset number of consecutive video frame images is less than or equal to the preset brightness, the light of the current shooting scene is dark, and the scene information of the video frame data is determined to be a night scene; if the brightness information of any video frame image is greater than the preset brightness, the light of the current shooting scene is bright, and the scene information of the video frame data is determined not to be a night scene. Optionally, the preset number is 5.
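A sketch of this check, assuming a per-frame luxIndex is already available; the brightness threshold below is an assumption, since the description does not fix a value.

```python
PRESET_BRIGHTNESS = 330  # illustrative luxIndex threshold (assumption)
PRESET_COUNT = 5         # preset number of consecutive frames

def is_night_scene(lux_indices):
    """True if the last PRESET_COUNT frames all have luxIndex <= the preset."""
    recent = lux_indices[-PRESET_COUNT:]
    return len(recent) == PRESET_COUNT and all(
        lux <= PRESET_BRIGHTNESS for lux in recent)
```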
S208, determining that the recommended video mode is a night scene mode, and performing video shooting based on the night scene mode.
In an embodiment of the present application, after a recommendation decision is made and the recommended video mode is determined to be the night scene mode, a prompt control is displayed on the camera application interface showing the text "Night scene mode recommended". In response to the user's operation triggering the prompt control, the mode switching module switches the current video mode to the night scene mode, and video shooting is performed based on the night scene mode.
Referring to fig. 11, a flowchart of a video capturing method according to another embodiment of the present application is shown. The method is applied to the electronic equipment, and the video shooting method comprises the following steps:
S301, acquiring video frame data shot by a camera, and identifying whether the video frame data contains a human face. If the video frame data contains a face, the flow proceeds to S302; if the video frame data does not contain a face, the flow proceeds to S307.
S302, calculating the proportion of the size of the face area in the video frame data.
S303, judging whether the proportion of the size of the face area in the video frame data is larger than or equal to a first preset value. If the proportion of the size of the face area in the video frame data is greater than or equal to the first preset value, the process enters S304; if the proportion of the size of the face region in the video frame data is smaller than the first preset value, the process proceeds to S305.
S304, determining that the recommended video mode is a portrait mode, and performing video shooting based on the portrait mode.
S305, judging whether the proportion of the size of the face area in the video frame data is smaller than or equal to a second preset value, wherein the second preset value is smaller than the first preset value. If the proportion of the size of the face area in the video frame data is smaller than or equal to the second preset value, the process enters S306; if the proportion of the size of the face region in the video frame data is greater than the second preset value, the process proceeds to S307.
S306, determining that the recommended video mode is a principal angle mode, and performing video shooting based on the principal angle mode.
S307, identifying whether the scene information of the video frame data is a night scene. If the scene information of the video frame data is a night scene, the flow proceeds to S308; if the scene information of the video frame data is not a night scene, the flow proceeds to S309.
S308, determining that the recommended video mode is a night scene mode, and performing video shooting based on the night scene mode.
S309, it is identified whether the scene information of the video frame data is a high dynamic range (High Dynamic Range, HDR) scene. If the scene information of the video frame data is a high dynamic range scene, the flow proceeds to S310; if the scene information of the video frame data is not a high dynamic range scene, the flow returns to S301.
In an embodiment of the present application, identifying whether the scene information of the video frame data is a high dynamic range scene includes: the scene recognition module acquires a consecutive preset number of video frame images in the video frame data and obtains the dynamic range value drValue of each of these video frame images; it then judges whether the dynamic range value drValue of each of the consecutive preset number of video frame images is greater than or equal to a preset dynamic range value. If so, the scene information of the video frame data is determined to be a high dynamic range scene; if the dynamic range value drValue of any video frame image is smaller than the preset dynamic range value, the scene information of the video frame data is determined not to be a high dynamic range scene. In one embodiment of the present application, the dynamic range value is the ratio between the highest luminance and the lowest luminance in the image. Optionally, the preset dynamic range value is 50.
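A corresponding sketch for the drValue check, again with assumed helper names; drValue is computed per frame as the highest-to-lowest luminance ratio described above:

```python
def dr_value(max_luminance, min_luminance):
    """Dynamic range value: ratio of highest to lowest luminance."""
    return max_luminance / max(min_luminance, 1e-6)  # guard against zero

def is_hdr_scene(luminance_pairs, preset_dr_value=50, preset_count=5):
    """HDR scene iff drValue >= `preset_dr_value` for the last
    `preset_count` consecutive frames (50 per the optional example).

    luminance_pairs: per-frame (max_luminance, min_luminance) tuples.
    """
    recent = luminance_pairs[-preset_count:]
    return len(recent) == preset_count and all(
        dr_value(mx, mn) >= preset_dr_value for mx, mn in recent)
```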
S310, determining that the recommended video mode is a high dynamic range mode, and performing video shooting based on the high dynamic range mode.
In an embodiment of the present application, after the recommendation decision determines that the recommended video mode is the high dynamic range mode, a prompt control is displayed on the camera application interface, and the prompt control displays the text "HDR mode recommended". In response to the user's operation of triggering the prompt control, the mode switching module switches the current video mode to the high dynamic range mode, and video shooting is performed based on the high dynamic range mode.
Fig. 12-13 are flowcharts of a video capturing method according to another embodiment of the present application. The method is applied to the electronic equipment, and the video shooting method comprises the following steps:
S401, acquiring video frame data shot by a camera, and identifying whether the video frame data contains a human face. If the video frame data contains a face, the flow proceeds to S402; if the video frame data does not contain a face, the flow proceeds to S407.
S402, calculating the proportion of the size of the face area in the video frame data.
S403, judging whether the proportion of the size of the face area in the video frame data is larger than or equal to a first preset value. If the proportion of the size of the face area in the video frame data is greater than or equal to the first preset value, the process enters S404; if the proportion of the size of the face region in the video frame data is smaller than the first preset value, the process proceeds to S405.
S404, determining that the recommended video mode is a portrait mode, and performing video shooting based on the portrait mode.
S405, judging whether the proportion of the size of the face area in the video frame data is smaller than or equal to a second preset value, wherein the second preset value is smaller than the first preset value. If the proportion of the size of the face area in the video frame data is smaller than or equal to the second preset value, the process enters S406; if the proportion of the size of the face region in the video frame data is greater than the second preset value, the process proceeds to S407.
S406, determining that the recommended video mode is a principal angle mode, and performing video shooting based on the principal angle mode.
S407, it is identified whether the scene information of the video frame data is a night scene. If the scene information of the video frame data is a night scene, the flow proceeds to S408; if the scene information of the video frame data is not a night scene, the flow proceeds to S409.
S408, determining that the recommended video mode is a night scene mode, and performing video shooting based on the night scene mode.
S409, it is identified whether the scene information of the video frame data is a high dynamic range scene. If the scene information of the video frame data is a high dynamic range scene, the flow proceeds to S410; if the scene information of the video frame data is not a high dynamic range scene, the flow proceeds to S411.
S410, determining that the recommended video mode is a high dynamic range mode, and performing video shooting based on the high dynamic range mode.
S411, identifying whether the scene information of the video frame data is a macro scene. If the scene information of the video frame data is a macro scene, the flow proceeds to S412; if the scene information of the video frame data is not a macro scene, the flow returns to S401.
In an embodiment of the present application, identifying whether the scene information of the video frame data is a macro scene includes: the scene recognition module judges whether the scene information of the video frame data is a macro scene based on the movement data vcmCode of the voice coil motor of the camera, the focusing state of the camera, and the correction data calibData of the camera. Specifically, the scene recognition module acquires the movement data vcmCode, the focusing state, and the correction data calibData, and first judges whether focusing succeeded. If focusing failed, it judges whether the movement data vcmCode is greater than or equal to preset movement data; if so, it judges whether the correction data calibData is normal. If the correction data calibData is normal, then the camera failed to focus even though the voice coil motor travelled a large distance and the correction data is normal, which indicates that the camera cannot focus because the focusing distance is too short, and the scene information of the video frame data is determined to be a macro scene. If focusing succeeded, or the movement data vcmCode is smaller than the preset movement data, or the correction data calibData is abnormal, the scene information of the video frame data is determined not to be a macro scene. Optionally, the preset movement data is a movement of 10 voice coil motor steps.
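The three conditions combine into a single conjunction. A minimal sketch, assuming the boolean and numeric inputs have already been extracted from the camera driver:

```python
def is_macro_scene(focus_succeeded, vcm_code, calib_data_normal,
                   preset_vcm_steps=10):
    """Macro scene iff focusing failed, the voice coil motor moved at
    least `preset_vcm_steps` steps (vcmCode), and calibData is normal."""
    return (not focus_succeeded
            and vcm_code >= preset_vcm_steps
            and calib_data_normal)
```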
S412, determining that the recommended video mode is a macro mode, and performing video shooting based on the macro mode.
In an embodiment of the present application, after the recommendation decision determines that the recommended video mode is the macro mode, a prompt control is displayed on the camera application interface, and the prompt control displays the text "macro mode recommended". In response to the user's operation of triggering the prompt control, the mode switching module switches the current video mode to the macro mode, and video shooting is performed based on the macro mode.
Fig. 14-15 are flowcharts of a video capturing method according to another embodiment of the present application. The method is applied to the electronic equipment, and the video shooting method comprises the following steps:
S501, acquiring video frame data shot by a camera, and identifying whether the video frame data contains a human face. If the video frame data contains a face, the flow proceeds to S502; if the video frame data does not contain a face, the flow proceeds to S507.
S502, calculating the proportion of the size of the face area in the video frame data.
S503, judging whether the proportion of the size of the face area in the video frame data is larger than or equal to a first preset value. If the proportion of the size of the face area in the video frame data is greater than or equal to the first preset value, the process enters S504; if the proportion of the size of the face region in the video frame data is smaller than the first preset value, the process goes to S505.
S504, determining that the recommended video mode is a portrait mode, and performing video shooting based on the portrait mode.
S505, judging whether the proportion of the size of the face area in the video frame data is smaller than or equal to a second preset value, wherein the second preset value is smaller than the first preset value. If the proportion of the size of the face area in the video frame data is smaller than or equal to the second preset value, the process enters S506; if the proportion of the size of the face region in the video frame data is greater than the second preset value, the process goes to S507.
S506, determining that the recommended video mode is a principal angle mode, and performing video shooting based on the principal angle mode.
S507, it is identified whether the scene information of the video frame data is a night scene. If the scene information of the video frame data is a night scene, the flow proceeds to S508; if the scene information of the video frame data is not a night scene, the flow proceeds to S509.
S508, determining that the recommended video mode is a night scene mode, and performing video shooting based on the night scene mode.
S509, identifying whether the scene information of the video frame data is a high dynamic range scene. If the scene information of the video frame data is a high dynamic range scene, the flow proceeds to S510; if the scene information of the video frame data is not a high dynamic range scene, the flow proceeds to S511.
S510, determining that the recommended video mode is a high dynamic range mode, and performing video shooting based on the high dynamic range mode.
S511, it is identified whether the scene information of the video frame data is a macro scene. If the scene information of the video frame data is a macro scene, the flow proceeds to S512; if the scene information of the video frame data is not a macro scene, the flow proceeds to S513.
In an embodiment of the present application, whether the scene information of the video frame data is a macro scene is identified in the same manner as described above for S411: the scene recognition module judges, based on the movement data vcmCode of the voice coil motor of the camera, the focusing state of the camera, and the correction data calibData of the camera, whether the camera failed to focus while the voice coil motor travelled at least the preset movement data and the correction data is normal. Optionally, the preset movement data is a movement of 10 voice coil motor steps.
S512, determining that the recommended video mode is a macro mode, and performing video shooting based on the macro mode.
S513, identifying whether the scene information of the video frame data is a multi-mirror scene. If the scene information of the video frame data is a multi-mirror scene, the flow proceeds to S514; if the scene information of the video frame data is not a multi-mirror scene, the flow returns to S501.
In an embodiment of the present application, identifying whether the scene information of the video frame data is a multi-mirror scene includes: the scene recognition module acquires a consecutive preset number of video frame images in the video frame data and recognizes whether each of these video frame images contains a pet. If the consecutive preset number of video frame images all contain a pet, the scene information of the video frame data is determined to be a multi-mirror scene; if any video frame image does not contain a pet, the scene information of the video frame data is determined not to be a multi-mirror scene. That is, the scene recognition module determines that the scene information of the video frame data is a multi-mirror scene when the scene analysis result aiSceneDetResult indicates that the consecutive preset number of video frame images contain a pet.
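A sketch of this check under assumed inputs (the preset count of 5 mirrors the night-scene example and is an assumption here, as is the per-frame label-set representation of the scene analysis result):

```python
def is_multi_mirror_scene(frame_label_sets, preset_count=5):
    """Multi-mirror scene iff the scene analysis result reports a pet in
    every one of the last `preset_count` consecutive frames.

    frame_label_sets: per-frame sets of detected labels, e.g. {"pet"}.
    """
    recent = frame_label_sets[-preset_count:]
    return len(recent) == preset_count and all(
        "pet" in labels for labels in recent)
```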
S514, determining that the recommended video mode is a multi-mirror mode, and performing video shooting based on the multi-mirror mode.
In an embodiment of the present application, after the recommendation decision determines that the recommended video mode is the multi-mirror mode, a prompt control is displayed on the camera application interface, and the prompt control displays the text "multi-mirror mode recommended". In response to the user's operation of triggering the prompt control, the mode switching module switches the current video mode to the multi-mirror mode, and video shooting is performed based on the multi-mirror mode.
The embodiments of the present application provide an overall scheme supporting intelligent scene detection and video mode recommendation in the common video mode: typical user scenes are identified according to the currently supported video modes. That is, in the common video mode, typical-scene detection logic is enabled, and after a typical scene is detected, the corresponding optimal video mode is recommended to the user.
The currently supported video modes comprise an HDR mode, a portrait mode, a principal angle mode, a night scene mode, a macro mode, and a multi-mirror mode. The specific scheme comprises the following steps: entering the common video preview interface, where Master AI (an intelligent shooting assistant) is on by default; performing intelligent scene detection; popping up a recommendation dialog box according to the scene detection result; and entering the corresponding video mode after the user clicks to select the mode. After the selected video mode is entered, scene detection is no longer performed. The current mode is exited by manually tapping the close (X) icon of the current mode, or by tapping to record video, in which case the common video mode is automatically restored after recording is completed and a new round of scene detection is started.
Referring to fig. 16, a flowchart of a video capturing method according to another embodiment of the present application is shown.
S601, entering a common video recording mode.
S602, judging whether the Master AI is on or not. If Master AI is not on, the process proceeds to S603; if Master AI is on, the flow proceeds to S604.
S603, keeping the video mode as a common video recording mode.
S604, master AI carries out shooting scene recognition.
S605, judging whether a matching scene exists. If there is a matching scene, the flow proceeds to S606; if there is no matching scene, the flow proceeds to S603.
S606, based on the identified matching scene, making a decision recommendation for the video mode. The decision recommendation process specifically comprises the following steps:
S607, an HDR scene is detected and remains stable for a certain number of frames; S608, the video mode is switched to the HDR mode.
S609, a night scene is detected and remains stable for a certain number of frames; S610, the video mode is switched to the night scene mode.
S611, a single person is detected with a face proportion greater than or equal to 1/3, stable for a certain number of frames; S612, the video mode is switched to the portrait mode, and the blurring function is enabled.
S613, multiple people are detected with a maximum face proportion less than or equal to 1/5, stable for a certain number of frames; S614, the video mode is switched to the principal angle mode.
S615, a pet is detected, stable for a certain number of frames; S616, the video mode is switched to the multi-mirror mode.
S617, a macro scene is detected, stable for a certain number of frames; S618, the video mode is switched to the macro mode.
S619, when the current video mode is manually turned off, the common video mode is entered.
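Taken together, S607-S618 amount to a table of stabilized conditions mapped to target modes. The sketch below assumes each condition flag has already been held stable for the required number of frames, and that the listed order is used when several conditions hold at once; the patent does not fix an explicit priority order, so both the names and the ordering are assumptions:

```python
# Condition name -> mode to switch to (S607-S618).
MODE_RULES = [
    ("hdr_scene",          "HDR mode"),              # S607 -> S608
    ("night_scene",        "night scene mode"),      # S609 -> S610
    ("single_face_ge_1_3", "portrait mode"),         # S611 -> S612
    ("multi_face_le_1_5",  "principal angle mode"),  # S613 -> S614
    ("pet_detected",       "multi-mirror mode"),     # S615 -> S616
    ("macro_scene",        "macro mode"),            # S617 -> S618
]

def decide_recommendation(stable_flags):
    """stable_flags: dict of condition -> bool, True only once the
    condition has remained stable for the required number of frames."""
    for condition, mode in MODE_RULES:
        if stable_flags.get(condition, False):
            return mode
    return None  # no matching scene: stay in the common video mode (S603)
```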
Referring to fig. 17, decision factors for intelligent scene detection by Master AI are shown, and referring to fig. 18, video specifications of each video mode are shown.
Referring to fig. 19, a flowchart of a video capturing method according to another embodiment of the present application is shown.
S701, video frame data is input.
S702, outputting the video mode to be recommended according to the decision factor.
S703, judging whether the current camera meets the zoom capability required by the video mode to be recommended. If the current camera meets the zoom capability required by the video mode to be recommended, the process goes to S704; if the current camera does not meet the zoom capability required by the video mode to be recommended, the process returns to S701.
In an embodiment of the present application, if the zoom magnification range of the current camera covers the zoom specification required by the video mode to be recommended, it is determined that the current camera meets the zoom capability required by the video mode to be recommended; if it does not, it is determined that the current camera does not meet the zoom capability required by the video mode to be recommended. For example, as shown in fig. 18, the zoom specification of the portrait mode is 1x-2x (1-2 times); if the zoom magnification range of the current camera is 1x-4x, the zoom magnification range of the current camera is determined to cover the zoom specification required by the portrait mode.
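A minimal sketch of this capability check, assuming zoom ranges are expressed as (low, high) magnification pairs:

```python
def meets_zoom_capability(camera_range, mode_spec):
    """True iff the camera's zoom magnification range covers the zoom
    specification required by the video mode to be recommended."""
    cam_low, cam_high = camera_range
    spec_low, spec_high = mode_spec
    return cam_low <= spec_low and cam_high >= spec_high

# Example from fig. 18: portrait mode requires 1x-2x; a 1x-4x camera qualifies.
assert meets_zoom_capability((1.0, 4.0), (1.0, 2.0))
```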
S704, deciding the recommended video mode. That is, the video mode output in S702 is determined as the recommended video mode.
The main differences between this scheme and intelligent scene identification for photographing are: 1) the photographing Master AI only needs to recognize a human face and satisfy the scene-priority condition before entering portrait blurring, whereas the video Master AI enters different modes according to the size of the human face: a face proportion greater than 1/3 enters the portrait blurring mode, and a face proportion smaller than 1/5 enters the principal angle mode; 2) the implementations are different: when the photographing Master AI enters portrait blurring, only blurring and beautification algorithms are added on the common photographing pipeline, whereas the video Master AI performs a mode jump, jumping into the portrait mode (video blurring) or the principal angle mode.
The embodiment of the present application further provides an electronic device 100, as shown in fig. 20, where the electronic device 100 may be a mobile phone, a tablet computer, a desktop computer, a laptop computer, a handheld computer, a notebook computer, an Ultra-mobile personal computer (Ultra-mobile Personal Computer, UMPC), a netbook, a cellular phone, a personal digital assistant (Personal Digital Assistant, PDA), an augmented Reality (Augmented Reality, AR) device, a Virtual Reality (VR) device, an artificial intelligence (Artificial Intelligence, AI) device, a wearable device, a vehicle-mounted device, a smart home device, and/or a smart city device, and the specific type of the electronic device 100 is not particularly limited in the embodiment of the present application.
The electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (Universal Serial Bus, USB) interface 130, a charge management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, a sensor module 180, keys 190, a motor 191, an indicator 192, a camera 193, a display 194, and a subscriber identity module (Subscriber Identification Module, SIM) card interface 195, etc. The sensor module 180 may include a pressure sensor 180A, a gyro sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and the like.
It should be understood that the illustrated structure of the embodiment of the present invention does not constitute a specific limitation on the electronic device 100. In other embodiments of the present application, electronic device 100 may include more or fewer components than shown, or certain components may be combined, or certain components may be split, or different arrangements of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The processor 110 may include one or more processing units, such as: the processor 110 may include an application processor (Application Processor, AP), a modem processor, a graphics processor (Graphics Processing Unit, GPU), an image signal processor (Image Signal Processor, ISP), a controller, a video codec, a digital signal processor (Digital Signal Processor, DSP), a baseband processor, and/or a Neural network processor (Neural-network Processing Unit, NPU), etc. Wherein the different processing units may be separate devices or may be integrated in one or more processors.
The controller can generate operation control signals according to the instruction operation codes and the time sequence signals to finish the control of instruction fetching and instruction execution.
A memory may also be provided in the processor 110 for storing instructions and data. In some embodiments, the memory in the processor 110 is a cache memory. The memory may hold instructions or data that the processor 110 has just used or recycled. If the processor 110 needs to reuse the instruction or data, it can be called directly from the memory. Repeated accesses are avoided and the latency of the processor 110 is reduced, thereby improving the efficiency of the system.
In some embodiments, the processor 110 may include one or more interfaces. The interfaces may include an integrated circuit (Inter-integrated Circuit, I2C) interface, an integrated circuit built-in audio (Inter-integrated Circuit Sound, I2S) interface, a pulse code modulation (Pulse Code Modulation, PCM) interface, a universal asynchronous receiver transmitter (universal asynchronous receiver/transmitter, UART) interface, a mobile industry processor interface (Mobile Industry Processor Interface, MIPI), a General-Purpose Input/Output (GPIO) interface, a subscriber identity module (Subscriber Identity Module, SIM) interface, and/or a universal serial bus (Universal Serial Bus, USB) interface, among others.
The I2C interface is a bi-directional synchronous serial bus, comprising a Serial Data Line (SDA) and a Serial Clock Line (SCL). In some embodiments, the processor 110 may contain multiple sets of I2C buses. The processor 110 may be coupled to the touch sensor 180K, a charger, a flash, the camera 193, etc., respectively, through different I2C bus interfaces. For example: the processor 110 may be coupled to the touch sensor 180K through an I2C interface, such that the processor 110 communicates with the touch sensor 180K through an I2C bus interface to implement the touch function of the electronic device 100.
The I2S interface may be used for audio communication. In some embodiments, the processor 110 may contain multiple sets of I2S buses. The processor 110 may be coupled to the audio module 170 via an I2S bus to enable communication between the processor 110 and the audio module 170. In some embodiments, the audio module 170 may transmit an audio signal to the wireless communication module 160 through the I2S interface, to implement a function of answering a call through the bluetooth headset.
PCM interfaces may also be used for audio communication to sample, quantize and encode analog signals. In some embodiments, the audio module 170 and the wireless communication module 160 may be coupled through a PCM bus interface. In some embodiments, the audio module 170 may also transmit audio signals to the wireless communication module 160 through the PCM interface to implement a function of answering a call through the bluetooth headset. Both the I2S interface and the PCM interface may be used for audio communication.
The UART interface is a universal serial data bus for asynchronous communications. The bus may be a bi-directional communication bus. It converts the data to be transmitted between serial communication and parallel communication. In some embodiments, a UART interface is typically used to connect the processor 110 with the wireless communication module 160. For example: the processor 110 communicates with a bluetooth module in the wireless communication module 160 through a UART interface to implement a bluetooth function. In some embodiments, the audio module 170 may transmit an audio signal to the wireless communication module 160 through a UART interface, to implement a function of playing music through a bluetooth headset.
The MIPI interface may be used to connect the processor 110 to peripheral devices such as a display 194, a camera 193, and the like. The MIPI interfaces include camera serial interfaces (Camera Serial Interface, CSI), display serial interfaces (Display Serial Interface, DSI), and the like. In some embodiments, processor 110 and camera 193 communicate through a CSI interface to implement the photographing functions of electronic device 100. The processor 110 and the display 194 communicate via a DSI interface to implement the display functionality of the electronic device 100.
The GPIO interface may be configured by software. The GPIO interface may be configured as a control signal or as a data signal. In some embodiments, a GPIO interface may be used to connect the processor 110 with the camera 193, the display 194, the wireless communication module 160, the audio module 170, the sensor module 180, and the like. The GPIO interface may also be configured as an I2C interface, an I2S interface, a UART interface, an MIPI interface, etc.
The USB interface 130 is an interface conforming to the USB standard specification, and may specifically be a Mini USB interface, a Micro USB interface, a USB Type C interface, or the like. The USB interface 130 may be used to connect a charger to charge the electronic device 100, and may also be used to transfer data between the electronic device 100 and a peripheral device. And can also be used for connecting with a headset, and playing audio through the headset. The interface may also be used to connect other electronic devices 100, such as AR devices, etc.
It should be understood that the interfacing relationship between the modules illustrated in the embodiments of the present invention is only illustrative, and is not meant to limit the structure of the electronic device 100. In other embodiments of the present application, the electronic device 100 may also use different interfacing manners, or a combination of multiple interfacing manners in the foregoing embodiments.
The charge management module 140 is configured to receive a charge input from a charger. The charger can be a wireless charger or a wired charger. In some wired charging embodiments, the charge management module 140 may receive a charging input of a wired charger through the USB interface 130. In some wireless charging embodiments, the charge management module 140 may receive wireless charging input through a wireless charging coil of the electronic device 100. The charging management module 140 may also supply power to the electronic device 100 through the power management module 141 while charging the battery 142.
The power management module 141 is used for connecting the battery 142, and the charge management module 140 and the processor 110. The power management module 141 receives input from the battery 142 and/or the charge management module 140 to power the processor 110, the internal memory 121, the display 194, the camera 193, the wireless communication module 160, and the like. The power management module 141 may also be configured to monitor battery capacity, battery cycle number, battery health (leakage, impedance) and other parameters. In other embodiments, the power management module 141 may also be provided in the processor 110. In other embodiments, the power management module 141 and the charge management module 140 may be disposed in the same device.
The wireless communication function of the electronic device 100 may be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, a modem processor, a baseband processor, and the like.
The antennas 1 and 2 are used for transmitting and receiving electromagnetic wave signals. Each antenna in the electronic device 100 may be used to cover a single or multiple communication bands. Different antennas may also be multiplexed to improve the utilization of the antennas. For example: the antenna 1 may be multiplexed into a diversity antenna of a wireless local area network. In other embodiments, the antenna may be used in conjunction with a tuning switch.
The mobile communication module 150 may provide a solution for wireless communication including 2G/3G/4G/5G, etc., applied to the electronic device 100. The mobile communication module 150 may include at least one filter, switch, power amplifier, low noise amplifier (Low Noise Amplifier, LNA), etc. The mobile communication module 150 may receive electromagnetic waves from the antenna 1, perform processes such as filtering, amplifying, and the like on the received electromagnetic waves, and transmit the processed electromagnetic waves to the modem processor for demodulation. The mobile communication module 150 can amplify the signal modulated by the modem processor, and convert the signal into electromagnetic waves through the antenna 1 to radiate. In some embodiments, at least some of the functional modules of the mobile communication module 150 may be disposed in the processor 110. In some embodiments, at least some of the functional modules of the mobile communication module 150 may be provided in the same device as at least some of the modules of the processor 110.
The modem processor may include a modulator and a demodulator. The modulator is used for modulating the low-frequency baseband signal to be transmitted into a medium-high frequency signal. The demodulator is used for demodulating the received electromagnetic wave signal into a low-frequency baseband signal. The demodulator then transmits the demodulated low frequency baseband signal to the baseband processor for processing. The low frequency baseband signal is processed by the baseband processor and then transferred to the application processor. The application processor outputs sound signals through an audio device (not limited to the speaker 170A, the receiver 170B, etc.), or displays images or video through the display screen 194. In some embodiments, the modem processor may be a stand-alone device. In other embodiments, the modem processor may be provided in the same device as the mobile communication module 150 or other functional module, independent of the processor 110.
The wireless communication module 160 may provide solutions for wireless communication including wireless local area network (Wireless Local Area Networks, WLAN) (e.g., wireless fidelity (Wireless Fidelity, wi-Fi) network), bluetooth (BT), global navigation satellite system (Global Navigation Satellite System, GNSS), frequency modulation (Frequency Modulation, FM), near field wireless communication technology (Near Field Communication, NFC), infrared technology (IR), etc., as applied to the electronic device 100. The wireless communication module 160 may be one or more devices that integrate at least one communication processing module. The wireless communication module 160 receives electromagnetic waves via the antenna 2, modulates the electromagnetic wave signals, filters the electromagnetic wave signals, and transmits the processed signals to the processor 110. The wireless communication module 160 may also receive a signal to be transmitted from the processor 110, frequency modulate it, amplify it, and convert it to electromagnetic waves for radiation via the antenna 2.
In some embodiments, antenna 1 and mobile communication module 150 of electronic device 100 are coupled, and antenna 2 and wireless communication module 160 are coupled, such that electronic device 100 may communicate with a network and other devices through wireless communication techniques. The wireless communication techniques may include the Global System for Mobile communications (Global System For Mobile Communications, GSM), general packet radio service (General Packet Radio Service, GPRS), code division multiple access (Code Division Multiple Access, CDMA), wideband code division multiple access (Wideband Code Division Multiple Access, WCDMA), time division code division multiple access (Time-Division Code Division Multiple Access, TD-SCDMA), long term evolution (Long Term Evolution, LTE), BT, GNSS, WLAN, NFC, FM, and/or IR techniques, among others. The GNSS may include a global satellite positioning system (Global Positioning System, GPS), a global navigation satellite system (Global Navigation Satellite System, GLONASS), a beidou satellite navigation system (Beidou Navigation Satellite System, BDS), a Quasi zenith satellite system (Quasi-Zenith Satellite System, QZSS) and/or a satellite based augmentation system (Satellite Based Augmentation Systems, SBAS).
The electronic device 100 implements display functions through a GPU, a display screen 194, an application processor, and the like. The GPU is a microprocessor for image processing, and is connected to the display 194 and the application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. Processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
The display screen 194 is used to display images, videos, and the like. The display 194 includes a display panel. The display panel may employ a Liquid Crystal Display (LCD), an Organic Light-Emitting Diode (OLED), an Active-Matrix Organic Light-Emitting Diode (AMOLED), a Flexible Light-Emitting Diode (FLED), a Mini-LED, a Micro-LED, a Micro-OLED, a Quantum dot Light-Emitting Diode (QLED), or the like. In some embodiments, the electronic device 100 may include 1 or N display screens 194, N being a positive integer greater than 1.
The electronic device 100 may implement photographing functions through an ISP, a camera 193, a video codec, a GPU, a display screen 194, an application processor, and the like.
The ISP is used to process data fed back by the camera 193. For example, when photographing, the shutter is opened, light is transmitted to the camera photosensitive element through the lens, the optical signal is converted into an electric signal, and the camera photosensitive element transmits the electric signal to the ISP for processing and is converted into an image visible to naked eyes. ISP can also optimize the noise, brightness and skin color of the image. The ISP can also optimize parameters such as exposure, color temperature and the like of a shooting scene. In some embodiments, the ISP may be provided in the camera 193.
The camera 193 is used to capture still images or video. The object generates an optical image through the lens and projects the optical image onto the photosensitive element. The photosensitive element may be a charge coupled device (Charge Coupled Device, CCD) or a Complementary Metal Oxide Semiconductor (CMOS) phototransistor. The photosensitive element converts the optical signal into an electrical signal, which is then transferred to the ISP to be converted into a digital image signal. The ISP outputs the digital image signal to the DSP for processing. The DSP converts the digital image signal into an image signal in a standard RGB, YUV, or the like format. In some embodiments, electronic device 100 may include 1 or N cameras 193, N being a positive integer greater than 1.
The digital signal processor is used for processing digital signals, and can process other digital signals besides digital image signals. For example, when the electronic device 100 selects a frequency bin, the digital signal processor is used to fourier transform the frequency bin energy, or the like.
Video codecs are used to compress or decompress digital video. The electronic device 100 may support one or more video codecs. In this way, the electronic device 100 may play or record video in a variety of encoding formats, such as: dynamic picture experts group (Moving Picture Experts Group, MPEG) 1, MPEG2, MPEG3, MPEG4, etc.
The NPU is a Neural-Network (NN) computing processor, and can rapidly process input information by referencing a biological Neural Network structure, for example, referencing a transmission mode between human brain neurons, and can also continuously perform self-learning. Applications such as intelligent awareness of the electronic device 100 may be implemented through the NPU, for example: image recognition, face recognition, speech recognition, text understanding, etc.
The internal Memory 121 may include one or more random access memories (Random Access Memory, RAM) and one or more Non-Volatile memories (NVM).
The Random Access Memory may include Static Random-Access Memory (SRAM), dynamic Random-Access Memory (Dynamic Random Access Memory, DRAM), synchronous dynamic Random-Access Memory (Synchronous Dynamic Random Access Memory, SDRAM), double data rate synchronous dynamic Random-Access Memory (Double Data Rate Synchronous Dynamic Random Access Memory, DDR SDRAM, e.g., fifth generation DDR SDRAM is commonly referred to as DDR5 SDRAM), etc.;
the nonvolatile memory may include a disk storage device, a flash memory (flash memory).
The flash memory may include NOR flash, NAND flash, 3D NAND flash, etc., divided according to the operation principle; may include Single-Level Cell (SLC), Multi-Level Cell (MLC), Triple-Level Cell (TLC), Quad-Level Cell (QLC), etc., divided according to the potential level of the storage cell; and may include Universal Flash Storage (UFS), embedded MultiMedia Card (eMMC), etc., divided according to the storage specification.
The random access memory may be read directly from and written to by the processor 110, may be used to store executable programs (e.g., machine instructions) for an operating system or other on-the-fly programs, may also be used to store data for users and applications, and the like.
The nonvolatile memory may store executable programs, store data of users and applications, and the like, and may be loaded into the random access memory in advance for the processor 110 to directly read and write.
The external memory interface 120 may be used to connect external non-volatile memory to enable expansion of the memory capabilities of the electronic device 100. The external nonvolatile memory communicates with the processor 110 through the external memory interface 120 to implement a data storage function. For example, files such as music and video are stored in an external nonvolatile memory.
The internal memory 121 or the external memory interface 120 is used to store one or more computer programs. The one or more computer programs are configured to be executed by the processor 110. The one or more computer programs include a plurality of instructions that, when executed by the processor 110, implement the video shooting method performed on the electronic device 100 in the above embodiments.
The electronic device 100 may implement audio functions through an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, an application processor, and the like. Such as music playing, recording, etc.
The audio module 170 is used to convert digital audio information into an analog audio signal output and also to convert an analog audio input into a digital audio signal. The audio module 170 may also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be disposed in the processor 110, or a portion of the functional modules of the audio module 170 may be disposed in the processor 110.
The speaker 170A, also referred to as a "horn," is used to convert audio electrical signals into sound signals. The electronic device 100 may listen to music, or to hands-free conversations, through the speaker 170A.
A receiver 170B, also referred to as a "earpiece", is used to convert the audio electrical signal into a sound signal. When electronic device 100 is answering a telephone call or voice message, voice may be received by placing receiver 170B in close proximity to the human ear.
Microphone 170C, also referred to as a "microphone" or "microphone", is used to convert sound signals into electrical signals. When making a call or transmitting voice information, the user can sound near the microphone 170C through the mouth, inputting a sound signal to the microphone 170C. The electronic device 100 may be provided with at least one microphone 170C. In other embodiments, the electronic device 100 may be provided with two microphones 170C, and may implement a noise reduction function in addition to collecting sound signals. In other embodiments, the electronic device 100 may also be provided with three, four, or more microphones 170C to enable collection of sound signals, noise reduction, identification of sound sources, directional recording functions, etc.
The earphone interface 170D is used to connect a wired earphone. The headset interface 170D may be the USB interface 130, a 3.5mm Open Mobile Terminal Platform (OMTP) standard interface, or a Cellular Telecommunications Industry Association of the USA (CTIA) standard interface.
The keys 190 include a power-on key, a volume key, etc. The keys 190 may be mechanical keys. Or may be a touch key. The electronic device 100 may receive key inputs, generating key signal inputs related to user settings and function controls of the electronic device 100.
The motor 191 may generate a vibration cue. The motor 191 may be used for incoming call vibration alerting as well as for touch vibration feedback. For example, touch operations acting on different applications (e.g., photographing, audio playing, etc.) may correspond to different vibration feedback effects. The motor 191 may also correspond to different vibration feedback effects by touching different areas of the display screen 194. Different application scenarios (such as time reminding, receiving information, alarm clock, game, etc.) can also correspond to different vibration feedback effects. The touch vibration feedback effect may also support customization.
The indicator 192 may be an indicator light, may be used to indicate a state of charge, a change in charge, a message indicating a missed call, a notification, etc.
The SIM card interface 195 is used to connect a SIM card. The SIM card may be inserted into the SIM card interface 195, or removed from the SIM card interface 195, to enable contact with and separation from the electronic device 100. The electronic device 100 may support 1 or N SIM card interfaces, N being a positive integer greater than 1. The SIM card interface 195 may support Nano SIM cards, Micro SIM cards, and the like. The same SIM card interface 195 may be used to insert multiple cards simultaneously. The types of the plurality of cards may be the same or different. The SIM card interface 195 may also be compatible with different types of SIM cards. The SIM card interface 195 may also be compatible with external memory cards. The electronic device 100 interacts with the network through the SIM card to realize functions such as calls and data communication. In some embodiments, the electronic device 100 employs an eSIM, i.e., an embedded SIM card. The eSIM card can be embedded in the electronic device 100 and cannot be separated from the electronic device 100.

The present embodiment also provides a computer storage medium, in which computer instructions are stored, which, when executed on the electronic device 100, cause the electronic device 100 to execute the above-mentioned related method steps to implement the video shooting method in the above embodiments.
The present application also provides a computer program product, which when run on a computer, causes the computer to perform the above-mentioned related steps to implement the video capturing method in the above-mentioned embodiments.
In addition, embodiments of the present application also provide an apparatus, which may be specifically a chip, a component, or a module, and may include a processor and a memory connected to each other; the memory is used for storing computer-executable instructions, and when the device is operated, the processor can execute the computer-executable instructions stored in the memory, so that the chip can execute the video shooting method in each method embodiment.
The electronic device, the computer storage medium, the computer program product, or the chip provided in this embodiment are used to execute the corresponding methods provided above, so that the beneficial effects thereof can be referred to the beneficial effects in the corresponding methods provided above, and will not be described herein.
From the foregoing description of the embodiments, it will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of functional modules is illustrated, and in practical application, the above-described functional allocation may be implemented by different functional modules according to needs, i.e. the internal structure of the apparatus is divided into different functional modules to implement all or part of the functions described above.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the modules or units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another apparatus, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and the parts displayed as units may be one physical unit or a plurality of physical units, may be located in one place, or may be distributed in a plurality of different places. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated unit may be stored in a readable storage medium if implemented in the form of a software functional unit and sold or used as a stand-alone product. Based on such understanding, the technical solution of the embodiments of the present application may be essentially or a part contributing to the prior art or all or part of the technical solution may be embodied in the form of a software product stored in a storage medium, including several instructions to cause a device (may be a single-chip microcomputer, a chip or the like) or a processor (processor) to perform all or part of the steps of the methods of the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
Finally, it should be noted that the above embodiments are merely for illustrating the technical solution of the present application and not for limiting, and although the present application has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that the technical solution of the present application may be modified or substituted without departing from the spirit and scope of the technical solution of the present application.

Claims (17)

1. A video capturing method, the method comprising:
responding to a first operation of a user on a camera application program, displaying a first interface, wherein the first interface comprises a first video, and the first video is video frame data shot by a camera;
identifying scene information corresponding to the video frame data, displaying a second interface, wherein the second interface displays a first video and a first control, and the first control corresponds to a first video mode;
and responding to a second operation of the first control by the user, displaying a third interface, wherein the third interface comprises a second video, and the second video is obtained by processing the first video by using the first video mode.
2. The video shooting method of claim 1, wherein the identifying scene information corresponding to the video frame data comprises:
if the video frame data contains a human face, calculating the proportion of the size of a human face area in the video frame image of the video frame data;
if the proportion of the size of the face area in the video frame data is larger than or equal to a first preset value, determining that scene information corresponding to the video frame data is a blurring scene.
3. The video photographing method of claim 2, wherein the method further comprises:
and determining a first video mode corresponding to the blurring scene as a portrait mode, wherein the second video is obtained by processing the first video by using the portrait mode.
4. The video shooting method of claim 2, wherein the identifying scene information corresponding to the video frame data further comprises:
if the proportion of the size of the face area in the video frame image is smaller than or equal to a second preset value, determining that scene information corresponding to the video frame data is a main angle scene, wherein the second preset value is smaller than the first preset value.
5. The video shooting method of claim 4, wherein the method further comprises:
determining a second video mode corresponding to the principal angle scene as a principal angle mode;
the second interface displays a first video and a second control, and the second control corresponds to the second video mode;
and responding to a second operation of the second control by the user, displaying a third interface, wherein the third interface comprises a third video, and the third video is obtained by processing the first video by using the second video mode.
6. The video shooting method of claim 4, wherein the identifying scene information corresponding to the video frame data further comprises:
acquiring a continuous preset number of video frame images in the video frame data, and acquiring brightness information of each video frame image;
and if the brightness information of the continuous preset number of video frame images is smaller than or equal to preset brightness, determining that the scene information corresponding to the video frame data is a night scene.
7. The video shooting method of claim 6, wherein the method further comprises:
determining a third video mode corresponding to the night scene as a night scene mode;
the second interface displays a first video and a third control, and the third control corresponds to the third video mode;
and responding to a second operation of the third control by the user, displaying a third interface, wherein the third interface comprises a fourth video, and the fourth video is obtained by processing the first video by using the third video mode.
8. The video shooting method of claim 6, wherein the identifying scene information corresponding to the video frame data further comprises:
Acquiring a continuous preset number of video frame images in the video frame data, and acquiring a dynamic range value of each video frame image;
and if the dynamic range value of the continuous preset number of video frame images is larger than or equal to the preset dynamic range value, determining that the scene information corresponding to the video frame data is a high dynamic range scene.
9. The video shooting method of claim 8, wherein the method further comprises:
determining a fourth video mode corresponding to the high dynamic range scene to be a high dynamic range mode;
displaying, on the second interface, the first video and a fourth control, wherein the fourth control corresponds to the fourth video mode;
and in response to a second operation performed by the user on the fourth control, displaying the third interface, wherein the third interface comprises a fifth video, and the fifth video is obtained by processing the first video using the fourth video mode.
10. The video shooting method of claim 8, wherein the identifying scene information corresponding to the video frame data further comprises:
acquiring movement data of a voice coil motor of the camera, a focusing state of the camera, and correction data of the camera;
and if the movement data of the voice coil motor is greater than or equal to preset movement data, the correction data of the camera is normal, and the focusing state of the camera indicates a focus failure, determining that the scene information corresponding to the video frame data is a macro scene.
11. The video shooting method of claim 10, wherein the method further comprises:
determining a fifth video mode corresponding to the macro scene to be a macro mode;
displaying, on the second interface, the first video and a fifth control, wherein the fifth control corresponds to the fifth video mode;
and in response to a second operation performed by the user on the fifth control, displaying the third interface, wherein the third interface comprises a sixth video, and the sixth video is obtained by processing the first video using the fifth video mode.
12. The video shooting method of claim 10, wherein the identifying scene information corresponding to the video frame data further comprises:
acquiring a preset number of consecutive video frame images from the video frame data, and if each of the consecutive video frame images contains a pet, determining that the scene information corresponding to the video frame data is a multi-lens scene.
13. The video shooting method of claim 12, wherein the method further comprises:
determining a sixth video mode corresponding to the multi-lens scene to be a multi-lens mode;
displaying, on the second interface, the first video and a sixth control, wherein the sixth control corresponds to the sixth video mode;
and in response to a second operation performed by the user on the sixth control, displaying the third interface, wherein the third interface comprises a seventh video, and the seventh video is obtained by processing the first video using the sixth video mode.
14. The video photographing method of claim 1, wherein the method further comprises:
judging whether the camera meets the zoom capability required by the first video mode;
and if the camera meets the zoom capability required by the first video mode, displaying the first control on the second interface.
15. An electronic device comprising a memory and a processor, wherein:
the memory is used for storing program instructions;
the processor is configured to read and execute the program instructions stored in the memory, and the program instructions, when executed by the processor, cause the electronic device to perform the video shooting method according to any one of claims 1 to 14.
16. A chip coupled to a memory in an electronic device, wherein the chip is configured to control the electronic device to perform the video shooting method of any one of claims 1 to 14.
17. A computer storage medium storing program instructions which, when run on an electronic device, cause the electronic device to perform the video shooting method of any one of claims 1 to 14.
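
Claims 2 to 5 reduce the first recommendation to a two-threshold test on the face-area ratio of a frame. The following is a minimal Python sketch of that test; the face detector, the data types, and both preset values are illustrative assumptions, since the claims do not disclose concrete thresholds.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical preset values; the claims only require SECOND_PRESET < FIRST_PRESET.
FIRST_PRESET = 0.25
SECOND_PRESET = 0.10

@dataclass
class FaceBox:
    x: int
    y: int
    w: int
    h: int

def recommend_mode(face: Optional[FaceBox], frame_w: int, frame_h: int) -> Optional[str]:
    """Map the face-area ratio of one video frame image to a recommended mode."""
    if face is None:
        return None  # no face detected: no face-based recommendation
    ratio = (face.w * face.h) / (frame_w * frame_h)
    if ratio >= FIRST_PRESET:
        return "portrait"         # virtual scene -> portrait mode (claims 2-3)
    if ratio <= SECOND_PRESET:
        return "principal angle"  # principal angle scene -> principal angle mode (claims 4-5)
    return None                   # ratio between the two presets: no recommendation

# A 200x240 face in a 1920x1080 frame gives a ratio of about 0.023,
# so the principal angle mode would be recommended.
print(recommend_mode(FaceBox(860, 420, 200, 240), 1920, 1080))
```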
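
Claims 6 and 7 gate the night-scene recommendation on the brightness of a run of consecutive frames. A sketch under stated assumptions follows: the frame count, the brightness threshold, and the use of mean luma as the "brightness information" are all illustrative choices, not taken from the claims.

```python
import numpy as np

PRESET_COUNT = 30         # hypothetical preset number of consecutive frames
PRESET_BRIGHTNESS = 40.0  # hypothetical preset brightness (mean luma on a 0-255 scale)

def is_night_scene(lumas: "list[np.ndarray]") -> bool:
    """True if the last PRESET_COUNT luma frames all sit at or below the preset brightness."""
    if len(lumas) < PRESET_COUNT:
        return False
    return all(float(frame.mean()) <= PRESET_BRIGHTNESS for frame in lumas[-PRESET_COUNT:])

# Thirty dark 8-bit frames (mean luma 20) trigger the night-scene classification.
dark = [np.full((1080, 1920), 20, dtype=np.uint8) for _ in range(30)]
print(is_night_scene(dark))  # True
```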
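
Claims 8 and 9 apply the same consecutive-frame pattern to a per-frame dynamic range value. The claims do not define how that value is computed; the sketch below uses a percentile-ratio proxy in decibels, with the frame count and the threshold again as illustrative assumptions.

```python
import numpy as np

PRESET_COUNT = 30    # hypothetical preset number of consecutive frames
PRESET_DR_DB = 40.0  # hypothetical preset dynamic range value, in dB

def dynamic_range_db(luma: np.ndarray) -> float:
    """One plausible proxy: the bright/dark percentile ratio of the luma, expressed in dB."""
    dark = max(float(np.percentile(luma, 1)), 1.0)  # clamp to avoid log10(0)
    bright = max(float(np.percentile(luma, 99)), dark)
    return 20.0 * float(np.log10(bright / dark))

def is_hdr_scene(lumas: "list[np.ndarray]") -> bool:
    """True if PRESET_COUNT consecutive frames all meet or exceed the preset value."""
    if len(lumas) < PRESET_COUNT:
        return False
    return all(dynamic_range_db(frame) >= PRESET_DR_DB for frame in lumas[-PRESET_COUNT:])
```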
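
Claim 10 identifies a macro scene from three camera signals holding at once: large voice coil motor travel, normal correction (calibration) data, and a failed focus state. A minimal sketch, with the movement threshold as an illustrative assumption:

```python
from dataclasses import dataclass

PRESET_MOVEMENT = 300  # hypothetical preset movement data, in VCM code units

@dataclass
class CameraState:
    vcm_movement: int    # movement data of the voice coil motor
    correction_ok: bool  # correction data of the camera is normal
    focus_failed: bool   # focusing state of the camera indicates failure

def is_macro_scene(state: CameraState) -> bool:
    """All three conditions of claim 10 must hold simultaneously."""
    return (state.vcm_movement >= PRESET_MOVEMENT
            and state.correction_ok
            and state.focus_failed)

# Large lens travel with healthy calibration but no focus lock suggests the
# subject is closer than the normal focus range, i.e. a macro scene.
print(is_macro_scene(CameraState(vcm_movement=420, correction_ok=True, focus_failed=True)))  # True
```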
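
Claims 12 and 13 reuse the consecutive-frame pattern with a per-frame pet test. How pets are recognized is outside the claims, so the sketch below takes the detector as an injected callable; the frame count is again a hypothetical figure.

```python
from typing import Any, Callable, Sequence

PRESET_COUNT = 15  # hypothetical preset number of consecutive frames

def is_multi_lens_scene(frames: Sequence[Any],
                        contains_pet: Callable[[Any], bool]) -> bool:
    """True if every one of the last PRESET_COUNT frames contains a pet."""
    if len(frames) < PRESET_COUNT:
        return False
    return all(contains_pet(frame) for frame in frames[-PRESET_COUNT:])

# With a stub detector that always finds a pet, 15 frames are enough.
print(is_multi_lens_scene(list(range(15)), lambda frame: True))  # True
```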
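
Claim 14 adds a capability gate: the control for a recommended mode is shown only if the camera can supply the zoom that mode requires. A minimal sketch, with the per-mode zoom requirements as hypothetical figures (the claim states only that such a required capability exists, not its values):

```python
# Hypothetical per-mode zoom requirements, expressed as maximum zoom ratios.
REQUIRED_ZOOM = {
    "portrait": 2.0,
    "principal angle": 2.0,
    "night scene": 1.0,
}

def should_show_control(mode: str, camera_max_zoom: float) -> bool:
    """Show the mode's control on the second interface only if the camera meets its zoom need."""
    return camera_max_zoom >= REQUIRED_ZOOM.get(mode, 1.0)

print(should_show_control("principal angle", camera_max_zoom=1.0))  # False: control hidden
```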
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311128121.XA 2023-01-04 2023-01-04 Video shooting method and related equipment

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202310006974.XA 2023-01-04 2023-01-04 Video shooting method and related equipment
CN202311128121.XA 2023-01-04 2023-01-04 Video shooting method and related equipment

Related Parent Applications (1)

Application Number Priority Date Filing Date Title
CN202310006974.XA (Division) 2023-01-04 2023-01-04 Video shooting method and related equipment

Publications (1)

Publication Number Publication Date
CN117336597A 2024-01-02

Family ID: 85428544

Family Applications (2)

Application Number Publication Status Priority Date Filing Date Title
CN202310006974.XA CN115802144B Active 2023-01-04 2023-01-04 Video shooting method and related equipment
CN202311128121.XA CN117336597A Pending 2023-01-04 2023-01-04 Video shooting method and related equipment

Country Status (1)

Country Publications
CN CN115802144B; CN117336597A

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103024165A (en) * 2012-12-04 2013-04-03 Huawei Device Co., Ltd. Method and device for automatically setting shooting mode
CN112532859A (en) * 2019-09-18 2021-03-19 Huawei Technologies Co., Ltd. Video acquisition method and electronic equipment
CN113313626A (en) * 2021-05-20 2021-08-27 Guangdong OPPO Mobile Telecommunications Corp., Ltd. Image processing method, image processing device, electronic equipment and storage medium

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4421960B2 (en) * 2003-08-15 2010-02-24 Fujifilm Corporation Image processing apparatus and method, and program
JP5073602B2 (en) * 2008-07-23 2012-11-14 Olympus Imaging Corp. Imaging device and imaging device control method
JP5884421B2 (en) * 2011-11-14 2016-03-15 Sony Corporation Image processing apparatus, image processing apparatus control method, and program
CN105530422B (en) * 2014-09-30 2019-03-29 Lenovo (Beijing) Co., Ltd. Control method and control device of electronic equipment, and electronic equipment
CN107465856B (en) * 2017-08-31 2020-01-14 Guangdong OPPO Mobile Telecommunications Corp., Ltd. Image pickup method and device and terminal equipment
CN108111754B (en) * 2017-12-18 2021-03-12 Vivo Mobile Communication Co., Ltd. Method for determining image acquisition mode and mobile terminal
RU2748303C1 (en) * 2018-03-26 2021-05-21 Huawei Technologies Co., Ltd. Intelligent assistant control method and terminal device
CN113194242B (en) * 2020-01-14 2022-09-20 Honor Device Co., Ltd. Shooting method in long-focus scene and mobile terminal
CN111770277A (en) * 2020-07-31 2020-10-13 Realme Chongqing Mobile Telecommunications Corp., Ltd. Auxiliary shooting method, terminal and storage medium

Also Published As

Publication number Publication date
CN115802144A (en) 2023-03-14
CN115802144B (en) 2023-09-05

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination