WO2023231622A1 - Video editing method and electronic device - Google Patents


Info

Publication number
WO2023231622A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
video
protagonist
close
terminal
Prior art date
Application number
PCT/CN2023/089100
Other languages
French (fr)
Chinese (zh)
Other versions
WO2023231622A9 (en)
Inventor
韩钰卓
朱世宇
张志超
代秋平
张农
杜远超
Original Assignee
Honor Device Co., Ltd. (荣耀终端有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Honor Device Co., Ltd. (荣耀终端有限公司)
Publication of WO2023231622A1 publication Critical patent/WO2023231622A1/en
Publication of WO2023231622A9 publication Critical patent/WO2023231622A9/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M1/00 Substation equipment, e.g. for use by subscribers
    • H04M1/72 Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
    • H04M1/724 User interfaces specially adapted for cordless or mobile telephones
    • H04M1/72403 User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality
    • H04M1/7243 User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality with interactive means for internal management of messages
    • H04M1/72439 User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality with interactive means for internal management of messages for image or video messaging
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00 Details of television systems
    • H04N5/76 Television signal recording

Definitions

  • the present application relates to the field of terminals, and in particular, to a video editing method and electronic device.
  • the terminal device can receive the protagonist selected by the user, then continuously track the protagonist during the subsequent video recording process, and obtain a close-up video in which the selected protagonist is always at the center of the frame.
  • This application provides a video editing method and an electronic device.
  • the user can select an object in the image as the protagonist, and the electronic device can automatically track and record the protagonist in the above image and save a close-up video of the protagonist.
  • the present application provides a video editing method applied to an electronic device. The method includes: displaying, in a first interface, a first image and one or more markers associated with the first image, where the first image includes one or more objects, and the one or more markers associated with the first image respectively correspond to the one or more objects in the first image; the first image is an image currently collected by a camera of the electronic device, or a frame of a first video stored in the electronic device; detecting a first operation acting on a first marker; in response to the first operation, determining a first object as the protagonist and obtaining close-up images centered on the protagonist, where the one or more markers associated with the first image include the first marker, the one or more objects in the first image include the first object, and the first marker corresponds to the first object; and generating, based on the close-up images centered on the protagonist, a second video centered on the protagonist.
  • the user can select an object in the image collected by the camera as the protagonist; when recording the original video collected by the camera, the electronic device can automatically track the protagonist in the image sequence collected by the camera and record a close-up video of the protagonist.
  • the electronic device can display the local video selected by the user, and the user can select an object in one frame of the local video as the protagonist; the electronic device can automatically track the protagonist in that frame and subsequent frames of the local video, and save a close-up video of the protagonist.
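The flow described above can be sketched in Python. This is a minimal illustration, not the application's actual implementation: the box format (x0, y0, x1, y1), the frame sizes, and the output size are assumptions, and a real device would use its tracker's per-frame output in place of the precomputed protagonist boxes passed in here.

```python
# Hypothetical sketch of the protagonist-mode flow: for each original
# frame, a crop centred on the tracked protagonist is produced, and
# the sequence of crops forms the close-up video.

def crop_around(frame_w, frame_h, box, out_w, out_h):
    """Return an out_w x out_h crop rectangle centred on the box,
    clamped so it stays inside the frame."""
    cx = (box[0] + box[2]) / 2
    cy = (box[1] + box[3]) / 2
    x0 = min(max(cx - out_w / 2, 0), frame_w - out_w)
    y0 = min(max(cy - out_h / 2, 0), frame_h - out_h)
    return (int(x0), int(y0), int(x0) + out_w, int(y0) + out_h)

def record_close_up(frame_sizes, protagonist_boxes, out_size=(640, 360)):
    """Produce one crop rectangle per frame in which the protagonist
    was found; frames where tracking lost the protagonist (None) are
    skipped."""
    crops = []
    for (w, h), box in zip(frame_sizes, protagonist_boxes):
        if box is None:
            continue
        crops.append(crop_around(w, h, box, *out_size))
    return crops
```

On a real device the per-frame protagonist boxes would come from the detection-and-matching step the application describes later; the clamping step is what keeps the close-up window inside the original image near the frame edges.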
  • the method further includes: displaying, in the first interface, a second image and one or more markers associated with the second image, where the second image includes one or more objects, and the one or more markers associated with the second image respectively correspond to the one or more objects in the second image; the second image is an image collected by the camera of the electronic device after the first image, or a frame after the first image in the first video; detecting a fifth operation acting on a second marker; and in response to the fifth operation, switching the protagonist to a second object, where the one or more markers associated with the second image include the second marker, the one or more objects in the second image include the second object, and the second marker corresponds to the second object. Obtaining close-up images centered on the protagonist then includes: generating, based on the images between the first image and the second image that include the first object, close-up images centered on the first object, and generating, based on the second image and subsequent images, close-up images centered on the second object. The second video includes a first sub-video generated from the close-up images of the first object and a second sub-video generated from the close-up images of the second object.
  • the electronic device can also determine a new protagonist based on the images collected by the camera, for example, switching the protagonist from the first object to the second object, and then record a close-up video of the second object.
  • in the scenario where the electronic device displays a local video, the user can also select another object in another frame of the local video as the new protagonist, for example, switching the protagonist from the first object to the second object; the electronic device can then automatically track the new protagonist in that frame and subsequent frames of the local video, and record a close-up video of the second object.
  • the electronic device can save the close-up video of the first object and the close-up video of the second object separately, or combine the two close-up videos into one video and save it.
  • the step of obtaining close-up images centered on the protagonist is specifically: generating close-up images centered on the first object based on the images that include the first object, from the first image to the last frame of the first video.
  • when the second image is a frame after the first image in the first video, before the first interface displays the first image and the one or more markers associated with the first image, the method further includes: displaying a thumbnail of the first video; and detecting a second operation acting on the thumbnail of the first video. Displaying the first image and the one or more markers associated with the first image on the first interface includes: in response to the second operation, displaying, on the first interface, the first frame of the first video and one or more markers corresponding to one or more objects in the first frame, where the first image is the first frame.
  • the user can trigger the electronic device to display the first interface for playing the local video through the thumbnail of the local video displayed in a specific application of the electronic device (such as the gallery). When the first interface displays the local video, the markers corresponding to each object in the image can be displayed automatically, without user operation, allowing the user to select the protagonist.
  • when the second image is a frame after the first image in the first video, before the first interface displays the first image and the one or more markers associated with the first image, the method further includes: displaying the first frame of the first video and a first control on the first interface; detecting a third operation acting on the first control; and in response to the third operation, playing the first video. Displaying the first image and the one or more markers associated with the first image on the first interface includes: when the first video is played to the M-th frame, displaying, on the first interface, the M-th frame and one or more markers associated with the M-th frame.
  • when the first interface displays an image in a local video, the user can trigger the electronic device, through a specified operation, to display the markers corresponding to each object in the image. In this way, there is no need to perform object recognition on every frame of the local video, saving the power consumed by object recognition.
  • displaying the M-th frame and the one or more markers associated with the M-th frame on the first interface includes: when playback reaches the M-th frame, displaying the M-th frame on the first interface; detecting a fourth operation acting on the first control; in response to the fourth operation, pausing playback of the first video and displaying the currently played M-th frame; and in response to the pause operation, displaying, on the M-th frame, the one or more markers associated with the M-th frame.
  • when the local video is paused, the electronic device displays the markers corresponding to each object only for the image currently displayed. In this way, there is no need to perform object recognition on every frame of the local video, saving the power consumed by object recognition.
  • the first interface also includes a second control, and generating the second video centered on the protagonist based on the close-up images centered on the protagonist includes: detecting a sixth operation acting on the second control; and in response to the sixth operation, generating, based on the close-up images centered on the protagonist, the second video centered on the protagonist.
  • the user can control the electronic device to stop recording the close-up video through a preset operation.
  • the second control is a control used to stop recording.
  • the first interface for recording video includes a control for stopping the recording. During the process of recording the close-up video of the protagonist, the user can control the electronic device to stop recording the close-up video through the above control.
  • the method further includes: in response to the sixth operation, the camera stops collecting images, and the original video is generated and saved based on the images collected by the camera.
  • when the user controls the electronic device to stop recording the original video through a preset operation, the electronic device also automatically stops recording the close-up video of the protagonist.
  • the method further includes: displaying a first window, and displaying a close-up image centered on the protagonist in the first window.
  • when recording a close-up video of the protagonist, the user can preview the recording process of the close-up video in real time through the first window.
  • when the first image is an image currently collected by the camera of the electronic device, the method further includes: detecting a first trigger condition, where the first trigger condition is that the protagonist is not included in the Y consecutive frames after the first image. Generating the second video centered on the protagonist based on the close-up images centered on the protagonist is then specifically: in response to the first trigger condition, generating, based on the close-up images centered on the protagonist, the second video centered on the protagonist.
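The first trigger condition above (the protagonist absent for Y consecutive frames) can be checked with a simple counter. This is a hedged sketch, where `protagonist_present` is an assumed per-frame boolean sequence produced by the device's tracker:

```python
def find_stop_index(protagonist_present, y):
    """Return the index of the first frame of the first run of y
    consecutive frames without the protagonist, or None if the
    protagonist never disappears for y frames in a row."""
    misses = 0
    for i, present in enumerate(protagonist_present):
        misses = 0 if present else misses + 1  # reset on each sighting
        if misses == y:
            return i - y + 1  # start of the run that triggered the stop
    return None
```

A single missed frame does not end the recording; only a sustained absence of Y frames triggers generation of the close-up video, which matches the condition described in the implementation above.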
  • generating close-up images centered on the first object based on the images between the first image and the second image that include the first object includes: obtaining, from the first image, a first close-up image centered on the first object; and obtaining, from a third image, a third close-up image centered on the first object, where the third image is an image after the first image and before the second image. The second video includes the first close-up image and the third close-up image.
  • before obtaining the third close-up image centered on the first object from the third image, the method further includes: determining whether the third image includes the first object. Obtaining the third close-up image centered on the first object is then specifically: when the third image includes the first object, obtaining, from the third image, the third close-up image centered on the first object.
  • the protagonist can be located in the images collected by the camera or in the local video, and those images can be cropped to obtain close-up images centered on the protagonist.
  • determining that the first object is included in the third image includes: using a human body detection algorithm to identify the human body image areas in the third image; when the human body image areas in the third image do not overlap, calculating the intersection-over-union (IoU) distance between each human body image area in the third image and the human body image area of the protagonist in the first image, and determining the first human body image area whose IoU distance is the smallest and satisfies the IoU distance threshold, the object corresponding to the first human body image area being the protagonist; when the human body image areas in the third image overlap, calculating the IoU distance and the re-identification (ReID) distance between each human body image area in the third image and the human body image area of the protagonist in the first image, and determining the first human body image area whose combined IoU and ReID distance is the smallest and satisfies the IoU+ReID distance threshold, the object corresponding to the first human body image area being the protagonist.
  • in this way, the human body image area of the protagonist can be accurately identified through the IoU distance, supplemented by the ReID distance when the human body image areas overlap.
  • obtaining the third close-up image centered on the protagonist from the third image specifically includes: determining the third close-up image including the first human body image area based on the first human body image area.
  • determining the third close-up image including the first human body image area based on the first human body image area specifically includes: determining a first scaling ratio based on the first human body image area; and determining the third close-up image based on the first scaling ratio.
  • the scaling ratio is used to reflect the size of the protagonist in the original image.
  • the size of the close-up image of the protagonist determined based on the scaling ratio can be adapted to the small window used to display the close-up image, thereby avoiding image deformation when the small window displays the close-up image.
  • determining the first scaling ratio based on the first human body image area specifically includes: determining the first scaling ratio based on the size of the largest human body image area in the third image and the size of the first human body image area.
  • the zoom ratio can be adjusted in real time based on the size of the human body image area of the protagonist in each image collected by the camera or each image displayed in the local video.
  • determining the size of the third close-up image based on the first scaling ratio specifically includes: determining the size of the third close-up image based on the first scaling ratio and the preset size of the second video.
  • the size of the close-up image cropped from each image can be adjusted in real time, based on the scaling ratio corresponding to each image collected by the camera or in the local video and on the size of the close-up video, to ensure that the close-up image adapts to the small window used to display close-up images.
  • the aspect ratio of the third close-up image is the same as the preset aspect ratio of the second video. Implementing the embodiments of the present application ensures that the close-up image can adapt to the small window used to display the close-up image, thereby avoiding the problem of image deformation when the small window displays the close-up image.
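As an illustration of the sizing constraint above, the following sketch derives a crop size whose aspect ratio matches the preset aspect ratio of the second video. The height-based formula and the margin factor are assumptions of this sketch; the application states only that the crop size follows from the scaling ratio of the protagonist's body image area and that the crop keeps the preset aspect ratio.

```python
def close_up_size(body_box, preset_w, preset_h, margin=1.5):
    """Derive a crop size from the protagonist's body box: scale the
    body height by a margin factor (hypothetical choice), then set the
    width from the preset aspect ratio so the crop fits the small
    window without deformation. Boxes are (x0, y0, x1, y1)."""
    body_h = body_box[3] - body_box[1]
    crop_h = int(body_h * margin)
    crop_w = int(crop_h * preset_w / preset_h)
    return crop_w, crop_h
```

Because the width is always computed from the height via the preset ratio, scaling the close-up into the small window never stretches the image, which is the deformation problem the paragraph above is addressing.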
  • the present application provides an electronic device, including one or more processors and one or more memories; the one or more memories are coupled to the one or more processors and are used to store computer program code, and the computer program code includes computer instructions; when the one or more processors execute the computer instructions, the electronic device is caused to perform the video editing method in any possible implementation of the first aspect above.
  • embodiments of the present application provide a computer storage medium that includes computer instructions; when the computer instructions are run on an electronic device, the electronic device is caused to perform the video editing method in any possible implementation of the first aspect above.
  • embodiments of the present application provide a computer program product; when the computer program product is run on a computer, the computer is caused to perform the video editing method in any possible implementation of the first aspect above.
  • Figures 1A to 1M and 1O to 1P are schematic user interface diagrams of a set of protagonist mode shooting methods provided by embodiments of the present application;
  • FIG. 1N is a schematic diagram of a terminal 100 saving a captured close-up video in a shooting scene provided by an embodiment of the present application;
  • Figures 2A-2B and 2D-2E are schematic user interface diagrams of a set of protagonist mode shooting methods provided by embodiments of the present application;
  • Figure 2C is a schematic diagram of a terminal 100 saving a captured close-up video in a shooting scene provided by an embodiment of the present application;
  • Figures 3A-3C are schematic user interface diagrams of a set of protagonist mode shooting methods provided by embodiments of the present application.
  • Figures 4A-4H are schematic user interface diagrams of a set of protagonist mode shooting methods provided by embodiments of the present application.
  • Figures 5A, 5B-1 to 5B-4, and 5C to 5E are schematic user interface diagrams of a set of protagonist mode shooting methods provided by embodiments of the present application;
  • Figure 6 is a flow chart of the terminal 100 editing and generating a close-up video in a shooting scene provided by the embodiment of the present application;
  • Figure 7A is a flow chart of the terminal 100 performing object recognition and marking provided by the embodiment of the present application.
  • Figure 7B is a schematic diagram of the terminal 100 determining the face image and human body image in the image provided by the embodiment of the present application;
  • Figure 7C is a schematic diagram of the terminal 100 determining the display position of the selection box provided by the embodiment of the present application.
  • Figure 8A is a flow chart for the terminal 100 to determine the close-up image centered on the protagonist provided by the embodiment of the present application;
  • Figure 8B is a flow chart for the terminal 100 to determine the size of the close-up image provided by the embodiment of the present application.
  • Figures 8C-8D are schematic diagrams of terminal 100 adaptively adjusting close-up images to adapt to window display provided by embodiments of the present application;
  • Figure 9 is a flow chart of the terminal 100 locating the protagonist in subsequent image frames provided by the embodiment of the present application.
  • Figure 10A is a frame of images in which objects do not overlap in a multi-object scene provided by an embodiment of the present application
  • Figure 10B is a frame image of overlapping objects in a multi-object scene provided by an embodiment of the present application.
  • Figures 10C to 10D are schematic diagrams of the terminal 100 using the IoU distance to locate the protagonist provided by the embodiment of the present application;
  • Figure 11 is a schematic diagram of the terminal 100 determining the ReID distance of the protagonist in the image provided by the embodiment of the present application;
  • Figure 12A is a flow chart of another method for the terminal 100 to locate the protagonist in subsequent image frames provided by the embodiment of the present application;
  • Figure 12B is a flow chart of another terminal 100 editing and generating close-up videos in a shooting scene provided by an embodiment of the present application;
  • Figure 13 is a flow chart for the terminal 100 to generate a close-up video in the scenario of editing local videos provided by the embodiment of the present application;
  • Figure 14 is a schematic system structure diagram of the terminal 100 provided by the embodiment of the present application.
  • Figure 15 is a schematic diagram of the hardware structure of the terminal 100 provided by the embodiment of the present application.
  • terminal devices (collectively denoted as terminal 100 below) such as mobile phones and tablet computers with photographing and image-processing functions can identify multiple objects in the images of a multi-object scene, automatically track the user-specified object, and generate and save a close-up video of that object. At the same time, the terminal 100 can also save the original video.
  • the original video is composed of original images collected by the camera.
  • the close-up video is based on the original image and cropped with the protagonist in the original image as the center.
  • a close-up video is one in which the main character is always the center of the shot. In this way, after selecting the protagonist, the user can not only shoot a close-up video centered on the protagonist, but also obtain the original video composed of the original images collected by the original camera.
  • the terminal 100 can also identify the objects included in a local video, and then determine the protagonist of the video according to the user's selection operation. After determining the protagonist, the terminal 100 may also perform an editing operation on the above-mentioned local video to extract a close-up video of the protagonist, thereby obtaining a close-up video in which the protagonist is always the center of the shot.
  • the terminal 100 can also be a desktop computer, a laptop computer, a handheld computer, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a cellular phone, a personal digital assistant (PDA), an augmented reality (AR) device, a virtual reality (VR) device, an artificial intelligence (AI) device, a wearable device, a vehicle-mounted device, a smart home device, and/or smart city equipment.
  • FIG. 1A exemplarily shows the user interface 101 of the terminal 100 enabling the camera to perform a shooting action.
  • the user interface 101 may include a mode bar 111 , a shooting control 112 , a preview window 113 , a review control 114 , and a conversion control 115 .
  • the mode bar 111 may display multiple shooting mode options, such as night scene, video, photo, portrait and other shooting modes.
  • Night scene mode can be used to take photos in dark scenes, such as taking photos at night.
  • Video mode can be used to record videos.
  • Photo mode can be used to take photos in daylight scenes.
  • Portrait mode can be used to take close-up photos of people.
  • the mode bar 111 also includes a protagonist mode.
  • the protagonist mode corresponds to the shooting method provided by the embodiment of the present application: during the process of shooting a video, the protagonist in the video is determined and automatically tracked, and the original video and the close-up video of the protagonist with the protagonist as the shooting center are saved.
  • the shooting control 112 may be used to receive a user's shooting operation.
  • in a photographing scene (including photo mode, portrait mode, and night scene mode), the above-mentioned shooting operation is an operation acting on the shooting control 112 that triggers taking a photo; in a video recording scene (video mode), the above-mentioned shooting operation includes an operation acting on the shooting control 112 that starts recording.
  • the preview window 113 can be used to display the sequence of image frames collected by the camera in real time.
  • the image displayed in the preview window 113 may be called an original image.
  • what is displayed in the preview window 113 is a down-sampled sequence of image frames.
  • the image frame sequence without downsampling processing corresponding to the image displayed in the preview window 113 may be called an original image.
  • the review control 114 can be used to view photos or videos taken previously. Generally, the review control 114 can display a thumbnail of a photo taken previously or a thumbnail of the first frame of a video taken previously.
  • User interface 101 may also include a settings bar 116 .
  • Multiple setting controls may be displayed in the setting bar 116 .
  • a setting control is used to set a type of parameters of the camera, thereby changing the images collected by the camera.
  • the setting bar 116 may display setting controls such as aperture 1161, flash 1162, filter 1164, etc.
  • Aperture 1161 can be used to adjust the camera aperture size, thereby changing the brightness of the image captured by the camera; flash 1162 can be used to turn the flash on or off, thereby changing the brightness of the image captured by the camera; filter 1164 can be used to select a filter style and then adjust the image color.
  • the settings bar 116 may also include a further settings control 1165, which can be used to provide more controls for adjusting camera shooting parameters or image optimization parameters, such as white balance controls, ISO controls, beauty controls, body beauty controls, etc., thereby providing users with richer shooting services.
  • to use the protagonist mode, the terminal 100 can first display the camera interface; refer to the user interface 101.
  • the terminal 100 may detect the user operation on the mode bar 111 to select the protagonist mode, such as the operation of clicking the protagonist shooting mode option as shown in FIG. 1A, or the operation of sliding the mode bar 111 to select the protagonist shooting mode option.
  • the terminal 100 may determine to turn on the protagonist mode for shooting.
  • FIG. 1B exemplarily shows the user interface 102 of the terminal 100 for shooting in the protagonist mode.
  • the terminal 100 can perform image content recognition (object recognition) on the image collected by the camera, and identify the objects included in the image.
  • the above-mentioned objects include but are not limited to humans, animals, and plants.
  • the following description of the embodiments of this application will mainly take characters as examples. While the terminal 100 displays the image collected by the camera in the preview window 113, the terminal 100 may also display a selection box on each recognized object.
  • the images collected by the camera at a certain moment include Person 1, Person 2, and Person 3.
  • the terminal 100 may use a preset object recognition algorithm to identify objects included in the image.
  • the object recognition algorithm may include a face recognition algorithm and a human body recognition algorithm.
  • the terminal 100 can recognize that the above image includes three objects: Person 1, Person 2, and Person 3.
  • the objects that the terminal 100 can recognize are not limited to the characters 1, 2, and 3 introduced in the above user interface 102; the terminal 100 also supports recognizing animal and plant objects.
  • the above-mentioned object recognition algorithm also includes a recognition algorithm for one or more animals, and a recognition algorithm for one or more plants, which are not limited in the embodiments of the present application.
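The object recognition described above combines a face recognition algorithm with a human body recognition algorithm. One plausible way to associate the two kinds of detections, so that each recognized object gets a single selection box, is to pair each face box with the body box containing its center. This pairing heuristic is an assumption of this sketch, not taken from the application:

```python
def assign_faces_to_bodies(face_boxes, body_boxes):
    """Pair each detected face box with the first body box that
    contains the face centre; faces with no enclosing body box are
    paired with None. Boxes are (x0, y0, x1, y1)."""
    def contains(body, x, y):
        return body[0] <= x <= body[2] and body[1] <= y <= body[3]
    pairs = []
    for face in face_boxes:
        cx = (face[0] + face[2]) / 2
        cy = (face[1] + face[3]) / 2
        body = next((b for b in body_boxes if contains(b, cx, cy)), None)
        pairs.append((face, body))
    return pairs
```

Each resulting (face, body) pair would correspond to one recognized object and hence one selection box such as 121, 122, or 123 in the user interface 102.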
  • the terminal 100 can display the above-mentioned images including Person 1, Person 2, and Person 3 in the preview window 113.
  • the terminal 100 may determine a selection box corresponding to each of the above-described objects.
  • the terminal 100 may display selection boxes corresponding to each object, such as the selection box 121 corresponding to Person 1, the selection box 122 corresponding to Person 2, and the selection box 123 corresponding to Person 3.
  • the user can confirm the video protagonist through the above selection box.
  • the user interface 102 can also display a prompt 125, such as "Please click on the protagonist to start automatic focus recording."
  • Prompt 125 prompts the user to determine the protagonist of the video. According to the prompt 125, the user can click any one of the above selection boxes. The object corresponding to the selection box that the user clicks on is the video protagonist determined by the user.
  • the user interface 102 may also include a focus control 126 and a beauty control 127 .
  • the focal length control 126 can be used to set the focal length of the camera to adjust the viewing range of the camera. When the viewing range of the camera changes, the image displayed in the preview window will change accordingly.
  • the beauty control 127 can be used to adjust the face image of the person in the image. After detecting the user operation on the beautification control 127, the terminal 100 can perform beautification processing on the characters in the image, and display the beautified image in the preview window.
  • the user interface 102 may also display other shooting controls, which are not listed here.
  • the terminal 100 can detect a user operation on any selection box. In response to the above operation, the terminal 100 may determine that the object corresponding to the above selection box is the protagonist. For example, referring to the user interface 103 shown in FIG. 1C , the terminal 100 may detect a user operation acting on the selection box 123 . In response to the above operation, the terminal 100 may determine that the character 3 corresponding to the selection box 123 is the protagonist of the shooting.
  • the terminal 100 may display a small window in the preview window 113 in a picture-in-picture format, and display a close-up image of the character 3 in the small window.
  • the above close-up image refers to the image obtained by cropping the original image collected by the camera (the image displayed in the preview window) with the selected protagonist as the center.
  • FIG. 1D exemplarily shows the user interface 104 in which the terminal 100 displays a small window and displays a close-up image of the character 3 in the small window.
  • the preview window 113 of the user interface 104 may include a small window 141 .
  • a close-up image of the character 3 can be displayed in the small window 141 .
  • the image displayed in the preview window 113 changes, the image displayed in the small window 141 will also change accordingly.
  • the image displayed in the small window 141 is always centered on character 3. In this way, the video composed of the images displayed in the small window 141 is a close-up video of character 3.
  • the close-up image displayed in the small window 141 and the original image displayed in the preview window 113 may also come from different cameras.
  • the close-up image displayed in the small window 141 may be from an image collected by an ordinary camera
  • the original image displayed in the preview window 113 may be from an image collected by a wide-angle camera.
  • Regular cameras and wide-angle cameras can capture images at the same time.
  • the images collected by the ordinary camera and the wide-angle camera are synchronized in time. In this way, the user can browse a larger range of scenery in the preview window 113 while, at the same time, a more detailed image of the protagonist is displayed in the small window 141.
  • the selection box 123 corresponding to character 3 can change to the checked style shown by the selection box 142 in FIG. 1D.
  • through the checked selection box 142, the user can confirm which object is currently selected as the shooting protagonist.
  • the terminal 100 may also display other styles of icons to indicate that character 3 is selected as the protagonist, so as to distinguish it from the other objects.
  • the small window 141 for displaying close-up images may also include a close control 143 and a transpose control 144.
  • the close control 143 can be used to close the small window 141.
  • the transpose control 144 can be used to adjust the orientation (aspect ratio) of the small window 141.
  • the terminal 100 may cancel the previously determined protagonist (character 3) and instruct the user to re-select a shooting protagonist among the recognized objects. After the protagonist is re-determined, the terminal 100 can display the small window 141 in the preview window 113 again, and display in it a close-up image obtained by cropping the original image with the new protagonist as the center.
  • the close control 143 may also be used to pause the recording of the close-up video after video recording has started. In this case, the terminal 100 does not cancel the previously determined protagonist. After the recording is paused, the close control 143 can be replaced with a resume control. After detecting a user operation on the resume control, the terminal 100 can continue to record a close-up video centered on the above-mentioned protagonist.
  • after closing the small window 141, the terminal 100 merely stops displaying the small window, that is, it no longer displays the close-up image of the previously determined protagonist (character 3); however, the terminal 100 still retains the previously determined protagonist.
  • the preview window 113 will not be partially blocked by the small window 141 that displays the close-up image of the protagonist. Users can better monitor the image content of the original video, resulting in higher quality original video.
  • the user can cancel the selected protagonist character 3 by clicking on the check box 142, thereby re-selecting a new protagonist among the recognized objects.
  • the terminal 100 may first generate a small window (vertical window) with an aspect ratio of 9:16 for displaying a close-up image, refer to the small window 141 in FIG. 1D.
  • the above aspect ratio is an exemplary example, and the aspect ratio of vertical windows includes but is not limited to 9:16.
  • the terminal 100 can change the original vertical window into a horizontal window (horizontal window) with an aspect ratio of 16:9.
  • the terminal 100 can also generate a horizontal window by default, and then adjust the horizontal window to a vertical window according to user operations. This embodiment of the present application does not limit this. In this way, the user can adjust the video content of the close-up video through the transpose control 144 to meet his or her own personalized needs.
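  • The effect of the transpose control 144 described above can be sketched as swapping the crop's width and height around the same center, re-clamped to the frame; this is a hypothetical helper, not the actual implementation:

```python
def transpose_crop(frame_w, frame_h, crop):
    """Swap a 9:16 (vertical) crop to 16:9 (horizontal), or vice versa,
    around the same center, clamped to the frame bounds."""
    x, y, w, h = crop
    cx, cy = x + w // 2, y + h // 2
    w, h = h, w  # swapping width and height flips the aspect ratio
    nx = min(max(cx - w // 2, 0), frame_w - w)
    ny = min(max(cy - h // 2, 0), frame_h - h)
    return (nx, ny, w, h)
```

Applying the helper twice returns the original crop, mirroring how the transpose control toggles between vertical and horizontal windows.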
  • the terminal 100 may permanently display a small window showing a close-up image at the lower left (or lower right, upper left, or upper right) of the screen.
  • the above-mentioned small window can also adjust its display position according to the position of the protagonist in the preview window to avoid blocking the protagonist in the preview window.
  • the terminal 100 can also adjust the position and size of the small window according to user operations.
  • the terminal 100 may also detect a long-press operation and a drag operation on the small window 141, and in response to the above operations, the terminal 100 may move the small window to the position where the user's drag operation last stopped.
  • the terminal 100 may also detect a double-click operation on the small window 141, and in response to the above operation, the terminal 100 may enlarge or reduce the small window 141.
  • the terminal 100 can also control and adjust the position and size of the small window through gesture recognition and voice recognition.
  • the terminal 100 can recognize that the user has made a fist gesture through the image collected by the camera, and in response to the fist gesture, the terminal 100 can reduce the small window 141 .
  • the terminal 100 can recognize that the user has made a hand-opening gesture through the image collected by the camera. In response to the hand-opening gesture, the terminal 100 can enlarge the small window 141 .
  • the terminal 100 may detect a user operation to start shooting. After starting shooting, the terminal 100 may also detect a user operation to end shooting. In response to the above-mentioned operations of starting and ending shooting, the terminal 100 may save the sequence of image frames collected by the camera during the above-mentioned operations as a video.
  • the terminal 100 may detect a user operation on the shooting control 112 .
  • the above-mentioned user operation on the shooting control 112 may be called a user operation for starting shooting.
  • the terminal 100 may write the original image corresponding to the preview window 113 and the close-up image corresponding to the small window 141 into a specific storage space.
  • on the one hand, the terminal 100 can write the original images collected by the camera (the uncropped images displayed in the preview window 113) into a specific storage space to generate an original video; on the other hand, the terminal 100 can also write the close-up images centered on the protagonist (the images displayed in the small window 141) into a specific storage space, thereby generating a close-up video.
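  • The dual writing described above (one stream per video) can be sketched as follows; frames are modeled as 2D lists of pixels and the storage spaces as in-memory lists, purely for illustration:

```python
class DualStreamRecorder:
    """Sketch of the two parallel streams: each captured frame is written
    once as-is (original video) and once cropped around the protagonist
    (close-up video). The crop here is a stand-in for the real camera
    pipeline's crop-and-scale stage."""
    def __init__(self):
        self.original_frames = []   # storage space for the original video
        self.close_up_frames = []   # storage space for the close-up video

    def on_frame(self, frame, crop_rect):
        self.original_frames.append(frame)
        x, y, w, h = crop_rect
        # Crop the protagonist-centered region from the 2D pixel grid.
        self.close_up_frames.append([row[x:x + w] for row in frame[y:y + h]])
```

At the end of recording, each list would be encoded and encapsulated into its own video file.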
  • the terminal 100 may change the shooting control 112 to the shooting control 161 in the user interface 106 .
  • The shooting control 161 may be used to indicate that recording is currently in progress.
  • the protagonist initially selected by the user may leave the viewing range of the camera of the terminal 100 (that is, the protagonist is not included in the preview window 113).
  • the identifiable objects in the preview window 113 include character 1 and character 2, but do not include the protagonist selected by the user: character 3.
  • the terminal 100 may close the small window 141 that displays the close-up image of the protagonist.
  • the preview window 113 does not include the small window 141 at this time.
  • the terminal 100 may display a prompt 162, such as "The protagonist is missing, please aim and shoot at the protagonist” to remind the user that the protagonist is missing and the close-up image of the protagonist cannot be determined.
  • the user can adjust the camera position so that the protagonist is within the viewing range of the camera, so that the camera can re-capture an image including the protagonist.
  • the terminal 100 can regenerate the small window 141 and display in it the current close-up image centered on the protagonist.
  • the terminal 100 may wait several frames before deciding whether to close the small window 141. For example, after the moment shown in the user interface 107 of FIG. 1G (when no protagonist is detected), the terminal 100 can continue to examine the N frames of images following that frame. If none of the N frames includes the protagonist, the terminal 100 closes the small window 141. Between the disappearance of the protagonist and the confirmation to close the small window 141, the terminal 100 may determine the image content displayed in the small window 141 during that period based on the cropping area of the last frame before the protagonist disappeared.
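  • The N-frame rule described above behaves like a debounce counter; a minimal sketch, assuming N is a tunable threshold:

```python
class ProtagonistLossDebouncer:
    """Close the small window only after the protagonist has been absent
    for N consecutive frames; any reappearance resets the counter."""
    def __init__(self, n):
        self.n = n
        self.missing = 0  # consecutive frames without the protagonist

    def update(self, protagonist_detected):
        if protagonist_detected:
            self.missing = 0
        else:
            self.missing += 1
        return self.missing >= self.n  # True => close the small window
```

This avoids flicker: a single missed detection (for example, a brief occlusion) does not immediately close the close-up window.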
  • after recording the video for a period of time, the user can end recording the video through the shooting control 161.
  • the terminal 100 can detect a user operation acting on the shooting control 161.
  • the above user operation acting on the shooting control 161 can be called a user operation to end shooting.
  • the terminal 100 may encapsulate the original image frame sequence written into a specific storage space into a video, that is, an original video.
  • the terminal 100 can also encapsulate the close-up image frame sequence written into a specific storage space into a video, that is, a close-up video.
  • the terminal 100 may display the user interface 110 shown in FIG. 1J.
  • the terminal 100 can change the shooting control 161 to the shooting control 112 to indicate to the user that the video recording has ended.
  • the terminal 100 may display identifiers representing the original video and the close-up video in the review control 114.
  • the above-mentioned identifier may be a thumbnail of the first frame of the above original video, or a thumbnail of the first frame of the above close-up video.
  • the user can browse the captured video through the review control 114 .
  • the terminal 100 can obtain two videos.
  • One of the above two videos is the above original video and the other is the above close-up video.
  • the terminal 100 may detect user operations on the review control 114. In response to the above operation, the terminal 100 can display the above two videos for the user to browse.
  • FIG. 1K exemplarily shows the user interface 111 of the terminal 100 displaying the captured video.
  • User interface 111 may include window 191 .
  • Window 191 can be used to play captured video.
  • the terminal 100 may first play the original video captured based on the preview window 113 in the protagonist mode in the window 191 .
  • the terminal 100 may display the prompt 192.
  • Prompt 192 such as "Swipe left to browse the close-up video of the protagonist.” Through the above prompts, users can perform a left swipe operation to obtain a close-up video.
  • the terminal 100 can detect a left-swipe operation. In response to the above left-swipe operation, the terminal 100 may play the close-up video centered on the protagonist shot in the protagonist mode. Referring to the user interface 112 shown in FIG. 1L and the user interface 113 shown in FIG. 1M, at this time, the close-up video shot based on the small window 141 can be played in the window 191.
  • the terminal 100 may encapsulate the close-up video centered on the protagonist as one close-up video.
  • the initially selected protagonist may disappear from the viewing range of the terminal 100 and, after a period of time, reappear within the viewing range of the terminal 100.
  • the terminal 100 can also ignore the above interruption and encapsulate all the close-up images into one close-up video.
  • FIG. 1N exemplarily shows a schematic diagram in which the terminal 100 encapsulates all close-up images into one close-up video.
  • T1 can represent the moment when the video recording starts
  • T2 can represent the time when the video recording ends
  • T3 can represent the moment when the loss of the protagonist is detected (the user interface 107 shown in FIG. 1G)
  • T4 can represent the moment when the protagonist is detected again (the user interface 108 shown in FIG. 1H).
  • within T1-T2, the original images collected by the camera constitute the original video.
  • within T1-T3, the close-up images centered on character 3, extracted from the original images collected by the camera, constitute close-up video 1.
  • within T4-T2, the close-up images centered on character 3, extracted from the original images collected by the camera, constitute close-up video 2.
  • the terminal 100 can package the above close-up video 1 and close-up video 2 into one close-up video.
  • the terminal 100 may also save multiple close-up videos.
  • the terminal 100 may encapsulate the close-up image of the protagonist before the interruption into a close-up video 1, and encapsulate the close-up image of the protagonist after the interruption into a close-up video 2. Then, the terminal 100 can save the above close-up videos respectively.
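  • Both options above (merging into one close-up video, or saving the segments separately) operate on the contiguous runs of frames in which the protagonist is present; a hypothetical sketch of computing those runs from a per-frame presence timeline:

```python
def close_up_segments(timeline):
    """Split a per-frame presence timeline (True where the protagonist is
    detected) into contiguous runs of frame indices; each run corresponds
    to one close-up segment (close-up video 1, close-up video 2, ...)."""
    segments, current = [], []
    for i, present in enumerate(timeline):
        if present:
            current.append(i)
        elif current:
            segments.append(current)
            current = []
    if current:
        segments.append(current)
    return segments
```

Saving separately keeps each run as its own video, while merging simply concatenates the runs, e.g. `sum(segments, [])`.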
  • the terminal 100 can also enable the protagonist mode through the method shown in FIGS. 1O-1P. As shown in FIG. 1O , in the video recording mode, the terminal 100 may display the protagonist mode control 1166 in the setting bar 116 . When a user operation on the protagonist mode control 1166 is detected, the terminal 100 may turn on the protagonist mode, see FIG. 1P.
  • the terminal 100 may also determine a new protagonist and shoot a close-up video centered on the new protagonist.
  • the terminal 100 may confirm that the loss of the protagonist is detected. At this time, the terminal 100 can close the small window 141 that displays the close-up image of the protagonist, and display a prompt 162 to instruct the user to adjust the camera position, thereby reacquiring a new image containing the protagonist.
  • the user can also select character 2 as the protagonist.
  • the terminal 100 may detect a user operation on the selection box 122 . In response to the above operation, the terminal 100 may determine a new protagonist: character 2.
  • the small window 141 can directly display the close-up image of the switched character 2, presenting a jumping display effect.
  • the small window 141 can also achieve a non-jumping protagonist switching display effect through a smoothing strategy. For example, after the protagonist is switched to character 2, the terminal 100 can determine a set of smoothly moving image frames based on the path from character 3 to character 2 in the preview window 113, and then display the above image frames in the small window 141 to achieve a non-jumping protagonist switching display.
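  • The smoothing strategy described above can be approximated by interpolating the crop center along the path between the two protagonists; a simplified linear sketch (a real implementation might ease in and out rather than move at constant speed):

```python
def smooth_switch_centers(old_center, new_center, steps):
    """Return the intermediate crop centers that move the close-up from
    the old protagonist to the new one over the given number of steps."""
    (x0, y0), (x1, y1) = old_center, new_center
    return [(x0 + (x1 - x0) * t // steps, y0 + (y1 - y0) * t // steps)
            for t in range(1, steps + 1)]
```

Each intermediate center is then cropped from the corresponding original frame, so the small window pans smoothly instead of jumping.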
  • the terminal 100 can also use a fixed transition effect to connect the close-up images of the protagonist before and after switching.
  • the above-mentioned fixed transition effects include overlay, vortex, translation, etc. commonly used in video editing. The embodiments of the present application do not limit this.
  • the switching effect of the close-up image displayed in the small window 141 may also refer to the switching effect when switching characters mentioned above.
  • the terminal 100 can directly display a close-up image based on the 2× original image in the small window 141; optionally, the terminal 100 can also determine a set of image frames with a gradual transition effect based on the 1× and 2× original images, thereby achieving a non-jumping focal length switching display effect in the small window 141; optionally, the terminal 100 can also use fixed transition effects such as overlay, vortex, and translation to achieve the switching display in the small window 141, which will not be described again here.
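  • The gradual 1×/2× transition described above can be sketched as interpolating the zoom factor and cropping a correspondingly smaller centered region per step; an illustrative helper under those assumptions:

```python
def zoom_transition_crops(frame_w, frame_h, start_zoom, end_zoom, steps):
    """Return centered (x, y, w, h) crops that step the effective focal
    length from start_zoom to end_zoom; each crop would be scaled back up
    to the output size, producing a smooth digital zoom."""
    crops = []
    for t in range(1, steps + 1):
        zoom = start_zoom + (end_zoom - start_zoom) * t / steps
        w, h = int(frame_w / zoom), int(frame_h / zoom)
        crops.append(((frame_w - w) // 2, (frame_h - h) // 2, w, h))
    return crops
```
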
  • the terminal 100 can regenerate the small window 141 and display a close-up image centered on the new protagonist in the small window 141 . Then, the terminal 100 can continuously track the person 2 and display a close-up image of the person 2 in the small window 141 in real time.
  • the close-up video generated by the terminal 100 is a close-up video including a plurality of protagonists.
  • time T3 is the time when the loss of the initially selected protagonist (Character 3) is detected.
  • Time T5 is the time when it is detected that the user selects a new protagonist (character 2).
  • within T1-T3, close-up video 1 is a close-up video centered on the initially selected protagonist (character 3); within T5-T2, close-up video 2 is a close-up video centered on the re-selected protagonist (character 2).
  • the terminal 100 can combine the above close-up video 1 and close-up video 2 and package them into one close-up video. Then, the terminal 100 can play the combined close-up video for the user to browse. Referring to the user interfaces 204 and 205 shown in FIGS. 2D-2E, the above merged close-up video can be played in the window 191.
  • the protagonist of the former close-up video (that is, close-up video 1) is the initially selected character 3, while the protagonist of the latter close-up video (that is, close-up video 2) is the re-selected character 2.
  • the terminal 100 may also first detect a user operation to start shooting on the shooting control 112 and start recording the video. During the process of recording a video, the terminal 100 can detect objects contained in the image in real time and display a selection box corresponding to each object. After detecting the user operation of clicking a certain selection box, the terminal 100 can determine that the object corresponding to the selection box is the protagonist, and display a small window showing a close-up image of the protagonist. At the same time, the terminal 100 can also record the close-up images in the small window. In the above method, the length of the close-up video is necessarily shorter than that of the original video.
  • the above-mentioned local videos include original videos and close-up videos captured and saved during the aforementioned process.
  • FIG. 3A exemplarily shows a user interface 301 of the terminal 100 displaying locally saved videos and/or pictures.
  • user interface 301 may display multiple thumbnail icons.
  • a thumbnail icon corresponds to a video or picture obtained by a shooting operation.
  • the plurality of thumbnail icons may include icon 213 .
  • the icon 213 may correspond to the video generated by the aforementioned shooting operation shown in FIGS. 1E-1I.
  • the terminal 100 may detect user operations on the icon 213 . In response to the above operations, the terminal 100 may display the videos captured by the aforementioned shooting operations shown in FIGS. 1E-1I: original videos and close-up videos, refer to FIG. 3B.
  • user interface 302 shown in FIG. 3B may include window 221 .
  • Window 221 can be used to display captured video: original video and close-up video.
  • the window 221 can display the video 222 and the video 223.
  • video 222 is the original video shot in the protagonist mode.
  • Video 223 is a close-up video shot in protagonist mode.
  • when the terminal 100 displays the user interface 302, the video 222 and the video 223 can be played simultaneously. In this way, users can browse the original video and the close-up video at the same time. In some examples, the terminal 100 may also play the video 222 first and then the video 223 to facilitate user browsing.
  • the terminal 100 may detect a user operation, such as a click operation, acting on the video 222 or the video 223. Taking video 222 as an example, after detecting a click operation on video 222, the terminal 100 may display the user interface 111 shown in FIG. 1K to further display the original video. Correspondingly, after detecting a click operation on video 223, the terminal 100 may display the user interface 112 shown in FIG. 1L to further display the close-up video.
  • the terminal 100 can also directly display the user interface 111 shown in Figure 1K to display the original video. Then, after detecting the left swipe operation, the terminal 100 may display the user interface 112 shown in FIG. 1L to display the close-up video.
  • FIG. 3C exemplarily shows another user interface 303 of the terminal 100 displaying locally saved videos and/or pictures.
  • the terminal 100 may display two thumbnail icons, such as icon 231 and icon 232. These two thumbnail icons correspond to the original video and close-up video captured in protagonist mode.
  • icon 231 may correspond to the above-mentioned original video
  • icon 232 may correspond to the above-mentioned close-up video.
  • after detecting a user operation on the icon 231, the terminal 100 may display the user interface 111 shown in FIG. 1K to display the original video. After detecting a user operation on the icon 232, the terminal 100 may display the user interface 112 shown in FIG. 1L to display the close-up video.
  • when browsing the original video, users can swipe left to browse the close-up video; when browsing the close-up video, users can swipe right to return to the original video.
  • the terminal 100 can automatically track the movement trajectory of the protagonist selected by the user in the image, and generate a close-up video that is always centered on the protagonist. Then, the terminal 100 can also save the close-up video and the original video at the same time for the user to browse and use, so as to meet the user's more diverse needs.
  • the original video can retain all the image content captured by the camera during the recording process.
  • Close-up videos can focus on displaying the video content of the protagonist selected by the user.
  • the terminal 100 can also change the protagonist in real time according to the user's operation, so as to meet the user's need to change the shooting protagonist and further enhance the user experience.
  • the terminal 100 can also perform object recognition and protagonist tracking on the local video that has been captured. Based on the tracked protagonist in each frame, the terminal 100 can perform editing operations such as cropping, combining, and packaging on the above-mentioned local video, thereby obtaining a close-up video centered on the protagonist.
  • FIG. 4A exemplarily shows a user interface 401 of the terminal 100 displaying locally saved videos and/or pictures.
  • the user interface 401 may display a plurality of thumbnail icons corresponding to locally saved videos and/or pictures, such as icon 411.
  • the icon 411 corresponds to a local video stored on the terminal 100.
  • the terminal 100 may detect a user operation on the icon 411. In response to the above operation, the terminal 100 may display the above local video. Referring to user interface 402 shown in FIG. 4B , user interface 402 may include window 412 . Window 412 may be used to display locally stored videos and/or pictures. At this time, the terminal 100 can play the local video corresponding to the above icon 411 in the window 412.
  • User interface 402 also includes menu bar 413.
  • the menu bar 413 includes one or more controls for setting pictures or videos, such as sharing controls, collection controls, editing controls, deleting controls, and so on.
  • the menu bar 413 also includes controls 414 for displaying more setting items. When a user operation on the control 414 is detected, the terminal 100 may display more setting items.
  • the terminal 100 may display the menu bar 413 .
  • the menu bar 413 may include more setting items, such as “Details”, “Category Tag” and other setting items.
  • “Details” can be used to display the shooting information of the currently displayed picture or video, such as shooting time, shooting location, camera parameters, etc.
  • “Category Tag” can be used to set the tag of the currently displayed image or video, so that users can quickly obtain the image or video through the tag.
  • the menu bar 413 may also include a setting item “Extract Protagonist”. Extract Protagonist can be used to generate a close-up video centered on a selected protagonist.
  • the terminal 100 may detect a user operation on the “extract protagonist” setting item.
  • the terminal 100 may display the user interface 404 shown in FIG. 4D.
  • User interface 404 may be used to determine the protagonist, generate and save a close-up video centered on the protagonist.
  • user interface 404 may include window 420.
  • the window 420 can play the local video, that is, display the image frame sequence of the local video in sequence.
  • User interface 404 also includes progress bar 424.
  • the progress bar 424 can be used to indicate the playback progress; when the video is paused or playing, the progress bar 424 can also be used to switch the currently displayed image frame, that is, the user can change the playback progress by manually dragging the progress bar, thereby switching the currently displayed image frame.
  • the window 420 may also display selection boxes corresponding to each object in the currently displayed image frame, such as selection boxes 421, 422, and 423.
  • the selection box 421 corresponds to the person 1 in the current image frame
  • the selection box 422 corresponds to the person 2 in the current image frame
  • the selection box 423 corresponds to the person 3 in the current image frame.
  • the window 420 only displays selection boxes corresponding to each object in the currently displayed image frame when a preset operation on the window 420 is detected.
  • the above-mentioned preset operation may be an operation for pausing video playback; for example, an operation of dragging the progress bar to switch image frames; or a touch operation on the currently playing image frame, such as a double-click operation or a long-press operation.
  • the terminal 100 may detect the user's operation of selecting character 3 as the protagonist, such as the operation of clicking the selection box 423 corresponding to character 3. In response to the above operation, the terminal 100 may determine that the character 3 is the protagonist, and then the terminal 100 may sequentially determine the position of the character 3 in subsequent image frames, and determine the size of the close-up image centered on the character 3. By combining the close-up images centered on the character 3, the terminal 100 can obtain the close-up video of the character 3.
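  • The offline flow described above (look up the tracked protagonist position in each frame, then crop around it, then combine the crops) might be sketched as follows; frames are 2D lists of pixels and the positions are supplied directly in place of a real tracker:

```python
def extract_close_up_video(frames, positions, crop_w, crop_h):
    """For each frame of the local video, crop a crop_w x crop_h region
    around the tracked protagonist position; frames where the protagonist
    is absent (position None) contribute no close-up image."""
    close_up = []
    for frame, pos in zip(frames, positions):
        if pos is None:
            continue  # protagonist absent in this frame
        cx, cy = pos
        h, w = len(frame), len(frame[0])
        x = min(max(cx - crop_w // 2, 0), w - crop_w)
        y = min(max(cy - crop_h // 2, 0), h - crop_h)
        close_up.append([row[x:x + crop_w] for row in frame[y:y + crop_h]])
    return close_up
```

The resulting frame sequence is what would then be encoded and saved as the close-up video.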
  • the terminal 100 may also detect the user operation of determining the protagonist when the window 420 displays any image frame after the first frame image of the local video. For example, referring to the user interface 405 shown in FIG. 4E, when the i-th frame is played, the terminal 100 may detect the user's operation of selecting character 3 (or other objects) as the protagonist.
  • the window 420 may also automatically display selection boxes corresponding to each object in each currently displayed image frame. In this way, it is convenient for the user to switch the protagonist at any time through the selection box corresponding to the subsequent image frame.
  • the window 420 displays a selection box corresponding to each object in the currently displayed image frame. In this way, the selection box is displayed based on the preset operation only when the user intends to switch the protagonist, which can save the energy consumption of object recognition.
  • User interface 404 may also include controls 425 .
  • Control 425 may be used to save the currently generated close-up video. For example, after detecting the user operation on the control 425, the terminal 100 may save the close-up video of the above-mentioned character 3 to the local storage space. After completing the saving operation, the terminal 100 may display the user interface 407 shown in FIG. 4G to display the above-mentioned saved close-up video. At this time, users can browse the above close-up video at any time.
  • the terminal 100 can also support the user to switch the protagonist to obtain a close-up video including multiple objects.
  • the terminal 100 may determine that the current protagonist is character 3. Then, referring to the user interface 408 shown in FIG. 4H, when the terminal 100 displays any frame after the first frame that includes at least one object (for example, the N-th frame), the terminal 100 may also display the selection box corresponding to that at least one object, for example, the selection box 422 corresponding to character 2. The terminal 100 can detect the user's operation of clicking the selection box 422 (corresponding to character 2), and in response to the above operation, the terminal 100 can switch the protagonist to character 2. At this time, the protagonist from the M-th frame to the (N-1)-th frame is character 3, and the protagonist from the N-th frame to the end of the video is character 2.
  • if the user selects character 3 as the protagonist when the M-th frame image of the local video is played (for example, the first frame image shown in FIG. 4D), switches the protagonist from character 3 to character 2 when the N-th frame image is played (for example, the image shown in FIG. 4H), and then saves the close-up video, then the protagonist from the M-th frame image to the (N-1)-th frame image of the local video is character 3, and the protagonist from the N-th frame image to the end of the video is character 2.
  • correspondingly, the first half of the close-up video saved by the terminal 100 is a close-up video centered on character 3, generated based on the images of character 3 in the M-th frame through the (N-1)-th frame of the local video; the second half of the close-up video is a close-up video centered on character 2, generated based on the images including character 2 from the N-th frame to the last frame of the local video.
  • the terminal 100 can also save two close-up videos respectively, that is, the close-up video centered on person 3 and the close-up video centered on person 2.
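  • The frame-to-protagonist assignment described above can be modeled as a list of switch points; a hypothetical helper (the character labels are illustrative):

```python
def protagonist_for_frame(i, switches):
    """Map a frame index to its protagonist, given (start_frame,
    protagonist) switch points sorted by start_frame, e.g.
    [(M, "character 3"), (N, "character 2")] for the example above."""
    current = None
    for start, protagonist in switches:
        if i >= start:
            current = protagonist
    return current
```

Grouping consecutive frames with the same result yields either the two halves of one merged close-up video or the two separately saved close-up videos.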
  • the user interface for editing protagonist and close-up videos shown in Figure 4D may also be as shown in Figure 5A.
  • the terminal 100 may first traverse the currently displayed local video and determine all objects included in the video. At this time, the terminal 100 can display all the above objects, such as person 1, person 2, and person 3 in the user interface 501. Then, the terminal 100 can detect a user operation on any of the above characters, determine that the selected character is the protagonist, obtain close-up images of the protagonist based on the images of the protagonist in the local video, and then combine the above close-up images into a close-up video centered on the above protagonist.
  • the user can also set multiple protagonists to obtain a close-up video including multiple protagonists, or a close-up video corresponding to multiple protagonists.
  • the user interface 501 may also include a split control 511.
  • the split control 511 can split the local video displayed in the window 420 into multiple video segments.
  • the terminal 100 may detect a user operation acting on the split control 511.
  • the terminal 100 may display the user interface 502-2 shown in FIG. 5B-2.
  • the user can divide the local video into one or more video segments by performing a dividing operation on the progress bar 424 .
  • the terminal 100 may detect the user's operation of clicking the progress bar 424. In response to the above user operation, the terminal 100 may display the user interface 502-3 shown in FIG. 5B-3. At this time, the terminal 100 may display the dividing box 512 on the progress bar 424. Then, the user can divide the local video into two video segments through the above-mentioned dividing box 512.
  • the terminal 100 can divide the original local video into two segments. At this time, 0:00-2:30 is a video (video segment 1); 2:30-4:00 is a video (video segment 2). The currently selected video segment can be represented in black. Furthermore, when a video segment is selected, the user can further divide the video segment into two video segments through the split control 511. In this way, the terminal 100 can divide the original video into multiple video segments.
  • terminal 100 can determine all objects included in video segment 1, and then display them, such as characters 1, 2, and 3. The user can select any object from all displayed objects as the protagonist. For example, the terminal 100 may determine that the character 3 is the protagonist of the video segment 1 based on the detected user operation acting on the character 3 .
  • the terminal 100 may detect the user's operation of clicking on video segment 2.
  • the terminal 100 may display the user interface 504 shown in FIG. 5D.
  • the terminal 100 can display all objects included in the video segment 2, such as character 1 and character 2 (the video segment 2 does not include character 3).
  • the user can select character 2 as the protagonist of video segment 2.
  • terminal 100 may detect operations on control 425 .
  • the terminal 100 may save the close-up video based on the above local video.
  • the protagonist in the local video from 0:00 to 2:30 is character 3, and the protagonist from 2:30 to 4:00 is character 2.
  • the first half of the close-up video is a close-up video centered on character 3, generated based on the images of character 3 from 0:00-2:30 of the local video; the second half of the above close-up video is a close-up video centered on character 2, generated based on the images including character 2 from 2:30-4:00 of the local video.
  • the terminal 100 can also store two close-up videos respectively, that is, the close-up video centered on character 3 and the close-up video centered on character 2.
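The segment-wise editing described above can be sketched as a mapping from timestamps to per-segment protagonists. This is an illustrative sketch only, not the patent's implementation; the timestamps and character names follow the 0:00-2:30 / 2:30-4:00 example in the text.

```python
def protagonist_for(segments, t):
    """segments: list of (start_s, end_s, protagonist). Returns the protagonist
    whose segment contains time t (in seconds), or None outside all segments."""
    for start, end, who in segments:
        if start <= t < end:
            return who
    return None

# 0:00-2:30 -> character 3, 2:30-4:00 -> character 2 (example from the text)
segments = [(0, 150, "character 3"), (150, 240, "character 2")]
```

Assembling the close-up video then amounts to cropping each frame around whichever protagonist `protagonist_for` returns for that frame's timestamp.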
  • the terminal 100 can perform object recognition and protagonist tracking on the local video that has been captured, and then the terminal 100 can generate and save a close-up video centered on the protagonist. In this way, for any video stored on the terminal 100, the user can use the above method anytime and anywhere to obtain a close-up video of any object in the video, thereby meeting the user's personalized editing needs.
  • FIG. 6 exemplarily shows a flow chart for the terminal 100 to generate a close-up video of the protagonist during the shooting process.
  • the video editing method corresponding to the protagonist mode requires real-time identification and marking of objects (such as people, animals, plants, etc.) in images collected by the camera. This requires occupying a large amount of computing resources of the terminal 100. Therefore, in the embodiment of the present application, the protagonist mode is turned off by default when the camera is turned on.
  • the terminal 100 can provide the user with a control for turning on or off the protagonist mode, which is recorded as the first control. After detecting a user operation on the first control, the terminal 100 can turn on the protagonist mode and execute the shooting algorithm corresponding to the protagonist mode, such as identifying objects in the image, protagonist tracking, and so on. For example, in the user interface 102 shown in FIG. 1B , the protagonist mode option in the mode bar 111 may be called a first control. After detecting a user operation on the protagonist mode option, the terminal 100 may provide the user with the shooting service shown in FIGS. 1B-1I.
  • in this way, users can determine whether to turn on the protagonist mode according to their own needs, which avoids needlessly occupying the computing resources of the terminal 100, reducing the computing efficiency of the terminal 100, and degrading the user experience.
  • S602 Perform object detection on the i-th frame image collected by the camera, and determine the objects included in the i-th frame image.
  • in the protagonist mode, the terminal 100 needs to determine the protagonist according to the user's selection operation. To do so, the terminal 100 needs to first recognize the objects included in the images collected by the camera, and then mark the recognized objects. In this way, the user can select any of the above recognized objects as the protagonist. Correspondingly, the terminal 100 can determine the protagonist based on the user operation.
  • FIG. 7A exemplarily shows a flowchart for the terminal 100 to recognize objects in the image after turning on the protagonist mode.
  • S701 Perform face recognition and human body recognition on the i-th frame image collected by the camera, and determine the face image and human body image in the i-th frame.
  • the terminal 100 may include a preset face recognition algorithm and a preset human body recognition algorithm.
  • Face recognition algorithms can be used to identify faces in images.
  • Human body recognition algorithms can be used to identify human body images in images, including faces, bodies, and limbs.
  • the terminal 100 can execute the face recognition algorithm and the human body recognition algorithm respectively, and then determine the face image and human body image in the i-th frame image.
  • the i-th frame image is any frame image collected by the camera after turning on the protagonist mode.
  • through the face recognition algorithm, the terminal 100 can determine that the frame image includes faces face1, face2, and face3; through the human body recognition algorithm, the terminal 100 can determine that the frame image includes human bodies body1, body2, and body3.
  • S702 Match the recognized face image and human body image to determine the object included in the i-th frame image.
  • the terminal 100 can calculate the intersection over union (IoU) of each face image and body image, which is recorded as IoU face&body . Then, the terminal 100 can use the above-mentioned IoU face&body to match the recognized face image and the human body image, and determine the object included in the i-th frame image.
  • for two non-overlapping people in an image, the intersection between either person's face and the other person's human body is 0, while the intersection between a person's face and their own human body is essentially the face itself. Therefore, the smaller the IoU face&body (the closer it is to 0), the less likely it is that the corresponding face and human body match, that is, that they belong to the same person.
  • the first threshold M1 may be preset in the terminal 100 .
  • if the IoU face&body is not less than the first threshold M1, the face corresponding to that IoU face&body matches the human body; otherwise, it does not match.
  • a matched pair consisting of a face image and a human body image identifies one object.
  • the terminal 100 can determine the M objects included in the i-th frame image based on the recognized face image and human body image.
  • the terminal 100 can calculate the IoU of face1, face2, face3, and body1, body2, and body3 respectively. Taking face1 as an example, the IoU values of face1, body2, and body3 are all 0, and the IoU values of face1 and body1 are not 0 and satisfy M1. At this time, the terminal 100 can determine that face1 and body1 can constitute an object (i.e. Character 1). Similarly, the terminal 100 can determine that face2 and body2 can constitute an object (ie, character 2), and face3 and body3 can constitute an object (ie, character 3).
  • the terminal 100 may no longer calculate the IoU of the face image and human body image that have constituted an object. For example, the terminal 100 may first calculate the IoU between face1 and all bodies (body1, body2, and body3). At this time, the terminal 100 can determine body1 that matches face1. Therefore, the terminal 100 can determine that face1 and body1 constitute an object. Then, the terminal 100 can calculate the IoU of face2 and all remaining bodies (body2, body3). At this time, the terminal 100 can no longer calculate the IoU of face2 and body1 to reduce redundant calculations and improve calculation efficiency.
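The matching step of S702 can be sketched as follows. This is an illustrative sketch, not the patent's code: boxes are (x1, y1, x2, y2) tuples, and the threshold value `m1=0.05` is hypothetical, standing in for the preset first threshold M1. The greedy removal of already-matched bodies mirrors the redundancy-reduction step described above.

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def match_faces_to_bodies(faces, bodies, m1=0.05):
    """Greedily match each face to the best remaining body; a body that already
    forms an object is excluded from later IoU computations."""
    objects = []
    remaining = list(range(len(bodies)))
    for fi, face in enumerate(faces):
        best, best_iou = None, 0.0
        for bi in remaining:
            v = iou(face, bodies[bi])
            if v > best_iou:
                best, best_iou = bi, v
        if best is not None and best_iou >= m1:
            objects.append((fi, best))  # one (face, body) pair = one object
            remaining.remove(best)
    return objects
```

For two non-overlapping people, each face has IoU 0 with the other person's body, so only same-person pairs survive the threshold.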
  • the terminal 100 may also directly use the human body detection algorithm to identify the object in the i-th frame image. At this time, the terminal 100 does not need to match the face image and the human body image.
  • the above method can better identify the objects included in the i-th frame image in single-object scenes and in scenes with multiple objects and non-overlapping between the objects.
  • in scenes where people overlap, however, the accuracy of the above method in identifying the objects included in the i-th frame image is low, and it is easy to misalign the recognition or fail to recognize the overlapping people at all.
  • the object recognition method shown in S701-S702 can more stably and correctly identify multiple objects included in the image frame.
  • the above-mentioned object recognition algorithm also includes a recognition algorithm for specific animals and a recognition algorithm for specific plants.
  • the terminal 100 can identify whether the i-th frame image includes objects of types such as animals, plants, etc.
  • the terminal 100 can set the above-mentioned animals, plants and other objects as protagonists.
  • which object types the object recognition algorithm can recognize depends on the developer's presets.
  • S603 Display the i-th frame image and markers corresponding to each object in the i-th frame image.
  • the terminal 100 may create tags respectively corresponding to the above-mentioned M objects.
  • the terminal 100 may display the above marks at the same time. A mark can be used to prompt the user that the terminal 100 has recognized an object that can be determined as the protagonist. Further, the user can use a mark to instruct the terminal 100 which object to determine as the protagonist.
  • the terminal 100 can determine three markers corresponding to the above three objects.
  • the above mark may be a selection box in the preview window 113 .
  • the terminal 100 may display the selection boxes 121, 122, and 123. Among them, the selection boxes 121, 122, and 123 are respectively used to mark Person 1, Person 2, and Person 3 in the image.
  • the user can not only browse the images collected by the camera in the preview window 113, but also see which objects the terminal 100 has recognized, that is, which objects can be set as the protagonist. Furthermore, the user can click on any selection box (for example, selection box 123) to determine that the object (character 3) corresponding to the selection box is the protagonist. After detecting the user operation of clicking any selection box, the terminal 100 may set the object corresponding to the clicked selection box as the protagonist. Subsequently, the terminal 100 can locate the above-mentioned protagonist in the image sequence collected by the camera, thereby realizing protagonist tracking and generating a close-up video of the protagonist.
  • the terminal 100 can display a selection box on the image of the above-mentioned animal or plant accordingly. Users can also choose the above-mentioned animals and plants as protagonists.
  • the display position of the selection box can be determined based on the face image and the human body image.
  • FIG. 7C exemplarily shows a schematic diagram in which the terminal 100 determines the display position of the selection box.
  • the terminal 100 can determine the midpoints of the face image and the human body image: the midpoint P1 of the face image and the midpoint P2 of the human body image.
  • the terminal 100 can determine the midpoint P3 of the object (ie, the person 3) corresponding to the above face image and the human body image.
  • the midpoint of the selection frame 123 is the above-mentioned P3.
  • the user can see the i-th frame image of the camera on the screen of the terminal 100, as well as the marks (selection boxes) corresponding to each object in the i-th frame image.
  • the terminal 100 may detect a user operation acting on any mark. In response to the above operation, the terminal 100 may determine that the object corresponding to the above mark is the protagonist, and set the frame index number FrameID of the i-th frame image to 1.
  • the terminal 100 may detect a user operation acting on the selection box 123 .
  • the above-mentioned character 3 is the first object
  • the user operation performed on the selection box 123 is the operation of selecting the first object.
  • the FrameID can be used to indicate the position of a frame image relative to the frame image in which the protagonist was determined.
  • the above close-up image refers to an image obtained by cropping the original image collected by the camera (the image displayed in the preview window) with the selected protagonist as the center, so that the resulting image content is mainly the protagonist.
  • the terminal 100 may determine a close-up image centered on the protagonist corresponding to the i-th frame image based on the i-th frame image. For example, in the user interface 403, after determining that character 3 is the protagonist, the terminal 100 can crop the image currently displayed in the preview window 113 with character 3 as the center to obtain a close-up image whose image content is character 3.
  • FIG. 8A exemplarily shows a flowchart in which the terminal 100 determines a close-up image centered on a protagonist.
  • S801 Determine the zoom ratio ZoomRatio of the i-th image frame based on the human body image of the protagonist in the i-th frame image.
  • the smaller the image area the protagonist occupies in the entire original image, the smaller the size of the close-up image centered on the protagonist; conversely, the larger the image area the protagonist occupies in the entire original image, the larger the size of the close-up image centered on the protagonist.
  • the close-up image of character 1 expected to be displayed in the small window should be an image surrounded by a dotted frame 61.
  • the size of the dotted frame 61 is the size of the close-up image of the protagonist in the i-th frame image.
  • the close-up image of character 3 expected to be displayed in the small window should be an image surrounded by a dotted frame 62.
  • the size of the dotted frame 62 is the size of the close-up image of the protagonist in the i-th frame image. It can be seen from this that in order to ensure the integrity of the protagonist, the terminal 100 needs to determine the size of the close-up image according to the size of the protagonist in the original image.
  • the zoom ratio ZoomRatio can be used to reflect the size of the main character in the original image. After determining the ZoomRatio, the terminal 100 may determine the size of the close-up image of the protagonist in the current frame.
  • the terminal 100 can use a preset human body recognition algorithm to recognize human body images in the image, such as body1, body2, body3, etc. After determining that character 3 is the protagonist, the terminal 100 may determine the ZoomRatio using the size of the human body image (body3) of character 3.
  • maxBboxSize refers to the size of the largest recognized human body image
  • detectBboxSize refers to the size of the protagonist's human body image
  • minZoomRatio is the minimum value of the preset ZoomRatio
  • maxZoomRatio is the maximum value of the preset ZoomRatio.
  • the terminal 100 can determine the ZoomRatio[i] of the i-th frame image.
  • S802 Determine the size of the close-up image of the protagonist corresponding to the i-th frame image according to ZoomRatio[i]: CropRagionWidth, CropRagionHeight.
  • CropRagionWidth is used to represent the width of the close-up image
  • CropRagionHeight is used to represent the height of the close-up image.
  • CropRagionWidth and CropRagionHeight can be determined based on the ZoomRatio introduced above. Specifically, the calculation formulas (Q2, Q3) of CropRagionWidth and CropRagionHeight are as follows:
  • WinWidth is used to represent the width of the small window
  • WinHeight is used to represent the height of the small window.
  • the CropRagionWidth and CropRagionHeight obtained based on WinWidth, WinHeight and ZoomRatio can just correspond to the width and height of the small window, thus avoiding the problem of image deformation when displaying close-up images in the small window.
  • the value of WinWidth may be 1080p (pixel), and the value of WinHeight may be 1920p.
  • the value of WinWidth can be 1920p and the value of WinHeight can be 1080p.
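The patent's formulas Q2 and Q3 are not reproduced in this text, so the sketch below is an assumption: it divides the window size by ZoomRatio, which is consistent with the later example in which a 540p x 960p crop is enlarged to fill a 1080p x 1920p window (i.e. ZoomRatio = 2). The identifier names follow the text.

```python
def crop_region_size(win_width, win_height, zoom_ratio):
    """Derive CropRagionWidth/CropRagionHeight from the small-window size and
    ZoomRatio. Both dimensions are scaled by the same factor, so the crop keeps
    the window's aspect ratio and the close-up image is not deformed."""
    crop_w = round(win_width / zoom_ratio)   # CropRagionWidth  (assumed Q2)
    crop_h = round(win_height / zoom_ratio)  # CropRagionHeight (assumed Q3)
    return crop_w, crop_h
```

With ZoomRatio = 1 the crop equals the window size; larger ZoomRatio values zoom in more tightly on the protagonist.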
  • S803 Crop the i-th frame image according to CropRagionWidth, CropRagionHeight and the object midpoint, and determine the protagonist's close-up image corresponding to the i-th frame image.
  • the terminal 100 can crop the original image to obtain a close-up image centered on the protagonist.
  • the image in the region centered on P3, with width CropRagionWidth and height CropRagionHeight, is the close-up image of the protagonist (character 3).
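The cropping step of S803 can be sketched as computing a rectangle of the given width and height centered on P3. The clamping to the frame bounds is an added practical detail, not something stated in the text.

```python
def crop_around_midpoint(frame_w, frame_h, p3, crop_w, crop_h):
    """Return the (x1, y1, x2, y2) crop rectangle of size crop_w x crop_h
    centered on the object midpoint P3, shifted if necessary so that the
    rectangle stays inside the frame."""
    cx, cy = p3
    x1 = min(max(cx - crop_w // 2, 0), frame_w - crop_w)
    y1 = min(max(cy - crop_h // 2, 0), frame_h - crop_h)
    return x1, y1, x1 + crop_w, y1 + crop_h
```

When the protagonist stands near an image edge, the crop slides inward rather than extending past the frame.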
  • S606 Display the above close-up image in a small window to generate a frame of the close-up video.
  • the terminal 100 may generate a small window for displaying a close-up image.
  • the small window can be embedded in the preview window in a picture-in-picture form.
  • a small window 141 is embedded in the preview window 113 in the form of a picture-in-picture.
  • the above-mentioned small window may be a rectangle with an aspect ratio of 9:16 (vertical window) or 16:9 (horizontal window).
  • the preview window and the small window can also be arranged in other ways, and the small window can also present other sizes and shapes.
  • the terminal 100 may divide the preview window 113 into two windows arranged side by side on the left and right. One window is used to display the original image collected by the camera in real time; the other window is used to display the close-up image centered on the protagonist.
  • the embodiments of this application do not limit the specific form used to display close-up images.
  • the terminal 100 may determine the close-up image centered on the protagonist corresponding to the i-th frame. At this time, the terminal 100 may display a close-up image centered on the protagonist in the above-mentioned small window.
  • the width CropRagionWidth and height CropRagionHeight of the close-up image are respectively equal to the width WinWidth and height WinHeight of the small window used to display the close-up image, see Figure 8C.
  • for example, CropRagionWidth = 1080p, CropRagionHeight = 1920p, WinWidth = 1080p, WinHeight = 1920p.
  • the close-up images cropped at 1080p and 1920p exactly match the small window, and the terminal 100 can display the above close-up image directly in the small window.
  • the CropRagionWidth, CropRagionHeight and WinWidth, WinHeight of the close-up image are not equal.
  • the terminal 100 can adaptively adjust the close-up image to obtain a close-up image that matches the size of the small window, and then display the close-up image in the small window.
  • the terminal 100 can enlarge the close-up images of 540p and 960p in equal proportions to obtain close-up images of 1080p and 1920p. In this way, the terminal 100 can also display the above close-up image in a small window.
  • the above-mentioned close-up image that has undergone adaptive adjustment processing and is sent to the small window for display is one frame of the close-up video.
  • the terminal 100 can set the FrameID of the i-th frame image to 1 to indicate that this frame is the first frame in which the protagonist is determined.
  • the terminal 100 can also directly use the similarity to determine the protagonist in the j-th frame image.
  • the object with the highest similarity and higher than the similarity threshold in the j-th frame image can be determined as the protagonist.
  • the terminal 100 can also use method two to locate the protagonist in the j-th frame image.
  • Figure 9 exemplarily shows a second flow chart for locating the protagonist in the j-th frame image.
  • the terminal 100 still needs to perform object recognition on the j-th frame image to determine the objects included in the j-th frame image. Then, the terminal 100 may determine whether the objects in the j-th image frame overlap, and then determine to use different protagonist positioning methods to determine the protagonist in the j-th image frame based on whether the objects in the j-th image frame overlap.
  • the terminal 100 can determine the protagonist of the j-th frame image through the intersection-over-union distance (IoU distance, recorded as [IoU]) between all objects in the j-th frame image and the protagonist in the j-1-th frame.
  • the terminal 100 can determine the protagonist of the j-th frame image through the IoU distance and re-identification distance (ReID distance, denoted as [ReID]) of all objects in the j-th frame image from the protagonist in j-1 .
  • the terminal 100 may use the human body ranges (ie, human body frames) of the multiple objects recognized by the human body detection algorithm. At this time, the terminal 100 can use whether the human body frames intersect to determine whether the objects in the j-th frame image overlap. As shown in FIG. 10A , if any two objects in the j-th frame image do not overlap (the human body frames of any two objects do not intersect), such as person 3 and person 4, then the terminal 100 determines that the objects in the j-th frame image do not overlap. As shown in FIG. 10B , if there are at least two overlapping objects in the j-th frame image, such as person 3 and person 4, then the objects in the j-th frame image overlap.
  • the terminal 100 can determine the protagonist of the j-th frame image through the IoU distance of all objects in the j-th frame image to the protagonist in the j-1 th frame.
  • the terminal 100 may first determine the IoU distances of all objects in the j-th frame image from the protagonist in j-1, and determine the minimum IoU distance [IoU] min in the j-th frame image.
  • the dotted box 3 can represent the human body frame of the protagonist in the j-1-th frame; the dotted box 1' can represent the human body frame of character 1 in the j-th frame; the dotted box 2' can represent the human body frame of character 2 in the j-th frame.
  • the terminal 100 can determine the intersection-over-union of the dotted box 3 and the dotted box 1', which is recorded as IoU 31 . Therefore, the terminal 100 can determine the IoU distance [IoU 31 ] between character 1 in the j-th frame and the protagonist in the j-1-th frame: [IoU 31 ] = 1 - IoU 31 .
  • the terminal 100 can obtain the IoU distances between characters 2, 3, and 4 in the j-th frame and the protagonist in the j-1th frame: [IoU 32 ], [IoU 33 ], [IoU 34 ].
  • the terminal 100 can determine that [IoU] min in the j-th frame image is [IoU 33 ].
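The non-overlap case can be sketched as follows. This is an illustrative sketch, not the patent's code; the example values in the text (distance 1 for disjoint boxes, 0.9 for slightly overlapping ones) are consistent with defining the IoU distance as 1 minus the IoU.

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1]) +
             (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

def iou_distance(prev_protagonist_box, candidate_boxes):
    """[IoU] = 1 - IoU for each candidate object in frame j against the
    protagonist's body frame in frame j-1. Returns all distances and the
    index of the minimum ([IoU]min)."""
    dists = [1.0 - iou(prev_protagonist_box, b) for b in candidate_boxes]
    i_min = min(range(len(dists)), key=dists.__getitem__)
    return dists, i_min
```

The caller still has to compare [IoU]min against the threshold D1 before accepting the match, as the following paragraphs explain.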
  • the object in the j-th frame image that has the smallest IoU distance from the protagonist in j-1 is not necessarily the protagonist.
  • for example, the IoU distance between characters 1, 2, and 4 in the j-th frame image and the protagonist in the j-1-th frame is 1, while the IoU distance between character 3 in the j-th frame image and the protagonist in the j-1-th frame is 0.9.
  • the IoU distance between character 3 and the protagonist is the smallest.
  • however, that IoU distance is still very large in absolute terms. Therefore, if the object with the smallest IoU distance (character 3) were directly determined to be the protagonist, misidentification of the protagonist would easily occur, which would lead to the failure of automatic protagonist tracking and affect the user experience.
  • the terminal 100 after determining [IoU] min in the j-th frame image, the terminal 100 also needs to determine whether the above-mentioned [IoU] min is less than the preset IoU distance threshold (denoted as D1).
  • if [IoU] min < D1, the terminal 100 may determine that the object corresponding to [IoU] min is the protagonist; otherwise, the terminal 100 can mark that the protagonist in the frame image is lost (the protagonist is not matched).
  • then, the terminal 100 may determine whether to terminate the protagonist tracking based on the currently accumulated number of image frames in which the protagonist is lost, which will not be discussed here.
  • IoU distance is an optional indicator to determine the similarity between each object in the next frame of video and the protagonist in the previous frame.
  • the terminal 100 may also choose other indicators.
  • the terminal 100 can also directly use IoU to determine the degree of similarity between all objects in the j-th frame image and the protagonist in the j-1th frame. At this time, the object in the j-th frame image that has the largest IoU with the protagonist in j-1 and is greater than the IoU threshold can be confirmed as the protagonist.
  • the terminal 100 may determine the protagonist of the j-th frame image through the IoU distance and ReID distance of all objects in the j-th frame image from the protagonist in the j-1th frame.
  • the terminal 100 cannot determine the protagonist of the j-th frame image only through the IoU distance of all objects in the j-th frame image to the protagonist in the j-1th frame.
  • an object that overlaps with the protagonist in the j-1th frame image may appear in the j-th frame image at the position of the original protagonist in the j-1th frame image.
  • the IoU distance between the above-mentioned object and the protagonist is the closest, but the object is not the protagonist. This can easily lead to misidentification.
  • in addition to using the IoU distance of each object across the two frames of images to determine the protagonist, the terminal 100 also needs to determine whether the object at each position is the protagonist originally determined by the user. At this time, the terminal 100 also needs to calculate the ReID distance between all objects in the j-th frame image and the protagonist in the j-1-th frame image. The ReID distance is a parameter, computed using a neural network, that reflects the degree of similarity between image contents.
  • FIG. 11 exemplarily shows a schematic diagram of the terminal 100 determining the ReID distance between each object in the j-th image and the protagonist in the j-1-th image.
  • the terminal 100 can determine the feature vector F0 of the protagonist in the j-1-th frame image through a convolutional neural network (CNN).
  • the terminal 100 can determine the feature vectors F1 to F4 of each object (people 1 to 4) in the j-th frame image.
  • the terminal 100 can calculate the inner product of the feature vectors (F1 to F4) of each object in the j-th frame image and the feature vector F0 of the protagonist in the j-1-th frame image: <F0, F1>, <F0, F2>, <F0, F3>, <F0, F4>.
  • the terminal 100 can determine the ReID distance between character 1 in the j-th frame and the protagonist in the j-1-th frame (denoted as [ReID] 31 ).
  • the terminal 100 can obtain the ReID distances between characters 2, 3, and 4 in the j-th frame and the protagonist in the j-1th frame: [ReID] 32 , [ReID] 33 , [ReID] 34 .
  • the terminal 100 may determine the minimum ReID distance [ReID] min in the jth frame image. Referring to FIG. 11 , at this time, the terminal 100 can determine that [ReID] min in the j-th frame image is [ReID] 33 .
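The ReID-distance arithmetic can be sketched as follows. In the patent, the feature vectors come from a neural network; here they are plain lists so only the distance computation is shown. Taking the distance as 1 minus the inner product of L2-normalized features is an assumption, chosen so that identical features give distance 0 and dissimilar ones give larger distances, consistent with "smaller distance means more similar".

```python
import math

def normalize(v):
    """L2-normalize a feature vector."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def reid_distance(f0, fk):
    """[ReID] between the protagonist feature F0 (frame j-1) and an object
    feature Fk (frame j): 1 minus the inner product of normalized features."""
    f0, fk = normalize(f0), normalize(fk)
    return 1.0 - sum(a * b for a, b in zip(f0, fk))
```
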
  • the terminal 100 can determine the IoU+ReID distance between each of the above objects and the protagonist, that is, the sum of the IoU distance and the ReID distance, which is recorded as [IoU+ReID].
  • a smaller [IoU+ReID] means that both the IoU distance and the ReID distance between the object and the protagonist are smaller.
  • in terms of the image, the object is located close to the position of the original protagonist and has similar image content. Therefore, the terminal 100 can determine the protagonist of the j-th frame image using [IoU+ReID].
  • the smaller the [IoU+ReID] the more likely the object is to be the protagonist.
  • after determining [IoU+ReID] min in the j-th frame image, the terminal 100 also needs to determine whether the above-mentioned [IoU+ReID] min is less than the preset IoU+ReID distance threshold (denoted as D2). If [IoU+ReID] min < D2, the terminal 100 may determine that the object corresponding to the above [IoU+ReID] min is the protagonist. On the contrary, if [IoU+ReID] min < D2 does not hold, the terminal 100 can mark that the protagonist in the frame image is lost.
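The overlap-case decision can be sketched as a small selection routine. This is an illustrative sketch; the threshold value `d2=1.2` is hypothetical, standing in for the preset D2.

```python
def match_protagonist(iou_dists, reid_dists, d2=1.2):
    """Combine the per-object IoU distance and ReID distance into [IoU+ReID],
    take the minimum, and accept it as the protagonist only if it is below the
    threshold D2. Returns the object index, or None if the protagonist is lost."""
    combined = [i + r for i, r in zip(iou_dists, reid_dists)]  # [IoU+ReID]
    i_min = min(range(len(combined)), key=combined.__getitem__)
    return i_min if combined[i_min] < d2 else None
```

Returning None corresponds to marking the frame as one in which the protagonist is lost, which feeds the lost-frame counting described below.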
  • the terminal 100 may also periodically execute the method of locating the protagonist in the j-th frame image shown in FIG. 9 , see FIG. 12A .
  • KCF Kernel Correlation Filter
  • the terminal 100 can avoid calculating the IoU distance and ReID distance every time, thereby saving computing resources and improving computing efficiency.
  • S608 Confirm whether the protagonist is matched. If so, perform the steps shown in S605; if not, determine whether the lost frames are less than the lost frame number threshold Y.
  • the terminal 100 may calculate the ZoomRatio every few frames, for example, calculate the ZoomRatio every 4 frames.
  • the terminal 100 can perform smoothing processing in the process of determining the close-up image to avoid image jumps.
  • the terminal 100 can update the lost-frame count: increase the number of lost frames by 1.
  • the number of lost frames refers to the number of image frames of the protagonist that the terminal 100 has not recognized continuously. Then, the terminal 100 may determine whether to end the protagonist tracking based on the number of lost frames.
  • the terminal 100 may be set with a lost-frame-number threshold Y. If the number of currently recorded lost frames ≥ Y, the terminal 100 may determine that the objects captured by the camera no longer include the protagonist initially selected by the user. At this time, the terminal 100 can confirm that the protagonist tracking is completed. If the number of currently recorded lost frames < Y, the terminal 100 can obtain the next frame image (the j+1-th frame image) and determine whether the next frame includes the protagonist. It can be understood that, when determining whether the next frame image includes the protagonist, the terminal 100 can perform the protagonist tracking calculation shown in Figure 9 on the next frame image and the last frame image previously matched to the protagonist, and determine whether the protagonist is present in the next frame image.
  • here, the "next frame image" corresponds to the j-th frame image, and the last frame image matched to the protagonist corresponds to the j-1-th frame image.
  • the terminal 100 may maintain the initially set protagonist and continue to locate the protagonist in subsequent images collected by the camera. After re-detecting the above-mentioned protagonist, the terminal 100 can continue to record the close-up video of the protagonist.
  • the terminal 100 can close the small window 141 and stop recording the close-up video of character 3.
  • the terminal 100 can re-display the small window 141 and restart recording the close-up video of the person 3.
  • the terminal 100 may also instruct the user to select a new protagonist. After detecting the user operation of selecting the protagonist again, the terminal 100 can determine the new protagonist, locate the new protagonist in subsequent images, and simultaneously display and save a close-up video of the new protagonist.
  • the terminal 100 can re-determine character 2 as the protagonist, generate a small window 141, and display a close-up image of character 2 in the small window 141.
  • the terminal 100 supports switching protagonists during shooting. At this time, the terminal 100 may also determine whether the j-th frame image corresponds to a user operation of switching the protagonist after acquiring the j-th frame image, so as to change the protagonist and display the close-up image in the small window 141.
  • the terminal 100 can determine whether a user operation of switching the protagonist is detected, for example, the user operation of clicking the selection box 122 corresponding to character 2 shown in Figures 2A-2B, which switches the protagonist from character 3 to character 2.
  • the terminal 100 may determine that character 2 in the j-th frame image is the protagonist. Then, the terminal 100 can reset the FrameID of the j-th frame image to 1 (S604), obtain the image frames after this frame, and locate character 2 in those subsequent frames, thereby displaying a close-up image of the new protagonist (character 2) in the small window 141.
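  The protagonist-switch handling described above (changing the protagonist and resetting the FrameID to 1 in S604) can be sketched as a small state holder. The class and method names are illustrative assumptions:

```python
class ProtagonistTracker:
    """Minimal sketch of the per-frame protagonist/FrameID bookkeeping."""

    def __init__(self, protagonist_id):
        self.protagonist_id = protagonist_id
        self.frame_id = 1          # FrameID within the current tracking run

    def on_new_frame(self, switch_to=None):
        """Called once per captured frame. If the user tapped another object's
        selection box, switch_to carries the new protagonist's id: the tracker
        changes protagonist and resets the FrameID to 1 (S604). Otherwise the
        FrameID simply advances."""
        if switch_to is not None and switch_to != self.protagonist_id:
            self.protagonist_id = switch_to
            self.frame_id = 1      # restart numbering for the new protagonist
        else:
            self.frame_id += 1
        return self.protagonist_id, self.frame_id
```

A switch to the protagonist already being tracked is treated as no switch, so the FrameID keeps advancing.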
  • FIG. 13 illustrates a flow chart for the terminal 100 to edit the captured local video, generate and save the close-up video.
  • the first control may be the "Extract Protagonist" option control in the menu bar 413 .
  • the user operation of clicking the above-mentioned "Extract Protagonist" option can be called a user operation acting on the first control.
  • the terminal 100 can obtain the image frame sequence of the local video and determine the objects included in each image frame.
  • the terminal 100 can display each image frame in the local video.
  • the terminal 100 will also display marks (selection boxes) corresponding to each object.
  • the terminal 100 can determine the protagonist among the plurality of objects based on user operations.
  • the terminal 100 can traverse the subsequent image frames one at a time, determine the protagonist in each subsequent frame, obtain a close-up image centered on the protagonist, and generate a close-up video. This process is the same as the method of determining the protagonist in the real-time shooting process shown in Figures 1A-1I, and will not be described again here.
  • the terminal 100 may not record the number of missing frames.
  • the terminal 100 only needs to determine whether the local video has been traversed, that is, whether the j-th frame image is the last frame of the local video. If the video is not over, that is, the j-th frame image is not the last frame of the local video, the terminal 100 can continue to obtain the next frame image and locate the protagonist in the next frame image.
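  The offline traversal described above can be sketched as a simple loop that, unlike live recording, keeps no lost-frame count and simply runs to the last frame of the local video. `locate_protagonist` and `crop_close_up` are hypothetical helpers standing in for the tracking and cropping steps:

```python
def extract_close_up_video(local_video_frames, locate_protagonist, crop_close_up):
    """local_video_frames: list of decoded frames of the local video.
    locate_protagonist(frame) -> bounding box, or None if absent.
    Returns the sequence of close-up images forming the close-up video."""
    close_up_frames = []
    for frame in local_video_frames:
        box = locate_protagonist(frame)
        if box is not None:
            close_up_frames.append(crop_close_up(frame, box))
        # If the protagonist is absent in this frame, just move on:
        # the traversal ends only when the last frame has been processed.
    return close_up_frames
```

Because the whole video is already on disk, there is no need to decide mid-stream whether the protagonist is gone; frames without the protagonist are simply skipped.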
  • the terminal 100 may not display the close-up image. This is because, during the editing process of the local video, the terminal 100 does not need to play the local video, so the terminal 100 does not need to play the close-up video during the editing process. After the editing is completed, the terminal 100 can save the above close-up video. Users can then browse the above close-up video at any time.
  • Figure 14 is a schematic system structure diagram of the terminal 100 provided by the embodiment of the present application.
  • the layered architecture divides the system into several layers, and each layer has clear roles and division of labor.
  • the layers communicate through software interfaces.
  • the system is divided into five layers, from top to bottom: application layer, application framework layer, hardware abstraction layer, driver layer and hardware layer.
  • the application layer can include a series of application packages.
  • the application package may include a camera, a gallery, etc.
  • the application framework layer provides application programming interface (API) and programming framework for applications in the application layer.
  • the application framework layer includes some predefined functions.
  • the application framework layer may include a camera access interface and a video editing interface.
  • the camera access interface may include camera management and camera equipment.
  • the camera access interface is used to provide an application programming interface and programming framework for camera applications.
  • the video editing interface is used to provide an application programming interface and programming framework for editing pictures and/or videos for the gallery application.
  • In the embodiments of this application, the application programming interface and programming framework for editing videos provided by the video editing interface are mainly used.
  • the hardware abstraction layer is the interface layer between the application framework layer and the driver layer, providing a virtual hardware platform for the operating system.
  • the hardware abstraction layer may include a camera hardware abstraction layer and a camera algorithm library.
  • the camera hardware abstraction layer can provide virtual hardware of camera device 1, camera device 2 or more camera devices.
  • the camera algorithm library may include running code and data to implement the video editing method provided by the embodiment of the present application.
  • the driver layer is the layer between hardware and software.
  • the driver layer includes drivers for various hardware.
  • the driver layer may include camera device drivers, digital signal processor drivers, image processor drivers, etc.
  • the camera device driver is used to drive the sensor of the camera to collect images and drive the image signal processor to preprocess the images.
  • the digital signal processor driver is used to drive the digital signal processor to process images.
  • the image processor driver is used to drive the graphics processor to process images.
  • In response to the user's operation of opening the camera application, such as clicking the camera application icon, the camera application calls the camera access interface of the application framework layer to start the camera application, and then sends an instruction to start the camera by calling the camera device (camera device 1 and/or other camera devices) in the camera hardware abstraction layer.
  • the camera hardware abstraction layer sends this instruction to the camera device driver in the driver layer.
  • the camera device driver can start the corresponding camera sensor and collect image light signals through the sensor.
  • a camera device in the camera hardware abstraction layer corresponds to a camera sensor in the hardware layer.
  • the camera sensor can transmit the collected image optical signal to the image signal processor for pre-processing to obtain the image electrical signal (original image), and transmit the above-mentioned original image to the camera hardware abstraction layer through the camera device driver.
  • the camera hardware abstraction layer can send the above raw image to the display for display.
  • the camera hardware abstraction layer sends raw images to the camera algorithm library.
  • the camera algorithm library stores the program code that implements the video editing method (object recognition, protagonist tracking, cropping of close-up images, and other processing) provided by the embodiments of this application. By executing the above code on the digital signal processor and image processor, the camera algorithm library can recognize the objects in each image frame, locate the protagonist in the original image, and crop out a close-up image centered on the protagonist.
  • the camera algorithm library can send determined close-up images to the camera hardware abstraction layer.
  • the camera hardware abstraction layer can then send the close-up image to the display. This allows the camera application to display a close-up image centered on the selected protagonist alongside the original image.
  • the camera hardware abstraction layer can also write the original image sequence and the close-up image sequence to a specific storage space.
  • the terminal 100 can realize the function of recording video, and save the original image stream collected by the camera in real time and the close-up image stream obtained based on the original image as local videos (original video and close-up video).
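  The dual-stream recording described above (the raw stream and the derived close-up stream written out side by side) can be sketched as follows; the writer objects and helper name are illustrative assumptions:

```python
def record(frames, crop_close_up, original_writer, close_up_writer):
    """frames: raw images delivered by the camera HAL during recording.
    crop_close_up(frame) -> close-up image, or None when the protagonist
    is absent. Every raw frame goes into the original video; its cropped
    counterpart goes into the close-up video, so one capture pass yields
    two local videos (original video and close-up video)."""
    for frame in frames:
        original_writer.append(frame)          # original image stream
        close_up = crop_close_up(frame)        # None if no protagonist here
        if close_up is not None:
            close_up_writer.append(close_up)   # close-up image stream
    return original_writer, close_up_writer
```

In a real implementation the two writers would be video encoder sessions; plain lists are used here only to show the data flow.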
  • When editing a local video, the gallery application calls the video editing interface of the application framework layer, which in turn calls the program code stored in the camera algorithm library that implements the video editing method provided by the embodiments of this application. The camera algorithm library executes the above code to locate the protagonist in the original images and crop close-up images centered on the protagonist, thereby editing the original video into a close-up video.
  • the mark corresponding to each object in the image may also be called a selection box; the second video may also be called a close-up video.
  • the first interface may be the user interface 102 shown in Figure 1B; the first image may be an image collected by the camera displayed in the preview window 113 in the user interface 102.
  • the first image is the image displayed in the preview window 113 in Figure 1B, or the i-th frame image collected by the aforementioned camera; referring to Figure 1C, the first mark may be the selection box 123 corresponding to character 3, and the first operation may be an input operation acting on the selection box 123.
  • the fifth operation may be an input operation acting on the selection box 122; the first sub-video may be the aforementioned close-up video centered on character 3, and the second sub-video may be the aforementioned close-up video centered on character 2.
  • the first video may also be called a local video.
  • the first video may be the local video corresponding to the aforementioned icon 411, and the thumbnail of the first video may be the icon 411 of the local video.
  • the second operation may be an operation of clicking the icon 411; the first interface may be the user interface 404 shown in Figure 4D; the first image may be a frame of image in the local video, for example, the image displayed in the user interface 404 shown in Figure 4D.
  • the first object may be character 3; the second image may be the image displayed in the window of user interface 404 shown in FIG. 4H, and the second mark may be character 2 in the above image.
  • the second object can be the aforementioned character 2; the fifth operation can be an input operation of clicking the selection box 422; the first sub-video can be the aforementioned close-up video centered on character 3; the second sub-video can be the aforementioned close-up video centered on character 2.
  • Figure 15 is a schematic diagram of the hardware structure of the terminal 100 provided by the embodiment of the present application.
  • the terminal 100 may include a processor 119, an external memory interface 120, an internal memory 129, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 149, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, a headphone interface 170D, a sensor module 180, a button 190, a motor 199, an indicator 198, a camera 193, a display screen 194, a subscriber identification module (SIM) card interface 195, etc.
  • the sensor module 180 may include a pressure sensor 180A, a gyro sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, etc.
  • the structure illustrated in the embodiment of the present invention does not constitute a specific limitation on the terminal 100.
  • the terminal 100 may include more or fewer components than shown in the figures, or some components may be combined, or some components may be separated, or may be arranged differently.
  • the components illustrated may be implemented in hardware, software, or a combination of software and hardware.
  • the processor 119 may include one or more processing units.
  • the processor 119 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc.
  • the controller can generate operation control signals based on the instruction operation code and timing signals to complete the control of fetching and executing instructions.
  • the processor 119 may also be provided with a memory for storing instructions and data.
  • the memory in the processor 119 is a cache memory. This memory may hold instructions or data that the processor 119 has just used or uses cyclically. If the processor 119 needs to use the instructions or data again, it can call them directly from this memory. This avoids repeated accesses, reduces the waiting time of the processor 119, and thus improves the efficiency of the system.
  • processor 119 may include one or more interfaces.
  • Interfaces may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface, etc.
  • the interface connection relationships between the modules illustrated in the embodiment of the present invention are only schematic illustrations and do not constitute a structural limitation on the terminal 100 .
  • In other embodiments, the terminal 100 may also adopt an interface connection manner different from the above embodiments, or a combination of multiple interface connection manners.
  • the charging management module 140 is used to receive charging input from the charger. While the charging management module 140 charges the battery 149, it can also provide power to the electronic device through the power management module 141.
  • the power management module 141 is used to connect the battery 149, the charging management module 140 and the processor 119.
  • the power management module 141 receives input from the battery 149 and/or the charging management module 140 and supplies power to the processor 119, internal memory 129, display screen 194, camera 193, wireless communication module 160, etc.
  • the wireless communication function of the terminal 100 can be implemented through the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor and the baseband processor.
  • Antenna 1 and Antenna 2 are used to transmit and receive electromagnetic wave signals.
  • the mobile communication module 150 can provide wireless communication solutions including 2G/3G/4G/5G applied to the terminal 100.
  • the mobile communication module 150 can receive electromagnetic waves through the antenna 1, perform filtering, amplification and other processing on the received electromagnetic waves, and transmit them to the modem processor for demodulation.
  • the mobile communication module 150 can also amplify the signal modulated by the modem processor and convert it into electromagnetic waves through the antenna 1 for radiation.
  • the wireless communication module 160 can provide wireless communication solutions applied to the terminal 100, including wireless local area network (WLAN) (such as a wireless fidelity (Wi-Fi) network), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), infrared (IR), and other wireless communication solutions.
  • the wireless communication module 160 receives electromagnetic waves via the antenna 2 , frequency modulates and filters the electromagnetic wave signals, and sends the processed signals to the processor 119 .
  • the wireless communication module 160 can also receive the signal to be sent from the processor 119, frequency modulate it, amplify it, and convert it into electromagnetic waves through the antenna 2 for radiation.
  • the antenna 1 of the terminal 100 is coupled to the mobile communication module 150, and the antenna 2 is coupled to the wireless communication module 160, so that the terminal 100 can communicate with the network and other devices through wireless communication technology.
  • the terminal 100 implements the display function through the GPU, the display screen 194, and the application processor.
  • the GPU is an image processing microprocessor and is connected to the display screen 194 and the application processor. GPUs are used to perform mathematical and geometric calculations for graphics rendering.
  • Processor 119 may include one or more GPUs that execute program instructions to generate or alter display information.
  • the display screen 194 is used to display images, videos, etc.
  • Display 194 includes a display panel.
  • the display panel can use a liquid crystal display (LCD).
  • the display panel can also use an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a mini-LED, a micro-LED, a micro-OLED, quantum dot light-emitting diodes (QLED), etc.
  • the electronic device may include 1 or N display screens 194, where N is a positive integer greater than 1.
  • When the terminal 100 tracks the protagonist, determines the close-up image of the protagonist, and displays the user interfaces shown in Figures 1A-1M, 2A-2D, 3A-3C, 4A-4H, and 5A-5E, it relies on the display functions provided by the above-mentioned GPU, display screen 194, and application processor.
  • the terminal 100 can implement the shooting function through the ISP, camera 193, video codec, GPU, display screen 194, application processor, etc.
  • the ISP is used to process the data fed back by the camera 193. For example, when taking a photo, the shutter is opened and light is transmitted through the lens to the camera's photosensitive element, where the optical signal is converted into an electrical signal. The photosensitive element passes the electrical signal to the ISP for processing, converting it into an image visible to the naked eye. The ISP can also perform algorithm optimization on image noise, brightness, and skin color, and can optimize parameters such as exposure and color temperature of the shooting scene. In some embodiments, the ISP may be provided in the camera 193.
  • Camera 193 is used to capture still images or video.
  • the object passes through the lens to produce an optical image that is projected onto the photosensitive element.
  • the photosensitive element can be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor.
  • the photosensitive element converts the optical signal into an electrical signal, and then passes the electrical signal to the ISP to convert it into a digital image signal.
  • ISP outputs digital image signals to DSP for processing.
  • DSP converts digital image signals into standard RGB, YUV and other format image signals.
  • the terminal 100 may include 1 or N cameras 193, where N is a positive integer greater than 1.
  • Digital signal processors are used to process digital signals. In addition to digital image signals, they can also process other digital signals. For example, when the terminal 100 selects a frequency point, the digital signal processor is used to perform Fourier transform on the frequency point energy.
  • Video codecs are used to compress or decompress digital video.
  • Terminal 100 may support one or more video codecs.
  • the terminal 100 can play or record videos in multiple encoding formats, such as moving picture experts group (MPEG) 1, MPEG2, MPEG3, MPEG4, etc.
  • NPU is a neural network (NN) computing processor.
  • the NPU can realize intelligent cognitive applications of the terminal 100, such as image recognition, face recognition, speech recognition, text understanding, etc.
  • the terminal 100 collects original images through the ISP and the shooting capabilities provided by the camera 193, and uses the video codec and the image computing and processing capabilities provided by the GPU to perform computing processing such as tracking the protagonist and determining the close-up image of the protagonist.
  • the terminal 100 can implement neural network algorithms such as face recognition, human body recognition, and re-identification (ReID) through the computing processing capabilities provided by the NPU.
  • Internal memory 129 may include one or more random access memories (RAM) and one or more non-volatile memories (NVM).
  • Random access memory can include static random-access memory (SRAM), dynamic random-access memory (DRAM), synchronous dynamic random-access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM; for example, the fifth-generation DDR SDRAM is generally called DDR5 SDRAM), etc.
  • Non-volatile memory can include disk storage devices and flash memory. By operating principle, flash memory can include NOR FLASH, NAND FLASH, 3D NAND FLASH, etc.; by storage cell potential level, it can include single-level cell (SLC), multi-level cell (MLC), triple-level cell (TLC), quad-level cell (QLC), etc.; by storage specification, it can include universal flash storage (UFS), embedded multimedia card (eMMC), etc.
  • the random access memory can be directly read and written by the processor 119 and can be used to store the executable programs (such as machine instructions) of the operating system or other running programs, and can also be used to store user and application data, etc.
  • the non-volatile memory can also store executable programs and user and application data, etc., and can be loaded into the random access memory in advance for direct reading and writing by the processor 119.
  • the code for implementing the video editing method described in the embodiment of the present application may be stored in a non-volatile memory.
  • the terminal 100 may load the executable code stored in the non-volatile memory into the random access memory.
  • the external memory interface 120 can be used to connect an external non-volatile memory to expand the storage capability of the terminal 100 .
  • the external non-volatile memory communicates with the processor 119 through the external memory interface 120 to implement the data storage function.
  • the terminal 100 can implement audio functions, such as music playback and recording, through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the headphone interface 170D, and the application processor.
  • the audio module 170 is used to convert digital audio information into analog audio signal output, and is also used to convert analog audio input into digital audio signals.
  • Speaker 170A, also called a "loudspeaker", is used to convert audio electrical signals into sound signals.
  • Receiver 170B, also called the "earpiece", is used to convert audio electrical signals into sound signals.
  • the voice can be heard by bringing the receiver 170B close to the human ear.
  • Microphone 170C, also called a "mic" or "mike", is used to convert sound signals into electrical signals.
  • the headphone interface 170D is used to connect wired headphones.
  • the terminal 100 can simultaneously enable the microphone 170C to collect sound signals, and convert the sound signals into electrical signals to store them. In this way, users can get videos with sound.
  • the pressure sensor 180A is used to sense pressure signals and can convert the pressure signals into electrical signals.
  • pressure sensor 180A may be disposed on display screen 194 .
  • the gyro sensor 180B may be used to determine the movement posture of the terminal 100. In some embodiments, the angular velocity of the terminal 100 about three axes (i.e., the x, y, and z axes) can be determined through the gyro sensor 180B.
  • the gyro sensor 180B can be used for image stabilization. For example, when the shutter is pressed, the gyro sensor 180B detects the angle at which the terminal 100 shakes, calculates the distance that the lens module needs to compensate based on the angle, and allows the lens to offset the shake of the terminal 100 through reverse movement to achieve anti-shake.
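  The compensation step described above can be illustrated numerically: from the shake angle reported by the gyro sensor, the lens displacement needed to offset the shake is approximately the focal length times the tangent of the angle. The focal length value below is an arbitrary assumption for illustration:

```python
import math

def compensation_displacement(shake_angle_rad, focal_length_mm=27.0):
    """Returns the lens-module displacement (in the same unit as the focal
    length) that offsets a shake of shake_angle_rad radians, using the
    small-angle geometry displacement = focal_length * tan(angle)."""
    return focal_length_mm * math.tan(shake_angle_rad)
```

For the small angles typical of handshake, tan(angle) ≈ angle, so the required displacement grows roughly linearly with the detected shake angle.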
  • Air pressure sensor 180C is used to measure air pressure.
  • the terminal 100 calculates the altitude through the air pressure value measured by the air pressure sensor 180C to assist positioning and navigation.
  • Magnetic sensor 180D includes a Hall sensor.
  • the terminal 100 may use the magnetic sensor 180D to detect the opening and closing of the flip cover.
  • the acceleration sensor 180E can detect the acceleration of the terminal 100 in various directions (generally three axes). When the terminal 100 is stationary, the magnitude and direction of gravity can be detected.
  • Distance sensor 180F for measuring distance.
  • the terminal 100 can measure distance by infrared or laser. In some embodiments, when shooting a scene, the terminal 100 can use the distance sensor 180F to measure distance to achieve fast focusing.
  • Proximity light sensor 180G may include, for example, a light emitting diode (LED) and a light detector, such as a photodiode.
  • the light emitting diode may be an infrared light emitting diode.
  • the terminal 100 emits infrared light through a light emitting diode.
  • the terminal 100 uses photodiodes to detect infrared reflected light from nearby objects. When sufficient reflected light is detected, it can be determined that there is an object near the terminal 100 . When insufficient reflected light is detected, the terminal 100 may determine that there is no object near the terminal 100 .
  • the ambient light sensor 180L is used to sense ambient light brightness. The terminal 100 can adaptively adjust the brightness of the display screen 194 according to the perceived ambient light brightness.
  • the ambient light sensor 180L can also be used to automatically adjust the white balance when taking pictures.
  • Fingerprint sensor 180H is used to collect fingerprints.
  • the terminal 100 can use the collected fingerprint characteristics to realize fingerprint unlocking, access application lock, fingerprint photography, fingerprint answering incoming calls, etc.
  • Temperature sensor 180J is used to detect temperature. In some embodiments, the terminal 100 uses the temperature detected by the temperature sensor 180J to execute the temperature processing policy.
  • Touch sensor 180K also known as "touch device”.
  • the touch sensor 180K can be disposed on the display screen 194.
  • the touch sensor 180K and the display screen 194 form a touch screen, which is also called a "touch screen”.
  • the touch sensor 180K is used to detect a touch operation on or near the touch sensor 180K.
  • the touch sensor can pass the detected touch operation to the application processor to determine the touch event type.
  • Visual output related to the touch operation may be provided through display screen 194 .
  • the touch sensor 180K may also be disposed on the surface of the terminal 100 in a position different from that of the display screen 194 .
  • the terminal 100 can use the touch sensor 180K to detect the user's click, slide, and other operations on the display screen 194, so as to implement the video editing method shown in Figures 1A-1M, 2A-2D, 4A-4H, and 5A-5E.
  • Bone conduction sensor 180M can acquire vibration signals.
  • the buttons 190 include a power button, a volume button, etc.
  • the terminal 100 may receive key input and generate key signal input related to user settings and function control of the terminal 100.
  • the motor 199 can generate vibration prompts.
  • the motor 199 can be used for vibration prompts for incoming calls and can also be used for touch vibration feedback.
  • the indicator 198 may be an indicator light, which may be used to indicate charging status, power changes, or may be used to indicate messages, missed calls, notifications, etc.
  • the SIM card interface 195 is used to connect a SIM card.
  • the terminal 100 can support 1 or N SIM card interfaces, where N is a positive integer greater than 1.
  • The term "user interface (UI)" in the description, claims, and drawings of this application is a medium interface for interaction and information exchange between an application or operating system and a user; it implements the conversion between the internal form of information and a form acceptable to the user.
  • The user interface of an application is defined by source code written in specific computer languages, such as Java and extensible markup language (XML).
  • The interface source code is parsed and rendered on the terminal device, and is finally presented as content that the user can recognize.
  • A control, also called a widget, is a basic element of the user interface. Typical controls include toolbars, menu bars, text boxes, buttons, scroll bars, images, and text.
  • The properties and contents of the controls in the interface are defined through tags or nodes.
  • For example, XML specifies the controls contained in the interface through nodes such as <Textview>, <ImgView>, and <VideoView>.
  • A node corresponds to a control or a property in the interface; after parsing and rendering, the node is presented as user-visible content.
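As an illustration of the node-to-control mapping described above, the following minimal Python sketch parses a hypothetical interface source with the standard `xml.etree.ElementTree` module. The layout string, its node names, and the `parse_controls` helper are assumptions for illustration only; they are not the actual rendering pipeline of a terminal device.

```python
import xml.etree.ElementTree as ET

# Hypothetical interface source: each node names a control, as described above.
layout_xml = """
<Layout>
    <Textview text="Protagonist mode"/>
    <ImgView src="thumbnail.png"/>
    <VideoView src="close_up.mp4"/>
</Layout>
"""

def parse_controls(source: str):
    """Parse interface source code into (control type, attributes) pairs."""
    root = ET.fromstring(source)
    return [(node.tag, dict(node.attrib)) for node in root]

controls = parse_controls(layout_xml)
print(controls)
```

A real UI framework would instantiate concrete control objects from these nodes; here the parse result merely shows how each node carries a control type and its properties.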
  • Applications, such as hybrid applications, often include web pages in their interfaces.
  • A web page, also known as a page, can be understood as a special control embedded in an application interface.
  • A web page is source code written in a specific computer language, such as hypertext markup language (HTML), cascading style sheets (CSS), or JavaScript (JS).
  • Web page source code can be loaded and displayed as user-recognizable content by a browser or by a web page display component with functions similar to a browser.
  • The specific content contained in a web page is also defined through tags or nodes in the web page source code.
  • For example, HTML defines the elements and attributes of a web page through tags such as <p>, <img>, <video>, and <canvas>.
  • GUI: graphical user interface
  • A GUI refers to a user interface, related to computer operations, that is displayed graphically. It may comprise interface elements displayed on the display of an electronic device, such as icons, windows, and controls, where controls can include visual interface elements such as icons, buttons, menus, tabs, text boxes, dialog boxes, status bars, navigation bars, and widgets.
  • The computer program product includes one or more computer instructions.
  • The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device.
  • The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired (such as coaxial cable, optical fiber, or digital subscriber line) or wireless (such as infrared, radio, or microwave) means.
  • The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device, such as a server or data center, that integrates one or more available media.
  • The available media may be magnetic media (e.g., floppy disks, hard disks, magnetic tape), optical media (e.g., DVDs), or semiconductor media (e.g., solid state drives), etc.


Abstract

Provided in the present application is a video editing method. The method can be applied to terminal devices such as a mobile phone and a tablet computer. By implementing the method, a terminal device can crop a video while recording the video, generate a close-up video centred on a protagonist in the video, and store the close-up video. In addition, the terminal device can also crop a locally stored video, and also generate and store a close-up video centred on a protagonist in the video. In this way, during a video recording process or for a recorded video, a user can set a protagonist for photographing and obtain a close-up video centred on a protagonist in the video, so as to meet personalized usage requirements of the user.

Description

A video editing method and electronic device
This application claims priority to the Chinese patent application filed with the China Patent Office on May 30, 2022, with application number 202210603653.3 and application title "A video editing method and electronic device", the entire content of which is incorporated into this application by reference.
Technical field
The present application relates to the field of terminals, and in particular, to a video editing method and an electronic device.
Background
At present, mobile phones and other terminal devices that support video shooting can implement an automatic tracking shooting mode. When recording a video, the terminal device can receive a protagonist selected by the user. Then, during subsequent recording, the terminal device can continuously follow the protagonist and obtain a close-up video whose center is always the selected protagonist.
Summary of the invention
This application provides a video editing method and an electronic device. For an image currently collected by the camera or an image in a local video, the user can select an object in the image as the protagonist, and the electronic device can automatically track the protagonist in that image, and record and save a close-up video of the protagonist.
In a first aspect, the present application provides a video editing method applied to an electronic device, the method including: displaying, in a first interface, a first image and one or more marks associated with the first image, wherein the first image includes one or more objects, the one or more marks associated with the first image respectively correspond to the one or more objects in the first image, and the first image is an image currently collected by a camera of the electronic device or a frame of image in a first video stored in the electronic device; detecting a first operation acting on a first mark; in response to the first operation, determining a first object as the protagonist and obtaining a close-up image centered on the protagonist, wherein the one or more marks associated with the first image include the first mark, the one or more objects in the first image include the first object, and the first mark corresponds to the first object; and generating, based on the close-up images centered on the protagonist, a second video centered on the protagonist.
By implementing the embodiments of this application, the user can select an object in an image collected by the camera as the protagonist; when recording the original video collected by the camera, the electronic device can automatically track the protagonist in the image sequence collected by the camera and record a close-up video of the protagonist. The electronic device can also display a local video selected by the user, and the user can select an object in a frame of the local video as the protagonist; the electronic device can then automatically track the protagonist in that frame and the subsequent images of the local video, and record and save a close-up video of the protagonist.
In one implementation, after determining the first object as the protagonist, the method further includes: displaying, in the first interface, a second image and one or more marks associated with the second image, wherein the second image includes one or more objects, the one or more marks associated with the second image respectively correspond to the one or more objects in the second image, and the second image is an image collected by the camera of the electronic device after the first image, or a frame of image after the first image in the first video; detecting a fifth operation acting on a second mark; and, in response to the fifth operation, switching the protagonist to a second object, wherein the one or more marks associated with the second image include the second mark, the one or more objects in the second image include the second object, and the second mark corresponds to the second object. Obtaining the close-up image centered on the protagonist includes: generating close-up images centered on the first object from the images between the first image and the second image that include the first object, and generating close-up images centered on the second object from the second image and subsequent images. The second video includes a first sub-video and a second sub-video; the first sub-video is a video generated based on the close-up images centered on the first object, and the second sub-video is a video generated based on the close-up images centered on the second object. By implementing the embodiments of this application, during recording of the original video, when recording, pausing, or stopping the close-up video of the first object, the electronic device can also determine a new protagonist based on the images collected by the camera, for example switching the protagonist from the first object to the second object, and then record a close-up video of the second object. When the electronic device displays a local video, the user can also select another object in another frame of the local video as the new protagonist, for example switching the protagonist from the first object to the second object; the electronic device can then automatically track the protagonist in that frame and the subsequent images of the local video, and record a close-up video of the second object. In this application, the electronic device can save the close-up video of the first object and the close-up video of the second object separately, or can combine them into one video and save it.
In one implementation, obtaining the close-up image centered on the protagonist is specifically: generating close-up images centered on the first object from the images, from the first image to the last frame of the first video, that include the first object. By implementing the embodiments of this application, for a local video, a close-up video with only one object as the protagonist can be recorded.
In one implementation, when the second image is a frame of image after the first image in the first video, before displaying the first image and the one or more marks associated with the first image in the first interface, the method further includes: displaying a thumbnail of the first video; and detecting a second operation acting on the thumbnail of the first video. Displaying the first image and the one or more marks associated with the first image in the first interface includes: in response to the second operation, displaying, in the first interface, the first frame image of the first video and one or more marks corresponding to one or more objects in the first frame image, wherein the first image is the first frame image. By implementing the embodiments of this application, through the thumbnail of a local video displayed in a specific application of the electronic device (for example, the gallery), the user can trigger the electronic device to display the first interface for playing the local video; when the first frame image of the local video is displayed in the first interface, the marks corresponding to the objects in the image can be displayed automatically, without user operation, for the user to select the protagonist.
In one implementation, when the second image is a frame of image after the first image in the first video, before displaying the first image and the one or more marks associated with the first image in the first interface, the method further includes: displaying the first frame image of the first video and a first control in the first interface; detecting a third operation acting on the first control; and, in response to the third operation, playing the first video. Displaying the first image and the one or more marks associated with the first image in the first interface includes: when the first video is played to the M-th frame image, displaying, in the first interface, the M-th frame image and one or more marks associated with the M-th frame image. By implementing the embodiments of this application, when the first interface displays an image of a local video, the user triggers the electronic device to display the marks corresponding to the objects in the image only through a specified operation. In this way, object recognition does not need to be performed on every frame of the local video, which saves the power consumed by object recognition.
In one implementation, when the first video is played to the M-th frame image, displaying the M-th frame image and the one or more marks associated with the M-th frame image in the first interface includes: when the first video is played to the M-th frame image, detecting a fourth operation acting on the first control; in response to the fourth operation, pausing the playback of the first video and displaying the currently played M-th frame image; and, in response to the pause operation, displaying the one or more marks associated with the M-th frame image on the M-th frame image. By implementing the embodiments of this application, only when the local video is paused does the electronic device display the marks corresponding to the objects in the currently displayed image. In this way, object recognition does not need to be performed on every frame of the local video, which saves the power consumed by object recognition.
In one implementation, the first interface further includes a second control, and generating the second video centered on the protagonist based on the close-up images centered on the protagonist includes: detecting a sixth operation acting on the second control; and, in response to the sixth operation, generating the second video centered on the protagonist based on the close-up images centered on the protagonist. By implementing the embodiments of this application, during recording of the protagonist's close-up video, the user can control the electronic device to stop recording the close-up video through a preset operation.
In one implementation, when the first image is an image currently collected by the camera of the electronic device, the second control is a control for stopping recording. By implementing the embodiments of this application, the first interface for recording video includes a control for stopping recording; during recording of the protagonist's close-up video, the user can use this control to make the electronic device stop recording the close-up video.
In one implementation, the method further includes: in response to the sixth operation, the camera stops collecting images, and the original video is generated and saved based on the images already collected by the camera. By implementing the embodiments of this application, when the user controls the electronic device to stop recording the original video through the preset operation, the electronic device also automatically stops recording the protagonist's close-up video.
In one implementation, after determining the first object as the protagonist, the method further includes: displaying a first window, and displaying the close-up images centered on the protagonist in the first window. By implementing the embodiments of this application, when the protagonist's close-up video is being recorded, the user can preview the recording progress of the close-up video in real time through the first window.
In one implementation, when the first image is an image currently collected by the camera of the electronic device, the method further includes: detecting a first trigger condition, wherein the first trigger condition is that the protagonist is not included in Y consecutive frames of images after the first image. Generating the second video centered on the protagonist based on the close-up images centered on the protagonist is specifically: in response to the first trigger condition, generating the second video centered on the protagonist based on the close-up images centered on the protagonist. By implementing the embodiments of this application, when it is detected that none of the Y frames of images continuously collected by the camera includes the protagonist selected by the user, it is determined that the protagonist has left the shooting range of the camera, and the electronic device stops recording the protagonist's close-up video.
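The first trigger condition above can be sketched as a simple per-frame counter. This is a minimal illustration, assuming a boolean per-frame detection result and a hypothetical `CloseUpRecorder` class; the threshold Y is implementation-defined.

```python
LOST_FRAME_THRESHOLD = 5  # "Y" in the description; the actual value is implementation-defined

class CloseUpRecorder:
    """Minimal sketch: stop close-up recording once the protagonist is
    absent from Y consecutive frames."""

    def __init__(self, threshold=LOST_FRAME_THRESHOLD):
        self.threshold = threshold
        self.missed = 0          # consecutive frames without the protagonist
        self.recording = True

    def on_frame(self, protagonist_detected: bool):
        if not self.recording:
            return
        if protagonist_detected:
            self.missed = 0      # any detection resets the counter
        else:
            self.missed += 1
            if self.missed >= self.threshold:
                self.recording = False  # first trigger condition met

rec = CloseUpRecorder(threshold=3)
for detected in [True, False, False, True, False, False, False]:
    rec.on_frame(detected)
print(rec.recording)  # → False: three consecutive misses ended the close-up
```

Note that a brief occlusion shorter than Y frames does not stop the recording, which matches the "consecutive frames" wording above.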
In one implementation, generating the close-up images centered on the first object from the images between the first image and the second image that include the first object includes: obtaining a first close-up image centered on the first object from the first image; and obtaining a third close-up image centered on the first object from a third image, wherein the third image is an image after the first image and before the second image, and the second video includes the first close-up image and the third close-up image. By implementing the embodiments of this application, the protagonist in an image collected by the camera or in an image of a local video can be located, a close-up image of the protagonist can be obtained from that image, and the close-up images can then be recorded to generate a close-up video.
In one implementation, before obtaining the third close-up image centered on the first object from the third image, the method further includes: determining whether the third image includes the first object. Obtaining the third close-up image centered on the first object from the third image is specifically: when the third image includes the first object, obtaining the third close-up image centered on the first object from the third image. By implementing the embodiments of this application, the protagonist in an image collected by the camera or in an image of a local video can be located, and the image can be cropped to obtain a close-up image centered on the protagonist.
In one implementation, determining that the third image includes the first object includes: identifying the human body image regions in the third image using a human body detection algorithm; when the human body image regions in the third image do not overlap, calculating the intersection-over-union (IoU) distance between each human body image region in the third image and the protagonist's human body image region in the first image, and determining the first human body image region whose IoU distance is the smallest and satisfies the IoU distance threshold, wherein the object corresponding to the first human body image region is the protagonist; when the human body image regions in the third image overlap, calculating the IoU distance and the re-identification (ReID) distance between each human body image region in the third image and the protagonist's human body image region in the first image, and determining the first human body image region whose sum of IoU distance and ReID distance is the smallest and satisfies the IoU+ReID distance threshold, wherein the object corresponding to the first human body image region is the protagonist. By implementing the embodiments of this application, the protagonist's human body image region can be accurately identified through the IoU distances and ReID distances of the objects in the image.
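The IoU-distance matching for the non-overlapping case can be sketched as follows. The box format, the definition of IoU distance as 1 − IoU, and the threshold value are assumptions for illustration; the ReID branch, which requires an appearance-embedding model, is omitted.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def match_protagonist(prev_box, candidate_boxes, iou_dist_threshold=0.7):
    """Pick the candidate body region whose IoU distance (1 - IoU) to the
    protagonist's previous region is smallest and under the threshold."""
    best_idx, best_dist = None, None
    for i, box in enumerate(candidate_boxes):
        dist = 1.0 - iou(prev_box, box)
        if dist <= iou_dist_threshold and (best_dist is None or dist < best_dist):
            best_idx, best_dist = i, dist
    return best_idx  # None means the protagonist was not found in this frame

prev = (100, 100, 200, 300)
candidates = [(400, 120, 480, 300), (110, 105, 210, 305)]
print(match_protagonist(prev, candidates))  # → 1 (the nearly unmoved region)
```

When regions overlap, the same structure would rank candidates by the sum of this IoU distance and a ReID embedding distance, as the description states.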
In one implementation, obtaining the third close-up image centered on the protagonist from the third image specifically includes: determining, based on the first human body image region, the third close-up image that includes the first human body image region. By implementing the embodiments of this application, after the protagonist's human body image region in an image is identified, the image can be cropped based on that region to obtain a cropped close-up image of the protagonist.
In one implementation, determining the third close-up image including the first human body image region based on the first human body image region specifically includes: determining a first scaling ratio according to the first human body image region; and determining the size of the third close-up image based on the first scaling ratio. By implementing the embodiments of this application, the scaling ratio reflects the size of the protagonist in the original image, and the size of the protagonist's close-up image determined based on the scaling ratio can fit the small window used to display the close-up image, thereby avoiding image distortion when the close-up image is displayed in the small window.
In one implementation, determining the first scaling ratio according to the first human body image region specifically includes: determining the first scaling ratio according to the size of the largest human body image region in the third image and the size of the first human body image region. By implementing the embodiments of this application, the scaling ratio can be adjusted in real time based on the size of the protagonist's human body image region in each image collected by the camera or each image displayed in the local video.
In one implementation, determining the size of the third close-up image based on the first scaling ratio specifically includes: determining the size of the third close-up image according to the first scaling ratio and the preset size of the second video. By implementing the embodiments of this application, the size of the close-up image cropped from each image can be adjusted in real time based on the scaling ratio corresponding to each image collected by the camera or each image of the local video, as well as the size of the close-up video, so as to ensure that the close-up image fits the small window used to display it.
In one implementation, the aspect ratio of the third close-up image is the same as the preset aspect ratio of the second video. Implementing the embodiments of this application ensures that the close-up image fits the small window used to display it, thereby avoiding image distortion when the small window displays the close-up image.
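A possible way to derive the close-up crop size from the scaling ratio while keeping the preset aspect ratio is sketched below. The exact formulas for the scaling ratio and the crop size are assumptions based on the description above, not the patented implementation.

```python
def scaling_ratio(protagonist_height, largest_height):
    """Hypothetical first scaling ratio: the protagonist's body-region size
    relative to the largest body region in the frame."""
    return protagonist_height / largest_height

def close_up_size(scale, base_w, base_h, frame_w, frame_h):
    """Derive the crop size from the scaling ratio and the preset second-video
    size (base_w x base_h); the crop keeps the preset aspect ratio so the
    close-up fits its display window without distortion."""
    w = min(frame_w, round(base_w / scale))  # a smaller protagonist gets a larger crop
    h = round(w * base_h / base_w)           # enforce the preset aspect ratio
    if h > frame_h:                          # clamp the crop to the source frame
        h = frame_h
        w = round(h * base_w / base_h)
    return w, h

scale = scaling_ratio(300, 600)  # protagonist is half the size of the largest region
w, h = close_up_size(scale, 720, 1280, 1920, 1080)
print(w, h)
```

Because width and height are always derived from the same base ratio, scaling the resulting crop to the close-up window never stretches the image.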
In a second aspect, the present application provides an electronic device, including one or more processors and one or more memories. The one or more memories are coupled to the one or more processors and are used to store computer program code, which includes computer instructions; when the one or more processors execute the computer instructions, the electronic device is caused to perform the video editing method in any possible implementation of the first aspect.
In a third aspect, embodiments of the present application provide a computer storage medium including computer instructions; when the computer instructions are run on an electronic device, the electronic device is caused to perform the video editing method in any possible implementation of the first aspect.
In a fourth aspect, embodiments of the present application provide a computer program product; when the computer program product is run on a computer, the computer is caused to perform the video editing method in any possible implementation of the first aspect.
Description of the drawings
图1A-图1M、图1O-图1P是本申请实施例提供的一组主角模式拍摄方法的用户界面示意图;1A to 1M and 1O to 1P are schematic user interface diagrams of a set of protagonist mode shooting methods provided by embodiments of the present application;
图1N是本申请实施例提供的一种终端100在拍摄场景下保存已拍摄特写视频的示意图;FIG. 1N is a schematic diagram of a terminal 100 saving a captured close-up video in a shooting scene provided by an embodiment of the present application;
图2A-图2B、图2D-图2E是本申请实施例提供的一组主角模式拍摄方法的用户界面示意图;Figures 2A-2B and 2D-2E are schematic user interface diagrams of a set of protagonist mode shooting methods provided by embodiments of the present application;
图2C是本申请实施例提供的一种终端100在拍摄场景下保存已拍摄特写视频的示意 图;Figure 2C is a schematic diagram of a terminal 100 saving a captured close-up video in a shooting scene provided by an embodiment of the present application. picture;
图3A-图3C是本申请实施例提供的一组主角模式拍摄方法的用户界面示意图;3A-3C are schematic user interface diagrams of a set of protagonist mode shooting methods provided by embodiments of the present application;
图4A-图4H是本申请实施例提供的一组主角模式拍摄方法的用户界面示意图;4A-4H are schematic user interface diagrams of a set of protagonist mode shooting methods provided by embodiments of the present application;
图5A、图5B-1~图5B-4、图5C-图5E是本申请实施例提供的一组主角模式拍摄方法的用户界面示意图;Figures 5A, 5B-1 to 5B-4, and 5C to 5E are schematic user interface diagrams of a set of protagonist mode shooting methods provided by embodiments of the present application;
图6是本申请实施例提供的终端100在拍摄场景下编辑并生成特写视频的流程图;Figure 6 is a flow chart of the terminal 100 editing and generating a close-up video in a shooting scene provided by the embodiment of the present application;
图7A是本申请实施例提供的终端100执行对象识别与标记的流程图;Figure 7A is a flow chart of the terminal 100 performing object recognition and marking provided by the embodiment of the present application;
图7B是本申请实施例提供的终端100确定图像中的人脸图像和人体图像的示意图;Figure 7B is a schematic diagram of the terminal 100 determining the face image and human body image in the image provided by the embodiment of the present application;
图7C是本申请实施例提供的终端100确定选择框的显示位置的示意图;Figure 7C is a schematic diagram of the terminal 100 determining the display position of the selection box provided by the embodiment of the present application;
图8A是本申请实施例提供的终端100确定以主角为中心的特写图像的流程图;Figure 8A is a flow chart for the terminal 100 to determine the close-up image centered on the protagonist provided by the embodiment of the present application;
图8B是本申请实施例提供的终端100确定特写图像的尺寸的流程图;Figure 8B is a flow chart for the terminal 100 to determine the size of the close-up image provided by the embodiment of the present application;
图8C-图8D是本申请实施例提供的终端100对特写图像进行自适应调整以适应窗口展示的示意图;Figures 8C-8D are schematic diagrams of terminal 100 adaptively adjusting close-up images to adapt to window display provided by embodiments of the present application;
图9是本申请实施例提供的终端100定位在后图像帧中主角的流程图;Figure 9 is a flow chart for positioning the protagonist of the terminal 100 in the rear image frame provided by the embodiment of the present application;
图10A是本申请实施例提供的多对象场景下对象不重叠的一帧图像;Figure 10A is a frame of images in which objects do not overlap in a multi-object scene provided by an embodiment of the present application;
图10B是本申请实施例提供的多对象场景下对象重叠的一帧图像;Figure 10B is a frame image of overlapping objects in a multi-object scene provided by an embodiment of the present application;
图10C-图10D是本申请实施例提供的终端100利用IoU位置定位主角的示意图;Figures 10C to 10D are schematic diagrams of the terminal 100 using IoU position to locate the protagonist provided by the embodiment of the present application;
图11是本申请实施例提供的终端100确定图像中主角的ReID距离示意图;Figure 11 is a schematic diagram of the terminal 100 determining the ReID distance of the protagonist in the image provided by the embodiment of the present application;
图12A是本申请实施例提供的另一种终端100定位在后图像帧中主角的流程图；Figure 12A is another flow chart of the terminal 100 locating the protagonist in a subsequent image frame provided by the embodiment of the present application;
图12B是本申请实施例提供的另一种终端100在拍摄场景下编辑并生成特写视频的流程图；Figure 12B is another flow chart of the terminal 100 editing and generating a close-up video in a shooting scene provided by an embodiment of the present application;
图13是本申请实施例提供的终端100在编辑本地视频的场景下生成特写视频的流程图;Figure 13 is a flow chart for the terminal 100 to generate a close-up video in the scenario of editing local videos provided by the embodiment of the present application;
图14是本申请实施例提供的终端100的系统结构示意图;Figure 14 is a schematic system structure diagram of the terminal 100 provided by the embodiment of the present application;
图15是本申请实施例提供的终端100的硬件结构示意图。Figure 15 is a schematic diagram of the hardware structure of the terminal 100 provided by the embodiment of the present application.
具体实施方式Detailed Description of Embodiments
本申请以下实施例中所使用的术语只是为了描述特定实施例的目的,而并非旨在作为对本申请的限制。The terms used in the following embodiments of the present application are only for the purpose of describing specific embodiments and are not intended to limit the present application.
在本申请提供的一种实施例中，手机、平板电脑等具备拍摄和图像处理功能的终端设备(记为终端100，后续统一使用终端100指代上述终端设备)可以在多对象场景中，识别图像中的多个对象，并自动追踪用户指定的对象，生成并保存该对象的特写视频。同时，终端100还可保存原始视频。In an embodiment provided by this application, a terminal device with shooting and image processing functions, such as a mobile phone or tablet computer (denoted as terminal 100, which is used hereinafter to refer to such terminal devices), can, in a multi-object scene, identify multiple objects in an image, automatically track the object specified by the user, and generate and save a close-up video of that object. At the same time, the terminal 100 can also save the original video.
其中，原始视频是由摄像头采集的原始图像组成的。特写视频是在原始图像的基础上，以原始图像中的主角为中心裁剪得到的。特写视频即始终以主角为拍摄中心的视频。这样，在选定主角之后，用户既可以拍得以主角为中心的特写视频，又可以同时得到由原始的摄像头采集的原始图像组成的原始视频。The original video is composed of the original images collected by the camera. The close-up video is obtained by cropping the original images with the protagonist in the original images as the center; that is, a close-up video is one in which the protagonist is always the center of the shot. In this way, after selecting the protagonist, the user can both shoot a close-up video centered on the protagonist and simultaneously obtain the original video composed of the original images collected by the camera.
进一步的，终端100还可识别本地视频中包括的对象，然后根据用户的选择操作，确定该视频的主角。在确定主角之后，终端100也可对上述本地视频进行提取主角特写视频的编辑操作，从而得到始终以主角为拍摄中心的特写视频。Further, the terminal 100 can also identify the objects included in a local video, and then determine the protagonist of the video according to the user's selection operation. After determining the protagonist, the terminal 100 may also perform an editing operation on the above local video to extract a close-up video of the protagonist, thereby obtaining a close-up video in which the protagonist is always the center of the shot.
不限于手机、平板电脑，终端100还可以是桌面型计算机、膝上型计算机、手持计算机、笔记本电脑、超级移动个人计算机（ultra-mobile personal computer，UMPC）、上网本，以及蜂窝电话、个人数字助理（personal digital assistant，PDA）、增强现实（augmented reality，AR）设备、虚拟现实（virtual reality，VR）设备、人工智能（artificial intelligence，AI）设备、可穿戴式设备、车载设备、智能家居设备和/或智慧城市设备，本申请实施例对该终端的具体类型不作特殊限制。Not limited to mobile phones and tablet computers, the terminal 100 may also be a desktop computer, a laptop computer, a handheld computer, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a cellular phone, a personal digital assistant (PDA), an augmented reality (AR) device, a virtual reality (VR) device, an artificial intelligence (AI) device, a wearable device, a vehicle-mounted device, a smart home device and/or a smart city device; the embodiments of this application place no special restriction on the specific type of the terminal.
下面具体介绍终端100实施本申请实施例提供的视频编辑方法的用户界面示意图。The following describes in detail the user interfaces involved when the terminal 100 implements the video editing method provided by the embodiments of the present application.
首先,图1A示例性示出了终端100启用摄像头执行拍摄动作的用户界面101。First, FIG. 1A exemplarily shows the user interface 101 of the terminal 100 enabling the camera to perform a shooting action.
如图1A所示,用户界面101可包括模式栏111、拍摄控件112、预览窗113、回看控件114、转换控件115。As shown in FIG. 1A , the user interface 101 may include a mode bar 111 , a shooting control 112 , a preview window 113 , a review control 114 , and a conversion control 115 .
模式栏111中可显示有多个拍摄模式选项,例如夜景、录像、拍照、人像等拍摄模式。夜景模式可用于光线较暗的场景下拍摄照片,例如夜晚拍摄照片。录像模式可用于录制视频。拍照模式可用于日光场景下拍摄照片。人像模式可用于拍摄人物特写照片。在本申请实施例中,模式栏111还包括主角模式。主角模式对应本申请实施例提供的拍摄方法:在拍摄视频的过程中,确定并自动追踪视频中的主角,保存原始视频和以主角为拍摄中心的主角特写视频。The mode bar 111 may display multiple shooting mode options, such as night scene, video, photo, portrait and other shooting modes. Night scene mode can be used to take photos in dark scenes, such as taking photos at night. Video mode can be used to record videos. Photo mode can be used to take photos in daylight scenes. Portrait mode can be used to take close-up photos of people. In this embodiment of the present application, the mode bar 111 also includes a protagonist mode. The protagonist mode corresponds to the shooting method provided by the embodiment of the present application: during the process of shooting a video, the protagonist in the video is determined and automatically tracked, and the original video and the close-up video of the protagonist with the protagonist as the shooting center are saved.
拍摄控件112可用于接收用户的拍摄操作。在拍照场景下(包括拍照模式、人像模式、夜景模式),上述拍摄操作即作用于拍摄控件112的控制拍照的操作。在录制视频的场景下(录像模式),上述拍摄操作包括作用于拍摄控件112的开始录制的操作。The shooting control 112 may be used to receive a user's shooting operation. In a photographing scene (including photographing mode, portrait mode, and night scene mode), the above-mentioned photographing operation is an operation of controlling photographing that acts on the photographing control 112 . In the scene of recording video (video recording mode), the above-mentioned shooting operation includes an operation of starting recording on the shooting control 112 .
预览窗113可用于实时地显示摄像头采集的图像帧序列。预览窗113中显示图像可称为原始图像。在一些实施例中,预览窗113中显示的是经过下采样处理的图像帧序列。此时,预览窗113中显示的图像对应的未经下采样处理的图像帧序列可称为原始图像。The preview window 113 can be used to display the sequence of image frames collected by the camera in real time. The image displayed in the preview window 113 may be called an original image. In some embodiments, what is displayed in the preview window 113 is a down-sampled sequence of image frames. At this time, the image frame sequence without downsampling processing corresponding to the image displayed in the preview window 113 may be called an original image.
回看控件114可用于查看前一次拍摄的照片或视频。一般的,回看控件114可显示前一次拍摄的照片的缩略图或前一次拍摄的视频的首帧图像的缩略图。The review control 114 can be used to view photos or videos taken previously. Generally, the review control 114 can display a thumbnail of a photo taken previously or a thumbnail of the first frame of a video taken previously.
用户界面101还可包括设置栏116。设置栏116中可显示有多个设置控件。一个设置控件用于设置摄像头的一类参数，从而改变摄像头采集到的图像。例如，设置栏116可显示有光圈1161、闪光灯1162、滤镜1164等设置控件。光圈1161可用于调整摄像头光圈大小，从而改变摄像头采集到的图像的画面亮度；闪光灯1162可用于开启或关闭闪光灯，从而改变摄像头采集到的图像的画面亮度；滤镜1164可用于选择滤镜风格，进而调整图像色彩。设置栏116还可包括更多设置控件1165。更多设置控件1165可用于提供更多的用于调整摄像头拍摄参数或图像优化参数的控件，例如白平衡控件、ISO控件、美颜控件、美体控件等等，从而为用户提供更丰富的拍摄服务。The user interface 101 may also include a settings bar 116, in which multiple setting controls may be displayed. Each setting control is used to set one type of camera parameter, thereby changing the images collected by the camera. For example, the settings bar 116 may display setting controls such as an aperture control 1161, a flash control 1162, and a filter control 1164. The aperture control 1161 can be used to adjust the aperture size of the camera, thereby changing the brightness of the images collected by the camera; the flash control 1162 can be used to turn the flash on or off, likewise changing the image brightness; the filter control 1164 can be used to select a filter style and thus adjust the image colors. The settings bar 116 may also include a more-settings control 1165, which provides additional controls for adjusting camera shooting parameters or image optimization parameters, such as white balance, ISO, face beautification, and body beautification controls, thereby providing users with richer shooting services.
默认的，在启用摄像头拍摄时，终端100可首先选择拍照模式，参考用户界面101。在此过程中，终端100可检测到作用于模式栏111选择主角模式的用户操作，例如图1A所示的点击主角拍摄模式选项的操作，又或者滑动模式栏111选择主角拍摄模式选项的操作等。响应于上述操作，终端100可确定开启主角模式进行拍摄。By default, when the camera is enabled for shooting, the terminal 100 may first select the photo mode; refer to the user interface 101. During this process, the terminal 100 may detect a user operation on the mode bar 111 selecting the protagonist mode, such as the operation of tapping the protagonist shooting mode option shown in FIG. 1A, or the operation of sliding the mode bar 111 to select the protagonist shooting mode option. In response to the above operation, the terminal 100 may determine to turn on the protagonist mode for shooting.
图1B示例性示出了终端100在主角模式下进行拍摄的用户界面102。FIG. 1B exemplarily shows the user interface 102 of the terminal 100 for shooting in the protagonist mode.
在选择主角模式之后,终端100可对摄像头采集的图像进行图像内容识别(对象识别),识别该图像中包括的对象。上述对象包括但不限于人、动物、植物。本申请实施例后续主要以人物为例进行说明。在终端100在预览窗113中显示摄像头采集的图像的同时,终端100还可在识别到的各个对象上显示选择框。After selecting the protagonist mode, the terminal 100 can perform image content recognition (object recognition) on the image collected by the camera, and identify the objects included in the image. The above-mentioned objects include but are not limited to humans, animals, and plants. The following description of the embodiments of this application will mainly take characters as examples. While the terminal 100 displays the image collected by the camera in the preview window 113, the terminal 100 may also display a selection box on each recognized object.
参考用户界面102,某一时刻摄像头采集的图像中包括人物1、人物2、人物3。终端100在接收到摄像头采集并生成的上述图像之后,在显示上述图像之前,可利用预设的对象识别算法识别图像中包括的对象。这里,对象识别算法可以包括人脸识别算法、人体识别算法。这时,利用上述对象识别算法,终端100可识别到上述图像中包括人物1、人物2、人物3这3个对象。Referring to the user interface 102, the images collected by the camera at a certain moment include Person 1, Person 2, and Person 3. After receiving the above-mentioned image collected and generated by the camera and before displaying the above-mentioned image, the terminal 100 may use a preset object recognition algorithm to identify objects included in the image. Here, the object recognition algorithm may include a face recognition algorithm and a human body recognition algorithm. At this time, using the above object recognition algorithm, the terminal 100 can recognize that the above image includes three objects: Person 1, Person 2, and Person 3.
当然,在一些示例中,不限于上述用户界面102中介绍的人物1、2、3,终端100还支持识别动物、植物类型的对象。相应的,上述对象识别算法还包括针对一种或多种动物的识别算法,以及针对一种或多种植物的识别算法,本申请实施例对此不作限制。Of course, in some examples, the terminal 100 is not limited to the characters 1, 2, and 3 introduced in the above user interface 102, and the terminal 100 also supports the recognition of animals and plant-type objects. Correspondingly, the above-mentioned object recognition algorithm also includes a recognition algorithm for one or more animals, and a recognition algorithm for one or more plants, which are not limited in the embodiments of the present application.
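To make the object-recognition step above concrete, the sketch below pairs detected face boxes with detected body boxes to form per-person objects, in the spirit of Figure 7B. This is an illustrative assumption only: the patent does not disclose its matching rule, and the `(x, y, w, h)` box format, function names, and center-containment heuristic are all hypothetical.

```python
# Illustrative sketch only -- not the patented matching algorithm.
# Boxes are (x, y, w, h) tuples in pixel coordinates (assumed format).

def center(box):
    """Center point of a box."""
    x, y, w, h = box
    return (x + w / 2, y + h / 2)

def contains(box, point):
    """True if the point lies inside the box."""
    x, y, w, h = box
    px, py = point
    return x <= px <= x + w and y <= py <= y + h

def pair_faces_to_bodies(faces, bodies):
    """Pair each detected face with the first body box that contains
    the face's center; unmatched faces get None as their body."""
    objects = []
    for face in faces:
        body = next((b for b in bodies if contains(b, center(face))), None)
        objects.append((face, body))
    return objects
```

A terminal could then draw one selection box per paired object, as in the user interface 102.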
一方面,终端100可在预览窗113中显示上述包括人物1、人物2、人物3的图像。另一方面,在显示上述图像之前,终端100可以确定与上述各个对象对应的选择框。在显示上述图像时,终端100可显示与各个对象对应的选择框,例如对应人物1的选择框121、对应人物2的选择框122、对应人物3的选择框123。这时,用户可以通过上述选择框确认视频主角。On the one hand, the terminal 100 can display the above-mentioned images including Person 1, Person 2, and Person 3 in the preview window 113. On the other hand, before displaying the above-described image, the terminal 100 may determine a selection box corresponding to each of the above-described objects. When displaying the above image, the terminal 100 may display selection boxes corresponding to each object, such as the selection box 121 corresponding to Person 1, the selection box 122 corresponding to Person 2, and the selection box 123 corresponding to Person 3. At this time, the user can confirm the video protagonist through the above selection box.
同时,用户界面102还可显示提示语125,例如“请点击主角人物,开启自动追焦录像”。提示语125提示用户确定视频主角。根据提示语125的提示,用户可点击上述选择框中的任意一个。用户点击操作作用于的选择框对应的对象即用户确定的视频主角。At the same time, the user interface 102 can also display prompts 125, such as "Please click on the protagonist to start automatic focus recording." Prompt 125 prompts the user to determine the protagonist of the video. According to the prompt 125, the user can click any one of the above selection boxes. The object corresponding to the selection box that the user clicks on is the video protagonist determined by the user.
用户界面102(主角模式拍摄界面)还可包括焦距控件126、美颜控件127。焦距控件126可用于设置摄像头的焦距,以调整摄像头的取景范围。摄像头的取景范围变化时,预览窗中显示的图像会相应地变化。美颜控件127可用于调整图像中的人物的人脸图像。当检测作用于美颜控件127的用户操作之后,终端100可对图像中的人物进行美颜处理,并在预览窗中显示美颜处理后的图像。用户界面102还可显示有其他拍摄控件,这里不再一一例举。The user interface 102 (protagonist mode shooting interface) may also include a focus control 126 and a beauty control 127 . The focal length control 126 can be used to set the focal length of the camera to adjust the viewing range of the camera. When the viewing range of the camera changes, the image displayed in the preview window will change accordingly. The beauty control 127 can be used to adjust the face image of the person in the image. After detecting the user operation on the beautification control 127, the terminal 100 can perform beautification processing on the characters in the image, and display the beautified image in the preview window. The user interface 102 may also display other shooting controls, which are not listed here.
在显示图1B所示的用户界面102时,终端100可检测到作用于任一选择框的用户操作。响应于上述操作,终端100可确定上述选择框对应的对象为主角。例如,参考图1C所示的用户界面103,终端100可检测到作用于选择框123的用户操作。响应于上述操作,终端100可确定选择框123对应的人物3为拍摄主角。When displaying the user interface 102 shown in FIG. 1B , the terminal 100 can detect a user operation on any selection box. In response to the above operation, the terminal 100 may determine that the object corresponding to the above selection box is the protagonist. For example, referring to the user interface 103 shown in FIG. 1C , the terminal 100 may detect a user operation acting on the selection box 123 . In response to the above operation, the terminal 100 may determine that the character 3 corresponding to the selection box 123 is the protagonist of the shooting.
随后,终端100可在预览窗113中以画中画的形式显示一个小窗,并在该小窗中显示人物3的特写图像。上述特写图像是指在摄像头采集的原始图像(预览窗中显示的图像)的基础上,以选定的主角为中心进行裁剪,得到的图像。Subsequently, the terminal 100 may display a small window in the preview window 113 in a picture-in-picture format, and display a close-up image of the character 3 in the small window. The above close-up image refers to the image obtained by cropping the selected protagonist as the center based on the original image collected by the camera (the image displayed in the preview window).
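One plausible way to compute such a protagonist-centered crop is sketched below. The patent does not disclose its actual cropping formula; the box format, function name, and the rule of clamping the crop to the frame bounds are assumptions for illustration.

```python
# Illustrative sketch only -- not the patented cropping method.

def centered_crop(subject_box, crop_w, crop_h, frame_w, frame_h):
    """Return (left, top, crop_w, crop_h): a crop window of the requested
    size, centered on the subject box but clamped so it stays inside the
    original frame of size frame_w x frame_h."""
    x, y, w, h = subject_box
    cx, cy = x + w / 2, y + h / 2               # protagonist center
    left = int(round(cx - crop_w / 2))
    top = int(round(cy - crop_h / 2))
    left = max(0, min(left, frame_w - crop_w))  # clamp horizontally
    top = max(0, min(top, frame_h - crop_h))    # clamp vertically
    return (left, top, crop_w, crop_h)
```

When the protagonist stands near a frame edge, the clamping keeps the crop inside the frame, so the protagonist is then somewhat off-center in the close-up rather than the crop leaving the frame.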
图1D示例性示出了终端100显示小窗并在小窗中显示人物3的特写图像的用户界面104。 FIG. 1D exemplarily shows the user interface 104 in which the terminal 100 displays a small window and displays a close-up image of the character 3 in the small window.
如图1D所示,用户界面104的预览窗113中可包括小窗141。此时,小窗141中可显示人物3的特写图像。随着预览窗113中显示的图像的变化,小窗141中显示的图像也会相应的变化。并且,小窗141中显示的始终为以人物3为中心的图像。这样,小窗141中显示的图像所构成的视频即人物3的特写视频。As shown in FIG. 1D , the preview window 113 of the user interface 104 may include a small window 141 . At this time, a close-up image of the character 3 can be displayed in the small window 141 . As the image displayed in the preview window 113 changes, the image displayed in the small window 141 will also change accordingly. Furthermore, the image displayed in the small window 141 is always centered on the person 3. In this way, the video composed of the image displayed in the small window 141 is a close-up video of the character 3 .
可选的，小窗141中显示的特写图像与预览窗113中显示的原始图像还可来自不同的摄像头。例如，小窗141中显示的特写图像可以来自于普通摄像头采集的图像，预览窗113中显示的原始图像可以来自于广角摄像头采集的图像。普通摄像头和广角摄像头可以同时采集图像。普通摄像头和广角摄像头采集的图像在同一时刻是对应的。这样，用户可以在预览窗113浏览更大范围的景观，同时，在小窗141中显示细节更多的主角图像。Optionally, the close-up image displayed in the small window 141 and the original image displayed in the preview window 113 may come from different cameras. For example, the close-up image displayed in the small window 141 may come from images collected by an ordinary camera, while the original image displayed in the preview window 113 may come from images collected by a wide-angle camera. The two cameras can collect images simultaneously, and the images they collect at the same moment correspond to each other. In this way, the user can browse a wider scene in the preview window 113, while a more detailed image of the protagonist is displayed in the small window 141.
在确定人物3为拍摄主角之后，人物3对应的选择框123可变成图1D中选中框142所示的样子。用户可通过选中框142确定已选中的拍摄主角。不限于用户界面104中所示的选中框142，终端100还可显示其他样式的图标，以表示人物3被选中为主角，以示区分。After Character 3 is determined to be the shooting protagonist, the selection box 123 corresponding to Character 3 may change into the style shown as the check box 142 in FIG. 1D. The user can confirm the selected shooting protagonist through the check box 142. Not limited to the check box 142 shown in the user interface 104, the terminal 100 may also display icons of other styles to mark Character 3 as the selected protagonist, so as to distinguish it.
可选的，用于展示特写图像的小窗141还可包括关闭控件143和转置控件144。关闭控件143可用于关闭小窗141。转置控件144可用于调整小窗141的尺寸。Optionally, the small window 141 for displaying the close-up image may also include a close control 143 and a transpose control 144. The close control 143 can be used to close the small window 141. The transpose control 144 can be used to adjust the size of the small window 141.
在一些示例中,在根据作用在关闭控件143的用户操作关闭小窗141之后,终端100可取消之前确定的主角(人物3)。然后,终端100可指示用户重新在已识别到的对象中选择拍摄主角。这时,终端100可基于重新确定的主角,再次在预览窗113中显示小窗141。此时,小窗141中显示以新主角中心对原始图像进行处理得到的特写图像。In some examples, after closing the small window 141 according to the user operation acting on the closing control 143, the terminal 100 may cancel the previously determined protagonist (Character 3). Then, the terminal 100 may instruct the user to re-select the shooting protagonist among the recognized objects. At this time, the terminal 100 can display the small window 141 in the preview window 113 again based on the redetermined protagonist. At this time, a close-up image obtained by processing the original image with the center of the new protagonist is displayed in the small window 141 .
在一些示例中,在开始录制视频之后,关闭控件143还可用于暂停录制特写视频。此时,终端100不会取消之前确定主角。在暂停录制之后,关闭控件143可更换为开启控件。当检测到作用于开启控件的用户操作之后,终端100可继续以上述主角为中心,录制特写视频。In some examples, the close control 143 may also be used to pause the recording of the close-up video after starting to record the video. At this time, the terminal 100 will not cancel the previously determined protagonist. After the recording is paused, the off control 143 can be replaced with an on control. After detecting the user operation for opening the control, the terminal 100 can continue to record a close-up video centered on the above-mentioned protagonist.
在另一些示例中,在关闭小窗141后,终端100仅不显示该小窗,即不显示之前确定的主角(人物3)的特写图像,但是,终端100仍然保持之前确定的主角。这时,预览窗113不会被展示主角特写图像的小窗141遮挡住部分。用户可以更好的监测原始视频的图像内容,从而得到质量更高的原始视频。这时,用户可通过点击选中框142的操作,取消已选中的主角人物3,从而重新在已识别到的对象中选择新的主角。In other examples, after closing the small window 141, the terminal 100 only does not display the small window, that is, it does not display the close-up image of the previously determined protagonist (Character 3), but the terminal 100 still maintains the previously determined protagonist. At this time, the preview window 113 will not be partially blocked by the small window 141 that displays the close-up image of the protagonist. Users can better monitor the image content of the original video, resulting in higher quality original video. At this time, the user can cancel the selected protagonist character 3 by clicking on the check box 142, thereby re-selecting a new protagonist among the recognized objects.
可选的，在确定主角之后，终端100可首先生成宽高比9:16的用于展示特写图像的小窗（竖窗），参考图1D中的小窗141。上述宽高比为示例性例举，竖窗的宽高比包括但不限于9:16这一类。当检测到作用于转置控件144的用户操作之后，终端100可将原来的竖窗变更为宽高比16:9的横向窗口（横窗）。当然，终端100也可默认生成横窗，然后，根据用户操作将横窗调整为竖窗，本申请实施例对此不作限制。这样，用户可通过转置控件144调整特写视频的视频内容，以满足自身个性化需求。Optionally, after determining the protagonist, the terminal 100 may first generate a small window (vertical window) with an aspect ratio of 9:16 for displaying the close-up image; refer to the small window 141 in FIG. 1D. The above aspect ratio is only an example, and the aspect ratio of the vertical window includes but is not limited to 9:16. After detecting a user operation on the transpose control 144, the terminal 100 can change the original vertical window into a horizontal window with an aspect ratio of 16:9. Of course, the terminal 100 may also generate a horizontal window by default and then adjust it to a vertical window according to a user operation; this embodiment of the present application does not limit this. In this way, the user can adjust the video content of the close-up video through the transpose control 144 to meet personalized needs.
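As an illustration of the portrait/landscape switch, the sketch below computes the largest crop of a target aspect ratio (such as 9:16 or 16:9) that fits inside a frame. How the terminal actually derives the crop dimensions is not specified in the patent; this is an assumed behavior, and exact fractions are used to avoid float rounding.

```python
# Illustrative sketch only -- assumed crop-sizing rule, not the patented one.
from fractions import Fraction

def largest_crop(frame_w, frame_h, aspect_w, aspect_h):
    """Largest (w, h) with w:h == aspect_w:aspect_h fitting in the frame."""
    target = Fraction(aspect_w, aspect_h)
    if Fraction(frame_w, frame_h) >= target:
        h = frame_h                  # frame is wider than target: height-limited
        w = int(h * target)
    else:
        w = frame_w                  # frame is taller than target: width-limited
        h = int(w / target)
    return (w, h)
```

For a 1920x1080 frame, a 9:16 vertical window could crop at most 607x1080 pixels, while a 16:9 horizontal window can use the full 1920x1080.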
可选的，终端100可固定地在屏幕的左下方（或右下方、左上方、右上方）显示展示特写图像的小窗。在一些示例中，上述小窗还可根据预览窗中的主角的位置，调整显示位置，以避免对预览窗中的主角造成遮挡。Optionally, the terminal 100 may display the small window showing the close-up image at a fixed position at the lower left (or lower right, upper left, or upper right) of the screen. In some examples, the small window may also adjust its display position according to the position of the protagonist in the preview window, to avoid blocking the protagonist in the preview window.
进一步的，终端100还可根据用户操作调整小窗的位置和大小。在一些示例中，终端100还可检测到作用于小窗141的长按操作和拖动操作，响应于上述操作，终端100可将小窗移动到用户拖动操作最后停下的位置。在另一些示例中，终端100还可检测到作用于小窗141的双击操作，响应于上述操作，终端100可放大或缩小小窗141。不限于上述介绍的长按操作、拖动操作以及双击操作，终端100还可通过手势识别和语音识别来控制调整小窗的位置和大小。例如，终端100可通过摄像头采集的图像识别到用户做出了握拳手势，响应于上述握拳手势，终端100可缩小小窗141。终端100可通过摄像头采集的图像识别到用户做出了张手手势，响应于上述张手手势，终端100可放大小窗141。Further, the terminal 100 may also adjust the position and size of the small window according to user operations. In some examples, the terminal 100 may detect a long-press operation and a drag operation on the small window 141; in response, the terminal 100 may move the small window to the position where the user's drag operation finally stops. In other examples, the terminal 100 may detect a double-click operation on the small window 141; in response, the terminal 100 may enlarge or reduce the small window 141. Not limited to the long-press, drag, and double-click operations introduced above, the terminal 100 may also control the position and size of the small window through gesture recognition and voice recognition. For example, the terminal 100 may recognize from images collected by the camera that the user has made a fist gesture and, in response, reduce the small window 141; or recognize that the user has made an open-hand gesture and, in response, enlarge the small window 141.
在确定主角之后,终端100可检测到开始拍摄的用户操作。在开始拍摄之后,终端100还可检测到结束拍摄的用户操作。响应于上述开始拍摄和结束拍摄的操作,终端100可将上述操作期间摄像头采集的图像帧序列保存为视频。After determining the protagonist, the terminal 100 may detect a user operation to start shooting. After starting shooting, the terminal 100 may also detect a user operation to end shooting. In response to the above-mentioned operations of starting and ending shooting, the terminal 100 may save the sequence of image frames collected by the camera during the above-mentioned operations as a video.
参考图1E所示的用户界面105,终端100可检测到作用于拍摄控件112的用户操作。上述作用于拍摄控件112的用户操作可称为开始拍摄的用户操作。响应于上述开始拍摄的用户操作,终端100可将预览窗113对应的原始图像与小窗141对应的特写图像写入到特定的存储空间中。Referring to the user interface 105 shown in FIG. 1E , the terminal 100 may detect a user operation on the shooting control 112 . The above-mentioned user operation on the shooting control 112 may be called a user operation for starting shooting. In response to the above user operation of starting shooting, the terminal 100 may write the original image corresponding to the preview window 113 and the close-up image corresponding to the small window 141 into a specific storage space.
一方面，终端100可将摄像头采集的原始图像（预览窗113中显示的未经裁剪的图像）写入到特定的存储空间中，从而生成原始视频；另一方面，终端100还可将以主角为中心的特写图像（小窗141中显示的图像）写入到特定的存储空间中，从而生成特写视频。On the one hand, the terminal 100 can write the original images collected by the camera (the uncropped images displayed in the preview window 113) into a specific storage space, thereby generating the original video; on the other hand, the terminal 100 can also write the close-up images centered on the protagonist (the images displayed in the small window 141) into a specific storage space, thereby generating the close-up video.
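The dual write path can be sketched as follows. This is an illustrative stand-in only: the patent does not specify storage or encoding interfaces, so the class and method names are hypothetical, and plain Python lists stand in for the two storage spaces and the final muxing step.

```python
# Illustrative sketch only -- lists stand in for storage; no real encoder.

class DualRecorder:
    """Buffers each camera frame twice: the full original frame and the
    protagonist close-up crop, to be finalized as two separate videos."""

    def __init__(self):
        self.original_frames = []
        self.closeup_frames = []

    def on_frame(self, original, closeup):
        self.original_frames.append(original)
        if closeup is not None:       # the protagonist may be lost
            self.closeup_frames.append(closeup)

    def stop(self):
        """Stand-in for encapsulating the two buffers into two videos."""
        return {"original": list(self.original_frames),
                "closeup": list(self.closeup_frames)}
```

A real implementation would feed each buffer to its own hardware encoder rather than keeping raw frames in memory.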
参考图1F所示的用户界面106,在检测到开始拍摄的用户操作之后,终端100可将拍摄控件112变更为用户界面106中拍摄控件161。拍摄控件161可用于指示当前正处于录制过程中。Referring to the user interface 106 shown in FIG. 1F , after detecting a user operation to start shooting, the terminal 100 may change the shooting control 112 to the shooting control 161 in the user interface 106 . Capture control 161 may be used to indicate that recording is currently in progress.
在开始拍摄之后的某一时刻,用户初始选定的主角可能离开终端100的摄像头的取景范围(即预览窗113中不包括主角)。参考图1G所示的用户界面107,预览窗113中可识别到的对象包括人物1和人物2,但是不包括前述用户选中的主角:人物3。At some point after the shooting starts, the protagonist initially selected by the user may leave the viewing range of the camera of the terminal 100 (that is, the protagonist is not included in the preview window 113). Referring to the user interface 107 shown in FIG. 1G , the identifiable objects in the preview window 113 include character 1 and character 2, but do not include the protagonist selected by the user: character 3.
这时，终端100可关闭显示主角特写图像的小窗141。参考图1G，此时预览窗113中不包括小窗141。同时，终端100可显示提示语162，例如“主角丢失，请对准主角拍摄”，以提示用户主角丢失，无法确定到主角的特写图像。At this time, the terminal 100 may close the small window 141 that displays the close-up image of the protagonist. Referring to FIG. 1G, the preview window 113 does not include the small window 141 at this time. Meanwhile, the terminal 100 may display a prompt 162, such as "The protagonist is lost, please aim the camera at the protagonist", to remind the user that the protagonist is lost and a close-up image of the protagonist cannot be determined.
响应于上述提示语162,用户可调整摄像头位置以使得主角在摄像头的取景范围内,从而使得摄像头可以重新采集到包括主角的图像。参考图1H所示的用户界面108,此时,预览窗113中重新检测到人物3(主角),于是,终端100可重新生成小窗141,并在小窗141中显示当前的以主角为中心的特写图像。In response to the above prompt 162, the user can adjust the camera position so that the protagonist is within the viewing range of the camera, so that the camera can re-capture an image including the protagonist. Referring to the user interface 108 shown in Figure 1H, at this time, character 3 (the protagonist) is re-detected in the preview window 113. Therefore, the terminal 100 can regenerate the small window 141 and display the current protagonist-centered image in the small window 141. close-up image.
在一些实施例中，终端100也可间隔几帧之后再决定是否关闭小窗141。例如，在图1G所示的用户界面107所示的时刻（未检测到主角）之后，终端100可继续检测此帧图像之后的N帧图像，若这N帧图像均不包括主角，则终端100关闭小窗141。在主角消失之后、确认关闭小窗141之前，终端100可以以主角消失之前最后一帧的裁剪区域确定上述期间的小窗141中显示的图像内容。In some embodiments, the terminal 100 may also wait several frames before deciding whether to close the small window 141. For example, after the moment shown in the user interface 107 in FIG. 1G (the protagonist is not detected), the terminal 100 may continue to detect the N image frames following this frame; if none of these N frames includes the protagonist, the terminal 100 closes the small window 141. After the protagonist disappears and before the small window 141 is confirmed to be closed, the terminal 100 may use the cropping region of the last frame before the protagonist disappeared to determine the image content displayed in the small window 141 during this period.
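This N-frame grace period can be sketched as a small state machine: hold the last crop region for up to N missed frames, then close the window. The class, parameter names, and default value are hypothetical; the patent describes only the behavior, not an implementation.

```python
# Illustrative sketch only -- assumed logic for the N-frame grace period.

class CloseUpWindow:
    def __init__(self, grace_frames=5):
        self.grace = grace_frames    # N frames to wait before closing
        self.misses = 0
        self.last_crop = None
        self.open = False

    def update(self, crop):
        """crop: protagonist crop for this frame, or None if not detected.
        Returns the crop to display, or None once the window closes."""
        if crop is not None:
            self.misses = 0
            self.last_crop = crop
            self.open = True
            return crop
        self.misses += 1
        if self.open and self.misses <= self.grace:
            return self.last_crop    # reuse the last frame's crop region
        self.open = False
        return None
```

A new detection of the protagonist resets the miss counter and reopens the window, matching the behavior of user interface 108.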
在录制视频一段时间之后，用户可通过拍摄控件161结束拍摄视频。参考图1I所示的用户界面109，终端100可检测到作用于拍摄控件161的用户操作，上述作用于拍摄控件161的用户操作可称为结束拍摄的用户操作。After recording video for a period of time, the user can end the recording through the shooting control 161. Referring to the user interface 109 shown in FIG. 1I, the terminal 100 can detect a user operation acting on the shooting control 161; this user operation may be called the user operation to end shooting.
响应于结束拍摄的用户操作，终端100可将写入到特定存储空间中的原始图像帧序列封装为一个视频，即原始视频。同时，终端100还可将写入到特定存储空间中的特写图像帧序列封装为一个视频，即特写视频。In response to the user operation to end shooting, the terminal 100 may encapsulate the original image frame sequence written into the specific storage space into one video, namely the original video. At the same time, the terminal 100 may also encapsulate the close-up image frame sequence written into the specific storage space into one video, namely the close-up video.
在结束拍摄之后,终端100可显示图1J所示的用户界面110。After ending the shooting, the terminal 100 may display the user interface 110 shown in FIG. 1J.
如用户界面110所示，在结束拍摄之后，终端100可将拍摄控件161变更为拍摄控件112，以指示用户已结束视频录制。同时，终端100可在回看控件114中显示表征上述原始视频和特写视频的标识。一般的，上述标识可以为上述原始视频的第一帧图像的缩略图，或者上述特写视频的第一帧图像的缩略图。As shown in the user interface 110, after the shooting ends, the terminal 100 may change the shooting control 161 back to the shooting control 112 to indicate to the user that the video recording has ended. At the same time, the terminal 100 may display, in the review control 114, an identifier representing the above original video and close-up video. Generally, the identifier may be a thumbnail of the first image frame of the original video, or a thumbnail of the first image frame of the close-up video.
用户可通过回看控件114浏览已拍摄的视频。这里,完成一次主角模式的视频拍摄之后,终端100可得到两个视频。上述两个视频中一个为上述原始视频,一个为上述特写视频。参考图1J所示的用户界面110,终端100可检测到作用于回看控件114的用户操作。响应于上述操作,终端100可显示上述两个视频,以供用户浏览。The user can browse the captured video through the review control 114 . Here, after completing one video shooting in protagonist mode, the terminal 100 can obtain two videos. One of the above two videos is the above original video and the other is the above close-up video. Referring to the user interface 110 shown in FIG. 1J , the terminal 100 may detect user operations on the lookback control 114 . In response to the above operation, the terminal 100 can display the above two videos for the user to browse.
图1K示例性示出了终端100显示已拍摄的视频的用户界面111。FIG. 1K exemplarily shows the user interface 111 of the terminal 100 displaying the captured video.
用户界面111可包括窗口191。窗口191可用于播放已拍摄的视频。可选的,终端100可首先在窗口191中播放前述主角模式下基于预览窗113拍摄得到的原始视频。同时,终端100可显示提示语192。提示语192例如“左滑浏览主角特写视频”。通过上述提示语,用户可执行左滑操作,从而获取特写视频。User interface 111 may include window 191 . Window 191 can be used to play captured video. Optionally, the terminal 100 may first play the original video captured based on the preview window 113 in the protagonist mode in the window 191 . At the same time, the terminal 100 may display the prompt 192. Prompt 192 such as "Swipe left to browse the close-up video of the protagonist." Through the above prompts, users can perform a left swipe operation to obtain a close-up video.
如用户界面111所示,终端可检测到左滑操作。响应于上述左滑操作,终端100可播放在主角模式下拍摄的以主角为中心的特写视频。参考图1L所示的用户界面112和图1M所示的用户界面113,此时,窗口191中可播放基于小窗141拍摄得到的特写视频。As shown in user interface 111, the terminal can detect a left sliding operation. In response to the above left swipe operation, the terminal 100 may play a close-up video centered on the protagonist shot in the protagonist mode. Referring to the user interface 112 shown in FIG. 1L and the user interface 113 shown in FIG. 1M, at this time, the close-up video shot based on the small window 141 can be played in the window 191.
在一些示例中，终端100可将以主角为中心的特写视频封装为一个特写视频。例如，参考图1G所示的用户界面107，在拍摄视频的过程中，初始选定的主角可能消失在终端100的取景范围内，一段时间后，初始选定的主角又可能重新出现在终端100的取景范围内。这时，以主角为中心的特写视频存在中断。优选的，终端100也可忽略上述中断，将全部特写图像封装为一个特写视频。In some examples, the terminal 100 may encapsulate the close-up images centered on the protagonist into a single close-up video. For example, referring to the user interface 107 shown in FIG. 1G, during video shooting, the initially selected protagonist may disappear from the viewing range of the terminal 100 and, after a period of time, reappear within the viewing range. In this case, the close-up video centered on the protagonist is interrupted. Preferably, the terminal 100 may ignore the interruption and encapsulate all the close-up images into one close-up video.
具体的,图1N示例性示出了终端100将全部特写图像封装为一个特写视频的示意图。Specifically, FIG. 1N exemplarily shows a schematic diagram in which the terminal 100 encapsulates all close-up images into one close-up video.
如图1N所示，T1可表示开始录制视频的时刻，T2可表示结束录制视频的时刻，T3可表示检测到主角丢失的时刻（图1G所示的用户界面107），T4可表示重新检测到主角的时刻（图1H所示的用户界面108）。T1-T2时间内，摄像头采集的原始图像组成原始视频。T1-T3时间内，基于摄像头采集的原始图像提取的以人物3为中心的特写图像组成特写视频1。T4-T2时间内，基于摄像头采集的原始图像提取的以人物3为中心的特写图像组成特写视频2。在拍摄结束之后，终端100可将上述特写视频1、特写视频2封装为一个特写视频。As shown in FIG. 1N, T1 may represent the moment when video recording starts, T2 the moment when video recording ends, T3 the moment when the loss of the protagonist is detected (user interface 107 shown in FIG. 1G), and T4 the moment when the protagonist is re-detected (user interface 108 shown in FIG. 1H). During T1-T2, the original images collected by the camera constitute the original video. During T1-T3, the close-up images centered on person 3, extracted from the original images collected by the camera, constitute close-up video 1. During T4-T2, the close-up images centered on person 3, extracted from the original images collected by the camera, constitute close-up video 2. After the shooting is completed, the terminal 100 can package the above close-up video 1 and close-up video 2 into one close-up video.
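The merging logic described above can be sketched as follows (an illustrative simplification, not part of the embodiment; the frame representation, timestamps, and function name are hypothetical):

```python
def build_merged_close_up(frames, t3, t4):
    """Package the close-up images from T1-T3 (close-up video 1) and
    T4-T2 (close-up video 2) into one close-up video, ignoring the
    T3-T4 interruption during which the protagonist was lost.

    `frames` is a list of (timestamp, close_up_image) pairs where the
    image is None for frames in which the protagonist was not detected.
    """
    video1 = [img for t, img in frames if t < t3 and img is not None]
    video2 = [img for t, img in frames if t >= t4 and img is not None]
    return video1 + video2  # one continuous close-up video
```
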
在另一些示例中,终端100也可保存多个特写视频。例如,终端100可将中断之前的主角特写图像封装为一个特写视频1,将中断之后的主角特写图像封装为一个特写视频2。 然后,终端100可分别保存上述特写视频。In other examples, the terminal 100 may also save multiple close-up videos. For example, the terminal 100 may encapsulate the close-up image of the protagonist before the interruption into a close-up video 1, and encapsulate the close-up image of the protagonist after the interruption into a close-up video 2. Then, the terminal 100 can save the above close-up videos respectively.
在一些实施例中,终端100也可通过图1O-图1P所示的方法,开启主角模式。如图1O所示,在录像模式下,终端100可在设置栏116中显示主角模式控件1166。当检测到作用于主角模式控件1166的用户操作时,终端100可开启主角模式,参考图1P。In some embodiments, the terminal 100 can also enable the protagonist mode through the method shown in FIGS. 1O-1P. As shown in FIG. 1O , in the video recording mode, the terminal 100 may display the protagonist mode control 1166 in the setting bar 116 . When a user operation on the protagonist mode control 1166 is detected, the terminal 100 may turn on the protagonist mode, see FIG. 1P.
在一些实施例中,在检测到主角丢失之后,终端100还可确定新的主角,并拍摄以上述新的主角为中心的特写视频。In some embodiments, after detecting that the protagonist is missing, the terminal 100 may also determine a new protagonist and shoot a close-up video centered on the new protagonist.
结合图1G所示的用户界面107,在检测到预览窗113中的全部对象不包括初始选定的主角(人物3)时,终端100可确认检测到主角丢失。这时,终端100可关闭显示主角特写图像的小窗141,并显示提示语162,以指示用户调整摄像头方位,从而重新获取到新的包含主角的图像。In conjunction with the user interface 107 shown in FIG. 1G , when it is detected that all objects in the preview window 113 do not include the initially selected protagonist (Character 3), the terminal 100 may confirm that the loss of the protagonist is detected. At this time, the terminal 100 can close the small window 141 that displays the close-up image of the protagonist, and display a prompt 162 to instruct the user to adjust the camera position, thereby reacquiring a new image containing the protagonist.
在本申请实施例中,参考图2A所示的用户界面201,用户还可以选择人物2作为主角。示例性的,终端100可检测到作用于选择框122的用户操作。响应于上述操作,终端100可确定新的主角:人物2。In this embodiment of the present application, referring to the user interface 201 shown in FIG. 2A , the user can also select character 2 as the protagonist. For example, the terminal 100 may detect a user operation on the selection box 122 . In response to the above operation, the terminal 100 may determine a new protagonist: character 2.
其中，在切换主角的过程中，小窗141可以直接显示切换后的人物2的特写图像，呈现跳跃式的显示效果。可选的，小窗141还可通过平滑策略实现非跳跃式的主角切换显示效果。例如，在将主角切换为人物2之后，终端100可根据预览窗113中人物3到人物2的路径，确定一组平滑移动的图像帧，然后在小窗141中显示上述图像帧，以实现非跳跃式的主角切换显示。例如，终端100还可使用固定的过渡效果，连接切换前后的主角的特写图像。上述固定的过渡效果例如视频编辑中常用的叠加、旋涡、平移等等。本申请实施例对此不作限定。During the process of switching the protagonist, the small window 141 may directly display the close-up image of the switched-to character 2, presenting a jump-cut display effect. Optionally, the small window 141 can also achieve a non-jumping protagonist switching display effect through a smoothing strategy. For example, after switching the protagonist to character 2, the terminal 100 can determine a set of smoothly moving image frames based on the path from character 3 to character 2 in the preview window 113, and then display the above image frames in the small window 141, thereby achieving a non-jumping protagonist switching display. As another example, the terminal 100 can also use a fixed transition effect to connect the close-up images of the protagonist before and after switching. The above fixed transition effects include, for example, the overlay, vortex, and translation effects commonly used in video editing. The embodiments of the present application do not limit this.
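A minimal sketch of the smoothing strategy mentioned above, assuming the close-up region is tracked as a rectangular crop box and using simple linear interpolation (the box format, coordinate convention, and step count are illustrative assumptions, not the claimed implementation):

```python
def smooth_switch(box_old, box_new, steps):
    """Generate a set of smoothly moving crop boxes from the old
    protagonist's close-up region to the new one's, so the small
    window shows a gradual move instead of a jump cut.

    Boxes are (x, y, w, h) tuples in original-image coordinates.
    """
    frames = []
    for i in range(1, steps + 1):
        t = i / steps  # interpolation fraction in (0, 1]
        frames.append(tuple(round(a + (b - a) * t)
                            for a, b in zip(box_old, box_new)))
    return frames
```

Each returned box would then be used to cut one intermediate close-up image from the corresponding original frame.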
同样的，结合图1B中介绍的焦距控件126。在使用焦距控件126切换当前焦距（或者摄像头）时，小窗141中显示的特写图像的切换效果也可参考上述角色切换时的切换效果。例如，在检测到将当前焦距从1倍焦距（1×）变更为2倍焦距（2×）时，可选的，终端100可直接在小窗141中显示基于2×的原始图像得到的特写图像；可选的，终端100还可基于1×和2×的原始图像确定一组具有渐变过渡效果的图像帧，从而实现小窗141中非跳跃式的焦距切换显示效果；可选的，终端100也可使用叠加、旋涡、平移等固定的过渡效果实现小窗141中非跳跃式的焦距切换显示效果，这里不再赘述。Similarly, consider the focus control 126 introduced in FIG. 1B. When the focus control 126 is used to switch the current focal length (or camera), the switching effect of the close-up image displayed in the small window 141 may also follow the switching effects described above for switching the protagonist. For example, when it is detected that the current focal length is changed from 1× to 2×, optionally, the terminal 100 can directly display, in the small window 141, a close-up image based on the 2× original image; optionally, the terminal 100 can also determine a set of image frames with a gradual transition effect based on the 1× and 2× original images, thereby achieving a non-jumping focal-length switching display effect in the small window 141; optionally, the terminal 100 can also use fixed transition effects such as overlay, vortex, and translation to achieve a non-jumping focal-length switching display effect in the small window 141, which will not be described again here.
参考图2B所示的用户界面202,在确定新的主角人物2之后,终端100可重新生成小窗141,并在小窗141中显示以新的主角为中心的特写图像。然后,终端100可以持续地追踪人物2,并在小窗141中实时地显示人物2的特写图像。Referring to the user interface 202 shown in FIG. 2B , after determining the new protagonist character 2 , the terminal 100 can regenerate the small window 141 and display a close-up image centered on the new protagonist in the small window 141 . Then, the terminal 100 can continuously track the person 2 and display a close-up image of the person 2 in the small window 141 in real time.
这时,在结束拍摄之后,终端100生成的特写视频是包括多个主角的特写视频。At this time, after the shooting is completed, the close-up video generated by the terminal 100 is a close-up video including a plurality of protagonists.
参考图2C所示的示意图，T3时刻是检测到初始选定的主角（人物3）丢失的时刻。T5时刻是检测到用户选定新的主角（人物2）的时刻。这时，T1-T3内，特写视频1是以初始选定的主角（人物3）为中心的特写视频；T5-T2内，特写视频2是以重新选定的主角（人物2）为中心的特写视频。Referring to the schematic diagram shown in FIG. 2C, time T3 is the moment when the loss of the initially selected protagonist (character 3) is detected. Time T5 is the moment when it is detected that the user selects a new protagonist (character 2). At this time, within T1-T3, close-up video 1 is a close-up video centered on the initially selected protagonist (character 3); within T5-T2, close-up video 2 is a close-up video centered on the re-selected protagonist (character 2).
在本申请实施例中，终端100可将上述特写视频1、特写视频2合并，封装为一个视频。然后，终端100可播放上述合并后的特写视频，以供用户浏览。参考图2D-图2E所示的用户界面204、205，窗口191中可播放上述合并后的特写视频。在用户界面204中，前一段的特写视频（即特写视频1）的主角为人物3。在用户界面205中，后一段的特写视频（即特写视频2）的主角为人物2。In this embodiment of the present application, the terminal 100 can combine the above close-up video 1 and close-up video 2 and package them into one video. Then, the terminal 100 can play the combined close-up video for the user to browse. Referring to the user interfaces 204 and 205 shown in FIG. 2D and FIG. 2E, the above merged close-up video can be played in the window 191. In the user interface 204, the protagonist of the earlier close-up video (that is, close-up video 1) is character 3. In the user interface 205, the protagonist of the later close-up video (that is, close-up video 2) is character 2.
在一些实施例中，终端100也可先检测到作用于拍摄控件112的开始拍摄的用户操作，开始录制视频。在录制视频的过程中，终端100可实时地检测图像包含的对象，并显示与各个对象对应的选择框。在检测到用户点击某一选择框的用户操作之后，终端100可确定该选择框对应的对象为主角，并显示展示有主角特写图像的小窗，同时，终端100还可录制小窗中的特写图像。在上述方法中，特写视频的视频长度一定小于原始视频。In some embodiments, the terminal 100 may also first detect a user operation on the shooting control 112 to start shooting, and start recording the video. During the process of recording a video, the terminal 100 can detect the objects contained in the image in real time and display a selection box corresponding to each object. After detecting a user operation of clicking a certain selection box, the terminal 100 can determine that the object corresponding to the selection box is the protagonist and display a small window showing a close-up image of the protagonist; at the same time, the terminal 100 can also record the close-up images in the small window. In the above method, the length of the close-up video must be shorter than that of the original video.
在结束拍摄之后,用户还可以通过图库应用随时浏览本地视频。上述本地视频包括前述过程中拍摄并保存的原始视频和特写视频。After finishing shooting, users can also browse local videos at any time through the gallery application. The above-mentioned local videos include original videos and close-up videos captured and saved during the aforementioned process.
图3A示例性示出了终端100展示本地保存的视频和/或图片的用户界面301。FIG. 3A exemplarily shows a user interface 301 of the terminal 100 displaying locally saved videos and/or pictures.
如图3A所示,用户界面301可显示多个缩略图图标。一个缩略图图标对应一次拍摄操作得到的视频或图片。示例性的,上述多个缩略图图标可包括图标213。图标213可对应前述图1E-图1I所示的拍摄操作生成的视频。As shown in Figure 3A, user interface 301 may display multiple thumbnail icons. A thumbnail icon corresponds to a video or picture obtained by a shooting operation. For example, the plurality of thumbnail icons may include icon 213 . The icon 213 may correspond to the video generated by the aforementioned shooting operation shown in FIGS. 1E-1I.
终端100可检测到作用于图标213的用户操作。响应于上述操作,终端100可展示前述图1E-图1I所示的拍摄操作拍摄的视频:原始视频和特写视频,参考图3B。The terminal 100 may detect user operations on the icon 213 . In response to the above operations, the terminal 100 may display the videos captured by the aforementioned shooting operations shown in FIGS. 1E-1I: original videos and close-up videos, refer to FIG. 3B.
如图3B所示,图3B所示的用户界面302可包括窗口221。窗口221可用于展示已拍摄的视频:原始视频和特写视频。此时,窗口221可展示视频222和视频223。其中,视频222为主角模式下拍摄得到的原始视频。视频223为主角模式下拍摄得到的特写视频。As shown in FIG. 3B , user interface 302 shown in FIG. 3B may include window 221 . Window 221 can be used to display captured video: original video and close-up video. At this time, the window 221 can display the video 222 and the video 223. Among them, video 222 is the original video shot in the protagonist mode. Video 223 is a close-up video shot in protagonist mode.
在一些示例中,终端100在显示用户界面302时,可同时播放视频222和视频223。这样,用户可以同时浏览原始视频和特写视频。在一些示例中,终端100也可先播放视频222后播放视频223,以便用户浏览。In some examples, when the terminal 100 displays the user interface 302, the video 222 and the video 223 can be played simultaneously. This way, users can browse the original video and the close-up video at the same time. In some examples, the terminal 100 may also play the video 222 first and then the video 223 to facilitate user browsing.
在用户界面302的基础上,终端100可检测到作用于视频222或视频223的用户操作,例如点击操作。以视频222为例,在检测到作用于视频222的点击操作之后,终端100可显示图1K所示的用户界面111,进一步展示原始视频。对应的,在检测到作用于视频223的点击操作之后,终端100可显示图1L所示的用户界面112,进一步展示特写视频。Based on the user interface 302, the terminal 100 may detect a user operation, such as a click operation, acting on the video 222 or the video 223. Taking video 222 as an example, after detecting a click operation on video 222, the terminal 100 may display the user interface 111 shown in FIG. 1K to further display the original video. Correspondingly, after detecting a click operation on video 223, the terminal 100 may display the user interface 112 shown in FIG. 1L to further display the close-up video.
可选的,在图3A所示的用户界面301的基础上,在检测到作用于图标213的用户操作之后,终端100也可直接显示图1K所示的用户界面111,展示原始视频。然后,在检测到左滑操作之后,终端100可显示图1L所示的用户界面112,展示特写视频。Optionally, based on the user interface 301 shown in Figure 3A, after detecting the user operation on the icon 213, the terminal 100 can also directly display the user interface 111 shown in Figure 1K to display the original video. Then, after detecting the left swipe operation, the terminal 100 may display the user interface 112 shown in FIG. 1L to display the close-up video.
图3C示例性示出了另一种终端100展示本地保存的视频和/或图片的用户界面303。FIG. 3C exemplarily shows another user interface 303 of the terminal 100 displaying locally saved videos and/or pictures.
在用户界面303中,终端100可以显示两个缩略图图标,例如图标231,图标232。这两个缩略图图标分别对应主角模式下拍摄得到的原始视频和特写视频。例如,图标231可对应上述原始视频,图标232可对应上述特写视频。In the user interface 303, the terminal 100 may display two thumbnail icons, such as icon 231 and icon 232. These two thumbnail icons correspond to the original video and close-up video captured in protagonist mode. For example, icon 231 may correspond to the above-mentioned original video, and icon 232 may correspond to the above-mentioned close-up video.
在检测到作用于图标231的用户操作之后，终端100可显示图1K所示的用户界面111，展示原始视频。在检测到作用于图标232的用户操作之后，终端100可显示图1L所示的用户界面112，展示特写视频。After detecting the user operation on the icon 231, the terminal 100 may display the user interface 111 shown in FIG. 1K to display the original video. After detecting the user operation on the icon 232, the terminal 100 may display the user interface 112 shown in FIG. 1L to display the close-up video.
同样的,在显示原始视频之后,用户可通过左滑或右滑操作浏览特写视频。在显示特写视频之后,用户可通过右滑或左滑操作浏览原始视频。Similarly, after the original video is displayed, users can browse the close-up video by swiping left or right. After displaying the close-up video, users can browse the original video by swiping right or left.
实施上述视频编辑方法，在多对象视频拍摄场景中，终端100可以自动追踪用户选定的主角在图像中的运动轨迹，并生成始终以主角为中心的特写视频。然后，终端100还可同时保存特写视频和原始视频，以供用户浏览和使用，以满足用户更多样化的需求。原始视频可以保留录制过程中摄像头采集到的全部图像内容。特写视频可以集中地展示用户选定的主角的视频内容。其中，在录制视频的过程中，终端100还可根据用户的操作实时地更换主角，以满足用户变更拍摄主角的需求，进一步提升用户使用体验。By implementing the above video editing method, in a multi-object video shooting scene, the terminal 100 can automatically track the movement trajectory of the protagonist selected by the user in the image and generate a close-up video that is always centered on the protagonist. The terminal 100 can also save both the close-up video and the original video for the user to browse and use, so as to meet the user's more diverse needs. The original video retains all the image content captured by the camera during recording, while the close-up video presents the video content of the user-selected protagonist in a focused way. In addition, during video recording, the terminal 100 can also change the protagonist in real time according to the user's operations, to meet the user's need to change the shooting protagonist and further improve the user experience.
不限于在拍摄视频的过程中生成并保存以主角为中心的特写视频。终端100还可对已拍摄的本地视频进行对象识别和主角跟踪。基于跟踪到的每一帧中的主角,终端100可对上述本地视频进行裁剪、组合以及封装等编辑操作,从而得到以主角为中心的特写视频。Not limited to generating and saving close-up videos centered on the protagonist during the video shooting process. The terminal 100 can also perform object recognition and protagonist tracking on the local video that has been captured. Based on the tracked protagonist in each frame, the terminal 100 can perform editing operations such as cropping, combining, and packaging on the above-mentioned local video, thereby obtaining a close-up video centered on the protagonist.
图4A-图4F示出了终端100编辑本地视频得到以主角为中心的特写视频的一组用户界面。首先,图4A示例性示出了终端100展示本地保存的视频和/或图片的用户界面401。4A to 4F show a set of user interfaces for the terminal 100 to edit a local video to obtain a close-up video centered on the protagonist. First, FIG. 4A exemplarily shows a user interface 401 of the terminal 100 displaying locally saved videos and/or pictures.
用户界面401可显示多个对应本地保存的视频和/或图片的缩略图图标,例如图标411。图标411对应终端100上存储的一个本地视频。The user interface 401 may display a plurality of thumbnail icons corresponding to locally saved videos and/or pictures, such as icon 411. The icon 411 corresponds to a local video stored on the terminal 100.
终端100可检测到作用于图标411的用户操作。响应于上述操作,终端100可展示上述本地视频。参考图4B所示的用户界面402,用户界面402可包括窗口412。窗口412可用于展示本地存储的视频和/或图片。这时,终端100可在窗口412中播放上述图标411对应的本地视频。The terminal 100 may detect a user operation on the icon 411. In response to the above operation, the terminal 100 may display the above local video. Referring to user interface 402 shown in FIG. 4B , user interface 402 may include window 412 . Window 412 may be used to display locally stored videos and/or pictures. At this time, the terminal 100 can play the local video corresponding to the above icon 411 in the window 412.
用户界面402还包括菜单栏413。菜单栏413中包括一个或多个用于设置图片或视频的控件,例如分享控件、收藏控件、编辑控件、删除控件等等。菜单栏413还包括用于展示更多设置项的控件414。当检测到作用于控件414的用户操作时,终端100可显示更多设置项。User interface 402 also includes menu bar 413. The menu bar 413 includes one or more controls for setting pictures or videos, such as sharing controls, collection controls, editing controls, deleting controls, and so on. The menu bar 413 also includes controls 414 for displaying more setting items. When a user operation on the control 414 is detected, the terminal 100 may display more setting items.
参考图4C,在检测到作用于控件414的用户操作之后,终端100可显示菜单栏413。菜单栏413可包括更多的设置项,例如“详细信息”、“分类标签”等设置项。“详细信息”可用于显示当前展示的图片或视频的拍摄信息,例如拍摄时间、拍摄地点、摄像头参数等等。“分类标签”可用于设置当前展示的图片或视频标签,以便于用户通过标签快速地获取该图片或视频。Referring to FIG. 4C , after detecting a user operation on control 414 , the terminal 100 may display the menu bar 413 . The menu bar 413 may include more setting items, such as "detailed information", "category tags" and other setting items. "Details" can be used to display the shooting information of the currently displayed picture or video, such as shooting time, shooting location, camera parameters, etc. "Category Tag" can be used to set the tag of the currently displayed image or video, so that users can quickly obtain the image or video through the tag.
在本申请实施例中,菜单栏413还可包括设置项“提取主角”。“提取主角”可用于生成以选定主角为中心的特写视频。如图4C所示,终端100可检测到作用于“提取主角”设置项的用户操作。响应于上述操作,终端100可显示图4D所示的用户界面404。用户界面404可用于确定主角、生成并保存以主角为中心的特写视频。In this embodiment of the present application, the menu bar 413 may also include a setting item “Extract Protagonist”. Extract Protagonist can be used to generate a close-up video centered on a selected protagonist. As shown in FIG. 4C , the terminal 100 may detect a user operation on the “extract protagonist” setting item. In response to the above operation, the terminal 100 may display the user interface 404 shown in FIG. 4D. User interface 404 may be used to determine the protagonist, generate and save a close-up video centered on the protagonist.
如图4D所示，用户界面404可包括窗口420。窗口420可播放上述本地视频，即依次显示上述本地视频的图像帧序列。用户界面404还包括进度条424。进度条424可用于指示播放进度；在视频暂停或播放时，进度条424还可用于切换当前显示的图像帧，即用户可以通过手动拖动进度条改变播放进度，以切换当前显示的图像帧。As shown in FIG. 4D, the user interface 404 may include a window 420. The window 420 can play the local video, that is, display the image frame sequence of the local video in order. The user interface 404 also includes a progress bar 424. The progress bar 424 can be used to indicate the playback progress; when the video is paused or playing, the progress bar 424 can also be used to switch the currently displayed image frame, that is, the user can change the playback progress by manually dragging the progress bar, so as to switch the currently displayed image frame.
可选的,视频播放和暂停时,窗口420还可显示与当前显示的图像帧中各个对象对应的选择框,例如选择框421、422、423。其中,选择框421对应当前图像帧中的人物1、选择框422对应当前图像帧中的人物2、选择框423对应当前图像帧中的人物3。可选的,视频播放过程中,检测到作用于窗口420的预设操作时,窗口420才显示与当前显示的图像帧中各个对象对应的选择框。例如,上述预设操作为用于暂停视频播放的操作;例如,上述预设操作为拖动进度条以切换图像帧的操作;例如,上述预设操作为作用于当前播放的图像帧的触摸操作、双击操作或长按操作等。Optionally, when the video is played or paused, the window 420 may also display selection boxes corresponding to each object in the currently displayed image frame, such as selection boxes 421, 422, and 423. Among them, the selection box 421 corresponds to the person 1 in the current image frame, the selection box 422 corresponds to the person 2 in the current image frame, and the selection box 423 corresponds to the person 3 in the current image frame. Optionally, during video playback, the window 420 only displays selection boxes corresponding to each object in the currently displayed image frame when a preset operation on the window 420 is detected. For example, the above-mentioned preset operation is an operation for pausing video playback; for example, the above-mentioned preset operation is an operation for dragging the progress bar to switch image frames; for example, the above-mentioned preset operation is a touch operation for the currently playing image frame , double-click operation or long-press operation, etc.
示例性的,参考图4D,在显示上述本地视频的第一帧图像时,终端100可检测到用户选择人物3作为主角的操作,例如点击与人物3对应的选择框423的操作。响应于上述操作,终端100可确定人物3为主角,然后,终端100可依次确定后续图像帧中人物3的位置,并确定以人物3为中心的特写图像的大小。组合各个以人物3为中心的特写图像,终端100可得到人物3的特写视频。For example, referring to FIG. 4D , when displaying the first frame image of the above-mentioned local video, the terminal 100 may detect the user's operation of selecting character 3 as the protagonist, such as the operation of clicking the selection box 423 corresponding to character 3. In response to the above operation, the terminal 100 may determine that the character 3 is the protagonist, and then the terminal 100 may sequentially determine the position of the character 3 in subsequent image frames, and determine the size of the close-up image centered on the character 3. By combining the close-up images centered on the character 3, the terminal 100 can obtain the close-up video of the character 3.
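The per-frame flow just described (locate character 3 in each subsequent frame, cut out a close-up image centered on it, then combine the results) might be sketched as follows; the detector and cropping routine are hypothetical stand-ins supplied by the caller, not APIs named in this application:

```python
def extract_close_up_video(original_frames, locate_protagonist, crop):
    """Build a close-up video from a local video's frame sequence.

    `locate_protagonist(frame)` returns the protagonist's bounding box
    in that frame, or None if the protagonist is absent; `crop(frame,
    box)` cuts out the close-up image centered on the box.
    """
    close_ups = []
    for frame in original_frames:
        box = locate_protagonist(frame)
        if box is None:
            continue  # protagonist lost in this frame; skip it
        close_ups.append(crop(frame, box))
    return close_ups
```
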
终端100也可在窗口420显示本地视频的第一帧图像之后的任意一帧图像时,检测到确定主角的用户操作。例如,参考图4E所示的用户界面405,当播放到第i帧时,终端100可检测到用户选择人物3(或其他对象)作为主角的操作。可选的,用户选择主角后,窗口420还可自动显示与当前显示的每一图像帧中各个对象对应的选择框。这样,可以便于用户随时可以通过后续图像帧对应的选择框切换主角。可选的,用户选择主角后,检测到作用于窗口420的预设操作时,窗口420才显示与当前显示的图像帧中各个对象对应的选择框。这样,仅在用户意图切换主角时,基于预设操作显示选择框,可以节省对象识别的能耗。The terminal 100 may also detect the user operation of determining the protagonist when the window 420 displays any image frame after the first frame image of the local video. For example, referring to the user interface 405 shown in FIG. 4E, when the i-th frame is played, the terminal 100 may detect the user's operation of selecting character 3 (or other objects) as the protagonist. Optionally, after the user selects the protagonist, the window 420 may also automatically display selection boxes corresponding to each object in each currently displayed image frame. In this way, it is convenient for the user to switch the protagonist at any time through the selection box corresponding to the subsequent image frame. Optionally, after the user selects the protagonist and detects a preset operation on the window 420, the window 420 displays a selection box corresponding to each object in the currently displayed image frame. In this way, the selection box is displayed based on the preset operation only when the user intends to switch the protagonist, which can save the energy consumption of object recognition.
用户界面404还可包括控件425。控件425可用于保存当前生成的特写视频。例如,在检测到作用于控件425的用户操作之后,终端100可将上述人物3的特写视频保存到本地存储空间中。在完成保存操作之后,终端100可显示图4G所示的用户界面407,展示上述被保存的特写视频。此时,用户可随时浏览上述特写视频。User interface 404 may also include controls 425 . Control 425 may be used to save the currently generated close-up video. For example, after detecting the user operation on the control 425, the terminal 100 may save the close-up video of the above-mentioned character 3 to the local storage space. After completing the saving operation, the terminal 100 may display the user interface 407 shown in FIG. 4G to display the above-mentioned saved close-up video. At this time, users can browse the above close-up video at any time.
当然,在编辑本地视频的特写视频时,终端100也可支持用户切换主角,以获取包括多个对象的特写视频。Of course, when editing a close-up video of a local video, the terminal 100 can also support the user to switch the protagonist to obtain a close-up video including multiple objects.
示例性的，在图4D所示的用户界面404（第一帧图像）中，响应于用户点击选择框423的操作，终端100可确定当前主角为人物3。然后，参考图4H所示的用户界面408，终端100显示第一帧图像之后的任意一帧包括至少一个对象的图像（例如第N帧）时，还可以显示所述至少一个对象分别对应的选择框，例如人物2对应的选择框422；终端100可检测到用户点击选择框422（对应人物2）的操作，响应于上述操作，终端100可切换主角为人物2。这时，第M帧图像到第N-1帧图像的主角为人物3，第N帧图像至视频结束的主角为人物2。For example, in the user interface 404 (the first frame image) shown in FIG. 4D, in response to the user's operation of clicking the selection box 423, the terminal 100 may determine that the current protagonist is character 3. Then, referring to the user interface 408 shown in FIG. 4H, when the terminal 100 displays an image including at least one object in any frame after the first frame (for example, the N-th frame), it may also display selection boxes respectively corresponding to the at least one object, for example, the selection box 422 corresponding to character 2; the terminal 100 can detect the user's operation of clicking the selection box 422 (corresponding to character 2), and in response to the above operation, the terminal 100 can switch the protagonist to character 2. At this time, the protagonist from the M-th frame to the (N-1)-th frame is character 3, and the protagonist from the N-th frame to the end of the video is character 2.
需要说明的是，若在播放本地视频的第M帧图像（例如图4D所示的第一帧图像）时，用户开始选定主角为人物3，在播放本地视频的第N帧图像（例如图4H所示的图像）时，将主角由人物3切换为人物2，然后保存特写视频，则本地视频中的第M帧图像到第N-1帧图像的主角为人物3，第N帧图像至视频结束的主角为人物2。可选的，终端100保存的特写视频的前半段视频是以人物3为中心的特写视频，是基于本地视频的第M帧图像至第N-1帧图像中包括人物3的图像生成的；上述特写视频的后半段视频是以人物2为中心的特写视频，是基于本地视频的第N帧图像至最后一帧图像中包括人物2的图像生成的。可选的，终端100也可以分别保存两个特写视频，即以人物3为中心的特写视频和以人物2为中心的特写视频。It should be noted that if the user selects character 3 as the protagonist when the M-th frame of the local video is playing (for example, the first frame shown in FIG. 4D), switches the protagonist from character 3 to character 2 when the N-th frame of the local video is playing (for example, the image shown in FIG. 4H), and then saves the close-up video, then the protagonist from the M-th frame to the (N-1)-th frame of the local video is character 3, and the protagonist from the N-th frame to the end of the video is character 2. Optionally, the first half of the close-up video saved by the terminal 100 is a close-up video centered on character 3, generated based on the images including character 3 from the M-th frame to the (N-1)-th frame of the local video; the second half of the close-up video is a close-up video centered on character 2, generated based on the images including character 2 from the N-th frame to the last frame of the local video. Optionally, the terminal 100 can also save two close-up videos respectively, that is, a close-up video centered on character 3 and a close-up video centered on character 2.
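The frame-to-protagonist mapping described above can be expressed as a simple schedule (an illustrative sketch with hypothetical names; each switch applies from its frame up to the frame before the next switch):

```python
def protagonist_schedule(total_frames, switches):
    """Map each frame index to the protagonist selected at that point.

    `switches` is a chronological list of (frame_index, protagonist)
    pairs: selecting character 3 at frame M and character 2 at frame N
    makes frames M..N-1 belong to character 3 and frames N..end belong
    to character 2. Frames before the first selection have no
    protagonist (None).
    """
    schedule = [None] * total_frames
    for i, (start, who) in enumerate(switches):
        end = switches[i + 1][0] if i + 1 < len(switches) else total_frames
        for f in range(start, end):
            schedule[f] = who
    return schedule
```
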
在一些实施例中,图4D所示的编辑主角和特写视频的用户界面还可为图5A所示的样子。In some embodiments, the user interface for editing protagonist and close-up videos shown in Figure 4D may also be as shown in Figure 5A.
如图5A所示，终端100可首先遍历当前展示的本地视频，确定该视频中包括的全部对象。这时，终端100可显示上述全部对象，例如用户界面501中的人物1、人物2、人物3。然后，终端100可检测到作用于上述任一人物的用户操作，确定被选中的人物为主角，然后基于本地视频中包括主角的图像，获取主角的特写图像；进而将上述主角的特写图像组合起来，获取以上述主角为中心的特写视频。As shown in FIG. 5A, the terminal 100 may first traverse the currently displayed local video and determine all objects included in the video. At this time, the terminal 100 can display all of the above objects, such as character 1, character 2, and character 3 in the user interface 501. Then, the terminal 100 can detect a user operation on any of the above characters, determine that the selected character is the protagonist, and obtain close-up images of the protagonist based on the images including the protagonist in the local video; it then combines the above close-up images of the protagonist to obtain a close-up video centered on the protagonist.
当然,在图5A所示的用户界面501中,用户可也设定多个主角,从而得到包括多个主角的特写视频,或多个主角分别对应的特写视频。Of course, in the user interface 501 shown in FIG. 5A , the user can also set multiple protagonists to obtain a close-up video including multiple protagonists, or a close-up video corresponding to multiple protagonists.
可选的,用户界面501还可包括分割控件511。分割控件511可将窗口420中展示的本地视频分割为多个视频段。示例性的,参考图5B-1所示的用户界面502-1,终端100可检测到作用于分割控件511的用户操作。响应于上述用户操作,终端100可显示图5B-2所示的用户界面502-2。此时,用户可通过对进度条424的分割操作,将本地视频分割为一个或多个视频段。Optionally, the user interface 501 may also include a split control 511. The split control 511 can split the local video displayed in the window 420 into multiple video segments. For example, referring to the user interface 502-1 shown in FIG. 5B-1, the terminal 100 may detect a user operation acting on the split control 511. In response to the above user operation, the terminal 100 may display the user interface 502-2 shown in FIG. 5B-2. At this time, the user can divide the local video into one or more video segments by performing a dividing operation on the progress bar 424 .
例如,如用户界面502-2所示,终端100可检测用户点击进度条424的操作。响应于上述用户操作,终端100可显示图5B-3所示的用户界面502-3。此时,终端100可在进度条424上显示分割框512。然后,用户可通过上述分割框512将本地视频分割为两个视频段。For example, as shown in the user interface 502-2, the terminal 100 may detect the user's operation of clicking the progress bar 424. In response to the above user operation, the terminal 100 may display the user interface 502-3 shown in FIG. 5B-3. At this time, the terminal 100 may display the dividing frame 512 on the progress bar 424. Then, the user can divide the local video into two video segments through the above-mentioned dividing box 512.
参考图5B-4所示的用户界面502-4,终端100可将原本地视频划分为2段。此时,0:00-2:30为一段视频(视频段1);2:30-4:00为一段视频(视频段2)。当前选中的视频段可用黑色表示。进一步的,在选中一个视频段时,用户还可通过分割控件511将该视频段进一步分割为两个视频段。这样,终端100可以将原本地视频分割为多个视频段。Referring to the user interface 502-4 shown in Figure 5B-4, the terminal 100 can divide the original local video into two segments. At this time, 0:00-2:30 is a video (video segment 1); 2:30-4:00 is a video (video segment 2). The currently selected video segment can be represented in black. Furthermore, when a video segment is selected, the user can further divide the video segment into two video segments through the split control 511. In this way, the terminal 100 can divide the original video into multiple video segments.
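The effect of the split operation above can be sketched as follows (times in seconds; the helper name is illustrative, not part of the embodiment):

```python
def split_video(duration, cut_points):
    """Split a video of `duration` seconds into consecutive segments
    at the given cut points on the progress bar; e.g. one cut at
    150 s splits a 240 s video into 0:00-2:30 and 2:30-4:00.
    """
    bounds = [0] + sorted(cut_points) + [duration]
    return [(bounds[i], bounds[i + 1]) for i in range(len(bounds) - 1)]
```

Selecting a segment and splitting again simply adds another cut point, yielding one more segment each time.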
以用户界面502-4为例，在选中视频段1的场景下，终端100可确定视频段1中包括的全部对象，然后显示出来，例如人物1、2、3。用户可从显示的全部对象中选择任意对象作为主角。例如，终端100可根据检测到的作用于人物3上的用户操作确定人物3为视频段1的主角。Taking the user interface 502-4 as an example, in the scenario where video segment 1 is selected, the terminal 100 can determine all objects included in video segment 1 and then display them, such as characters 1, 2, and 3. The user can select any object from all displayed objects as the protagonist. For example, the terminal 100 may determine that character 3 is the protagonist of video segment 1 based on the detected user operation acting on character 3.
然后,用户可更换选中的视频段,确定更换后的视频段中的主角。例如,参考图5C,终端100可检测到用户点击视频段2的操作。响应于上述操作,终端100可显示图5D所示的用户界面504。此时,在用户界面504中,终端100可显示视频段2中包括的全部对象,例如人物1、人物2(视频段2中不包括人物3)。这时,用户可选择人物2作为视频段2的主角。Then, the user can change the selected video segment and determine the protagonist in the changed video segment. For example, referring to FIG. 5C , the terminal 100 may detect the user's operation of clicking on video segment 2. In response to the above operation, the terminal 100 may display the user interface 504 shown in FIG. 5D. At this time, in the user interface 504, the terminal 100 can display all objects included in the video segment 2, such as character 1 and character 2 (the video segment 2 does not include character 3). At this time, the user can select character 2 as the protagonist of video segment 2.
参考图5E，终端100可检测到作用于控件425的操作。响应于上述操作，终端100可保存基于上述本地视频得到的特写视频。此时，本地视频的0:00-2:30内的主角为人物3，2:30-4:00内的主角为人物2，特写视频的前半段视频是以人物3为中心的特写视频，是基于本地视频的0:00-2:30中包括人物3的图像生成的；上述特写视频的后半段视频是以人物2为中心的特写视频，是基于本地视频的2:30-4:00中包括人物2的图像生成的。类似的，终端100也可以分别保存两个特写视频，即以人物3为中心的特写视频和以人物2为中心的特写视频。Referring to FIG. 5E, the terminal 100 may detect an operation acting on the control 425. In response to the above operation, the terminal 100 may save the close-up video obtained based on the above local video. At this time, the protagonist within 0:00-2:30 of the local video is character 3 and the protagonist within 2:30-4:00 is character 2; the first half of the close-up video is a close-up video centered on character 3, generated based on the images including character 3 from 0:00-2:30 of the local video; the second half of the close-up video is a close-up video centered on character 2, generated based on the images including character 2 from 2:30-4:00 of the local video. Similarly, the terminal 100 can also save two close-up videos respectively, that is, a close-up video centered on character 3 and a close-up video centered on character 2.
实施上述实施例介绍的视频编辑方法,终端100可以对已拍摄的本地视频进行对象识别和主角跟踪,然后,终端100可生成并保存以主角为中心的特写视频。这样,对于终端100上存储的任意视频,用户可以随时随地利用上述方法,获取该视频中的任意对象的特写视频,从而满足用户的个性化编辑需求。By implementing the video editing method introduced in the above embodiment, the terminal 100 can perform object recognition and protagonist tracking on the local video that has been captured, and then the terminal 100 can generate and save a close-up video centered on the protagonist. In this way, for any video stored on the terminal 100, the user can use the above method anytime and anywhere to obtain a close-up video of any object in the video, thereby meeting the user's personalized editing needs.
图6示例性示出了终端100在拍摄的过程中生成主角特写视频的流程图。FIG. 6 exemplarily shows a flow chart for the terminal 100 to generate a close-up video of the protagonist during the shooting process.
S601:检测到作用于第一控件的用户操作。S601: A user operation acting on the first control is detected.
参考图1B所示的用户界面102,实施主角模式对应的视频编辑方法需要实时地识别并标记摄像头采集的图像中的对象(例如人物、动物、植物等)。这需要占用大量的终端100的计算资源。因此,在本申请实施例中,默认的,在开启摄像头时,主角模式是关闭的。Referring to the user interface 102 shown in FIG. 1B , implementing the video editing method corresponding to the protagonist mode requires real-time identification and marking of objects (such as people, animals, plants, etc.) in images collected by the camera. This requires occupying a large amount of computing resources of the terminal 100. Therefore, in the embodiment of the present application, the protagonist mode is turned off by default when the camera is turned on.
终端100可为用户提供开启或关闭主角模式的控件,记为第一控件。当检测到作用于上述第一控件的用户操作之后,终端100可开启主角模式,执行主角模式对应的拍摄算法,例如识别图像中的对象、主角追踪等等。例如,在图1B所示的用户界面102中,模式栏111中的主角模式选项可称为第一控件。当检测到作用于主角模式选项的用户操作之后,终端100可为用户提供图1B-图1I所示的拍摄服务。The terminal 100 can provide the user with a control for turning on or off the protagonist mode, which is recorded as the first control. After detecting a user operation on the first control, the terminal 100 can turn on the protagonist mode and execute the shooting algorithm corresponding to the protagonist mode, such as identifying objects in the image, protagonist tracking, and so on. For example, in the user interface 102 shown in FIG. 1B , the protagonist mode option in the mode bar 111 may be called a first control. After detecting a user operation on the protagonist mode option, the terminal 100 may provide the user with the shooting service shown in FIGS. 1B-1I.
这样，用户可根据自身需求确定是否开启主角模式，从而避免不必要地占用终端100的计算资源、降低终端100的计算效率、影响用户使用体验。In this way, the user can decide whether to turn on the protagonist mode according to his or her own needs, thereby avoiding unnecessarily occupying the computing resources of the terminal 100, which would reduce the computing efficiency of the terminal 100 and degrade the user experience.
S602:对摄像头采集的第i帧图像进行对象检测,确定第i帧图像中的包括的对象。S602: Perform object detection on the i-th frame image collected by the camera, and determine the objects included in the i-th frame image.
参考图1B-图1D所示的用户界面,在主角模式下,终端100需要根据用户的选择操作确定主角。这时,终端100需要首先识别摄像头采集到的图像中包括的对象,然后将识别到的对象标记出来。这样,用户可以在上述识别到的对象中选择任意对象作为主角。相应地,终端100也才能根据用户操作确定主角。Referring to the user interfaces shown in FIGS. 1B to 1D , in the protagonist mode, the terminal 100 needs to determine the protagonist according to the user's selection operation. At this time, the terminal 100 needs to first recognize the objects included in the images collected by the camera, and then mark the recognized objects. In this way, the user can select any object as the protagonist among the above-mentioned recognized objects. Correspondingly, the terminal 100 can also determine the protagonist based on user operations.
图7A示例性示出了在开启主角模式之后终端100识别图像中的对象的流程图。FIG. 7A exemplarily shows a flowchart for the terminal 100 to recognize objects in the image after turning on the protagonist mode.
S701:对摄像头采集的第i帧图像进行人脸识别和人体识别,确定第i帧中的人脸图像和人体图像。S701: Perform face recognition and human body recognition on the i-th frame image collected by the camera, and determine the face image and human body image in the i-th frame.
终端100中可预置有人脸识别算法和人体识别算法。人脸识别算法可用于识别图像中的人脸图像。人体识别算法可用于识别图像中的人体图像，包括人脸、躯体、四肢。The terminal 100 may be preset with a face recognition algorithm and a human body recognition algorithm. The face recognition algorithm can be used to identify face images in an image. The human body recognition algorithm can be used to identify human body images in an image, including the face, torso, and limbs.
以摄像头采集的第i帧图像为例,终端100可分别执行人脸识别算法和人体识别算法,进而确定第i帧图像中的人脸图像和人体图像。其中,第i帧图像是开启主角模式后摄像头采集的任意一帧图像。 Taking the i-th frame image collected by the camera as an example, the terminal 100 can execute the face recognition algorithm and the human body recognition algorithm respectively, and then determine the face image and human body image in the i-th frame image. Among them, the i-th frame image is any frame image collected by the camera after turning on the protagonist mode.
如图7B所示的第i帧图像,通过人脸识别算法,终端100可确定该帧图像包括人脸face1、face2、face3;通过人体识别算法,终端100可确定该帧图像包括人体body1、body2、body3。As shown in the i-th frame image shown in Figure 7B, through the face recognition algorithm, the terminal 100 can determine that the frame image includes faces face1, face2, and face3; through the human body recognition algorithm, the terminal 100 can determine that the frame image includes human bodies body1, body2 , body3.
S702:对识别到的人脸图像和人体图像进行匹配,确定第i帧图像中包括的对象。S702: Match the recognized face image and human body image to determine the object included in the i-th frame image.
在确定第i帧图像中的人脸图像和人体图像之后,终端100可计算各个人脸图像和人体图像的交并比(intersection over union,IoU),记为IoUface&body。然后,终端100可利用上述IoUface&body将识别到的人脸图像和人体图像进行匹配,确定第i帧图像中包括的对象。After determining the face image and body image in the i-th frame image, the terminal 100 can calculate the intersection over union (IoU) of each face image and body image, which is recorded as IoU face&body . Then, the terminal 100 can use the above-mentioned IoU face&body to match the recognized face image and the human body image, and determine the object included in the i-th frame image.
根据经验可知，图像中两个不重叠的人中，任意一个人的人脸与对方人体的交集为0，而与自身人体的交集基本上接近于自身人脸。因此，IoUface&body越接近0，该IoUface&body对应的人脸和人体越不匹配，即不可视为同一人的人脸和人体。Empirically, for two non-overlapping people in an image, the intersection between one person's face and the other person's body is 0, while the intersection between that face and the person's own body is essentially the whole face. Therefore, the closer IoUface&body is to 0, the less the corresponding face and body match, i.e., they cannot be regarded as the face and body of the same person.
因此,终端100中可预设有第一阈值M1。当IoUface&body≥M1时,该IoUface&body对应的人脸和人体匹配,反之,则不匹配。匹配的一组人脸图像和人体图像可确定一个对象。这样,终端100可基于已识别到的人脸图像和人体图像确定出第i帧图像中包括的M个对象。Therefore, the first threshold M1 may be preset in the terminal 100 . When IoU face&body ≥M1, the face corresponding to the IoU face&body matches the human body, otherwise, it does not match. A matching set of face images and body images identifies an object. In this way, the terminal 100 can determine the M objects included in the i-th frame image based on the recognized face image and human body image.
具体的,以图7B所示的人脸face1、face2、face3和人体body1、body2、body3为例,终端100可分别计算face1、face2、face3与body1、body2、body3的IoU。以face1为例,face1与body2、body3的IoU的取值均为0,face1与body1的IoU的取值不为0且满足M1,这时,终端100可确定face1与body1可构成一个对象(即人物1)。同样的,终端100可确定face2与body2可构成一个对象(即人物2),face3与body3可构成一个对象(即人物3)。Specifically, taking the human faces face1, face2, and face3 and the human bodies body1, body2, and body3 shown in FIG. 7B as an example, the terminal 100 can calculate the IoU of face1, face2, face3, and body1, body2, and body3 respectively. Taking face1 as an example, the IoU values of face1, body2, and body3 are all 0, and the IoU values of face1 and body1 are not 0 and satisfy M1. At this time, the terminal 100 can determine that face1 and body1 can constitute an object (i.e. Character 1). Similarly, the terminal 100 can determine that face2 and body2 can constitute an object (ie, character 2), and face3 and body3 can constitute an object (ie, character 3).
其中,为提升计算效率,在确定一个对象之后,在后续计算各个人脸图像与人体图像的IoU时,终端100可不再计算上述已构成一个对象的人脸图像与人体图像的IoU。例如,终端100可首先计算face1与所有body(body1、body2、body3)的IoU。这时,终端100可确定与face1匹配的body1,于是,终端100可确定face1与body1构成一个对象。然后,终端100可计算face2与剩余所有body(body2、body3)的IoU。这时,终端100可不再计算face2与body1的IoU,以减少冗余计算,提升计算效率。In order to improve calculation efficiency, after determining an object, when subsequently calculating the IoU of each face image and human body image, the terminal 100 may no longer calculate the IoU of the face image and human body image that have constituted an object. For example, the terminal 100 may first calculate the IoU between face1 and all bodies (body1, body2, and body3). At this time, the terminal 100 can determine body1 that matches face1. Therefore, the terminal 100 can determine that face1 and body1 constitute an object. Then, the terminal 100 can calculate the IoU of face2 and all remaining bodies (body2, body3). At this time, the terminal 100 can no longer calculate the IoU of face2 and body1 to reduce redundant calculations and improve calculation efficiency.
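As an illustration, the matching procedure of S701-S702 (pair a face with a body when their IoU reaches the first threshold M1, skipping bodies already assigned to an object) can be sketched in Python. The (x, y, w, h) box representation, the value M1 = 0.1, and the greedy matching order are assumptions for illustration, not the patent's actual implementation:

```python
def iou(box_a, box_b):
    # Boxes are (x, y, w, h); IoU = intersection area / union area.
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0

M1 = 0.1  # first threshold M1 (hypothetical value)

def match_faces_to_bodies(faces, bodies):
    """Pair each face with a body whose IoU with it is >= M1; each matched
    (face, body) pair constitutes one object.  Bodies that already belong
    to an object are skipped, avoiding the redundant IoU computations
    mentioned in the text."""
    objects = []
    unmatched = list(range(len(bodies)))
    for face in faces:
        for k in list(unmatched):
            if iou(face, bodies[k]) >= M1:
                objects.append((face, bodies[k]))
                unmatched.remove(k)
                break
    return objects
```

Since a face sits inside its own body box, its IoU with that body is roughly face area / body area, which is why M1 is small rather than close to 1.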
可选的,在S701中,终端100也可直接使用人体检测算法识别第i帧图像中的对象。这时,终端100也无需进行人脸图像和人体图像匹配。Optionally, in S701, the terminal 100 may also directly use the human body detection algorithm to identify the object in the i-th frame image. At this time, the terminal 100 does not need to match the face image and the human body image.
上述方法在单对象场景中,以及多对象且多对象之间不重叠的场景中,能够较好的识别第i帧图像中包括的对象。但是,当拍摄的人物较多且存在人物重叠的场景下,上述方法识别第i帧图像中包括的对象的准确率就较低,容易出现识别错位或根本识别不到重叠的人物。The above method can better identify the objects included in the i-th frame image in single-object scenes and in scenes with multiple objects and non-overlapping between the objects. However, when there are many people photographed and there are overlapping people, the accuracy of the above method in identifying the objects included in the i-th frame image is low, and it is easy to cause recognition misalignment or not recognize the overlapping people at all.
因此,在多对象场景中,特别是人物重叠的多对象场景中,S701-S702所示对象识别方法可以更稳定地且正确的识别到图像帧中包括的多个对象。Therefore, in a multi-object scene, especially in a multi-object scene with overlapping people, the object recognition method shown in S701-S702 can more stably and correctly identify multiple objects included in the image frame.
可以理解的，在终端100支持识别动物、植物等对象的场景下，上述对象识别算法还包括特定动物的识别算法，以及特定植物的识别算法。这样，终端100可以识别第i帧图像中是否包括动物、植物等类型的对象。进一步的，终端100可以将上述动物、植物等对象设定为主角。对象识别算法可支持识别哪些类型的对象取决于开发人员的预先设定。It can be understood that, in a scenario where the terminal 100 supports the recognition of objects such as animals and plants, the above object recognition algorithm also includes recognition algorithms for specific animals and for specific plants. In this way, the terminal 100 can identify whether the i-th frame image includes objects of types such as animals and plants. Furthermore, the terminal 100 can set the above animals, plants and other objects as the protagonist. Which types of objects the object recognition algorithm supports recognizing depends on the developer's preset configuration.
S603:显示第i帧图像及与第i帧图像中各对象对应的标记。S603: Display the i-th frame image and markers corresponding to each object in the i-th frame image.
在确定第i帧图像中包括M个对象之后，终端100可创建与上述M个对象分别对应的标记。在显示上述第i帧图像时，终端100可同时显示上述标记。该标记可用于提示用户终端100识别到的、可确定为主角的对象。进一步的，该标记可用于用户指示终端100确定哪一对象为主角。After determining that M objects are included in the i-th frame image, the terminal 100 may create marks respectively corresponding to the above M objects. When displaying the i-th frame image, the terminal 100 may display the above marks at the same time. A mark can be used to prompt the user about an object that the terminal 100 has recognized and that can be determined as the protagonist. Further, a mark can be used by the user to instruct the terminal 100 as to which object is determined as the protagonist.
结合图7B所示的第i帧图像,在确定第i帧图像中包括的3个对象(人物1、人物2、人物3)之后,终端100可确定与上述3个对象对应的3个标记。参考图1B所示的用户界面102,上述标记可以为预览窗113中的选择框。终端100在显示上述第i帧图像时,终端100可显示选择框121、122、123。其中,选择框121、122、123分别用于标记图像中的人物1、人物2、人物3。Combined with the i-th frame image shown in FIG. 7B, after determining the three objects (person 1, person 2, and person 3) included in the i-th frame image, the terminal 100 can determine three markers corresponding to the above three objects. Referring to the user interface 102 shown in FIG. 1B , the above mark may be a selection box in the preview window 113 . When the terminal 100 displays the i-th frame image, the terminal 100 may display the selection boxes 121, 122, and 123. Among them, the selection boxes 121, 122, and 123 are respectively used to mark Person 1, Person 2, and Person 3 in the image.
这样,用户在预览窗113中既可以浏览到摄像头实施采集的图像,又可以同时获取到终端100识别到的对象,即支持设定的主角。进一步的,用户还可以点击任一选择框(例如选择框123),确定该选择框对应的对象(人物3)为主角。在检测到用户点击任意选择框的用户操作之后,终端100可将上述被点击的选择框对应的对象设定为主角。后续,终端100可在摄像头采集的图像序列中定位上述主角,从而实现主角追踪,生成主角特写视频。In this way, the user can not only browse the images collected by the camera in the preview window 113, but also obtain the objects recognized by the terminal 100, that is, the protagonists that support setting. Furthermore, the user can also click on any selection box (for example, selection box 123) to determine that the object (character 3) corresponding to the selection box is the protagonist. After detecting the user operation of clicking any selection box, the terminal 100 may set the object corresponding to the clicked selection box as the protagonist. Subsequently, the terminal 100 can locate the above-mentioned protagonist in the image sequence collected by the camera, thereby realizing protagonist tracking and generating a close-up video of the protagonist.
可以理解的,在终端100支持识别动物、植物等对象的场景下,终端100可相应地在上述动物、植物的图像上显示选择框。用户也可选择上述动物、植物作为主角。It can be understood that in a scenario where the terminal 100 supports the recognition of objects such as animals and plants, the terminal 100 can display a selection box on the image of the above-mentioned animal or plant accordingly. Users can also choose the above-mentioned animals and plants as protagonists.
具体的,选择框的显示位置可基于人脸图像和人体图像确定。图7C示例性示出了终端100确定选择框的显示位置的示意图。如图7C所示,在识别到人脸图像和人体图像之后,终端100可确定人脸图像和人体图像的中点:人脸图像的中点P1,人体图像的中点P2。基于上述P1、P2,终端100可确定上述人脸图像和人体图像对应的对象(即人物3)的中点P3。选择框123的中点即上述P3。Specifically, the display position of the selection box can be determined based on the face image and the human body image. FIG. 7C exemplarily shows a schematic diagram in which the terminal 100 determines the display position of the selection box. As shown in FIG. 7C , after recognizing the face image and the human body image, the terminal 100 can determine the midpoints of the face image and the human body image: the midpoint P1 of the face image and the midpoint P2 of the human body image. Based on the above P1 and P2, the terminal 100 can determine the midpoint P3 of the object (ie, the person 3) corresponding to the above face image and the human body image. The midpoint of the selection frame 123 is the above-mentioned P3.
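The derivation of the selection-box midpoint P3 from the face midpoint P1 and the body midpoint P2 can be sketched as follows. The text does not spell out how P1 and P2 are combined; taking their average is an assumption made here for illustration:

```python
def box_center(box):
    # Midpoint of an (x, y, w, h) box.
    x, y, w, h = box
    return (x + w / 2, y + h / 2)

def object_center(face_box, body_box):
    """P1 is the face midpoint, P2 the body midpoint; P3 is taken here as
    their average (an assumed combination rule)."""
    p1 = box_center(face_box)
    p2 = box_center(body_box)
    return ((p1[0] + p2[0]) / 2, (p1[1] + p2[1]) / 2)
```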
S604:判断是否检测到选择第i帧图像中第一对象的用户操作。如果是,则确定所述第一对象为主角,将第i帧图像的帧索引号FrameID设置为1;如果否,则获取第i帧图像的下一帧图像(i=i+1),重复S602。S604: Determine whether a user operation of selecting the first object in the i-th frame image is detected. If yes, determine that the first object is the protagonist, and set the frame index number FrameID of the i-th frame image to 1; if not, obtain the next frame image of the i-th frame image (i=i+1), and repeat S602.
在终端100执行完S603所示的操作之后，用户可在终端100的屏幕上看到摄像头采集的第i帧图像，以及与第i帧图像中各个对象对应的标记（选择框），参考图1B所示的用户界面102。After the terminal 100 completes the operation shown in S603, the user can see on the screen of the terminal 100 the i-th frame image collected by the camera, as well as the marks (selection boxes) corresponding to the objects in the i-th frame image; refer to the user interface 102 shown in FIG. 1B.
在显示上述携带有标记的第i帧图像帧之后,终端100可检测到作用于任一标记的用户操作。响应于上述操作,终端100可确定上述标记对应的对象为主角,并将第i帧图像的帧索引号FrameID设置为1。After displaying the i-th image frame carrying the mark, the terminal 100 may detect a user operation acting on any mark. In response to the above operation, the terminal 100 may determine that the object corresponding to the above mark is the protagonist, and set the frame index number FrameID of the i-th frame image to 1.
例如,参考图1C所示的用户界面103,终端100可检测到作用于选择框123的用户操作。响应于上述操作,终端100可确定选择框123对应的对象为主角,即人物3为主角,并将该帧图像的帧索引号FrameID设置为1,即FrameID=1。上述人物3即第一对象,作用于选择框123的用户操作即选择第一对象的操作。FrameID可用于反映该帧图像是确定主角的第几帧图像。For example, referring to the user interface 103 shown in FIG. 1C , the terminal 100 may detect a user operation acting on the selection box 123 . In response to the above operation, the terminal 100 may determine that the object corresponding to the selection box 123 is the protagonist, that is, the character 3 is the protagonist, and set the frame index number FrameID of the frame image to 1, that is, FrameID=1. The above-mentioned character 3 is the first object, and the user operation performed on the selection box 123 is the operation of selecting the first object. FrameID can be used to reflect which frame image the frame image is to determine the protagonist.
终端100显示第i帧图像的时间是较短的。在显示第i帧图像的时间内，终端100不一定能检测到用户选择某一对象作为主角的操作。同时，在显示第i帧图像之后，终端100需要继续显示摄像头采集的第i帧之后的图像帧。因此，如果在上述显示期间内，终端100未检测到作用于任一选择框的用户操作，则终端100可执行i=i+1的操作，获取第i帧图像的下一帧图像，并重复S602。这样，终端100可以实时地识别摄像头采集到的对象，并显示与之对应的标记，以供用户在任意时刻选定一个主角。The time for which the terminal 100 displays the i-th frame image is short. Within that time, the terminal 100 may not necessarily detect a user operation of selecting an object as the protagonist. Meanwhile, after displaying the i-th frame image, the terminal 100 needs to continue displaying the image frames collected by the camera after the i-th frame. Therefore, if the terminal 100 does not detect a user operation on any selection box during the above display period, the terminal 100 can perform the operation i=i+1, obtain the next frame image after the i-th frame image, and repeat S602. In this way, the terminal 100 can identify the objects captured by the camera in real time and display the corresponding marks, so that the user can select a protagonist at any time.
S605:确定以主角为中心的特写图像。S605: Determine the close-up image centered on the protagonist.
上述特写图像是指在摄像头采集的原始图像(预览窗中显示的图像)的基础上,以选定的主角为中心进行裁剪,得到的图像内容为主角的图像。The above close-up image refers to an image in which the main image is cropped with the selected protagonist as the center based on the original image collected by the camera (the image displayed in the preview window), and the resulting image content is the protagonist.
在确定第i帧图像中第一对象为主角之后,终端100可基于第i帧图像确定第i帧图对应的以主角为中心的特写图像。例如,在用户界面403中,在确定人物3为主角之后,终端100可以以人物3为中心对当前预览窗113中显示的图像进行裁剪,得到图像内容为人物3的特写图像。After determining that the first object in the i-th frame image is the protagonist, the terminal 100 may determine a close-up image centered on the protagonist corresponding to the i-th frame image based on the i-th frame image. For example, in the user interface 403, after determining that the character 3 is the protagonist, the terminal 100 can crop the image currently displayed in the preview window 113 with the character 3 as the center to obtain a close-up image of the character 3 as the image content.
图8A示例性示出了终端100确定主角为中心的特写图像的流程图。FIG. 8A exemplarily shows a flowchart in which the terminal 100 determines a close-up image centered on a protagonist.
S801:根据第i帧图像中主角的人体图像确定第i帧图像帧的缩放比ZoomRatio。S801: Determine the zoom ratio ZoomRatio of the i-th image frame based on the human body image of the protagonist in the i-th frame image.
如果选定的主角在距离摄像头较远,则主角图像在整个原始图像中所占的图像面积越小。这时,以主角为中心的特写图像的尺寸越小。反之,主角图像在整个原始图像中所占的图像面积越大,则主角为中心的特写图像的尺寸越大。If the selected protagonist is farther away from the camera, the protagonist image occupies a smaller image area in the entire original image. At this time, the size of the close-up image centered on the protagonist becomes smaller. On the contrary, the larger the image area occupied by the protagonist image in the entire original image, the larger the size of the close-up image with the protagonist as the center.
具体的,参考图8B所示的第i帧图像,如果主角为人物1,则小窗中期望展示的人物1的特写图像应该为虚线框61围成的图像。这时,虚线框61的尺寸即为第i帧图像中主角的特写图像的尺寸。如果主角为人物3,则小窗中期望展示的人物3的特写图像应该为虚线框62围成的图像。这时,虚线框62的尺寸即为第i帧图像中主角的特写图像的尺寸。由此可知,为了保证主角的完整性,终端100需要根据原始图像中的主角的大小确定特写图像的尺寸。Specifically, referring to the i-th frame image shown in FIG. 8B, if the protagonist is character 1, the close-up image of character 1 expected to be displayed in the small window should be an image surrounded by a dotted frame 61. At this time, the size of the dotted frame 61 is the size of the close-up image of the protagonist in the i-th frame image. If the protagonist is character 3, the close-up image of character 3 expected to be displayed in the small window should be an image surrounded by a dotted frame 62. At this time, the size of the dotted frame 62 is the size of the close-up image of the protagonist in the i-th frame image. It can be seen from this that in order to ensure the integrity of the protagonist, the terminal 100 needs to determine the size of the close-up image according to the size of the protagonist in the original image.
缩放比ZoomRatio可用于反映原始图像中的主角的大小。在确定ZoomRatio之后,终端100可确定当前帧中主角的特写图像的尺寸。The zoom ratio ZoomRatio can be used to reflect the size of the main character in the original image. After determining the ZoomRatio, the terminal 100 may determine the size of the close-up image of the protagonist in the current frame.
具体的,终端100确定ZoomRatio的计算过程如下:Specifically, the calculation process for terminal 100 to determine ZoomRatio is as follows:
首先,在S602所示的对象识别步骤中,终端100可利用预设的人体识别算法识别图像中的人体图像,例如body1、body2、body3等。在确定人物3为主角之后,终端100可利用人物3的人体图像(body3)的尺寸确定ZoomRatio。First, in the object recognition step shown in S602, the terminal 100 can use a preset human body recognition algorithm to recognize human body images in the image, such as body1, body2, body3, etc. After determining that character 3 is the protagonist, the terminal 100 may determine the ZoomRatio using the size of the human body image (body3) of character 3.
其中,利用人体图像确定ZoomRatio的计算公式(Q1)如下:
Among them, the calculation formula (Q1) for determining ZoomRatio using human body images is as follows:
其中,maxBboxSize是指识别到的最大的人体图像的尺寸;detectBboxSize是指主角的人体图像的尺寸;minZoomRatio是预设的ZoomRatio的最小值;maxZoomRatio为预设的ZoomRatio的最大值。Among them, maxBboxSize refers to the size of the largest recognized human body image; detectBboxSize refers to the size of the protagonist's human body image; minZoomRatio is the minimum value of the preset ZoomRatio; maxZoomRatio is the maximum value of the preset ZoomRatio.
将第i帧图像中的maxBboxSize[i]和detectBboxSize[i]输入Q1,终端100可确定第i帧图像的ZoomRatio[i]。 By inputting maxBboxSize[i] and detectBboxSize[i] in the i-th frame image into Q1, the terminal 100 can determine the ZoomRatio[i] of the i-th frame image.
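Formula Q1 appears as an image in the source and is not reproduced above. A hedged Python reconstruction, consistent only with the surrounding description (the zoom ratio grows as the protagonist's body box shrinks relative to the largest detected body, and is clamped between minZoomRatio and maxZoomRatio), is given below; the exact formula and the bound values 1.0 and 2.0 are assumptions:

```python
def zoom_ratio(max_bbox_size, detect_bbox_size,
               min_zoom_ratio=1.0, max_zoom_ratio=2.0):
    """Hypothetical reconstruction of Q1 (the original is an image in the
    source).  A smaller protagonist body box relative to the largest
    detected body yields a larger zoom ratio, clamped to the preset
    [minZoomRatio, maxZoomRatio] range."""
    ratio = max_bbox_size / detect_bbox_size
    return max(min_zoom_ratio, min(max_zoom_ratio, ratio))
```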
S802:根据ZoomRatio[i]确定第i帧图像对应的主角特写图像的尺寸:CropRagionWidth、CropRagionHeight。S802: Determine the size of the close-up image of the protagonist corresponding to the i-th frame image according to ZoomRatio[i]: CropRagionWidth, CropRagionHeight.
CropRagionWidth用于表示特写图像的宽,CropRagionHeight用于表示特写图像的高。CropRagionWidth、CropRagionHeight可基于前述介绍的ZoomRatio来确定。具体的,CropRagionWidth、CropRagionHeight的计算公式(Q2、Q3)如下:

CropRagionWidth is used to represent the width of the close-up image, and CropRagionHeight is used to represent the height of the close-up image. CropRagionWidth and CropRagionHeight can be determined based on the ZoomRatio introduced above. Specifically, the calculation formulas (Q2, Q3) of CropRagionWidth and CropRagionHeight are as follows:

其中,WinWidth用于表示小窗的宽;WinHeight用于表示小窗的高。基于WinWidth、WinHeight和ZoomRatio得到的CropRagionWidth、CropRagionHeight刚好可以与小窗的宽、高对应,从而避免在小窗中显示特写图像时发生图像变形的问题。优选的,当小窗为竖窗时,WinWidth的取值可以为1080p(pixel),WinHeight的取值可以为1920p。当小窗为横窗时,WinWidth的取值可以为1920p,WinHeight的取值可以为1080p。Among them, WinWidth is used to represent the width of the small window; WinHeight is used to represent the height of the small window. The CropRagionWidth and CropRagionHeight obtained based on WinWidth, WinHeight and ZoomRatio can just correspond to the width and height of the small window, thus avoiding the problem of image deformation when displaying close-up images in the small window. Preferably, when the small window is a vertical window, the value of WinWidth may be 1080p (pixel), and the value of WinHeight may be 1920p. When the small window is a horizontal window, the value of WinWidth can be 1920p and the value of WinHeight can be 1080p.
S803:根据CropRagionWidth、CropRagionHeight以及对象中点对第i帧图像进行裁剪，确定第i帧图像对应的以主角为中心的特写图像。S803: Crop the i-th frame image according to CropRagionWidth, CropRagionHeight and the object midpoint, and determine the close-up image centered on the protagonist corresponding to the i-th frame image.
在确定CropRagionWidth、CropRagionHeight之后,结合已知的主角的人物中点(P3),终端100可对原始图像进行裁剪得到以主角为中心的特写图像。参考图8B,以P3为中心,宽和高分别为CropRagionWidth、CropRagionHeight所构成的区域内的图像即主角(人物3)的特写图像。After determining CropRagionWidth and CropRagionHeight, combined with the known character midpoint (P3) of the protagonist, the terminal 100 can crop the original image to obtain a close-up image centered on the protagonist. Referring to FIG. 8B , the image in the area formed by P3 as the center, the width and height being CropRagionWidth and CropRagionHeight respectively, is a close-up image of the protagonist (Character 3).
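Formulas Q2 and Q3 also appear as images in the source. The sketch below combines S802 and S803 under the assumption that the crop size equals the window size divided by ZoomRatio, which keeps the window's aspect ratio as the text requires and matches the later 540p/960p-to-1080p/1920p example; the clamping of the crop region to the frame boundary is likewise an assumption:

```python
def crop_region(p3, zoom_ratio, win_w, win_h, img_w, img_h):
    """Assumed reading of Q2/Q3 (the originals are formula images):
    CropRagionWidth = WinWidth / ZoomRatio and
    CropRagionHeight = WinHeight / ZoomRatio, so a larger zoom ratio
    gives a tighter crop with the window's aspect ratio."""
    crop_w = win_w / zoom_ratio
    crop_h = win_h / zoom_ratio
    # Center the crop on the protagonist midpoint P3, then clamp so the
    # region stays inside the original frame.
    x = min(max(p3[0] - crop_w / 2, 0), img_w - crop_w)
    y = min(max(p3[1] - crop_h / 2, 0), img_h - crop_h)
    return (x, y, crop_w, crop_h)
```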
S606:在小窗中显示上述特写图像,生成特写视频的一帧。S606: Display the above close-up image in a small window to generate a frame of the close-up video.
在检测到确定第i帧图像中第一对象为主角的用户操作之后,终端100可生成一个用于展示特写图像的小窗。优选的,该小窗可以以画中画的形式嵌入在预览窗中。After detecting a user operation that determines the first object in the i-th frame image as the protagonist, the terminal 100 may generate a small window for displaying a close-up image. Preferably, the small window can be embedded in the preview window in a picture-in-picture form.
参考图1D所示的用户界面104，小窗141以画中画的形式嵌入在预览窗113中。优选的，上述小窗可以为宽高比为9:16（竖窗）或16:9（横窗）的矩形。当然，在其他实施例中，预览窗与小窗还可以以其他方式排列，小窗还可以呈现其他尺寸和形状。例如，在检测到确定主角的操作之后，终端100可将预览窗113分为左右两个并排排列的窗口。一个窗口用于展示摄像头实时采集的原始图像；另一个窗口用于展示以主角为中心的特写图像。本申请实施例对用于展示特写图像的窗口的具体形式不作限制。Referring to the user interface 104 shown in FIG. 1D, the small window 141 is embedded in the preview window 113 in the form of a picture-in-picture. Preferably, the small window may be a rectangle with an aspect ratio of 9:16 (vertical window) or 16:9 (horizontal window). Of course, in other embodiments, the preview window and the small window may also be arranged in other ways, and the small window may also have other sizes and shapes. For example, after detecting the operation of determining the protagonist, the terminal 100 may divide the preview window 113 into two windows arranged side by side. One window is used to display the original image collected by the camera in real time; the other window is used to display the close-up image centered on the protagonist. The embodiments of this application do not limit the specific form of the window used to display the close-up image.
在经过S605的处理之后,终端100可确定第i帧图对应的以主角为中心的特写图像。这时,终端100可在上述小窗中显示以主角为中心的特写图像。After the processing of S605, the terminal 100 may determine the close-up image centered on the protagonist corresponding to the i-th frame. At this time, the terminal 100 may display a close-up image centered on the protagonist in the above-mentioned small window.
在一些示例中，特写图像的宽CropRagionWidth、高CropRagionHeight与用于展示特写图像的小窗的宽WinWidth、高WinHeight分别相等，参考图8C。例如，CropRagionWidth=1080p、CropRagionHeight=1920p，同时，WinWidth=1080p、WinHeight=1920p。这时，按1080p、1920p裁剪得到的特写图像刚好与小窗匹配，终端100可以直接将上述特写图像显示在小窗中。In some examples, the width CropRagionWidth and height CropRagionHeight of the close-up image are respectively equal to the width WinWidth and height WinHeight of the small window used to display the close-up image; refer to FIG. 8C. For example, CropRagionWidth=1080p and CropRagionHeight=1920p, while WinWidth=1080p and WinHeight=1920p. At this time, the close-up image cropped at 1080p by 1920p exactly matches the small window, and the terminal 100 can directly display the close-up image in the small window.
但是，在另一些示例中，特写图像的CropRagionWidth、CropRagionHeight与WinWidth、WinHeight不相等。这时，终端100可以对特写图像进行自适应调整，以获得与小窗尺寸匹配的特写图像，然后在小窗中显示上述特写图像。上述自适应调整包括等比例扩大和等比例缩小。例如，参考图8D，CropRagionWidth=540p、CropRagionHeight=960p，而用于展示特写图像的小窗的WinWidth=1080p、WinHeight=1920p。这时，终端100可以对540p、960p的特写图像进行等比例扩大，从而得到1080p、1920p的特写图像。这样，终端100也可以将上述特写图像显示在小窗中。However, in other examples, the CropRagionWidth and CropRagionHeight of the close-up image are not equal to WinWidth and WinHeight. At this time, the terminal 100 can adaptively adjust the close-up image to obtain a close-up image that matches the size of the small window, and then display the close-up image in the small window. The above adaptive adjustment includes proportional enlargement and proportional reduction. For example, referring to FIG. 8D, CropRagionWidth=540p and CropRagionHeight=960p, while the small window used to display the close-up image has WinWidth=1080p and WinHeight=1920p. At this time, the terminal 100 can proportionally enlarge the 540p by 960p close-up image to obtain a 1080p by 1920p close-up image. In this way, the terminal 100 can also display the above close-up image in the small window.
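The proportional (aspect-preserving) adjustment described above can be sketched as follows. It assumes the crop and the window already share the same aspect ratio, as the text states, so a single uniform scale factor suffices:

```python
def fit_to_window(crop_w, crop_h, win_w, win_h):
    """Uniformly scale the cropped close-up to fill the display window.
    Because crop and window share an aspect ratio, one factor is enough;
    e.g. a 540x960 crop scales by 2 to a 1080x1920 window."""
    scale = win_w / crop_w
    return (round(crop_w * scale), round(crop_h * scale))
```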
在录制视频的过程中,上述经过自适应调整处理的,送往小窗显示的特写图像即特写视频的一帧。During the video recording process, the above-mentioned close-up image that has undergone adaptive adjustment processing and is sent to the small window for display is one frame of the close-up video.
S607:获取FrameID=1之后的第j帧图像,FrameID=1+j,确定第j帧图像中的主角。S607: Obtain the jth frame image after FrameID=1, FrameID=1+j, and determine the protagonist in the jth frame image.
参考S604中的介绍，在检测到用户作用在第i帧图像上的确定第一对象为主角的操作之后，终端100可以将第i帧图像的FrameID设置为1，以表示该帧为确定主角的第一帧图像。Referring to the introduction in S604, after detecting the user operation on the i-th frame image that determines the first object as the protagonist, the terminal 100 can set the FrameID of the i-th frame image to 1 to indicate that this frame is the first frame image in which the protagonist is determined.
在显示FrameID=1中主角的特写图像时,终端100还可同时获取FrameID=1之后的摄像头采集的图像帧。在接收到FrameID=1之后的图像帧之后,终端100可以识别FrameID=1之后的图像帧中的对象,确定该帧图像是否包括主角。When displaying the close-up image of the protagonist in FrameID=1, the terminal 100 can also simultaneously obtain the image frames collected by the camera after FrameID=1. After receiving the image frame after FrameID=1, the terminal 100 can identify the object in the image frame after FrameID=1 and determine whether the image frame includes the protagonist.
下面以FrameID=1之后的第j帧图像帧(FrameID=1+j)为例,具体介绍终端100定位FrameID=1之后的图像帧中的主角的方法。Taking the j-th image frame after FrameID=1 (FrameID=1+j) as an example, the method for the terminal 100 to locate the protagonist in the image frame after FrameID=1 is specifically introduced below.
方法一:method one:
在一些示例中,在获取到第j帧图像之后,终端100可以利用人体识别算法首先识别第j帧图像中包括的对象,参考S602。然后,终端100可利用相似度算法,计算上述各个对象与第j-1帧图像中主角的相似度,进而确定上述各个对象与第j-1帧图像中主角的相似距离(相似距离=1-相似度)。相似距离越小则表示该对象与主角的区别越小,即越相似。因此,第j帧图像中相似距离最小且低于相似距离阈值的对象可被确定为主角。In some examples, after acquiring the j-th frame image, the terminal 100 may use a human body recognition algorithm to first identify objects included in the j-th frame image, refer to S602. Then, the terminal 100 can use the similarity algorithm to calculate the similarity between the above-mentioned objects and the protagonist in the j-1th frame image, and then determine the similarity distance between each of the above-mentioned objects and the protagonist in the j-1th frame image (similarity distance = 1- similarity). The smaller the similarity distance, the smaller the difference between the object and the protagonist, that is, the more similar the object is. Therefore, the object in the jth frame image with the smallest similarity distance and lower than the similarity distance threshold can be determined as the protagonist.
当然,终端100也可直接使用相似度确定第j帧图像中的主角。这时,第j帧图像中相似度最高且高于相似度阈值的对象可被确定为主角。Of course, the terminal 100 can also directly use the similarity to determine the protagonist in the j-th frame image. At this time, the object with the highest similarity and higher than the similarity threshold in the j-th frame image can be determined as the protagonist.
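The selection rule of method one (similarity distance = 1 - similarity; pick the candidate with the smallest distance, provided it stays below a threshold) can be sketched as follows. The threshold value 0.5 and the list-based interface are illustrative assumptions:

```python
def locate_protagonist(objects, similarities, dist_threshold=0.5):
    """Return the index of the object most similar to the previous
    frame's protagonist, using similarity distance = 1 - similarity.
    Returns None when even the best candidate exceeds the (hypothetical)
    distance threshold, i.e. the protagonist is not in this frame."""
    distances = [1 - s for s in similarities]
    best = min(range(len(objects)), key=lambda k: distances[k])
    return best if distances[best] < dist_threshold else None
```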
然而，每次都计算第j帧图像中各个对象与第j-1帧图像中主角的相似度是十分耗费计算资源的。而且，如果第j帧图像中两个或两个以上对象的图像内容重叠，则上述对象与第j-1帧图像中主角的相似度会受到影响，从而影响主角识别结果的准确性。However, calculating the similarity between each object in the j-th frame image and the protagonist in the (j-1)-th frame image every time is very computationally expensive. Moreover, if the image contents of two or more objects in the j-th frame image overlap, the similarity between those objects and the protagonist in the (j-1)-th frame image will be affected, thus reducing the accuracy of the protagonist recognition result.
方法二:Method Two:
针对方法一所存在的缺陷,终端100也可以使用方法二定位第j帧图像中的主角。图9示例性示出了第二种定位第j帧图像中的主角的流程图。In view of the shortcomings of method one, the terminal 100 can also use method two to locate the protagonist in the j-th frame image. Figure 9 exemplarily shows a second flow chart for locating the protagonist in the j-th frame image.
首先,终端100仍然需要先对第j帧图像进行对象识别,确定第j帧图像中包括的对象。然后,终端100可判断该第j帧图像帧中的对象是否重叠,然后根据第j帧图像中的对象是否重叠,确定使用不同的主角定位方法确定第j帧图像中的主角。First, the terminal 100 still needs to perform object recognition on the j-th frame image to determine the objects included in the j-th frame image. Then, the terminal 100 may determine whether the objects in the j-th image frame overlap, and then determine to use different protagonist positioning methods to determine the protagonist in the j-th image frame based on whether the objects in the j-th image frame overlap.
当第j帧图像帧中的对象不存在重叠的情况时,终端100可通过第j帧图像中所有对象 的与第j-1中主角的交并比距离(IoU距离,记为[IoU])来确定第j帧图像的主角。反之,当重叠时,终端100可通过第j帧图像中所有对象的与第j-1中主角的IoU距离和重识别距离(ReID距离,记为[ReID])来确定第j帧图像的主角。When the objects in the j-th image frame do not overlap, the terminal 100 can pass all objects in the j-th image frame The intersection-union ratio distance (IoU distance, recorded as [IoU]) with the protagonist in j-1 is used to determine the protagonist of the j-th frame image. On the contrary, when overlapping, the terminal 100 can determine the protagonist of the j-th frame image through the IoU distance and re-identification distance (ReID distance, denoted as [ReID]) of all objects in the j-th frame image from the protagonist in j-1 .
Referring to the introduction of S602, when determining the objects included in the j-th frame, the terminal 100 may obtain the human body ranges (i.e., human body frames) of the multiple objects recognized by the human body detection algorithm. The terminal 100 may then judge whether the objects in the j-th frame image overlap by checking whether these human body frames intersect. As shown in FIG. 10A, if no two objects in the j-th frame image overlap (the human body frames of any two objects do not intersect), for example person 3 and person 4, the terminal 100 determines that the objects in the j-th frame image do not overlap. As shown in FIG. 10B, if at least two objects in the j-th frame image overlap, for example person 3 and person 4, then the objects in the j-th frame image overlap.
In the non-overlapping case, the terminal 100 may determine the protagonist of the j-th frame image using the IoU distances between all objects in the j-th frame image and the protagonist in the (j-1)-th frame.
This is because, during video recording, the time interval between two consecutive frames is very short. Taking a frame rate of 30 fps as an example, the interval between two consecutive frames is 1/30 s. Within such an interval, it is unlikely for the same object to exhibit a large IoU distance across the two consecutive frames. Therefore, in the non-overlapping case, the terminal 100 may first determine the IoU distances between all objects in the j-th frame image and the protagonist in the (j-1)-th frame, and then determine the minimum IoU distance [IoU]min in the j-th frame image.
Specifically, referring to FIG. 10C, dotted box 3 may represent the human body frame of the protagonist in the (j-1)-th frame; dotted box 1' may represent the human body frame of person 1 in the j-th frame; dotted box 2' that of person 2 in the j-th frame; dotted box 3' that of person 3 in the j-th frame; and dotted box 4' that of person 4 in the j-th frame.
Taking dotted box 3 and dotted box 1' as an example, using the region formed by these two dotted boxes, the terminal 100 can determine the intersection-over-union of dotted box 3 and dotted box 1', denoted IoU31. The terminal 100 can then determine the IoU distance [IoU31] between person 1 in the j-th frame and the protagonist in the (j-1)-th frame:
[IoU31] = 1 - IoU31;
Similarly, the terminal 100 can obtain the IoU distances between persons 2, 3, and 4 in the j-th frame and the protagonist in the (j-1)-th frame: [IoU32], [IoU33], [IoU34]. Referring to FIG. 10C, the terminal 100 can determine that [IoU]min in the j-th frame image is [IoU33]. A smaller IoU distance indicates a smaller positional change between the two boxes. Given the short interval between adjacent frames, the protagonist can hardly undergo a large displacement from one frame to the next; therefore, the smaller an object's IoU distance, the more likely that object is the protagonist.
However, the object in the j-th frame image with the smallest IoU distance to the protagonist in the (j-1)-th frame is not necessarily the protagonist. For example, referring to FIG. 10D, the IoU distances between persons 1, 2, and 4 in the j-th frame image and the protagonist in the (j-1)-th frame are all 1, while the IoU distance between person 3 in the j-th frame image and that protagonist is 0.9. Person 3 thus has the smallest IoU distance to the protagonist, yet that distance is in fact still very large. Therefore, if the object with the smallest IoU distance (person 3) were directly determined to be the protagonist, the protagonist would easily be misidentified, automatic tracking of the protagonist would fail, and the user experience would suffer.
Therefore, after determining [IoU]min in the j-th frame image, the terminal 100 further needs to determine whether [IoU]min is less than a preset IoU distance threshold (denoted D1).
If [IoU]min < D1, the terminal 100 may determine that the object corresponding to [IoU]min is the protagonist. Exemplarily, D1 = 0.2. Combined with the minimum [IoU] distance determined in FIG. 10C, namely [IoU33]: when [IoU33] < 0.2, the terminal 100 can determine that person 3 in the j-th frame image and the protagonist in the (j-1)-th frame image are the same object, i.e., that person 3 in the j-th frame image is the protagonist.
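The IoU-distance matching of the non-overlapping case can be sketched as follows. This is an illustrative sketch, not the embodiments' implementation; boxes are assumed to be axis-aligned (x1, y1, x2, y2) tuples:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def match_protagonist_by_iou(prev_box, boxes, d1=0.2):
    """Non-overlapping case: find [IoU]min among the objects of frame j
    and accept the match only when [IoU]min < D1 (D1 = 0.2 per the
    example above). `prev_box` is the protagonist's body frame in
    frame j-1; `boxes` maps object id -> body frame in frame j.
    Returns the matched id, or None (protagonist lost in this frame)."""
    dists = {oid: 1.0 - iou(prev_box, b) for oid, b in boxes.items()}
    best = min(dists, key=dists.get)  # object achieving [IoU]min
    return best if dists[best] < d1 else None
```

For a protagonist box that shifts by one pixel between frames, the IoU distance stays well under D1 and the match is accepted; an object whose box no longer intersects the previous protagonist box has IoU distance 1 and can never be accepted.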
If [IoU]min < D1 does not hold, the terminal 100 may mark the protagonist as lost in that frame image (no protagonist matched). The terminal 100 may then determine whether to terminate protagonist tracking based on the currently accumulated number of image frames in which the protagonist is lost; this is not expanded on here.
The IoU distance is one optional indicator for judging how similar each object in the later frame is to the protagonist in the earlier frame. Of course, the terminal 100 may also choose other indicators. For example, the terminal 100 may directly use the IoU to measure how similar each object in the j-th frame image is to the protagonist in the (j-1)-th frame. In that case, the object in the j-th frame image whose IoU with the protagonist in the (j-1)-th frame is the largest and greater than an IoU threshold can be confirmed as the protagonist.
In the overlapping case, the terminal 100 may determine the protagonist of the j-th frame image using both the IoU distances and the ReID distances between all objects in the j-th frame image and the protagonist in the (j-1)-th frame.
If the objects in the j-th frame image overlap, the overlapping objects are close to one another. Within the interval between two adjacent frames, it is then easy for one object to move from its own position in the previous frame to the position of another object. Therefore, in overlapping scenes, a non-protagonist object is very likely to move, in the next frame, to the position originally occupied by the protagonist. In that situation, the terminal 100 cannot determine the protagonist of the j-th frame image from the IoU distances alone.
For example, an object that overlaps the protagonist in the (j-1)-th frame image may, in the j-th frame image, appear at the position the protagonist occupied in the (j-1)-th frame image. That object then has the smallest IoU distance to the protagonist, yet it is not the protagonist. This easily leads to misidentification.
Therefore, in the overlapping case, in addition to using the IoU distances of the objects across the two frames to determine the protagonist, the terminal 100 also needs to judge whether the object at each position is the protagonist originally determined by the user. To do so, the terminal 100 further calculates the ReID distance between every object in the j-th frame image and the protagonist in the (j-1)-th frame image. The ReID distance is a parameter, obtained using a neural network, that reflects the degree of similarity between image contents.
FIG. 11 exemplarily shows how the terminal 100 determines the ReID distance between each object in the j-th frame image and the protagonist in the (j-1)-th frame image. As shown in FIG. 11, using a convolutional neural network (CNN), the terminal 100 can determine the feature vector F0 of the protagonist in the (j-1)-th frame image. Likewise, using the CNN, the terminal 100 can determine the feature vectors F1-F4 of the objects (persons 1-4) in the j-th frame image. The terminal 100 can then calculate the inner product of each object's feature vector (F1-F4) with the protagonist's feature vector F0: <F0,F1>, <F0,F2>, <F0,F3>, <F0,F4>.
Taking person 1 as an example, after determining the inner product <F0,F1> of the feature vector F1 and the feature vector F0 of the protagonist in the (j-1)-th frame image, the terminal 100 can determine the ReID distance between person 1 and the protagonist (denoted [ReID]31):
[ReID]31 = 1 - <F0,F1>;
Similarly, the terminal 100 can obtain the ReID distances between persons 2, 3, and 4 in the j-th frame and the protagonist in the (j-1)-th frame: [ReID]32, [ReID]33, [ReID]34. A smaller ReID distance indicates a higher similarity between the object and the protagonist. After determining the ReID distance between each object in the j-th frame and the protagonist in the (j-1)-th frame, the terminal 100 can determine the minimum ReID distance [ReID]min in the j-th frame image. Referring to FIG. 11, the terminal 100 can determine that [ReID]min in the j-th frame image is [ReID]33.
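The formula [ReID] = 1 - <F0,F1> can be sketched as below. Note an assumption not stated in the embodiments: for the inner product to behave as a bounded distance, the CNN feature vectors are taken to be L2-normalized (the sketch normalizes them explicitly):

```python
import math

def reid_distance(f0, f1):
    """ReID distance = 1 - <F0, F1> between two CNN feature vectors.
    Assumption (not specified in the embodiments): the vectors are
    L2-normalized before the inner product, so the result lies in
    [0, 2] and identical features give distance 0."""
    n0 = math.sqrt(sum(x * x for x in f0))
    n1 = math.sqrt(sum(x * x for x in f1))
    dot = sum(a * b for a, b in zip(f0, f1))
    return 1.0 - dot / (n0 * n1)
```

Identical feature vectors yield a ReID distance of 0 (maximum similarity), while orthogonal feature vectors yield a distance of 1.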
Then, the terminal 100 can determine the IoU+ReID distance between each object and the protagonist, i.e., the sum of the IoU distance and the ReID distance, denoted [IoU+ReID]. A smaller [IoU+ReID] means both a smaller IoU distance and a smaller ReID distance between the object and the protagonist; in image terms, the object is close to the original protagonist's position and its image content is similar. Therefore, the terminal 100 can determine the protagonist of the j-th frame image using [IoU+ReID], and the smaller an object's [IoU+ReID], the more likely that object is the protagonist.
Likewise, in the overlapping case, the object in the j-th frame image with the smallest [IoU+ReID] with respect to the protagonist in the (j-1)-th frame is not necessarily the protagonist. Therefore, after determining [IoU+ReID]min in the j-th frame image, the terminal 100 further needs to determine whether [IoU+ReID]min is less than a preset IoU+ReID distance threshold (denoted D2). If [IoU+ReID]min < D2, the terminal 100 may determine that the object corresponding to [IoU+ReID]min is the protagonist. Conversely, if [IoU+ReID]min < D2 does not hold, the terminal 100 may mark the protagonist as lost in that frame image.
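The decision rule for the overlapping case can be sketched as follows. The IoU and ReID distances are assumed to be precomputed per object, and the value chosen for D2 is an illustrative placeholder, since the embodiments do not specify it:

```python
def match_protagonist_overlapping(iou_dists, reid_dists, d2=0.7):
    """Overlapping case: pick the object minimizing [IoU+ReID], the sum
    of its IoU distance and ReID distance to the protagonist of frame
    j-1, and accept it only when [IoU+ReID]min < D2. The default D2 is
    an assumed placeholder value. Both dicts map object id -> distance.
    Returns the matched id, or None (protagonist lost in this frame)."""
    combined = {oid: iou_dists[oid] + reid_dists[oid] for oid in iou_dists}
    best = min(combined, key=combined.get)  # object achieving [IoU+ReID]min
    return best if combined[best] < d2 else None
```

An object that merely occupies the protagonist's old position (small IoU distance, large ReID distance) is thus penalized by its appearance term, which is exactly the misidentification the combined metric is meant to avoid.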
In some embodiments, to improve computational efficiency, the terminal 100 may also execute the method of locating the protagonist in the j-th frame image shown in FIG. 9 periodically; refer to FIG. 12A.
As shown in FIG. 12A, after acquiring the j-th frame image following FrameID=1, the terminal 100 may first determine whether the frame index FrameID of this image frame is divisible by N (the embodiments of this application take N=4 as an example).
When FrameID % 4 = 0 holds, i.e., FrameID is divisible by 4, the terminal 100 may determine the protagonist in the j-th frame image using the method introduced in FIG. 9. Conversely, when FrameID % 4 = 0 does not hold, i.e., FrameID is not divisible by 4, the terminal 100 may determine the protagonist in the j-th frame image using a kernelized correlation filter (KCF) tracking algorithm. The KCF algorithm is an existing algorithm and is not described further here.
In this way, the terminal 100 can avoid calculating the IoU distance and ReID distance for every frame, thereby saving computing resources and improving computational efficiency.
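The periodic dispatch of FIG. 12A can be sketched as follows; `full_match` and `kcf_track` are placeholder callables standing in for the FIG. 9 matching method and the KCF tracker respectively:

```python
def locate_protagonist(frame_id, full_match, kcf_track, n=4):
    """FIG. 12A dispatch sketch: run the full IoU/ReID matching of
    FIG. 9 only on frames whose FrameID is divisible by N (N=4 in the
    embodiments), and fall back to lightweight KCF tracking on all
    other frames."""
    if frame_id % n == 0:
        return full_match()
    return kcf_track()
```

Over any run of N consecutive frames, the expensive matching therefore executes exactly once, which is the stated source of the computational saving.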
S608: Confirm whether the protagonist is matched. If so, perform the step shown in S605; if not, determine whether the number of lost frames is less than the lost-frame threshold Y.
After determining the protagonist in the j-th frame image, the terminal 100 may perform two operations. First, it performs the step shown in S605: based on the protagonist in the j-th frame image, it determines a close-up image centered on that protagonist and displays the close-up image in the small window, generating one frame of the close-up video. Second, it acquires the next frame image after the j-th frame image (j = j+1) and repeats the step shown in S607, thereby determining the protagonist in the next frame image, displaying a close-up image of that protagonist, and generating a further frame of the close-up video.
For the method of determining the close-up image of the protagonist in the j-th frame image, refer to the introduction of S605, which is not repeated here. In particular, in some examples, the terminal 100 may calculate the ZoomRatio once every few frames, for example once every 4 frames.
This is because, within a period of 4 consecutive frames (taking 4 frames as an example), an object in the image can hardly change significantly, so the ZoomRatio values corresponding to these 4 frames are almost identical. Therefore, after a ZoomRatio is determined at the k-th frame, the (k+1)-th, (k+2)-th, and (k+3)-th frames can reuse that ZoomRatio, reducing the frequency of ZoomRatio calculation and saving computing resources.
In some examples, when the ZoomRatio changes significantly between two calculations, the terminal 100 may apply smoothing when determining the close-up image, thereby avoiding visual jumps in the close-up.
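The reuse-and-smooth behavior described above can be sketched as follows. The embodiments do not specify a smoothing formula, so the exponential blend below (and the `alpha` parameter) is purely an assumed illustration of one way to limit jumps between successive ZoomRatio values:

```python
def smoothed_zoom_ratios(raw_ratios, period=4, alpha=0.5):
    """Sketch of ZoomRatio reuse and smoothing (illustrative only).
    A new ZoomRatio is taken from `raw_ratios` once every `period`
    frames and reused for the frames in between; when recomputed, the
    new value is alpha-blended with the previous one (an assumed
    smoothing scheme) so the close-up does not jump abruptly."""
    out, current = [], None
    for k, raw in enumerate(raw_ratios):
        if k % period == 0:  # recompute every `period` frames
            if current is None:
                current = raw
            else:  # blend with the previous value to limit the jump
                current = alpha * raw + (1 - alpha) * current
        out.append(current)  # intermediate frames reuse the value
    return out
```

For a raw ratio that steps from 1.0 to 2.0 at the recomputation boundary, the displayed ratio moves only halfway (to 1.5) under this blend, rather than jumping the full step at once.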
If no protagonist is matched in the j-th frame image (for example, [IoU]min ≥ D1 for the j-th frame image, or [IoU+ReID]min ≥ D2 for the j-th frame image), the terminal 100 may update the lost-frame count by incrementing it by 1. The lost-frame count refers to the number of consecutive image frames in which the terminal 100 has not recognized the protagonist. The terminal 100 may then judge, based on the lost-frame count, whether to end protagonist tracking.
Specifically, the terminal 100 may be configured with a lost-frame threshold Y. If the currently recorded lost-frame count ≥ Y, the terminal 100 may determine that the objects captured by the camera no longer include the protagonist initially selected by the user, and may thus confirm ending protagonist tracking. If the currently recorded lost-frame count < Y, the terminal 100 may acquire the next frame image (the (j+1)-th frame image) and determine whether it includes the protagonist. It can be understood that, when determining whether the next frame image includes the protagonist, the terminal 100 may perform the protagonist tracking calculation shown in FIG. 9 between the next frame image and the last frame image previously matched to the protagonist, to determine whether the protagonist is present in the next frame image. Here, the next frame image is the j-th frame image, and the last frame image matched to the protagonist is the (j-1)-th frame image. When none of the subsequent image frames matches the initially selected protagonist, the lost-frame count keeps increasing until lost-frame count ≥ Y, at which point protagonist tracking ends.
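The lost-frame bookkeeping of S608 can be sketched as follows; the per-frame match results are assumed to come from the matching steps above, and the threshold value Y=5 is an illustrative placeholder:

```python
def track_frames(matches, y=5):
    """S608 lost-frame sketch: `matches` is a per-frame sequence of
    booleans (True = protagonist matched in that frame). The lost-frame
    count increments on each consecutive miss and resets to zero on a
    match; tracking ends once the count reaches the threshold Y (the
    default Y=5 is an assumed value). Returns the index of the frame
    at which tracking ends, or None if tracking never ends."""
    lost = 0
    for i, matched in enumerate(matches):
        if matched:
            lost = 0  # a successful match resets the consecutive count
        else:
            lost += 1
            if lost >= y:
                return i  # lost-frame count >= Y: end protagonist tracking
    return None
```

Because the count tracks consecutive misses only, a brief occlusion followed by a re-match does not accumulate toward the threshold.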
In some embodiments, after confirming the end of protagonist tracking, the terminal 100 may retain the initially set protagonist and continue to locate that protagonist in images subsequently captured by the camera. When the protagonist is detected again, the terminal 100 may resume recording the close-up video of that protagonist.
Combined with the user interfaces shown in FIGS. 1F-1H, when detecting that multiple consecutive frame images do not include the initially set protagonist, person 3, the terminal 100 may close the small window 141 and stop recording the close-up video of person 3. When person 3 is detected again, the terminal 100 may redisplay the small window 141 and restart recording the close-up video of person 3.
In some embodiments, after confirming the end of protagonist tracking, the terminal 100 may also prompt the user to select a new protagonist. Upon again detecting a user operation of selecting a protagonist, the terminal 100 may determine the new protagonist, locate the new protagonist in subsequent images, and simultaneously display and save a close-up video of the new protagonist.
Combined with the user interfaces shown in FIGS. 2A-2B, after determining to end tracking person 3, when a user operation of selecting person 2 as the protagonist is detected again, the terminal 100 may re-determine person 2 as the protagonist, generate the small window 141, and display a close-up image of person 2 in the small window 141.
Combined with the user interfaces shown in FIGS. 2A-2B, in some embodiments, the terminal 100 supports switching the protagonist during shooting. In this case, after acquiring the j-th frame image, the terminal 100 may also determine whether that frame image corresponds to a user operation of switching the protagonist, so as to change the protagonist and change the close-up image displayed in the small window 141.
Referring to FIG. 12B, after the step shown in S607, the terminal 100 may determine whether a user operation of switching the protagonist is detected, for example the user operation shown in FIGS. 2A-2B of tapping the selection box 122 corresponding to person 2 to switch the protagonist from person 3 to person 2. When such a user operation is detected, the terminal 100 may determine that person 2 in the j-th frame image is the protagonist. The terminal 100 may then reset the FrameID of the j-th frame image to 1 (S604), acquire the image frames after that frame, and locate person 2 in the subsequent image frames, thereby displaying close-up images of the new protagonist, person 2, in the small window 141.
Next, FIG. 13 exemplarily shows a flow chart for the terminal 100 to edit an already captured local video, and to generate and save a close-up video.
Referring to the user interface 403 shown in FIG. 4C, the first control may here be the "Extract protagonist" option control in the menu bar 413. The user operation of tapping "Extract protagonist" may be referred to as a user operation acting on the first control.
Then, the terminal 100 may obtain the image frame sequence of the local video and determine the objects included in each image frame. Referring to the user interface 404 shown in FIG. 4D, the terminal 100 may display the image frames of the local video. When displaying these image frames, the terminal 100 also displays marks (selection boxes) corresponding to the objects. The terminal 100 can then determine the protagonist among the multiple objects according to a user operation. Afterwards, the terminal 100 may traverse the subsequent image frames in sequence and determine the protagonist in each of them, thereby obtaining the subsequent close-up images centered on the protagonist and generating the close-up video. This process is the same as the method of determining the protagonist in the real-time shooting process shown in FIGS. 1A-1I and is not repeated here.
Differently, in the method of editing a local video and generating a close-up video shown in FIG. 13, when determining that the j-th frame image does not match the protagonist, the terminal 100 need not record a lost-frame count. Here, the terminal 100 only needs to judge whether the local video has been fully traversed, i.e., whether the j-th frame image is the last frame of the local video. If the video has not ended, i.e., the j-th frame image is not the last frame of the local video, the terminal 100 may continue to acquire the next frame image and locate the protagonist in that next frame image.
In addition, in the method of editing a local video and generating a close-up video shown in FIG. 13, after determining the protagonist and the protagonist's close-up image in a certain frame image, the terminal 100 need not display that close-up image. This is because, during the editing of the local video, the terminal 100 does not need to play the local video, and therefore also does not need to play the close-up video during editing. After the editing is completed, the terminal 100 may save the close-up video, and the user can then browse it at any time.
FIG. 14 is a schematic diagram of the system structure of the terminal 100 provided by an embodiment of this application.
The layered architecture divides the system into several layers, each with a clear role and division of labor, and the layers communicate through software interfaces. In some embodiments, the system is divided into five layers, which from top to bottom are the application layer, the application framework layer, the hardware abstraction layer, the driver layer, and the hardware layer.
The application layer may include a series of application packages. In the embodiments of this application, the application packages may include Camera, Gallery, and the like.
The application framework layer provides an application programming interface (API) and a programming framework for the applications in the application layer, and includes some predefined functions. In the embodiments of this application, the application framework layer may include a camera access interface and a video editing interface. The camera access interface may include camera management and camera devices, and is used to provide an API and programming framework for the camera application. The video editing interface is used to provide an API and programming framework for editing pictures and/or videos for the gallery application. The embodiments of this application mainly use the video-editing API and programming framework provided by the video editing interface.
The hardware abstraction layer is an interface layer located between the application framework layer and the driver layer, and provides a virtual hardware platform for the operating system. In the embodiments of this application, the hardware abstraction layer may include a camera hardware abstraction layer and a camera algorithm library.
The camera hardware abstraction layer may provide the virtual hardware of camera device 1, camera device 2, or further camera devices. The camera algorithm library may include the running code and data implementing the video editing method provided by the embodiments of this application.
The driver layer is the layer between hardware and software, and includes drivers for various hardware. The driver layer may include a camera device driver, a digital signal processor driver, an image processor driver, and the like.
The camera device driver is used to drive the sensor of the camera to capture images and to drive the image signal processor to preprocess the images. The digital signal processor driver is used to drive the digital signal processor to process images. The image processor driver is used to drive the graphics processor to process images.
The video editing method in the embodiments of this application is described in detail below in conjunction with the above system structure:
1. Cropping the original video while recording to generate a close-up video of the protagonist:
In response to a user operation of opening the camera application, for example an operation of tapping the camera application icon, the camera application calls the camera access interface of the application framework layer to start the camera application, and then sends an instruction to start the camera by calling a camera device (camera device 1 and/or another camera device) in the camera hardware abstraction layer. The camera hardware abstraction layer sends this instruction to the camera device driver at the kernel layer. The camera device driver can start the corresponding camera sensor and capture image light signals through the sensor. One camera device in the camera hardware abstraction layer corresponds to one camera sensor in the hardware layer.
然后,摄像头传感器可将采集到的图像光信号传输到图像信号处理器进行预处理,得到图像电信号(原始图像),并将上述原始图像通过相机设备驱动传输至相机硬件抽象层。Then, the camera sensor can transmit the collected image optical signal to the image signal processor for pre-processing to obtain the image electrical signal (original image), and transmit the above-mentioned original image to the camera hardware abstraction layer through the camera device driver.
一方面,相机硬件抽象层可以将上述原始图像发送到显示器,进行显示。On the one hand, the camera hardware abstraction layer can send the above raw image to the display for display.
另一方面,相机硬件抽象层可将原始图像发送的相机算法库。相机算法库中存储有实现本申请实施例提供的视频编辑方法(对象识别、主角追踪、裁剪特写图像等处理流程)的程序代码。基于数字信号处理器、图像处理器,执行上述代码,相机算法库还可输出识别到的图像帧中的对象、确定以主角为中心的特写图像,从而实现定位原始图像中的主角,裁剪以主角为中心的特写图像的功能。The camera hardware abstraction layer, on the other hand, sends raw images to the camera algorithm library. The camera algorithm library stores program codes that implement the video editing methods (processing processes such as object recognition, protagonist tracking, and cropping close-up images) provided by the embodiments of this application. Based on the digital signal processor and image processor, executing the above code, the camera algorithm library can also output the recognized objects in the image frame and determine the close-up image centered on the protagonist, thereby locating the protagonist in the original image and cropping the protagonist. Features centered close-up image.
相机算法库可以将确定的特写图像发送到相机硬件抽象层。然后,相机硬件抽象层可以将其进行送显。这样,相机应用可以在显示原始图像的同时,显示以选定主角为中心的特写图像。The camera algorithm library can send determined close-up images to the camera hardware abstraction layer. The camera hardware abstraction layer can then send it to the display. This allows the camera app to display a close-up image centered on the selected protagonist alongside the original image.
在送显的同时,相机硬件抽象层还可将原始图像序列和特写图像序列写入的特定存储空间。这样,终端100可以实现录制视频的功能,将摄像头实时采集的原始图像流和基于原始图像得到的特写图像流保存为本地视频(原始视频和特写视频)。While sending to display, the camera hardware abstraction layer can also write the original image sequence and the close-up image sequence to a specific storage space. In this way, the terminal 100 can realize the function of recording video, and save the original image stream collected by the camera in real time and the close-up image stream obtained based on the original image as local videos (original video and close-up video).
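The per-frame cropping step described above (determine a close-up rectangle centered on the protagonist, kept inside the original frame) can be sketched as follows. This is a minimal illustration, not the embodiment's actual implementation; the function name and the fixed crop size are assumptions.

```python
def clamp(v, lo, hi):
    """Constrain v to the inclusive range [lo, hi]."""
    return max(lo, min(v, hi))

def close_up_rect(frame_w, frame_h, center_x, center_y, crop_w, crop_h):
    """Compute a crop_w x crop_h rectangle centered on the protagonist's
    position (center_x, center_y), shifted as needed so the rectangle
    stays entirely inside the frame_w x frame_h original frame.
    Returns (x, y, w, h) of the close-up region."""
    x = clamp(center_x - crop_w // 2, 0, frame_w - crop_w)
    y = clamp(center_y - crop_h // 2, 0, frame_h - crop_h)
    return (x, y, crop_w, crop_h)
```

Applied to every original frame, this yields the close-up image stream that is displayed and saved alongside the original stream; when the protagonist nears the frame edge, the rectangle slides rather than leaving the frame.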
2. Cropping a local video to generate a close-up video of the protagonist:

In response to a user operation that extracts the protagonist from a local video, such as tapping "Extract protagonist" as shown in Figure 4C, the camera application calls the image editing interface of the application framework layer, which in turn calls the program code in the camera algorithm library that implements the video editing method provided by the embodiments of this application. By executing this code on the digital signal processor and image processor, the camera algorithm library can locate the protagonist in each original image and crop a close-up image centered on the protagonist, thereby editing the original video into a close-up video.
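The offline flow above (decode the local video, locate the protagonist per frame, emit a cropped close-up) can be sketched as a frame-by-frame pipeline. The `locate` detector, the grid representation of a frame, and the skip-when-absent behavior are illustrative assumptions, not details confirmed by the embodiment:

```python
def crop(frame, cx, cy, w, h):
    """frame is a 2D grid (list of rows); return a w x h sub-grid
    centered on (cx, cy), shifted so it stays inside the frame."""
    fh, fw = len(frame), len(frame[0])
    x = min(max(cx - w // 2, 0), fw - w)
    y = min(max(cy - h // 2, 0), fh - h)
    return [row[x:x + w] for row in frame[y:y + h]]

def extract_close_up_stream(frames, locate, w, h):
    """Yield one close-up per frame of the local video. locate(frame) is
    a hypothetical detector returning the protagonist's (cx, cy) center,
    or None when the protagonist is absent (such frames are skipped)."""
    for frame in frames:
        pos = locate(frame)
        if pos is not None:
            yield crop(frame, pos[0], pos[1], w, h)
```

Encoding the yielded close-ups back into a video file would then produce the close-up video saved next to the original local video.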
In the embodiments of this application, the mark corresponding to each object in an image may also be called a selection box, and the second video may also be called a close-up video.
In some embodiments, the first interface may be the user interface 102 shown in Figure 1B; the first image may be the camera-captured image displayed in the preview window 113 of the user interface 102, for example the image displayed in the preview window 113 in Figure 1B, or the i-th frame collected by the aforementioned camera. Referring to Figure 1C, the first mark may be the selection box 123 corresponding to character 3, the first operation may be an input operation acting on the selection box 123, and the first object may be character 3. The second image may be the image displayed in the preview window of the user interface 201 shown in Figure 2A, the second mark may be the selection box 122 corresponding to character 2 in that image, the second object may be the aforementioned character 2, and the fifth operation may be an input operation acting on the selection box 122. The first sub-video may be the aforementioned close-up video centered on character 3, and the second sub-video may be the aforementioned close-up video centered on character 2. The second control may be the aforementioned control 161, and the first window may be the aforementioned small window 141.
In some embodiments, the first video may also be called a local video. Referring to Figures 4A and 4B, the first video may be the local video corresponding to the aforementioned icon 411, the thumbnail of the first video may be the icon 411 of the local video, and the second operation may be tapping the icon 411. The first interface may be the user interface 404 shown in Figure 4D; the first image may be one frame of the local video, for example the image displayed in the window 420 of the user interface 404 shown in Figure 4D, or the i-th frame of the local video. Referring to Figure 4D, the first mark may be the selection box 423 corresponding to character 3, the first operation may be tapping the selection box 423, and the first object may be character 3. The second image may be the image displayed in the window of the user interface 404 shown in Figure 4H, the second mark may be the selection box 422 corresponding to character 2 in that image, the second object may be the aforementioned character 2, and the fifth operation may be an input operation of tapping the selection box 422. The first sub-video may be the aforementioned close-up video centered on character 3, and the second sub-video may be the aforementioned close-up video centered on character 2.
Figure 15 is a schematic diagram of the hardware structure of the terminal 100 provided by an embodiment of this application.
The terminal 100 may include a processor 119, an external memory interface 120, an internal memory 129, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 149, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, a headphone interface 170D, a sensor module 180, buttons 190, a motor 199, an indicator 198, a camera 193, a display screen 194, a subscriber identification module (SIM) card interface 195, and so on. The sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and so on.
It can be understood that the structure illustrated in this embodiment of the invention does not constitute a specific limitation on the terminal 100. In other embodiments of this application, the terminal 100 may include more or fewer components than shown, combine certain components, split certain components, or arrange the components differently. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.

The processor 119 may include one or more processing units. For example, the processor 119 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), among others. The different processing units may be independent devices or may be integrated in one or more processors.

The controller can generate operation control signals according to instruction opcodes and timing signals, completing the control of instruction fetching and execution.

The processor 119 may also be provided with a memory for storing instructions and data. In some embodiments, the memory in the processor 119 is a cache. This memory can hold instructions or data that the processor 119 has just used or uses cyclically. If the processor 119 needs the instructions or data again, it can fetch them directly from this memory, avoiding repeated accesses and reducing the waiting time of the processor 119, which improves system efficiency.

In some embodiments, the processor 119 may include one or more interfaces. The interfaces may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface, among others.
It can be understood that the interface connection relationships between the modules illustrated in this embodiment of the invention are only schematic and do not constitute a structural limitation on the terminal 100. In other embodiments of this application, the terminal 100 may also adopt interface connection methods different from those in the above embodiments, or a combination of multiple interface connection methods.

The charging management module 140 is used to receive charging input from a charger. While charging the battery 149, the charging management module 140 can also supply power to the electronic device through the power management module 141. The power management module 141 is used to connect the battery 149 and the charging management module 140 to the processor 119. The power management module 141 receives input from the battery 149 and/or the charging management module 140 and supplies power to the processor 119, the internal memory 129, the display screen 194, the camera 193, the wireless communication module 160, and so on.
The wireless communication function of the terminal 100 can be implemented through the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor, the baseband processor, and so on.

Antenna 1 and antenna 2 are used to transmit and receive electromagnetic wave signals.

The mobile communication module 150 can provide wireless communication solutions applied on the terminal 100, including 2G/3G/4G/5G. The mobile communication module 150 can receive electromagnetic waves through the antenna 1, filter and amplify the received electromagnetic waves, and transmit them to the modem processor for demodulation. The mobile communication module 150 can also amplify signals modulated by the modem processor and convert them into electromagnetic waves radiated out through the antenna 1.

The wireless communication module 160 can provide wireless communication solutions applied on the terminal 100, including wireless local area networks (WLAN) (such as wireless fidelity (Wi-Fi) networks), Bluetooth (BT), the global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), infrared (IR), and so on. The wireless communication module 160 receives electromagnetic waves via the antenna 2, frequency-modulates and filters the electromagnetic wave signals, and sends the processed signals to the processor 119. The wireless communication module 160 can also receive signals to be sent from the processor 119, frequency-modulate and amplify them, and convert them into electromagnetic waves radiated out through the antenna 2.

In some embodiments, the antenna 1 of the terminal 100 is coupled to the mobile communication module 150 and the antenna 2 is coupled to the wireless communication module 160, so that the terminal 100 can communicate with networks and other devices through wireless communication technologies.
The terminal 100 implements its display function through the GPU, the display screen 194, the application processor, and so on. The GPU is a microprocessor for image processing, connecting the display screen 194 and the application processor, and is used to perform the mathematical and geometric calculations required for graphics rendering. The processor 119 may include one or more GPUs that execute program instructions to generate or change display information.

The display screen 194 is used to display images, videos, and the like, and includes a display panel. The display panel may be a liquid crystal display (LCD), or may be manufactured using an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini-LED, a Micro-LED, a Micro-OLED, quantum dot light-emitting diodes (QLED), and so on. In some embodiments, the electronic device may include 1 or N display screens 194, where N is a positive integer greater than 1.

In the embodiments of this application, the ability of the terminal 100 to track the protagonist, determine the close-up image of the protagonist, and display the user interfaces shown in Figures 1A-1M, 2A-2D, 3A-3C, 4A-4H, and 5A-5E depends on the display functions provided by the GPU, the display screen 194, and the application processor described above.
The terminal 100 can implement its shooting function through the ISP, the camera 193, the video codec, the GPU, the display screen 194, the application processor, and so on.

The ISP is used to process data fed back by the camera 193. For example, when a photo is taken, the shutter opens and light is transmitted through the lens to the camera's photosensitive element, where the light signal is converted into an electrical signal; the photosensitive element passes the electrical signal to the ISP for processing, converting it into an image visible to the naked eye. The ISP can also algorithmically optimize the noise, brightness, and skin tone of the image, as well as parameters of the shooting scene such as exposure and color temperature. In some embodiments, the ISP may be provided in the camera 193.

The camera 193 is used to capture still images or video. An object passes through the lens to produce an optical image projected onto the photosensitive element, which may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. The photosensitive element converts the light signal into an electrical signal and passes it to the ISP to be converted into a digital image signal. The ISP outputs the digital image signal to the DSP for processing, and the DSP converts the digital image signal into an image signal in a standard format such as RGB or YUV. In some embodiments, the terminal 100 may include 1 or N cameras 193, where N is a positive integer greater than 1.
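The YUV-to-RGB conversion performed by the DSP follows fixed linear formulas. As an illustration (the embodiment does not specify which variant it uses), a per-pixel sketch with the common full-range BT.601 coefficients is:

```python
def yuv_to_rgb(y, u, v):
    """Convert one full-range BT.601 YUV pixel (each component 0..255,
    chroma centered at 128) to an RGB triple clamped to 0..255."""
    r = y + 1.402 * (v - 128)
    g = y - 0.344136 * (u - 128) - 0.714136 * (v - 128)
    b = y + 1.772 * (u - 128)
    clamp = lambda c: max(0, min(255, round(c)))
    return clamp(r), clamp(g), clamp(b)
```

With neutral chroma (u = v = 128) the pixel is a pure gray, so the luma value passes through unchanged to all three channels.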
The digital signal processor is used to process digital signals; besides digital image signals, it can also process other digital signals. For example, when the terminal 100 selects a frequency point, the digital signal processor is used to perform a Fourier transform on the frequency point energy.

The video codec is used to compress or decompress digital video. The terminal 100 may support one or more video codecs, so that it can play or record videos in multiple encoding formats, such as moving picture experts group (MPEG) 1, MPEG2, MPEG3, and MPEG4.

The NPU is a neural-network (NN) computing processor. By drawing on the structure of biological neural networks, for example the transmission mode between neurons in the human brain, it processes input information quickly and can also learn continuously by itself. The NPU enables intelligent cognition applications of the terminal 100, such as image recognition, face recognition, speech recognition, and text understanding.

In the embodiments of this application, the terminal 100 collects original images through the shooting capabilities provided by the ISP and the camera 193, and performs computations such as tracking the protagonist and determining the close-up image of the protagonist through the image computing and processing capabilities provided by the video codec and the GPU. In particular, the terminal 100 can run neural network algorithms such as face recognition, human body recognition, and re-identification (ReID) using the computing capabilities provided by the NPU.
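Re-identification, as used here for protagonist tracking, typically compares a feature vector extracted from each detected person against the stored embedding of the selected protagonist. A minimal cosine-similarity matcher is sketched below; the embeddings and the 0.7 threshold are illustrative assumptions, not parameters of the embodiment:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def match_protagonist(protagonist_emb, candidate_embs, threshold=0.7):
    """Return the index of the candidate embedding most similar to the
    protagonist's stored embedding, or None if no candidate reaches the
    similarity threshold (the protagonist is absent from this frame)."""
    best_i, best_s = None, threshold
    for i, emb in enumerate(candidate_embs):
        s = cosine(protagonist_emb, emb)
        if s >= best_s:
            best_i, best_s = i, s
    return best_i
```

Matching on an appearance embedding rather than on raw position is what lets the tracker re-acquire the protagonist after occlusion or after they leave and re-enter the frame.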
The internal memory 129 may include one or more random access memories (RAM) and one or more non-volatile memories (NVM).

The random access memory may include static random-access memory (SRAM), dynamic random-access memory (DRAM), synchronous dynamic random-access memory (SDRAM), double data rate synchronous dynamic random-access memory (DDR SDRAM; for example, fifth-generation DDR SDRAM is generally called DDR5 SDRAM), and so on.

The non-volatile memory may include disk storage devices and flash memory. By operating principle, flash memory may include NOR flash, NAND flash, 3D NAND flash, and so on; by the number of potential levels per storage cell, it may include single-level cell (SLC), multi-level cell (MLC), triple-level cell (TLC), and quad-level cell (QLC) memory; and by storage specification, it may include universal flash storage (UFS), embedded multimedia card (eMMC), and so on.

The random access memory can be read and written directly by the processor 119 and can be used to store the executable programs (for example, machine instructions) of the operating system or other running programs, as well as user and application data. The non-volatile memory can also store executable programs and user and application data, which can be loaded into the random access memory in advance for the processor 119 to read and write directly.

In the embodiments of this application, the code implementing the video editing method described herein may be stored in the non-volatile memory. When running the camera application, the terminal 100 may load the executable code stored in the non-volatile memory into the random access memory.
The external memory interface 120 can be used to connect an external non-volatile memory to expand the storage capability of the terminal 100. The external non-volatile memory communicates with the processor 119 through the external memory interface 120 to implement the data storage function.

The terminal 100 can implement audio functions, such as music playback and recording, through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the headphone interface 170D, the application processor, and so on.

The audio module 170 is used to convert digital audio information into an analog audio signal for output, and also to convert an analog audio input into a digital audio signal. The speaker 170A, also called the "loudspeaker", is used to convert audio electrical signals into sound signals; the terminal 100 can listen to music or hands-free calls through the speaker 170A. The receiver 170B, also called the "earpiece", is used to convert audio electrical signals into sound signals; when the terminal 100 answers a call or a voice message, the voice can be heard by bringing the receiver 170B close to the ear. The microphone 170C, also called the "mic", is used to convert sound signals into electrical signals. The headphone interface 170D is used to connect wired headphones.

In the embodiments of this application, while the camera is enabled to collect images, the terminal 100 can simultaneously enable the microphone 170C to collect sound signals and convert them into electrical signals for storage. In this way, the user obtains a video with sound.
The pressure sensor 180A is used to sense pressure signals and can convert them into electrical signals. In some embodiments, the pressure sensor 180A may be disposed on the display screen 194.

The gyroscope sensor 180B can be used to determine the motion posture of the terminal 100. In some embodiments, the angular velocity of the terminal 100 around three axes (that is, the x, y, and z axes) can be determined through the gyroscope sensor 180B. The gyroscope sensor 180B can be used for image stabilization during shooting. For example, when the shutter is pressed, the gyroscope sensor 180B detects the angle through which the terminal 100 shakes, calculates from that angle the distance the lens module needs to compensate, and lets the lens counteract the shake of the terminal 100 through reverse movement, achieving stabilization.
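The compensation distance derived from the shake angle can be approximated with simple trigonometry. The sketch below uses the pinhole-camera model (offset = focal length times the tangent of the shake angle); the model and the example focal length are illustrative assumptions, not the embodiment's actual stabilization algorithm:

```python
import math

def ois_compensation_mm(focal_length_mm, shake_deg):
    """Lens offset (in mm) needed to cancel an angular shake of
    shake_deg degrees, under the pinhole model offset = f * tan(theta).
    The lens module is then moved by this amount in the opposite
    direction to keep the image steady on the sensor."""
    return focal_length_mm * math.tan(math.radians(shake_deg))
```

For the small shake angles typical of handheld shooting, tan(theta) is nearly proportional to theta, so the required lens travel grows roughly linearly with the detected shake.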
The air pressure sensor 180C is used to measure air pressure. In some embodiments, the terminal 100 calculates the altitude from the air pressure value measured by the air pressure sensor 180C, assisting positioning and navigation. The magnetic sensor 180D includes a Hall sensor; the terminal 100 can use the magnetic sensor 180D to detect the opening and closing of a flip cover. The acceleration sensor 180E can detect the magnitude of the terminal 100's acceleration in various directions (generally three axes), and can detect the magnitude and direction of gravity when the terminal 100 is stationary. The distance sensor 180F is used to measure distance; the terminal 100 can measure distance by infrared or laser, and in some embodiments, when shooting a scene, the terminal 100 can use the distance sensor 180F to measure distance for fast focusing. The proximity light sensor 180G may include, for example, a light-emitting diode (LED), which may be an infrared LED, and a light detector such as a photodiode. The terminal 100 emits infrared light outward through the LED and uses the photodiode to detect infrared light reflected from nearby objects. When sufficient reflected light is detected, it can be determined that there is an object near the terminal 100; when insufficient reflected light is detected, the terminal 100 can determine that there is no object nearby. The ambient light sensor 180L is used to sense the ambient light brightness; the terminal 100 can adaptively adjust the brightness of the display screen 194 according to the perceived ambient brightness, and the ambient light sensor 180L can also be used to automatically adjust the white balance when taking photos. The fingerprint sensor 180H is used to collect fingerprints; the terminal 100 can use the collected fingerprint characteristics to implement fingerprint unlocking, application lock access, fingerprint photography, fingerprint call answering, and so on. The temperature sensor 180J is used to detect temperature; in some embodiments, the terminal 100 uses the temperature detected by the temperature sensor 180J to execute a temperature processing policy.
触摸传感器180K,也称“触控器件”。触摸传感器180K可以设置于显示屏194,由触摸传感器180K与显示屏194组成触摸屏,也称“触控屏”。触摸传感器180K用于检测作用于其上或附近的触摸操作。触摸传感器可以将检测到的触摸操作传递给应用处理器,以确定触摸事件类型。可以通过显示屏194提供与触摸操作相关的视觉输出。在另一些实施例中,触摸传感器180K也可以设置于终端100的表面,与显示屏194所处的位置不同。Touch sensor 180K, also known as "touch device". The touch sensor 180K can be disposed on the display screen 194. The touch sensor 180K and the display screen 194 form a touch screen, which is also called a "touch screen". The touch sensor 180K is used to detect a touch operation on or near the touch sensor 180K. The touch sensor can pass the detected touch operation to the application processor to determine the touch event type. Visual output related to the touch operation may be provided through display screen 194 . In other embodiments, the touch sensor 180K may also be disposed on the surface of the terminal 100 in a position different from that of the display screen 194 .
在本申请实施例中,终端100可利用触摸传感器180K检测用户作用于显示屏194上的点击、滑动等操作,以实现图1A-图1M、图2A-图2D、图4A-图4H、图5A-图5E所示的视频编辑方法。In the embodiment of the present application, the terminal 100 can use the touch sensor 180K to detect the user's click, slide and other operations on the display screen 194 to implement FIGS. 1A-1M, 2A-2D, 4A-4H, and FIG. 5A - The video editing method shown in Figure 5E.
The bone conduction sensor 180M can acquire vibration signals. The buttons 190 include a power button, volume buttons, and the like. The terminal 100 may receive button input and generate button signal input related to user settings and function control of the terminal 100. The motor 199 can generate vibration prompts; it can be used for incoming-call vibration prompts as well as for touch vibration feedback. The indicator 198 may be an indicator light, which may be used to indicate charging status and battery changes, or to indicate messages, missed calls, notifications, and the like. The SIM card interface 195 is used to connect a SIM card. The terminal 100 can support 1 or N SIM card interfaces, where N is a positive integer greater than 1.
The term "user interface (UI)" in the specification, claims, and drawings of this application refers to a medium interface for interaction and information exchange between an application or operating system and a user; it converts between the internal form of information and a form acceptable to the user. The user interface of an application is source code written in a specific computer language such as Java or extensible markup language (XML). The interface source code is parsed and rendered on the terminal device and finally presented as content the user can recognize, such as pictures, text, buttons, and other controls. A control, also called a widget, is a basic element of a user interface; typical controls include toolbars, menu bars, text boxes, buttons, scrollbars, pictures, and text. The properties and contents of the controls in an interface are defined by tags or nodes; for example, XML specifies the controls contained in an interface through nodes such as <Textview>, <ImgView>, and <VideoView>. A node corresponds to a control or property in the interface, and after parsing and rendering, the node is presented as user-visible content.
In addition, the interfaces of many applications, such as hybrid applications, often include web pages. A web page, also called a page, can be understood as a special control embedded in an application interface. A web page is source code written in a specific computer language, such as hypertext markup language (HTML), cascading style sheets (CSS), or JavaScript (JS). Web page source code can be loaded and displayed as user-recognizable content by a browser or by a web page display component with functions similar to a browser. The specific content contained in a web page is also defined by tags or nodes in the web page source code; for example, HTML defines the elements and attributes of a web page through <p>, <img>, <video>, and <canvas>.
The commonly used form of user interface is the graphical user interface (GUI), which refers to a user interface related to computer operations that is displayed graphically. It may consist of interface elements such as icons, windows, and controls displayed on the display screen of an electronic device, where controls may include visual interface elements such as icons, buttons, menus, tabs, text boxes, dialog boxes, status bars, navigation bars, and widgets.
As used in the specification and appended claims of this application, the singular expressions "a", "an", "the", "the foregoing", "said", and "this" are intended to also include plural expressions, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in this application refers to and encompasses any and all possible combinations of one or more of the listed items. As used in the above embodiments, the term "when" may be interpreted to mean "if", "after", "in response to determining", or "in response to detecting", depending on the context. Similarly, depending on the context, the phrase "when determining" or "if (a stated condition or event) is detected" may be interpreted to mean "if it is determined", "in response to determining", "when (the stated condition or event) is detected", or "in response to detecting (the stated condition or event)".
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented using software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of this application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired (for example, coaxial cable, optical fiber, or digital subscriber line) or wireless (for example, infrared, radio, or microwave) means. The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device such as a server or data center that integrates one or more available media. The available media may be magnetic media (for example, floppy disks, hard disks, or magnetic tapes), optical media (for example, DVDs), semiconductor media (for example, solid state drives), or the like.
Those of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing relevant hardware. The program may be stored in a computer-readable storage medium, and when executed, may include the processes of the above method embodiments. The aforementioned storage media include: ROM, random access memory (RAM), magnetic disks, optical disks, and other media that can store program code.

Claims (21)

  1. A video editing method, applied to an electronic device, wherein the method comprises:
    displaying, in a first interface, a first image and one or more markers associated with the first image, wherein the first image comprises one or more objects, the one or more markers associated with the first image respectively correspond to the one or more objects in the first image, and the first image is an image currently captured by a camera of the electronic device or a frame in a first video stored on the electronic device;
    detecting a first operation acting on a first marker;
    in response to the first operation, determining a first object to be the protagonist, and obtaining a close-up image centered on the protagonist, wherein the one or more markers associated with the first image comprise the first marker, the one or more objects in the first image comprise the first object, and the first marker corresponds to the first object;
    generating, based on the close-up image centered on the protagonist, a second video centered on the protagonist.
  2. The method according to claim 1, wherein after the determining a first object to be the protagonist, the method further comprises:
    displaying, in the first interface, a second image and one or more markers associated with the second image, wherein the second image comprises one or more objects, the one or more markers associated with the second image respectively correspond to the one or more objects in the second image, and the second image is an image captured by the camera of the electronic device after the first image or a frame after the first image in the first video;
    detecting a fifth operation acting on a second marker;
    in response to the fifth operation, switching the protagonist to a second object, wherein the one or more markers associated with the second image comprise the second marker, the one or more objects in the second image comprise the second object, and the second marker corresponds to the second object;
    wherein the obtaining a close-up image centered on the protagonist comprises: generating a close-up image centered on the first object based on images, between the first image and the second image, that include the first object, and generating a close-up image centered on the second object based on the second image and subsequent images; and
    the second video comprises a first sub-video and a second sub-video, the first sub-video being a video generated based on the close-up image centered on the first object, and the second sub-video being a video generated based on the close-up image centered on the second object.
  3. The method according to claim 1, wherein the obtaining a close-up image centered on the protagonist specifically comprises:
    generating a close-up image centered on the first object based on images, from the first image to the last frame of the first video, that include the first object.
  4. The method according to claim 1, wherein when the second image is a frame after the first image in the first video, before the displaying, in a first interface, a first image and one or more markers associated with the first image, the method further comprises:
    displaying a thumbnail of the first video;
    detecting a second operation acting on the thumbnail of the first video;
    wherein the displaying, in a first interface, a first image and one or more markers associated with the first image comprises:
    in response to the second operation, displaying, in the first interface, the first frame of the first video and one or more markers corresponding to one or more objects in the first frame, wherein the first image is the first frame.
  5. The method according to claim 1, wherein when the second image is a frame after the first image in the first video, before the displaying, in a first interface, a first image and one or more markers associated with the first image, the method further comprises:
    displaying, in the first interface, the first frame of the first video and a first control;
    detecting a third operation acting on the first control;
    in response to the third operation, playing the first video;
    wherein the displaying, in a first interface, a first image and one or more markers associated with the first image comprises:
    when the first video is played to an M-th frame, displaying, in the first interface, the M-th frame and one or more markers associated with the M-th frame.
  6. The method according to claim 5, wherein the displaying, in the first interface, the M-th frame and the one or more markers associated with the M-th frame when the first video is played to the M-th frame comprises:
    when the first video is played to the M-th frame, detecting a fourth operation acting on the first control;
    in response to the fourth operation, pausing playback of the first video, and displaying the currently played M-th frame;
    in response to the pausing operation, displaying, on the M-th frame, the one or more markers associated with the M-th frame.
  7. The method according to claim 1 or 2, wherein the first interface further comprises a second control, and the generating, based on the close-up image centered on the protagonist, a second video centered on the protagonist comprises:
    detecting a sixth operation acting on the second control;
    in response to the sixth operation, generating, based on the close-up image centered on the protagonist, the second video centered on the protagonist.
  8. The method according to claim 7, wherein when the first image is an image currently captured by the camera of the electronic device, the second control is a control for stopping video recording.
  9. The method according to claim 8, wherein the method further comprises: in response to the sixth operation, stopping, by the camera, capturing images, and generating and saving an original video based on the images already captured by the camera.
  10. The method according to claim 8, wherein after the determining a first object to be the protagonist, the method further comprises: displaying a first window, and displaying, in the first window, the close-up image centered on the protagonist.
  11. The method according to claim 1 or 2, wherein when the first image is an image currently captured by the camera of the electronic device, the method further comprises: detecting a first trigger condition, the first trigger condition being that the protagonist is not included in Y consecutive frames after the first image;
    wherein the generating, based on the close-up image centered on the protagonist, a second video centered on the protagonist specifically comprises:
    in response to the first trigger condition, generating, based on the close-up image centered on the protagonist, the second video centered on the protagonist.
  12. The method according to claim 2, wherein the generating a close-up image centered on the first object based on the images, between the first image and the second image, that include the first object comprises:
    obtaining, from the first image, a first close-up image centered on the first object;
    obtaining, from a third image, a third close-up image centered on the first object, wherein the third image is an image after the first image and before the second image, and the second video comprises the first close-up image and the third close-up image.
  13. The method according to claim 12, wherein before the obtaining, from a third image, a third close-up image centered on the first object, the method further comprises:
    determining whether the third image includes the first object;
    wherein the obtaining, from a third image, a third close-up image centered on the first object specifically comprises:
    when the third image includes the first object, obtaining, from the third image, the third close-up image centered on the first object.
  14. The method according to claim 13, wherein the determining that the third image includes the first object comprises:
    identifying human body image regions in the third image using a human body detection algorithm;
    when the human body image regions in the third image do not overlap, computing the intersection-over-union (IoU) distance between each human body image region in the third image and the human body image region of the protagonist in the first image, and determining a first human body image region that has the smallest IoU distance and satisfies an IoU distance threshold, wherein the object corresponding to the first human body image region is the protagonist;
    when the human body image regions in the third image overlap, computing the IoU distance and the re-identification (ReID) distance between each human body image region in the third image and the human body image region of the protagonist in the first image, and determining a first human body image region for which the sum of the IoU distance and the ReID distance is the smallest and satisfies an IoU+ReID distance threshold, wherein the object corresponding to the first human body image region is the protagonist.
  15. The method according to claim 14, wherein the obtaining, from the third image, the third close-up image centered on the protagonist specifically comprises: determining, based on the first human body image region, the third close-up image including the first human body image region.
  16. The method according to claim 15, wherein the determining, based on the first human body image region, the third close-up image including the first human body image region specifically comprises:
    determining a first scaling ratio according to the first human body image region;
    determining the size of the third close-up image based on the first scaling ratio.
  17. The method according to claim 16, wherein the determining a first scaling ratio according to the first human body image region specifically comprises: determining the first scaling ratio according to the size of the largest human body image region in the third image and the size of the first human body image region.
  18. The method according to claim 17, wherein the determining the size of the third close-up image based on the first scaling ratio specifically comprises: determining the size of the third close-up image according to the first scaling ratio and a preset size of the second video.
  19. The method according to claim 18, wherein the aspect ratio of the third close-up image is the same as a preset aspect ratio of the second video.
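Claims 16-19 specify the inputs for sizing the close-up (the protagonist's region, the largest detected region, the preset output size) and the aspect-ratio constraint, but not a concrete formula. A minimal Python sketch of one plausible reading, in which the arithmetic itself is an assumption for illustration only:

```python
def first_scaling_ratio(protagonist_size: tuple[int, int],
                        largest_size: tuple[int, int]) -> float:
    """One plausible reading of claim 17: scale by how large the biggest
    detected person is relative to the protagonist, so a relatively small
    (distant) protagonist yields a tighter crop. Takes (width, height)."""
    return max(largest_size[0] / protagonist_size[0],
               largest_size[1] / protagonist_size[1])

def close_up_size(preset_video_size: tuple[int, int],
                  scaling_ratio: float) -> tuple[int, int]:
    """Claims 18-19: derive the crop size from the preset output size and
    the scaling ratio. Dividing both dimensions by the same factor keeps the
    crop's aspect ratio equal to the preset video's aspect ratio."""
    w, h = preset_video_size
    return (round(w / scaling_ratio), round(h / scaling_ratio))
```

For instance, with a preset 1920x1080 output and a scaling ratio of 2.0, the crop is 960x540; the crop is then resized up to the preset size, and the 16:9 aspect ratio is preserved as claim 19 requires.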
  20. An electronic device, comprising one or more processors and one or more memories, wherein the one or more memories are coupled to the one or more processors and are configured to store computer program code, the computer program code comprising computer instructions, which, when executed by the one or more processors, cause the method according to any one of claims 1-19 to be performed.
  21. A computer-readable storage medium, comprising instructions, wherein when the instructions are run on an electronic device, the method according to any one of claims 1-19 is performed.
PCT/CN2023/089100 2022-05-30 2023-04-19 Video editing method and electronic device WO2023231622A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210603653.3A CN116055861B (en) 2022-05-30 2022-05-30 Video editing method and electronic equipment
CN202210603653.3 2022-05-30

Publications (2)

Publication Number Publication Date
WO2023231622A1 true WO2023231622A1 (en) 2023-12-07
WO2023231622A9 WO2023231622A9 (en) 2024-02-01

Family

ID=86113880

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/089100 WO2023231622A1 (en) 2022-05-30 2023-04-19 Video editing method and electronic device

Country Status (2)

Country Link
CN (1) CN116055861B (en)
WO (1) WO2023231622A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2010232814A (en) * 2009-03-26 2010-10-14 Nikon Corp Video editing program, and video editing device
US20120062732A1 (en) * 2010-09-10 2012-03-15 Videoiq, Inc. Video system with intelligent visual display
CN109922363A (en) * 2019-03-15 2019-06-21 青岛海信电器股份有限公司 A kind of graphical user interface method and display equipment of display screen shot
CN110301136A (en) * 2017-02-17 2019-10-01 Vid拓展公司 The system and method for selective object of interest scaling are carried out in streamed video
CN111757138A (en) * 2020-07-02 2020-10-09 广州博冠光电科技股份有限公司 Close-up display method and device based on single-shot live video
CN112954219A (en) * 2019-03-18 2021-06-11 荣耀终端有限公司 Multi-channel video recording method and equipment

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1713254A3 (en) * 2005-04-14 2007-06-13 THOMSON Licensing Camera system with PIP in viewfinder
JP2007086269A (en) * 2005-09-21 2007-04-05 Hitachi Kokusai Electric Inc Camera system and focal length adjusting method of zoom lens optical system of camera system
US8390667B2 (en) * 2008-04-15 2013-03-05 Cisco Technology, Inc. Pop-up PIP for people not in picture
KR20110112686A (en) * 2010-04-07 2011-10-13 (주)조아영상기술 Video conference apparatus and method
US9973722B2 (en) * 2013-08-27 2018-05-15 Qualcomm Incorporated Systems, devices and methods for displaying pictures in a picture
US9865062B2 (en) * 2016-02-12 2018-01-09 Qualcomm Incorporated Systems and methods for determining a region in an image
CN105913453A (en) * 2016-04-01 2016-08-31 海信集团有限公司 Target tracking method and target tracking device
US20200351543A1 (en) * 2017-08-30 2020-11-05 Vid Scale, Inc. Tracked video zooming
CN111093026B (en) * 2019-12-30 2022-05-06 维沃移动通信(杭州)有限公司 Video processing method, electronic device and computer-readable storage medium
CN111401238B (en) * 2020-03-16 2023-04-28 湖南快乐阳光互动娱乐传媒有限公司 Method and device for detecting character close-up fragments in video
CN114125179B (en) * 2021-12-07 2024-04-05 维沃移动通信有限公司 Shooting method and device

Also Published As

Publication number Publication date
WO2023231622A9 (en) 2024-02-01
CN116055861A (en) 2023-05-02
CN116055861B (en) 2023-10-20


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23814808

Country of ref document: EP

Kind code of ref document: A1