WO2023065885A1 - A video processing method and electronic device - Google Patents

A video processing method and electronic device

Info

Publication number
WO2023065885A1
WO2023065885A1 (PCT/CN2022/118147)
Authority
WO
WIPO (PCT)
Prior art keywords
video
photo
interface
user
wonderful
Prior art date
Application number
PCT/CN2022/118147
Other languages
English (en)
French (fr)
Inventor
侯伟龙
董振
朱世宇
邵涛
Original Assignee
荣耀终端有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from CN202210114568.0A (CN116033261B)
Application filed by 荣耀终端有限公司
Priority to EP22826792.8A (EP4199492A4)
Publication of WO2023065885A1

Classifications

    • G - PHYSICS
    • G11 - INFORMATION STORAGE
    • G11B - INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B 27/00 - Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B 27/10 - Indexing; Addressing; Timing or synchronising; Measuring tape travel
    • G11B 27/34 - Indicating arrangements
    • G11B 27/102 - Programmed access in sequence to addressed parts of tracks of operating record carriers
    • G11B 27/105 - Programmed access in sequence to addressed parts of tracks of operating discs
    • G11B 27/19 - Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier
    • G11B 27/28 - Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information signals recorded by the same method as the main recording

Definitions

  • the present application relates to the field of electronic equipment, and more specifically, relates to a video processing method and electronic equipment.
  • the present application provides a video processing method, an electronic device, a computer-readable storage medium, and a computer program product, which can determine a high-scoring highlight moment and thereby obtain a highlight-moment photo with higher image quality, so that the user obtains high-quality photos and videos of exciting moments while recording, which greatly improves the user experience.
  • a video processing method including:
  • the first interface is the playback interface of the first video
  • the first interface includes a first control and a first area
  • the first area displays the thumbnail of the first photo and the thumbnail of the second photo
  • the first photo is automatically taken at the first moment
  • the second photo is automatically taken at the second moment
  • the recording process of the first video includes the first moment and the second moment
  • the first video includes a first video clip and a second video clip
  • the first video clip is a first scene
  • the second video clip is a second scene
  • the first photo is a photo in the first video clip
  • the second photo is a photo in the second video clip
  • the score of the first photo is greater than a first threshold
  • the score of the second photo is greater than a second threshold;
  • a second interface is displayed; the second interface is a playback interface of a second video, the duration of the second video is shorter than the duration of the first video, and the second video includes at least the first photo.
  • the foregoing method may be executed by an electronic device (such as a terminal device) or a chip in an electronic device (such as a chip in a terminal device).
  • the first video segment further includes a third photo
  • the third photo is automatically taken at a third moment
  • the score of the third photo is greater than the first threshold
  • the above-mentioned first threshold can be regarded as an absolute threshold for evaluating photos of wonderful moments in the first video segment. In other words, if multiple photos of wonderful moments are determined in the first video clip, each of these photos should satisfy the first threshold. Therefore, by introducing the first threshold, photos of multiple exciting moments can be obtained more accurately.
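The absolute-threshold rule above can be sketched as a simple filter. This is an illustrative Python sketch, not the patent's implementation; the frame scores, timestamps, and the threshold value are assumed values.

```python
# Hypothetical sketch of the "absolute threshold": within one video clip,
# every candidate frame whose score exceeds the clip's first threshold is
# kept as a wonderful-moment photo.

def select_wonderful_photos(scored_frames, threshold):
    """Return (timestamp_ms, score) pairs whose score exceeds the threshold."""
    return [(t, s) for (t, s) in scored_frames if s > threshold]

# Frames scored during recording of the first video clip (timestamp_ms, score):
frames = [(1000, 0.42), (2500, 0.81), (4000, 0.76), (5500, 0.90)]

# With a first threshold of 0.75, three photos qualify as wonderful moments.
print(select_wonderful_photos(frames, 0.75))
```

Because the threshold is absolute rather than "top-1", several photos from the same clip can qualify, which matches the multi-photo behavior described above.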
  • a transition occurs between the first video segment and the second video segment.
  • the first photo is of a first type of action
  • the second photo is of a second type of action
  • the first photo is a landscape
  • the second photo is a person
  • the method further includes:
  • the value of the third threshold is updated to the score of the third photo.
  • the above-mentioned third threshold is a relative threshold.
  • the third threshold is updated to the score of the fourth photo, so that the relative threshold always maintains the latest highest value.
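The relative-threshold update described above can be sketched as a running maximum over sub-threshold candidates. A minimal illustrative sketch; the function name, the initial relative threshold, and the scores are assumptions.

```python
# Sketch of the relative (third) threshold update rule: when a candidate
# photo does not clear the absolute threshold but beats the current relative
# threshold, the relative threshold is raised to that photo's score, so it
# always holds the latest highest sub-threshold value.

def update_relative_threshold(score, absolute_threshold, relative_threshold):
    """Return the new relative threshold after scoring one candidate photo."""
    if score <= absolute_threshold and score > relative_threshold:
        return score  # best candidate seen so far below the absolute bar
    return relative_threshold

rel = 0.30  # initial third (relative) threshold
for score in [0.50, 0.45, 0.62]:  # candidate scores, all <= the absolute 0.75
    rel = update_relative_threshold(score, 0.75, rel)
print(rel)  # the relative threshold now holds the highest candidate score
```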
  • the second video segment further includes the fifth photo, and the fifth photo is automatically taken when a transition occurs.
  • automatic photographing may first be triggered to obtain a transition frame (such as the fifth photo). The purpose of doing this is to ensure that at least one photo can be output in the second video segment, avoiding the situation where no photo is output in the second video segment; in other words, it can be guaranteed that at least one photo is output in a transition segment.
  • the first area further includes a thumbnail of the fifth photo.
  • if the score of the fifth photo is also greater than the second threshold, the fifth photo may also be determined as a photo of a wonderful moment in the second video clip.
  • the time between the transition and the previous transition is greater than a time threshold.
  • the purpose of setting the time threshold here is to avoid frequent triggering of transition photos, which helps to save terminal power consumption.
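The time-threshold debounce above can be sketched as follows. The 2000 ms threshold and the transition timestamps are illustrative assumptions.

```python
# Sketch of the transition-photo debounce: a transition only triggers an
# automatic photo if enough time has passed since the previous transition
# photo, which avoids frequent triggering and saves power.

TIME_THRESHOLD_MS = 2000

def should_capture_transition(now_ms, last_transition_ms):
    """True if a transition photo may be taken at time now_ms."""
    return (now_ms - last_transition_ms) > TIME_THRESHOLD_MS

captured = []
last = -TIME_THRESHOLD_MS  # allow the very first transition to capture
for t in [1000, 1500, 4000, 4500, 7000]:  # transition timestamps in ms
    if should_capture_transition(t, last):
        captured.append(t)
        last = t
print(captured)  # closely spaced transitions are suppressed
```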
  • the third threshold is smaller than the second threshold.
  • the method further includes:
  • the third interface is an interface of a gallery application, and the third interface includes a second control;
  • the displaying the first interface includes: displaying the first interface in response to a fourth operation on the second control.
  • the third interface further includes a first prompt window, where the first prompt window is used to prompt the user that the first photo and the second photo have been generated.
  • the brightness of the first prompt window and the brightness of the first area are higher than the brightness of the areas in the first interface other than the first area and the first prompt window.
  • when the user enters the first video in the gallery application for the first time, highlighting the first prompt window guides the user to view photos in the wonderful-moment area and draws the user's attention to the first prompt window, achieving a more eye-catching reminder effect and improving the user experience.
  • the method further includes:
  • the fourth interface includes a preview thumbnail option
  • the displaying a third interface in response to the third operation of the user includes:
  • the third interface is displayed.
  • the fourth interface further includes a second prompt window, and the second prompt window is used to prompt the user that the first photo, the second photo, and the second video have been generated.
  • before recording the first video, the method further includes:
  • the one-record-multiple-get function is enabled.
  • the first interface further includes a play progress bar, and the play progress bar is used to display a play progress of the first video.
  • the second interface further includes a music control; the method further includes:
  • the user can add music to the second video, which enriches the user experience.
  • the second interface further includes a style control; the method further includes:
  • style can be understood as a filter.
  • the user can select a video style for the second video, which enriches user experience.
  • the gallery application includes a first photo album
  • the first photo album includes the first photo and the second photo.
  • the first photo and the second photo can be saved in the same album for easy viewing by the user.
  • the first photo album further includes a virtual video of the second video.
  • a virtual video refers to a data file that does not actually generate a video file.
  • a virtual video may be XML playback logic.
  • the second interface further includes: a share control or a save control;
  • the video file is stored in the first photo album.
  • the storage space occupied by the video file is larger than the storage space occupied by the virtual video.
  • the second video is only generated when the second video is shared or saved, which can effectively reduce the space occupied by the video on the terminal.
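The virtual-video idea above can be sketched as lazy materialization: the selected short video exists only as lightweight playback logic (e.g. an XML edit list) until the user shares or saves it. The class and field names below are illustrative assumptions, not the patent's API.

```python
# Sketch of a virtual video that is rendered into a real file only on demand.

class VirtualVideo:
    def __init__(self, playback_xml):
        self.playback_xml = playback_xml  # small edit list, not frame data
        self.video_file = None            # no real video file yet

    def materialize(self):
        """Render the edit list into an actual video file on share/save."""
        if self.video_file is None:
            # Stand-in for the actual encoding step.
            self.video_file = f"rendered({self.playback_xml})"
        return self.video_file

v = VirtualVideo("<clips><clip start='10' end='15'/></clips>")
print(v.video_file is None)   # True: only the virtual video exists
v.materialize()               # triggered by the share or save control
print(v.video_file is None)   # False: the real file now occupies storage
```

Deferring encoding until share/save is what keeps the album's storage footprint small, since the edit list is far smaller than an encoded video file.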
  • the first interface further includes a delete option; the method further includes:
  • a third prompt window is displayed, and the third prompt window is used to prompt the user whether to delete the second video, the first photo, and the second photo.
  • when the user deletes the first video, the user may be prompted whether to delete the photos of the highlight moments and the short video of the first video, so as to avoid accidental deletion and improve the user experience.
  • the method further includes:
  • if no user operation to view the first photo is received within N days, the first photo is automatically deleted.
  • if no user operation to view the second photo is received within N days, the second photo is automatically deleted.
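The N-day cleanup rule can be sketched as below. The value of N and the function name are illustrative assumptions; day numbers stand in for real dates.

```python
# Sketch of the auto-delete policy: an automatically captured photo is
# deleted if the user has not viewed it within N days of its creation.

N_DAYS = 30

def should_auto_delete(created_day, last_viewed_day, today):
    """Delete if the photo was never viewed and is older than N days."""
    if last_viewed_day is not None:
        return False  # the user has viewed it; keep the photo
    return (today - created_day) > N_DAYS

print(should_auto_delete(created_day=0, last_viewed_day=None, today=31))  # True
print(should_auto_delete(created_day=0, last_viewed_day=5, today=200))    # False
```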
  • the second video further includes the second photo. That is to say, the second video may include photos of all wonderful moments (such as the first photo and the second photo), or may include photos of some wonderful moments (such as the first photo), which is not specifically limited.
  • the first moment is determined based on a first time tag.
  • the first time tag is determined based on first-level information, second-level information, and third-level information; the first-level information is used to characterize the theme or scene of the video, the second-level information is used to represent a scene change of the video, and the third-level information is used to represent a wonderful moment.
  • the second video further includes a nearby image frame of the first photo, and the nearby image frame is determined based on the first time tag;
  • the nearby image frames include the image frames corresponding to the A moments before the first time tag and the image frames corresponding to the B moments after the first time tag, where A is greater than or equal to 1 and B is greater than or equal to 1.
  • the image frame corresponding to the moment when a transition occurs is removed from the second video, and the transition refers to a scene change.
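The frame-selection rule in the bullets above (tagged frame plus A neighbors before and B after, with transition frames removed) can be sketched as follows. Frame indices and the transition set are illustrative assumptions.

```python
# Sketch of assembling the highlight clip around one time tag: take the
# tagged frame, the A frames before it and the B frames after it, clip the
# range to the video, and drop any frame that coincides with a transition
# (scene change).

def highlight_frames(tag_index, a, b, total_frames, transition_indices):
    """Indices of the tagged frame plus its A/B neighbors, minus transitions."""
    start = max(0, tag_index - a)
    end = min(total_frames - 1, tag_index + b)
    return [i for i in range(start, end + 1) if i not in transition_indices]

# Tag at frame 10, A=2 frames before, B=3 after; frame 12 is a transition.
print(highlight_frames(10, 2, 3, 100, {12}))
```

Clipping at the video boundaries handles tags near the start or end of the recording, and excluding transition indices implements the "remove the frame at a scene change" rule.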
  • the method further includes:
  • the first photo is associated with the second video through the first identifier.
  • a resolution of the first photo is greater than a resolution of an image captured in the first video. Compared with taking a screenshot from the video, the image obtained in the embodiment of the present application has a higher resolution.
  • the method further includes: receiving a photographing request when recording a video, where the photographing request carries a capture mark;
  • the photographing is triggered and a first image is obtained, and the EXIF information corresponding to the first image includes the capture mark.
  • the user's manual capture request can also be received during the video recording, so that the user can capture photos of exciting moments based on subjective needs, so as to further improve user experience.
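The manual-capture path can be sketched as a request that carries a mark which ends up in the image's metadata, so manually captured photos can later be told apart from automatically triggered ones. All names here (the function, the metadata key, the mark string) are illustrative assumptions, not the patent's actual interfaces.

```python
# Sketch of a photographing request raised during recording that carries a
# capture mark; the mark is written into the EXIF-like metadata of the
# resulting image record.

def handle_photo_request(timestamp_ms, capture_mark):
    """Trigger a capture and return the image record with its metadata."""
    return {
        "timestamp_ms": timestamp_ms,
        "exif": {"UserComment": capture_mark},  # mark carried into metadata
    }

image = handle_photo_request(12_345, capture_mark="manual_snapshot")
print(image["exif"]["UserComment"])
```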
  • an electronic device including a unit for performing any method in the first aspect.
  • the electronic device may be a terminal device, or a chip in the terminal device.
  • the electronic device includes an input unit, a display unit and a processing unit.
  • the processing unit may be a processor
  • the input unit may be a communication interface
  • the display unit may be a graphics processing module and a screen
  • the terminal device may also include a memory for storing computer program code; when the processor executes the computer program code stored in the memory, the terminal device is caused to execute any one of the methods in the first aspect.
  • the processing unit may be a logic processing unit inside the chip, the input unit may be an output interface, a pin or a circuit, etc., and the display unit may be a graphics processing unit inside the chip;
  • the chip may also include a memory, which may be a memory in the chip (for example, a register or a cache) or a memory located outside the chip (for example, a read-only memory or a random access memory); the memory is used for storing computer program code, and when the processor executes the computer program code stored in the memory, the chip is caused to execute any one of the methods in the first aspect.
  • the processing unit is configured to record a first video in response to a first user operation
  • the first interface is the playing interface of the first video
  • the first interface includes a first control and a first area
  • the first area displays the thumbnail of the first photo and the thumbnail of the second photo
  • the first photo is automatically taken at the first moment
  • the second photo is automatically taken at the second moment
  • the recording process of the first video includes the first moment and the second moment
  • the first video includes a first video clip and a second video clip
  • the first video clip is a first scene
  • the second video clip is a second scene
  • the first photo is a photo in the first video clip
  • the second photo is a photo in the second video clip
  • the score of the first photo is greater than a first threshold
  • the score of the second photo is greater than a second threshold
  • in response to the second operation on the first control, the display unit is called to display a second interface; the second interface is a playback interface of a second video, the duration of the second video is shorter than the duration of the first video, and the second video includes at least the first photo.
  • the first video segment further includes a third photo
  • the third photo is automatically taken at a third moment
  • the score of the third photo is greater than the first threshold
  • a transition occurs between the first video segment and the second video segment.
  • the first photo is of a first type of action
  • the second photo is of a second type of action
  • the first photo is a landscape
  • the second photo is a person
  • the processing unit is further configured to obtain a score of a fourth photo before automatically taking the first photo, where the score of the fourth photo is less than or equal to the first threshold and greater than a third threshold; and to update the value of the third threshold to the score of the third photo.
  • the second video segment further includes the fifth photo, and the fifth photo is automatically taken when a transition occurs.
  • the first area further includes a thumbnail of the fifth photo.
  • the time between the transition and the previous transition is greater than a time threshold.
  • the third threshold is smaller than the second threshold.
  • the processing unit is further configured to call the display unit to display a third interface in response to a third operation of the user, the third interface is an interface of a gallery application, and the third interface includes a second control;
  • Calling the display unit by the processing unit to display the first interface specifically includes: calling the display unit to display the first interface in response to a fourth operation on the second control.
  • the third interface further includes a first prompt window, and the first prompt window is used to prompt the user that the first photo and the second photo have been generated.
  • the brightness of the first prompt window and the brightness of the first area are higher than the brightness of areas other than the first area and the first prompt window in the first interface .
  • the processing unit is further configured to:
  • in response to a fifth user operation, stop recording the first video and call the display unit to display a fourth interface, where the fourth interface includes a preview thumbnail option;
  • in response to the user's sixth operation on the preview thumbnail option, call the display unit to display the third interface.
  • the fourth interface further includes a second prompt window, and the second prompt window is used to prompt the user that the first photo, the second photo, and the second video have been generated.
  • the processing unit is further configured to, before recording the first video, enable the one-record-multiple-get function in response to a seventh operation of the user.
  • the first interface further includes a playback progress bar.
  • the second interface further includes a music control; the processing unit is further configured to call the display unit to display a plurality of different music options in response to the user's eighth operation on the music control .
  • the second interface further includes a style control; the processing unit is further configured to call the display unit to display a plurality of different style options in response to the user's ninth operation on the style control .
  • the gallery application includes a first photo album
  • the first photo album includes the first photo and the second photo.
  • the first photo album further includes a virtual video of the second video.
  • the second interface further includes: a share control or a save control; and the processing unit is further configured to: generate the second A video file of the video; storing the video file in the first photo album.
  • the storage space occupied by the video file is larger than the storage space occupied by the virtual video.
  • the first interface further includes a delete option
  • the processing unit is further configured to: in response to the user's eleventh operation on the delete option, call the display unit to display a third prompt window, where the third prompt window is used to prompt the user whether to delete the second video, the first photo, and the second photo.
  • the processing unit is further configured to automatically delete the first photo if no user operation to view the first photo is received after N days.
  • the second video further includes the second photo.
  • the first moment is determined based on a first time tag.
  • the second video further includes a nearby image frame of the first photo, and the nearby image frame is determined based on the first time tag;
  • the nearby image frames include the image frames corresponding to the first A moments of the first time stamp and the image frames corresponding to the last B moments of the first time stamp, A is greater than or equal to 1, and B is greater than or equal to 1.
  • the first time tag is determined based on first-level information, second-level information, and third-level information
  • the first-level information is used to characterize the theme or scene of the video
  • the second-level information is used to represent the scene change of the video
  • the third-level information is used to represent the wonderful moment.
  • the image frame corresponding to the moment when a transition occurs is removed from the second video, and the transition refers to a scene change.
  • the processing unit is further configured to generate a request message in response to the first operation, where the request message includes a first identifier, and the first photo and the second video are associated through the first identifier.
  • the resolution of the first photo is greater than the resolution of the image captured in the first video.
  • a computer-readable storage medium stores computer program code; when the computer program code is run by an electronic device, the electronic device is caused to execute any one of the methods in the first aspect.
  • a computer program product comprises computer program code; when the computer program code is run by an electronic device, the electronic device is caused to execute any one of the methods in the first aspect.
  • Fig. 1 is a schematic diagram of a hardware system applicable to an electronic device of the present application;
  • Fig. 2 is a schematic diagram of an example of enabling "one record, many benefits" provided by this application;
  • Fig. 3 is a schematic diagram of another example of enabling "one record, many benefits" provided by this application;
  • Fig. 4 is a schematic diagram of another example of enabling "one record, many benefits" provided by this application;
  • Fig. 5 is a schematic diagram of an example of a "one record, multiple results" graphical user interface (GUI) provided by the present application;
  • Fig. 6 is a schematic diagram of another example of a "one record, multiple results" graphical user interface (GUI) provided by the present application;
  • Fig. 7 is a schematic diagram of another example of a "one record, multiple results" graphical user interface (GUI) provided by the present application;
  • Fig. 8 is a schematic diagram of a software system applicable to the electronic device of the present application;
  • Fig. 9 is an example diagram of the decision logic of the LV0-3 layers of the embodiment of the present application;
  • Fig. 10 is an example diagram of acquiring LV0-3 level information based on the data stream;
  • Fig. 11 is an example diagram of the camera logic of the embodiment of the present application;
  • Fig. 12 is an example diagram of the LV0-3 level information;
  • Figs. 13 to 16 are schematic diagrams of interfaces at different moments when recording a video provided by this application;
  • Fig. 17 is a schematic diagram of time tags related to the interfaces at different moments when recording a video provided by this application;
  • Fig. 18 is a schematic flowchart of a video processing method provided by an embodiment of the present application;
  • Fig. 19 is a schematic diagram of transition frames during fast camera movement provided by the present application;
  • Fig. 20 is a schematic diagram of an example of the work of the MM node provided by the present application.
  • MM may be the best motion moment, the best expression moment, or the best check-in action. It can be understood that this application does not limit the term MM; MM can also be called a beautiful moment, a magical moment, a wonderful moment, a decisive moment, or the best shot (BS). In different scenes, wonderful moments can be different types of picture moments.
  • for example, when recording a football game video, the wonderful moment can be the moment when a player's foot touches the football during a shot or pass, or the moment the football flies into the goal; when recording a video of a person jumping from the ground, the wonderful moment can be the moment the person is at the highest point in the air, or the moment the person's body is most stretched in the air.
  • an MM tag, that is, a time tag, is used to indicate the location of a wonderful moment in the recorded video file.
  • one or more MM tags are included in the video file, and the MM tag can indicate that at the 10th second, the 1st minute and 20th second, etc. of the video file, the corresponding image frame in the video file is a wonderful moment.
  • the MM node is used to analyze the captured video stream, identify or make decisions about the wonderful moments, and automatically trigger the photo taking when the exciting moments are identified.
  • the MM node is also called the MM decision engine, BS decision engine, MM decision module, etc.; these terms all refer to the functions of the MM node described above.
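The MM node's behavior as described above can be sketched as a scan over per-frame scores that records an MM tag (a time position in the video file) whenever a frame qualifies as a wonderful moment. The scoring, threshold, and one-score-per-second granularity are illustrative assumptions.

```python
# Toy sketch of the MM decision: whenever a frame's score clears the bar,
# record an MM tag at that position in the video; in the real system this
# decision would also trigger an automatic photo capture.

def mm_decide(frame_scores, threshold):
    """Return MM tags: the timestamps (in seconds) of wonderful moments."""
    tags = []
    for second, score in enumerate(frame_scores):
        if score > threshold:
            tags.append(second)  # in practice this also triggers a photo
    return tags

# One score per second of video; seconds 10 and 80 clear the bar, matching
# the example of tags at the 10th second and later in the file.
scores = [0.1] * 100
scores[10], scores[80] = 0.9, 0.85
print(mm_decide(scores, 0.8))
```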
  • "one record, multiple results" can be understood as a function whereby, when the user uses the camera application to shoot a video, pressing the "shoot" icon once yields one or more photos of exciting moments and one or more selected videos.
  • the realization process can be as follows: the MM node automatically recognizes wonderful moments during the recording process and triggers captures to obtain MM photos; after the recording ends, photos of the wonderful moments and a wonderful short video (also called a featured short video, wonderful video, or featured video) can be recommended to the user. It is understandable that the duration of the wonderful short video obtained through one recording is shorter than the duration of the entire complete video.
  • this function can also have other names, such as one-click multi-shot, one-click movie, one-click blockbuster, AI one-click blockbuster, etc.
  • this application introduces the "one record, multiple results" mode; that is, when recording a video in the video recording mode, the video stream is analyzed to automatically identify wonderful moments, and when a wonderful moment is identified, the camera is automatically triggered to capture a photo of that moment.
  • after the video recording is completed, the user can view photos of exciting moments and wonderful short videos in the gallery.
  • the video processing method of the embodiment of the present application obtains photos of exciting moments with higher image quality and provides a better user experience.
  • the video processing method provided in the embodiment of the present application may be applicable to various electronic devices.
  • the electronic device may be a mobile phone, a smart screen, a tablet computer, a wearable electronic device, a vehicle electronic device, an augmented reality (augmented reality, AR) device, a virtual reality (virtual reality, VR) device , notebook computer, ultra-mobile personal computer (ultra-mobile personal computer, UMPC), netbook, personal digital assistant (personal digital assistant, PDA), projector and so on.
  • FIG. 1 shows a schematic structural diagram of an electronic device 100 provided in an embodiment of the present application.
  • Fig. 1 shows a hardware system applicable to the electronic equipment of this application.
  • the electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (universal serial bus, USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, and an antenna 2 , mobile communication module 150, wireless communication module 160, audio module 170, speaker 170A, receiver 170B, microphone 170C, earphone jack 170D, sensor module 180, button 190, motor 191, indicator 192, camera 193, display screen 194, and A subscriber identification module (subscriber identification module, SIM) card interface 195 and the like.
  • the sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, bone conduction sensor 180M, etc.
  • the structure shown in FIG. 1 does not constitute a specific limitation on the electronic device 100 .
  • the electronic device 100 may include more or fewer components than those shown in FIG. 1 , or the electronic device 100 may include a combination of some of the components shown in FIG. 1 , or , the electronic device 100 may include subcomponents of some of the components shown in FIG. 1 .
  • the components shown in Fig. 1 can be realized in hardware, software, or a combination of software and hardware.
  • Processor 110 may include one or more processing units.
  • the processor 110 may include at least one of the following processing units: an application processor (application processor, AP), a modem processor, a graphics processing unit (graphics processing unit, GPU), an image signal processor (image signal processor) , ISP), controller, video codec, digital signal processor (digital signal processor, DSP), baseband processor, neural network processor (neural-network processing unit, NPU).
  • the controller can generate an operation control signal according to the instruction opcode and timing signal, completing the control of instruction fetching and instruction execution.
  • a memory may also be provided in the processor 110 for storing instructions and data.
  • the memory in processor 110 is a cache memory.
  • the memory may hold instructions or data that the processor 110 has just used or uses cyclically. If the processor 110 needs to use the instruction or data again, it can be called directly from the memory; this avoids repeated access and reduces the waiting time of the processor 110, thereby improving the efficiency of the system.
  • processor 110 may include one or more interfaces.
  • the processor 110 may include at least one of the following interfaces: an inter-integrated circuit (inter-integrated circuit, I2C) interface, an inter-integrated circuit sound (inter-integrated circuit sound, I2S) interface, a pulse code modulation (pulse code modulation, PCM) interface, universal asynchronous receiver/transmitter (UART) interface, mobile industry processor interface (MIPI), general-purpose input/output (GPIO) interface, SIM interface, USB interface.
  • the electronic device 100 can realize the display function through the GPU, the display screen 194 and the application processor.
  • the GPU is a microprocessor for image processing, and is connected to the display screen 194 and the application processor. GPUs are used to perform mathematical and geometric calculations for graphics rendering.
  • Processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
  • Display 194 may be used to display images or video.
  • the display screen 194 includes a display panel.
  • the display panel can be a liquid crystal display (liquid crystal display, LCD), an organic light-emitting diode (organic light-emitting diode, OLED), an active-matrix organic light-emitting diode (active-matrix organic light-emitting diode, AMOLED), a flexible light-emitting diode (flex light-emitting diode, FLED), a mini light-emitting diode (mini light-emitting diode, Mini LED), a micro light-emitting diode (micro light-emitting diode, Micro LED), a micro OLED (Micro OLED), or quantum dot light-emitting diodes (quantum dot light emitting diodes, QLED).
  • the electronic device 100 may include 1 or N display screens 194 , where N is a positive integer greater than 1.
  • the display screen 194 can be used to display the photos of the wonderful moments (MM) and the selected wonderful short videos.
  • the electronic device 100 can realize the shooting function through the ISP, the camera 193 , the video codec, the GPU, the display screen 194 , and the application processor.
  • the ISP is used for processing the data fed back by the camera 193 .
  • light is transmitted through the lens to the photosensitive element of the camera, where the light signal is converted into an electrical signal; the photosensitive element of the camera transmits the electrical signal to the ISP for processing, which converts it into an image visible to the naked eye.
  • the ISP can run algorithm optimizations on image noise, brightness, and color, and the ISP can also optimize parameters such as the exposure and color temperature of the shooting scene.
  • the ISP may be located in the camera 193 .
  • Camera 193 is used to capture still images or video.
  • the object generates an optical image through the lens and projects it to the photosensitive element.
  • the photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor.
  • the photosensitive element converts the light signal into an electrical signal, and then transmits the electrical signal to the ISP to convert it into a digital image signal.
  • the ISP outputs the digital image signal to the DSP for processing.
  • the DSP converts the digital image signal into an image signal in a standard format such as RGB (red green blue) or YUV.
  • the electronic device 100 may include 1 or N cameras 193 , where N is a positive integer greater than 1.
  • the processor 110 may determine the wonderful moment MM in the video stream based on the video stream recorded by the camera 193, and when the MM is determined, call the camera 193 to automatically trigger a photo shoot.
  • ISP and DSP can process the image signal of the wonderful moment MM to obtain the image of the wonderful moment.
  • Digital signal processors are used to process digital signals. In addition to digital image signals, they can also process other digital signals.
  • Video codecs are used to compress or decompress digital video.
  • the electronic device 100 may support one or more video codecs.
  • the electronic device 100 can play or record videos in various encoding formats, for example: moving picture experts group (moving picture experts group, MPEG) 1, MPEG2, MPEG3 and MPEG4.
  • NPU is a processor that draws on the structure of biological neural networks. For example, it can quickly process input information by drawing on the transmission mode between neurons in the human brain, and it can also continuously learn by itself. Functions such as intelligent cognition of the electronic device 100 can be implemented through the NPU, such as image recognition, face recognition, speech recognition and text understanding.
  • the external memory interface 120 can be used to connect an external memory card, such as a secure digital (secure digital, SD) card, so as to expand the storage capacity of the electronic device 100 .
  • the internal memory 121 may be used to store computer-executable program codes including instructions.
  • the internal memory 121 may include an area for storing programs and an area for storing data.
  • the storage program area can store an operating system and an application program required by at least one function (for example, a sound playing function and an image playing function).
  • the storage data area can store data created during the use of the electronic device 100 (for example, audio data and phonebook).
  • the internal memory 121 may include a high-speed random access memory, and may also include a non-volatile memory, for example: at least one magnetic disk storage device, flash memory device, and universal flash storage (universal flash storage, UFS), etc.
  • the processor 110 executes various processing methods of the electronic device 100 by executing instructions stored in the internal memory 121 and/or instructions stored in a memory provided in the processor.
  • the electronic device 100 can implement audio functions, such as music playing and recording, through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the earphone interface 170D, and the application processor.
  • the touch sensor 180K is also referred to as a touch device.
  • the touch sensor 180K may be disposed on the display screen 194, and the touch sensor 180K and the display screen 194 form a touchscreen, also referred to as a "touch-controlled screen".
  • the touch sensor 180K is used to detect a touch operation on or near it.
  • the touch sensor 180K may transmit the detected touch operation to the application processor to determine the touch event type.
  • Visual output related to the touch operation can be provided through the display screen 194 .
  • the touch sensor 180K may also be disposed on the surface of the electronic device 100 and disposed at a different position from the display screen 194 .
  • Keys 190 include a power key and a volume key.
  • the key 190 can be a mechanical key or a touch key.
  • the electronic device 100 may receive a key input signal, and implement a function related to the key input signal.
  • the following takes a mobile phone as the electronic device, with a camera application installed in the mobile phone, as an example to introduce in detail the video processing method provided by this application.
  • the user can manually enable or disable the "one-record-multiple" function provided in the embodiments of the present application.
  • the following describes the entry to the one-record-multiple function with reference to FIG. 2 to FIG. 4.
  • FIG. 2 is a schematic diagram of a graphical user interface (graphical user interface, GUI) of an example of a video processing method provided by an embodiment of the present application.
  • the user can instruct the mobile phone to start the camera application by touching a specific control on the screen of the mobile phone, pressing a specific physical key or combination of keys, inputting voice, gestures in the air, and the like.
  • the mobile phone starts the camera and displays a shooting interface.
  • the screen display system of the mobile phone displays currently output interface content, and the interface content displays a variety of application programs (applications, Apps).
  • the user can click the "camera" application icon 401 on the desktop of the mobile phone to instruct the mobile phone to open the camera application, and the mobile phone displays a shooting interface as shown in (2) in FIG. 2 .
  • when the mobile phone is in the locked-screen state, the user can also instruct the mobile phone to open the camera application by sliding right (or left) on the mobile phone screen, and the mobile phone can likewise display the shooting interface shown in (2) in Figure 2.
  • the user can click the shortcut icon of the "Camera" application on the lock screen interface to instruct the mobile phone to open the camera application, and the mobile phone can also display the shooting interface as shown in (2) in Figure 2.
  • the user can also click the corresponding control to make the mobile phone start the camera application to take pictures.
  • the user can also instruct the mobile phone to open the camera application to take pictures and videos by selecting the control of the camera function.
  • the shooting interface of the camera generally includes a viewfinder frame 402 , a camera control, a video control and other functional controls (such as a portrait function control, a night scene function control or more other controls).
  • the user can start the recording mode by clicking the recording control, and the mobile phone can display the recording interface as shown in (3) in FIG. 2 .
  • the user can enter the setting interface by clicking "Settings", and the mobile phone displays the interface as shown in (4) in Figure 2.
  • the interface shown in (4) in FIG. 2 displays an option 404 for enabling "one-record-multiple", which is used to turn on the one-record-multiple function.
  • when the user turns on this function and the mobile phone is in the video recording mode, the mobile phone will automatically adopt the video processing method provided by the embodiment of the present application, intelligently identify wonderful moments while recording the video, and automatically generate wonderful photos and a wonderful short video.
  • the user can also manually turn off the one-record-multiple function in the video recording mode through the option 404 .
  • the setting interface shown in (4) in FIG. 2 may also include a minimum time limit control 405 .
  • the minimum time setting control is used to limit the minimum recording duration for which the one-record-multiple function can take effect. If the recording duration of a video is shorter than this minimum, the one-record-multiple files for that video will not be generated.
  • the minimum time limit can be set to 15s; when the user's shooting time is less than 15s, no one-record-multiple photos will be generated.
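The minimum-time limit described above amounts to a simple duration gate. As a hypothetical sketch (the function name and the inclusive >= boundary are assumptions, not stated in this application):

```python
MIN_DURATION_S = 15  # minimum recording time set via control 405 above

def one_record_multiple_applies(duration_s: float) -> bool:
    """Whether a recording is long enough to produce one-record-multiple files.

    Recordings shorter than the configured minimum yield no
    wonderful-moment photos or short video.
    """
    return duration_s >= MIN_DURATION_S

print(one_record_multiple_applies(10))            # False: shorter than 15 s
print(one_record_multiple_applies(16 * 60 + 15))  # True: the 16 min 15 s example
```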
  • the setting interface shown in (4) in Figure 2 may also include other controls related to video recording settings, for example, a video resolution setting control and a video frame rate setting control; the controls shown in (4) in Figure 2 are exemplary descriptions only.
  • the video resolution setting control shown in (4) in Figure 2 above can be used to select the video resolution. It should be understood that the options for video resolution depend on the specific configuration of the handset. For example, the video resolution can be 3840*2160 (Ultra HD 4K), 1920*1080 (1080p Full HD), 1280*720 (720P HD), etc.
  • the video resolution of the mobile phone can be set to 1920*1080.
  • the video resolution is 1080P with a 16:9 aspect ratio.
  • the resolution of normal photo taking is 4096*3072.
  • the resolution of the photos of the wonderful moments automatically captured during the recording process is 4096*2304, while the resolution of the image frames captured during the recording process is 1920*1080. Therefore, from the perspective of image resolution, when the embodiment of the present application recognizes a wonderful moment, the resolution of the automatically captured photo of the wonderful moment is obviously better than the resolution of the image frame intercepted during the video recording. In other words, the resolution of photos of highlights is higher than the resolution of photos captured in video by conventional means.
  • the quality of automatically snapped photos of highlights is better than the quality of photos captured in video by conventional means.
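As a quick arithmetic check of the resolution comparison above (using the example figures quoted in this description):

```python
# Example resolutions quoted above.
snapshot_px = 4096 * 2304   # auto-captured wonderful-moment photo
frame_px = 1920 * 1080      # image frame intercepted from the recording

print(snapshot_px)                       # 9437184 (~9.4 megapixels)
print(frame_px)                          # 2073600 (~2.1 megapixels)
print(round(snapshot_px / frame_px, 2))  # 4.55
```

The auto-captured photo thus carries roughly 4.5 times as many pixels as a frame intercepted from the 1080P recording, consistent with the claim that its resolution is obviously better.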
  • the quality of the photos of the automatically captured wonderful moments may also depend on other factors, for example, the quality of the photos processed by the camera algorithm in the photo pipeline mode component will be better.
  • the camera algorithm in the photo pipeline involved in the embodiment of the present application will be described in detail in Figure 8 below.
  • the setting interface shown in (4) in FIG. 2 may also include controls related to photographing settings, for example, setting controls for photo ratio, setting controls for taking pictures with gestures, setting controls for capturing smiley faces, and the like.
  • the interface shown in (5) in FIG. 2 shows a picture during the recording process (for example, the picture at the 10th second).
  • the interface shown in (5) in FIG. 2 includes a recording stop control 406 , a recording pause control 407 and a camera key 408 .
  • the user can click the camera button 408 to manually capture a photo.
  • the user can click the button 406 to stop recording at 16 minutes and 15 seconds to end the recording process, and a video with a duration of 16 minutes and 15 seconds can be obtained.
  • in response to detecting that the user clicks the "more" control in the interface shown in (2) in Figure 3, the mobile phone displays the interface shown in (3) in Figure 3.
  • the "one record multiple" control 501 can also be displayed in the interface as shown in (2) in Figure 3, that is, in the same column as the camera control and video control, and the user can select the one record multiple mode by sliding the control left and right.
  • (1) in FIG. 3 is the same as the interface shown in (1) in FIG. 2 , and will not be repeated here.
  • the way from (1) in FIG. 3 to (2) in FIG. 3 is also similar to the way from (1) in FIG. 2 to (2) in FIG. 2 , and for the sake of brevity, details are not repeated here.
  • the video recording mode of the mobile phone can be set to the "one-record-multiple" mode in the settings menu.
  • in response to detecting that the user clicks the control 601, the mobile phone displays the setting interface 602 shown in (2) in FIG. 4.
  • the user can set the control 603 in the interface to enter the camera setting interface 604 as shown in (3) in FIG. 4 .
  • a control 605 is displayed in the camera setting interface 604 for enabling the one-record-multiple function. That is to say, when the user turns on this function and the mobile phone is in the video recording mode, the mobile phone will automatically adopt the video processing method provided by the embodiment of the present application, automatically judge the wonderful moments while recording the video, trigger snapshots, and automatically save the photos of wonderful moments and the wonderful short videos obtained under the one-record-multiple function.
  • the user can also manually turn off the one-record-multiple function in the video recording mode through the control 605 .
  • the mobile phone may enable the "one-record-multiple" function by default in the video recording mode; this application is not limited in this respect.
  • clicking an icon with the user's finger may mean the finger touching the icon, or it may also be regarded as clicking when the distance between the user's finger and the icon is less than a certain threshold (for example, 0.5 mm).
  • the "one-record-multiple" function of the mobile phone can thus be turned on. After the mobile phone turns on the above-mentioned function, the video recorded by the user and the one-record-multiple files related to the video can be viewed in the gallery, as described below in conjunction with FIG. 5.
  • Fig. 5 is a schematic diagram of a graphical user interface (graphical user interface, GUI) related to "one record, multiple results" provided by the embodiment of the present application.
  • the user can instruct the mobile phone to open the gallery application by touching a specific control on the screen of the mobile phone, pressing a specific physical key or combination of keys, inputting voice, gestures in the air, and the like.
  • the Gallery app is also known as Albums, Photos, etc.
  • the mobile phone displays a photo interface.
  • the screen display system of the mobile phone displays the currently output interface content, and the interface content displays a variety of application programs App.
  • the user can click the "Gallery" application icon 301 on the desktop of the mobile phone to instruct the mobile phone to open the gallery application, and the mobile phone displays an interface as shown in (2) in FIG. 5 .
  • the photos and videos that the user has shot are displayed in the interface, for example, the video 302 shot by the user (such as the 16-minute-15-second video obtained in (6) in Fig. 2), the photo 303, and the video 304 (12 seconds long).
  • Photos and videos taken by users can be sorted by shooting time.
  • the videos and photos in the interface shown in (2) in Figure 5 are arranged as thumbnails, and the video 302 shot by the user (such as the 16-minute-15-second video obtained in (6) in Figure 2) is the most recently recorded video.
  • the one-record-multiple function is enabled.
  • the interface shown in (2) in FIG. 5 displays all photos and videos taken by the user (or in other words, the photos and videos taken are presented in an unclassified manner in the gallery application).
  • the gallery application may include multiple photo albums, and the multiple photo albums are used for classifying and storing files such as videos, screen captures, and my movies.
  • the multiple photo albums include a photo album used to save a recorded video.
  • the photo album used to save one-record-multiple videos can be named the one-record-multiple album. Photos of wonderful moments and wonderful short videos associated with the original recorded video can also be saved in the one-record-multiple album.
  • the wonderful short video saved in the album is a virtual video.
  • a virtual video refers to an entry for which no actual video file is generated; only a data file describing the video exists.
  • a virtual video may be, for example, playback logic described in XML.
  • the virtual video will also have a corresponding video thumbnail in the one-record-multiple album. Since the virtual video is not an actually generated video file, the memory space occupied by the virtual video is smaller than that of an actually generated video file. For example, an actually generated video file may occupy 5 MB, while the virtual video occupies 30 KB.
  • an actually generated wonderful short video file will also be saved in the one-record-multiple album.
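The description above says only that a virtual video may be XML playback logic, without giving a schema. Purely as an illustration, such playback logic can be sketched as a tiny document that references clip ranges in the original file rather than containing any frames (all element and attribute names here are hypothetical):

```python
import xml.etree.ElementTree as ET

# Hypothetical playback logic for a virtual wonderful short video:
# it references segments of the original recording instead of storing frames.
playback = ET.Element("playback", src="VID_original.mp4")
for start, end in [("5:09", "5:11"), ("7:20", "7:22"), ("10:03", "10:05"),
                   ("13:13", "13:15"), ("15:08", "15:10")]:
    ET.SubElement(playback, "clip", start=start, end=end)

data = ET.tostring(playback)
print(data.decode())
# The whole description is well under a kilobyte, versus megabytes
# for an actually generated video file.
print(len(data) < 1024)  # True
```

This illustrates why the virtual video's footprint (tens of kilobytes at most, with thumbnail) is far smaller than an actual video file's.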
  • the thumbnail of the video 302 shown in (2) in Figure 5 and the thumbnails of the other photos and videos in the gallery (referring to videos recorded without the one-record-multiple function enabled) may be of different sizes or the same size; this is not limited here.
  • the thumbnail of the recorded video can be larger than the thumbnails of other photos and videos in the gallery (referring to videos recorded without the one-record-multiple function).
  • the thumbnail of the video 302 is larger than the thumbnail of the photo 303
  • the thumbnail of the video 302 is also larger than the thumbnail of the video 304 .
  • the thumbnail of the recorded video after the one-record-multiple function is turned on should be the same size as other thumbnails.
  • when the mobile phone displays the interface shown in (2) in Figure 5, the video 302, as the most recently recorded video, can be played automatically for the user to preview. It can be understood that when the video 302 is previewed, it is not played in full screen; the preview window follows the window size of the thumbnail, that is, the thumbnails of the other photos and videos can still be seen while the video 302 is being previewed.
  • the video 302 can be automatically played, that is, the video 302 is played in full screen for the user to view.
  • the video 302 will not be played automatically.
  • Another optional way to play the video 302 is that the user can click the video 302 shown in (2) in FIG. 5 to view it. After clicking the video 302, the mobile phone displays an interface as shown in (3) in FIG. 5, and a play button 305 appears on the screen. When the user clicks the play button 305, the mobile phone starts to play the video 302, and the mobile phone displays an interface as shown in (4) in FIG. 5 .
  • the video 302 is in a playing state (for example, the video is played to the third second).
  • the user can trigger, with a certain gesture, the mobile phone to present the interface of wonderful photos and wonderful short videos obtained by "one-record-multiple".
  • the triggering gesture may be a gesture in which the user slides upward from the bottom of the screen. Users can swipe up on the screen to enter the one-record-multiple interface. It can be understood that this application does not limit how the one-record-multiple interface is entered, and the user may also use other UX interactions to enter it.
  • the mobile phone displays an interface as shown in (6) in FIG. 5 .
  • a part of the preview image of the video 302 is displayed at the top of the screen. At this time, if the user slides the screen downward with a finger, the interface will return to the playback interface of the video 302 .
  • a pause control 307 and a speaker control 308 are also included.
  • the pause control 307 is used to pause playing the video; the speaker control 308 is used to select whether to play the video silently.
  • an image frame queue arranged by time is displayed below the video, which is used to display the progress of the current video playback and allows users to view the image frames that are about to be played.
  • the interface shown in (5) in FIG. 5 also includes options such as share, favorite, edit, delete, and more. If the user clicks share, the video 302 can be shared; if the user clicks favorite, the video 302 can be stored in a folder; if the user clicks edit, the video 302 can be edited; if the user clicks delete, the video 302 can be deleted; if the user clicks More, you can enter other operating functions on the video (such as moving, copying, adding notes, hiding, renaming, etc.).
  • the mobile phone presents the interface of photos and short videos of the wonderful moments obtained by "one-record-multiple"; the interface presents to the user a recommended 15-second wonderful short video 309, four high-quality photos of wonderful moments (310, 311, 312, 313), and a collage 314.
  • the 15-second wonderful short video 309 consists of wonderful moments.
  • the image frames included in the 15-second wonderful short video 309 are all intercepted from the complete video of 16 minutes and 15 seconds.
  • the interception here does not refer to the conventional operation of taking a screenshot (or intercepting an image frame) from the 16-minute-15-second video.
  • the method for obtaining the 15-second wonderful short video 309 is described below.
  • the 15-second wonderful short video 309 is a video spliced from different clips in the 16 minutes and 15-second video, for example, the 15-second wonderful short video is spliced by the following multiple clips: 5 minutes 9 seconds to 5 minutes and 11 seconds, 7 minutes and 20 seconds to 7 minutes and 22 seconds, 10 minutes and 03 seconds to 10 minutes and 05 seconds, 13 minutes and 13 seconds to 13 minutes and 15 seconds, and 15 minutes 08 seconds to 15 minutes and 10 seconds.
  • the 15-second wonderful short video 309 may also be a continuous segment of the 16-minute-15-second video; for example, the 15-second wonderful short video consists of the segment from 10 minutes 3 seconds to 10 minutes 18 seconds. In this case, the image frames corresponding to the wonderful moments MM are all within the video from 10 minutes 3 seconds to 10 minutes 18 seconds.
  • the embodiment of the present application does not limit the duration and number of wonderful short videos.
  • the wonderful short video may be a 20-second wonderful short video, or two wonderful short videos, and the duration of the two wonderful short videos is 15 seconds and 20 seconds respectively.
  • the embodiment of the present application does not limit the number of photos of the wonderful moment MM, and there may be one or more photos of the wonderful moment MM, specifically, it may be one to four.
  • the collage 314 may be a collage composed of multiple wonderful-moment photos MM. It should be understood that the embodiment of the present application does not limit the number of wonderful-moment photos included in the collage 314, and the collage 314 may include some or all of the MM photos.
  • the lower part of the screen also includes a character tag map. If the user clicks on a certain character tag graph, the mobile phone will display photos related to the character tag graph (or display clusters of the character tag graph).
  • Immersive card play is a way of playing that fills the entire screen. It can be seen that in the interface shown in (7) in FIG. 5 , the picture fills the entire screen of the mobile phone.
  • in the interface shown in (7) in FIG. 5, if the user clicks on the screen, the interface shown in (8) in FIG. 5 is displayed.
  • the interface may include video playback progress bar 315 , share 316 , favorite 317 , edit 318 and delete 319 and other options. Through the progress bar 315, the user can know the progress of the video playback.
  • if the user clicks share 316, the mobile phone will generate and store, based on the MM tags, a video file corresponding to the wonderful short video 309 so that the user can share it. If the user clicks favorite 317, the mobile phone will save the wonderful short video 309 in the favorites folder, and there is no need to generate a video file corresponding to the wonderful short video 309 at this point. If the user clicks edit 318, the mobile phone will edit the wonderful short video 309; whether a video file of the wonderful short video 309 is generated may depend on the user's subsequent operations. For example, if the user needs to save the result, a video file of the edited wonderful short video 309 is generated and saved.
  • the share 316 in the interface shown in (8) in FIG. 5 is essentially different from the sharing options in the interface shown in (5) in FIG. 5 .
  • the share 316 in the interface shown in (8) in Fig. 5 is used for sharing the wonderful short video 309, and only after the user clicks share 316 in the interface shown in (8) in Fig. 5 does the mobile phone generate the video file of the wonderful short video 309 to be shared.
  • the sharing option in the interface shown in (5) in FIG. 5 is used to share the recorded original video (ie video 302 ).
  • the 15-second wonderful short video 309 displayed on interface (6) in FIG. 5, the video played in (7) in FIG. 5, and the video displayed on interface (8) in FIG. 5 are all played by the player according to a playback strategy generated based on the video tags.
  • the corresponding video files are not actually generated in the internal memory 121 of the mobile phone; that is, before the user issues a share or save instruction, no corresponding video file is stored in the memory.
  • the 15-second wonderful short video 309 shown in the interface of (6) in Figure 5, the video played in (7) in Figure 5, and the video displayed in the interface of (8) in Figure 5 can be generated in the following manner: the position of each wonderful moment in the complete video file is known through its MM tag, and a preview video is generated based on the positions of the MM tags in the video.
  • the first MM tag is at 5 minutes and 10 seconds
  • the second MM tag is at 7 minutes and 21 seconds
  • the third MM tag is at 10 minutes and 04 seconds
  • the fourth MM tag is 13 minutes and 14 seconds
  • the fifth MM tag is 15 minutes and 09 seconds.
  • the final 15-second wonderful short video consists of the following time segments: 5 minutes 9 seconds to 5 minutes 11 seconds, 7 minutes 20 seconds to 7 minutes 22 seconds, 10 minutes 03 seconds to 10 minutes 05 seconds, 13 minutes 13 seconds to 13 minutes 15 seconds, and 15 minutes 08 seconds to 15 minutes 10 seconds. It should be understood that the examples here are only illustrative, and the present application is not limited thereto.
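The mapping from MM tag positions to the clip list in this example can be sketched as follows; the three-second window (one second before each tag to one second after it) is inferred from the example times above and is not stated as the actual algorithm:

```python
def to_sec(minutes: int, seconds: int) -> int:
    # Convert a minutes:seconds position into a plain second count.
    return minutes * 60 + seconds

# MM tag positions from the example above.
tags = [to_sec(5, 10), to_sec(7, 21), to_sec(10, 4),
        to_sec(13, 14), to_sec(15, 9)]

# Assumed windowing: each clip spans the seconds from tag - 1 to tag + 1,
# i.e. three seconds of footage per tag (inclusive).
clips = [(t - 1, t + 1) for t in tags]
total_seconds = sum(end - start + 1 for start, end in clips)

print(clips[0])       # (309, 311) -> 5 min 09 s to 5 min 11 s
print(total_seconds)  # 15
```

With five tags and three seconds per clip, the spliced preview reaches the 15-second total mentioned above.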
  • the 15-second wonderful short video will actually be generated when the user has a need to share or save the 15-second wonderful short video.
  • the mobile phone when the user clicks share 316, the mobile phone generates an actual 15-second wonderful short video based on the playback strategy.
  • FIG. 6 is a schematic diagram of another GUI related to "one record, multiple results" provided by the embodiment of the present application.
  • (1) in Figure 6 presents an interface during the recording process, for example the picture at the 24th second. To end the recording, the user can click the stop control 901 in the interface shown in (1) in FIG. 6.
  • after the recording is finished, if the user has used the one-record-multiple function for video recording for the first time, the mobile phone will prompt the user that one-record-multiple files have been generated.
  • the interface shown in (2) in Figure 6 is the preview interface after the recording ends; a bubble window 902 pops up in the interface, and the content displayed in the window 902 is: "Wonderful photos and a short video have been generated by one-record-multiple".
  • the preview image 903 in (2) in FIG. 6 is a thumbnail display of the recorded original video. If the user clicks 903, the original recorded video can be displayed in the gallery.
  • the gallery can be started, and an interface as shown in (3) in FIG. 6 is displayed.
  • the interface shown in (3) in FIG. 6 is a presentation of the recorded original video in the gallery application.
  • a wonderful moment area 904 is included below the recorded video.
  • the wonderful moment area 904 is used to display the image frames of the wonderful moments.
  • the wonderful moment area 904 includes thumbnail images of 5 photos of wonderful moments.
  • the wonderful-moment photos included in the wonderful moment area 904 are similar to the high-quality wonderful-moment photos 310-313 shown in (6) in FIG. 5.
  • the photos of the wonderful moments included in the wonderful moment area 904 may include thumbnails of collages, or may not include thumbnails of collages.
  • the definition of the collage is similar to the collage 314 shown in (6) in FIG. 5, and will not be repeated here.
  • a guide frame 905 (or prompt box) appears in the interface. The guide frame 905 prompts the user with the following information: "One record gets more" intelligently captures multiple wonderful moments for you. That is to say, the guide frame 905 informs the user that what is included in the wonderful moment area 904 are thumbnails of photos of wonderful moments.
  • the guide frame 905 can be highlighted. At this time, in the interface shown in (3) in FIG. 6 , the display brightness of the parts other than the guide frame 905 and the wonderful moment area 904 can be lowered so as to highlight the guide frame 905 and the wonderful moment area 904. Of course, if the video is not viewed for the first time, the guide frame 905 will not appear.
  • the interface shown in (3) in FIG. 6 also includes options such as playback control 915 , share, favorite, edit, delete, and more, so that the user can perform corresponding operations on the original video.
  • the interface shown in (4) in FIG. 6 is an interface for playing recorded video, and the interface plays the picture of the 12th second.
  • the AI one-click blockbuster control 906 will be displayed to the user in the video playing interface.
  • the AI one-key blockbuster control 906 is used to enter the wonderful short video. That is to say, if the user clicks the control 906, the user will enter the playing interface of the wonderful short video, for example, the interface shown in (5) in FIG. 6 .
  • the interface shown in (5) in Figure 6 is the same as the interface shown in (2) in Figure 7 below, and related descriptions can refer to the following description.
  • the interface shown in (4) in FIG. 6 also includes a progress bar 907 .
  • the progress bar 907 shows that the recorded video has a duration of 56 seconds, and the current playback has reached 12 seconds.
  • the progress bar can also be called a slide bar, and the user can adjust the playback progress by dragging the slide bar.
  • the interface shown in (4) in FIG. 6 also includes a wonderful moment area 904, which is also used to display the image frames of the wonderful moment.
  • the interface shown in (4) in FIG. 6 also includes options such as share, favorite, edit, delete, and more.
  • the interface shown in (4) in FIG. 6 may also include the recording time of the video 903, the address and location information of the mobile phone when the video 903 was recorded, and the like.
  • Fig. 7 is a schematic diagram of another GUI related to "one record, multiple results" provided by the embodiment of the present application.
  • (1) in FIG. 7 is an interface for playing recorded video.
  • the interface includes options such as the currently playing video, AI one-key blockbuster control 906, progress bar 907, highlight moment area 904, share, favorite, edit, delete, and more.
  • the areas where 906, 907, and 904 are located in the interface may form a display frame that is highlighted.
  • One way of realizing the highlighted display is that the width of the display frame formed by the areas where 906, 907, and 904 are located may be greater than the width of the video being played.
  • In the interface shown in (1) in FIG. 7 , if the user clicks the control 906, the interface shown in (2) in FIG. 7 is displayed.
  • In the interface shown in (2) in FIG. 7 , a wonderful short video is being played.
  • the wonderful short video played here is played according to a playback strategy generated from the video tags.
  • the corresponding video file is not actually generated in the internal memory 121 of the mobile phone; that is, before the user issues a share or save instruction, there is no such video file in the memory.
  • the interface of (2) in FIG. 7 also includes a saving control 908 , a sharing control 909 , a music control 910 , an editing control 911 , a style control 912 and so on.
  • If the user clicks control 908 or control 909, the mobile phone will generate a video file of the 15-second wonderful short video.
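The "virtual video" behavior described above (no file exists until the user shares or saves) can be sketched as a small Python class. This is an illustrative sketch only; the class, field and method names are assumptions, not the actual implementation.

```python
from dataclasses import dataclass, field

@dataclass
class PlaybackStrategy:
    """Time ranges (in seconds) of the original video to play back."""
    segments: list  # e.g. [(3.0, 8.0), (20.5, 30.5)]

    def total_duration(self):
        return sum(end - start for start, end in self.segments)

@dataclass
class VirtualHighlightVideo:
    source_path: str
    strategy: PlaybackStrategy
    _materialized_path: str = field(default=None, repr=False)

    @property
    def is_materialized(self):
        # Before the user shares or saves, no video file exists in storage.
        return self._materialized_path is not None

    def save(self, out_path):
        # Only at this point would the phone actually encode the 15-second clip.
        if not self.is_materialized:
            self._materialized_path = out_path  # stand-in for real encoding
        return self._materialized_path

v = VirtualHighlightVideo("original.mp4",
                          PlaybackStrategy([(3.0, 8.0), (20.5, 30.5)]))
assert not v.is_materialized
assert v.strategy.total_duration() == 15.0
v.save("highlight.mp4")
assert v.is_materialized
```

The design point is that only the lightweight playback strategy (derived from the video tags) is stored until a share or save instruction arrives.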
  • If the user clicks the music control 910, the interface shown in (3) in FIG. 7 can be entered to add different soundtracks to the wonderful short video.
  • the user can click any soundtrack control in the dotted line box 913 to select a soundtrack for the wonderful short video, such as soothing, romantic, warm, comfortable, and quiet.
  • If the user clicks the style control 912, the interface shown in (4) in FIG. 7 can be entered to select different styles for this wonderful short video.
  • the video style here may be a filter; that is, the video is color-graded by applying a filter. A filter is a kind of video special effect used to realize various special effects of the video.
  • the video style here may also be video effects such as fast playback and slow playback.
  • the video style here may also refer to various themes, and different themes include their corresponding filters, music and other content.
  • If the user clicks the editing control 911, editing operations such as clipping, splitting, volume adjustment, and frame adjustment can be performed on the wonderful short video.
  • If the user saves the edited wonderful short video, the mobile phone can generate a corresponding edited video file. If the edited wonderful short video is discarded, that is, not saved, the mobile phone does not actually generate a video file and still only saves the virtual video in the one-record-multiple album.
  • the wonderful photos in one-record-multiple are actually stored. That is to say, if the one-record-multiple option is turned on, the wonderful photos automatically captured during the video recording process will be automatically stored in the gallery.
  • the wonderful photos automatically captured during the recording process will be saved in the gallery, such as the wonderful photos 310-313 shown in (6) in Figure 5, or the wonderful photos shown in the wonderful moment area 904 in (3) in Figure 6, (4) in Figure 6, or (1) in Figure 7.
  • N can be set in advance.
  • the photos 310-314 shown in (6) in FIG. 5 will be automatically deleted after being automatically retained for N days.
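The N-day automatic retention described above can be sketched as a simple filter over photo creation times. The following is an illustrative Python sketch; the function names and the example retention value are assumptions, not values specified in this application.

```python
from datetime import datetime, timedelta

def expired(created_at, now, retention_days):
    """True if a photo is older than the preset N-day retention period."""
    return now - created_at > timedelta(days=retention_days)

def purge(photos, now, retention_days=30):
    """Return only the photos that survive the retention check;
    expired one-record-multiple photos would be deleted automatically."""
    return [p for p in photos if not expired(p["created"], now, retention_days)]

now = datetime(2022, 9, 1)
photos = [
    {"name": "mm_310.jpg", "created": now - timedelta(days=40)},  # expired
    {"name": "mm_311.jpg", "created": now - timedelta(days=5)},   # kept
]
kept = purge(photos, now)
assert [p["name"] for p in kept] == ["mm_311.jpg"]
```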
  • a prompt message may be displayed to the user.
  • the prompt information is used to prompt the user whether to delete the one-record multiple file of the original video (or the wonderful photos and wonderful short videos associated with the original video).
  • the mobile phone can pop up a prompt window, which is used to prompt the user whether to delete the associated wonderful photos and wonderful short videos.
  • the wonderful short video can also be directly generated based on the MM tags and stored (that is, the wonderful short video is generated without requiring the user to click, share, or save it).
  • For the wonderful short video generated in this embodiment, when the user deletes the recorded original video, the user may also be prompted whether to delete the one-record-multiple files of the original video.
  • FIG. 8 is a schematic diagram of the software structure of the electronic device 100 according to the embodiment of the present application.
  • the layered architecture divides the software into several layers, and each layer has a clear role and division of labor. Layers communicate through software interfaces.
  • the Android system can be divided into five layers, which are application program (application, APP) layer, application program framework layer (abbreviated as FWK), system library, hardware abstraction layer (HAL) and driver layer.
  • the application layer can consist of a series of application packages.
  • the application layer includes a camera application and a gallery application.
  • the camera application supports video recording mode (or movie mode).
  • the application layer can be divided into application interface (UI) and application logic.
  • the application interface of the camera may include a video recording mode, a movie mode, and the like.
  • the application logic includes the following modules: capture flow (CaptureFlow), video tag (Video TAG), wonderful moment MM, capture photo callback function (OnPictureTaken), manual capture JPEG, one-record-multiple JPEG, etc.
  • CaptureFlow supports manual capture operations triggered by users.
  • Video TAG is used to save the time information of the wonderful moment MM tag sent by the framework layer, and the description of the semantic information (including LV0-LV3) of the wonderful moment.
  • the description of the semantic information of the wonderful moment includes but is not limited to: the type of the wonderful moment (for example, the type of the wonderful moment is a smile, a jump, a look back, a goal moment, etc.), and the score of the wonderful moment.
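The information the Video TAG module is said to record per wonderful moment (time position, moment type, score, and the LV0-LV3 semantic description) can be sketched as a small record type. The field names below are illustrative assumptions, not the actual data layout.

```python
from dataclasses import dataclass

@dataclass
class HighlightTag:
    timestamp_ms: int   # time position of the wonderful moment in the video
    moment_type: str    # e.g. "smile", "jump", "look back", "goal moment"
    score: float        # score of the wonderful moment
    semantics: dict     # LV0-LV3 descriptions, e.g. {"LV0": "sports", ...}

tags = [
    HighlightTag(12_000, "jump", 0.91, {"LV0": "sports", "LV3": "highest point"}),
    HighlightTag(24_500, "smile", 0.78, {"LV0": "characters", "LV3": "laughing"}),
]
# A playback strategy for the short video could then pick the best moments:
best = max(tags, key=lambda t: t.score)
assert best.moment_type == "jump"
```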
  • OnPictureTaken is a callback function used to call back image data.
  • OnPictureTaken in the application logic layer can be used to call back the manually captured image data.
  • the manual capture JPEG module in the application logic layer is used to generate a manually captured JPEG image based on the image data called back by OnPictureTaken.
  • The wonderful moment MM module is used to save the one-record-multiple JPEG queue data.
  • the one-record-multiple JPEG queue data can be transmitted to the one-record-multiple JPEG module, so that the one-record-multiple JPEG module can generate one-record-multiple JPEG images.
  • One-record-multiple JPEGs can be presented in the gallery as: 310-313 shown in (6) in Figure 5, or the wonderful moment area 904 in (3) in Figure 6, (4) in Figure 6, or (1) in Figure 7.
  • the application layer may also include other applications, such as calendar, call, map, navigation, WLAN, Bluetooth, music, video, short message, browser, WeChat, Alipay, Taobao and other applications.
  • the application framework layer provides an application programming interface (application programming interface, API) and a programming framework for applications in the application layer.
  • the application framework layer can include some predefined functions.
  • the application framework layer may include a camera framework (or an interface corresponding to the camera application) and a private camera path.
  • the private camera channel is used to transfer the data of the image to the corresponding modules of the application layer.
  • the one-record-multiple JPEG queue is transmitted to the wonderful moment MM module of the application layer through a private photo path, and the photos of the wonderful moment MM are presented in the gallery application, for example, 310-313 as shown in (6) in Figure 5, or 904 as shown in (3) in FIG. 6 , or 904 as shown in (1) in FIG. 7 .
  • the data of the manually captured image is transmitted to the OnPictureTaken module of the application layer through a private camera path.
  • application framework layer may also include other contents, such as window manager, content provider, view system, phone manager, resource manager, notification manager and so on.
  • a window manager is used to manage window programs.
  • the window manager can get the size of the display screen, determine whether there is a status bar, lock the screen, capture the screen, etc.
  • Content providers are used to store and retrieve data and make it accessible to applications.
  • the data may include video, images, audio, calls made and received, browsing history and bookmarks, and phonebook.
  • the view system includes visual controls, such as controls that display text and controls that display pictures.
  • the view system can be used to build applications.
  • the display interface may be composed of one or more views, for example, a display interface including an SMS notification icon may include a view for displaying text and a view for displaying pictures.
  • the application layer and the application framework layer run in virtual machines.
  • the virtual machine executes the java files of the application program layer and the application program framework layer as binary files.
  • the virtual machine is used to perform functions such as object life cycle management, stack management, thread management, security and exception management, and garbage collection.
  • the system library may include camera service functions.
  • the system library can also include a plurality of functional modules (not shown in Fig. 8), for example: a surface manager, media libraries, a three-dimensional graphics processing library (for example: open graphics library for embedded systems (OpenGL ES)), and a 2D graphics engine (for example: skia graphics library (SGL)).
  • the surface manager is used to manage the display subsystem and provides the fusion of 2D layers and 3D layers for multiple applications.
  • the media library supports playback and recording of multiple audio formats, playback and recording of multiple video formats, and still image files.
  • the media library can support multiple audio and video encoding formats, such as: MPEG4, H.264, moving picture experts group audio layer III (MP3), advanced audio coding (AAC), adaptive multi-rate (AMR), joint photographic experts group (JPG) and portable network graphics (PNG).
  • the 3D graphics processing library can be used to implement 3D graphics drawing, image rendering, compositing and layer processing.
  • the 2D graphics engine is a drawing engine for 2D drawing.
  • the hardware abstraction layer is an interface layer between the operating system kernel and the hardware circuit, and its purpose is to abstract the hardware. It hides the hardware interface details of a specific platform, provides a virtual hardware platform for the operating system, makes it hardware-independent, and can be transplanted on various platforms.
  • the hardware abstraction layer includes video pipeline mode component (video pipeline), highlight moment MM node, photo pipeline mode component (photo pipeline), MM tag, one-record multiple JPEG queue and video encoding MP4.
  • the photo pipeline mode components include RAW queue, RAW domain camera algorithm, Bayer processing segment (Bayer processing segment, BPS) module, image processing engine (Image processing engine, IPE) module, stylization module and JPEG encoder (encoder, Enc).
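The photo pipeline components listed above (RAW queue, RAW-domain camera algorithm, BPS, IPE, stylization, JPEG encoder) form a linear processing chain. The following Python sketch illustrates such a stage chain under stated assumptions; the stage functions are placeholders, not the actual HAL algorithms.

```python
# Each stage appends a marker standing in for the real image transformation.
def raw_algorithm(frame):   return frame + ["raw_processed"]
def bps(frame):             return frame + ["bayer"]         # RAW -> Bayer
def ipe(frame):             return frame + ["tone_sharpen"]  # clarity, texture, tone
def stylize(frame):         return frame + ["styled"]        # artistic rendering
def jpeg_encode(frame):     return frame + ["jpeg"]          # final JPEG data

PHOTO_PIPELINE = [raw_algorithm, bps, ipe, stylize, jpeg_encode]

def run_pipeline(frame, stages=PHOTO_PIPELINE):
    """Push one frame through every stage of the photo pipeline in order."""
    for stage in stages:
        frame = stage(frame)
    return frame

out = run_pipeline(["raw"])
assert out == ["raw", "raw_processed", "bayer", "tone_sharpen", "styled", "jpeg"]
```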
  • the driver layer is the layer between hardware and software. As shown in FIG. 8 , the driver layer may include driver modules such as a display driver and a camera driver. Among them, the camera driver is the driver layer of the camera device, which is mainly responsible for the interaction with the hardware.
  • the camera application in the application layer can be displayed on the screen of the electronic device in the form of an icon.
  • the electronic device runs the camera application.
  • the camera application runs on the electronic device, and the electronic device can send a corresponding touch event to the driver layer according to the user's operation.
  • the touch screen receives a touch event, start the camera application, and start the camera by calling the camera driver of the driver layer.
  • the camera application in the application layer receives a recording request triggered by the user.
  • the camera application in the application layer can interact with the camera framework in the framework layer, and send recording requests to the camera framework.
  • the camera framework sends recording requests to the camera service in the system library.
  • the camera service in the system library sends recording requests to the video pipeline pattern component of the hardware abstraction layer.
  • the video pipeline mode component of the hardware abstraction layer sends the recorded video stream data to the MM node.
  • the MM node determines the wonderful moment MM based on the recorded video stream, and calls the camera driver to take pictures when the wonderful moment MM is determined, and sends the picture data to the photo pipeline mode component for processing.
  • the MM node can also send the time information of the wonderful moment MM (or the time position of the MM in the video), together with the type of the wonderful moment or the semantic-level description of the wonderful moment (the LV0-LV3 information corresponding to the wonderful moment, for example, that the wonderful moment MM is a look back, a smile, a jump, etc.), to the MM tag module.
  • the MM tag module can use the tag of the wonderful moment as metadata (meta) and, taking a clip as the unit, report the time information of the wonderful moment MM and the type of the wonderful moment to the video pipeline in real time.
  • the time information of the wonderful moment MM and the type of the wonderful moment are transmitted to the camera service of the system library through the video pipeline.
  • the camera service transmits the time information of the wonderful moment MM and the type of the wonderful moment to the camera frame of the frame layer, and sends it to the Video Tag module of the application layer through the camera frame.
  • the photo pipeline mode component can process the photo data of the wonderful moment MM, and output the one-record-multiple JPEG queue (that is, the JPEG data of the wonderful moment MM photos).
  • the RAW queue in the photo pipeline mode component is used to send RAW data to the RAW domain camera algorithm for processing.
  • the data output by the RAW domain camera algorithm is sent to the BPS module.
  • the BPS module is used to convert RAW data to Bayer data.
  • the Bayer data obtained after being processed by the BPS module enters the IPE module.
  • the IPE module is used to further process the Bayer data to improve imaging clarity, texture details, tone color, sharpening, etc.
  • the data processed by the IPE module is sent to the style module.
  • the stylization module is used to render images (such as rendering images into artistic paintings).
  • the image data processed by the stylization module is sent to the JPEG encoder.
  • the JPEG encoder is used to process the image data obtained from the stylization module to obtain JPEG data.
  • the multi-record JPEG queue of the hardware abstraction layer can call back the JPEG data to the wonderful moment MM of the application layer through the private camera path.
  • the wonderful moment MM module of the application layer can pass the one-record-multiple JPEG queue to the one-record-multiple JPEG module of the application layer.
  • the wonderful moment MM at the application layer can also register MM with the private camera channel.
  • The one-record-multiple JPEG module can generate JPEG images based on the JPEG data, that is, the photos of the wonderful moment MM.
  • the video pipeline mode component in the hardware abstraction layer can transfer the recorded video data to the MP4 module.
  • the MP4 module is used to output the original recorded video.
  • the recorded original video can establish an association relationship with the JPEG in the application layer through the recording identifier in the recording request.
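The association described above (the recorded MP4 and the highlight JPEGs from the same recording are linked by the recording identifier carried in the recording request) can be sketched as simple grouping by ID. All names here are illustrative assumptions.

```python
def group_by_recording(files):
    """Group media files by the recording identifier they were tagged with,
    so the gallery can present the original video and its wonderful photos
    as one 'one record, multiple results' group."""
    groups = {}
    for f in files:
        groups.setdefault(f["recording_id"], []).append(f["name"])
    return groups

files = [
    {"name": "video_001.mp4", "recording_id": "rec-001"},
    {"name": "mm_001_a.jpg",  "recording_id": "rec-001"},
    {"name": "mm_001_b.jpg",  "recording_id": "rec-001"},
    {"name": "video_002.mp4", "recording_id": "rec-002"},
]
groups = group_by_recording(files)
assert groups["rec-001"] == ["video_001.mp4", "mm_001_a.jpg", "mm_001_b.jpg"]
assert groups["rec-002"] == ["video_002.mp4"]
```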
  • the gallery application in the application layer receives a user-triggered view operation, and the view operation is used to view a one-record JPEG image.
  • the gallery application displays the multi-recorded JPEG image on the display screen by calling the display driver.
  • the user enters the interface of (6) in Figure 5 by sliding up the screen, and the interface of (6) in Figure 5 shows a JPEG image with multiple records.
  • the manual capture function is also supported during the recording process.
  • the architecture in Figure 8 provides the relevant structure of the manual capture function.
  • CaptureFlow in the application layer sends a user-triggered manual capture request to the camera framework in the framework layer.
  • the framework layer sends the manual capture request to the video pipeline mode component of the hardware abstraction layer through the camera service in the system library.
  • the video pipeline mode component sends the manual capture request to the manual capture frame selection module.
  • the manual capture and frame selection module calls the camera driver to take pictures, and sends the picture data to the photo pipeline mode component for processing.
  • the processing of each module contained in the photo pipeline mode component refers to the above description, and will not be repeated here.
  • the photo pipeline mode component outputs manually captured image data.
  • the image data captured manually can be fed back to the OnPictureTaken module of the application layer through the private camera channel.
  • the OnPictureTaken module of the application layer can determine which frames are captured manually based on the manually captured image data, and then obtain manually captured JPEG images based on these frames.
  • the user can click the control 802 to trigger a manual capture operation.
  • the gallery in the application layer receives an operation of viewing a manually captured image triggered by the user, and the gallery application also displays the manually captured JPEG image on the display screen by calling a display driver.
  • The structure shown in FIG. 8 does not constitute a limitation on the embodiments of the present application.
  • the MM node in FIG. 8 can determine the highlight moment MM based on the recorded video stream.
  • the MM node determines the wonderful moment MM based on the recorded video stream, including: obtaining hierarchical information of multiple granularities based on the video stream; and determining the wonderful moment MM according to the hierarchical information of multiple granularities. The hierarchical information of multiple granularities includes: first level information, second level information and third level information, where the granularity of the first level information is coarser than that of the second level information, and the granularity of the second level information is coarser than that of the third level information.
  • the first-level information is used to represent a theme or scene of a video
  • the second-level information is used to represent a change in a video scene
  • the third-level information is used to represent a wonderful moment.
  • the first-level information, second-level information, and third-level information above provide decision information in order of granularity from coarse to fine, so as to assist the MM node in identifying wonderful moments in the recording process.
  • the levels corresponding to the above-mentioned first level information include LV0 and LV1
  • the level corresponding to the second level information is denoted as LV2
  • the level corresponding to the third level information is denoted as LV3. The levels, from coarse to fine, are thus LV0, LV1, LV2 and LV3.
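The coarse-to-fine ordering of the decision levels described above can be expressed as an ordered enumeration. The mapping of each level's role is taken from the text; the enum itself is only an illustrative sketch.

```python
from enum import IntEnum

class Level(IntEnum):
    LV0 = 0  # style/atmosphere tag of the entire video
    LV1 = 1  # semantic scene category of each segment
    LV2 = 2  # transition positions and types
    LV3 = 3  # highlight-moment (MM) scoring

# A higher level number corresponds to finer-grained decision information:
assert Level.LV0 < Level.LV1 < Level.LV2 < Level.LV3
```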
  • the information of LV0 is used to give the style or atmosphere TAG of the entire video (for example, children's fun, characters, Spring Festival, Christmas, birthday, wedding, graduation, food, art, travel, night scene, sports, nature, relaxed and cheerful/slightly sad/dynamic rhythm/casual).
  • the information of LV1 is used for scene recognition at the semantic level, divides the video into several segments, and gives the category of each segment, such as mountains, portraits, etc.
  • Table 1 below provides examples of definitions of LV0 and LV1.
  • LV2 information can give the video transition position (for example, the frame number where the transition occurs) and the transition type (character protagonist switching, fast camera movement, scene category change, or image content change caused by other situations), to prevent too many recommendations of similar scenes.
  • LV2 information is used to represent video scene changes (or simply, transitions), including but not limited to one or more of the following changes: a change of the character subject (or protagonist), a major change in the composition of the image content, a semantic-layer scene change, and a change in image brightness or color.
  • Change of the character subject: when the character subject changes, it is regarded as a transition.
  • the character subject can be defined as the person who occupies the largest proportion of the image. For example, if the character subject of the (t-1)-th frame image is A, and person B is added in the t-th frame image but the subject is still A, it does not count as a transition. For another example, if the character subject of the (t-1)-th frame image is A, and the character subject of the t-th frame image changes to B, a transition is counted.
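The character-subject rule above (the subject is the person occupying the largest proportion of the image, and only a change of that person counts as a transition) can be sketched as follows. This is an illustrative sketch; the data representation is an assumption.

```python
def subject(people):
    """people: dict mapping person ID -> proportion of image area occupied.
    The character subject is the person with the largest proportion."""
    return max(people, key=people.get) if people else None

def subject_changed(prev_people, cur_people):
    """A transition is counted only when the subject person changes."""
    return subject(prev_people) != subject(cur_people)

# Frame t-1: A is the subject. Frame t: B appears but A still dominates
# -> no transition, matching the first example in the text.
assert not subject_changed({"A": 0.4}, {"A": 0.4, "B": 0.2})
# Frame t: B now occupies the largest proportion -> a transition is counted.
assert subject_changed({"A": 0.4}, {"A": 0.2, "B": 0.5})
```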
  • a major change in the composition of the image content is considered a transition.
  • when the camera is basically stable, if there are many moving objects in the recorded picture, resulting in a large change in the content of the picture, it is considered a transition.
  • For example, the user records a racing scene with the mobile phone. If a racing car passes by in the picture, it can be considered that a transition occurred when the racing car passed by.
  • Fast camera movement (for example, panning quickly from A to B).
  • For example, when the frame rate of transition detection is 2 FPS, the image content of the t-th frame is compared with that of the (t-16)-th frame; when the difference is large, it is regarded as a transition.
  • During fast camera movement, the picture is severely blurred and the content changes greatly from frame to frame, but the entire camera-movement process is regarded as only one transition.
  • the start frame a and the end frame b of the interval are regarded as transition frames.
  • a change in image brightness or color is considered a transition. For example, in a concert, if the content of the screen changes slightly, but the color and brightness of the ambient light change, it is considered a transition.
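The frame-difference transition test described above can be sketched minimally: compare the current frame with one sampled 16 frames earlier and flag a transition when the difference exceeds a threshold. The difference metric and threshold below are illustrative assumptions, not values from this application.

```python
def frame_diff(a, b):
    """Mean absolute difference between two equally sized 'frames'
    (here just flat lists of pixel intensities for illustration)."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def is_transition(cur, past, threshold=50):
    """Flag a transition when the content difference is large."""
    return frame_diff(cur, past) > threshold

frame_t16 = [10, 10, 10, 10]      # frame at t-16
similar   = [12, 11, 10, 13]      # small content change -> no transition
changed   = [200, 190, 180, 210]  # large content change -> transition
assert not is_transition(similar, frame_t16)
assert is_transition(changed, frame_t16)
```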
  • the level corresponding to the third level information is denoted as LV3
  • the granularity of the information of LV3 is finer than that of LV2.
  • Information from LV3 is used to determine highlight moments.
  • the information of LV3 can be divided according to the following categories: basic image quality, subjective image evaluation, characters and actions.
  • Basic image quality judges the overall clarity of the image from the overall dimension of the image, for example, whether the image is out of focus, whether there is motion blur, whether the exposure is appropriate, whether noise is obvious, etc.
  • Subjective image evaluation can be judged from the dimension of composition, for example, whether the composition is beautiful (evaluation criteria can be based on evaluation criteria such as symmetry and rule of thirds).
  • A character can be judged from the following dimensions: whether the face is clear (only one face is judged here), whether the character's eyes are open or closed, and whether the expression conveys emotion (such as laughing or surprise; meaningless labels such as muscle twitching or crooked eyes need to be removed here).
  • Actions can be judged from the following dimensions: shooting (the highest point of a layup or jump shot, that is, the highest point of the person), kicking (the moment of kicking the ball, such as the starting or completing action), badminton (the serving or smashing action), jumping (the highest point of the jump), running (the leg stride or stagnation point), looking back (the moment of looking back, with long hair flowing below 45 degrees), splashing water (a creek water-splashing check-in shot) and throwing (a parabolic check-in photo).
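One of the LV3 action rules above picks, for a jump or layup, the frame where the person is at the highest point. The following sketch illustrates that selection over a tracked height trajectory; the data format is an assumption for illustration.

```python
def highest_point_frame(heights):
    """heights: list of (frame_number, person_height_in_image) pairs.
    Returns the frame where the tracked person reaches the highest point,
    i.e. the candidate LV3 highlight frame for a jump."""
    return max(heights, key=lambda fh: fh[1])[0]

# Heights of a jumping person over successive frames:
trajectory = [(100, 0.0), (101, 0.4), (102, 0.9), (103, 0.5), (104, 0.0)]
assert highest_point_frame(trajectory) == 102
```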
  • The decision logic of the LV0-LV3 layers is described below in conjunction with the decision module in FIG. 9 . As shown in FIG. 9 , the decision logic of the LV0-LV3 layers is output by the cooperation of the algorithm modules.
  • the decision of the LV0-LV1 layers is output through the cooperation of the scene recognition module.
  • the input of the scene recognition module includes the following information: a single frame image, face information and human body information; the output is the scene category.
  • the face information can be obtained through the face information module.
  • the scene recognition module performs scene recognition on the input single frame image, and the scene category of the image can be output.
  • the human body information in the image (including but not limited to the position of the human body in the image, whether the human body is a child, and the ID of the person) can be obtained through the human body detection + ReID module.
  • the face information in the image can be obtained through the face information module (including but not limited to face position, gender, expression, etc.).
  • the decision of the LV2 layer is realized through the transition module.
  • the transition module cooperates with the decision-making output through the above-mentioned human detection + ReID module and the transition detection module.
  • Human body detection + ReID module can output relevant information of human body detection, including but not limited to the following: human body position, whether it is a child, ID and other information.
  • the input of the transition detection module is 2 frames of images; the output is whether a transition occurs and the transition frame number.
  • two frames of images can be input to the transition detection module, and the scene information of the two frames of images can be obtained through the scene recognition module. Based on the two frames of images, the transition detection module can obtain information such as whether a transition occurs and the frame number of the transition.
  • the human body detection + ReID module in the transition module can also be used to assist the transition module to determine whether a transition has occurred.
  • the decision-making of the LV3 layer is realized through the MM module.
  • the MM module produces its decision output through the cooperation of the facial expression module, the motion detection module (output: motion category and motion score), the composition evaluation module (input: single frame image; output: score) and the image quality evaluation module (input: single frame image; output: score).
  • the human body detection + ReID module in the transition module can input human body detection information to the action detection module in the MM module.
  • the face information module can input the face information into the face expression module in the MM module, so that the MM module can score the face expression.
  • the MM module can output the final score of the image frame based on the scoring of the following four dimensions: facial expression scores, motion detection scores, composition evaluation scores, and image quality evaluation scores.
  • the MM node may obtain hierarchical information (or key data, decision information, etc.) of each level (such as LV0-3) based on a preset interval for use in taking pictures.
  • the interval may be a preset value, and the preset value may depend on hardware resources. For example, scene information is acquired every 10 frames. For another example, the transition information is obtained every 10 frames.
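As an illustrative sketch only (the function names, data shapes, and the stand-in recognizers are assumptions, not part of the embodiment), the interval-based acquisition of scene (LV1) and transition (LV2) information could look like this:

```python
SCENE_INTERVAL = 10  # hypothetical preset value; may depend on hardware resources

def collect_level_info(frames, recognize_scene, is_transition):
    """Sample scene (LV1) information every SCENE_INTERVAL frames and
    compare consecutive samples to count transitions (LV2)."""
    samples, transitions = [], []
    prev_scene = None
    for idx, frame in enumerate(frames):
        if idx % SCENE_INTERVAL != 0:
            continue  # skip frames between sampling points
        scene = recognize_scene(frame)
        samples.append((idx, scene))
        if prev_scene is not None and is_transition(prev_scene, scene):
            transitions.append(idx)  # frame number where the transition is counted
        prev_scene = scene
    return samples, transitions

# Toy usage with stand-in recognizers: 25 "mountain" frames, then 25 "sky" frames.
frames = ["mountain"] * 25 + ["sky"] * 25
samples, transitions = collect_level_info(frames, lambda f: f, lambda a, b: a != b)
```

With a real scene recognition model in place of the lambdas, the same loop structure would feed the LV0/LV2 statistical modules described above.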
  • the analyses or detections in FIG. 10 and FIG. 11 are performed based on data streams (or video streams).
  • the data stream may be a data stream obtained by performing a resolution reduction operation based on the preview stream. It can be understood that the use of a data stream with reduced resolution for analysis or detection is helpful to improve detection efficiency.
  • the data streams in Figure 10 and Figure 11 can be Tiny streams.
  • the MM node obtains LV1 information by identifying the scene of the video stream at time stamp t(-10), thereby obtaining the scene of the video.
  • Information obtained through scene recognition should be fed into the LV0 statistical decision-making module.
  • the LV0 results need to be combined with the final statistical results. That is to say, after the user finishes shooting, a unique LV0 result can be statistically generated, and the LV0 result is used to characterize the theme and atmosphere of the entire video.
  • the MM node can acquire the scene information (obtainable through the scene recognition module) every 10 frames. For example, at time stamp t(-15) and time stamp t(-5), the MM node obtains the scene information once each, and then compares the two images for a transition; if a transition exists between the images, one transition is counted. In addition, at time stamp t(-5), face recognition can also be performed through the face detection module, and the recognized face can be sent to face ReID. Whether the scene of the image changes or the face changes, the statistical results are reflected in the LV2 result. The MM node generates the LV2 result based on time stamp t(-15) and time stamp t(-5); whether a transition has occurred can be known from the LV2 result.
  • the MM node then performs human body detection and action detection on the image frame in turn, scores the detected actions, and sends the action scores to the LV3 comprehensive decision-making module; at the same time, combined with the data output by the face detection module at time stamp t(-5), image quality/aesthetic evaluation is performed on the image frame with time stamp t(0), and the resulting image quality/aesthetic score is sent to the LV3 comprehensive decision-making module.
  • the LV3 comprehensive decision-making module makes a comprehensive decision based on the image quality/aesthetic score and action score, and obtains the LV3 result of time stamp t(-1) (ie, the score of the wonderful moment).
  • FIG. 11 shows the relevant judgment logic of the MM node at the current time stamp t(0). As shown in Figure 11, the following steps are included:
  • Step 1: when the recording starts, initialize the relative threshold.
  • the initial threshold can be set to a small value, such as 0.
  • the cache area can be cleared.
  • the purpose of initializing the relative threshold is to ensure that at least one photo can be taken.
  • an absolute threshold (or fractional threshold) can also be configured.
  • the absolute threshold can be considered as a quantitative indicator for evaluating wonderful moments.
  • the absolute threshold can be represented by thd_max_confid.
  • an absolute threshold is used to separate out predefined actions.
  • Each action category may have a corresponding absolute threshold.
  • absolute thresholds need to be tuned as precisely as possible in order to accurately identify the wonderful moments of actions. An absolute threshold represents a high confidence that an action is detected.
  • an absolute threshold corresponding to the jumping action may be preset. If a jumping action is detected in an image frame, the jumping action can be scored and then the score is compared to an absolute threshold corresponding to the jumping action. If the score is greater than the absolute threshold corresponding to the jumping action, the jumping action can be determined as a wonderful moment, and the corresponding image frame can be regarded as a photo of the wonderful moment.
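This per-action threshold check can be sketched as follows (the action names and threshold values are hypothetical, chosen only for illustration):

```python
# Hypothetical per-action absolute thresholds (thd_max_confid in the text);
# each action category carries its own, separately tuned value.
THD_MAX_CONFID = {"jump": 0.85, "look_back": 0.80}

def is_wonderful_moment(action, score):
    """A frame counts as a wonderful moment only if its score exceeds
    the absolute threshold tuned for the detected action category."""
    threshold = THD_MAX_CONFID.get(action)
    return threshold is not None and score > threshold
```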
  • Step 1: during the video recording process, judge whether the score of the LV3 data of a key frame that has LV3 data (such as time stamp t(-1)) is greater than the absolute threshold. If the score of the LV3 data is greater than the absolute threshold, go to step 6; if not, go to step 2.
  • this means the image frame at time stamp t(-1) falls into a pre-defined action category with very high confidence. Absent conflicts with other photographing processes, the RAW domain algorithm can be directly triggered to take a picture.
  • the alignment of the Tiny stream and the zero-shutter-lag (ZSL) sequence also needs to be considered here, because the MM node works on Tiny data, so the corresponding RAW data needs to be selected for taking pictures.
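One plausible way to perform this alignment (a sketch under assumed data shapes; the real mechanism is platform-specific and not detailed in the embodiment) is nearest-timestamp matching between the scored Tiny frame and the buffered RAW frames:

```python
def match_raw_frame(tiny_timestamp, zsl_queue, tolerance=0.005):
    """Pick the RAW frame in the ZSL queue whose timestamp is closest to
    the Tiny frame that the MM node scored; reject the match if the gap
    exceeds the tolerance (seconds). zsl_queue: list of (timestamp, raw)."""
    ts, raw = min(zsl_queue, key=lambda entry: abs(entry[0] - tiny_timestamp))
    return raw if abs(ts - tiny_timestamp) <= tolerance else None

# Toy usage: three buffered RAW frames at roughly 30 fps.
queue = [(0.000, "raw0"), (0.033, "raw1"), (0.066, "raw2")]
```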
  • Step 2: judge whether the score of the LV3 data is greater than the relative threshold. If the score of the LV3 data is greater than the relative threshold, go to step 3.
  • Step 3: copy the RAW data into the cache area and, at the same time, update the relative threshold so that it keeps the latest, highest value.
  • the above buffer area is used to store RAW data.
  • the preceding analysis or detection utilizes the tiny stream, and the data stored in the buffer here is the original (RAW) image frame corresponding to the tiny stream.
  • the RAW image frames corresponding to the time stamps t(-7) to t(0) are stored in the buffer area.
  • the buffer area may be a separately set buffer.
  • the buffer area can be the ZSL buffer in the ZSL camera system.
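Steps 2-3 can be sketched as follows (the state layout and function name are hypothetical, used only to make the threshold/caching interaction concrete):

```python
def cache_if_better(score, raw_frame, state):
    """Steps 2-3 (sketch): when a frame's LV3 score beats the running
    relative threshold, cache its RAW data and raise the threshold so
    that only still-better frames replace it later."""
    if score > state["thd_rel"]:
        state["buffer"] = raw_frame   # copy RAW data into the cache area
        state["thd_rel"] = score      # keep the latest and highest value
        return True
    return False

state = {"thd_rel": 0.0, "buffer": None}  # relative threshold starts small (step 1)
```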
  • Step 4: determine whether a transition occurs at the current time stamp, and whether the time between this transition and the previous transition exceeds the minimum transition time limit threshold (for example, thd_change); if so, determine that this transition can trigger one transition photo, and proceed to step 5.
  • the purpose of introducing the minimum transition time limit threshold is to prevent frequent transitions from triggering photos too frequently.
  • Step 5: judge whether the buffer area is non-empty and whether the relative threshold is smaller than the absolute threshold.
  • if both conditions hold, step 6 can be performed to send the RAW data in the buffer to the photo path to trigger a capture. This design ensures that at least one photo can be output for one transition.
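Step 5's fallback can be sketched like this, assuming a simple state dictionary that holds the cached RAW frame and the running relative threshold (names are hypothetical):

```python
def flush_on_transition(state, thd_abs):
    """Step 5 (sketch): if no frame in this transition segment ever beat
    the absolute threshold (the relative threshold is still below it) and
    a RAW frame is cached, flush the cache to the photo path so the
    segment still outputs one photo."""
    if state["buffer"] is not None and state["thd_rel"] < thd_abs:
        raw, state["buffer"] = state["buffer"], None
        return raw  # send to the photo path (step 6)
    return None

state = {"thd_rel": 0.6, "buffer": "raw_best"}  # toy segment state
```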
  • Step 6: send the RAW data to the RAW domain photographing algorithm for processing.
  • in step 6, before the RAW data is sent to the RAW domain photographing algorithm for processing, it may also be determined whether the current photographing interval is greater than the minimum photographing interval.
  • for example, the minimum photographing interval is set to 3 s; if the current photographing interval is judged to be greater than 3 s, the capture is triggered.
  • the advantage of this setting is that it can prevent frequent triggering of taking pictures. Moreover, conflicts between manual capture and automatic capture can also be avoided.
  • the automatic triggering of taking pictures may be blocked. For example, if the user's continuous snapping results in automatic photo-taking being blocked, then the images captured manually by the user can be associated with the video, that is, the images captured by the user can be output as photos of exciting moments.
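The interval guard can be sketched as follows (the 3 s value follows the example above; the function name and time representation are illustrative assumptions):

```python
MIN_PHOTO_INTERVAL = 3.0  # seconds, the preset value from the example above

def may_auto_capture(now, last_capture_time):
    """Block an automatic capture if any capture (manual or automatic)
    happened less than MIN_PHOTO_INTERVAL seconds ago."""
    return last_capture_time is None or (now - last_capture_time) > MIN_PHOTO_INTERVAL
```

Because the last capture time is shared between manual and automatic captures, a burst of manual snapshots naturally suppresses automatic ones, matching the conflict-avoidance behavior described above.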
  • Step 7: the photo path processes the data and sends it to JPEG encoding.
  • the corresponding MM decision data (such as the score of LV3 data) can be saved in EXIF.
  • Step 8: photos enter and leave the JPEG queue according to the score of the LV3 data.
  • entry into and eviction from the JPEG queue are based on the photo score. For example, assuming that the current JPEG queue retains 5 photos (such as the 5 photos corresponding to t(-4) to t(0) shown in the figure), the photo with the lowest score among the 5 photos is recorded as photo X; at this time, if a photo Y is output and the score of photo Y is higher than that of photo X, then photo X is dequeued and photo Y enters the queue.
  • for example, a queue containing 5 JPEGs is output according to the score of the LV3 data, that is, the TOP5 photos with the highest scores are always kept. It can be understood that a queue containing 5 JPEGs is used here only as an example; the number of JPEGs contained in the JPEG queue is configurable.
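The score-based queue of step 8 behaves like a fixed-capacity top-N buffer. A minimal sketch (class and method names are illustrative, not from the embodiment):

```python
import heapq

class JpegQueue:
    """Keep only the top-N photos by LV3 score; when full, a new photo
    evicts the lowest-scoring one only if it scores higher (photo X out,
    photo Y in, as in the example above)."""
    def __init__(self, capacity=5):
        self.capacity = capacity
        self._heap = []  # min-heap of (score, photo): root is the worst photo

    def offer(self, score, photo):
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, (score, photo))
        elif score > self._heap[0][0]:
            heapq.heapreplace(self._heap, (score, photo))  # evict the worst

    def photos(self):
        """Photos ordered from highest to lowest score."""
        return [p for _, p in sorted(self._heap, reverse=True)]

# Toy usage: five photos enter, then a sixth outranks the worst one.
q = JpegQueue(capacity=5)
for s, p in [(65, "t-4"), (79, "t-3"), (70, "t-2"), (95, "t-1"), (60, "t0")]:
    q.offer(s, p)
q.offer(72, "Y")  # scores higher than the worst photo (60), which is evicted
```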
  • the decision information is divided into LV0, LV1, LV2, and LV3 according to granularity from coarse to fine.
  • LV0 gives a summary of the entire video, or the overall atmosphere of the video.
  • LV1 divides the video into three categories of video clips based on LV0, for example, the categories are portrait (portrait), landscape (landscape) and building (building).
  • LV2 obtains scene change information, such as the frame number where a transition occurs; here, 3 transitions are included.
  • LV3 obtains the following wonderful moments: MM1 (between the first transition and the second transition), MM2 (between the first transition and the second transition), MM3 (between the second transition and the third transition), and MM4 (after the third transition). It can be seen that two MMs occurred between the first transition and the second transition.
  • the scores of MM1 and MM2 can be compared when making decisions, and the MM with a higher score can be retained.
  • Figures 13-16 are schematic diagrams of user interfaces at different moments when recording a video.
  • Fig. 17 is a schematic diagram of time stamps for the interface shown in Fig. 13-Fig. 16 .
  • the mobile phone has turned on the one-record-multiple-get function.
  • the start time of the recording is 00 minutes 00 seconds 00 milliseconds (expressed as 00:00:00).
  • the interface includes a camera control 802, a stop control 803 and a pause control 804. If the user clicks the stop control 803, the recording can end; if the user clicks the control 802, manual capture can be performed during the video recording. If the user clicks the pause control 804, the recording may be paused.
  • the recording screen at time 00:02:15 is shown in (2) in Figure 13. The screen at this time presents the whole picture of the mountain, and the content of the screen is a mountain (denoted as mountain A). The user holds the mobile phone and continues to move, and the picture at time 00:05:00 is shown in (3) in Figure 13; the content of the picture displayed in the interface is a part of the mountain.
  • the MM node detects the video stream from time 00:00:00 to 00:05:00, and can recognize that the semantic scene or category of the video clip from time 00:00:00 to 00:05:00 is a mountain.
  • the MM node can recognize that the scene of the video segment is mountain A.
  • the MM node recognizes that from time 00:00:00 to 00:05:00, the picture at time 00:02:15 presents the whole picture of mountain A.
  • the MM node judges factors such as the basic quality of the picture at time 00:02:15, whether the composition is beautiful, etc., and obtains a score of 65 for the picture frame at this moment, and determines this moment as the first MM.
  • the MM node detects the video stream from time 00:05:00 to time 00:06:00, and it can be considered that a transition occurs at time 00:06:00 (a transition refers to a change of scene), and the transition type is fast camera movement. Therefore, when the back end edits the selected short video, the content from time 00:05:00 to time 00:06:00 is discarded.
  • the picture at time 00:08:54 is as shown in (2) in Figure 14.
  • the picture at this time presents the whole picture of the mountain, and the content in the picture is the mountain (denoted as mountain B).
  • the picture at time 00:11:00 is shown as (1) in Figure 15, and the content in the picture is the sky.
  • from the interface shown in (2) in Figure 14 and the interface shown in (1) in Figure 15, it can be seen that at time 00:11:00 the content of the screen changes, so it can be considered that a transition occurs at time 00:11:00.
  • the MM node detects the video stream from time 00:06:00 to time 00:11:00, and learns that the scene changes at time 00:11:00. Further, the MM node detects the MM of the video stream from time 00:06:00 to time 00:11:00, and obtains that the second MM is at time 00:08:54, with a score of 79.
  • the user moves the phone in hopes of taking a picture of the sky.
  • the picture at time 00:18:50 is shown as (2) in Figure 15, and the content in the picture is the sky.
  • the picture at time 00:20:00 is shown as (3) in Figure 15, and the content in the picture is the sky.
  • the MM node detects the video stream from 00:11:00 to 00:20:00, and can recognize that the scene category of the video clip is sky. Further, the MM node detects the MMs from 00:11:00 to 00:20:00, and can obtain that the third MM is at the time 00:18:50, and the score is 70.
  • the screen at time 00:25:00 is shown as (1) in Figure 16. It can be known from the interface shown in (1) in FIG. 16 that a person enters the frame at time 00:25:00. The screen at time 00:20:00 is different from the screen at time 00:25:00.
  • the MM node detects the video stream from time 00:20:00 to time 00:25:00, and learns that the scene changes at time 00:25:00, and a transition occurs, and the transition type is fast camera movement. Therefore, when editing the selected short video at the back end, discard the content from 00:20:00 to 00:25:00.
  • the screen at time 00:28:78 is shown in (2) in Figure 16, at time 00:28:78 the character looks back.
  • the screen at time 00:30:99 is shown in (3) in Figure 16, and at time 00:30:99 the character has another look back.
  • the user can click the control 803 at time 00:35:00 to end the recording.
  • from time 00:25:00 to time 00:35:00, the MM node detects that the scene category is a person. Further, the MM node detects the MMs of the video stream from time 00:25:00 to time 00:35:00, and learns that the two wonderful moments are 00:28:78 and 00:30:99 respectively. The pictures at these two moments are scored by combining the following factors: basic image quality, characters, and character actions; the scores of these two wonderful moments are 95 and 70 respectively. Based on this, the MM node determines that the fourth MM is at time 00:28:78, and the fifth MM is at time 00:30:99.
  • the five MMs obtained are: time 00:02:15, time 00:08:54, time 00:18:50, time 00:28:78 and time 00:30:99.
  • a wonderful short video can be generated.
  • the wonderful short video is composed of image frames corresponding to these wonderful MMs, and transitions are included between the frames.
  • the wonderful short video also includes image frames near these frames. For example, for the moment 00:25:00, the wonderful short video includes not only the image frame at 00:25:00, but also the image frames from 00:24:58 to 00:25:02.
  • the MM node detects the following multiple clips based on the video stream information:
  • Fragment Clip1: the start time is 00:00:00, the end time is 00:05:00, and the scene category is landscape (for example, the scene shown in Figure 13 is actually landscape A: mountain). The MM of this scene is at 00:02:15, the score of this MM is 65, and the transition type of the start frame is: start.
  • the start time is 00:05:00
  • the end time is 00:06:00
  • the scene type is dynamic rhythm
  • the transition type of the start frame is: fast camera movement. Therefore, it is recommended to discard the content from 00:05:00 to 00:06:00 in the featured video.
  • the start time is 00:06:00
  • the end time is 00:11:00
  • the category of the scene is landscape (for example, the scene shown in Figure 14 is actually landscape B: mountain)
  • the MM of this scene is at 00:08:54
  • the score of the MM is 79
  • the transition type of the starting frame is: content change.
  • the start time is 00:11:00
  • the end time is 00:20:00
  • the category of the scene is sky (for example, the scene shown in Figure 15 is actually landscape C: sky)
  • the MM of the scene is at 00:18:50
  • the score of the MM is 70
  • the transition type of the starting frame is: content change.
  • the start time is 00:20:00
  • the end time is 00:25:00
  • the category of the scene is dynamic rhythm
  • the transition type of the start frame is: fast camera movement. Therefore, it is recommended to discard the content from 00:20:00 to 00:25:00 in the featured video.
  • the start time is 00:25:00 (a person entering the shot is detected at this time stamp), the end time is 00:35:00, and the scene category is people. The MMs of the scene are at 00:28:78 (for example, the look-back action at the time stamp shown in (2) in Figure 16) and 00:30:99 (for example, the look-back action at the time stamp shown in (3) in Figure 16); the scores of the two MMs are 95 and 70 respectively, and the transition type of the start frame is: content change.
  • the above 6 clips can be understood as dividing the recorded original video into 6 video clips based on the identified semantic-level information (which can be recorded as LV1 information).
  • the information of the transitions is the LV2 information, and the information of the wonderful moments is the LV3 information.
  • the theme or style of the entire video can be determined based on the original video, that is, the LV0 information.
  • the photos of MMs with higher scores can be retained first. For example, suppose the recorded video is divided into 4 video clips and the number of wonderful-moment MM photos is limited to 4. After analysis it is found that the first video clip contains 2 MMs, and each of the second to fourth video clips contains 1 MM, that is, a total of 5 MMs are determined. Then the scores of the 2 MMs in the first video clip need to be compared, and the MM with the higher score in the first video clip is kept. At the same time, to ensure that each video clip outputs at least one MM, the single MM contained in each of the second to fourth video clips is used as a final output photo of the wonderful moment; that is, 4 photos of wonderful moments are finally output.
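The retention rule in this example can be sketched as follows (the data shapes and function name are hypothetical; the rule itself follows the description above: at least one MM per clip, capped total, highest scores win):

```python
def select_mm_photos(clips, max_photos=4):
    """clips: list of lists of (score, moment) pairs, one inner list per
    video clip. Guarantee at least one MM per clip (its highest-scoring
    one), then fill any remaining slots with the best leftovers, capped
    at max_photos overall."""
    chosen = [max(moments) for moments in clips if moments]  # best of each clip
    leftovers = sorted(
        (m for moments in clips for m in moments if m not in chosen),
        reverse=True,
    )
    for m in leftovers:
        if len(chosen) >= max_photos:
            break
        chosen.append(m)
    return sorted(chosen, reverse=True)[:max_photos]

# The example above: clip 1 holds 2 MMs, clips 2-4 hold 1 MM each.
clips = [[(80, "mm1a"), (75, "mm1b")], [(70, "mm2")], [(65, "mm3")], [(90, "mm4")]]
```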
  • a 15-second featured video can also be generated based on the above-mentioned 5 MMs.
  • FIG. 13-FIG. 17 are only for the convenience of those skilled in the art to understand, and do not limit the protection scope of the embodiment of the present application.
  • FIG. 18 is a schematic flowchart of a video processing method provided by an embodiment of the present application. As shown in Figure 18, the method includes:
  • the first operation may be a recording operation.
  • the first operation may be the operation of the user clicking the recording control 801 , and in response to the operation of the user clicking the recording control 801 , the electronic device starts video recording.
  • the first video is an original video recorded by the user.
  • the first video is video 302 (16 minutes and 15 seconds) in (2) in FIG. 5 .
  • the first video is the video (with a duration of 56 seconds) being played in the interface shown in (4) in FIG. 6 .
  • the first interface is a playback interface of the first video
  • the first interface includes a first control and a first area
  • the first area displays a thumbnail of the first photo and a thumbnail of the second photo
  • the first photo is automatically taken at a first moment
  • the second photo is automatically taken at a second moment
  • the first moment and the second moment are included in the recording process of the first video
  • the first video includes a first video clip and a second video clip
  • the first video clip is a first scene
  • the second video clip is a second scene
  • the first scene is different from the second scene
  • the first photo is a photo in the first video clip
  • the second photo is a photo in the second video clip
  • the score of the first photo is greater than a first threshold
  • the score of the second photo is greater than the first threshold.
  • the first photo and the second photo can be understood as photos of wonderful moments. It should be understood that the description here takes the first photo and the second photo as an example, and it is not limited that there are only two photos of the wonderful moment. In fact, there may be multiple photos of the wonderful moment, which is not limited in this embodiment of the present application.
  • the first video clip and the second video clip are different video clips in the first video, or video clips in different scenes.
  • the first video clip is a first scene
  • the second video clip is a second scene.
  • first video segment and the second video segment may be continuous video segments in the first video, or discontinuous video segments, which are not specifically limited.
  • the first video clip corresponds to the interfaces shown in (1) to (3) in Figure 13, that is, the start time is 00:00:00 and the end time is 00:05:00; the scene of the video clip (corresponding to the first scene) is a mountain.
  • the second video clip is the video clip corresponding to the interfaces shown in (1) to (4) in Figure 16; the start time is 00:25:00, the end time is 00:35:00, and the scene category of the video clip (which may correspond to the second scene) is a character.
  • the first scene and the second scene are different scenes.
  • a transition occurs between the first video segment and the second video segment.
  • the first scene is a mountain (such as the interface shown in Figure 14 (1) and Figure 14 (2), the scene is a mountain), and the second scene is the sky (such as in Figure 15 (1) to Figure 15 In the interface shown in (3), the scene is the sky) and so on.
  • the first scene is the sky (such as the interface shown in Figure 15 (1) to Figure 15 (3), the scene is the sky), and the second scene is a character (such as Figure 16 (1) to Figure 16 In the interface shown in (4), the scene is a character).
  • It should be understood that the first scene and the second scene are only exemplary descriptions, and this embodiment of the present application is not limited thereto. It should also be understood that, in addition to the first scene and the second scene described above, the first video may also include more scenes, which are not specifically limited in this embodiment of the present application.
  • the action in the first photo is of a first type, and the action in the second photo is of a second type. That is to say, the first photo and the second photo capture different types of character actions.
  • the first type of action is jumping.
  • the second type of action is looking back and so on.
  • the first type of action is kicking a football
  • the second type of action is long hair flowing.
  • the first photo is a landscape (such as mountains, sky, etc.), and the second photo is a person (or portrait, etc.).
  • the first photo is the mountains and the second photo is the sky. It should be understood that the foregoing descriptions about the types of the first photo and the second photo are merely exemplary descriptions, and this embodiment of the present application is not limited thereto.
  • the first moment is the time 00:02:15 in the interface shown in (2) in FIG. 13
  • the first photo is the frame corresponding to the time 00:02:15
  • the score of the first photo is 65.
  • the second moment is the moment 00:08:54 in the interface shown in (2) in FIG. 14
  • the second photo is the frame corresponding to the moment 00:08:54
  • the score of the second photo is 79.
  • the first moment is the moment 00:18:50 in the interface shown in (2) in FIG. 15
  • the first photo is the frame corresponding to the moment 00:18:50
  • the score of the first photo is 70.
  • the second moment is the moment 00:28:78 in the interface shown in (2) in FIG. 16
  • the second photo is the frame corresponding to the moment 00:28:78
  • the score of the second photo is 95.
  • the score of the first photo should satisfy a first threshold.
  • the above-mentioned first threshold is an absolute threshold corresponding to the first type of action. Whether the score of the first photo satisfies the criteria for scoring a wonderful moment can be judged by the first threshold. If it is determined that the score of the first photo is greater than the first threshold, it means that the first photo is a photo of a wonderful moment.
  • the first threshold is set to 60
  • the score of the first photo is 70
  • the score of the first photo satisfies the criteria for scoring a wonderful moment.
  • the case where the first video clip also includes a third photo is taken as an example for illustration.
  • the first video clip also includes a third photo, and the third photo is automatically taken at a third moment. The score of the third photo is greater than the first threshold.
  • the third photo is also a highlight moment photo in the first video segment.
  • a score greater than the first threshold is used as a criterion for evaluating a photo of a wonderful moment.
  • this embodiment of the present application is not limited thereto, and there may be multiple implementation manners.
  • a wonderful moment score range may also be set, and if the score of the photo falls within the score range, the photo is considered to be a wonderful moment photo.
  • the endpoint value of the first threshold may also be included in the category of the photo of the wonderful moment. For example, when the score is equal to the first threshold, the photo may also be considered as the photo of the wonderful moment.
  • the method further includes: before automatically taking the first photo, acquiring a score of a fourth photo, where the score of the fourth photo is less than or equal to the first threshold and greater than a third threshold; and updating the value of the third threshold to the score of the fourth photo.
  • the third threshold here refers to a relative threshold, and the first threshold is an absolute threshold.
  • the first threshold is an absolute threshold.
  • the first scene and the second scene described above are different scenes.
  • a transition occurs between the first video segment and the second video segment.
  • the second video segment further includes the fifth photo
  • the fifth photo is automatically taken when a transition occurs. That is to say, in order to ensure that at least one photo can be output for the second video clip, automatic photographing can be triggered first when a transition occurs, so as to obtain a transition frame (such as the fifth photo).
  • whether to keep the fifth photo also depends on whether a photo with a higher score than the fifth photo appears later.
  • the fifth photo can be replaced with the second photo, that is, the second photo is output.
  • the first area further includes a thumbnail of the fifth photo.
  • the fifth photo can also be determined as a photo of a wonderful moment in the second video clip, that is, the thumbnail of the fifth photo can be presented in the first area.
  • the second threshold is an absolute threshold for judging the highlight moments in the second video segment.
  • the time between the transition and the previous transition is greater than a time threshold.
  • the time threshold may correspond to the shortest transition time limit threshold in Step 4 in FIG. 11 above. That is to say, in order to avoid frequently triggering transitions to take photos, a time threshold can be set.
  • the third threshold (relative threshold) is smaller than the second threshold.
  • the second threshold is an absolute threshold used to determine the rating of the second photo.
  • if the relative threshold is less than the absolute threshold, it means that automatic shooting has not been triggered in this transition segment (that is, automatic shooting has not been triggered in the second video clip). Therefore, in order to ensure that at least one photo can be output for the transition segment, automatic shooting can be triggered at the transition frame, that is, the fifth photo above can be obtained.
  • the first interface further includes a play progress bar, and the play progress bar is used to display the play progress of the first video.
  • the first interface is the interface shown in (4) in FIG. 6 .
  • the first control is 906 in (4) in FIG. 6
  • the first area is 904 .
  • a thumbnail of the first photo and a thumbnail of the second photo may be displayed in 904 .
  • the playback progress bar is 907 of (4) in FIG. 6 .
  • the first interface is the interface shown in (1) in FIG. 7 .
  • the first control is 906 in (1) in FIG. 7
  • the first area is 904 .
  • the resolution of the first photo is greater than the resolution of the image captured in the first video.
  • the embodiment of the present application has already been described at (4) in FIG. 2 above, and will not be repeated here.
  • the second interface is a playback interface of a second video; the duration of the second video is shorter than the duration of the first video, and the second video includes at least the first photo.
  • the second video can be understood as a wonderful short video of the first video.
  • the composition of the wonderful short video is described above; for the relevant description, refer to the foregoing, and details are not repeated here.
  • the second interface is the interface shown in (5) in FIG. 6 .
  • the second video is a 15-second video shown in (5) in FIG. 6 .
  • the 15-second video includes at least one photo in 904.
  • the second video may include photos of some exciting moments, or may include photos of all exciting moments, which is not specifically limited in this embodiment of the present application.
  • the second video further includes the second photo.
  • the method further includes:
  • the third interface is an interface of a gallery application, and the third interface includes a second control;
  • the displaying the first interface includes: displaying the first interface in response to a fourth operation on the second control.
  • the above-mentioned third operation may be an operation for the user to view the above-mentioned first video in the gallery application.
  • the third interface may be the interface shown in (3) in FIG. 6 .
  • the second control is the playback control.
  • the second control may be 915 shown in (3) in FIG. 6 .
  • the third interface further includes a first prompt window, where the first prompt window is used to prompt the user that the first photo and the second photo have been generated.
  • the first prompt window may be 905 shown in (3) in FIG. 6 .
  • the brightness of the first prompt window and the brightness of the first area are higher than the brightness of the areas in the first interface other than the first area and the first prompt window.
  • the user's attention may be drawn to the first prompt window by way of highlighting, so as to achieve a more striking reminder effect and improve user experience.
  • the method further includes: in response to a fifth user operation, stopping the recording of the first video, and displaying a fourth interface, where the fourth interface includes a preview thumbnail option;
  • the displaying a third interface in response to the third operation of the user includes:
  • the third interface is displayed.
  • the fifth operation is an operation of triggering to stop recording.
  • the fifth operation may be an operation in which the user clicks the control 901 shown in (1) in FIG. 6 .
  • the fourth interface may be the interface shown in (2) in FIG. 6 .
  • the preview thumbnail option of the currently recorded video can also be displayed.
  • when the user clicks the preview thumbnail option, the device jumps to the gallery application to display the currently recorded video (in a non-playing state).
  • the preview thumbnail option may be 903 in (2) in FIG. 6 .
  • the sixth operation may be an operation in which the user clicks 903 .
  • the interface shown in (3) in FIG. 6 is displayed, which includes a playback control 915.
  • the fourth interface further includes a second prompt window, and the second prompt window is used to prompt the user that the first photo, the second photo, and the second video have been generated.
  • the prompt window can be used to guide the user to view the one-record-multiple-get content.
  • the second prompt window may be 902 shown in (2) in FIG. 6 .
  • before recording the first video, the method further includes:
  • enabling the one-record-multiple-get function.
  • the above-mentioned first video is recorded on the premise that the one-record-multiple-get function is enabled.
  • the implementation manner of enabling the one-record-multiple-get function has been described above; for details, refer to the descriptions of FIG. 2 to FIG. 4.
  • the one-record-multiple-get function can be enabled through 404 shown in (4) in FIG. 2.
  • the application can set a minimum duration for the recorded video; when the recording duration is less than the minimum duration, the one-record-multiple-get feature will not be triggered for that video.
  • the duration of the first video is greater than or equal to a preset duration. For example, if the preset duration is set to 15 seconds, then when the user's recording duration is less than 15 seconds, no one-record-multiple-get photos will be produced.
  • the minimum duration of video recording can be set through 405 shown in (4) in FIG. 2 .
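The minimum-duration gate described above can be sketched in a few lines. This is a minimal illustration, not the actual device implementation; the constant and function names are assumed.

```python
MIN_DURATION_SECONDS = 15  # example value, configurable via the camera settings UI (control 405)

def should_produce_extras(recording_duration_seconds: float) -> bool:
    """Return True if one-record-multiple-get results (wonderful-moment
    photos and the short video) should be generated for this recording."""
    return recording_duration_seconds >= MIN_DURATION_SECONDS
```

A 14-second clip would therefore produce only the ordinary video, while a 15-second or longer clip also triggers the extras.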
  • the second interface also includes a music control;
  • through the music control, the user can add a soundtrack to the second video.
  • the second interface may be the interface shown in (2) in FIG. 7 .
  • the music control may be the music control 910 shown in (2) in FIG. 7 .
  • the second interface also includes a style control; the method also includes:
  • the style control may be the style control 912 shown in (2) in FIG. 7 .
  • the user can add style to the second video.
  • the style has been described above; refer to the foregoing for the relevant description, which is not repeated here.
  • the gallery application includes a first photo album, and the first photo album includes the first photo and the second photo.
  • the first photo album further includes a virtual video of the second video.
  • for the meaning of virtual video, refer to the previous explanation; a virtual video is a data file (for example, XML playback logic) for which no actual video file has been generated.
  • the second interface further includes: a share control or a save control;
  • the video file is stored in the first photo album.
  • the sharing control is 909 shown in (2) in FIG. 7 .
  • the saving control is 908 shown in (2) in FIG. 7 .
  • the storage space occupied by the video file is larger than the storage space occupied by the virtual video.
  • the first interface further includes a delete option; the method further includes: displaying a third prompt window in response to the user's eleventh operation on the delete option, where the third prompt window is used to ask the user whether to delete the second video, the first photo, and the second photo.
  • the delete option is as shown in (4) in FIG. 6 .
  • a prompt message is displayed on the user interface to ask the user whether to delete the images and videos associated with the original video (for example, the first photo, the second photo, and the second video).
  • if the user wishes to keep the images and videos associated with the original video, he can choose to keep them, which avoids data loss and helps to improve user experience. If the user wishes to delete them together, the original video and its associated images and videos are deleted together, which helps to save space.
  • the above-mentioned photos of the wonderful moments associated with the first video can be automatically kept for a preset duration, such as N days or N hours.
  • the preset duration may be set by the factory, or may be set by the user independently, which is not limited.
  • the method further includes: automatically deleting the first photo if no user operation to view the first photo is received within N days. It can be understood that the first photo is used as an example here; the second photo may likewise be automatically deleted after being kept for N days.
  • the second video is automatically deleted.
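The retention rule above ("keep for N days, delete only if never viewed") can be sketched as follows. This is an illustrative model with assumed names; the actual device may track views differently.

```python
from datetime import datetime, timedelta

RETENTION = timedelta(days=7)  # the "N days" above; default assumed here, may be factory- or user-set

def should_auto_delete(created_at, last_viewed_at, now):
    """A wonderful-moment photo (or the selected short video) is
    auto-deleted only if the retention window has fully elapsed
    without the user ever viewing it."""
    if last_viewed_at is not None:
        return False  # the user viewed it, so it is kept
    return now - created_at >= RETENTION
```

A photo viewed even once is retained indefinitely; an unviewed photo survives only until the retention window closes.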
  • the second video further includes a nearby image frame of the first photo, and the nearby image frame is determined based on the first time tag;
  • the nearby image frames include the image frames corresponding to the A moments before the first time tag and the image frames corresponding to the B moments after the first time tag, where A is greater than or equal to 1 and B is greater than or equal to 1.
  • the method for obtaining the 15-second wonderful short video 309 was introduced above. A wonderful moment in the acquisition of the wonderful short video 309 is taken as an example to illustrate nearby image frames. For example, assuming that the 5th minute 10th second is a wonderful moment (corresponding to the first moment), and the image frame at the 5th minute 10th second is a wonderful-moment photo (corresponding to the first photo), then the image frame at the 5th minute 9th second and the image frame at the 5th minute 11th second are the so-called nearby image frames.
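The "A moments before, B moments after" rule can be sketched directly. This is a minimal illustration at one-second granularity with assumed names; it simply computes which seconds count as nearby frames, clamped to the video bounds.

```python
def nearby_frames(tag_second, a=1, b=1, video_start=0, video_end=None):
    """Seconds covered by the A frames before and B frames after the
    time tag, excluding the tagged second itself and clamped to the
    bounds of the video."""
    lo = max(video_start, tag_second - a)
    hi = tag_second + b if video_end is None else min(video_end, tag_second + b)
    return [t for t in range(lo, hi + 1) if t != tag_second]
```

For the example above, a tag at the 5th minute 10th second (second 310) with A = B = 1 yields seconds 309 and 311.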
  • the image frames corresponding to moments at which a transition occurs are removed from the second video, where a transition refers to a scene change.
  • a second video is generated based on the photos of the 5 MMs: with the time of each MM as a center point, the selection extends to both sides while avoiding the time stamps at which transitions occur; this is done for each MM until the duration of the second video reaches a preset duration (such as 15 seconds), yielding a selected short video of the preset duration.
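The expansion strategy above — grow outward from each MM center, skip transition time stamps, stop at the target duration — can be sketched at one-second granularity. This is an illustrative interpretation with assumed names (in particular, it assumes transition seconds are skipped over rather than treated as hard boundaries), not the device's actual selection algorithm.

```python
def select_clip_seconds(mm_times, transitions, total_seconds, target_seconds=15):
    """Grow the selection outward from each wonderful moment, one second
    at a time in round-robin order, skipping transition seconds, until
    the target duration is reached or nothing more can be added."""
    transitions = set(transitions)
    chosen = set(mm_times)
    left = {m: m - 1 for m in mm_times}
    right = {m: m + 1 for m in mm_times}
    while len(chosen) < target_seconds:
        grew = False
        for m in mm_times:
            for ptrs, step, in_range in ((left, -1, lambda t: t >= 0),
                                         (right, 1, lambda t: t < total_seconds)):
                if len(chosen) >= target_seconds:
                    break
                t = ptrs[m]
                # skip seconds already chosen or marked as transition moments
                while in_range(t) and (t in transitions or t in chosen):
                    t += step
                if in_range(t):
                    chosen.add(t)
                    grew = True
                ptrs[m] = t + step
            if len(chosen) >= target_seconds:
                break
        if not grew:
            break
    return sorted(chosen)
```

With a single MM at second 5, a transition at second 3, and a 5-second target, the window grows to seconds 2, 4, 5, 6, 7 — second 3 is hopped over rather than included.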
  • the first moment is determined based on a first time tag.
  • the first time tag is determined based on first-level information, second-level information, and third-level information, where the first-level information is used to characterize the theme or scene of the video, the second-level information is used to characterize changes of the video scene, and the third-level information is used to characterize wonderful moments.
  • the MM node can obtain multiple granularity level information of the video stream in real time, so as to identify wonderful moments.
  • multiple levels of information of the video stream are acquired in real time while the video is recorded, and the wonderful moments of the video are identified based on the multiple levels of information; the camera is automatically triggered at the wonderful moments to obtain wonderful-moment photos (for example, the first photo is automatically taken at the first moment, and the second photo is automatically taken at the second moment). The multiple levels of information include the first-level information, the second-level information, and the third-level information; the first-level information is used to represent the theme or scene of the video, the second-level information is used to represent changes of the video scene, and the third-level information is used to represent wonderful moments.
  • a time tag (or video tag) may be generated when a wonderful moment is identified.
  • the time tag refers to the time position of the wonderful moment in the first video.
  • the first moment corresponds to the first time label.
  • Featured videos can be generated based on time tags.
  • when the MM node in the HAL is recording a video, it can judge wonderful moments during the recording process in real time; when it recognizes a wonderful moment, it automatically triggers a photo and obtains an image of the wonderful moment.
  • the number of images of wonderful moments obtained by automatically triggering the photo taking can be set or adjusted based on requirements. For example, a maximum of 5 images of wonderful moments can be set to be obtained.
  • the video identification and time stamp may be written into the JPEG information of the captured image.
  • the JPEG information of the multi-frame images carries exchangeable image file format (EXIF) information;
  • the EXIF information includes the video identification and the video tag. It can be understood that the EXIF information may also include other JPEG data, such as standard information, a thumbnail image, watermark information, and so on.
  • the method also includes:
  • the first photo is associated with the second video through the first identification.
  • the request message may be called a recording request.
  • the recording request is used to trigger the camera application to start the recording mode. For example, on the interface shown in (1) in FIG. 13 , the user clicks on the recording control 801 to trigger the camera application to start the recording mode.
  • the first identifier is called a video identifier
  • the video identifier is a UUID
  • the recorded original video (the first video), the first photo, and the second photo can be associated through the first identification.
  • the recorded original video (the first video), the first photo, the second photo and the selected short video (the second video) can be associated through the database in the gallery application.
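The association scheme above — a UUID generated per recording request, written into each captured photo, and used by the gallery database to regroup results — can be sketched as follows. The record structure is an assumed stand-in for the EXIF payload; it is not the actual on-device format.

```python
import uuid

def new_recording_id():
    # one identifier is generated per recording request and written into
    # the metadata of every photo captured during that recording
    return str(uuid.uuid4())

def make_photo_record(video_id, time_tag_seconds):
    # simplified stand-in for the EXIF payload described above:
    # the shared video identifier plus the wonderful-moment time tag
    return {"video_id": video_id, "time_tag": time_tag_seconds}

def photos_for_video(photo_records, video_id):
    # the gallery database can regroup wonderful-moment photos with
    # their source video by matching on the shared identifier
    return [p for p in photo_records if p["video_id"] == video_id]
```

When the user opens the original video, the gallery can look up all photos carrying the same identifier and offer them alongside the selected short video.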
  • when viewing the original video, the user can choose to view the wonderful-moment photos and the selected short video associated with the first video.
  • the viewing operation may be sliding up on the screen; a logo 306 is then presented on the interface, prompting the user that the "one-record-multiple-get" interface associated with the video 302 will be presented. After the finger leaves the screen, the mobile phone displays the interface shown in (6) in FIG. 5 .
  • the viewing operation may be to click 906 shown in (4) in FIG. 6 to enter the interface shown in (5) in FIG. 6 .
  • the time tag above refers to the position of the exciting moment in the video, and may specifically be a timestamp corresponding to the exciting moment.
  • the time tag can be used to generate a featured video, which can be understood as: after the video recording ends, a play strategy can be automatically generated according to the tag position in the video, and the play strategy can be used to generate a featured video (or wonderful short video).
  • the video tag is used to generate the featured video after recording ends, instead of the featured video being generated in real time during the recording process, which saves storage space.
  • FIG. 20 shows an example diagram of the work of an MM node provided by the present application (here, the MM node may be the MM node in the hardware abstraction layer in FIG. 8 ).
  • the cache contains RAW data for 16 timestamps. The MM node is currently comparing the image frames at time stamp 5 and time stamp 14. Due to the delay in the algorithm, the local optimal frame obtained at the algorithm's current frame (time stamp 14) is actually the image frame at time stamp 11.
  • the MM node can recognize that the scene is a birthday by analyzing the LV1 information of the time stamp (such as time stamp 18, time stamp 34 and time stamp 50).
  • the MM node can know that a transition occurred at time stamp 16.
  • the MM node analyzes the LV3 information of the current frame (time stamp 14) fed into the algorithm (for example, the LV3 information includes the following dimensions: face correlation, image composition evaluation, motion detection, and basic image quality evaluation), and uses the MM comparison strategy to compare the score of the image frame at time stamp 14 with the score of the frame at time stamp 5 (the previous local optimal frame).
  • after the MM node obtains the RAW data of an image frame, it can temporarily store the RAW data in the buffer area.
  • when the MM node recognizes a data frame with a higher score, it sends the RAW data temporarily stored in the buffer into the photo path and triggers the RAW domain photographing algorithm.
  • the MM node can feed the database back to the camera framework layer (the database contains the decision information of different granularities shown in FIG. 20, for example, the information of time stamp 5: ID5, cls1, pri2, score 96; the information of time stamp 14: ID14, cls1, pri2, score 99; transition time stamps: 16, 82, 235, ...; subject: birthday).
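The pairwise comparison that keeps a local optimal frame can be sketched with the scores from the FIG. 20 example. This is a minimal illustration with assumed field names, not the device's actual comparison strategy (which also weighs the LV3 dimensions listed above).

```python
def update_local_best(best, candidate):
    """The candidate frame replaces the current local optimal frame
    only when its comprehensive score is strictly higher."""
    if best is None or candidate["score"] > best["score"]:
        return candidate
    return best

# The two frames from the FIG. 20 example: time stamp 5 scored 96,
# time stamp 14 scored 99, so the frame at time stamp 14 wins.
best = None
for frame in ({"id": 5, "score": 96}, {"id": 14, "score": 99}):
    best = update_local_best(best, frame)
```

When a candidate wins, the MM node forwards its buffered RAW data into the photo path, as described above.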
  • the user can also be supported in manually capturing images while recording a video, in order to improve the user's shooting experience. For example, referring to the interface shown in (2) in FIG. 13 , during video recording the user can click the control 802 to trigger a snapshot.
  • the method further includes: receiving a photographing request while recording a video, the photographing request carrying a (manual) capture mark; and in response to the photographing request, triggering photographing and obtaining a first image, where the EXIF information corresponding to the first image includes the capture mark.
  • the HAL layer supports manual capture capability, and the first image and corresponding EXIF information can be generated through the camera channel processing.
  • the user can simultaneously obtain high-quality photos and videos of exciting moments during the video recording process, which greatly improves user experience.
  • the present application also provides a computer program product, which implements the method described in any method embodiment in the present application when the computer program product is executed by a processor.
  • the computer program product can be stored in a memory, and is finally converted, after preprocessing, compiling, assembling, linking, and other processing, into an executable object file that can be executed by a processor.
  • the present application also provides a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a computer, the method described in any method embodiment in the present application is implemented.
  • the computer program may be a high-level language program or an executable object program.
  • the computer readable storage medium may be a volatile memory or a nonvolatile memory, or may include both a volatile memory and a nonvolatile memory.
  • the non-volatile memory can be read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), or flash memory.
  • volatile memory can be random access memory (RAM), which acts as an external cache. By way of example and not limitation, many forms of RAM are available, such as static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous dynamic random access memory (ESDRAM), synchlink dynamic random access memory (SLDRAM), and direct rambus random access memory (DR RAM).
  • the disclosed systems, devices and methods may be implemented in other ways. For example, some features of the method embodiments described above may be omitted, or not implemented.
  • the device embodiments described above are only illustrative, and the division of units is only a logical function division. In actual implementation, there may be other division methods, and multiple units or components may be combined or integrated into another system.
  • the coupling between the various units or the coupling between the various components may be direct coupling or indirect coupling, and the above coupling includes electrical, mechanical or other forms of connection.
  • the sequence numbers of the processes do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and does not constitute any limitation on the implementation processes of the embodiments of the present application.
  • the terms "system" and "network" are often used interchangeably herein.
  • the term "and/or" herein merely describes an association relationship between associated objects, indicating that three relationships may exist; for example, "A and/or B" may indicate three cases: A alone, both A and B, and B alone.
  • the character "/" herein generally indicates an "or" relationship between the associated objects.


Abstract

A video processing method and an electronic device are provided. The method includes: displaying a first interface, where the first interface is a playback interface of a first video and includes a first control and a first area; the first area displays a thumbnail of a first photo and a thumbnail of a second photo; the first photo is automatically taken at a first moment, and the second photo is automatically taken at a second moment; a first video segment is a first scene, and a second video segment is a second scene; the first photo is a photo in the first video segment, and the second photo is a photo in the second video segment; the score of the first photo is greater than a first threshold, and the score of the second photo is greater than a second threshold; and, in response to a second operation on the first control, displaying a second interface, where the second interface is a playback interface of a second video. The video processing method of the embodiments of this application can decide on wonderful moments with higher scores, thereby obtaining wonderful-moment photos and a selected short video of higher image quality and providing a better user experience.

Description

A video processing method and electronic device
This application claims priority to the Chinese patent application No. 202111236229.1, entitled "Method for taking photos during video recording and electronic device", filed with the China National Intellectual Property Administration on October 22, 2021, and to the Chinese patent application No. 202210114568.0, entitled "Video processing method and electronic device", filed with the China National Intellectual Property Administration on January 30, 2022, both of which are incorporated herein by reference in their entireties.
Technical Field
This application relates to the field of electronic devices, and more specifically, to a video processing method and an electronic device.
Background
With the continuous development of smart terminals, photographing and video recording have become essential functions of a smart terminal. Users' demands on, and experience of, recording and photographing keep growing. In some shooting scenarios, users expect to capture both memorable wonderful-moment photos and video at the same time. In current smart-terminal camera technology, however, the image quality of the wonderful moments obtained during video shooting is poor, resulting in a poor user experience.
Summary
In view of this, this application provides a video processing method, an electronic device, a computer-readable storage medium, and a computer program product, which can decide on wonderful moments with higher scores and thereby obtain wonderful-moment photos of higher image quality, so that the user obtains both high-quality wonderful-moment photos and video while recording, greatly improving user experience.
In a first aspect, a video processing method is provided, including:
recording a first video in response to a first operation of a user;
displaying a first interface, where the first interface is a playback interface of the first video and includes a first control and a first area, the first area displays a thumbnail of a first photo and a thumbnail of a second photo, the first photo is automatically taken at a first moment, the second photo is automatically taken at a second moment, and the recording process of the first video includes the first moment and the second moment;
where the first video includes a first video segment and a second video segment, the first video segment is a first scene, the second video segment is a second scene, the first photo is a photo in the first video segment, the second photo is a photo in the second video segment, the score of the first photo is greater than a first threshold, and the score of the second photo is greater than a second threshold; and
displaying a second interface in response to a second operation on the first control, where the second interface is a playback interface of a second video, the duration of the second video is shorter than the duration of the first video, and the second video includes at least the first photo.
The above method may be performed by an electronic device (for example, a terminal device) or a chip in an electronic device (for example, a chip in a terminal device). Based on the above technical solution, during video recording, by automatically identifying wonderful moments and triggering photographing at those moments, wonderful moments with higher scores can be decided on, so that wonderful-moment photos of higher image quality are obtained, yielding high-quality wonderful-moment photos and video. The user can view the wonderful-moment photos (the first photo and the second photo) and the selected short video (the second video) associated with the recorded original video (the first video), improving user experience.
In a possible implementation, the first video segment further includes a third photo, the third photo is automatically taken at a third moment, and the score of the third photo is greater than the first threshold.
The above first threshold can be regarded as an absolute threshold for evaluating wonderful-moment photos in the first video segment. In other words, if multiple wonderful-moment photos are decided on in the first video segment, each of them should satisfy the first threshold. Therefore, by introducing the first threshold, photos of multiple wonderful moments can be obtained more accurately.
Optionally, a transition occurs between the first video segment and the second video segment.
Optionally, the first photo is a first type of action, and the second photo is a second type of action.
Optionally, the first photo is scenery, and the second photo is a person.
In a possible implementation, the method further includes:
before automatically taking the first photo, obtaining a score of a fourth photo, where the score of the fourth photo is less than or equal to the first threshold and greater than a third threshold; and
updating the value of the third threshold to the score of the fourth photo.
The above third threshold is a relative threshold. When the score of the fourth photo is obtained, if the score of the fourth photo does not exceed the absolute threshold (the first threshold), its relationship with the third threshold is judged; if the score of the fourth photo is greater than the third threshold, the third threshold is updated to the score of the fourth photo, so that the relative threshold always holds the latest highest value.
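The interplay of the absolute and relative thresholds can be sketched in a few lines. This is a minimal illustration with an assumed absolute-threshold value; it only shows the relative-threshold update rule described above.

```python
ABSOLUTE_THRESHOLD = 90  # the "first threshold"; value assumed for illustration

def update_relative_threshold(score, relative_threshold):
    """A frame that fails the absolute threshold but beats the current
    relative threshold raises the relative threshold to its own score,
    so the relative threshold always tracks the latest highest value."""
    if score <= ABSOLUTE_THRESHOLD and score > relative_threshold:
        return score
    return relative_threshold
```

A score above the absolute threshold is handled separately (the frame qualifies as a wonderful moment), so it leaves the relative threshold unchanged here.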
In a possible implementation, the second video segment further includes a fifth photo, and the fifth photo is automatically taken when a transition occurs.
Here, when a transition occurs, automatic photographing can be triggered first to obtain a transition frame (for example, the fifth photo). The purpose is to ensure that at least one photo can be output in the second video segment, avoiding the case in which no photo is output in the second video segment; in other words, at least one photo can be output for each transition segment.
Optionally, the first area further includes a thumbnail of the fifth photo. Of course, if the score of the fifth photo is also greater than the second threshold, the fifth photo can also be judged to be a wonderful-moment photo in the second video segment.
Optionally, the time since the previous transition is greater than a time threshold.
The purpose of setting the time threshold here is to avoid triggering transition photographing too frequently, which helps save terminal power.
Optionally, the third threshold is less than the second threshold.
In a possible implementation, the method further includes:
displaying a third interface in response to a third operation of the user, where the third interface is an interface of a gallery application and includes a second control;
where displaying the first interface includes: displaying the first interface in response to a fourth operation on the second control.
In a possible implementation, the third interface further includes a first prompt window, and the first prompt window is used to prompt the user that the first photo and the second photo have been generated.
In a possible implementation, the brightness of the first prompt window and the brightness of the first area are higher than the brightness of the areas of the first interface other than the first area and the first prompt window.
When the first video is entered for the first time in the gallery application, highlighting the first prompt window can guide the user to view the photos in the wonderful-moment area and draw the user's attention to the first prompt window, achieving a more striking reminder effect and improving user experience.
In a possible implementation, the method further includes:
stopping the recording of the first video in response to a fifth operation of the user, and displaying a fourth interface, where the fourth interface includes a preview thumbnail option;
where displaying the third interface in response to the third operation of the user includes:
displaying the third interface in response to a sixth operation of the user on the preview thumbnail option.
In a possible implementation, the fourth interface further includes a second prompt window, and the second prompt window is used to prompt the user that the first photo, the second photo, and the second video have been generated.
In a possible implementation, before recording the first video, the method further includes:
enabling the one-record-multiple-get function in response to a seventh operation of the user.
In a possible implementation, the first interface further includes a playback progress bar, and the playback progress bar is used to display the playback progress of the first video.
In a possible implementation, the second interface further includes a music control; and the method further includes:
displaying multiple different music options in response to an eighth operation of the user on the music control.
Therefore, the user can add a soundtrack to the second video, enriching the user experience.
In a possible implementation, the second interface further includes a style control; and the method further includes:
displaying multiple different style options in response to a ninth operation of the user on the style control. For example, a style can be understood as a filter.
Therefore, the user can select a video style for the second video, enriching the user experience.
In a possible implementation, the gallery application includes a first album, and the first album includes the first photo and the second photo.
Therefore, the first photo and the second photo can be saved in the same album for the user to view.
In a possible implementation, the first album further includes a virtual video of the second video. A virtual video is a data file for which no actual video file has been generated; for example, a virtual video may be XML playback logic.
In a possible implementation, the second interface further includes a share control or a save control;
generating a video file of the second video in response to a tenth operation of the user on the share control or the save control; and
storing the video file in the first album.
Optionally, the storage space occupied by the video file is larger than the storage space occupied by the virtual video.
Therefore, the second video is generated only when it is shared or saved, which can effectively reduce the space the video occupies on the terminal.
In a possible implementation, the first interface further includes a delete option; and the method further includes:
displaying a third prompt window in response to an eleventh operation of the user on the delete option, where the third prompt window is used to ask the user whether to delete the second video, the first photo, and the second photo.
Therefore, when the user deletes the first video, the user can be asked whether to delete the wonderful-moment photos and the wonderful short video of the first video, to avoid accidental deletion and improve user experience.
In a possible implementation, the method further includes:
automatically deleting the first photo if no user operation to view the first photo is received within N days.
It can be understood that, for the second photo, if no user operation to view the second photo is received within N days, the second photo is automatically deleted.
Therefore, setting the wonderful-moment photos associated with the first video (for example, the first photo and the second photo) to "be kept for a preset duration and then deleted if the user has not viewed them" helps save space.
In a possible implementation, the second video further includes the second photo. That is, the second video may contain photos of all wonderful moments (for example, the first photo and the second photo) or photos of some wonderful moments (for example, the first photo), which is not specifically limited.
In a possible implementation, the first moment is determined based on a first time tag.
Optionally, the first time tag is determined based on first-level information, second-level information, and third-level information, where the first-level information is used to characterize the theme or scene of the video, the second-level information is used to characterize changes of the video scene, and the third-level information is used to characterize wonderful moments.
In a possible implementation, the second video further includes nearby image frames of the first photo, and the nearby image frames are determined based on the first time tag;
where the nearby image frames include the image frames corresponding to the A moments before the first time tag and the image frames corresponding to the B moments after the first time tag, A being greater than or equal to 1 and B being greater than or equal to 1.
Optionally, the image frames corresponding to moments at which a transition occurs are removed from the second video, where a transition refers to a scene change.
In a possible implementation, the method further includes:
generating a request message in response to the first operation, where the request message includes a first identifier;
where the first photo and the second video are associated through the first identifier.
Therefore, the association between the wonderful-moment photos and the wonderful short video can be achieved through the first identifier.
In a possible implementation, the resolution of the first photo is greater than the resolution of an image captured as a screenshot from the first video. Compared with taking screenshots from a video, the images obtained in the embodiments of this application have better resolution.
In a possible implementation, the method further includes: receiving a photographing request while recording a video, the photographing request carrying a capture mark; and
in response to the photographing request, triggering photographing and obtaining a first image, where the exchangeable image file format (EXIF) information corresponding to the first image includes the capture mark.
Therefore, the user's manual capture request can also be received during video recording, so that the user can capture wonderful-moment photos based on subjective needs, further improving user experience.
In a second aspect, an electronic device is provided, including units for performing any one of the methods in the first aspect. The electronic device may be a terminal device or a chip in a terminal device. The electronic device includes an input unit, a display unit, and a processing unit.
When the electronic device is a terminal device, the processing unit may be a processor, the input unit may be a communication interface, and the display unit may be a graphics processing module and a screen; the terminal device may further include a memory for storing computer program code, and when the processor executes the computer program code stored in the memory, the terminal device is caused to perform any one of the methods in the first aspect.
When the electronic device is a chip in a terminal device, the processing unit may be a logic processing unit inside the chip, the input unit may be an output interface, a pin, a circuit, or the like, and the display unit may be a graphics processing unit inside the chip; the chip may further include a memory, which may be a memory inside the chip (for example, a register or a cache) or a memory outside the chip (for example, a read-only memory or a random access memory); the memory is used to store computer program code, and when the processor executes the computer program code stored in the memory, the chip is caused to perform any one of the methods of the first aspect.
In an implementation, the processing unit is configured to record a first video in response to a first operation of a user;
invoke the display unit to display a first interface, where the first interface is a playback interface of the first video and includes a first control and a first area, the first area displays a thumbnail of a first photo and a thumbnail of a second photo, the first photo is automatically taken at a first moment, the second photo is automatically taken at a second moment, and the recording process of the first video includes the first moment and the second moment, where the first video includes a first video segment and a second video segment, the first video segment is a first scene, the second video segment is a second scene, the first photo is a photo in the first video segment, the second photo is a photo in the second video segment, the score of the first photo is greater than a first threshold, and the score of the second photo is greater than a second threshold; and
in response to a second operation on the first control, invoke the display unit to display a second interface, where the second interface is a playback interface of a second video, the duration of the second video is shorter than the duration of the first video, and the second video includes at least the first photo.
In a possible implementation, the first video segment further includes a third photo, the third photo is automatically taken at a third moment, and the score of the third photo is greater than the first threshold.
Optionally, a transition occurs between the first video segment and the second video segment.
Optionally, the first photo is a first type of action, and the second photo is a second type of action.
Optionally, the first photo is scenery, and the second photo is a person.
In a possible implementation, the processing unit is further configured to: before the first photo is automatically taken, obtain a score of a fourth photo, where the score of the fourth photo is less than or equal to the first threshold and greater than a third threshold; and update the value of the third threshold to the score of the fourth photo.
In a possible implementation, the second video segment further includes a fifth photo, and the fifth photo is automatically taken when a transition occurs.
Optionally, the first area further includes a thumbnail of the fifth photo.
Optionally, the time since the previous transition is greater than a time threshold.
Optionally, the third threshold is less than the second threshold. The processing unit is further configured to invoke the display unit, in response to a third operation of the user, to display a third interface, where the third interface is an interface of a gallery application and includes a second control;
the processing unit invoking the display unit to display the first interface specifically includes: invoking the display unit to display the first interface in response to a fourth operation on the second control.
In an implementation, the third interface further includes a first prompt window, and the first prompt window is used to prompt the user that the first photo and the second photo have been generated.
In an implementation, the brightness of the first prompt window and the brightness of the first area are higher than the brightness of the areas of the first interface other than the first area and the first prompt window.
In an implementation, the processing unit is further configured to:
stop the recording of the first video in response to a fifth operation of the user, and invoke the display unit to display a fourth interface, where the fourth interface includes a preview thumbnail option; and
invoke the display unit to display the third interface in response to a sixth operation of the user on the preview thumbnail option.
In an implementation, the fourth interface further includes a second prompt window, and the second prompt window is used to prompt the user that the first photo, the second photo, and the second video have been generated.
In an implementation, the processing unit is further configured to enable the one-record-multiple-get function in response to a seventh operation of the user before the first video is recorded.
In an implementation, the first interface further includes a playback progress bar.
In an implementation, the second interface further includes a music control; and the processing unit is further configured to invoke the display unit to display multiple different music options in response to an eighth operation of the user on the music control.
In an implementation, the second interface further includes a style control; and the processing unit is further configured to invoke the display unit to display multiple different style options in response to a ninth operation of the user on the style control.
In an implementation, the gallery application includes a first album, and the first album includes the first photo and the second photo.
In an implementation, the first album further includes a virtual video of the second video.
In an implementation, the second interface further includes a share control or a save control; and the processing unit is further configured to: generate a video file of the second video in response to a tenth operation of the user on the share control or the save control; and store the video file in the first album.
In an implementation, the storage space occupied by the video file is larger than the storage space occupied by the virtual video.
In an implementation, the first interface further includes a delete option;
the processing unit is further configured to invoke the display unit, in response to an eleventh operation of the user on the delete option, to display a third prompt window, where the third prompt window is used to ask the user whether to delete the second video and the photos of the multiple wonderful moments.
In an implementation, the processing unit is further configured to automatically delete the first photo if no user operation to view the first photo is received within N days.
In an implementation, the second video further includes the second photo.
In an implementation, the first moment is determined based on a first time tag.
In an implementation, the second video further includes nearby image frames of the first photo, and the nearby image frames are determined based on the first time tag;
where the nearby image frames include the image frames corresponding to the A moments before the first time tag and the image frames corresponding to the B moments after the first time tag, A being greater than or equal to 1 and B being greater than or equal to 1.
In an implementation, the first time tag is determined based on first-level information, second-level information, and third-level information, where the first-level information is used to characterize the theme or scene of the video, the second-level information is used to characterize changes of the video scene, and the third-level information is used to characterize wonderful moments.
In an implementation, the image frames corresponding to moments at which a transition occurs are removed from the second video, where a transition refers to a scene change.
In an implementation, the processing unit is further configured to generate a request message in response to the first operation, where the request message includes a first identifier; and the first photo and the second video are associated through the first identifier.
In an implementation, the resolution of the first photo is greater than the resolution of an image captured as a screenshot from the first video.
In a third aspect, a computer-readable storage medium is provided, storing computer program code which, when run by an electronic device, causes the electronic device to perform any one of the methods in the first aspect.
In a fourth aspect, a computer program product is provided, including computer program code which, when run by an electronic device, causes the electronic device to perform any one of the methods in the first aspect.
Brief Description of Drawings
FIG. 1 is a schematic diagram of a hardware system of an electronic device applicable to this application;
FIG. 2 is a schematic diagram of an example of enabling "one-record-multiple-get" provided by this application;
FIG. 3 is a schematic diagram of another example of enabling "one-record-multiple-get" provided by this application;
FIG. 4 is a schematic diagram of yet another example of enabling "one-record-multiple-get" provided by this application;
FIG. 5 is a schematic diagram of an example of a "one-record-multiple-get" graphical user interface (GUI) provided by this application;
FIG. 6 is a schematic diagram of another example of a "one-record-multiple-get" GUI provided by this application;
FIG. 7 is a schematic diagram of yet another example of a "one-record-multiple-get" GUI provided by this application;
FIG. 8 is a schematic diagram of a software system of an electronic device applicable to this application;
FIG. 9 is an example diagram of the decision logic of levels LV0-LV3 in an embodiment of this application;
FIG. 10 is an example diagram of obtaining LV0-LV3 level information based on a data stream;
FIG. 11 is an example diagram of the photographing logic of an embodiment of this application;
FIG. 12 is an example diagram of LV0-LV3 level information;
FIG. 13 to FIG. 16 are schematic diagrams of interfaces at different moments during video recording provided by this application;
FIG. 17 is a schematic diagram of time stamps related to the interfaces at different moments during video recording provided by this application;
FIG. 18 is a schematic flowchart of a video processing method provided by an embodiment of this application;
FIG. 19 is a schematic diagram of a transition frame during fast camera movement provided by this application;
FIG. 20 is a schematic diagram of an example of the work of an MM node provided by this application.
Detailed Description
The technical solutions in the embodiments of this application are described below with reference to the accompanying drawings.
In the embodiments of this application, unless otherwise stated, "multiple" may mean two or more.
Before the embodiments of this application are introduced, some terms or concepts involved in the embodiments are explained first. It should be understood that this application does not specifically limit the naming of the following terms; the following terms may have other names, and renamed terms still satisfy the following explanations.
Magic moment (MM): some wonderful picture instants during video recording. For example, an MM may be a best sports instant, a best expression moment, or a best check-in pose. It can be understood that this application does not limit the term MM; an MM may also be called a beautiful moment, magical moment, wonderful instant, decisive instant, best shot (BS), and so on. In different scenarios, a magic moment may be a different type of picture instant. For example, when recording a football match, a magic moment may be the instant a player's foot touches the ball when shooting or passing, or the instant the ball flies into the goal; when recording a person jumping off the ground, a magic moment may be the instant the person is at the highest point in the air, or the instant the person's posture is most stretched in the air.
MM tag (TAG), that is, a time tag: an MM tag is used to indicate the position of a magic moment in the recorded video file. For example, a video file includes one or more MM tags, and an MM tag may indicate that the image frame at, say, the 10th second or the 1st minute 20th second of the video file is a magic moment.
MM node: used to analyze the captured video stream, identify or decide on magic moments, and automatically trigger photographing when a magic moment is identified. The MM node is also called an MM decision engine, a BS decision engine, an MM decision module, and so on; these terms all have the functions of the MM node described above.
One-record-multiple-get: when a user shoots a video with the camera application, by pressing the "shoot" icon once, the user obtains one or more magic-moment photos and one or more selected videos. One-record-multiple-get may be implemented as follows: the MM node automatically identifies magic moments during recording and triggers snapshots to obtain MM photos; after recording ends, when the user views the recorded video, the MM photos and a wonderful short video (also called a selected short video, wonderful video, or selected video) can be recommended to the user. It can be understood that the duration of the wonderful short video obtained through one-record-multiple-get is shorter than the duration of the whole complete video. For example, if the recorded complete video is 1 minute long, 4 magic-moment photos and a 15-second wonderful short video may be obtained. It can also be understood that one-record-multiple-get may have other names, such as one-key-multiple-get, one-key multi-shot, one-key filming, one-key blockbuster, AI one-key blockbuster, and so on.
Manual capture: during video recording, the user can simultaneously take photos manually to obtain desired pictures.
To improve the image quality of the magic moments obtained in video recording mode, this application introduces the "one-record-multiple-get" mode: while recording a video in video mode, the video stream is analyzed to automatically identify magic moments, and photographing is automatically triggered when a magic moment is identified, so as to obtain photos of the magic moments. In addition, when video recording is completed, the magic-moment photos and the wonderful short video can be viewed in the gallery. Compared with randomly obtaining photos from the recorded video, the image quality of the magic moments obtained by the video processing method of the embodiments of this application is higher, and the user experience is better.
本申请实施例提供的视频处理方法可以适用于各种电子设备。
在本申请的一些实施例中,该电子设备可以是手机、智慧屏、平板电脑、可穿戴电子设备、车载电子设备、增强现实(augmented reality,AR)设备、虚拟现实(virtual reality,VR)设备、笔记本电脑、超级移动个人计算机(ultra-mobile personal computer,UMPC)、上网本、个人数字助理(personal digital assistant,PDA)、投影仪等等。
下文以电子设备为手机为例,图1示出了本申请实施例提供的一种电子设备100的结构示意图。图1示出了一种适用于本申请的电子设备的硬件系统。
电子设备100可以包括处理器110,外部存储器接口120,内部存储器121,通用串行总线(universal serial bus,USB)接口130,充电管理模块140,电源管理模块141,电池142,天线1,天线2,移动通信模块150,无线通信模块160,音频模块170,扬声器170A,受话器170B,麦克风170C,耳机接口170D,传感器模块180,按键190,马达191,指示器192,摄像头193,显示屏194,以及用户标识模块(subscriber identification module,SIM)卡接口195等。其中传感器模块180可以包括压力传感器180A,陀螺仪传感器180B,气压传感器180C,磁传感器180D,加速度传感器180E,距离传感器180F,接近光传感器180G,指纹传感器180H,温度传感器180J,触摸传感器180K,环境光传感器180L,骨传导传感器180M等。
需要说明的是,图1所示的结构并不构成对电子设备100的具体限定。在本申请另一些实施例中,电子设备100可以包括比图1所示的部件更多或更少的部件,或者,电子设备100可以包括图1所示的部件中某些部件的组合,或者,电子设备100可以包括图1所示的部件中某些部件的子部件。图1示的部件可以以硬件、软件、或软件和硬件的组合实 现。
处理器110可以包括一个或多个处理单元。例如,处理器110可以包括以下处理单元中的至少一个:应用处理器(application processor,AP)、调制解调处理器、图形处理器(graphics processing unit,GPU)、图像信号处理器(image signal processor,ISP)、控制器、视频编解码器、数字信号处理器(digital signal processor,DSP)、基带处理器、神经网络处理器(neural-network processing unit,NPU)。其中,不同的处理单元可以是独立的器件,也可以是集成的器件。
控制器可以根据指令操作码和时序信号,产生操作控制信号,完成取指令和执行指令的控制。
处理器110中还可以设置存储器,用于存储指令和数据。在一些实施例中,处理器110中的存储器为高速缓冲存储器。该存储器可以保存处理器110刚用过或循环使用的指令或数据。如果处理器110需要再次使用该指令或数据,可从所述存储器中直接调用。避免了重复存取,减少了处理器110的等待时间,因而提高了系统的效率。
在一些实施例中,处理器110可以包括一个或多个接口。例如,处理器110可以包括以下接口中的至少一个:内部集成电路(inter-integrated circuit,I2C)接口、内部集成电路音频(inter-integrated circuit sound,I2S)接口、脉冲编码调制(pulse code modulation,PCM)接口、通用异步接收传输器(universal asynchronous receiver/transmitter,UART)接口、移动产业处理器接口(mobile industry processor interface,MIPI)、通用输入输出(general-purpose input/output,GPIO)接口、SIM接口、USB接口。图1所示的各模块间的连接关系只是示意性说明,并不构成对电子设备100的各模块间的连接关系的限定。可选地,电子设备100的各模块也可以采用上述实施例中多种连接方式的组合。
电子设备100可以通过GPU、显示屏194以及应用处理器实现显示功能。GPU为图像处理的微处理器,连接显示屏194和应用处理器。GPU用于执行数学和几何计算,用于图形渲染。处理器110可包括一个或多个GPU,其执行程序指令以生成或改变显示信息。
显示屏194可以用于显示图像或视频。显示屏194包括显示面板。显示面板可以采用液晶显示屏(liquid crystal display,LCD)、有机发光二极管(organic light-emitting diode,OLED)、有源矩阵有机发光二极体(active-matrix organic light-emitting diode,AMOLED)、柔性发光二极管(flex light-emitting diode,FLED)、迷你发光二极管(mini light-emitting diode,Mini LED)、微型发光二极管(micro light-emitting diode,Micro LED)、微型OLED(Micro OLED)或量子点发光二极管(quantum dot light emitting diodes,QLED)。在一些实施例中,电子设备100可以包括1个或N个显示屏194,N为大于1的正整数。作为一种可能的实现方式,在用户查看精彩时刻的照片和精选短视频时,显示屏194可用于显示精彩时刻MM的照片以及精选短视频。
电子设备100可以通过ISP、摄像头193、视频编解码器、GPU、显示屏194以及应用处理器等实现拍摄功能。
ISP用于处理摄像头193反馈的数据。例如,拍照时,打开快门,光线通过镜头被传递到摄像头感光元件上,光信号转换为电信号,摄像头感光元件将所述电信号传递给ISP处理,转化为肉眼可见的图像。ISP可以对图像的噪点、亮度和色彩进行算法优化,ISP还可以优化拍摄场景的曝光和色温等参数。在一些实施例中,ISP可以设置在摄像头193 中。
摄像头193用于捕获静态图像或视频。物体通过镜头生成光学图像投射到感光元件。感光元件可以是电荷耦合器件(charge coupled device,CCD)或互补金属氧化物半导体(complementary metal-oxide-semiconductor,CMOS)光电晶体管。感光元件把光信号转换成电信号,之后将电信号传递给ISP转换成数字图像信号。ISP将数字图像信号输出到DSP加工处理。DSP将数字图像信号转换成标准的红绿蓝(red green blue,RGB),YUV等格式的图像信号。在一些实施例中,电子设备100可以包括1个或N个摄像头193,N为大于1的正整数。在本申请实施例中,处理器110可基于摄像头193录制的视频流,确定视频流中的精彩时刻MM,并在确定出MM时,调用摄像头193自动触发拍照。ISP和DSP可对精彩时刻MM的图像信号进行处理,以得到精彩时刻的图像。
数字信号处理器用于处理数字信号,除了可以处理数字图像信号,还可以处理其他数字信号。
视频编解码器用于对数字视频压缩或解压缩。电子设备100可以支持一种或多种视频编解码器。这样,电子设备100可以播放或录制多种编码格式的视频,例如:动态图像专家组(moving picture experts group,MPEG)1、MPEG2、MPEG3和MPEG4。
NPU是一种借鉴生物神经网络结构的处理器,例如借鉴人脑神经元之间传递模式对输入信息快速处理,还可以不断地自学习。通过NPU可以实现电子设备100的智能认知等功能,例如:图像识别、人脸识别、语音识别和文本理解。
外部存储器接口120可以用于连接外部存储卡,例如安全数码(secure digital,SD)卡,实现扩展电子设备100的存储能力。
内部存储器121可以用于存储计算机可执行程序代码,所述可执行程序代码包括指令。内部存储器121可以包括存储程序区和存储数据区。其中,存储程序区可存储操作系统,至少一个功能(例如,声音播放功能和图像播放功能)所需的应用程序。存储数据区可存储电子设备100使用过程中所创建的数据(例如,音频数据和电话本)。此外,内部存储器121可以包括高速随机存取存储器,还可以包括非易失性存储器,例如:至少一个磁盘存储器件、闪存器件和通用闪存存储器(universal flash storage,UFS)等。处理器110通过运行存储在内部存储器121的指令和/或存储在设置于处理器中的存储器的指令,执行电子设备100的各种处理方法。
电子设备100可以通过音频模块170、扬声器170A、受话器170B、麦克风170C、耳机接口170D以及应用处理器等实现音频功能,例如,音乐播放和录音。
触摸传感器180K,也称为触控器件。触摸传感器180K可以设置于显示屏194,由触摸传感器180K与显示屏194组成触摸屏,触摸屏也称为触控屏。触摸传感器180K用于检测作用于其上或其附近的触摸操作。触摸传感器180K可以将检测到的触摸操作传递给应用处理器,以确定触摸事件类型。可以通过显示屏194提供与触摸操作相关的视觉输出。在另一些实施例中,触摸传感器180K也可以设置于电子设备100的表面,并且与显示屏194设置于不同的位置。
按键190包括开机键和音量键。按键190可以是机械按键,也可以是触摸式按键。电子设备100可以接收按键输入信号,实现于按键输入信号相关的功能。
以下实施例中所涉及的技术方案均可以在具有上述硬件架构的电子设备100中实现。
为了便于理解,本申请以下实施例将以具有图1所示结构的电子设备为例,结合下文各个图中示出的应用场景,对本申请实施例提供的视频处理方法进行具体阐述。
本申请将以电子设备为手机,手机中安装相机应用为例,详细介绍本申请提供的视频处理方法。
在本申请的一些实施例中,用户可以手动开启或关闭本申请实施例提供的“一录多得”功能。以下结合图2至图4描述一录多得功能的入口。
图2是本申请实施例提供的一例视频处理方法的图形用户界面(graphical user interface,GUI)的示意图。
示例性的,用户可以通过触摸手机屏幕上特定的控件、按压特定的物理按键或按键组合、输入语音、隔空手势等方式,指示手机开启相机应用。响应于接收到用户开启相机的指示后,手机启动相机,显示拍摄界面。
例如,如图2中(1)所示,手机的屏幕显示系统显示了当前输出的界面内容,该界面内容显示了多款应用程序(application,App)。用户可以通过在手机桌面上点击“相机”应用图标401,指示手机开启相机应用,手机显示如图2中(2)所示的拍摄界面。
再例如,在手机处于锁屏状态时,用户也可以通过在手机屏幕上向右(或向左)滑动的手势,指示手机开启相机应用,手机也可以显示如图2中(2)所示的拍摄界面。
或者,手机处于锁屏状态时,用户可以通过在锁屏界面上点击“相机”应用的快捷图标,指示手机开启相机应用,手机也可以显示如图2中(2)所示的拍摄界面。
又例如,在手机运行其他应用时,用户也可以通过点击相应的控件使得手机开启相机应用进行拍照。比如,用户正在使用即时通信类应用(例如微信应用)时,用户也可以通过选择相机功能的控件,指示手机开启相机应用进行拍照和拍摄视频。
如图2中(2)所示,相机的拍摄界面一般包括有取景框402、拍照控件、录像控件以及其他功能控件(比如人像功能控件、夜景功能控件或更多其他控件)。用户通过点击录像控件可以开启录像模式,手机可以显示如图2中(3)所示的录制界面。用户通过点击“设置”,可以进入设置界面,手机显示如图2中(4)所示的界面。图2中(4)所示的界面中显示开启“一录多得”的选项404,用于开启一录多得的功能。也就是说,当用户开启该功能后,手机处于录像模式时会自动采用本申请实施例提供的视频处理方法,在录制视频时智能识别精彩时刻,识别到精彩时刻内容后将自动生成精彩照片和短视频。当然,用户也可以通过该选项404,手动关闭录像模式下的一录多得功能。
另外,图2中(4)所示的设置界面还可以包括最小时间限制的控件405。最小时间设置的控件用于限制能够开启一录多得功能的最小录制时长,如果视频的录制时长小于该最小录制时长,则无法回调视频的一录多得特性。比如,最小时间限制可设置为15s,当用户拍摄时间小于15s时,不会回调一录多得照片。
可以理解,图2中(4)所示的设置界面也可以包括其他关于录像设置的控件,比如,视频分辨率的设置控件、视频帧率的设置控件等,图2中(4)所示的控件只是示例性描述。
上述图2中(4)所示的视频分辨率的设置控件可用于选择视频的分辨率。应理解,视频分辨率的选项取决于手机的具体配置。比如,视频分辨率可选择3840*2160(超高清4K)、1920*1080(1080p全高清)、1280*720(720P高清)等。
举例来说,手机的视频分辨率可设置为1920*1080,换种表述,视频分辨率为(1080P)16:9。一般而言,正常拍照(即不是在录像过程中触发的拍照)的分辨率为4096*3072。需要说明的是,为了匹配宽高比16:9,在录像过程中自动抓拍的精彩时刻照片的分辨率为4096*2304,而在录像过程中截取的图像帧的分辨率为1920*1080。因此,从图像分辨率的角度看,本申请实施例在识别到精彩时刻时,自动抓拍的精彩时刻的照片的分辨率显然要优于在录像过程中截取的图像帧的分辨率。换句话说,精彩时刻的照片的分辨率,要大于通过常规方式在视频中截取的照片的分辨率。
也就是说,至少从分辨率的角度来讲,自动抓拍的精彩时刻的照片的画质要优于通过常规方式在视频中截取的照片的画质。当然,自动抓拍的精彩时刻的照片的画质还可取决于其他因素,比如,通过照片管道模式组件photo pipeline中的拍照算法处理后的照片画质会更好。本申请实施例涉及的photo pipeline中的拍照算法会在后文图8处详细描述。
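为便于理解上文关于分辨率与宽高比的换算关系,以下给出一段示意性的Python代码(仅为帮助理解的示例,函数名为示例假设,并非本申请实现的一部分),用于计算分辨率对应的最简宽高比,并验证4096*2304与1920*1080同为16:9:

```python
from math import gcd

def aspect_ratio(w: int, h: int) -> str:
    # 约分得到最简宽高比
    g = gcd(w, h)
    return f"{w // g}:{h // g}"

def height_for_ratio(width: int, num: int = 16, den: int = 9) -> int:
    # 给定宽度, 计算匹配 num:den 宽高比所需的高度
    return width * den // num

print(aspect_ratio(4096, 3072))   # 正常拍照分辨率, 4:3
print(aspect_ratio(4096, 2304))   # 录像中自动抓拍的精彩时刻照片, 16:9
print(aspect_ratio(1920, 1080))   # 录像中截取的图像帧, 16:9
print(height_for_ratio(4096))     # 4096宽的传感器匹配16:9时的高度: 2304
```

可以看到,自动抓拍照片在匹配16:9的前提下保留了4096像素的宽度,因此其分辨率高于1920*1080的截帧。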
另外,图2中(4)所示的设置界面还可以包括关于拍照设置的控件,比如,照片比例的设置控件、手势拍照的设置控件、笑脸抓拍的设置控件等。
在开启一录多得选项后,用户可以点击录像控件进行录像。图2中(5)所示的界面示出了录制过程中的一个画面(比如第10秒时的画面)。图2中(5)所示的界面中包括录像停止控件406、录像暂停控件407以及拍照键408。在录像过程中,用户可以点击拍照键408手动抓取照片。
图2中(6)所示的界面中,用户在16分15秒可以点击录像停止的按钮406,结束录制过程,可得到时长为16分15秒的视频。
以上介绍了触发手机进入“一录多得”模式的方法,但本申请不限于在录像模式进入“一录多得”。在本申请的一些实施例中,用户开启“一录多得”功能可以有其他方式。例如,“一录多得”作为一种新增的模式,供用户在相机应用中选择。用户可以选择进入“一录多得”模式。
例如,如图3中(2)所示的界面,响应于检测到用户点击图3中(2)所示界面中的“更多”控件,手机显示如图3中(3)所示的界面。用户点击图3中(3)所示的一录多得控件501,进入一录多得模式,手机显示如图3中(4)所示的界面。当然,“一录多得”控件501也可以显示于如图3中(2)所示界面中,即与拍照控件、录像控件在同一栏,用户通过左右滑动控件,选择一录多得模式。
图3中(1)与图2中(1)所示界面相同,这里不作赘述。从图3中(1)进入图3中(2)的方式也与图2中(1)进入图2中(2)的方式类似,为了简洁,这里不作赘述。
或者,在一些示例中,可以在设置菜单中设置手机的录像模式为“一录多得”模式。
又例如,如图4中(1)所示的界面,响应于检测到用户点击控件601,手机显示如图4中(2)所示的设置界面602。用户可以设置界面中的控件603,进入如图4中(3)所示的相机设置界面604。相机设置界面604中显示有控件605,用于开启一录多得的功能。也就是说,当用户开启该功能后,手机处于录像模式时会自动采用本申请实施例提供的视频处理方法,在录制视频时自动判断精彩时刻并触发抓拍,并自动保存一录多得功能下获得的精彩时刻照片以及精彩短视频。当然,用户也可以通过该控件605,手动关闭录像模式下的一录多得功能。
当然,手机可以默认选择录像模式下开启“一录多得”功能。本申请不作限定。
应理解,用户的手指点击图标可以包括用户的手指触摸到图标;或者,在用户的手指与图标的距离小于一定距离(比如,0.5mm)时,也可以称为用户的手指触摸到图标。
基于以上各种实现方式,可以开启手机的“一录多得”功能。在手机打开上述一录多得功能后,用户录制的视频以及与该视频相关的一录多得的文件可在图库中查看。以下结合图5进行描述。
图5是本申请实施例提供的一例“一录多得”相关的图形用户界面(graphical user interface,GUI)的示意图。
示例性的,用户可以通过触摸手机屏幕上特定的控件、按压特定的物理按键或按键组合、输入语音、隔空手势等方式,指示手机开启图库应用。图库应用也称作相册、照片等。响应于接收到用户开启图库的指示后,手机显示照片界面。例如,如图5中(1)所示,手机的屏幕显示系统显示了当前输出的界面内容,该界面内容显示了多款应用程序App。用户可以通过在手机桌面上点击“图库”应用图标301,指示手机开启图库应用,手机显示如图5中(2)所示的界面。
如图5中(2)所示,界面中显示用户拍摄的照片以及视频,比如,用户拍摄的视频302(比如图2中(6)得到的时长为16分15秒的视频),照片303,视频304(时长为12秒)。用户拍摄的照片以及视频可以按照拍摄时间排序。图5中(2)所示的界面中展示的视频和照片呈缩略图排列,用户拍摄的视频302(比如图2中(6)得到的时长为16分15秒的视频)是最新录制的视频。在录制该视频302时,开启了一录多得功能。
图5中(2)所示的界面显示的是用户拍摄的所有照片以及视频(或者说,所拍摄的照片以及视频在图库应用中以未分类的方式呈现)。
作为一种可能的实现方式,图库应用中可以包括多个相册,该多个相册用于分类存储视频、截屏录屏、我的电影等文件。该多个相册中包括用于保存一录多得的视频的相册。例如,可将该用于保存一录多得的视频的相册命名为一录多得相册。一录多得相册中还可保存与录制的原视频关联的精彩时刻照片以及精彩短视频。
需要说明的是,在用户未触发分享或保存精彩短视频时,一录多得相册中保存的精彩短视频是虚拟视频。虚拟视频是指没有实际生成视频文件的数据文件,比如,虚拟视频可以是XML播放逻辑。并且,虚拟视频在一录多得相册中也会有对应的视频缩略图。由于虚拟视频并非是实际生成的视频文件,所以虚拟视频占用的内存空间小于实际生成的视频文件。比如说,实际生成的视频文件占用5M,虚拟视频占用30k。另外,如果用户触发了保存精彩短视频的动作,那么实际生成的精彩短视频文件也会保存在该一录多得相册中。
图5中(2)所示的视频302的缩略图与图库中其他照片和视频(指未开启一录多得功能的视频)的缩略图,大小可以不同,也可以相同,本申请实施例对此不作限定。
作为一种可能的实现方式,开启了一录多得功能之后,录制的视频的缩略图,可大于图库中其他照片和视频(指未开启一录多得功能下录制的视频)的缩略图。比如,在图5中(2)所示的界面中,视频302的缩略图大于照片303的缩略图,视频302的缩略图也大于视频304的缩略图。
或者,作为一种可能的实现方式,开启了一录多得功能后录制的视频的缩略图,与其他缩略图大小保持一致。
在一种可能的实现方式中,在手机显示如图5中(2)所示的界面时,作为最新录制的一录多得视频,视频302可自动播放,以供用户预览。可以理解,在供用户预览时,视频302不会全屏播放,视频的预览窗口沿用缩略图的窗口大小,即在预览视频302时仍然可以看到其他照片和视频的缩略图。
又一种可选的播放视频302的方式,在用户打开图库后,作为最新录制的视频,视频302可自动播放,即全屏播放视频302,以供用户查看。当然,如果在录制视频302以后用户还拍摄了一些照片,即视频302不是最新录制的视频,则不会自动播放视频302。另一种可选的播放视频302的方式,用户可以点击图5中(2)所示的视频302进行查看。在点击视频302后,手机显示如图5中(3)所示界面,在屏幕中出现播放按钮305。用户点击播放按钮305,则手机开始播放视频302,手机显示如图5中(4)所示的界面。
在图5中(4)所示的界面中,视频302呈现播放状态(比如,视频播放到了第3秒)。在视频302播放状态下,用户可通过一定的手势进行触发,使手机呈现“一录多得”获得的精彩照片和精彩短视频界面。触发手势可以是用户由屏幕下方向上滑动的手势。用户可以通过上滑屏幕进入一录多得。可以理解,本申请对如何进入一录多得的方式不作限定,用户也可以采用其他UX交互方式进入一录多得。
例如,在图5中(5)所示的界面中,用户手指向上滑动屏幕超过预设距离时,界面上呈现标识306,提示用户将呈现与该视频302关联的“一录多得”界面。当用户完成上滑操作,手指离开屏幕后,手机显示如图5中(6)所示的界面。在图5中(6)所示的界面中,屏幕最上方会显示视频302的预览图的一部分。此时,如果用户手指向下滑动屏幕,界面会重新回到视频302的播放界面。
在图5中(5)所示的界面中,还包括暂停控件307、喇叭控件308。暂停控件307用于暂停播放视频;喇叭控件308用于选择是否静音播放视频。视频下方显示按照时间排列的图像帧队列,用于显示当前视频播放的进度,可供用户查看即将要播放的画面帧。
另外,图5中(5)所示的界面中,还包括分享、收藏、编辑、删除、更多等选项。如果用户点击分享,可以分享视频302;如果用户点击收藏,可以将视频302收藏于文件夹;如果用户点击编辑,可以对视频302执行编辑;如果用户点击删除,则可以删除视频302;如果用户点击更多,则可以进入对视频的其他操作功能(比如移动、复制、添加备注、隐藏、重命名等等)。
如图5中(6)所示的界面,手机呈现“一录多得”获得的精彩时刻照片和精彩短视频界面,界面向用户呈现推荐的15秒精彩短视频309以及4张精彩时刻高质量照片(310、311、312、313)以及拼图314。其中,该15秒精彩短视频309由精彩时刻组成。该15秒精彩短视频309中包括的图像帧均是从16分15秒的完整视频中截取的。当然,此处的截取并非是指通过常规方式在16分15秒短视频中进行截图(或者说截取图像帧)的操作。以下描述该15秒精彩短视频309的获得方式。
一种可能的方式,该15秒精彩短视频309是16分15秒视频中的不同片段拼接而成的一段视频,例如,该15秒精彩短视频由以下多个片段拼接而成:第5分9秒至第5分11秒,第7分20秒至第7分22秒,第10分03秒至第10分05秒,第13分13秒至第13分15秒,以及,第15分08秒至第15分10秒。
另一种可能的方式,该15秒精彩短视频309是16分15秒视频中的一段完整视频,例如,该15秒精彩短视频由第10分3秒至第10分18秒的视频组成。此时,精彩时刻MM对应的图像帧都处于第10分3秒至第10分18秒的视频中。
应注意,如果精彩时刻的照片比较多,那么在生成该15秒精彩短视频时,可以舍弃部分精彩时刻MM的照片。舍弃的原则是:优先保留精彩时刻评分较高的照片。
还应注意,如果精彩时刻MM的照片不够多,可以考虑适当增加一些图像帧进行过渡,比如,可以增加发生转场的图像帧,又比如,在裁剪精彩时刻的前后图像帧时,可适当扩大裁剪范围。举例来说,如果确定出了3个MM,那么在生成15秒精彩短视频时,可以每个MM为中心,向两侧扩展裁剪时长为5秒的片段,得到3个时长为5秒的片段,然后将这3个片段拼接为15秒精彩短视频。
可以理解的是,本申请实施例对精彩短视频的时长和数量均不做限定。例如,精彩短视频可以是一段20秒精彩短视频,还可以是两段精彩短视频,两段精彩短视频的时长分别为15秒和20秒。同样可以理解的是,本申请实施例对精彩时刻照片MM的数量也不做限制,精彩时刻MM的照片可以是1张或多张,具体的,可以是1张-4张。
在图5中(6)所示的界面中,拼图314可以是多张精彩时刻照片MM组成的拼图。应理解,本申请实施例对拼图314中包括的精彩时刻的照片数量不作限定,拼图314中可以包括部分或全部精彩时刻照片MM。
在图5中(6)所示的界面中,屏幕下方还包括人物标签图。若用户点击某一人物标签图,则手机会显示与该人物标签图相关的照片(或者说显示该人物标签图的聚类)。
在图5中(6)所示的界面中,用户通过点击如图5中(6)所示的15秒精彩短视频309,手机显示如图5中(7)所示的界面。
如图5中(7)所示,用户进入沉浸式卡片播放。沉浸式卡片播放是一种画面充满整个屏幕的播放方式。可以看到,图5中(7)所示的界面中,画面充满整个手机屏幕。
作为一种可能的实现方式,在图5中(7)所示的界面中,如果用户点击屏幕,界面显示如图5中(8)所示的界面。在图5中(8)所示的界面中,界面可以包括视频播放的进度条315,分享316,收藏317,编辑318以及删除319等选项。通过进度条315,用户可以得知视频播放的进度。
在图5中(8)所示的界面中,如果用户点击分享316,则手机会基于MM标签生成精彩短视频309对应的视频文件并存储,以便用户进行分享。如果用户点击收藏317,则手机会将精彩短视频309保存在收藏文件夹中,此处不需要生成精彩短视频309对应的视频文件。如果用户点击编辑318,则手机会对精彩短视频309进行编辑,至于是否生成精彩短视频309的视频文件,可取决于用户的后续操作,比如,如果用户需要保存,则生成编辑后的精彩短视频309的视频文件进行保存。如果用户点击删除319,则删除视频309。应注意,图5中(8)所示界面中的分享316与图5中(5)所示界面中的分享选项本质是不同的。图5中(8)所示界面中的分享316用于分享精彩短视频309,并且,在用户点击图5中(8)所示界面中的分享316后,手机才会生成待分享的精彩短视频309的视频文件。而图5中(5)所示界面中的分享选项是用于分享录制的原视频(即视频302)。
需要说明的是,在一种可选的实施方式中,为了节省终端的存储空间,图5中(6)的界面所显示的15秒精彩短视频309、图5中(7)播放的视频、以及图5中(8)的界面所显示的视频均是播放器基于视频标签生成的播放策略,此时手机的内部存储器121中并没有实际生成对应的视频文件,即在用户下发分享或保存指令之前,存储器中并不存储对应的视频文件。具体来讲,图5中(6)的界面所显示的15秒精彩短视频309、图5中(7)播放的视频、以及图5中(8)的界面所显示的视频可通过以下方式生成:通过MM标签可得知精彩时刻在完整视频文件中的位置,基于MM标签在视频中的位置,可以生成预览视频。
举例来说,假设通过视频302得到5个MM标签,第一个MM标签为第5分10秒,第二个MM标签为第7分21秒,第三个MM标签为第10分04秒,第四个MM标签为第13分14秒,第五个MM标签为第15分09秒,那么基于每个MM标签的时间为中心点,向两侧扩展剪裁,生成15秒精彩短视频。最终得到的15秒精选视频由以下时间片段组成:第5分9秒至第5分11秒,第7分20秒至第7分22秒,第10分03秒至第10分05秒,第13分13秒至第13分15秒,以及,第15分08秒至第15分10秒。应理解,此处的举例只是示意描述,本申请并不限于此。
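上述以MM标签时间为中心向两侧扩展剪裁的逻辑,可以用如下Python代码示意(函数名与扩展秒数half的取值均为示例假设,原文并未给出具体实现):

```python
def mmss(sec: int) -> str:
    # 将秒数格式化为"X分XX秒"
    return f"{sec // 60}分{sec % 60:02d}秒"

def expand_clips(mm_tags, half=1):
    # 以每个MM标签时间(单位: 秒)为中心, 向两侧各扩展half秒, 得到片段区间
    return [(t - half, t + half) for t in mm_tags]

# 5个MM标签: 5分10秒、7分21秒、10分04秒、13分14秒、15分09秒
tags = [5 * 60 + 10, 7 * 60 + 21, 10 * 60 + 4, 13 * 60 + 14, 15 * 60 + 9]
for start, end in expand_clips(tags):
    print(f"{mmss(start)}至{mmss(end)}")
# 依次输出: 5分09秒至5分11秒、7分20秒至7分22秒、10分03秒至10分05秒、
# 13分13秒至13分15秒、15分08秒至15分10秒
```

将上面得到的各时间片段按时间顺序拼接,即可得到精彩短视频的播放策略。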
在用户有分享或保存该15秒精彩短视频的需求时,才会实际生成该15秒精彩短视频。比如,在图5中(8)所示的界面中,在用户点击分享316时,手机基于播放策略生成实际的15秒精彩短视频。
图6是本申请实施例提供的另一例“一录多得”相关的GUI示意图。
在开启了一录多得选项的情况下,图6中(1)呈现的是录制过程中的一个界面,比如第24秒的画面。若需要结束录制,可以点击图6中(1)所示界面中的停止控件901。
在结束录制后,如果用户是首次使用一录多得功能进行视频录制,则手机会向用户提示:已经生成了一录多得的文件。比如,图6中(2)所示的界面是录制结束后的预览界面,界面中会弹出一个气泡窗口902,窗口902中显示的内容是:“已生成一录多得精彩照片和短视频”。其中,图6中(2)中的预览图903是录制的原始视频的缩略显示。如果用户点击903,则可以进入图库呈现录制的原视频。
当用户点击903后,可以启动图库,显示如图6中(3)所示的界面。图6中(3)所示的界面是录制的原视频在图库应用中的一个呈现。在图6中(3)所示的界面中,录制的视频下方包括精彩时刻区904。精彩时刻区904用于展示精彩时刻的图像帧。比如,精彩时刻区904中包括5张精彩时刻的照片的缩略图。
需要说明的是,精彩时刻区904中包括的精彩时刻照片,与图5中(6)所示的精彩时刻高质量照片310-313类似。可选地,精彩时刻区904中包括的精彩时刻照片可以包括拼图的缩略图,也可以不包括拼图的缩略图。其中,拼图的定义与图5中(6)所示的拼图314类似,此处不再赘述。
另外,由于是首次进入该视频,界面中还会出现引导框905(或称作提示框),引导框905用于向用户提示以下信息:“一录多得”为您智能抓拍多个精彩瞬间。也就是说,引导框905用于告知用户904中包含的是精彩时刻照片的缩略图。作为一种可选的实现方式,当显示引导框905时,为了更好地提醒用户,可以对引导框905进行高亮显示,此时图6中(3)所示的界面中除引导框905和精彩时刻区904外,其余的部分可以调低显示亮度,以便凸显引导框905和精彩时刻区904。当然,如果不是首次进入该视频,则不会出现引导框905。
另外,图6中(3)所示的界面中还包括播放控件915、分享、收藏、编辑、删除、更多等选项,以便用户对原视频进行相应操作。各个选项的具体含义在前文图5中(5)处的描述有涉及,这里不再赘述。
在用户点击播放控件915后,界面开始播放录制视频。比如,图6中(4)所示的界面是播放录制视频的一个界面,该界面播放的是第12秒的画面。视频的播放界面中会向用户显示AI一键大片控件906。AI一键大片控件906用于进入精彩短视频。也就是说,如果用户点击控件906,则会进入精彩短视频的播放界面,比如,图6中(5)所示的界面。图6中(5)所示的界面与如下图7中(2)所示的界面相同,相关描述可参考下文描述。图6中(4)所示的界面还包括进度条907。比如,进度条907显示录制的视频时长为56秒,当前播放到了12秒。进度条也可称作滑动条,用户通过拖动滑动条可以调整播放进度。图6中(4)所示的界面还包括精彩时刻区904,精彩时刻区904同样用于展示精彩时刻的图像帧。类似地,图6中(4)所示的界面中还包括分享、收藏、编辑、删除、更多等选项。
可选地,图6中(4)所示的界面还可以包括视频903的录制时间、录制该视频903时手机所处的地址位置信息等。
图7是本申请实施例提供的又一例“一录多得”相关的GUI示意图。
与图6中(4)所示的界面类似,图7中(1)所示的是播放录制视频的一个界面。类似地,如图7中(1)所示,界面中包括正在播放的视频、AI一键大片控件906、进度条907、精彩时刻区904、分享、收藏、编辑、删除、更多等选项。可选地,为了凸显一录多得功能,界面中的906、907、904所在的区域可构成一个显示框凸出显示。凸出显示的一种实现方式是界面中的906、907、904所在区域构成的显示框的宽度,可大于正在播放的视频的宽度。
在图7中(1)所示的界面中,如果用户点击控件906,则界面显示如图7中(2)所示的界面。在图7中(2)所示的界面中,正在播放精彩短视频。同样,此处播放的精彩短视频是基于视频标签生成的播放策略,此时手机的内部存储器121中并没有实际生成对应的视频文件,即在用户下发分享或保存指令之前,存储器中并不存储对应的视频文件。图7中(2)的界面还包括保存控件908、分享控件909、音乐控件910、编辑控件911、风格控件912等。
同样,如果用户点击控件908或控件909,手机会生成该15秒精彩短视频的视频文件。
如果用户点击音乐控件910,则可以进入图7中(3)所示的界面,为该精彩短视频添加不同的配乐。如图7中(3)所示的界面,用户可点击虚线框913中的任一个配乐控件,为该精彩短视频选择配乐,比如,舒缓、浪漫、温暖、惬意、恬静等。
如果用户点击风格控件912,则可以进入图7中(4)所示的界面,为该精彩短视频选择不同的风格。如图7中(4)所示的界面,用户点击虚线框914中的任一个风格控件,为该精彩短视频选择风格。这里的视频风格可以是滤镜,即通过套用滤镜来对该视频进行调色处理。滤镜是视频特效的一种,用来实现视频的各种特殊效果。可选地,这里的视频风格也可以是快放、慢放等视频效果。可选地,这里的视频风格还可以指各种主题,不同的主题包括各自对应的滤镜和音乐等内容。
如果用户点击编辑控件911,可以对精彩短视频进行剪辑、分割、音量调整、画幅调整等编辑操作。用户编辑完成后,如果对编辑后的精彩短视频进行保存,则手机可以生成对应的编辑后的视频文件。如果对编辑后的精彩短视频做放弃处理,即不保存编辑后的精彩短视频,则手机可以不实际生成视频文件,在一录多得相册中仍然仅保存虚拟视频。
在本申请实施例中,一录多得中的精彩照片是实际已存储的。也就是说,如果开启了一录多得选项,那么在录制视频过程中自动触发拍照的精彩照片会自动存储在图库中。比如,图2中(6)录制视频结束后及图6中(1)录制视频结束后,录制过程中自动抓拍的精彩照片会保存在图库中。录制过程中自动抓拍的精彩照片,例如为图5中(6)中示出的精彩照片310-313、图6中(3)或图6中(4)或图7中(1)中的精彩时刻区904示出的精彩照片等。录制过程中自动抓拍的这些精彩照片,如前面描述的,可以自动存储在一录多得相册中。
作为一种可能的实现方式,如果用户没有查看一录多得相关的文件,那么未查看的精彩照片在自动保留N天后自动删除,以便节省终端的存储空间。N的取值可以预先设置。
比如,如果用户没有执行图5中(5)所示的上滑操作查看一录多得文件,那么图5中(6)所示的照片310-314会在自动保留N天后自动删除。
又比如,如果用户没有查看图6中(3)或图6中(4)或图7中(1)中的精彩时刻区904示出的精彩照片,那么图6中(3)或图6中(4)或图7中(1)中的精彩时刻区904示出的精彩照片会在自动保留N天后自动删除。
可选地,如果在N天之前,用户主动删除录制的原始视频,那么可以向用户显示提示信息。该提示信息用于提示用户是否删除原始视频的一录多得文件(或者说与原始视频关联的精彩照片和精彩短视频)。
比如,在用户点击图5中(5)所示的界面中的删除选项时,手机可弹出提示窗口,提示窗口用于提示用户是否删除关联的精彩照片和精彩短视频。
又比如,在用户点击图6中(3)或图6中(4)或图7中(1)中所示界面中的删除选项时,手机可弹出提示窗口,提示窗口用于提示用户是否删除关联的精彩照片和精彩短视频。
上文描述了在用户点击分享或保存等操作时,才会实际生成精彩短视频的视频文件,本申请并不限于此。
作为另一种可选的实施方式,在录制结束后,也可以基于MM标签直接生成精彩短视频进行存储,(即,不需要用户点击分享或保存等操作,才生成精彩短视频)。另外,基于此实施方式生成的精彩短视频,在用户删除录制的原始视频时,也可以向用户提示是否删除原始视频的一录多得文件。
图8是本申请实施例的电子设备100的软件结构示意图。分层架构将软件分成若干个层,每一层都有清晰的角色和分工。层与层之间通过软件接口通信。在一些实施例中,可以将Android系统分为五层,从上至下分别为应用程序(application,APP)层、应用程序框架层(简称为FWK)、系统库、硬件抽象层(HAL)以及驱动层。
应用程序层可以包括一系列应用程序包。例如,如图8所示,应用程序层包括相机应用、图库应用。其中,相机应用支持录像模式(或电影模式)。
应用程序层可以分为应用界面(UI)和应用逻辑。相机的应用界面可以包括录像模式、电影模式等。
应用逻辑包括以下模块:捕获流(CaptureFlow),视频标签(Video TAG),精彩时刻MM,捕获照片回调函数(OnPictureTaken),手动抓拍JPEG,一录多得JPEG等。
CaptureFlow支持用户手动触发的抓拍操作。
Video TAG用于保存框架层发送的精彩时刻MM标签的时间信息,以及精彩时刻的语义信息(包括LV0-LV3)的描述。精彩时刻语义信息的描述包括但不限于:精彩时刻的类型(比如,精彩时刻的类型是笑容、跳跃、回眸、进球瞬间等等),以及,精彩时刻的评分等。
OnPictureTaken是一种回调函数,用于回调图像数据。在图8中,应用逻辑层中的OnPictureTaken可用于回调手动抓拍的图像数据。应用逻辑层中的手动抓拍JPEG用于基于OnPictureTaken回调的手动抓拍的图像数据,生成手动抓拍的图像。
精彩时刻MM用于保存一录多得JPEG队列数据。作为一种可能的实现方式,该一录多得JPEG队列数据可以传输至一录多得JPEG模块,以便通过一录多得JPEG模块生成一录多得JPEG。一录多得JPEG在图库中可以呈现为:图5中(6)所示的310-313,图6中(3)或图6中(4)或图7中(1)中的精彩时刻区904。
可以理解,应用程序层也可以包括其他应用程序,比如,日历、通话、地图、导航、WLAN、蓝牙、音乐、视频、短信息、浏览器、微信、支付宝、淘宝等应用程序。
应用程序框架层为应用程序层的应用程序提供应用程序编程接口(application programming interface,API)和编程框架。应用程序框架层可以包括一些预定义的函数。
如图8所示,应用程序框架层可以包括相机框架(或者说相机应用对应的接口)和私有拍照通路。私有拍照通路用于将图像的数据传输至应用程序层的相应模块。一种实现方式,一录多得JPEG队列通过私有拍照通路传输至应用程序层的精彩时刻MM模块,在图库应用中呈现精彩时刻MM的照片,比如,如图5中(6)所示的310-313,或者如图6中(3)所示的904,或者如图7中(1)所示的904。一种实现方式,手动抓拍的图像的数据通过私有拍照通路传输至应用程序层的OnPictureTaken模块。
可以理解,应用程序框架层还可以包括其他内容,比如,窗口管理器、内容提供器、视图系统、电话管理器、资源管理器和通知管理器等。
窗口管理器用于管理窗口程序。窗口管理器可以获取显示屏大小,判断是否有状态栏、锁定屏幕和截取屏幕等。
内容提供器用来存放和获取数据,并使这些数据可以被应用程序访问。所述数据可以包括视频、图像、音频、拨打和接听的电话、浏览历史和书签、以及电话簿。
视图系统包括可视控件,例如显示文字的控件和显示图片的控件。视图系统可用于构建应用程序。显示界面可以由一个或多个视图组成,例如,包括短信通知图标的显示界面,可以包括显示文字的视图以及显示图片的视图。应用程序层和应用程序框架层运行在虚拟机中。虚拟机将应用程序层和应用程序框架层的java文件执行为二进制文件。虚拟机用于执行对象生命周期的管理、堆栈管理、线程管理、安全和异常的管理、以及垃圾回收等功能。
如图8所示,系统库可以包括相机服务功能。
系统库还可以包括多个功能模块(图8中未示出),例如:表面管理器(surface manager),媒体库(Media Libraries),三维图形处理库(例如:针对嵌入式系统的开放图形库(open graphics library for embedded systems,OpenGL ES)和2D图形引擎(例如:skia图形库(skia graphics library,SGL))。
表面管理器用于对显示子系统进行管理,并且为多个应用程序提供了2D图层和3D图层的融合。
媒体库支持多种音频格式的回放和录制、多种视频格式回放和录制以及静态图像文件。媒体库可以支持多种音视频编码格式,例如:MPEG4、H.264、动态图像专家组音频层面3(moving picture experts group audio layer III,MP3)、高级音频编码(advanced audio coding,AAC)、自适应多码率(adaptive multi-rate,AMR)、联合图像专家组(joint photographic experts group,JPG)和便携式网络图形(portable network graphics,PNG)。
三维图形处理库可以用于实现三维图形绘图、图像渲染、合成和图层处理。
二维图形引擎是2D绘图的绘图引擎。
硬件抽象层(HAL)是位于操作系统内核与硬件电路之间的接口层,其目的在于将硬件抽象化。它隐藏了特定平台的硬件接口细节,为操作系统提供虚拟硬件平台,使其具有硬件无关性,可在多种平台上进行移植。
如图8所示,硬件抽象层包括视频管道模式组件(video pipeline)、精彩时刻MM节点、照片管道模式组件(photo pipeline)、MM标签、一录多得JPEG队列以及视频编码MP4。其中,照片管道模式组件中包括RAW队列、RAW域拍照算法、拜耳处理阶段(Bayer processing segment,BPS)模块、图像处理引擎(Image processing engine,IPE)模块、风格化模块以及JPEG编码器(encoder,Enc)。
驱动层是硬件和软件之间的层。如图8所示,驱动层可以包括显示驱动、摄像头驱动等驱动模块。其中,摄像头驱动是camera器件的驱动层,主要负责和硬件的交互。
以相机应用为例,应用程序层中的相机应用可以以图标的方式显示在电子设备的屏幕上。当相机应用的图标被触发时,电子设备运行相机应用。相机应用运行在电子设备上,电子设备可以根据用户的操作,向驱动层发送相应的触摸事件。当触摸屏接收到触摸事件,启动相机应用,通过调用驱动层的摄像头驱动启动摄像头。
下面对本申请实施例提供的视频处理方法所涉及的软件模块和模块间的交互进行说明。
如图8所示,应用程序层中的相机应用接收到用户触发的录像请求。应用程序层中的相机应用可以与框架层中的相机框架交互,将录像请求发送至相机框架。相机框架将录像请求发送至系统库中的相机服务。系统库中的相机服务将录像请求发送至硬件抽象层的视频管道模式组件。硬件抽象层的视频管道模式组件将录像的视频流数据发送至MM节点。MM节点基于录制的视频流确定精彩时刻MM,并在确定出精彩时刻MM时调用摄像头驱动进行拍照,同时将拍照数据送入照片管道模式组件处理。MM节点还可以将精彩时刻MM的时间信息(或者说MM在视频中所处的时间位置)以及精彩时刻的类型或者说是精彩时刻的语义层面的描述(精彩时刻对应的LV0-LV3信息,比如,精彩时刻MM为回眸、笑容、跳跃等信息)输送至MM标签模块。MM标签模块,可以以精彩时刻的标签作为元数据(meta)并以clip为单位,将精彩时刻MM的时间信息以及精彩时刻的类型实时上报给video pipeline。精彩时刻MM的时间信息以及精彩时刻的类型通过video pipeline传输至系统库的相机服务。进一步地,相机服务将精彩时刻MM的时间信息以及精彩时刻的类型传递至框架层的相机框架,并通过相机框架发送至应用程序层的Video Tag模块。在MM节点识别到精彩时刻MM触发自动拍照时,照片管道模式组件可以将精彩时刻MM的照片数据进行处理,输出一录多得JPEG队列(即精彩时刻MM照片的JPEG数据)。具体地,照片管道模式组件中的RAW队列用于将RAW数据送入RAW域拍照算法处理。RAW域拍照算法输出的数据送入BPS模块。BPS模块用于将RAW数据转换为拜耳数据。经过BPS模块处理后得到的拜耳数据进入IPE模块。IPE模块用于对拜耳数据进行进一步处理,以提升成像的清晰度、纹理细节、影调色彩、锐化等。经过IPE模块处理后的数据送入风格化模块。风格化模块用于对图像进行渲染(比如将图像渲染为有艺术风格的画作)。经过风格化模块处理后的图像数据送入JPEG编码器。JPEG编码器用于将从风格化模块获得的图像数据进行处理,得到JPEG数据。硬件抽象层的一录多得JPEG队列可通过私有拍照通路将JPEG数据回调至应用程序层的精彩时刻MM。应用程序层的精彩时刻MM可以将一录多得JPEG队列传递至应用程序层的一录多得JPEG模块。应用程序层的精彩时刻MM还可以向私有拍照通路注册MM。一录多得JPEG模块可以基于JPEG数据生成JPEG,即精彩时刻MM的照片。另外,硬件抽象层中的视频管道模式组件可以将录制的视频数据传递至MP4模块。MP4模块用于输出录制的原视频。录制的原视频可通过录像请求中的录像标识与应用程序层中的一录多得JPEG建立关联关系。
示例性地,应用程序层中的图库应用接收用户触发的查看操作,该查看操作用于查看一录多得的JPEG图像。图库应用通过调用显示驱动将一录多得的JPEG图像显示在显示屏上。比如,在图5中(5)所示的界面中,用户通过上滑屏幕进入图5中(6)的界面,图5中(6)的界面显示的是一录多得的JPEG图像。又比如,用户点击图6中(3)中的904,查看精彩时刻的照片(或者说一录多得的JPEG图像)。
可选地,在录像过程中同时支持手动抓拍功能。图8中的架构提供了手动抓拍功能的相关结构。
示例性地,在录像模式下,应用程序层中的CaptureFlow向框架层中的相机框架下发用户触发的手动抓拍请求。框架层将手动抓拍请求通过系统库中的相机服务下发至硬件抽象层的视频管道模式组件。视频管道模式组件将该手动抓拍请求发送至手动抓拍选帧模块。手动抓拍选帧模块调用摄像头驱动进行拍照,并将拍照数据送入照片管道模式组件进行处理。照片管道模式组件中包含的各个模块的处理参考上文描述,这里不作赘述。照片管道模式组件输出手动抓拍的图像数据。手动抓拍的图像数据可通过私有拍照通路反馈至应用程序层的OnPictureTaken模块。应用程序层的OnPictureTaken模块可以基于手动抓拍的图像数据,确定手动抓拍的是哪些帧,然后基于这些帧可得到手动抓拍JPEG图像。
比如,在图13中(2)所示的界面中,用户可点击控件802触发手动抓拍操作。
示例性地,应用程序层中的图库接收到用户触发的查看手动抓拍图像的操作,图库应用也通过调用显示驱动将手动抓拍JPEG图像显示在显示屏上。
应理解,图8中所示的架构并不对本申请实施例构成限定。
还应理解,本申请实施例中所涉及的技术方案可以在具有图8所示的软件架构的电子设备100中实现。
图8中的MM节点可基于录制的视频流确定精彩时刻MM。作为一种可能的实现方式,MM节点基于录制的视频流确定精彩时刻MM,包括:基于视频流获取多个粒度的层级信息;根据多个粒度的层级信息确定精彩时刻MM,其中,所述多个粒度的层级信息包括:第一层级信息、第二层级信息以及第三层级信息,第一层级信息的粒度大于第二层级信息的粒度,第二层级信息的粒度大于第三层级信息的粒度。所述第一层级信息用于表征视频的主题或场景,所述第二层级信息用于表征视频的场景发生变化,所述第三层级信息用于表征精彩时刻。
上述第一层级信息、第二层级信息以及第三层级信息按照粒度由粗到细的次序提供决策信息,以辅助MM节点识别录制过程中的精彩时刻。
示例性地,假设上述第一层级信息所对应的层级包含LV0和LV1,第二层级信息所对应的层级记作LV2,第三层级信息对应的层级记作LV3,即将决策信息按照粒度由粗到细依次划分为LV0、LV1、LV2和LV3。
其中,LV0的信息用于给出整段视频的风格或氛围TAG(比如,童趣、人物、春节、圣诞节、生日、婚礼、毕业、美食、艺术、旅行、夜景、运动、大自然、轻松欢快/小伤感/动感节奏/休闲)。LV1的信息用于语义层面场景识别,将视频分成若干片段,并给出每个片段的类别,例如:山脉、人像等。
示例性地,假设上述第一层级信息所对应的层级包含LV0和LV1,以下表1给出了LV0和LV1的定义的举例。
表1
(表1以图片形式呈现,见原文附图Figure PCTCN2022118147-appb-000001,其文字内容未能从原文中提取)
示例性地,假设上述第二层级信息所对应的层级记作LV2,LV2信息的粒度相比于LV0-1而言会更细。LV2的信息可以给出视频转场位置(比如,发生转场的帧号),以及转场类型(人物主角切换、快速运镜、场景类别变化、其他情况引起的图像内容变化),以防止相似场景推荐数量过多。LV2的信息用于表征视频场景变化(或者也可以简称为转场),包括但不限于以下变化中的一种或多种:人物主体(或主角)变化,图像内容构成发生较大变化,语义层面场景发生变化,以及图像亮度或颜色发生变化。
其中,人物主体变化:当人物主体发生改变时,视为一次转场。人物主体可以定义图像中占比最大的人物。例如,若第t-1帧图像的人物主体为A,第t帧图像的人物主体增加了B,但是主体仍为A,则不算一次转场。又例如,若第t-1帧图像的人物主体为A,第t帧图像的人物主体变成B,则算一次转场。
图像内容构成发生较大变化视为一次转场。例如,在相机基本稳定时,如果录制画面中有较多物体移动,导致画面内容发生较大变化,则视为一次转场。比如,在观看赛车比赛时,用户通过手机录制赛车画面,如果画面中有赛车驶过,那么可以认为赛车经过时发生了一次转场。又例如,在相机缓慢平稳运镜时,画面内容一般不会有明显的转场分界,但是转场检测帧率为2FPS,比如第t帧图像与第t-16帧图像内容构成差别较大时,视为一次转场。又例如,在相机快速运镜(例如快速从A摇到B)期间,画面模糊严重,帧与帧之间的内容变化较大,但是只能将整个运镜过程视为一次转场。如图19所示,在相机运镜的区间A至区间C中,假设B为快速运镜的区间,那么将区间B的起始帧a和结束帧b视为转场帧。
图像亮度或颜色发生变化视为一次转场。比如,演唱会中,画面内容变化较小,但是氛围灯的颜色和亮度发生了变化,则视为一次转场。
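以人物主体变化这一类转场为例,上文"主体为占比最大的人物、主体改变视为转场"的判断规则可以用如下Python代码示意(数据结构与函数名均为示例假设):

```python
def main_subject(persons):
    # persons: [(人物ID, 在图像中的面积占比)]; 主体定义为占比最大的人物, 无人则返回None
    if not persons:
        return None
    return max(persons, key=lambda p: p[1])[0]

def is_subject_transition(prev_frame, cur_frame):
    # 当前后两帧的人物主体不同时, 视为一次转场
    return main_subject(prev_frame) != main_subject(cur_frame)

# 第t-1帧主体为A; 第t帧增加了B但主体仍为A: 不算一次转场
print(is_subject_transition([("A", 0.30)], [("A", 0.30), ("B", 0.10)]))  # False
# 第t帧主体变为B: 算一次转场
print(is_subject_transition([("A", 0.30)], [("B", 0.40), ("A", 0.20)]))  # True
```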
示例性地,假设上述第三层级信息所对应的层级记作LV3,LV3的信息的粒度相比于LV2而言会更细。LV3的信息用于确定精彩时刻。LV3的信息可以按照以下类别划分:基础画质、主观图像评价、人物以及动作。
举例来说,基础画质是从图像的整体维度判断图像的整体清晰度,比如,图像是否失焦、运动是否模糊,曝光是否合适,噪声是否明显等。主观图像评价可从构图的维度判断,比如,构图是否美观(评价准则可基于对称、三分法等评价标准)。
举例来说,人物可从以下维度判断:人脸是否清晰(这里仅判断一个脸)、人物是否睁眼或闭眼,传递情感的表情(比如大笑、惊讶等,这里需要剔除无意义的标签,比如,肌肉抽搐,眼角歪斜等)。
举例来说,动作可从以下维度判断:投篮(上篮最高点、跳投最高点(人最高))、踢球(踢球的瞬间,比如起始动作或完成动作)、羽毛球(打球或扣球动作)、跳跃(跳跃最高点)、奔跑(迈腿、滞空点)、回眸(回眸瞬间、长发飘逸(45度以下))、泼水(小溪泼水打卡)以及抛物(抛物打卡照)。需要说明的是,若图像中有多人场景,则选择主要人物(比如一个)进行动作检测。
应理解,上述关于LV0-LV3的信息的举例只是示例性描述,本申请并不限于此。
以下结合图9中的决策模块描述LV0-3层的决策逻辑。如图9所示,LV0-3层的决策逻辑由各个算法模块配合决策输出。
其中,LV0-1层的决策通过场景识别模块配合决策输出。该场景识别模块的输入包括以下信息:单帧图像、人脸信息以及人体信息;输出场景类别。具体来讲,通过场景识别模块对输入的单帧图像进行场景识别,可以输出图像的场景类别。另外,图像中的人体信息(包括但不限于人体在图像中的位置、人体是否是小孩、人物标识ID)可通过人体检测+ReID模块获得。图像中的人脸信息(包括但不限于人脸位置、性别、表情等)可通过人脸信息模块获得。
LV2层的决策通过转场模块实现。转场模块通过上述人体检测+ReID模块和转场检测模块配合决策输出。人体检测+ReID模块可以输出人体检测的相关信息,包括但不限于以下内容:人体位置、是否是小孩、ID等信息。该转场检测模块的输入:2帧图像;输出:是否转场以及转场帧号。具体来说,可以将两帧图像输入到转场检测模块,这两帧图像的场景信息可以通过场景识别模块获得。转场检测模块基于这两帧图像,可以得到是否发生了转场以及转场帧号等信息。另外,转场模块中的人体检测+ReID模块也可以用于辅助转场模块判断是否发生了转场。
LV3层的决策通过MM模块实现。MM模块通过人脸表情、动作检测模块(输出:动作类别、动作分数)、构图评价模块(输入:单帧图像;输出:分数)以及画质评价模块(输入:单帧图像;输出:分数)配合决策输出。具体来讲,转场模块中的人体检测+ReID模块可以将人体检测信息输入到MM模块中的动作检测模块。人脸信息模块可以将人脸信息输入到MM模块中的人脸表情模块,以便进行MM模块进行人脸表情的打分。MM模块可以基于以下四个维度的评分,输出图像帧的最终评分:人脸表情的分数、动作检测的分数、构图评价分数以及画质评价分数等。
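LV3综合决策基于人脸表情、动作检测、构图评价、画质评价四个维度的评分输出最终评分。一种可能的融合方式是加权求和,如下Python代码所示(权重取值为示例假设,原文并未给出具体的融合公式):

```python
def mm_score(face, action, composition, quality,
             weights=(0.3, 0.3, 0.2, 0.2)):
    # 人脸表情、动作检测、构图评价、画质评价四个维度评分的加权求和
    scores = (face, action, composition, quality)
    return sum(w * s for w, s in zip(weights, scores))

# 人脸表情90分、动作80分、构图70分、画质60分时的综合评分
print(mm_score(90, 80, 70, 60))  # 77.0
```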
因此,通过图9中各个逻辑模块的配合,可以获得LV0-LV3的结果。
在本申请实施例中,MM节点可以基于预设间隔获取各个等级(比如LV0-3)的层级信息(或者说关键数据、决策信息等),以供拍照决策使用。该间隔可以是预设值,预设值可取决于硬件资源。比如,每间隔10帧获取一次场景信息。又比如,每间隔10帧获取一次转场信息。
结合图10和图11中的示例说明。此处作统一说明,图10和图11均是基于数据流(或者说视频流)进行分析或检测的。该数据流可以是基于预览流进行降低分辨率操作得到的数据流。可以理解,这里采用降低分辨率的数据流进行分析或检测,有助于提高检测效率。比如,图10和图11中的数据流可以是Tiny流。
如图10所示,对于视频流,MM节点在时间戳(-10)处通过对视频流进行场景识别,获得LV1信息,从而可以得知视频的场景。通过场景识别获得的信息应送入LV0统计决策模块中。需要说明的是,LV0结果需要结合最终的统计结果而定。也就是说,待用户拍摄完成后,可以统计生成唯一的LV0结果,LV0结果用于表征整个视频的主题和氛围。
MM节点可以每间隔10帧获取一次场景信息(可通过场景识别模块获得)。比如,在时间戳t(-15)及时间戳t(-5),MM节点各自获取一次场景信息,然后进行图像转场比对。如果图像发生了转场,则统计一次转场。另外,在时间戳t(-5)也可以通过人脸检测模块进行人脸识别,并将识别到的人脸送入人脸ReID中。不论是图像的场景发生变化,还是人脸发生了变化,统计结果均可体现于LV2结果中。MM节点基于时间戳t(-15)及时间戳t(-5)生成LV2结果。通过LV2结果可以得知是否发生转场。
在当前时间戳t(0),MM节点依次对图像帧进行人体检测和动作检测,并基于检测到的动作进行评分,将动作评分送入LV3综合决策模块;同时,结合在时间戳t(-5)处的人脸检测模块输出的数据,对时间戳t(0)的图像帧进行画质/美学评分,送入LV3综合决策模块。LV3综合决策模块基于画质/美学评分和动作评分进行综合决策,得到时间戳t(-1)的LV3结果(即精彩时刻的评分)。
以下结合图11描述MM节点在获得LV0-3的结果后,如何触发拍照逻辑。如图11所示,图11中示出了MM节点在当前时间戳t(0)的相关判断逻辑。如图11所示,包括以下步骤:
步骤0,录像开始时,初始化相对阈值。
初始阈值可以设置为较小的值,比如0。在相对阈值等于初始阈值时,可以清空缓存区。初始化相对阈值的目的在于保证至少可以拍出一张照片。
同时,还可以配置绝对阈值(或者说是分数阈值)。绝对阈值可认为是用于评价精彩时刻的量化指标。比如,绝对阈值可用thd_max_confid表示。
一个示例,绝对阈值用于分离出预定义的动作。每个动作类别可以具有对应的绝对阈值。可以理解,绝对阈值需要尽可能精确调试,以便准确识别出精彩时刻动作。评分超过绝对阈值时,表示以较大概率探测到了相应动作。
举例来说,对于跳跃动作,可预先设置与跳跃动作对应的绝对阈值。如果检测到图像帧中有跳跃动作,则可以对该跳跃动作进行评分,然后将评分与跳跃动作对应的绝对阈值进行比较。如果评分大于跳跃动作对应的绝对阈值,可以将该跳跃动作确定为精彩时刻,那么相应的图像帧可以认为是精彩时刻的照片。
步骤1,录像过程中,判断具有LV3数据的关键帧(比如时间戳t(-1))的LV3数据的分数是否大于绝对阈值。如果LV3数据的分数大于绝对阈值,则执行步骤6;如果否,则执行步骤2。
如果当前LV3分数比绝对阈值大,则表示时间戳t(-1)的图像帧以非常高的置信度落入预先定义的动作类别中。在没有其他拍照进程冲突影响下,可以直接触发RAW域算法拍照。
另外,此处需要额外考虑Tiny流与零秒延迟(zero shutter lag,ZSL)序列的对齐问题。这是因为,MM节点是工作在Tiny数据上的,因此拍照需要选择其对应的RAW数据。
步骤2,判断LV3数据的分数是否大于相对阈值。如果LV3数据的分数比相对阈值大,则执行步骤3。
步骤3,将RAW数据复制到缓存区中,同时,更新相对阈值以使其保持最新最高值。
上述缓存区用于存储RAW数据。前文分析或检测利用的是tiny流,此处存储到缓存区中的数据是tiny流对应的原始(RAW)图像帧。比如,此处将时间戳t(-7)至t(0)对应的RAW图像帧存储在缓存区中。
应理解,本申请实施例对缓存区的类型不作限定。比如,所述缓存区可以是一个单独设置的buffer。又比如,若电子设备采用ZSL拍照系统,那么缓存区可以是ZSL拍照系统中的ZSL buffer。

步骤4,在当前时间戳判断是否发生转场,以及该转场距离上一次转场的时间是否超过最短转场时间限制阈值(比如,thd_change),如果是,则判断该转场为一次可以触发拍照的转场,继续执行步骤5。这里,引入最短转场时间限制阈值的目的在于防止频繁转场导致拍照过于频繁。
步骤5,判断缓存区是否不为空以及相对阈值是否小于绝对阈值。
如果当前相对阈值还小于绝对阈值,则说明此转场片段还没有触发过拍照,此时可以执行步骤6,将暂存缓存区中的RAW数据送入拍照通路触发一次拍照。此设计使一个转场下至少能输出一张照片。
步骤6,送RAW域拍照算法处理。
可选地,在步骤6中,将RAW数据送入RAW域拍照算法处理之前,还可以先判断当前拍照间隔是否大于最小拍照间隔。比如,最小拍照间隔设置为3s,如果判断出当前拍照间隔大于3s,才会触发拍照。这样设置的好处在于可以防止频繁触发拍照。并且,也可以避免手动抓拍与自动拍照发生冲突。当然,如果用户连续抓拍,可能会存在自动触发拍照受阻的情况。比如,如果用户连续抓拍导致自动拍照受阻,那么可以将用户手动抓拍的图像与视频关联起来,即可以考虑将用户抓拍的图像输出为精彩时刻的照片。
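上述步骤1至步骤6中绝对阈值与相对阈值配合触发拍照的决策逻辑,可以用如下Python代码示意(类名、阈值取值均为示例假设,且省略了Tiny流与RAW数据对齐、最小拍照间隔等细节):

```python
class CaptureDecider:
    def __init__(self, abs_thd=85, init_rel_thd=0):
        self.abs_thd = abs_thd       # 绝对阈值
        self.rel_thd = init_rel_thd  # 相对阈值, 录像开始时初始化为较小值
        self.buffer = None           # 暂存当前最优帧的(时间戳, 评分)

    def on_frame(self, ts, score):
        # 步骤1: 评分超过绝对阈值, 直接触发拍照
        if score > self.abs_thd:
            self.rel_thd = score
            self.buffer = None
            return ("capture", ts)
        # 步骤2-3: 评分超过相对阈值, 暂存该帧并抬高相对阈值
        if score > self.rel_thd:
            self.rel_thd = score
            self.buffer = (ts, score)
        return ("skip", ts)

    def on_transition(self):
        # 步骤4-6: 发生可触发拍照的转场时, 若本片段尚未触发过拍照
        # (相对阈值仍小于绝对阈值)且缓存区非空, 则补拍暂存的最优帧
        if self.buffer is not None and self.rel_thd < self.abs_thd:
            ts, _ = self.buffer
            self.buffer = None
            return ("capture", ts)
        return ("skip", None)

d = CaptureDecider()
print(d.on_frame(1, 70))   # 评分70未超过绝对阈值85, 仅暂存: ('skip', 1)
print(d.on_frame(2, 90))   # 评分90超过绝对阈值, 直接触发: ('capture', 2)
d2 = CaptureDecider()
d2.on_frame(1, 70)
print(d2.on_transition())  # 片段内未触发过拍照, 转场时补拍: ('capture', 1)
```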
步骤7,拍照通路处理,送JPEG编码。在JPEG编码时,可以将对应的MM决策数据(比如LV3数据的评分)保存到EXIF中。
步骤8,按照LV3数据的评分高低出入JPEG队列。
高低出入JPEG队列是基于照片评分进行末位淘汰得到的。举例来说,假设当前JPEG队列保留有5张照片(比如图中示出的t(-4)至t(0)对应的5张照片),这5张照片中评分最低的照片记作照片X,此时如果输出1张照片Y,该照片Y的评分大于照片X的评分,那么照片X出队列,照片Y入队列。
比如,按照LV3数据的评分高低输出包含5个JPEG的队列,即永远保留评分最高的TOP5照片。可以理解,这里是以包含5个JPEG的队列为例进行说明,该JPEG队列中包含的JPEG的数量是可以设置的。
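步骤8中按评分高低进行末位淘汰的JPEG队列,可以用如下Python代码示意(队列长度5与数据结构均为示例):

```python
def push_topk(queue, photo, k=5):
    # queue中的元素为(评分, 照片标识); 入队后按评分降序排序, 超出k张时淘汰末位
    queue.append(photo)
    queue.sort(key=lambda p: p[0], reverse=True)
    del queue[k:]

queue = []
for photo in [(65, "t(-4)"), (79, "t(-3)"), (70, "t(-2)"), (95, "t(-1)"), (70, "t(0)")]:
    push_topk(queue, photo)

# 新照片评分80高于队列中最低分65, 因此65分的照片被淘汰
push_topk(queue, (80, "new"))
print([score for score, _ in queue])  # [95, 80, 79, 70, 70]
```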
示例性地,如图12所示,将决策信息按照粒度由粗到细依次划分为LV0、LV1、LV2和LV3。LV0给出了整个视频的概括(summary),或者说视频的整体氛围。LV1在LV0的基础上将视频划分为3个类别的视频片段,比如,所属类别分别为肖像(portrait)、风景(landscape)以及建筑物(building)。LV2在LV1的基础上获得场景发生变化的信息(比如发生转场的帧号),具体包括3次转场。LV3在LV2的基础上,获得以下精彩时刻:MM1(在第一次转场和第二次转场之间)、MM2(在第一次转场和第二次转场之间)、MM3(在第二次转场和第三次转场之间),MM4(第三次转场以后)。可以看到,第一次转场和第二次转场之间发生了两次MM。当然,对于同一场景下的MM1与MM2,为避免相似场景下照片推荐数量过多,在决策时可比较MM1与MM2的评分,保留评分较高的MM。
为便于理解,以下结合图13-图17描述录制视频中获得精彩时刻MM的流程。
假设拍摄的对象是山脉,用户在录制视频过程中移动手机,录制不同的画面。图13-图16是用户在录制视频时不同时刻的界面示意图。图17是针对图13-图16所示界面的时间戳示意图。
如图13中(1)所示的界面,用户点击控件801,开启录像模式。此处手机已开启一录多得功能。开启一录多得的方式可以参考前文图2至图4的描述,这里不再赘述。录像的起始时间为00分钟00秒00毫秒(表示为00:00:00)。
如图13中(2)所示的界面,在录像开始后,界面中包括拍照控件802、停止控件803和暂停控件804。如果用户点击停止控件803,可结束录制;如果用户点击控件802,可以在录像过程中进行手动抓拍。如果用户点击暂停控件804,可暂停录制。在时刻00:02:15的录制画面如图13中(2)所示,此时的画面呈现了山的全貌,画面内容为山(记作山A)。用户手持手机继续移动,得到时刻00:05:00的画面如图13中(3)所示,界面中显示画面 内容为山的一部分。
示例性地,MM节点对时刻00:00:00到00:05:00的视频流进行检测,可识别到时刻00:00:00到00:05:00的视频片段的语义场景或类别为山脉。例如,MM节点可识别出该视频片段的场景为山A。进一步地,MM节点识别到时刻00:00:00到00:05:00中,时刻00:02:15的画面呈现了山A的全貌。MM节点对时刻00:02:15的画面的基础画质、构图是否美观等因素进行判断,得到此刻画面帧的评分为65,将此刻确定为第一MM。
在录制时刻00:05:00到00:06:00时,镜头快速移动了1秒。时刻00:06:00的画面如图14中(1)所示,画面内容为另一个山的部分。时刻00:05:00的画面与时刻00:06:00的画面不同。MM节点对时刻00:05:00到时刻00:06:00的视频流进行检测,可认为在时刻00:06:00发生一次转场(转场是指场景发生变化),该转场类型为快速运镜。因此,在后端剪辑精选短视频时,抛弃时刻00:05:00到时刻00:06:00的内容。
用户继续手持手机移动,拍摄另一山脉。比如,时刻00:08:54的画面如图14中(2)所示,此时的画面中呈现了山的全貌,画面中的内容为山(记作山B)。时刻00:11:00的画面如图15中(1)所示,画面中的内容为天空。从图14中(1)所示的界面、图14中(2)所示的界面以及图15中(1)所示的界面得知,在时刻00:11:00,画面内容发生改变,因此认为在时刻00:11:00发生一次转场。MM节点通过对时刻00:06:00至时刻00:11:00的视频流进行检测,得知在时刻00:11:00场景发生变化。进一步地,MM节点对从时刻00:06:00至时刻00:11:00的视频流的MM进行检测,得到第二MM为时刻00:08:54,评分为79。
在拍摄完山脉后,用户移动手机,以期拍摄天空。比如,时刻00:18:50的画面如图15中(2)所示,画面中的内容为天空。时刻00:20:00的画面如图15中(3)所示,画面中的内容为天空。MM节点对00:11:00到00:20:00的视频流进行检测,可识别到该视频片段的场景类别为天空。进一步地,MM节点对00:11:00到00:20:00的MM进行检测,可得到第三MM为时刻00:18:50,评分为70。
在拍摄完天空后,用户手持手机快速移动,以期拍摄人像。比如,从时刻00:20:00到00:25:00,镜头快速移动了5秒。时刻00:25:00的画面如图16中(1)所示。从图16中(1)所示的界面可知,在时刻00:25:00人物进入镜头。时刻00:20:00的画面与时刻00:25:00的画面不同。MM节点对时刻00:20:00到时刻00:25:00的视频流进行检测,得知在时刻00:25:00场景发生变化,发生一次转场,该转场类型为快速运镜。因此,在后端剪辑精选短视频时,抛弃从00:20:00到00:25:00的内容。
时刻00:28:78的画面如图16中(2)所示,在时刻00:28:78人物发生回眸。时刻00:30:99的画面如图16中(3)所示,在时刻00:30:99人物发生另一回眸。如图16中(4)所示,在时刻00:35:00用户可点击控件803,结束录制。
从时刻00:25:00到时刻00:35:00,MM节点检测到场景类别为人物。进一步地,MM节点对时刻00:25:00到时刻00:35:00的视频流的MM进行检测,得知两个精彩时刻分别为00:28:78和00:30:99,并结合以下因素分别对这两个时刻的画面进行评分:基础画质、人物、人物动作,得到这两个精彩时刻的评分分别为95和70。基于此,MM节点确定出第四MM为时刻00:28:78,以及第五MM为时刻00:30:99。
在上述图13至图16中,得到的5个MM分别为:时刻00:02:15、时刻00:08:54、时刻00:18:50、时刻00:28:78以及时刻00:30:99。基于多个MM所在的时间位置,可以生成精彩短视频。其中,精彩短视频是由这些精彩MM对应的图像帧组成的,帧之间包括过渡。精彩短视频还包括这些帧附近的图像帧,例如,对于精彩时刻00:28:78而言,精彩短视频除了包括00:28:78的图像帧以外,还包括00:28:76-00:28:80的图像帧。
换种方式描述,以图17中所示的时间轴为例,从录制开始(00:00:00)到录制结束(00:35:00),MM节点基于视频流检测到以下多个片段的信息:
片段Clip1:起始时间为00:00:00,结束时间为00:05:00,场景的类别为风景(比如,图13中所示该场景实际为风景A:山),该场景的精彩时刻MM为00:02:15,该MM的评分为65,起始帧的转场类型为:启动。
Clip2:起始时间为00:05:00,结束时间为00:06:00,场景的类别为动感节奏,该场景不存在MM,起始帧的转场类型为:快速运镜。因此,建议在精选视频中抛弃从00:05:00到00:06:00的内容。
Clip3:起始时间为00:06:00,结束时间为00:11:00,场景的类别为风景(比如,图14中所示该场景实际为风景B:山),在该场景的MM为00:08:54,该MM的评分为79,起始帧的转场类型为:内容变化。
Clip4:起始时间为00:11:00,结束时间为00:20:00,场景的类别为天空(比如,图15中所示该场景实际为风景C:天空),在该场景的MM为00:18:50,该MM的评分为70,起始帧的转场类型为:内容变化。
Clip5:起始时间为00:20:00,结束时间为00:25:00,场景的类别为动感节奏,在该场景中不存在MM,起始帧的转场类型为:快速运镜。因此,建议在精选视频中抛弃从00:20:00到00:25:00的内容。
Clip6:起始时间为00:25:00(该时间戳检测到人物入镜头),结束时间为00:35:00,场景的类别为人物,该场景的MM为00:28:78(比如,图16中(2)所示在该时间戳检测到回眸的动作)以及00:30:99(比如,图16中(3)所示在该时间戳检测到回眸的动作),这两个MM的评分分别为95,70,起始帧的转场类型为:内容变化。
需要说明的是,上述6个Clip可以认为是将录制的原视频划分为6个视频片段,或者说是基于识别到的语义层面的信息(可记作LV1信息)将原视频划分为6个视频片段。根据划分的每个视频片段,可进一步识别出发生转场的信息(即LV2信息)。接着,可以在每个视频片段中检测精彩时刻的信息(即LV3信息),以便确定精彩时刻MM。另外,在整个视频录制结束后,可以基于原视频确定整段视频的主题或风格,即LV0信息。
另外,当多个视频片段的总MM数量,超出了最终能够呈现的精彩时刻照片的数量限制时,可优先保留评分较高的MM的照片。举例来说,假设录制的视频划分了4个视频片段,精彩时刻MM的照片数量限制为4,经过分析得知:第1个视频片段中包含了2个MM,第2个视频片段到第4个视频片段中分别包含1个MM,即一共确定出5个MM,那么需要比较第1个视频片段中2个MM的评分,保留第1个视频片段中评分较高的MM,同时,为了保证每个视频片段至少需要输出1个MM,还需要将第2个视频片段到第4个视频片段中分别包含的1个MM作为最终输出的精彩时刻照片,即最终输出了4张精彩时刻照片。
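上述"优先保留评分较高的MM、同时保证每个片段至少输出一张"的筛选原则,可以用如下Python代码示意(函数名与数据结构均为示例假设,且仅覆盖段落中描述的简单情形):

```python
def select_mms(clips, limit=4):
    # clips: 每个元素为一个视频片段内的MM列表, 每个MM表示为(评分, 时间戳)
    # 先为每个含MM的片段保留评分最高的一张, 再按评分截断到limit张
    best_per_clip = [max(clip) for clip in clips if clip]
    best_per_clip.sort(reverse=True)           # 按评分从高到低排序
    kept = best_per_clip[:limit]
    return sorted(kept, key=lambda mm: mm[1])  # 按时间顺序输出

# 4个片段共5个MM: 第1个片段有2个MM, 只保留其中评分较高的75分一张
clips = [[(60, 10), (75, 12)], [(70, 30)], [(80, 50)], [(65, 70)]]
print(select_mms(clips))  # [(75, 12), (70, 30), (80, 50), (65, 70)]
```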
在图17中的示例中,可获得5张MM的照片。并且,还可以基于上述5个MM生成15秒的精选视频。在基于上述5个MM生成15秒精选短视频时,可以基于MM在完整视频中所在的时间位置,以及前后转场位置关系,确定具体剪裁多少秒。比如,以某个MM所在的时间为中心点,向两侧扩展并避开发生转场的时间戳,并且针对每个MM都这样操作,直到视频时长满足预设时长(比如15秒),得到预设时长的精选视频。
可以理解,图13-图17中的示例只是便于本领域技术人员进行理解,并不对本申请实施例的保护范围构成限定。
请参考图18,为本申请实施例提供的视频处理方法的示意性流程图。如图18所示,该方法包括:
S701,响应于用户的第一操作,录制第一视频。
第一操作可以是录制操作。比如,如图13中(1)所示的界面,第一操作可以是用户点击录像控件801的操作,响应于用户点击录像控件801的操作,电子设备开始视频录制。
第一视频是用户录制的原始视频。比如,第一视频是图5中(2)的视频302(时长为16分15秒)。又比如,第一视频是图6中(4)所示界面中正在播放的视频(时长为56秒)。
S702,显示第一界面,所述第一界面是所述第一视频的播放界面,所述第一界面中包括第一控件和第一区域,所述第一区域显示第一照片的缩略图和第二照片的缩略图,所述第一照片是在第一时刻自动拍摄的,所述第二照片是在第二时刻自动拍摄的,所述第一视频的录制过程中包括所述第一时刻和所述第二时刻,其中,所述第一视频包括第一视频片段和第二视频片段,所述第一视频片段为第一场景,所述第二视频片段为第二场景,所述第一照片是所述第一视频片段中的照片,所述第二照片是所述第二视频片段中的照片,所述第一照片的评分大于第一阈值,所述第二照片的评分大于第二阈值。
第一照片和第二照片可以理解为精彩时刻的照片。应理解,此处是以第一照片和第二照片为例进行说明,并非限定精彩时刻的照片只有两张,事实上,精彩时刻的照片可以有多张,本申请实施例对此不作限定。
第一时刻与第二时刻是在录制视频过程中识别到的精彩时刻。
第一视频片段和第二视频片段是第一视频中的不同视频片段,或者说是不同场景下的视频片段。第一视频片段是第一场景,第二视频片段是第二场景。
应理解,第一视频片段和第二视频片段可以是第一视频中连续的视频片段,也可以是不连续的视频片段,对此不作具体限定。
比如,第一视频片段是图13中(1)至图13中(3)所示的界面,即,起始时间为00:00:00,结束时间为00:05:00,该视频片段的场景(可对应第一场景)为山脉。第二视频片段是图16中(1)至图16中(4)所示界面对应的视频片段,起始时间为00:25:00,结束时间为00:35:00,该视频片段的场景类别(可对应第二场景)为人物。
第一场景和第二场景是不同的场景。所述第一视频片段到所述第二视频片段间发生了一次转场。
比如,第一场景是山脉(比如图14中(1)和图14中(2)所示的界面,其场景为山脉),第二场景是天空(比如图15中(1)至图15中(3)所示的界面,其场景为天空)等。
又比如,第一场景是天空(比如图15中(1)至图15中(3)所示的界面,其场景为天空),第二场景是人物(比如图16中(1)至图16中(4)所示的界面,其场景为人物)。
应理解,此处对第一场景和第二场景的举例只是示例性描述,本申请实施例并不限于此。还应理解,第一视频中除了上述描述的第一场景和第二场景以外,也可以包含更多的场景,本申请实施例不作具体限定。
还应理解,本申请实施例对第一照片和第二照片的精彩时刻的具体类型不作具体限定。
作为一种可能的实现方式,第一照片是第一类型的动作,第二照片是第二类型的动作。也就是说,第一照片与第二照片是不同类型的人物动作。关于人物动作的判断维度可参考前文介绍,此处不作具体阐述。比如,第一类型的动作是跳跃。第二类型的动作是回眸等。又比如,第一类型的动作是踢球,第二类型的动作是长发飘逸。
作为一种可能的实现方式,第一照片是风景(比如山脉、天空等),第二照片是人物(或者说人像、肖像等)。
作为一种可能的实现方式,第一照片是山脉,第二照片是天空。应理解,上述关于第一照片和第二照片的类型描述只是示例性描述,本申请实施例并不限于此。
比如,第一时刻是图13中(2)所示界面中的时刻00:02:15,第一照片是时刻00:02:15对应的画面,第一照片的评分为65。又比如,第二时刻是图14中(2)所示界面中的时刻00:08:54,第二照片是时刻00:08:54对应的画面,第二照片的评分为79。
又比如,第一时刻是图15中(2)所示界面中的时刻00:18:50,第一照片是时刻00:18:50对应的画面,第一照片的评分为70。又比如,第二时刻是图16中(2)所示界面中的时刻00:28:78,第二照片是时刻00:28:78对应的画面,第二照片的评分为95。
第一照片的评分应满足第一阈值。以第一照片是第一类型动作为例,上述第一阈值是第一类型动作对应的绝对阈值。通过第一阈值可以判断第一照片的评分是否满足精彩时刻评分的标准。如果判断出第一照片的评分大于第一阈值,则说明第一照片是精彩时刻的照片。
举例说明,假设第一阈值设置为60,如果第一照片的评分是70,则第一照片的评分满足精彩时刻评分的标准。
需要说明的是,如果第一视频片段中决策出了多种精彩时刻的照片,那么这些精彩时刻的照片均应该满足第一阈值。此处以第一视频片段中还包括第三照片为例进行说明,可选地,所述第一视频片段中还包括第三照片,所述第三照片是在第三时刻自动拍摄的,所述第三时刻的评分大于所述第一阈值。换句话说,第三照片也是第一视频片段中的精彩时刻照片。
应理解,上述是以评分大于第一阈值作为评价精彩时刻照片的标准的一种可能实现方式,事实上本申请实施例并不限于此,还可以有多种实现方式。比如,也可以设置一个精彩时刻分数范围,若照片的评分落入该分数范围,则认为照片是精彩时刻照片。又比如,第一阈值的端点值也可以包含在精彩时刻照片的范畴内,如评分等于第一阈值时也可以认为照片为精彩时刻照片。
作为一种可能的实现方式,所述方法还包括:在自动拍摄所述第一照片之前,获取第四照片的评分,所述第四照片的评分小于或等于所述第一阈值,且,大于第三阈值;将所述第三阈值的取值更新为所述第四照片的评分。
此处的第三阈值指的是相对阈值,第一阈值是绝对阈值。在得到第四照片的评分时,如果第四照片的评分不满足大于绝对阈值(第一阈值)的情形,那么继续判断第四照片的评分与第三阈值的关系;如果第四照片的评分大于第三阈值,则将第三阈值更新为第四照片的评分,以使得相对阈值始终保持最新最高值,具体可以参考前文图11中的步骤2和步骤3的理解。
上述第一场景和第二场景是不同的场景。所述第一视频片段到所述第二视频片段间发生了一次转场。
作为一种可能的实现方式,可选地,所述第二视频片段中还包括所述第五照片,所述第五照片是在发生转场时自动拍摄的。也就是说,为了保证在第二视频片段中至少能够输出一张照片,所以在发生转场时可以先触发自动拍照,获得转场帧(比如第五照片)。当然,是否保留第五照片还要取决于后续是否出现比第五照片的评分更高的照片。这里,随着视频的录制,如果判断出第二视频片段中出现了比第五照片的评分更高的照片,比如,第二照片,那么可以用第二照片替换第五照片,即输出第二照片。
可选地,所述第一区域中还包括所述第五照片的缩略图。
也就是说,如果所述第五照片的评分也大于第二阈值,则第五照片也可以判定为第二视频片段中的精彩时刻照片,即第五照片的缩略图可呈现在第一区域中。类似地,第二阈值是用于判断第二视频片段中的精彩时刻的绝对阈值。
可选地,所述转场距离上一次转场的时间大于时间阈值。
该时间阈值可以对应前文图11中的步骤4中的最短转场时间限制阈值。也就是说,为了避免频繁触发转场拍照,可以设置时间阈值。
可选地,所述第三阈值(相对阈值)小于所述第二阈值。第二阈值是用于判断第二照片评分的绝对阈值。
在第二视频片段中,如果相对阈值小于绝对阈值的话,说明该转场片段下还未触发过自动拍摄(或者说第二视频片段中还没触发过自动拍摄),因此为了保证一个转场片段下至少能输出一张照片,可以在转场帧触发自动拍摄,即获得上述第五照片。
可选地,所述第一界面还包括播放进度条,所述播放进度条用于显示所述第一视频的播放进度。
以图6中的界面示意为例,第一界面是图6中(4)所示的界面。第一控件是图6中(4)的906,第一区域是904。第一照片的缩略图和第二照片的缩略图可显示于904中。播放进度条是图6中(4)的907。
又比如,第一界面是图7中(1)所示的界面。第一控件是图7中(1)的906,第一区域是904。
可选地,所述第一照片的分辨率大于在所述第一视频中截取的图像的分辨率。关于图像分辨率的不同,本申请实施例在前文图2中(4)处的已进行相关描述,这里不再赘述。
S703,响应于对所述第一控件的第二操作,显示第二界面,所述第二界面是第二视频的播放界面,所述第二视频的时长小于所述第一视频的时长,所述第二视频中至少包括所述第一照片。
其中,第二视频可以理解为是第一视频的精彩短视频。精彩短视频的组成方式在前文有提及,相关描述可以参考前文,此处不再赘述。
比如,第二界面是图6中(5)所示的界面。第二视频是图6中(5)所示的15秒视频。该15秒视频中至少包括904中的一张照片。
可以理解,第二视频可以包括部分精彩时刻的照片,也可以包括全部精彩时刻的照片,本申请实施例对此不作具体限定。
可选地,所述第二视频中还包括所述第二照片。
在一种可能的实现方式中,所述方法还包括:
响应于用户的第三操作,显示第三界面,所述第三界面为图库应用的界面,所述第三界面包括第二控件;
所述显示第一界面,包括:响应于对所述第二控件的第四操作,显示所述第一界面。
上述第三操作可以是用户在图库应用中查看上述第一视频的操作。比如,第三界面可以是图6中(3)所示的界面。第二控件是播放控件。比如,第二控件可以是图6中(3)所示的915。
在一种可能的实现方式中,所述第三界面还包括第一提示窗口,所述第一提示窗口用于向用户提示已生成了所述第一照片和所述第二照片。
也就是说,在首次进入时,可以引导用户查看精彩时刻的照片。比如,所述第一提示窗口可以是图6中(3)所示的905。
在一种可能的实现方式中,所述第一提示窗口的亮度以及所述第一区域的亮度,高于所述第一界面中除去所述第一区域和所述第一提示窗口以外的区域的亮度。
因此,可以通过高亮显示的方式,引起用户对所述第一提示窗口的注意,以达到更醒目的提醒效果,提升用户体验。
在一种可能的实现方式中,所述方法还包括:响应于用户的第五操作,停止对所述第一视频的录制,显示第四界面,所述第四界面包括预览缩略图选项;
其中,所述响应于用户的第三操作,显示第三界面,包括:
响应于用户对所述预览缩略图选项的第六操作,显示所述第三界面。
第五操作是触发停止录制的操作。比如,所述第五操作可以是用户点击图6中(1)所示控件901的操作。第四界面可以是图6中(2)所示的界面。
在录制结束的界面中,还可以显示当前录制的视频的预览缩略图选项。用户在点击预览缩略图选项后,可以跳转至图库应用中,显示当前录制的视频(非播放状态)。比如,预览缩略图选项可以是图6中(2)的903。第六操作可以是用户点击903的操作。在用户点击903后,显示图6中(3)所示的界面,其中包含播放控件915。
在一种可能的实现方式中,所述第四界面还包括第二提示窗口,所述第二提示窗口用于向用户提示已经生成所述第一照片、所述第二照片以及所述第二视频。
当然,如果是首次使用一录多得功能,可以通过提示窗口引导用户查看一录多得的内容。比如,第二提示窗口可以是图6中(2)所示的902。
在一种可能的实现方式中,在录制所述第一视频之前,所述方法还包括:
响应于用户的第七操作,开启一录多得功能。
上述第一视频是在开启了一录多得功能的前提下录制的。开启一录多得功能的实现方式在前文已经描述,具体可以参考图2至图4中的描述。比如,开启一录多得功能可以通过图2中(4)所示的404设置。
可选地,应用可以设置录制视频的最小时长,当录制时长小于最小时长时,不会回调视频的一录多得特性。在一种可能的实现方式中,第一视频的时长大于或等于预设时长。比如,预设时长设置为15秒,当用户的录制时长小于15秒时,不会回调一录多得照片。比如,录制视频的最小时长可以通过图2中(4)所示的405设置。
可选地,所述第二界面还包括音乐控件;
响应于用户对所述音乐控件的第八操作,显示多个不同的音乐选项。
用户可以对第二视频实现配乐操作。比如,第二界面可以是图7中(2)所示的界面。音乐控件可以是图7中(2)所示的音乐控件910。
可选地,所述第二界面还包括风格控件;所述方法还包括:
响应于用户对所述风格控件的第九操作,显示多个不同的风格选项。
比如,风格控件可以是图7中(2)所示的风格控件912。
用户可以对第二视频添加风格。关于风格的描述在前文已提及,相关描述可以参考前文,此处不再赘述。
可选地,图库应用中包括第一相册,所述第一相册中包括所述第一照片和所述第二照片。
这里作统一说明,第一相册可以参考前文描述的一录多得相册。相关描述可以参考前文,这里不再赘述。
可选地,所述第一相册还包括所述第二视频的虚拟视频。虚拟视频的含义参考前文的解释。
可选地,所述第二界面还包括:分享控件或保存控件;
响应于用户对所述分享控件或保存控件的第十操作,生成所述第二视频的视频文件;
将所述视频文件存储在所述第一相册中。
比如,分享控件是图7中(2)所示的909。又比如,保存控件是图7中(2)所示的908。
可选地,所述视频文件占用的存储空间大于所述虚拟视频占用的存储空间。
在一种可能的实现方式中,所述第一界面还包括删除选项;所述方法还包括:响应于用户对所述删除选项的第十一操作,显示第三提示窗口,所述第三提示窗口用于提示用户是否删除所述第二视频、所述第一照片以及所述第二照片。
比如,删除选项如图6中(4)所示的删除选项。
也就是说,若接收到用户删除录制的原始视频的请求(即上述第十一操作),在用户界面显示提示信息,以便提示用户是否删除与所述原始视频关联的图像和视频(比如,第一照片、第二照片以及第二视频)。这样,如果用户希望保留与所述原始视频关联的图像和视频,可以选择保留与所述原始视频关联的图像和视频,避免了数据丢失,有助于提升用户体验。如果用户希望一并删除,那么将原始视频以及与所述原始视频关联的图像和视频一并删除,有助于节省空间。
上述与第一视频关联的精彩时刻的照片可以自动保留预设时长,比如N天,N小时等其他时间单位。比如,预设时长可以是出厂设置好的,或者,也可以由用户自主设置,对此不作限定。在一种可能的实现方式中,所述方法还包括:如果在N天后未接收到用户查看所述第一照片的操作,自动删除所述第一照片。可以理解,这里仅是第一照片为例进行说明,第二照片也可以是在保留N天后自动删除。
可选地,在N天后未接收到用户查看所述第二视频的操作,自动删除所述第二视频。
在一种可能的实现方式中,所述第二视频中还包括所述第一照片的附近图像帧,所述附近图像帧是基于所述第一时间标签确定的;
其中,所述附近图像帧包括所述第一时间标签的前A个时刻对应的图像帧和所述第一时间标签的后B个时刻对应的图像帧,A大于或等于1,B大于或等于1。
在前文介绍图5中(6)的界面时,介绍了15秒精彩短视频309的获得方式。以精彩短视频309的获得方式中的一个精彩时刻为例说明附近图像帧,比如,假设第5分10秒是精彩时刻(对应第一时刻),第5分10秒对应的图像帧是精彩时刻照片(对应第一照片),那么第5分9秒对应的图像帧和第5分11秒对应的图像帧为所谓的附近图像帧。
可选地,所述第二视频中去除发生转场的时刻对应的图像帧,所述转场是指场景发生变化。这里举例说明去除发生转场的时刻对应的图像帧的含义。示例性地,在获得5个精彩时刻MM的照片后,基于5个MM的照片生成第二视频,以每个MM所在的时间为中心点,向两侧扩展并避开发生转场的时间戳,并且针对每个MM都这样操作,直到第二视频的时长满足预设时长(比如15秒),得到预设时长的精选短视频。
在一种可能的实现方式中,所述第一时刻是基于第一时间标签确定的。所述第一时间标签是基于第一层级信息、第二层级信息和第三层级信息确定的,所述第一层级信息用于表征视频的主题或场景,所述第二层级信息用于表征视频的场景发生变化,所述第三层级信息用于表征精彩时刻。
比如,在具体实现时,MM节点可以实时获得视频流的多个粒度的层级信息,以便识别精彩时刻。
作为一种可能的实施例,在录制视频时实时获取视频流的多个层级信息,并基于所述多个层级信息识别视频的精彩时刻;在所述视频的精彩时刻自动触发拍照以获得精彩时刻的照片(比如在第一时刻自动拍摄第一照片,在第二时刻自动拍摄第二照片);其中,所述多个层级信息包括第一层级信息,第二层级信息和第三层级信息,所述第一层级信息用于表征视频的主题或场景,所述第二层级信息用于表征视频的场景发生变化,所述第三层级信息用于表征精彩时刻。
作为一种可能的实现方式,在识别到精彩时刻时可生成时间标签(或者说视频标签)。时间标签是指精彩时刻在第一视频中的时间位置。比如,第一时刻对应第一时间标签。基于时间标签可生成精选视频。
在具体实现时,HAL中的MM节点在录制视频时,可实时判断录像过程中的精彩时刻(或者说精彩瞬间),并在识别到精彩时刻时自动触发拍照,获得精彩时刻的图像。
可选地,在录制视频时,通过自动触发拍照获得的精彩时刻的图像的数量可以设置或基于需求调整。比如,可以设置获得的精彩时刻的图像最多为5张。
可选地,为了实现录制的视频与上述精彩时刻的照片的关联,可以将所述录像标识以及时间标签写入抓拍图像的JPEG信息中。示例性地,所述多帧图像的JPEG信息携带可交换图像文件格式EXIF信息,所述EXIF信息包括所述录像标识以及所述视频标签的信息。可以理解,EXIF信息还可以包括其他JPEG数据,比如,标准信息,缩略图,水印信息等。
可选地,所述方法还包括:
响应于所述第一操作,生成请求消息,所述请求消息中包括第一标识;
其中,所述第一照片与所述第二视频通过所述第一标识关联。
所述请求消息可以称作录像请求。所述录像请求用于触发相机应用开启录像模式。比如,如图13中(1)所示的界面,用户点击录像控件801,即可触发相机应用开启录像模式。
比如,第一标识称作录像标识,录像标识为UUID。
可选地,在图库应用中,录制的原始视频(第一视频),第一照片,以及第二照片可以通过第一标识关联。
可选地,在图库应用中,录制的原始视频(第一视频),第一照片,第二照片以及精选短视频(第二视频)在图库应用中可通过数据库实现关联。这样,用户在查看原始视频时,可以选择查看与第一视频关联的精彩时刻的照片以及精选短视频。
当用户在查看录制的原始视频时,如果触发查看操作,可以查看与该视频相关联的精彩时刻的照片或精选视频。比如,如图5中(5)所示,该查看操作是指向上滑动屏幕,界面上呈现标识306,提示用户将呈现与该视频302关联的“一录多得”界面;当用户完成上滑操作,手指离开屏幕后,手机显示如图5中(6)所示的界面。又比如,该查看操作可以是点击图6中(4)所示的906,进入图6中(5)所示的界面。
需要说明的是,上述时间标签是指精彩时刻在视频中的位置,具体可以为精彩时刻对应的时间戳。所述时间标签可用于生成精选视频,可以理解为:在录像结束后,可以根据视频中的标签位置,自动生成播放策略,该播放策略可用于生成精选视频(或者说精彩短视频)。并且,在用户需要对精选视频执行操作(比如,该操作可以指分享该精选视频)时,才利用所述视频标签生成精选视频,而并非在录制视频过程中实时生成精选视频,这样可以节省存储空间。
FIG. 20 shows an example of the operation of an MM node provided in this application (here, the MM node may be the MM node in the hardware abstraction layer in FIG. 8). As shown in FIG. 20, the cache contains RAW data for 16 timestamps. The MM node is currently comparing the image frames at timestamp 5 and timestamp 14. Because the algorithm has latency, the current local optimum frame obtained at the algorithm's current frame (timestamp 14) is actually the image frame at timestamp 11. By analyzing the LV1 information of certain timestamps (for example, timestamp 18, timestamp 34, and timestamp 50), the MM node can identify the scene as a birthday. By analyzing and comparing the image information (LV2 information) at timestamp 0 and timestamp 16, the MM node learns that a transition occurred at timestamp 16. The MM node analyzes the LV3 information of the current frame fed into the algorithm (timestamp 14) (for example, the LV3 information includes the following dimensions: face-related features, image-composition evaluation, motion detection, and basic image-quality evaluation), and uses the MM comparison strategy to compare the score of the image frame at timestamp 14 with the score of timestamp 5 (the previous local optimum frame). In addition, when obtaining the RAW data of an image frame, the MM node may temporarily store the RAW data in the buffer. When the MM node identifies a higher-scoring data frame, it sends the RAW data temporarily stored in the buffer into the photographing path to trigger RAW-domain algorithm photographing. The MM node may feed the database back to the camera framework layer (the database contains the decision information at different granularities shown in FIG. 20, for example: for timestamp 5: ID5, cls1, pri2, score 96; for timestamp 14: ID14, cls1, pri2, score 99; transition timestamps: 16, 82, 235, ...; theme: birthday).
It should be understood that the operation diagram in FIG. 20 is merely an example and does not limit the embodiments of this application.
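The buffer-and-compare behaviour described for the MM node in FIG. 20 can be approximated by the following sketch. The class and method names are hypothetical, and the real node operates on RAW frames in the HAL rather than Python objects:

```python
from collections import deque

class MMNode:
    def __init__(self, buffer_size=16):
        # RAW frames awaiting a decision, like the 16-slot cache in FIG. 20.
        self.raw_buffer = deque(maxlen=buffer_size)
        self.best = None  # (timestamp, score) of the current local optimum

    def on_frame(self, timestamp, raw_data, lv3_score):
        self.raw_buffer.append((timestamp, raw_data))
        # Compare the current frame's LV3 score with the previous local
        # optimum; a higher score makes this frame the new optimum and
        # would trigger the photographing path on its buffered RAW data.
        if self.best is None or lv3_score > self.best[1]:
            self.best = (timestamp, lv3_score)
            return True
        return False

node = MMNode()
node.on_frame(5, b"raw5", 96.0)               # first frame becomes the optimum
triggered = node.on_frame(14, b"raw14", 99.0)  # higher score: would trigger capture
```

The bounded `deque` mirrors why a capture can still use the RAW data of an earlier, higher-scoring frame despite the algorithm's latency: the data is retained in the buffer until the decision is made.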
In the embodiments of this application, manual snapshots by the user may also be supported during video recording, so as to improve the user's shooting experience. For example, referring to the interface shown in (2) of FIG. 13, during recording the user can tap the control 802 to take a manual snapshot.
As a possible implementation, the method further includes: receiving a photographing request while recording the video, the photographing request carrying a (manual) snapshot flag; and, in response to the photographing request, triggering photographing and obtaining a first image, where the exchangeable image file format (EXIF) information corresponding to the first image includes the snapshot flag.
In a specific implementation, the HAL layer supports the manual snapshot capability; through processing in the photographing path, the first image and the corresponding EXIF information can be generated.
As can be seen from the above, with the video processing method provided in this application, the user can obtain both high-quality wonderful-moment photos and the video during recording, which greatly improves the user experience.
This application further provides a computer program product. When executed by a processor, the computer program product implements the method described in any method embodiment of this application.
The computer program product may be stored in a memory and, after processing procedures such as preprocessing, compilation, assembly, and linking, is finally converted into an executable object file that can be executed by a processor.
This application further provides a computer-readable storage medium storing a computer program. When executed by a computer, the computer program implements the method described in any method embodiment of this application. The computer program may be a high-level language program or an executable object program.
The computer-readable storage medium may be a volatile memory or a non-volatile memory, or may include both a volatile memory and a non-volatile memory. The non-volatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically EPROM (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), which is used as an external cache. By way of example but not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct rambus RAM (DR RAM).
Those skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes and technical effects of the apparatuses and devices described above, reference may be made to the corresponding processes and technical effects in the foregoing method embodiments, and details are not repeated here.
In the several embodiments provided in this application, the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, some features of the method embodiments described above may be ignored or not performed. The apparatus embodiments described above are merely illustrative; the division into units is merely a logical functional division, and there may be other divisions in actual implementation; multiple units or components may be combined or integrated into another system. In addition, the coupling between units or between components may be direct or indirect, and the above coupling includes electrical, mechanical, or other forms of connection.
It should be understood that, in the various embodiments of this application, the sequence numbers of the processes do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of this application.
In addition, the terms "system" and "network" are often used interchangeably herein. The term "and/or" herein merely describes an association relationship between associated objects, indicating that three relationships may exist; for example, "A and/or B" may indicate three cases: A alone, both A and B, and B alone. In addition, the character "/" herein generally indicates an "or" relationship between the associated objects.
In summary, the above are merely preferred embodiments of the technical solutions of this application and are not intended to limit the protection scope of this application. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of this application shall fall within the protection scope of this application.

Claims (27)

  1. A video processing method, characterized by comprising:
    recording a first video in response to a first operation by a user;
    displaying a first interface, wherein the first interface is a playback interface of the first video and comprises a first control and a first area, the first area displays a thumbnail of a first photo and a thumbnail of a second photo, the first photo was automatically taken at a first moment, the second photo was automatically taken at a second moment, and the recording process of the first video comprises the first moment and the second moment;
    wherein the first video comprises a first video clip and a second video clip, the first video clip is of a first scene, the second video clip is of a second scene, the first photo is a photo from the first video clip, the second photo is a photo from the second video clip, a score of the first photo is greater than a first threshold, and a score of the second photo is greater than a second threshold; and
    displaying a second interface in response to a second operation on the first control, wherein the second interface is a playback interface of a second video, the duration of the second video is shorter than the duration of the first video, and the second video comprises at least the first photo.
  2. The method according to claim 1, characterized in that the first video clip further comprises a third photo, the third photo was automatically taken at a third moment, and the score of the third photo is greater than the first threshold.
  3. The method according to claim 1 or 2, characterized in that one transition occurs between the first video clip and the second video clip.
  4. The method according to any one of claims 1 to 3, characterized in that the first photo is of a first type of action, and the second photo is of a second type of action.
  5. The method according to any one of claims 1 to 3, characterized in that the first photo is of scenery, and the second photo is of a person.
  6. The method according to any one of claims 1 to 5, characterized in that the method further comprises:
    before automatically taking the first photo, obtaining a score of a fourth photo, the score of the fourth photo being less than or equal to the first threshold and greater than a third threshold; and
    updating the value of the third threshold to the score of the third photo.
  7. The method according to any one of claims 1 to 6, characterized in that the second video clip further comprises a fifth photo, and the fifth photo was automatically taken when a transition occurred.
  8. The method according to claim 7, characterized in that the first area further comprises a thumbnail of the fifth photo.
  9. The method according to claim 7 or 8, characterized in that the time between the transition and the previous transition is greater than a time threshold.
  10. The method according to any one of claims 6 to 9, characterized in that the third threshold is less than the second threshold.
  11. The method according to any one of claims 1 to 10, characterized in that the method further comprises:
    displaying a third interface in response to a third operation by the user, the third interface being an interface of a gallery application and comprising a second control;
    wherein the displaying a first interface comprises: displaying the first interface in response to a fourth operation on the second control.
  12. The method according to claim 11, characterized in that the third interface further comprises a first prompt window, the first prompt window being used to prompt the user that the first photo and the second photo have been generated.
  13. The method according to claim 12, characterized in that the brightness of the first prompt window and the brightness of the first area are higher than the brightness of the area of the first interface other than the first area and the first prompt window.
  14. The method according to any one of claims 11 to 13, characterized in that the method further comprises:
    in response to a fifth operation by the user, stopping the recording of the first video and displaying a fourth interface, the fourth interface comprising a preview thumbnail option;
    wherein the displaying a third interface in response to a third operation by the user comprises:
    displaying the third interface in response to a sixth operation by the user on the preview thumbnail option.
  15. The method according to claim 14, characterized in that the fourth interface further comprises a second prompt window, the second prompt window being used to prompt the user that the first photo, the second photo, and the second video have been generated.
  16. The method according to any one of claims 1 to 15, characterized in that a gallery application comprises a first album, and the first album comprises the first photo and the second photo.
  17. The method according to claim 16, characterized in that the first album further comprises a virtual video of the second video.
  18. The method according to claim 17, characterized in that the second interface further comprises a share control or a save control; and the method further comprises:
    in response to a tenth operation by the user on the share control or the save control, generating a video file of the second video; and
    storing the video file in the first album.
  19. The method according to claim 18, characterized in that the storage space occupied by the video file is greater than the storage space occupied by the virtual video.
  20. The method according to any one of claims 1 to 19, characterized in that the method further comprises:
    automatically deleting the first photo if no operation by the user to view the first photo is received within N days.
  21. The method according to any one of claims 1 to 20, characterized in that the second video further comprises the second photo.
  22. The method according to any one of claims 1 to 21, characterized in that image frames corresponding to moments at which a transition occurs are removed from the second video, a transition referring to a scene change.
  23. The method according to any one of claims 1 to 22, characterized in that the resolution of the first photo is greater than the resolution of an image captured from the first video.
  24. An electronic device, characterized by comprising a processor and a memory, the processor being coupled to the memory, the memory being configured to store a computer program, wherein the computer program, when executed by the processor, causes the electronic device to perform the method according to any one of claims 1 to 23.
  25. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program, wherein the computer program, when executed by a processor, causes the processor to perform the method according to any one of claims 1 to 23.
  26. A chip, characterized by comprising a processor, wherein when the processor executes instructions, the processor performs the method according to any one of claims 1 to 23.
  27. A computer program product, characterized by comprising a computer program, wherein the computer program, when run, causes a computer to perform the method according to any one of claims 1 to 23.
PCT/CN2022/118147 2021-10-22 2022-09-09 Video processing method and electronic device WO2023065885A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
EP22826792.8A EP4199492A4 (en) 2021-10-22 2022-09-09 VIDEO PROCESSING METHOD AND ELECTRONIC DEVICE

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN202111236229 2021-10-22
CN202111236229.1 2021-10-22
CN202210114568.0A CN116033261B (zh) 2021-10-22 2022-01-30 Video processing method, electronic device, storage medium, and chip
CN202210114568.0 2022-01-30

Publications (1)

Publication Number Publication Date
WO2023065885A1 (zh)

Family

ID=85382607

Country Status (2)

Country Link
EP (1) EP4199492A4 (zh)
WO (1) WO2023065885A1 (zh)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109672899A (zh) * 2018-12-13 2019-04-23 Nanjing University of Posts and Telecommunications Real-time identification and pre-recording method for wonderful moments oriented to game live-streaming scenarios
CN111061912A (zh) * 2018-10-16 2020-04-24 Huawei Technologies Co., Ltd. Method for processing video file and electronic device
WO2021042364A1 (zh) * 2019-09-06 2021-03-11 Huawei Technologies Co., Ltd. Image capturing method and apparatus
WO2021163882A1 (zh) * 2020-02-18 2021-08-26 深圳市欢太科技有限公司 Game screen-recording method and apparatus, and computer-readable storage medium
CN115002340A (zh) * 2021-10-22 2022-09-02 Honor Device Co., Ltd. Video processing method and electronic device

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4536402B2 (ja) * 2004-03-05 2010-09-01 Sony Corporation Video playback device, video playback method, and program for causing a computer to execute the method
JP2007189473A (ja) * 2006-01-13 2007-07-26 Hitachi Ltd Moving-image playback device
US10204273B2 (en) * 2015-10-20 2019-02-12 Gopro, Inc. System and method of providing recommendations of moments of interest within video clips post capture


Also Published As

Publication number Publication date
EP4199492A4 (en) 2024-01-24
EP4199492A1 (en) 2023-06-21


Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref document number: 2022826792

Country of ref document: EP

Effective date: 20221230