WO2022048347A1 - Video editing method and device - Google Patents
Video editing method and device
- Publication number
- WO2022048347A1 (PCT/CN2021/108646)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- video
- electronic device
- user
- clips
- time units
- Prior art date
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N5/00—Details of television systems
- H04N5/222—Studio circuitry; Studio devices; Studio equipment
- H04N5/262—Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects ; Cameras specially adapted for the electronic generation of special effects
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/21—Server components or server architectures
- H04N21/218—Source of audio or video content, e.g. local disk arrays
- H04N21/2187—Live feed
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/439—Processing of audio elementary streams
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/439—Processing of audio elementary streams
- H04N21/4394—Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- H04N21/83—Generation or processing of protective or descriptive data associated with content; Content structuring
- H04N21/845—Structuring of content, e.g. decomposing content into time segments
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- H04N21/83—Generation or processing of protective or descriptive data associated with content; Content structuring
- H04N21/845—Structuring of content, e.g. decomposing content into time segments
- H04N21/8456—Structuring of content, e.g. decomposing content into time segments by decomposing the content in the time domain, e.g. in time segments
Definitions
- the present application relates to the field of terminal technologies, and in particular, to a video editing method and device.
- the video platform marks locations in the video, for example by dotting the progress bar of the video to form multiple dot positions; when the user touches or clicks a dot position, text information describing the video content at that position is displayed.
- the text information at the dotted position can help the user switch to the position they want to watch in a relatively short time, and it can also help the user identify the more exciting parts of the video, so that highlight clips can be edited and shared with friends or on social networks.
- the present application provides a video editing method and device, which are used to clip highlight segments from a real-time live video based on wake-up words or wake-up actions produced, due to emotional fluctuations, by a user watching the video.
- an embodiment of the present application provides a video editing method.
- the method can be executed by a first electronic device.
- the method includes: first, during the process of playing the first video by the first electronic device, the first electronic device acquires, from a collection device, the voice information of the user watching the first video and/or a second video of the user; the first electronic device identifies M pieces of key information related to the user's emotions in the voice information and/or the second video, determines, in the first video, N first video clips corresponding to the collection time units of the M pieces of key information, and edits the N first video clips to generate an edited video.
- M and N are positive integers
- the acquisition device may be a voice acquisition device or an image acquisition device, and the acquisition device may be integrated in the first electronic device, or may be a device connected to the first electronic device.
- the key information includes at least one of the following wake-up words or wake-up actions: the wake-up words include the sound made when the user performs a set body action due to emotional fluctuations, and the preset voice information uttered by the user due to emotional fluctuations; the wake-up actions include the set body movements and set facial expressions made by the user due to emotional fluctuations.
- the key information may also be information existing in nature, such as the decibel size of the sound, which is not limited in this embodiment of the present application.
- the electronic device triggers video clipping based on the unconscious voice or actions of the user watching the video. This method does not require the user to actively trigger the clipping in order to generate a highlight clip, which can effectively improve the user experience.
- the method further includes: the first electronic device determines M second video clips of the second video corresponding to the N first video clips, where the playing periods of the N first video clips overlap with the collection periods of the M second video clips; the first electronic device then edits the N first video clips and the M second video clips to generate an edited video.
- playing highlight clips together with user-related video through multiple windows helps make the video more engaging and enhances the interaction between the user and the electronic device.
- the method further includes: dividing the first video into L first video segments. Then, when key information is identified, the first electronic device dots the first video segment corresponding to the collection time unit of the key information among the L first video segments; the first electronic device then obtains the dotting information of the first video segments from the first video and, according to the dotting information, determines from the L first video segments the N first video segments corresponding to the collection time units of the M pieces of key information.
- in this way, a highlight clip can be obtained from the first video, which also helps the subsequent intuitive display of the video content at the dotted positions and can effectively improve the user experience.
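- to make the dotting mechanism above concrete, the following is a minimal sketch (not part of the patent; the 10-second segment length, the data structures, and all function names are assumptions): it divides the first video into fixed-length segments, dots the segment whose time range covers the collection time of a piece of key information, and selects the dotted segments for editing.

```python
# Illustrative sketch only: fixed-length segmentation, dotting, and selection.
from dataclasses import dataclass
from typing import List

@dataclass
class Segment:
    index: int
    start: float          # seconds from the start of the first video
    end: float
    dotted: bool = False

def divide_video(duration: float, seg_len: float = 10.0) -> List[Segment]:
    """Divide the first video into L first video segments of fixed length."""
    segments, t, i = [], 0.0, 0
    while t < duration:
        segments.append(Segment(i, t, min(t + seg_len, duration)))
        t += seg_len
        i += 1
    return segments

def dot_segment(segments: List[Segment], key_info_time: float) -> None:
    """Dot the segment whose time range contains the collection time of key information."""
    for seg in segments:
        if seg.start <= key_info_time < seg.end:
            seg.dotted = True
            return

def select_dotted(segments: List[Segment]) -> List[Segment]:
    """Return the N first video segments to be edited, based on the dotting information."""
    return [seg for seg in segments if seg.dotted]

# Example: two wake-up words detected at 35.2 s and 128.7 s of a 300 s video.
segs = divide_video(300.0)
for t in (35.2, 128.7):
    dot_segment(segs, t)
print([s.index for s in select_dotted(segs)])   # -> [3, 12]
```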
- an embodiment of the present application provides a video editing method.
- the method can be executed by a first electronic device.
- the method includes: during the process of playing the first video by the first electronic device, obtaining the voice information of the user watching the first video and/or the second video of the user; the first electronic device then divides the voice information and/or the second video according to the collection time unit to obtain M collection time units.
- the first electronic device determines, according to the key information in the voice information and/or the second video corresponding to the M collection time units, the user emotion scores corresponding to the M collection time units respectively; based on the user emotion scores, the first electronic device determines, in the first video, the brilliance of the L first video clips corresponding to the M collection time units; the first electronic device then edits the N first video clips whose brilliance is greater than the set threshold among the L first video clips to generate an edited video, where M, L, and N are positive integers.
- the electronic device scores the user's viewing emotion based on the unconscious voice or actions of the user watching the video, so as to evaluate the brilliance of each video clip and complete the clipping. This method can generate highlight clips without requiring the user to actively trigger the clipping, which can effectively improve the user experience.
- the specific method for determining the user emotion scores corresponding to the M collection time units includes: the first electronic device recognizes, according to a preset neural network model, the key information in the voice information and/or the second video corresponding to the M collection time units; according to the recognition result, the user emotion scores corresponding to the M collection time units are determined respectively.
- the method further includes: determining M second video clips of the second video corresponding to the N first video clips, where the playing periods of the N first video clips overlap with the collection periods of the M second video clips.
- the first electronic device edits the N first video clips and the M second video clips to generate an edited video, where M and N are positive integers.
- the user emotion score is used to reflect the brilliance of the video clip itself, which allows the brilliance of a clip to be evaluated more objectively.
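- the brilliance-threshold selection described in this aspect could be sketched as follows (an illustrative assumption, not the patent's implementation; the mapping structures and threshold value are hypothetical): each collection time unit receives a user emotion score, the score is taken as the brilliance of the first video clip covering that unit, and only clips above the threshold are kept for editing.

```python
# Illustrative sketch only: pick the N first video clips whose brilliance
# (derived from user emotion scores) exceeds a set threshold.
from typing import Dict, List

def select_clips_by_brilliance(
    unit_scores: Dict[int, float],        # collection time unit index -> user emotion score
    unit_to_clip: Dict[int, int],         # collection time unit index -> first video clip index
    threshold: float,
) -> List[int]:
    """Return the indices of the first video clips whose brilliance exceeds the threshold."""
    clip_brilliance: Dict[int, float] = {}
    for unit, score in unit_scores.items():
        clip = unit_to_clip[unit]
        # If several units map to one clip, keep the highest score as its brilliance.
        clip_brilliance[clip] = max(clip_brilliance.get(clip, 0.0), score)
    return sorted(clip for clip, b in clip_brilliance.items() if b > threshold)

# Example: four collection time units scored 9, 0, 9, 0 over two clips.
scores = {0: 9.0, 1: 0.0, 2: 9.0, 3: 0.0}
mapping = {0: 0, 1: 0, 2: 1, 3: 1}
print(select_clips_by_brilliance(scores, mapping, threshold=5.0))   # -> [0, 1]
```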
- an embodiment of the present application provides a video editing method. The method can be executed by a second electronic device, and includes: during the process of playing the first video by the first electronic device, acquiring, from a collection device, the voice information of the user watching the first video and/or a second video of the user; identifying M pieces of key information related to the user's emotions in the voice information and/or the second video; obtaining, from the first video of the first electronic device, the N first video clips corresponding to the collection time units of the M pieces of key information; and editing the N first video clips to generate an edited video, where M and N are positive integers.
- the second electronic device may trigger the editing of the video played by the first electronic device based on the unconscious voice or action of the user watching the video.
- this method does not require the device playing the video to have a video editing function; the video editing is completed through the cooperation of multiple devices in the distributed system to generate highlight clips, which effectively improves the user experience.
- the method may further include: the second electronic device determines M second video segments of the second video corresponding to the N first video segments, where the playing periods of the N first video clips overlap with the collection periods of the M second video clips; specifically, the second electronic device may edit the N first video clips and the M second video clips to generate an edited video.
- the second electronic device may divide the first video into L first video segments and, when key information is identified, dot the first video segment corresponding to the collection time unit of the key information among the L first video segments; it then obtains the dotting information of the first video segments from the first video and determines, according to the dotting information, the N first video segments corresponding to the collection time units of the M pieces of key information from the L first video segments.
- the key information may include at least one of the following wake-up words or wake-up actions:
- the wake-up words include the sound and the preset voice information made by the user due to emotional fluctuations; the wake-up actions include the set body movements and set facial expressions made by the user due to emotional fluctuations.
- an embodiment of the present application provides a video editing method.
- the method can be executed by a second electronic device.
- the method includes: during the process of playing the first video by the first electronic device, acquiring, from a collection device, the voice information of the user watching the first video and/or a second video of the user; dividing the voice information and/or the second video according to the collection time units to obtain M collection time units; determining, according to the key information in the voice information and/or the second video corresponding to the M collection time units, the user emotion scores corresponding to the M collection time units respectively; determining, in the first video of the first electronic device, the brilliance of the L first video clips corresponding to the M collection time units; and editing the N first video clips whose brilliance is greater than the set threshold among the L first video clips to generate an edited video, wherein M, L and N are positive integers.
- the electronic device scores the user's viewing emotion based on the unconscious voice or actions of the user watching the video, so as to evaluate the brilliance of each video clip and complete the clipping. This method can generate highlight clips without requiring the user to actively trigger the clipping, which can effectively improve the user experience.
- determining the user emotion scores corresponding to the M collection time units respectively includes:
- the second electronic device identifies, according to a preset neural network model, the key information in the voice information and/or the second video corresponding to the M collection time units; according to the recognition result, the second electronic device determines the user emotion scores corresponding to the M collection time units respectively.
- the method further includes: the second electronic device determines M second video clips of the second video corresponding to the N first video clips, where the playing periods of the N first video clips overlap with the collection periods of the M second video clips; specifically, the N first video clips and the M second video clips can be edited to generate an edited video, where M and N are positive integers.
- the key information may include at least one of the following wake-up words or wake-up actions:
- the wake-up words include the sound and the preset voice information made by the user due to emotional fluctuations; the wake-up actions include the set body movements and set facial expressions made by the user due to emotional fluctuations.
- an embodiment of the present application provides a first electronic device, including a processor and a memory, wherein the memory is used to store one or more computer programs; when the one or more computer programs stored in the memory are executed by the processor, the first electronic device is enabled to implement any possible design method of the first aspect or the second aspect.
- an embodiment of the present application provides a second electronic device, including a processor and a memory, wherein the memory is used to store one or more computer programs; when the one or more computer programs stored in the memory are executed by the processor, the second electronic device is enabled to implement any possible design method of the third aspect or the fourth aspect.
- an embodiment of the present application further provides an apparatus, where the apparatus includes a module/unit for performing any possible design method of the first aspect or the second aspect.
- the modules/units can be implemented by hardware, or by hardware executing corresponding software.
- an embodiment of the present application further provides an apparatus, where the apparatus includes a module/unit for performing any possible design method of the third aspect or the fourth aspect.
- the modules/units can be implemented by hardware, or by hardware executing corresponding software.
- the embodiments of the present application further provide a computer-readable storage medium, where the computer-readable storage medium includes a computer program, and when the computer program runs on a first electronic device, the first electronic device is caused to execute any one of the possible design methods of the first aspect or the second aspect above.
- the embodiments of the present application further provide a computer-readable storage medium, where the computer-readable storage medium includes a computer program, and when the computer program runs on a second electronic device, the second electronic device is caused to execute any one of the possible design methods of the third aspect or the fourth aspect above.
- an embodiment of the present application further provides a computer program product, which, when run on a first electronic device, enables the first electronic device to perform any one of the possible design methods of the first aspect or the second aspect above.
- an embodiment of the present application further provides a computer program product, which, when run on a second electronic device, enables the second electronic device to perform any one of the possible design methods of the third aspect or the fourth aspect above.
- an embodiment of the present application further provides a chip, which is coupled to a memory and configured to execute a computer program stored in the memory, so as to execute any one of the possible design methods in any of the foregoing aspects.
- FIG. 1 is a schematic diagram of an application scenario provided by an embodiment of the present application.
- FIG. 2 is a schematic structural diagram of a main body of a device according to an embodiment of the present application.
- FIG. 3 is a schematic diagram of a software structure of an electronic device provided by an embodiment of the present application.
- FIG. 4 is a schematic flowchart of a video editing method provided by an embodiment of the present application.
- FIG. 5A is a schematic diagram of another application scenario provided by an embodiment of the present application.
- FIG. 5B is a schematic diagram of a multi-window display provided by an embodiment of the present application.
- FIG. 5C is a schematic flowchart of a video editing process provided by an embodiment of the present application.
- FIG. 6 is a schematic flowchart of another video editing method provided by an embodiment of the present application.
- FIG. 7 is a schematic diagram of another application scenario provided by an embodiment of the present application.
- FIG. 8 is a schematic flowchart of another video editing method provided by an embodiment of the present application.
- FIG. 9 is a schematic diagram of a user emotion scoring method provided by an embodiment of the present application.
- FIG. 10 is a schematic flowchart of another video editing method provided by an embodiment of the present application.
- FIG. 11 is a schematic structural diagram of a first electronic device according to an embodiment of the application.
- FIG. 12 is a schematic structural diagram of a second electronic device according to an embodiment of the present application.
- FIG. 1 is a schematic diagram of a system architecture applicable to the embodiments of the present application.
- a local area network of a home is used as an example for illustration.
- the home includes electronic devices with the ability to connect to the network, including: a smart camera, a smart speaker, a smart TV, mobile phone a1, and mobile phone a2.
- all electronic devices shown in FIG. 1 are electronic devices with the ability to connect to a network. Some electronic devices may have established a connection with the network, and some electronic devices may not have established a connection with the network, that is, have not been registered with the network.
- Several electronic devices shown in FIG. 1 are only examples, and other electronic devices may also be included in practical applications, which are not limited in the embodiments of the present application.
- the apparatus for executing the method provided by the embodiments of the present application may be the electronic device shown in FIG. 1.
- the apparatus in this embodiment of the present application may be one or more electronic devices, such as a device with a voice capture function (such as a smart speaker) and a device with a video playback function (such as a mobile phone or a smart TV), or a device with an image capture function (such as a camera) and a device with a video playback function (such as a mobile phone or a smart TV), or a device with both voice and image capture functions and a video playback function (such as a mobile phone or a smart TV).
- the specific connection methods include, but are not limited to, a Universal Serial Bus (USB) data line connection, Bluetooth, wireless fidelity (Wi-Fi), Wi-Fi Direct, near field communication (NFC), the fifth generation mobile communication system (5G), the Global System for Mobile Communications (GSM) system, the Code Division Multiple Access (CDMA) system, the Wideband Code Division Multiple Access (WCDMA) system, the General Packet Radio Service (GPRS) system, the Long Term Evolution (LTE) system, the LTE Frequency Division Duplex (FDD) system, the LTE Time Division Duplex (TDD) system, the Universal Mobile Telecommunications System (UMTS), and Worldwide Interoperability for Microwave Access (WiMAX).
- the electronic device shown in FIG. 1 is only an example; the electronic device may have more or fewer components than those shown in the figure, may combine two or more components, or may have a different component configuration.
- the various components shown in the figures may be implemented in hardware, software, or a combination of hardware and software, including one or more signal processing and/or application specific integrated circuits.
- the electronic device in the embodiments of the present application may be a mobile phone, a tablet computer (pad), a computer with a wireless transceiver function, a virtual reality (VR) device, an augmented reality (AR) device, a wireless device in industrial control, a wireless device in self-driving, a wireless device in remote medical, a wireless device in a smart grid, a wireless device in transportation safety, a wireless device in a smart city, a wireless device in a smart home, and the like.
- FIG. 2 is a schematic diagram of a hardware structure of an electronic device 200 according to an embodiment of the present application.
- the electronic device 200 may include a processor 210, an external memory interface 220, an internal memory 221, a universal serial bus (USB) interface 230, a charging management module 240, a power management module 241, a battery 242, an antenna 1, an antenna 2, a mobile communication module 250, a wireless communication module 260, an audio module 270, a speaker 270A, a receiver 270B, a microphone 270C, a headphone jack 270D, a sensor module 280, buttons 290, a motor 291, an indicator 292, a camera 293, a display screen 294, a subscriber identification module (SIM) card interface 295, and so on.
- the sensor module 280 may include a pressure sensor 280A, a gyroscope sensor 280B, an air pressure sensor 280C, a magnetic sensor 280D, an acceleration sensor 280E, a distance sensor 280F, a proximity light sensor 280G, a fingerprint sensor 280H, a temperature sensor 280J, a touch sensor 280K, an ambient light sensor, and the like.
- the structures illustrated in the embodiments of the present application do not constitute a specific limitation on the electronic device 200 .
- the electronic device 200 may include more or fewer components than shown, or combine some components, or separate some components, or use a different arrangement of components.
- the illustrated components may be implemented in hardware, software, or a combination of software and hardware.
- the processor 210 may include one or more processing units, for example, the processor 210 may include an application processor (application processor, AP), a modem processor, a graphics processor (graphics processing unit, GPU), an image signal processor (image signal processor, ISP), controller, video codec, digital signal processor (digital signal processor, DSP), baseband processor, and/or neural-network processing unit (neural-network processing unit, NPU), etc. Wherein, different processing units may be independent devices, or may be integrated in one or more processors.
- the electronic device 200 implements a display function through a GPU, a display screen 294, an application processor, and the like.
- the GPU is a microprocessor for image processing, and is connected to the display screen 294 and the application processor.
- the GPU is used to perform mathematical and geometric calculations for graphics rendering.
- Processor 210 may include one or more GPUs that execute program instructions to generate or alter display information.
- the electronic device 200 may implement a shooting function through an ISP, a camera 293, a video codec, a GPU, a display screen 294, an application processor, and the like.
- the SIM card interface 295 is used to connect a SIM card.
- the SIM card can be brought into contact with or separated from the electronic device 200 by inserting it into or pulling it out of the SIM card interface 295.
- the electronic device 200 may support 1 or N SIM card interfaces, where N is a positive integer greater than 1.
- the SIM card interface 295 can support Nano SIM cards, Micro SIM cards, SIM cards, and the like.
- multiple cards can be inserted into the same SIM card interface 295 at the same time.
- the types of the plurality of cards may be the same or different.
- the SIM card interface 295 can also be compatible with different types of SIM cards.
- the SIM card interface 295 is also compatible with external memory cards.
- the electronic device 200 interacts with the network through the SIM card to realize functions such as call and data communication.
- the electronic device 200 employs an eSIM, i.e., an embedded SIM card.
- the wireless communication function of the electronic device 200 may be implemented by the antenna 1, the antenna 2, the mobile communication module 250, the wireless communication module 260, the modulation and demodulation processor, the baseband processor, and the like.
- Antenna 1 and Antenna 2 are used to transmit and receive electromagnetic wave signals.
- Each antenna in electronic device 200 may be used to cover a single or multiple communication frequency bands. Different antennas can also be reused to improve antenna utilization.
- the antenna 1 can be multiplexed as a diversity antenna of the wireless local area network. In other embodiments, the antenna may be used in conjunction with a tuning switch.
- the mobile communication module 250 may provide a wireless communication solution including 2G/3G/4G/5G, etc. applied on the electronic device 200 .
- the mobile communication module 250 may include at least one filter, switch, power amplifier, low noise amplifier (LNA), and the like.
- the mobile communication module 250 can receive electromagnetic waves from the antenna 1, filter and amplify the received electromagnetic waves, and transmit them to the modulation and demodulation processor for demodulation.
- the mobile communication module 250 can also amplify the signal modulated by the modulation and demodulation processor, and then convert it into electromagnetic waves for radiation through the antenna 1 .
- at least part of the functional modules of the mobile communication module 250 may be provided in the processor 210 .
- at least part of the functional modules of the mobile communication module 250 may be provided in the same device as at least part of the modules of the processor 210 .
- the wireless communication module 260 can provide wireless communication solutions applied on the electronic device 200, including wireless local area network (WLAN) (such as a wireless fidelity (Wi-Fi) network), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), and infrared (IR) technologies.
- the wireless communication module 260 may be one or more devices integrating at least one communication processing module.
- the wireless communication module 260 receives electromagnetic waves via the antenna 2 , modulates and filters the electromagnetic wave signals, and sends the processed signals to the processor 210 .
- the wireless communication module 260 can also receive the signal to be sent from the processor 210 , perform frequency modulation on the signal, amplify the signal, and then convert it into an electromagnetic wave for radiation through the antenna 2 .
- the antenna 1 of the electronic device 200 is coupled with the mobile communication module 250, and the antenna 2 is coupled with the wireless communication module 260, so that the electronic device 200 can communicate with the network and other devices through wireless communication technology.
- the wireless communication technologies may include Global System for Mobile Communications (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Time-Division Code Division Multiple Access (TD-SCDMA), Long Term Evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technologies, etc.
- the electronic device 200 may include more or fewer components than shown, or combine some components, or separate some components, or use a different arrangement of components.
- the illustrated components may be implemented in hardware, software, or a combination of software and hardware.
- the software system of the electronic device 200 may adopt a layered architecture, an event-driven architecture, a microkernel architecture, a microservice architecture, or a cloud architecture.
- the embodiments of the present application take the Android system with a layered architecture as an example to exemplarily describe the software structure of the electronic device 200.
- FIG. 3 is a block diagram of a software structure of an electronic device according to an embodiment of the present invention.
- the software modules and/or codes of the software architecture may be stored in the internal memory 221.
- when the processor 210 runs the software modules or codes, the video editing method provided by the embodiments of the present application is executed.
- the layered architecture divides the software into several layers, and each layer has a clear role and division of labor. Layers communicate with each other through software interfaces.
- the Android system is divided into four layers, which are, from top to bottom, an application layer, an application framework layer, an Android runtime (Android runtime) and a system library, and a kernel layer.
- the application layer can include a series of application packages.
- the application package may include applications such as phone, camera, gallery, calendar, call, map, navigation, WLAN, Bluetooth, music, video, and short message.
- the application framework layer provides an application programming interface (application programming interface, API) and a programming framework for applications in the application layer.
- the application framework layer includes some predefined functions.
- the application framework layer may include window managers, content providers, view systems, telephony managers, resource managers, notification managers, and the like.
- a window manager is used to manage window programs.
- the window manager can get the size of the display screen, determine whether there is a status bar, lock the screen, take screenshots, etc.
- Content providers are used to store and retrieve data and make these data accessible to applications.
- the data may include video, images, audio, calls made and received, browsing history and bookmarks, phone book, etc.
- the view system includes visual controls, such as controls for displaying text, controls for displaying pictures, and so on. View systems can be used to build applications.
- a display interface can consist of one or more views.
- the display interface including the short message notification icon may include a view for displaying text and a view for displaying pictures.
- the phone manager is used to provide the communication function of the electronic device. For example, the management of call status (including connecting, hanging up, etc.).
- the resource manager provides various resources for the application, such as localization strings, icons, pictures, layout files, video files and so on.
- the notification manager enables applications to display notification information in the status bar, which can be used to convey notification-type messages, and can disappear automatically after a brief pause without user interaction. For example, the notification manager is used to notify download completion, message reminders, etc.
- the notification manager can also display notifications in the status bar at the top of the system in the form of graphs or scroll bar text, such as notifications of applications running in the background, and notifications on the screen in the form of dialog windows. For example, text information is prompted in the status bar, a prompt sound is issued, the electronic device vibrates, and the indicator light flashes.
- Android Runtime includes core libraries and a virtual machine. Android runtime is responsible for scheduling and management of the Android system.
- the core library consists of two parts: one part is the functions that the Java language needs to call, and the other part is the core library of Android.
- the application layer and the application framework layer run in virtual machines.
- the virtual machine executes the java files of the application layer and the application framework layer as binary files.
- the virtual machine is used to perform functions such as object lifecycle management, stack management, thread management, safety and exception management, and garbage collection.
- a system library can include multiple functional modules. For example: surface manager (surface manager), media library (Media Libraries), 3D graphics processing library (eg: OpenGL ES), 2D graphics engine (eg: SGL), etc.
- the Surface Manager is used to manage the display subsystem and provides a fusion of 2D and 3D layers for multiple applications.
- the media library supports playback and recording of a variety of commonly used audio and video formats, as well as still image files.
- the media library can support a variety of audio and video encoding formats, such as: MPEG4, H.264, MP3, AAC, AMR, JPG, PNG, etc.
- the 3D graphics processing library is used to implement 3D graphics drawing, image rendering, compositing, and layer processing.
- 2D graphics engine is a drawing engine for 2D drawing.
- the kernel layer is the layer between hardware and software.
- the kernel layer contains at least display drivers, camera drivers, audio drivers, and sensor drivers.
- the hardware may refer to various types of sensors, such as an acceleration sensor, a gyroscope sensor, a touch sensor, a pressure sensor, and the like involved in the embodiments of the present application.
- the present application therefore provides a video editing method, which uses the video of the user watching the video collected by an image capture device, or the user's voice collected by a voice capture device, to identify the user's emotion-related voice, body movements, or facial expressions, and trigger the electronic device to clip highlight segments from the video the user is watching. In this way, the video clipping can be completed without the user actively issuing a fixed wake-up word, which improves the user experience.
- FIG. 4 is a schematic flowchart of a video editing method according to an embodiment of the present application.
- the method can be implemented by the electronic device shown in FIG. 1 .
- the following takes the first electronic device executing the method as an example to illustrate, as shown in FIG. 4 , the process includes:
- Step 401 During the process of playing the first video by the first electronic device, the first electronic device acquires the voice information of the user watching the first video and/or the second video of the user from the collecting device.
- the acquisition device may include a voice acquisition device, an image acquisition device, and the like.
- the voice collection device may be an audio module in the first electronic device, such as a receiver, a microphone, and the like.
- the voice collection device may also be a peripheral device connected to the first electronic device, such as a microphone externally connected to the first electronic device, or a device such as a smart speaker wirelessly connected to the first electronic device. That is to say, during the process of the user watching the first video, the voice collecting device will collect the voice information of the user in real time. In this way, the voice collection device can collect voice information such as "great" and "wonderful" issued by the user, or the user makes a sound of applause.
- the voice collecting device can also collect voice information of the first video during the playing process of the first video. Taking the live video of a football match as an example, the most exciting part of a football match is usually the goal scene, during which the match video usually includes the sound of the audience cheering and applauding. If the first electronic device uses a speaker to play the video, the voice collecting device can collect the audience's cheering and applauding sounds.
- for example, the user may utter an exclamation such as "Great!", and the audio module of the smart TV (such as a microphone) or the smart speaker can collect the voice information sent by the user while the video is playing.
- Step 402 the first electronic device identifies M pieces of key information related to the user's emotions in the voice information and/or the second video, and determines, from the first video, N first video clips corresponding to the collection time units of the M pieces of key information.
- the key information may include at least one of keywords and key actions.
- in a possible implementation, after the first electronic device acquires the voice information from the voice collection device, the first electronic device recognizes the voice information based on a preset voice recognition model, for example through voiceprint recognition, and identifies the user's voice information from it.
- the user's voice information is matched with a preset voice template to determine whether a wake-up word related to the user exists in the voice information and the collection time unit of the voice corresponding to the wake-up word issued by the user; the first electronic device then determines the first video segment of the first video corresponding to that collection time unit.
- the wake-up word includes the sound made when the user performs a set body movement due to emotional fluctuations (such as the sound of applause), and the set voice information uttered by the user (such as various interjections).
- the preset speech template may be speech information related to the user's emotion generated by pre-training, using the sound of applause, the sound of celebration, various interjections, and the like.
- the key information may also be information existing in nature, such as the decibel size of the sound, which is not limited in this embodiment of the present application.
- in another possible implementation, after the first electronic device acquires the second video from the image acquisition device, the first electronic device recognizes the image information based on a preset image recognition model, recognizes the user's body movements, expressions, and the like, and matches the recognized body movements and expressions of the user with a pre-stored set body movement template or set facial expression template, so as to determine whether a wake-up action related to the user exists in the second video and the collection time unit corresponding to the wake-up action performed by the user; the first electronic device then determines the first video segment of the first video corresponding to that collection time unit. It should be noted that, in this embodiment of the present application, the above two possible implementations may also be combined to determine the N first video segments.
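- as an illustration of the matching step above, the following sketch (assumed, not from the patent; the template sets and function names are hypothetical) matches recognized speech text and recognized actions against preset wake-up word and wake-up action templates and records the collection time units of the hits.

```python
# Illustrative sketch only: template matching for wake-up words and wake-up actions.
from typing import List, Tuple

WAKE_WORD_TEMPLATES = {"great", "awesome", "fantastic", "wonderful"}   # assumed examples
WAKE_ACTION_TEMPLATES = {"applaud", "laugh", "cheer"}                  # assumed examples

def find_key_info(
    recognized_speech: List[Tuple[int, str]],   # (collection time unit index, recognized text)
    recognized_actions: List[Tuple[int, str]],  # (collection time unit index, recognized action)
) -> List[int]:
    """Return the collection time units in which emotion-related key information was found."""
    units = set()
    for unit, text in recognized_speech:
        if any(word in text.lower() for word in WAKE_WORD_TEMPLATES):
            units.add(unit)
    for unit, action in recognized_actions:
        if action in WAKE_ACTION_TEMPLATES:
            units.add(unit)
    return sorted(units)

speech = [(3, "Great! What a goal"), (7, "hmm")]
actions = [(3, "applaud"), (12, "laugh")]
print(find_key_info(speech, actions))   # -> [3, 12]
```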
- a possible way for the first electronic device to determine the first video segment of the first video corresponding to a collection time unit is as follows: the first electronic device may divide the first video into L first video segments in advance and, when key information is identified, dot the first video segment corresponding to the collection time unit of the key information; then, after the first video, or part of it, has been played, the first electronic device can obtain the dotting information of the first video segments from the first video and, according to the dotting information, determine from the L first video segments the N first video clips corresponding to the collection time units of the M wake-up words.
- for example, the first electronic device divides the first video at a fixed duration (for example, 10 seconds), so that the first video is divided into a plurality of first video segments; the first electronic device can then dot the first video segment corresponding to the collection time unit of the wake-up word.
- for example, the user utters the exclamation "Awesome!" when watching the goal scene; the smart TV recognizes that the user's wake-up word exists in the period from 9:45:10 Beijing time to 9:45:11 Beijing time, so the smart TV dots the live video clip of the football match that it played in the 10 seconds before 9:45:11 Beijing time, and then determines the first video clip to be edited according to the dotting information. Within the 10 seconds before 9:45:11 Beijing time, that is, just before the goal, a player is very likely breaking through and getting past the defenders one by one before taking the final shot, which is a wonderful moment. Therefore, according to the above method, a highlight clip of the live video of the football match can be edited.
- the embodiment of the present application does not limit the number of dotting positions included in the dotting information, which may be one or more.
- the correspondence between the collection time unit of the key information and the time unit of the first video clip may take several forms. In the first possible case, the collection time unit of the key information is the same as the time unit of the video clip: for example, if the wake-up word "Awesome" is detected between 9:45:10 Beijing time and 9:45:11 Beijing time, the smart TV can edit the one-second video clip from 9:45:10 Beijing time to 9:45:11 Beijing time. In the second possible case, the time unit of the video clip contains the collection time unit of the key information: that is, if the wake-up word "Awesome" is detected between 9:45:10 Beijing time and 9:45:11 Beijing time, the smart TV can edit the video clip within the 10 seconds before 9:45:11 Beijing time.
- for another example, the smart TV can edit the video clip within the 10 seconds after 11:30:11 Beijing time.
- This embodiment of the present application does not specifically limit this, and the relationship between the collection time unit of the key information and the time unit of the video clip may be determined according to actual experience.
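- the two correspondence cases above could be expressed, purely as an illustrative assumption, as follows: case 1 clips exactly the collection time unit of the key information, and case 2 clips a fixed-length window (for example, 10 seconds) ending at the detection time.

```python
# Illustrative sketch only: mapping a detection time to a clip time range.
from typing import Tuple

def clip_same_unit(unit_start: float, unit_end: float) -> Tuple[float, float]:
    """Case 1: the clip's time unit equals the key information's collection time unit."""
    return unit_start, unit_end

def clip_window_before(detection_end: float, window: float = 10.0) -> Tuple[float, float]:
    """Case 2: the clip's time unit contains the collection time unit (the window before it)."""
    return max(0.0, detection_end - window), detection_end

# Wake-up word detected between t=70 s and t=71 s of the live stream.
print(clip_same_unit(70.0, 71.0))        # -> (70.0, 71.0)
print(clip_window_before(71.0))          # -> (61.0, 71.0)
```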
- Step 403 the first electronic device edits the N first video segments to generate an edited video, where M and N are positive integers.
- the user can share the video on the first electronic device to other users' electronic devices, or to a social network, such as a circle of friends.
- the first electronic device may splice and combine the first video clips of the first video corresponding to all or part of the key information collection time units to synthesize a highlight video.
- further, the first electronic device may determine M second video clips of the second video corresponding to the N first video clips, where the playing periods of the N first video clips overlap with the collection periods of the M second video clips; the first electronic device then edits the N first video clips and the M second video clips to generate an edited video, where M and N are positive integers.
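- pairing the first video clips with the second video clips by overlapping periods could look like the following sketch (an assumption for illustration only; the common timeline and data structure are hypothetical): each first video clip is matched with the second video clips whose collection periods overlap its playing period, so that the two can later be composited into a multi-window highlight clip.

```python
# Illustrative sketch only: find overlapping (first clip, second clip) pairs.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Clip:
    source: str      # "first" (played video) or "second" (user video)
    start: float     # seconds on a common timeline (e.g. wall-clock time)
    end: float

def overlapping_pairs(first_clips: List[Clip], second_clips: List[Clip]) -> List[Tuple[Clip, Clip]]:
    """Return (first clip, second clip) pairs whose time ranges overlap."""
    pairs = []
    for f in first_clips:
        for s in second_clips:
            if f.start < s.end and s.start < f.end:   # non-empty overlap
                pairs.append((f, s))
    return pairs

firsts = [Clip("first", 100.0, 110.0)]
seconds = [Clip("second", 100.0, 110.0), Clip("second", 200.0, 210.0)]
print(len(overlapping_pairs(firsts, seconds)))   # -> 1
```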
- the image acquisition device may be a camera in the first electronic device.
- the image acquisition device may also be a peripheral device connected to the first electronic device, such as a camera externally connected to the first electronic device, or a device such as a smart camera wirelessly connected to the first electronic device.
- the image capturing device will capture the image information of the user in real time. In this way, the image capture device can capture image information such as the user's applauding action, so as to generate the second video.
- for example, when the smart TV is playing the live video of a football match, after recognizing the acquired voice information, it determines that the user's wake-up word exists in the period from 9:45:10 Beijing time to 9:45:11 Beijing time. The smart TV therefore determines not only the live video clip of the football match that it played during that period, or within the 10 seconds after 9:45:10 Beijing time, but also the second video clip of the second video collected during that period. Finally, the smart TV can splice and combine the first video clip of the first video and the second video clip of the second video to synthesize a highlight clip that can be played in multiple windows.
- the final synthesized highlight clip played in multiple windows can be as shown in FIG. 5B.
- the method plays highlight clips through multiple windows, which helps make the video more engaging.
- Step 501 the first electronic device recognizes the voice or image information collected by the collecting device;
- Step 502 on one hand, the first electronic device obtains the camera cache data from the image information collected by the camera (that is, the 10-second second video clip of the second video above); on the other hand, the first electronic device obtains the cached live data (that is, the 10-second first video clip of the first video above);
- Step 503 the first electronic device generates a highlight video clip file, or generates multiple pictures;
- Step 504 the first electronic device acquires associated device information, such as device information of the user's friends;
- Step 505 the first electronic device shares the link with the associated devices.
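- steps 501 to 505 could be orchestrated roughly as in the following sketch (illustrative only; every function here is a hypothetical placeholder, not an API of the devices involved).

```python
# Illustrative sketch only: the flow of steps 501-505 with placeholder functions.
from typing import List

def recognize_trigger(voice_or_image: bytes) -> bool:
    return b"wake" in voice_or_image                     # placeholder recognition (step 501)

def get_camera_cache(seconds: int = 10) -> bytes:        # step 502: second video clip
    return b"camera-10s"

def get_live_cache(seconds: int = 10) -> bytes:          # step 502: first video clip
    return b"live-10s"

def generate_highlight(live: bytes, camera: bytes) -> str:   # step 503
    return "highlight.mp4"

def get_associated_devices() -> List[str]:               # step 504: e.g. friends' devices
    return ["friend-phone"]

def share_link(path: str, devices: List[str]) -> None:   # step 505
    print(f"sharing {path} with {devices}")

if recognize_trigger(b"wake word detected"):
    clip = generate_highlight(get_live_cache(), get_camera_cache())
    share_link(clip, get_associated_devices())
```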
- the electronic device triggers video clipping based on the unconscious voice or actions of the user watching the video. This method does not require the user to actively trigger the clipping and can generate highlight clips, effectively improving the user experience.
- FIG. 6 is a schematic flowchart of a video editing method according to an embodiment of the present application.
- the method can be jointly implemented by at least two electronic devices shown in FIG. 1 .
- the following takes the first electronic device and the second electronic device executing the method as an example for description.
- the process includes:
- Step 601 During the process of playing the first video by the first electronic device, the second electronic device acquires the voice information of the user watching the first video from the collecting device.
- the acquisition device may include a voice acquisition device, an image acquisition device, and the like.
- the voice acquisition device may be an audio module in the second electronic device, or an external device connected by wire or wirelessly.
- the image acquisition device may also be a peripheral device connected to the second electronic device, such as an externally connected camera or a wirelessly connected smart camera.
- for example, the user may utter an exclamation such as "Great!", and the audio module (such as a microphone) or the smart speaker can collect the voice information sent by the user during the video playback period; the user's mobile phone can then obtain the voice information from the voice collecting device.
- Step 602 the second electronic device identifies M pieces of key information related to the user's emotion in the voice information and/or the second video.
- for example, the user utters the exclamation "Fantastic!"; after the acquired voice information is recognized, it is determined that the user's wake-up word exists during the period from 9:45:10 Beijing time to 9:45:11 Beijing time.
- Step 603 The second electronic device acquires N first video segments corresponding to the collection time units of the M pieces of key information from the first video of the first electronic device.
- in a possible implementation, the second electronic device may divide the first video into L first video segments in advance and, when key information is identified, dot the first video segment corresponding to the collection time unit of the key information; then, after the first video, or part of it, has been played, the second electronic device can obtain the dotting information of the first video segments from the first video and, according to the dotting information, determine from the L first video segments the N first video segments corresponding to the collection time units of the M wake-up words.
- for example, the second electronic device divides the first video at a fixed duration (for example, 10 seconds), so that the first video is divided into a plurality of first video segments; the second electronic device can then dot the first video segment corresponding to the collection time unit of the wake-up word.
- the embodiment of the present application does not limit the number of dotting positions included in the dotting information, which may be one or more.
- for example, the user utters the exclamation "Great!" during the live video playback of the football match; after the mobile phone recognizes the acquired voice information, it determines that the user's wake-up word exists during the period from 9:45:10 Beijing time to 9:45:11 Beijing time, so the mobile phone obtains the live video of the football match from the smart TV, dots the live video clip of the football match played by the smart TV from 9:45:10 Beijing time to 9:45:11 Beijing time, and then determines the first video clip to be edited according to the dotting information.
- Step 604 the second electronic device edits the N first video segments to generate an edited video, where M and N are positive integers.
- the user can share the video on the second electronic device to other users' electronic devices, or to a social network, such as a circle of friends.
- the second electronic device may splice and combine the first video clips of the first video corresponding to all or part of the key information collection time units to synthesize a highlight video.
- the second electronic device may further determine M second video clips of the second video corresponding to the N first video clips, where the playing periods of the N first video clips overlap with the collection periods of the M second video clips; the second electronic device then edits the N first video clips and the M second video clips to generate an edited video, where M and N are positive integers.
- for example, after the mobile phone recognizes the acquired voice information, it determines that the user's wake-up word exists during the period from 9:45:10 Beijing time to 9:45:11 Beijing time, so the mobile phone not only determines the live video clip of the football match played by the smart TV during that period, but also determines, from the second video collected by the smart camera, the second video clip during the period from 9:45:10 Beijing time to 9:45:11 Beijing time, or within 10 seconds after 9:45:10 Beijing time.
- the mobile phone can then splice and combine the first video clip of the first video and the second video clip of the second video to synthesize a highlight clip that can be played in multiple windows; the final synthesized highlight clip played in multiple windows can be as shown in FIG. 5B.
- the second electronic device may trigger the editing of the video played by the first electronic device based on the unconscious voice or action of the user watching the video.
- this method does not require the device playing the video to have a video editing function; the video editing is completed through the cooperation of multiple devices in the distributed system to generate highlight clips, which effectively improves the user experience.
- FIG. 8 is a schematic flowchart of another video editing method provided by an embodiment of the present application.
- the method can be implemented by the electronic device shown in FIG. 1 .
- the following takes the first electronic device executing the method as an example for description.
- the process includes:
- Step 801 During the process of playing the first video by the first electronic device, the first electronic device acquires the voice information of the user watching the first video and/or the second video of the user from the collecting device.
- the acquisition device may include a voice acquisition device, an image acquisition device, and the like.
- the voice acquisition device may be an audio module in the first electronic device, or an external device connected by wire or wirelessly.
- the image acquisition device may also be a peripheral device connected to the first electronic device, such as an externally connected camera.
- Step 802 the first electronic device divides the voice information and/or the second video according to the collection time units, identifies the voice information and/or the key information in the second video corresponding to the M collection time units, and determines the user emotion scores corresponding to the M collection time units respectively.
- the method for determining the user emotion scores corresponding to the M collection time units by the first electronic device may adopt any of the following methods:
- Manner 1: The first electronic device recognizes the wake-up word in the voice information, and determines, according to the recognition result, the user emotion scores corresponding to the M collection time units respectively.
- That is, after the first electronic device acquires the voice information from the voice collection device, the first electronic device recognizes the voice information based on a preset voice recognition model (for example, voiceprint recognition) and identifies the user's voice information from it; the first electronic device then determines, based on a preset neural network model, the user emotion score corresponding to each collection time unit.
- For example, if the first electronic device recognizes that the first collection time unit (for example, 9:45:10 to 9:45:20 Beijing time) includes the voice information "Great!" uttered by the user, the user emotion score of the first collection time unit is 9 points; if the first electronic device recognizes that the second collection time unit (9:45:20 to 9:45:30 Beijing time) does not include any voice information uttered by the user, the user emotion score of the second collection time unit is 0 points. For another example, if the first electronic device recognizes that the speaker of the smart TV plays the sound of cheering and applause in the third collection time unit (for example, 10:45:10 to 10:45:20 Beijing time), the user emotion score of the third collection time unit is 9 points; if the first electronic device recognizes that the fourth collection time unit (10:45:20 to 10:45:30 Beijing time) contains no voice information, the user emotion score of the fourth collection time unit is 0 points.
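- Manner 1 could be sketched roughly as follows: recognized words with timestamps are binned into fixed collection time units, and each unit receives an emotion score based on whether a wake-up word falls inside it. The keyword list, the 10-second unit length, and the scores are illustrative assumptions; the embodiment leaves the actual scoring (for example, a preset neural network model) open.

```python
from typing import Dict, List, Tuple

UNIT_LEN_S = 10.0                                   # assumed collection time unit length
WAKEUP_WORD_SCORES = {"great": 9, "fantastic": 9}   # assumed keyword-to-score mapping

def score_units_from_transcript(
    recognized_words: List[Tuple[str, float]],      # (word, timestamp in seconds)
    total_duration_s: float,
) -> Dict[int, int]:
    """Return {collection time unit index: user emotion score}; units with no wake-up word score 0."""
    n_units = int(total_duration_s // UNIT_LEN_S) + 1
    scores = {i: 0 for i in range(n_units)}
    for word, ts in recognized_words:
        unit = int(ts // UNIT_LEN_S)
        if unit in scores:
            scores[unit] = max(scores[unit], WAKEUP_WORD_SCORES.get(word.lower(), 0))
    return scores
```

- Under these assumptions, `score_units_from_transcript([("Great", 12.3)], 60.0)` would give unit 1 a score of 9 points and every other unit 0 points.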
- Manner 2: The first electronic device recognizes the key actions in the second video, and determines, according to the recognition result, the user emotion scores corresponding to the M collection time units respectively.
- That is, as shown in FIG. 9, after the first electronic device acquires the second video from the image collection device, the first electronic device recognizes the second video based on a preset image recognition model and identifies at least one of the user's expressions, actions, or speech from it; the first electronic device then determines, based on a preset neural network model, the user emotion score corresponding to each collection time unit.
- For example, if the first electronic device recognizes that the first collection time unit (for example, 9:45:10 to 9:45:20 Beijing time) includes the user's laughing expression, the user emotion score of the first collection time unit is 9 points; if the first electronic device recognizes that the user's expression in the second collection time unit (9:45:20 to 9:45:30 Beijing time) is neutral, the user emotion score of the second collection time unit is 0 points.
- Manner 3: The first electronic device recognizes the wake-up word in the voice information and at least one piece of information among the key actions in the second video, and determines, according to the recognition results, the user emotion scores corresponding to the M collection time units respectively.
- the first electronic device recognizes the voice information and the second video in combination with the methods in Embodiment 1 and Embodiment 2, and synthesizes the recognition results to determine the user emotion score corresponding to each collection time unit.
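- A simple, assumed way to synthesize the two recognition results in Manner 3 is to keep, for each collection time unit, the stronger of the voice-based and video-based emotion scores, as sketched below; the application only states that the results are synthesized (for example, by a preset neural network model), so this combination rule is an illustration rather than the prescribed method.

```python
from typing import Dict

def combine_scores(voice_scores: Dict[int, int], visual_scores: Dict[int, int]) -> Dict[int, int]:
    """Per collection time unit, keep the stronger of the two emotion signals."""
    units = set(voice_scores) | set(visual_scores)
    return {u: max(voice_scores.get(u, 0), visual_scores.get(u, 0)) for u in units}
```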
- Step 803 the first electronic device determines the degree of brilliance corresponding to each of the L first video clips from the first video according to the user emotion scores corresponding to the M collection time units.
- Specifically, the first electronic device can convert the user emotion scores corresponding to the M collection time units into degrees of brilliance through a preset function. This embodiment of the present application does not limit the representation of the function; any function that can convert a user emotion score into a degree of brilliance is applicable to the embodiments of the present application.
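- As a minimal sketch of such a preset function (the linear mapping and the 0.8 threshold are assumptions chosen only to make the flow concrete), the score-to-brilliance conversion and the threshold filtering used in step 804 could look like this:

```python
from typing import Dict, List

def emotion_to_brilliance(score: int, max_score: int = 10) -> float:
    """Assumed preset function: map a 0..max_score user emotion score to a 0..1 degree of brilliance."""
    return max(0.0, min(1.0, score / max_score))

def select_highlight_units(unit_scores: Dict[int, int], threshold: float = 0.8) -> List[int]:
    """Indices of collection time units whose degree of brilliance exceeds the set threshold."""
    return [u for u, s in unit_scores.items() if emotion_to_brilliance(s) > threshold]
```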
- For example, during playback of the live video of a football match, the user utters the exclamation "Fantastic!". After the smart TV recognizes the acquired voice information, it determines that the user emotion score in the period from 9:45:10 to 9:45:11 Beijing time is 9 points. The smart TV therefore determines that the live video clip of the football match played by the smart TV during that period, or within the 10 seconds after 9:45:10 Beijing time, has a degree of brilliance of 9 points.
- Step 804 the first electronic device edits the N first video clips whose degree of brilliance is greater than the set threshold among the L first video clips of the first video, and generates an edited video, where M, L and N are positive integers.
- the user can share the video on the first electronic device to other users' electronic devices, or to a social network, such as a circle of friends.
- the first electronic device can use any one of the following ways to edit the video:
- One possible manner is that the first electronic device may splice and combine all or some of the first video clips whose degree of brilliance is greater than the set threshold, to synthesize a highlight video clip.
- Another possible manner is that the first electronic device may further determine the M second video clips of the second video corresponding to the N first video clips, where the playback periods of the N first video clips overlap with the collection periods of the M second video clips; the first electronic device then edits the N first video clips and the M second video clips to generate an edited video, where M and N are positive integers. For a specific example, refer to Embodiment 1 above.
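- For the second manner, the overlap test between the playback periods of the selected first video clips and the collection periods of the second video clips could be sketched as below; representing both as (start, end) intervals on a shared wall-clock timeline is an assumption made for illustration.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start, end) on a shared wall-clock timeline, in seconds

def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two half-open intervals share any time."""
    return a[0] < b[1] and b[0] < a[1]

def matching_second_clips(first_clips: List[Interval],
                          second_clips: List[Interval]) -> List[Interval]:
    """Second-video clips whose collection period overlaps any selected first-video clip."""
    return [s for s in second_clips if any(overlaps(s, f) for f in first_clips)]
```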
- In this embodiment of the present application, the electronic device scores the user's viewing emotions based on the unconscious voice or actions of the user watching the video, so as to evaluate the degree of brilliance of the video clips and complete the video editing. This method can generate highlight video clips without requiring the user to actively trigger the video editing, which effectively improves the user experience.
- Refer to FIG. 10, which is a schematic flowchart of another video editing method provided by an embodiment of the present application.
- the method can be jointly implemented by at least two electronic devices shown in FIG. 1 .
- the following takes the first electronic device and the second electronic device executing the method as an example for description.
- the process includes:
- Step 1001 During the process of playing the first video by the first electronic device, the second electronic device acquires the voice information of the user watching the first video and/or the second video of the user from the collecting device.
- the voice collection device may be an audio module in the second electronic device, or may be an external device connected by wire or wirelessly.
- the second electronic device may acquire the user's voice information or the audio information of the first video from the voice collection device.
- the image capturing device may be a camera in the second electronic device.
- the image acquisition device may also be a peripheral device connected to the second electronic device, such as a camera externally connected to the second electronic device, or a device such as a smart camera wirelessly connected to the second electronic device. That is to say, during the process of the user watching the first video, the image capturing device may capture the image information of the user in real time. In this way, the image collection device can collect image information such as the user's applauding action, so as to generate the second video.
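- For concreteness, capturing the viewer's second video from a camera during playback could be sketched as follows; the use of OpenCV, the default camera index, and the fixed recording duration are assumptions made only for illustration (a wirelessly connected smart camera would need its own capture API).

```python
import time
import cv2  # assumed available for this sketch

def record_reaction_video(out_path: str, duration_s: float = 10.0, fps: int = 20) -> None:
    """Record the viewer's reaction (the "second video") from a local camera."""
    cap = cv2.VideoCapture(0)  # default camera
    width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (width, height))
    end_time = time.time() + duration_s
    while time.time() < end_time:
        ok, frame = cap.read()
        if not ok:
            break
        writer.write(frame)
    cap.release()
    writer.release()
```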
- For example, as shown in FIG. 7, while watching the live video of a football match played by the smart TV, the user may utter an exclamation such as "Great!". In this case, the audio module of the smart TV (such as a microphone) or a smart speaker can collect the voice information uttered by the user during the video playback period, and the user's mobile phone can obtain the voice information from the voice collection device.
- Step 1002 the second electronic device divides the voice information and/or the second video according to the collection time units, identifies the voice information and/or the key information in the second video corresponding to the M collection time units, and determines the user emotion scores corresponding to the M collection time units respectively.
- the method for the second electronic device to determine the user emotion scores respectively corresponding to the M collection time units may adopt any of the following methods:
- Manner 1 The second electronic device recognizes the wake-up word in the voice information, and determines user emotion scores corresponding to the M collection time units according to the recognition result.
- Manner 2: The second electronic device recognizes at least one piece of information among the user's speech, facial expressions, and body movements in the second video, and determines, according to the recognition result, the user emotion scores corresponding to the M collection time units respectively.
- Manner 3: The second electronic device recognizes the wake-up word in the voice information and at least one piece of information among the user's speech, facial expressions, and body movements in the second video, and determines, according to the recognition results, the user emotion scores corresponding to the M collection time units respectively. For the foregoing manners and specific examples, refer to step 802 above.
- Step 1003 the second electronic device acquires L first video segments corresponding to the M collection time units from the first video of the first electronic device.
- Step 1004 the second electronic device determines the respective degrees of brilliance corresponding to the L first video segments of the first video according to the user emotion scores corresponding to the M collection time units.
- Specifically, the second electronic device can convert the user emotion scores corresponding to the M collection time units into degrees of brilliance through a preset function. This embodiment of the present application does not limit the representation of the function; any function that can convert a user emotion score into a degree of brilliance is applicable to the embodiments of the present application.
- Step 1005 the second electronic device edits the N first video clips whose degree of brilliance is greater than the set threshold among the L first video clips of the first video, and generates an edited video, where M, L and N are positive integers.
- the user can share the video on the second electronic device to other users' electronic devices, or to a social network, such as a circle of friends.
- the second electronic device may use any of the methods provided in the foregoing step 804 to edit the video, and the description will not be repeated here.
- In this embodiment of the present application, the electronic device scores the user's viewing emotions based on the unconscious voice or actions of the user watching the video, so as to evaluate the degree of brilliance of the video clips and complete the video editing. This method can generate highlight video clips without requiring the user to actively trigger the video editing, which effectively improves the user experience. Compared with Embodiment 3, this method does not require the device that plays the video to have a video editing function; multiple devices in the distributed system cooperate to complete the video editing and generate highlight video clips, which effectively improves the user experience.
- the embodiment of the present invention provides a first electronic device, which is specifically used to implement the method executed by the first electronic device in the above-mentioned Embodiment 1 and Embodiment 3.
- The structure of the first electronic device is shown in FIG. 11, including a playback unit 1101, an acquisition unit 1102, a determination unit 1103, and an editing unit 1104.
- When the first electronic device is configured to implement the method executed by the first electronic device in Embodiment 1 above, each module unit in the first electronic device performs the following actions:
- the playing unit 1101 is used to play the first video.
- the obtaining unit 1102 is configured to obtain, from the collecting device, the voice information of the user watching the first video and/or the second video of the user during the process of playing the first video by the first electronic device.
- The determining unit 1103 is configured to identify the M pieces of key information related to the user's emotions in the voice information and/or the second video, and to determine, in the first video, the N first video segments corresponding to the collection time units of the M pieces of key information.
- the editing unit 1104 is configured to edit the N first video segments to generate an edited video, where M and N are positive integers.
- In a possible embodiment, the determining unit 1103 is further configured to determine the M second video segments of the second video corresponding to the N first video segments, where the playback periods of the N first video segments overlap with the collection periods of the M second video segments;
- the editing unit 1104 is further configured to edit the N first video clips and the M second video clips to generate an edited video, where M and N are positive integers.
- In a possible embodiment, the determining unit 1103 is specifically configured to: divide the first video into L first video segments; when the key information is identified, mark (dot) the first video segment, among the L first video segments, corresponding to the collection time unit of the key information; and obtain the marking information of the first video segments from the first video, and determine, from the L first video segments according to the marking information, the N first video segments corresponding to the collection time units of the M pieces of key information.
- The key information includes at least one of the following wake-up words or wake-up actions: the wake-up word includes the sound made by the user when performing a set body movement due to emotional fluctuation and the set voice information uttered by the user; the wake-up action includes the set body movement and the set facial expression made by the user due to emotional fluctuation.
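- The unit decomposition above maps naturally onto a small software skeleton. The sketch below is only one assumed way to organize the playback, acquisition, determination, and editing units as methods; it is not an implementation prescribed by this application.

```python
class FirstElectronicDevice:
    """Skeleton mirroring units 1101-1104 (playback, acquisition, determination, editing)."""

    def play_first_video(self, video_path: str) -> None:           # playback unit 1101
        raise NotImplementedError

    def acquire_user_signals(self):                                 # acquisition unit 1102
        """Fetch the viewer's voice information and/or second video from the collection device."""
        raise NotImplementedError

    def determine_first_clips(self, signals):                       # determination unit 1103
        """Identify key information and map it to first video clips (and overlapping second clips)."""
        raise NotImplementedError

    def edit_clips(self, first_clips, second_clips=None) -> str:    # editing unit 1104
        """Splice the selected clips into the edited (highlight) video and return its path."""
        raise NotImplementedError
```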
- When the first electronic device is configured to implement the method executed by the first electronic device in Embodiment 3 above, each module unit in the first electronic device performs the following actions:
- the playing unit 1101 is used to play the first video.
- the obtaining unit 1102 is configured to obtain, from the collecting device, the voice information of the user watching the first video and/or the second video of the user during the process of playing the first video by the first electronic device.
- The determining unit 1103 is configured to divide the voice information and/or the second video according to collection time units, identify the key information in the voice information and/or the second video corresponding to the M collection time units, and determine the user emotion scores corresponding to the M collection time units respectively; and to determine, according to the user emotion scores corresponding to the M collection time units, the degrees of brilliance respectively corresponding to the L first video segments of the first video.
- The editing unit 1104 is configured to edit the N first video clips whose degree of brilliance is greater than the set threshold among the L first video clips of the first video, and generate an edited video, where M, L and N are positive integers.
- In a possible embodiment, the determining unit 1103 is configured to identify, according to a preset neural network model, the voice information and/or the key information in the second video corresponding to the M collection time units, and to determine, according to the recognition results, the user emotion scores corresponding to the M collection time units respectively.
- In a possible embodiment, the determining unit 1103 is configured to determine the M second video segments of the second video corresponding to the N first video segments, where the playback periods of the N first video segments overlap with the collection periods of the M second video segments; the N first video clips and the M second video clips are edited to generate an edited video, where M and N are positive integers.
- The key information includes at least one of the following wake-up words or wake-up actions: the wake-up word includes the sound made by the user when performing a set body movement due to emotional fluctuation and the set voice information uttered by the user; the wake-up action includes the set body movement and the set facial expression made by the user due to emotional fluctuation.
- an embodiment of the present invention further provides a second electronic device, which is specifically used to implement the method executed by the second electronic device in the foregoing Embodiment 2 and Embodiment 4.
- The structure of the second electronic device is shown in FIG. 12, including an acquisition unit 1201, a determination unit 1202, and an editing unit 1203, wherein:
- When the second electronic device is configured to implement the method executed by the second electronic device in Embodiment 2 above, each module unit in the second electronic device performs the following actions:
- the obtaining unit 1201 is configured to obtain, from a collection device, voice information of a user watching the first video and/or a second video of the user during the process of playing the first video by the first electronic device.
- the determining unit 1202 is configured to identify M pieces of key information related to user emotions in the voice information and/or the second video.
- the obtaining unit 1201 is further configured to obtain, from the first electronic device, N first video segments of the first video corresponding to the collection time units of the M pieces of key information.
- the editing unit 1203 is configured to edit the N first video segments to generate an edited video, where M and N are positive integers.
- In a possible embodiment, the determining unit 1202 is further configured to determine the M second video segments of the second video corresponding to the N first video segments, where the playback periods of the N first video segments overlap with the collection periods of the M second video segments; the editing unit 1203 is further configured to edit the N first video clips and the M second video clips to generate an edited video.
- In a possible embodiment, the determining unit 1202 is further configured to: divide the first video into L first video segments; when the key information is identified, mark (dot) the first video segment of the first video corresponding to the collection time unit of the key information; and obtain the marking information of the first video segments from the first video, and determine, from the L first video segments according to the marking information, the N first video segments corresponding to the collection time units of the M pieces of key information.
- When the second electronic device is configured to implement the method executed by the second electronic device in Embodiment 4 above, each module unit in the second electronic device performs the following actions:
- the obtaining unit 1201 is configured to obtain, from a collection device, voice information of a user watching the first video and/or a second video of the user during the process of playing the first video by the first electronic device.
- the determining unit 1202 is configured to divide the voice information and/or the second video according to collection time units, and identify key information in the voice information and/or the second video corresponding to the M collection time units , and determine the user emotion scores corresponding to the M collection time units respectively.
- the obtaining unit 1201 is further configured to obtain, from the first electronic device, L first video segments of the first video corresponding to the M collection time units.
- the determining unit 1202 is further configured to determine, according to the user emotion scores corresponding to the M collection time units, the respective degrees of brilliance corresponding to the L first video segments of the first video.
- The editing unit 1203 is configured to edit the N first video clips whose degree of brilliance is greater than the set threshold among the L first video clips of the first video, and generate an edited video, where M, L and N are positive integers.
- In a possible embodiment, the determining unit 1202 is configured to identify, according to a preset neural network model, the voice information and/or the key information in the second video corresponding to the M collection time units, and to determine, according to the recognition results, the user emotion scores corresponding to the M collection time units respectively.
- In a possible embodiment, the determining unit 1202 is further configured to determine the M second video segments of the second video corresponding to the N first video segments, where the playback periods of the N first video segments overlap with the collection periods of the M second video segments; the editing unit 1203 is further configured to edit the N first video clips and the M second video clips to generate an edited video, where M and N are positive integers.
- The key information includes at least one of the following wake-up words or wake-up actions: the wake-up word includes the sound made by the user when performing a set body movement due to emotional fluctuation and the set voice information uttered by the user; the wake-up action includes the set body movement and the set facial expression made by the user due to emotional fluctuation.
- This embodiment also provides a computer storage medium, where computer instructions are stored in the computer storage medium; when the computer instructions are run on the electronic device, the electronic device is caused to perform one or more of the steps performed in the foregoing embodiments, so as to implement the methods in the foregoing embodiments.
- This embodiment also provides a program product, which when the program product runs on a computer, causes the computer to execute one or more steps in the foregoing embodiments, so as to implement the methods in the foregoing embodiments.
- The embodiments of the present application also provide an apparatus, which may specifically be a chip system, a component, or a module; the apparatus may include a processor and a memory that are connected, where the memory is used for storing computer-executable instructions. When the apparatus runs, the processor can execute the computer-executable instructions stored in the memory, so that the chip performs one or more of the steps in the foregoing embodiments, to implement the methods in the foregoing embodiments.
- Each functional unit in each of the embodiments of the embodiments of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
- the above-mentioned integrated units may be implemented in the form of hardware, or may be implemented in the form of software functional units.
- the integrated unit if implemented in the form of a software functional unit and sold or used as an independent product, may be stored in a computer-readable storage medium.
- a computer-readable storage medium includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) or a processor to execute all or part of the steps of the methods described in the various embodiments of the present application.
- the aforementioned storage medium includes: flash memory, removable hard disk, read-only memory, random access memory, magnetic disk or optical disk and other media that can store program codes.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Databases & Information Systems (AREA)
- Television Signal Processing For Recording (AREA)
Abstract
This application provides a video editing method and device. The method can be applied to a first electronic device having a video playback function, or to a second electronic device having a video processing function. For example, when the method is executed by the first electronic device, the method includes: during the process in which the first electronic device plays a first video, acquiring, from a voice collection device or an image collection device, voice information of a user watching the video or a second video of the user; recognizing a wake-up word in the voice information, or recognizing a wake-up action in the second video, and determining, in the first video, N first video clips corresponding to the collection time units of M wake-up words or wake-up actions; and editing the N first video clips to generate an edited video. Based on the wake-up words or wake-up actions unconsciously produced by the user watching the video due to emotional fluctuations, the method cuts highlight video clips out of the video.
Description
相关申请的交叉引用
本申请要求在2020年09月02日提交中国专利局、申请号为202010909167.5、申请名称为“一种视频编辑方法及设备”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
本申请涉及终端技术领域,尤其涉及一种视频编辑方法及设备。
近年来,随着电子产业和通信技术的飞速发展,目前智能电子设备越来越多,例如手机、智能音箱、智能手环等,人们的生活变得越来越智能化。由于手机的便携性,且可以从应用商店上下载各种功能的应用软件,所以手机已经成为人们日常生活中必不可少的必备品。
基于互联网的发展,用户在智能电子设备上观看视频越来越方便,当用户看到比较精彩的视频内容,例如直播视频中精彩内容,用户通常希望把精彩的视频片段分享给朋友或分享至社交网络。通常视频平台会在视频中较为精彩在位置上进行标注,例如,在视频的进度条上进行打点,形成多个打点位置,用户在触动或点击某一个打点位置时,会在该打点位置处显示该打点位置处视频内容的文字信息,这样可以有利于用户在较短的时间内切换到想要观看的位置,也可以便于用户识别视频中较为精彩的部分,从而剪辑出精彩的视频片段分享给朋友或分享至社交网络。
然而,针对实时直播视频来说,由于内容的不可预知性,导致了用户在观看时无法预料到精彩的片段可能会出现的时机,因此很难从正在观看的实时直播视频中剪辑出精彩的视频片段。
发明内容
本申请提供一种视频编辑方法及设备,用于实现基于观看视频的用户因情绪波动发出的唤醒词或唤醒动作,从实时直播视频中剪辑出精彩的视频片段。
第一方面,本申请实施例提供了一种视频编辑方法,该方法可以由第一电子设备执行,该方法包括:首先,在第一电子设备播放第一视频的过程中,第一电子设备从采集装置获取观看第一视频的用户的语音信息和/或用户的第二视频,第一电子设备识别语音信息和/或第二视频中与用户情绪相关的M个关键信息,在第一视频中确定与M个关键信息的采集时间单元相对应的N个第一视频片段,对N个第一视频片段进行编辑,生成编辑后的视频。其中,M和N为正整数,采集装置可以语音采集装置,也可以是图像采集装置,采集装置可以集成在第一电子设备中,也可以是与第一电子设备连接的设备。
其中,关键信息包括如下唤醒词或唤醒动作中的至少一个:唤醒词包括所述用户因情绪波动做出设定肢体动作所发出的声音、发出的设定语音信息;唤醒动作包括所述用户因 情绪波动做出的设定肢体动作、设定面部表情。关键信息还可以是声音的分贝大小等自然界中存在的信息,本申请实施例对此并不作限定。
本申请实施例中,电子设备基于观看视频的用户的无意识发出的语音或做出的动作,触发视频剪辑,该方法并不需要用户的主动触发视频剪辑,就可以生成精彩视频片段,可有效地改善用户体验。
在一种可能的设计中,该方法还包括:第一电子设备确定N个第一视频片段相对应的第二视频的M个第二视频片段;其中,N个第一视频片段的播放时段与M个第二视频片段的采集时段相重叠;然后第一电子设备对N个第一视频片段和M个第二视频片段进行编辑,生成编辑后的视频。
本申请实施例中,通过多窗口播放精彩视频片段以及与用户相关的视频信息,有助于增加视频的趣味性,增加用户与电子设备的互动效果。
在一种可能的设计中,该方法还包括:将所述第一视频分割成L个第一视频片段。然后当识别到所述关键信息时,第一电子设备在L个第一视频片段中与关键信息的采集时间单元相对应的第一视频片段上打点;然后第一电子设备从第一视频中获取第一视频片段的打点信息,根据打点信息,从L个第一视频片段中确定与M个关键信息的采集时间单元相对应的N个第一视频片段。
本申请实施例中,按照上述方法可以从第一视频中获取精彩视频片段,还有助于后续直观的显示打点位置处的视频内容,可以有效改善用户体验。
第二方面,本申请实施例提供了一种视频编辑方法,该方法可以由第一电子设备执行,该方法包括:在第一电子设备播放第一视频的过程中,第一电子设备从采集装置获取观看所述第一视频的用户的语音信息和/或所述用户的第二视频;然后第一电子设备将语音信息和/或第二视频按照采集时间单元进行划分,得到M个采集时间单元。第一电子设备根据M个采集时间单元对应的所述语音信息和/或第二视频中的关键信息,确定所述M个采集时间单元分别对应的用户情绪评分;第一电子设备根据用户情绪评分,在第一视频中确定与M个采集时间单元对应的L个第一视频片段的精彩度;第一电子设备对L个第一视频片段中的精彩度大于设定阈值的N个第一视频片段进行编辑,生成编辑后的视频,其中,M、L和N为正整数。
本申请实施例中,电子设备基于观看视频的用户的无意识发出的语音或做出的动作,实现对用户的观影情绪的评分,从而评估视频片段的精彩度,完成视频剪辑,该方法并不需要用户的主动触发视频剪辑,就可以生成精彩视频片段,可有效地改善用户体验。
在一种可能的设计中,确定M个采集时间单元分别对应的用户情绪评分的具体方法包括:第一电子设备按照预设的神经网络模型,识别M个采集时间单元对应的语音信息和/或第二视频中的关键信息;根据识别结果,确定所述M个采集时间单元分别对应的用户情绪评分。
本申请实施例中,按照上述方法评估用户的情绪,有助于准确地获取精彩视频片段。
在一种可能的设计中,该方法还包括:确定所述N个第一视频片段相对应的所述第二视频的M个第二视频片段;其中,N个第一视频片段的播放时段与所述M个第二视频片段的采集时段相重叠。第一电子设备对N个第一视频片段和M个第二视频片段进行编辑,生成编辑后的视频,其中,M和N为正整数。
本申请实施例中,以用户情绪评分反映视频片段本身的精彩度,能够较为客观的反映 所述视频片段的精彩程度。
第三方面,本申请实施例提供了一种视频编辑方法,该方法可以由第二电子设备执行,该方法包括:在第一电子设备播放第一视频的过程中,从采集装置获取观看第一视频的用户的语音信息和/或用户的第二视频,识别语音信息和/或第二视频中的与用户情绪相关的M个关键信息,从第一电子设备的第一视频中获取与M个关键信息的采集时间单元相对应的N个第一视频片段,对N个第一视频片段进行编辑,生成编辑后的视频,其中,M和N为正整数。
本申请实施例中,第二电子设备可以基于观看视频的用户无意识发出的语音或做出的动作,触发剪辑第一电子设备所播放的视频,相比实施例1,该方法并不需要播放视频的设备具有视频编辑功能,通过分布式系统中多个设备合作完成视频剪辑,生成精彩视频片段,有效地改善用户体验。
在一种可能的设计中,该方法还可以包括:第二电子设备确定所述N个第一视频片段相对应的所述第二视频的M个第二视频片段;其中,N个第一视频片段的播放时段与所述M个第二视频片段的采集时段相重叠;具体地,第二电子设备可以对所述N个第一视频片段和M个第二视频片段进行编辑,生成编辑后的视频。
在一种可能的设计中,第二电子设备可以将所述第一视频分割成L个第一视频片段,当识别到所述关键信息时,在所述L个第一视频片段中与所述关键信息的采集时间单元相对应的第一视频片段上打点;从所述第一视频中获取第一视频片段的打点信息,根据所述打点信息,从所述L个第一视频片段中确定与所述M个关键信息的采集时间单元相对应的N个第一视频片段。
在一种可能的设计中,关键信息可以包括如下唤醒词或唤醒动作中的至少一个:
所述唤醒词包括所述用户因情绪波动做出设定肢体动作所发出的声音、发出的设定语音信息;所述唤醒动作包括所述用户因情绪波动做出的设定肢体动作、设定面部表情。
第四方面,本申请实施例提供一种视频编辑方法,该方法可以由第二电子设备执行,该方法包括:在第一电子设备播放第一视频的过程中,从采集装置获取观看所述第一视频的用户的语音信息和/或所述用户的第二视频;第二电子设备将语音信息和/或第二视频按照采集时间单元进行划分,得到M个采集时间单元;第二电子设备根据M个采集时间单元对应的语音信息和/或第二视频中的关键信息,确定M个采集时间单元分别对应的用户情绪评分;第二电子设备从第一电子设备获取与M个采集时间单元对应的第一视频的L个第一视频片段;第二电子设备根据用户情绪评分,在第一视频中确定与M个采集时间单元对应的L个第一视频片段的精彩度;第二电子设备对L个第一视频片段中的精彩度大于设定阈值的N个第一视频片段进行编辑,生成编辑后的视频,其中,M、L和N为正整数。
本申请实施例中,电子设备基于观看视频的用户的无意识发出的语音或做出的动作,实现对用户的观影情绪的评分,从而评估视频片段的精彩度,完成视频剪辑,该方法并不需要用户的主动触发视频剪辑,就可以生成精彩视频片段,可有效地改善用户体验。
在一种可能的设计中,确定所述M个采集时间单元分别对应的用户情绪评分,包括:
第二电子设备按照预设的神经网络模型,识别所述M个采集时间单元对应的所述语音信息和/或第二视频中的关键信息;第二电子设备根据识别结果,确定M个采集时间单元分别对应的用户情绪评分。
在一种可能的设计中,方法还包括:第二电子设备确定所述N个第一视频片段相对应 的所述第二视频的M个第二视频片段;其中,N个第一视频片段的播放时段与所述M个第二视频片段的采集时段相重叠;具体地,可以对N个第一视频片段和M个第二视频片段进行编辑,生成编辑后的视频,其中,M和N为正整数。
在一种可能的设计中,关键信息可以包括如下唤醒词或唤醒动作中的至少一个:
所述唤醒词包括所述用户因情绪波动做出设定肢体动作所发出的声音、发出的设定语音信息;所述唤醒动作包括所述用户因情绪波动做出的设定肢体动作、设定面部表情。
第五方面,本申请实施例提供一种第一电子设备,包括处理器和存储器,其中,存储器用于存储一个或多个计算机程序;当存储器存储的一个或多个计算机程序被处理器执行时,使得该第一电子设备能够实现上述第一方面或第二方面的任意一种可能的设计的方法。
第六方面,本申请实施例提供一种第二电子设备,包括处理器和存储器,其中,存储器用于存储一个或多个计算机程序;当存储器存储的一个或多个计算机程序被处理器执行时,使得该第二电子设备能够实现上述第三方面或第四方面的任意一种可能的设计的方法。
第七方面,本申请实施例还提供一种装置,该装置包括执行上述第一方面或第二方面的任意一种可能的设计的方法的模块/单元。这些模块/单元可以通过硬件实现,也可以通过硬件执行相应的软件实现。
第八方面,本申请实施例还提供一种装置,该装置包括执行上述第三方面或第四方面的任意一种可能的设计的方法的模块/单元。这些模块/单元可以通过硬件实现,也可以通过硬件执行相应的软件实现。
第九方面,本申请实施例中还提供一种计算机可读存储介质,所述计算机可读存储介质包括计算机程序,当计算机程序在第一电子设备上运行时,使得所述第一电子设备执行上述第一方面或第二方面的任意一种可能的设计的方法。
第十方面,本申请实施例中还提供一种计算机可读存储介质,所述计算机可读存储介质包括计算机程序,当计算机程序在第二电子设备上运行时,使得所述第二电子设备执行上述第三方面或第四方面的任意一种可能的设计的方法。
第十一方面,本申请实施例还提供一种包含计算机程序产品,当所述计算机程序产品在第一电子设备上运行时,使得所述第一电子设备执行上述第一方面或第二方面的任意一种可能的设计的方法。
第十二方面,本申请实施例还提供一种包含计算机程序产品,当所述计算机程序产品在第二电子设备上运行时,使得所述第二电子设备执行上述第三方面或第四方面的任意一种可能的设计的方法。
第十三方面,本申请实施例还提供一种芯片,所述芯片与存储器耦合,用于执行所述存储器中存储的计算机程序,以执行上述任一方面的任意一种可能的设计的方法。
以上第三方面至第十三方面中任一方面中的各种设计可以达到的技术效果,请参照上述第一方面或第二方面中各个设计分别可以达到的技术效果描述,这里不再重复赘述。
图1为本申请实施例提供的一种应用场景示意图;
图2为本申请实施例提供的一种设备主体结构示意图;
图3为本申请实施例提供的一种电子设备的软件结构示意图;
图4为本申请实施例提供的一种视频剪辑方法流程示意图;
图5A为本申请实施例提供的另一种应用场景示意图;
图5B为本申请实施例提供的一种大小窗口示意图;
图5C为本申请实施例提供的一种视频剪辑流程示意图;
图6为本申请实施例提供的另一种视频剪辑方法流程示意图;
图7为本申请实施例提供的另一种应用场景示意图;
图8为本申请实施例提供的另一种视频剪辑方法流程示意图;
图9为本申请实施例提供的一种用户情绪评分方式示意图;
图10为本申请实施例提供的另一种视频剪辑方法流程示意图;
图11为本申请实施例提供的一种第一电子设备的结构示意图;
图12为本申请实施例提供的一种第二电子设备的结构示意图。
下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行描述。其中,在本申请实施例的描述中,以下,术语“第一”、“第二”仅用于描述目的,而不能理解为指示或暗示相对重要性或者隐含指明所指示的技术特征的数量。由此,限定有“第一”、“第二”的特征可以明示或者隐含地包括一个或者更多个该特征。在本申请实施例的描述中,除非另有说明,“多个”的含义是两个或两个以上。
如图1所示,为本申请实施例适用的一种系统架构示意图,图1中以一个家庭的局域网的网络为例进行展示,如图1所示,该家庭中具有连接到网络的能力的电子设备包括:智能摄像头、智能音箱、智能电视、手机a1和手机a2。一方面,图1示出的所有电子设备都属于具有连接到网络的能力的电子设备。有些电子设备可能已经与网络建立连接,而有些电子设备可能还未与网络建立过连接,即还未在网络注册过。图1中示出的几种电子设备仅仅是举例,实际应用可能还包括有其它的电子设备,本申请实施例中不做限制。
本申请实施例提供中提供一种装置,用于执行本申请实施例提供的方法,本申请实施例提供的装置可以图1所示的电子设备。例如,本申请实施例的装置可以是一个或多个电子设备,例如具有语音采集功能的设备(如智能音箱)和具有视频播放功能的设备(如手机或智能电视),或者具有图像采集功能的设备(如摄像头)和具有视频播放功能的设备(如手机或智能电视),或者可以是既具有语音和图像采集功能,又具有视频播放功能的设备(如手机或智能电视)。
其中,当具有语音采集功能的设备、具有图像采集功能的设备和具有视频播放功能的设备是不同的设备时,不同的设备之间可以通过有线或无线方式互相连接,具体连接方法包括不限于通用串行总线(Universal Serial Bus,USB)数据线连接、蓝牙、无线高保真(wireless fidelity,Wi-Fi)、Wi-Fi直连(Wi-Fi Direct)、近距离无线通讯技术(Near Field Communication,NFC)、第五代移动通信系统(The Fifth Generation,5G)、全球移动通讯(Global System of Mobile Communication,GSM)系统、码分多址(Code Division Multiple Access,CDMA)系统、宽带码分多址(Wideband Code Division Multiple Access,WCDMA)通用分组无线业务(General Packet Radio Service,GPRS)系统、长期演进(Long Term Evolution,LTE)系统、LTE频分双工(Frequency Division Duplex,FDD)系统、LTE时分双工(Time Division Duplex,TDD)、通用移动通信系统(Universal Mobile Telecommunication System,UMTS)、全球互联微波 接入(Worldwide Interoperability for Microwave Access,WiMAX)等。其中,Wi-Fi Direct,也可以被称为Wi-Fi点对点(Wi-Fi Peer-to-Peer),是一套软件协议,让wifi设备可以不必透过无线网络基地台(Access Point),以点对点的方式,直接与另一个wifi设备连线,进行高速数据传输。
应理解,图1所示的电子设备仅是一个范例,并且电子设备可以具有比图中所示出的更多的或者更少的部件,可以组合两个或更多的部件,或者可以具有不同的部件配置。图中所示出的各种部件可以在包括一个或多个信号处理和/或专用集成电路在内的硬件、软件、或硬件和软件的组合中实现。
本申请实施例中的电子设备可以是手机(mobile phone)、平板电脑(pad)、带无线收发功能的电脑、虚拟现实(virtual reality,VR)设备、增强现实(augmented reality,AR)设备、工业控制(industrial control)中的无线设备、无人驾驶(self driving)中的无线设备、远程医疗(remote medical)中的无线设备、智能电网(smart grid)中的无线设备、运输安全(transportation safety)中的无线设备、智慧城市(smart city)中的无线设备、智慧家庭(smart home)中的无线设备等等。参见图2,为本申请实施例提供的一种电子设备200的硬件结构示意图。
电子设备200可包括处理器210、外部存储器接口220、内部存储器221、通用串行总线(universal serial bus,USB)接口230、充电管理模块240、电源管理模块241,电池242、天线1、天线2、移动通信模块250、无线通信模块260、音频模块270、扬声器270A、受话器270B、麦克风270C、耳机接口270D、传感器模块280、按键290、马达291、指示器292、摄像头293、显示屏294、以及用户标识模块(subscriber identification module,SIM)卡接口295等。其中传感器模块280可以包括压力传感器280A、陀螺仪传感器280B、气压传感器280C、磁传感器280D、加速度传感器280E、距离传感器280F、接近光传感器280G、指纹传感器280H、温度传感器280J、触摸传感器280K、环境光传感器280L、骨传导传感器280M等。
可以理解的是,本申请实施例示意的结构并不构成对电子设备200的具体限定。在本申请另一些实施例中,电子设备200可以包括比图示更多或更少的部件,或者组合某些部件,或者拆分某些部件,或者不同的部件布置。图示的部件可以以硬件,软件或软件和硬件的组合实现。
处理器210可以包括一个或多个处理单元,例如:处理器210可以包括应用处理器(application processor,AP)、调制解调处理器、图形处理器(graphics processing unit,GPU)、图像信号处理器(image signal processor,ISP)、控制器、视频编解码器、数字信号处理器(digital signal processor,DSP)、基带处理器、和/或神经网络处理器(neural-network processing unit,NPU)等。其中,不同的处理单元可以是独立的器件,也可以集成在一个或多个处理器中。
电子设备200通过GPU,显示屏294,以及应用处理器等实现显示功能。GPU为图像处理的微处理器,连接显示屏294和应用处理器。GPU用于执行数学和几何计算,用于图形渲染。处理器210可包括一个或多个GPU,其执行程序指令以生成或改变显示信息。
电子设备200可以通过ISP、摄像头293、视频编解码器、GPU、显示屏294以及应用处理器等实现拍摄功能。
SIM卡接口295用于连接SIM卡。SIM卡可以通过插入SIM卡接口295,或从SIM 卡接口295拔出,实现和电子设备200的接触和分离。电子设备200可以支持1个或N个SIM卡接口,N为大于1的正整数。SIM卡接口295可以支持Nano SIM卡、Micro SIM卡、SIM卡等。同一个SIM卡接口295可以同时插入多张卡。所述多张卡的类型可以相同,也可以不同。SIM卡接口295也可以兼容不同类型的SIM卡。SIM卡接口295也可以兼容外部存储卡。电子设备200通过SIM卡和网络交互,实现通话以及数据通信等功能。在一些实施例中,电子设备200采用eSIM,即:嵌入式SIM卡。
电子设备200的无线通信功能可以通过天线1、天线2、移动通信模块250、无线通信模块260、调制解调处理器以及基带处理器等实现。天线1和天线2用于发射和接收电磁波信号。电子设备200中的每个天线可用于覆盖单个或多个通信频带。不同的天线还可以复用,以提高天线的利用率。例如:可以将天线1复用为无线局域网的分集天线。在另外一些实施例中,天线可以和调谐开关结合使用。
移动通信模块250可以提供应用在电子设备200上的包括2G/3G/4G/5G等无线通信的解决方案。移动通信模块250可以包括至少一个滤波器、开关、功率放大器、低噪声放大器(low noise amplifier,LNA)等。移动通信模块250可以由天线1接收电磁波,并对接收的电磁波进行滤波,放大等处理,传送至调制解调处理器进行解调。移动通信模块250还可以对经调制解调处理器调制后的信号放大,经天线1转为电磁波辐射出去。在一些实施例中,移动通信模块250的至少部分功能模块可以被设置于处理器210中。在一些实施例中,移动通信模块250的至少部分功能模块可以与处理器210的至少部分模块被设置在同一个器件中。
无线通信模块260可以提供应用在电子设备200上的包括无线局域网(wireless local area networks,WLAN)(如无线保真(wireless fidelity,Wi-Fi)网络)、蓝牙(bluetooth,BT)、全球导航卫星系统(global navigation satellite system,GNSS)、调频(frequency modulation,FM)、近距离无线通信技术(near field communication,NFC)、红外线(infrared radiation,IR)技术等无线通信的解决方案。无线通信模块260可以是集成至少一个通信处理模块的一个或多个器件。无线通信模块260经由天线2接收电磁波,将电磁波信号调频以及滤波处理,将处理后的信号发送到处理器210。无线通信模块260还可以从处理器210接收待发送的信号,对其进行调频,放大,经天线2转为电磁波辐射出去。
在一些实施例中,电子设备200的天线1和移动通信模块250耦合,天线2和无线通信模块260耦合,使得电子设备200可以通过无线通信技术与网络以及其他设备通信。所述无线通信技术可以包括全球移动通讯系统(global system for mobile communications,GSM)、通用分组无线服务(general packet radio service,GPRS)、码分多址接入(code division multiple access,CDMA)、宽带码分多址(wideband code division multiple access,WCDMA)、时分码分多址(time-division code division multiple access,TD-SCDMA)、长期演进(long term evolution,LTE)、BT、GNSS、WLAN、NFC、FM、和/或IR技术等。
电子设备200的结构也可参见图2电子设备200的结构,此处不再赘述。在本申请另一些实施例中,电子设备200可以包括比图示更多或更少的部件,或者组合某些部件,或者拆分某些部件,或者不同的部件布置。图示的部件可以以硬件,软件或软件和硬件的组合实现。
电子设备200的软件系统可以采用分层架构,事件驱动架构,微核架构,微服务架构,或云架构。本申请实施例以分层架构的Android系统为例,示例性说明电子设备200的软 件结构。
图3是本发明实施例的电子设备的软件结构框图,该软件架构的软件模块和/或代码可以存储在内部存储器221中,当处理器210运行该软件模块或代码时,执行本申请实施例所提供的跑步姿态检测方法。
分层架构将软件分成若干个层,每一层都有清晰的角色和分工。层与层之间通过软件接口通信。在一些实施例中,将Android系统分为四层,从上至下分别为应用程序层,应用程序框架层,安卓运行时(Android runtime)和系统库,以及内核层。
应用程序层可以包括一系列应用程序包。
如图3所示,应用程序包可以包括电话、相机、图库、日历、通话、地图、导航、WLAN、蓝牙、音乐、视频、短信息等应用程序。
应用程序框架层为应用程序层的应用程序提供应用编程接口(application programming interface,API)和编程框架。应用程序框架层包括一些预先定义的函数。
如图3所示,应用程序框架层可以包括窗口管理器,内容提供器,视图系统,电话管理器,资源管理器,通知管理器等。
窗口管理器用于管理窗口程序。窗口管理器可以获取显示屏大小,判断是否有状态栏,锁定屏幕,截取屏幕等。
内容提供器用来存放和获取数据,并使这些数据可以被应用程序访问。所述数据可以包括视频,图像,音频,拨打和接听的电话,浏览历史和书签,电话簿等。
视图系统包括可视控件,例如显示文字的控件,显示图片的控件等。视图系统可用于构建应用程序。显示界面可以由一个或多个视图组成的。例如,包括短信通知图标的显示界面,可以包括显示文字的视图以及显示图片的视图。
电话管理器用于提供电子设备的通信功能。例如通话状态的管理(包括接通,挂断等)。
资源管理器为应用程序提供各种资源,比如本地化字符串,图标,图片,布局文件,视频文件等等。
通知管理器使应用程序可以在状态栏中显示通知信息,可以用于传达告知类型的消息,可以短暂停留后自动消失,无需用户交互。比如通知管理器被用于告知下载完成,消息提醒等。通知管理器还可以是以图表或者滚动条文本形式出现在系统顶部状态栏的通知,例如后台运行的应用程序的通知,还可以是以对话窗口形式出现在屏幕上的通知。例如在状态栏提示文本信息,发出提示音,电子设备振动,指示灯闪烁等。
Android Runtime包括核心库和虚拟机。Android runtime负责安卓系统的调度和管理。
核心库包含两部分:一部分是java语言需要调用的功能函数,另一部分是安卓的核心库。
应用程序层和应用程序框架层运行在虚拟机中。虚拟机将应用程序层和应用程序框架层的java文件执行为二进制文件。虚拟机用于执行对象生命周期的管理,堆栈管理,线程管理,安全和异常的管理,以及垃圾回收等功能。
系统库可以包括多个功能模块。例如:表面管理器(surface manager),媒体库(Media Libraries),三维图形处理库(例如:OpenGL ES),2D图形引擎(例如:SGL)等。
表面管理器用于对显示子系统进行管理,并且为多个应用程序提供了2D和3D图层的融合。
媒体库支持多种常用的音频,视频格式回放和录制,以及静态图像文件等。媒体库可 以支持多种音视频编码格式,例如:MPEG4,H.264,MP3,AAC,AMR,JPG,PNG等。
三维图形处理库用于实现三维图形绘图,图像渲染,合成,和图层处理等。
2D图形引擎是2D绘图的绘图引擎。
内核层是硬件和软件之间的层。内核层至少包含显示驱动,摄像头驱动,音频驱动,传感器驱动。其中,硬件可以指的是各类传感器,例如本申请实施例中涉及的加速度传感器、陀螺仪传感器、触摸传感器、压力传感器等。
现有技术中,虽然在用户主动发出预设唤醒词“小艺小艺”或者“开始剪辑视频”时,会触发设备进行剪辑视频,但是,考虑到用户一旦沉浸观看视频程,很可能会忘记主动发出上述唤醒词相关的语音指令,导致用户错过视频剪辑时机,无法生成精彩视频片段。而当用户沉浸于观看视频时,更容易因为情绪被感染,用户无意识的发出“太美了”、“太壮观了”等语气词,或者用户有鼓掌、跺脚等行为。基于这一发现,因此本申请提供一种视频剪辑方法,该方法利用图像采集装置所采集到的观看视频的用户的视频,或利用语音采集装置所采集到的用户发出的与用户情绪相关的语音、肢体动作或面部表情,触发电子设备从用户观看的视频中截取出精彩视频片段。这样,用户不用主动发出固定的唤醒词,就可以完成视频剪辑,提升用户体验。
实施例1
参见图4,为本申请实施例提供的一种视频剪辑方法流程示意图。该方法可以由图1所示的电子设备实现。以下以第一电子设备执行该方法为例进行说明,如图4所示,该流程包括:
步骤401,第一电子设备在所述第一电子设备播放第一视频的过程中,从采集装置获取观看所述第一视频的用户的语音信息和/或所述用户的第二视频。
其中,采集装置可以包括语音采集装置和图像采集装置等。语音采集装置可以是第一电子设备中音频模块,例如受话器、麦克风等。语音采集装置也可以是与第一电子设备连接的外设设备,如第一电子设备外接的麦克风,或者与第一电子设备无线连接的智能音箱等设备。也就是说,在用户观看第一视频的过程中,语音采集装置会实时地采集用户的语音信息。这样的话,语音采集装置能够采集到用户发出“太棒了”、“太精彩了”等语音信息,或者用户发出鼓掌的声音。另外,语音采集装置还能够采集到第一视频播放过程中,第一视频的语音信息。以足球比赛的直播视频播放过程为例,在一个足球比较中较为精彩的部分通常是进球场面,这时比赛视频中通常播放观众欢呼鼓掌的声音,若第一电子设备采用扬声器播放视频,则语音采集装置可以采集到观众欢呼鼓掌的声音。
示例性地,如图5A所示,用户在观看智能电视播放的足球比赛的直播视频过程中,可能发出“太棒了!”等感叹语句,这时,智能电视的音频模块(如麦克风)或智能音箱可以采集到用户在视频播放时段所发出的语音信息。
步骤402,第一电子设备识别语音信息和/或第二视频中与用户情绪相关的M个关键信息,从第一视频中确定与M个关键信息的采集时间单元相对应的N个第一视频片段。
其中,关键信息可以包括关键词和关键动作中的至少一个。
一种可能的实施例中,第一电子设备从语音采集装置获取到语音信息之后,第一电子设备基于预设的语音识别模型,对语音信息进行识别,例如通过声纹识别,从中识别出用户的语音信息,并将识别出来的用户的语音信息与预设语音模板进行匹配,从而确定出该 语音信息中是否存在与用户相关的唤醒词,以及用户发出该唤醒词对应的语音的采集时间单元,继而,第一电子设备确定该采集时间单元对应的第一视频的第一视频片段。其中,唤醒词包括所述用户因情绪波动做出设定肢体动作所发出的声音(如鼓掌的声音)、用发出的设定语音信息(如各种感叹词)。预设语音模板可以是预先训练生成的与用户情绪相关的语音信息,利用鼓掌的声音,庆祝的声音、各种感叹词等。需要说明的是,关键信息还可以是声音的分贝大小等自然界中存在的信息,本申请实施例对此并不作限定。
在一种可能的实施例中,第一电子设备从图像采集装置获取到第二视频之后,第一电子设备基于预设的图像识别模型,对图像信息进行识别,从中识别出用户的肢体动作或表情等,并将识别出来的用户的肢体动作和表情与预先存储的设定肢体动作模板或设定面部表情模板进行匹配,从而确定出该第二视频中是否存在与用户相关的唤醒动作,以及用户做出该唤醒动作对应的采集时间单元,继而,第一电子设备确定该采集时间单元对应的第一视频的第一视频片段。需要说明的是,本申请实施例还可以将上述两种可能的实施例进行结合,从而确定出N个第一视频片段。
其中,第一电子设备确定采集时间单元对应的第一视频的第一视频片段的一种可能的方式是:第一电子设备可以预先将第一视频分割成L个第一视频片段,可以在关键信息的采集时间单元对应的第一视频的第一视频片段上打点;然后第一电子设备可以第一视频播放结束后,或者在第一视频的部分视频播放结束后,从第一视频中获取第一视频片段的打点信息,根据打点信息,从L个第一视频片段中确定与M个唤醒词的采集时间单元对应的所述第一视频的N个第一视频片段。也就是说,第一电子设备以固定时长(例如10秒)分割第一视频,这样,第一视频就被分割成多个第一视频片段,因此,第一电子设备就可以唤醒词的采集时间单元对应的第一视频的第一视频片段上打点。
示例性地,用户在足球比赛的直播视频播放过程中,当观看到进球画面时,发出“太棒了!”这一感叹语句,智能电视对获取的语音信息识别后,确定在北京时间9点45分10秒至北京时间9点45分11秒这一时间段内存在用户的该唤醒词,从而智能电视对北京时间9点45分11秒之前的10秒内智能电视所播放的足球比赛的直播视频片段打点,之后智能电视根据打点信息确定出被打点的第一视频片段。因为在进球之前,也就是在北京时间9点45分11秒之前的10秒内,很可能中锋逐个突围对方阻扰后,最后一脚射门的精彩时段。所以按照上述方法可以剪辑出足球比赛的直播视频的精彩视频片段。
需要说明的是,本申请实施例中并不限定所述打点信息中包括的打点位置的数量,可以是一个也可以是多个。另外,关键信息的采集时间单元与第一视频片段的时间单元之间的对应关系可能存在多种情况:第一种可能的情况,关键信息的采集时间单元与视频片段的时间单元相同,例如,在北京时间9点45分10秒至北京时间9点45分11秒检测的唤醒词“太棒了”,智能电视可以剪辑北京时间9点45分10秒至北京时间9点45分11秒这一秒内的视频片段;第二种可能的情况,视频片段的时间单元包含关键信息的采集单元,也就是说,在北京时间9点45分10秒至北京时间9点45分11秒检测的唤醒词“太棒了”,智能电视可以剪辑出北京时间9点45分11秒之前的10秒内的视频片段。或者,在北京时间11点30分10秒至北京时间11点30分11秒检测的唤醒词“开始了”,智能电视可以剪辑出北京时间11点30分11秒之后的10秒内的视频片段。本申请实施例对此并不作具体限定,可以根据实际经验确定关键信息的采集时间单元与视频片段的时间单元之间的关系。
步骤403,第一电子设备对N个第一视频片段进行编辑,生成编辑后的视频,其中, M和N为正整数。
可选地,在第一电子设备生成剪辑后的视频之后,用户可以将第一电子设备上的该视频分享至其他用户的电子设备,也可以分享至社交网络,如朋友圈等。
具体地,一种可能的方式是:第一电子设备可以将全部或者部分关键信息的采集时间单元对应的第一视频的第一视频片段进行拼接组合,合成精彩的视频片段。
在另一种可能的方式是:第一电子设备还可以确定N个第一视频片段相对应的所述第二视频的M个第二视频片段;其中,N个第一视频片段的播放时段与M个第二视频片段的采集时段相重叠,然后第一电子设备对N个第一视频片段和M个第二视频片段进行编辑,生成编辑后的视频,其中,M和N为正整数。其中,图像采集装置可以是第一电子设备中摄像头。图像采集装置也可以是与第一电子设备连接的外设设备,如第一电子设备外接的摄像头,或者与第一电子设备无线连接的智能摄像头等设备。也就是说,在用户观看第一视频的过程中,图像采集装置会实时地采集用户的图像信息。这样的话,图像采集装置能够采集到用户鼓掌的动作等图像信息,从而生成第二视频。
示例性地,用户在足球比赛的直播视频播放过程中,智能电视对获取的语音信息识别后,确定在北京时间9点45分10秒至北京时间9点45分11秒这一时间段内存在用户的唤醒词,从而智能电视不仅确定在北京时间9点45分10秒至北京时间9点45分11秒这一时段内,或者北京时间9点45分10秒之后的10秒内,智能电视所播放的足球比赛的直播视频片段,而且智能电视还确定在北京时间9点45分10秒至北京时间9点45分11秒这一时段,第二视频中的第二视频片段,最终,智能电视可以将第一视频中的第一视频片段和第二视频中的第二视频片段进行拼接组合,合成可以多窗口播放的精彩视频片段,示例性地,最终合成的可以多窗口播放的精彩视频片段可以如图5B所示。该方法通过多窗口播放精彩视频片段,有助于增加视频的趣味性。
结合图5C来说,用户发出设定语音信息或作出设定肢体动作时,会触发第一电子设备执行如下步骤:步骤501,第一电子设备识别到采集装置所采集到的语音或图像信息中的关键信息;步骤502,第一电子设备一方面从摄像头所采集的图像信息中获取摄像头缓存数据(即上文中的第二视频中的10s长度的第二视频片段),另一方面,第一电子设备获取缓存直播数据(即上文中的第一视频中的10s长度的第一视频片段);步骤503,第一电子设备生成精彩的视频片段文件,或者生成多个图片;步骤504,第一电子设备获取关联设备信息,例如用户的朋友的设备信息;步骤505,第一电子设备向关联设备分享链接。
本申请实施例中,电子设备基于观看视频的用户的无意识发出的语音或做出的动作,触发视频剪辑,该方法并不需要用户的主动触发视频剪辑,就可以生成精彩视频片段,有效地改善了用户体验。
实施例2
参见图6,为本申请实施例提供的一种视频剪辑方法流程示意图。该方法可以由图1所示的至少两个电子设备共同实现。以下以第一电子设备和第二电子设备执行该方法为例进行说明,如图6所示,该流程包括:
步骤601,第二电子设备在第一电子设备播放第一视频的过程中,从采集装置获取观看第一视频的用户的语音信息。
其中,该采集装置可以包括语音采集装置和图像采集装置等。其中,语音采集装置可 以是第二电子设备中音频模块,也可以是通过有线或无线连接的外部设备,图像采集装置也可以是与第二电子设备连接的外设设备,如第二电子设备外接的摄像头,或者与第二电子设备无线连接的智能摄像头等设备,具体参见上述实施例1的描述。
示例性地,如图7所示,用户在观看智能电视播放的足球比赛的直播视频过程中,可能发出“太棒了!”等感叹语句,这时,智能电视的音频模块(如麦克风)或智能音箱可以采集到用户在视频播放时段所发出的语音信息,用户的手机可以从语音采集装置获取该语音信息。
步骤602,第二电子设备识别所述语音信息和/或所述第二视频中与用户情绪相关的M个关键信息。
具体识别M个关键信息的方式可以参见上述步骤402,在此不再重复赘述。
示例性地,用户在足球比赛的直播视频播放过程中,发出“太棒了!”这一感叹语句,智能电视对获取的语音信息识别后,确定在北京时间9点45分10秒至北京时间9点45分11秒这一时间段内存在用户的唤醒词。
步骤603,第二电子设备从第一电子设备的第一视频中获取与M个关键信息的采集时间单元相对应的N个第一视频片段。
具体地,第二电子设备确定采集时间单元对应的第一视频的第一视频片段的一种可能的方式是:第二电子设备可以预先将第一视频分割成L个第一视频片段,可以在关键信息的采集时间单元对应的第一视频的第一视频片段上打点;然后第二电子设备可以第一视频播放结束后,或者在第一视频的部分视频播放结束后,从第一视频中获取第一视频片段的打点信息,根据打点信息,从L个第一视频片段中确定与M个唤醒词的采集时间单元对应的所述第一视频的N个第一视频片段。也就是说,第二电子设备以固定时长(例如10秒)分割第一视频,这样,第一视频就被分割成多个第一视频片段,因此,第二电子设备就可以唤醒词的采集时间单元对应的第一视频的第一视频片段上打点。
需要说明的是,本申请实施例中并不限定所述打点信息中包括的打点位置的数量,可以是一个也可以是多个。
示例性地,如图7所示,用户在足球比赛的直播视频播放过程中,发出“太棒了!”这一感叹语句,手机对获取的语音信息识别后,确定在北京时间9点45分10秒至北京时间9点45分11秒这一时间段内存在用户的唤醒词,从而手机从智能电视获取足球比赛的直播视频,并且手机在北京时间9点45分10秒至北京时间9点45分11秒这一时段内,或者北京时间9点45分10秒之后的10秒内,智能电视所播放的足球比赛的直播视频片段上打点,之后手机根据打点信息确定出被打点的第一视频片段。
步骤604,第二电子设备对N个第一视频片段进行编辑,生成编辑后的视频,其中,M和N为正整数。
可选地,在第二电子设备生成剪辑后的视频之后,用户可以将第二电子设备上的该视频分享至其他用户的电子设备,也可以分享至社交网络,如朋友圈等。
具体地,一种可能的方式是:第二电子设备可以将全部或者部分关键信息的采集时间单元对应的第一视频的第一视频片段进行拼接组合,合成精彩的视频片段。
在另一种可能的方式是:第二电子设备还可以确定N个第一视频片段相对应的所述第二视频的M个第二视频片段;其中,N个第一视频片段的播放时段与所述M个第二视频片段的采集时段相重叠;第二电子设备对N个第一视频片段和M个第二视频片段进行编 辑,生成编辑后的视频,其中,M和N为正整数。
示例性地,用户在足球比赛的直播视频播放过程中,手机对获取的语音信息识别后,确定在北京时间9点45分10秒至北京时间9点45分11秒这一时间段内存在用户的唤醒词,从而手机不仅确定在北京时间9点45分10秒至北京时间9点45分11秒这一时段内,智能电视所播放的足球比赛的直播视频片段,而且手机还从智能摄像头采集的第二视频中确定在北京时间9点45分10秒至北京时间9点45分11秒这一时段内,或者北京时间9点45分10秒之后的10秒内,第二视频中的第二视频片段,最终,手机可以将第一视频中的第一视频片段和第二视频中的第二视频片段进行拼接组合,合成可以多窗口播放的精彩视频片段,示例性地,最终合成的可以多窗口播放的精彩视频片段可以如图5B所示。
本申请实施例中,第二电子设备可以基于观看视频的用户的无意识发出的语音或做出的动作,触发剪辑第一电子设备所播放的视频,相比实施例1,该方法并不需要播放视频的设备具有视频编辑功能,通过分布式系统中多个设备合作完成视频剪辑,生成精彩视频片段,有效地改善用户体验。
实施例3
参见图8,为本申请实施例提供的另一种视频剪辑方法流程示意图。该方法可以由图1所示的电子设备实现。以下以第一电子设备执行该方法为例进行说明,如图8所示,该流程包括:
步骤801,在第一电子设备播放第一视频的过程中,第一电子设备从采集装置获取观看所述第一视频的用户的语音信息和/或所述用户的第二视频。
其中,该采集装置可以包括语音采集装置和图像采集装置等。其中,语音采集装置可以是第二电子设备中音频模块,也可以是通过有线或无线连接的外部设备,图像采集装置也可以是与第二电子设备连接的外设设备,如第二电子设备外接的摄像头,或者与第二电子设备无线连接的智能摄像头等设备,具体参见上述实施例1的描述。
步骤802,第一电子设备将语音信息和/或第二视频按照采集时间单元进行划分,识别M个采集时间单元对应的语音信息和/或第二视频中的关键信息,确定M个采集时间单元分别对应的用户情绪评分。
具体地,第一电子设备确定M个采集时间单元分别对应的用户情绪评分的方法可以采用如下任意一下方式:
方式一,第一电子设备对语音信息中的唤醒词进行识别,根据识别结果,确定M个采集时间单元分别对应的用户情绪评分。
也就是说,第一电子设备从语音采集装置获取到语音信息之后,第一电子设备基于预设的语音识别模型,对语音信息进行识别,例如声纹识别,从中识别出用户的语音信息,第一电子设备在基于预设的神经网络模型,确定每个采集时间单元所对应的用户情绪评分。
示例性地,若第一电子设备识别出第一采集时间单元(如北京时间9点45分10秒至北京时间9点45分20秒)内包括用户发出的语音信息“太棒了!”,则该第一采集时间单元的用户情绪评分为9分;若第一电子设备识别出第二采集时间单元(北京时间9点45分20秒至北京时间9点45分30秒)内不包括用户发出的语音信息,则该第二采集时间单元的用户情绪评分为0分。再比如,若第一电子设备识别出第三采集时间单元(如北京时间10点45分10秒至北京时间10点45分20秒)内,智能电视的扬声器发出欢呼鼓掌 的声音,则该第三采集时间单元的用户情绪评分为9分;若第一电子设备识别出第四采集时间单元(北京时间10点45分20秒至北京时间10点45分30秒)内不包括任何语音信息,则该第四采集时间单元的用户情绪评分为0分。
方式二,第一电子设备对第二视频中关键动作进行识别;根据识别结果,确定M个采集时间单元分别对应的用户情绪评分。
也就是说,如图9,第一电子设备从图像采集装置获取到第二视频之后,第一电子设备基于预设的图像识别模型,对第二视频进行识别,从中识别出用户的表情、动作或语言中至少一个,第一电子设备在基于预设的神经网络模型,确定每个采集时间单元所对应的用户情绪评分。示例性地,若第一电子设备识别出第一采集时间单元(如北京时间9点45分10秒至北京时间9点45分20秒)内包括用户大笑的表情,则该第一采集时间单元的用户情绪评分为9分;若第一电子设备识别出第二采集时间单元(北京时间9点45分20秒至北京时间9点45分30秒)内用户表情平淡,则该第二采集时间单元的用户情绪评分为0分。
方式三,第一电子设备对语音信息中的唤醒词进行识别和对所述第二视频中关键动作中至少一个信息进行识别,根据识别结果,确定M个采集时间单元分别对应的用户情绪评分。
也就是说,第一电子设备结合上述实施例1和实施例2中的方式对语音信息和第二视频进行识别,综合识别结果,确定每个采集时间单元分别对应的用户情绪评分。
步骤803,第一电子设备根据与M个采集时间单元对应的用户情绪评分,从第一视频中确定L个第一视频片段分别对应的精彩度。
具体地,第一电子设备可以通过预设的函数,将M个采集时间单元对应的用户情绪评分转换为精彩度,本申请实施例并不限定函数的表征方式,凡是可以用户情绪评分转换为精彩度的函数均适用于本申请实施例。
示例性地,用户在足球比赛的直播视频播放过程中,发出“太棒了!”这一感叹语句,智能电视对获取的语音信息识别后,确定在北京时间9点45分10秒至北京时间9点45分11秒这一时间段内用户情绪评分9分,从而智能电视确定在北京时间9点45分10秒至北京时间9点45分11秒这一时段内,或者北京时间9点45分10秒之后的10秒内,智能电视所播放的足球比赛的直播视频片段的精彩度为9分。
步骤804,第一电子设备对第一视频的L个第一视频片段中的精彩度大于设定阈值的N个第一视频片段进行编辑,生成编辑后的视频,其中,M、L和N为正整数。
可选地,在第一电子设备生成剪辑后的视频之后,用户可以将第一电子设备上的该视频分享至其他用户的电子设备,也可以分享至社交网络,如朋友圈等。
具体地,第一电子设备可以采用以下任意一种方式剪辑视频:
一种可能的方式是:第一电子设备可以将全部或者部分精彩度大于设定阈值的第一视频片段进行拼接组合,合成精彩的视频片段。
在另一种可能的方式是:第一电子设备还可以确定N个第一视频片段相对应的第二视频的M个第二视频片段;其中,N个第一视频片段的播放时段与M个第二视频片段的采集时段相重叠,然后第一电子设备对N个第一视频片段和M个第二视频片段进行编辑,生成编辑后的视频,其中,M和N为正整数。具体示例可以参见上述实施例1。
本申请实施例中,电子设备基于观看视频的用户的无意识发出的语音或做出的动作, 实现对用户的观影情绪的评分,从而评估视频片段的精彩度,完成视频剪辑,该方法并不需要用户的主动触发视频剪辑,就可以生成精彩视频片段,有效地改善了用户体验。
实施例4
参见图10,为本申请实施例提供的另一种视频剪辑方法流程示意图。该方法可以由图1所示的至少两个电子设备共同实现。以下以第一电子设备和第二电子设备执行该方法为例进行说明,如图10所示,该流程包括:
步骤1001,第二电子设备在第一电子设备播放第一视频的过程中,从采集装置获取观看第一视频的用户的语音信息和/或用户的第二视频。
其中,语音采集装置可以是第二电子设备中音频模块,也可以是通过有线或无线连接的外部设备,具体参见上述步骤401的描述。也就是说,在用户观看第一电子设备播放的第一视频的过程中,第二电子设备可以从语音采集装置获取用户的语音信息或第一视频的音频信息。
另外,图像采集装置可以是第二电子设备中摄像头。图像采集装置也可以是与第二电子设备连接的外设设备,如第二电子设备外接的摄像头,或者与第二电子设备无线连接的智能摄像头等设备。也就是说,在用户观看第一视频的过程中,图像采集装置可以实时地采集用户的图像信息。这样的话,图像采集装置能够采集到用户鼓掌的动作等图像信息,从而生成第二视频。
示例性地,如图7所示,用户在观看智能电视播放的足球比赛的直播视频过程中,可能发出“太棒了!”等感叹语句,这时,智能电视的音频模块(如麦克风)或智能音箱可以采集到用户在视频播放时段所发出的语音信息,用户的手机可以从语音采集装置获取该语音信息。
步骤1002,第二电子设备将语音信息和/或第二视频按照采集时间单元进行划分,识别M个采集时间单元对应的语音信息和/或第二视频中的关键信息,确定M个采集时间单元分别对应的用户情绪评分。
具体地,第二电子设备确定M个采集时间单元分别对应的用户情绪评分的方法可以采用如下任意一下方式:
方式一,第二电子设备对语音信息中的唤醒词进行识别,根据识别结果,确定M个采集时间单元分别对应的用户情绪评分。
方式二,第二电子设备对第二视频中用户的语言、表情和肢体动作中至少一个信息进行识别;根据识别结果,确定M个采集时间单元分别对应的用户情绪评分。
方式三,第二电子设备对语音信息中的唤醒词进行识别和对所述第二视频中用户的语言、表情和肢体动作中至少一个信息进行识别,根据识别结果,确定M个采集时间单元分别对应的用户情绪评分。
上述方式和具体示例可以参见上述步骤802。
步骤1003,第二电子设备从第一电子设备的第一视频中获取与M个采集时间单元对应的L个第一视频片段。
步骤1004,第二电子设备根据与M个采集时间单元对应的用户情绪评分,确定所述第一视频的L个第一视频片段分别对应的精彩度。
具体地,第二电子设备可以通过预设的函数,将M个采集时间单元对应的用户情绪评 分转换为精彩度,本申请实施例并不限定函数的表征方式,凡是可以用户情绪评分转换为精彩度的函数均适用于本申请实施例。
步骤1005,第二电子设备对第一视频的L个第一视频片段中的精彩度大于设定阈值的N个第一视频片段进行编辑,生成编辑后的视频,其中,M、L和N为正整数。
第一视频片段进行编辑,生成编辑后的视频,其中,M、L和N为正整数。
可选地,在第二电子设备生成剪辑后的视频之后,用户可以将第二电子设备上的该视频分享至其他用户的电子设备,也可以分享至社交网络,如朋友圈等。
具体地,第二电子设备可以采用上述步骤804提供的任意一种方式剪辑视频,在此不再重复描述。
本申请实施例中,电子设备基于观看视频的用户的无意识发出的语音或做出的动作,实现对用户的观影情绪的评分,从而评估视频片段的精彩度,完成视频剪辑,该方法并不需要用户的主动触发视频剪辑,就可以生成精彩视频片段,有效地改善了用户体验。相比实施例3,该方法并不需要播放视频的设备具有视频编辑功能,通过分布式系统中多个设备合作完成视频剪辑,生成精彩视频片段,有效地改善了用户体验。
基于与方法实施例1和实施例3的同一发明构思,本发明实施例提供一种第一电子设备,具体用于实现上述实施例1和实施例3中第一电子设备执行的方法,该第一电子设备的结构如图11所示,包括播放单元1101、获取单元1102、确定单元1103和剪辑单元1104.
当第一电子设备拥有实现上述实施例1中第一电子设备执行的方法时,第一电子设备中各个模块单元执行如下动作:
所述播放单元1101,用于播放第一视频。
所述获取单元1102,用于在所述第一电子设备播放第一视频的过程中,从采集装置获取观看所述第一视频的用户的语音信息和/或所述用户的第二视频。
所述确定单元1103,用于识别所述语音信息和/或所述第二视频中与用户情绪相关的M个关键信息,确定与所述M个关键信息的采集时间单元相对应的所述第一视频的N个第一视频片段。
所述剪辑单元1104,用于对所述N个第一视频片段进行编辑,生成编辑后的视频,其中,M和N为正整数。
在一种可能的实施例中,所述确定单元1103,还用于确定所述N个第一视频片段相对应的所述第二视频的M个第二视频片段;其中,N个第一视频片段的播放时段与所述M个第二视频片段的采集时段相重叠;
所述剪辑单元1104,还用于对所述N个第一视频片段和M个第二视频片段进行编辑,生成编辑后的视频,其中,M和N为正整数。
在一种可能的实施例中,所述确定单元1103,具体用于:将所述第一视频分割成L个第一视频片段;当识别到所述关键信息时,在与所述关键信息的采集时间单元相对应的所述第一视频的第一视频片段上打点;
从所述第一视频中获取第一视频片段的打点信息,根据所述打点信息,从所述L个第一视频片段中确定与所述M个关键信息的采集时间单元相对应的N个第一视频片段。
其中,关键信息包括如下唤醒词或唤醒动作中的至少一个:
所述唤醒词包括所述用户因情绪波动做出设定肢体动作所发出的声音、发出的设定语音信息;所述唤醒动作包括所述用户因情绪波动做出的设定肢体动作、设定面部表情。
当第一电子设备用于实现上述实施例3中第一电子设备执行的方法时,第一电子设备中各个模块单元执行如下动作:
所述播放单元1101,用于播放第一视频。
所述获取单元1102,用于在第一电子设备播放第一视频的过程中,从采集装置获取观看所述第一视频的用户的语音信息和/或所述用户的第二视频。
所述确定单元1103,用于将所述语音信息和/或第二视频按照采集时间单元进行划分,识别所述M个采集时间单元对应的所述语音信息和/或第二视频中的关键信息,确定M个采集时间单元分别对应的用户情绪评分;根据与M个采集时间单元对应的用户情绪评分,确定所述第一视频的L个第一视频片段分别对应的精彩度。
所述剪辑单元1104,用于对第一视频的L个第一视频片段中的精彩度大于设定阈值的N个第一视频片段进行编辑,生成编辑后的视频,其中,M、L和N为正整数。
在一种可能的实施例中,所述确定单元1103,用于按照预设的神经网络模型,识别所述M个采集时间单元对应的所述语音信息和/或第二视频中的关键信息;
根据识别结果,确定M个采集时间单元分别对应的用户情绪评分。
在一种可能的实施例中,所述确定单元1103,用于确定所述N个第一视频片段相对应的所述第二视频的M个第二视频片段;其中,N个第一视频片段的播放时段与所述M个第二视频片段的采集时段相重叠;对N个第一视频片段和M个第二视频片段进行编辑,生成编辑后的视频,其中,M和N为正整数。
其中,关键信息包括如下唤醒词或唤醒动作中的至少一个:
所述唤醒词包括所述用户因情绪波动做出设定肢体动作所发出的声音、发出的设定语音信息;所述唤醒动作包括所述用户因情绪波动做出的设定肢体动作、设定面部表情。
基于与方法实施例2和实施例4的同一发明构思,本发明实施例还提供一种第二电子设备,具体用于实现上述实施例2和实施例4中第二电子设备执行的方法,该第二电子设备的结构如图12所示,包括获取单元1201、确定单元1202和剪辑单元1203,其中:
当第二电子设备用于实现上述实施例2中第二电子设备执行的方法时,第二电子设备中各个模块单元执行如下动作:
所述获取单元1201,用于在第一电子设备播放第一视频的过程中,从采集装置获取观看所述第一视频的用户的语音信息和/或所述用户的第二视频。
所述确定单元1202,用于识别所述语音信息和/或所述第二视频中与用户情绪相关的M个关键信息。
所述获取单元1201,还用于从所述第一电子设备获取与所述M个关键信息的采集时间单元相对应的所述第一视频的N个第一视频片段。
所述剪辑单元1203,用于对所述N个第一视频片段进行编辑,生成编辑后的视频,其中,M和N为正整数。
在一种可能的实施例中,所述确定单元1202,还用于确定所述N个第一视频片段相对应的所述第二视频的M个第二视频片段;其中,N个第一视频片段的播放时段与所述M个第二视频片段的采集时段相重叠;所述剪辑单元1203,还用于对所述N个第一视频片段和M个第二视频片段进行编辑,生成编辑后的视频。
在一种可能的实施例中,所述确定单元1202,还用于将所述第一视频分割成L个第一视频片段;当识别到所述关键信息时,在与所述关键信息的采集时间单元相对应的所述第一视频的第一视频片段上打点;从所述第一视频中获取第一视频片段的打点信息,根据所述打点信息,从所述L个第一视频片段中确定与所述M个关键信息的采集时间单元相对应的N个第一视频片段。
当第二电子设备用于实现上述实施例4中第二电子设备执行的方法时,第二电子设备中各个模块单元执行如下动作:
所述获取单元1201,用于在第一电子设备播放第一视频的过程中,从采集装置获取观看所述第一视频的用户的语音信息和/或所述用户的第二视频。
所述确定单元1202,用于将所述语音信息和/或第二视频按照采集时间单元进行划分,识别所述M个采集时间单元对应的所述语音信息和/或第二视频中的关键信息,确定M个采集时间单元分别对应的用户情绪评分。
所述获取单元1201,还用于从所述第一电子设备获取与所述M个采集时间单元对应的所述第一视频的L个第一视频片段。
所述确定单元1202,还用于根据与M个采集时间单元对应的用户情绪评分,确定所述第一视频的L个第一视频片段分别对应的精彩度。
所述剪辑单元1203,用于对所述第一视频的L个第一视频片段中的精彩度大于设定阈值的N个第一视频片段进行编辑,生成编辑后的视频,其中,M、L和N为正整数。
在一种可能的实施例中,所述确定单元1202,用于,按照预设的神经网络模型,识别所述M个采集时间单元对应的所述语音信息和/或第二视频中的关键信息;
根据识别结果,确定M个采集时间单元分别对应的用户情绪评分。
在一种可能的实施例中,所述确定单元1202,还用于确定所述N个第一视频片段相对应的所述第二视频的M个第二视频片段;其中,N个第一视频片段的播放时段与所述M个第二视频片段的采集时段相重叠;所述剪辑单元1203,还用于对所述N个第一视频片段和M个第二视频片段进行编辑,生成编辑后的视频,其中,M和N为正整数。
其中,关键信息包括如下唤醒词或唤醒动作中的至少一个:
所述唤醒词包括所述用户因情绪波动做出设定肢体动作所发出的声音、发出的设定语音信息;所述唤醒动作包括所述用户因情绪波动做出的设定肢体动作、设定面部表情。
本实施例还提供一种计算机存储介质,该计算机存储介质中存储有计算机指令,当该计算机指令在电子设备上运行时,使得电子设备执行上述实施例所执行的一个或多个步骤,以实现上述实施例中的方法。
本实施例还提供了一种程序产品,当该程序产品在计算机上运行时,使得计算机执行上述实施例中的一个或多个步骤,以实现上述实施例中的方法。
另外,本申请的实施例还提供一种装置,这个装置具体可以是芯片系统,组件或模块,该装置可包括相连的处理器和存储器;其中,存储器用于存储计算机执行指令,当装置运行时,处理器可执行存储器存储的计算机执行指令,以使芯片执行上述实施例中的一个或多个步骤,以实现上述实施例中的方法。
通过以上的实施方式的描述,所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,仅以上述各功能模块的划分进行举例说明,实际应用中,可以根据需要而将上述 功能分配由不同的功能模块完成,即将装置的内部结构划分成不同的功能模块,以完成以上描述的全部或者部分功能。上述描述的系统,装置和单元的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。
在本申请实施例各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能单元的形式实现。
所述集成的单元如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请实施例的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的全部或部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)或处理器执行本申请各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:快闪存储器、移动硬盘、只读存储器、随机存取存储器、磁碟或者光盘等各种可以存储程序代码的介质。
以上所述,仅为本申请实施例的具体实施方式,但本申请实施例的保护范围并不局限于此,任何在本申请实施例揭露的技术范围内的变化或替换,都应涵盖在本申请实施例的保护范围之内。因此,本申请实施例的保护范围应以所述权利要求的保护范围为准。
Claims (18)
- 一种视频编辑方法,应用于第一电子设备,其特征在于,所述方法包括:在所述第一电子设备播放第一视频的过程中,从采集装置获取观看所述第一视频的用户的语音信息和/或所述用户的第二视频;识别所述语音信息和/或所述第二视频中与用户情绪相关的M个关键信息,在所述第一视频中确定与所述M个关键信息的采集时间单元相对应的N个第一视频片段;对所述N个第一视频片段进行编辑,生成编辑后的视频,其中,M和N为正整数。
- 根据权利要求1所述的方法,其特征在于,所述方法还包括:确定所述N个第一视频片段相对应的所述第二视频的M个第二视频片段;其中,N个第一视频片段的播放时段与所述M个第二视频片段的采集时段相重叠;所述对所述N个第一视频片段进行编辑,生成编辑后的视频,包括:对所述N个第一视频片段和M个第二视频片段进行编辑,生成编辑后的视频。
- 根据权利要求1或2所述的方法,其特征在于,所述方法还包括:将所述第一视频分割成L个第一视频片段;在所述第一视频中确定与所述M个关键信息的采集时间单元相对应的N个第一视频片段,包括:当识别到所述关键信息时,在所述L个第一视频片段中与所述关键信息的采集时间单元相对应的第一视频片段上打点;从所述第一视频中获取第一视频片段的打点信息,根据所述打点信息,从所述L个第一视频片段中确定与所述M个关键信息的采集时间单元相对应的N个第一视频片段。
- 根据权利要求1至3任一项所述的方法,其特征在于,所述关键信息包括如下唤醒词或唤醒动作中的至少一个:所述唤醒词包括所述用户因情绪波动做出设定肢体动作所发出的声音、发出的设定语音信息;所述唤醒动作包括所述用户因情绪波动做出的设定肢体动作、设定面部表情。
- 一种视频编辑方法,应用于第一电子设备,其特征在于,所述方法包括:在所述第一电子设备播放第一视频的过程中,从采集装置获取观看所述第一视频的用户的语音信息和/或所述用户的第二视频;将所述语音信息和/或第二视频按照采集时间单元进行划分,得到M个采集时间单元;根据所述M个采集时间单元对应的所述语音信息和/或第二视频中的关键信息,确定所述M个采集时间单元分别对应的用户情绪评分;根据所述用户情绪评分,在所述第一视频中确定与M个采集时间单元对应的L个第一视频片段的精彩度;对所述L个第一视频片段中的精彩度大于设定阈值的N个第一视频片段进行编辑,生成编辑后的视频,其中,M、L和N为正整数。
- 根据权利要求5所述的方法,其特征在于,根据所述M个采集时间单元对应的所述语音信息和/或第二视频中的关键信息,确定所述M个采集时间单元分别对应的用户情绪评分,包括:按照预设的神经网络模型,识别所述M个采集时间单元对应的所述语音信息和/或第二视频中的关键信息;根据识别结果,确定所述M个采集时间单元分别对应的用户情绪评分。
- 根据权利要求5或6所述的方法,其特征在于,所述方法还包括:确定所述N个第一视频片段相对应的所述第二视频的M个第二视频片段;其中,N个第一视频片段的播放时段与所述M个第二视频片段的采集时段相重叠;所述对所述N个第一视频片段进行编辑,生成编辑后的视频,其中,N为正整数,包括:对所述N个第一视频片段和M个第二视频片段进行编辑,生成编辑后的视频,其中,M和N为正整数。
- 根据权利要求5至7任一项所述的方法,其特征在于,所述关键信息包括如下唤醒词或唤醒动作中的至少一个:所述唤醒词包括所述用户因情绪波动做出设定肢体动作所发出的声音、发出的设定语音信息;所述唤醒动作包括所述用户因情绪波动做出的设定肢体动作、设定面部表情。
- 一种视频编辑方法,应用第二电子设备,其特征在于,所述方法包括:在第一电子设备播放第一视频的过程中,从采集装置获取观看所述第一视频的用户的语音信息和/或所述用户的第二视频;识别所述语音信息和/或所述第二视频中与用户情绪相关的M个关键信息;从所述第一视频中获取与所述M个关键信息的采集时间单元相对应的N个第一视频片段;对所述N个第一视频片段进行编辑,生成编辑后的视频,其中,M和N为正整数。
- 根据权利要求9所述的方法,其特征在于,所述方法还包括:确定所述N个第一视频片段相对应的所述第二视频的M个第二视频片段;其中,N个第一视频片段的播放时段与所述M个第二视频片段的采集时段相重叠;所述对所述N个第一视频片段进行编辑,生成编辑后的视频,包括:对所述N个第一视频片段和M个第二视频片段进行编辑,生成编辑后的视频。
- 根据权利要求9或10所述的方法,其特征在于,所述方法还包括:将所述第一视频分割成L个第一视频片段;从所述第一视频中获取与所述M个关键信息的采集时间单元相对应的N个第一视频片段,包括:当识别到所述关键信息时,在所述L个第一视频片段中与所述关键信息的采集时间单元相对应的第一视频片段上打点;从所述第一视频中获取第一视频片段的打点信息,根据所述打点信息,从所述L个第一视频片段中确定与所述M个关键信息的采集时间单元相对应的N个第一视频片段。
- 根据权利要求9至11任一项所述的方法,其特征在于,所述关键信息包括如下唤醒词或唤醒动作中的至少一个:所述唤醒词包括所述用户因情绪波动做出设定肢体动作所发出的声音、发出的设定语音信息;所述唤醒动作包括所述用户因情绪波动做出的设定肢体动作、设定面部表情。
- 一种视频编辑方法,应用于第二电子设备,其特征在于,所述方法包括:在第一电子设备播放第一视频的过程中,从采集装置获取观看所述第一视频的用户的 语音信息和/或所述用户的第二视频;将所述语音信息和/或第二视频按照采集时间单元进行划分,得到M个采集时间单元;根据所述M个采集时间单元对应的所述语音信息和/或第二视频中的关键信息,确定所述M个采集时间单元分别对应的用户情绪评分;从所述第一视频中获取与所述M个采集时间单元对应的L个第一视频片段;根据所述用户情绪评分,确定所述L个第一视频片段的精彩度;对所述L个第一视频片段中的精彩度大于设定阈值的N个第一视频片段进行编辑,生成编辑后的视频,其中,M、L和N为正整数。
- 根据权利要求13所述的方法,其特征在于,根据所述M个采集时间单元对应的所述语音信息和/或第二视频中的关键信息,确定所述M个采集时间单元分别对应的用户情绪评分,包括:按照预设的神经网络模型,识别所述M个采集时间单元对应的所述语音信息和/或第二视频中的关键信息;根据所述M个采集时间单元对应的所述语音信息和/或第二视频中的关键信息,确定M个采集时间单元分别对应的用户情绪评分。
- 根据权利要求13或14所述的方法,其特征在于,所述方法还包括:确定所述N个第一视频片段相对应的所述第二视频的M个第二视频片段;其中,N个第一视频片段的播放时段与所述M个第二视频片段的采集时段相重叠;所述对所述N个第一视频片段进行编辑,生成编辑后的视频,其中,N为正整数,包括:对所述N个第一视频片段和M个第二视频片段进行编辑,生成编辑后的视频,其中,M和N为正整数。
- 根据权利要求13至15任一项所述的方法,其特征在于,所述关键信息包括如下唤醒词或唤醒动作中的至少一个:所述唤醒词包括所述用户因情绪波动做出设定肢体动作所发出的声音、发出的设定语音信息;所述唤醒动作包括所述用户因情绪波动做出的设定肢体动作、设定面部表情。
- 一种电子设备,其特征在于,所述电子设备包括处理器和存储器;所述存储器存储有程序指令;所述处理器用于运行所述存储器存储的所述程序指令,使得所述电子设备执行如权利要求1至16任一项所述的方法。
- 一种计算机可读存储介质,其特征在于,所述计算机可读存储介质包括程序指令,当所述程序指令在电子设备上运行时,使得所述电子设备执行如权利要求1至16任一项所述的方法。
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010909167.5A CN114205534A (zh) | 2020-09-02 | 2020-09-02 | 一种视频编辑方法及设备 |
CN202010909167.5 | 2020-09-02 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022048347A1 true WO2022048347A1 (zh) | 2022-03-10 |
Family
ID=80492124
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/108646 WO2022048347A1 (zh) | 2020-09-02 | 2021-07-27 | 一种视频编辑方法及设备 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN114205534A (zh) |
WO (1) | WO2022048347A1 (zh) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114786059A (zh) * | 2022-04-25 | 2022-07-22 | 中国平安人寿保险股份有限公司 | 视频生成方法、视频生成装置、电子设备、存储介质 |
WO2023239562A1 (en) * | 2022-06-06 | 2023-12-14 | Cerence Operating Company | Emotion-aware voice assistant |
CN118633939A (zh) * | 2024-08-12 | 2024-09-13 | 沈阳康泰电子科技股份有限公司 | 一种多模态情感识别方法及系统 |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118488263A (zh) * | 2023-02-10 | 2024-08-13 | Oppo广东移动通信有限公司 | 视频编辑方法、装置、电子设备及计算机可读介质 |
CN116684665B (zh) * | 2023-06-27 | 2024-03-12 | 广东星云开物科技股份有限公司 | 娃娃机精彩片段的剪辑方法、装置、终端设备及存储介质 |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130004138A1 (en) * | 2011-06-30 | 2013-01-03 | Hulu Llc | Commenting Correlated To Temporal Point Of Video Data |
CN103609128A (zh) * | 2011-06-17 | 2014-02-26 | 微软公司 | 基于环境传感的视频精彩片段标识 |
CN105872765A (zh) * | 2015-12-29 | 2016-08-17 | 乐视致新电子科技(天津)有限公司 | 制作视频集锦的方法、装置、电子设备、服务器及系统 |
CN107241622A (zh) * | 2016-03-29 | 2017-10-10 | 北京三星通信技术研究有限公司 | 视频定位处理方法、终端设备及云端服务器 |
CN107809673A (zh) * | 2016-09-09 | 2018-03-16 | 索尼公司 | 根据情绪状态检测处理视频内容的系统和方法 |
CN110381367A (zh) * | 2019-07-10 | 2019-10-25 | 咪咕文化科技有限公司 | 一种视频处理方法、设备及计算机可读存储介质 |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104837036B (zh) * | 2014-03-18 | 2018-04-10 | 腾讯科技(北京)有限公司 | 生成视频看点的方法、服务器、终端及系统 |
- 2020-09-02 CN CN202010909167.5A patent/CN114205534A/zh active Pending
- 2021-07-27 WO PCT/CN2021/108646 patent/WO2022048347A1/zh active Application Filing
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103609128A (zh) * | 2011-06-17 | 2014-02-26 | 微软公司 | 基于环境传感的视频精彩片段标识 |
US20130004138A1 (en) * | 2011-06-30 | 2013-01-03 | Hulu Llc | Commenting Correlated To Temporal Point Of Video Data |
CN105872765A (zh) * | 2015-12-29 | 2016-08-17 | 乐视致新电子科技(天津)有限公司 | 制作视频集锦的方法、装置、电子设备、服务器及系统 |
CN107241622A (zh) * | 2016-03-29 | 2017-10-10 | 北京三星通信技术研究有限公司 | 视频定位处理方法、终端设备及云端服务器 |
CN107809673A (zh) * | 2016-09-09 | 2018-03-16 | 索尼公司 | 根据情绪状态检测处理视频内容的系统和方法 |
CN110381367A (zh) * | 2019-07-10 | 2019-10-25 | 咪咕文化科技有限公司 | 一种视频处理方法、设备及计算机可读存储介质 |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114786059A (zh) * | 2022-04-25 | 2022-07-22 | 中国平安人寿保险股份有限公司 | 视频生成方法、视频生成装置、电子设备、存储介质 |
CN114786059B (zh) * | 2022-04-25 | 2023-06-20 | 中国平安人寿保险股份有限公司 | 视频生成方法、视频生成装置、电子设备、存储介质 |
WO2023239562A1 (en) * | 2022-06-06 | 2023-12-14 | Cerence Operating Company | Emotion-aware voice assistant |
CN118633939A (zh) * | 2024-08-12 | 2024-09-13 | 沈阳康泰电子科技股份有限公司 | 一种多模态情感识别方法及系统 |
Also Published As
Publication number | Publication date |
---|---|
CN114205534A (zh) | 2022-03-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2022048347A1 (zh) | 一种视频编辑方法及设备 | |
US11227626B1 (en) | Audio response messages | |
US11716301B2 (en) | Generating interactive messages with asynchronous media content | |
CN112397062B (zh) | 语音交互方法、装置、终端及存储介质 | |
US10659684B2 (en) | Apparatus and method for providing dynamic panorama function | |
EP3631798B1 (en) | Voice driven dynamic menus | |
CN107925799B (zh) | 用于生成视频内容的方法和设备 | |
KR20160026317A (ko) | 음성 녹음 방법 및 장치 | |
CN115668957B (zh) | 音频检测和字幕呈现 | |
CN114173000B (zh) | 一种回复消息的方法、电子设备和系统、存储介质 | |
US11695899B2 (en) | Subtitle presentation based on volume control | |
US11908489B2 (en) | Tap to advance by subtitles | |
CN115037975B (zh) | 一种视频配音的方法、相关设备以及计算机可读存储介质 | |
CN117133281B (zh) | 语音识别方法和电子设备 | |
CN117478818B (zh) | 语音通话方法、终端和存储介质 | |
WO2024113999A1 (zh) | 游戏管理的方法及终端设备 | |
US20240244298A1 (en) | Video sound control | |
US20240251130A1 (en) | Video notification system | |
US20230419559A1 (en) | Double camera streams | |
CN118227718A (zh) | 一种轨迹播放方法和装置 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 21863415 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 21863415 Country of ref document: EP Kind code of ref document: A1 |