WO2022253053A1 - Method and apparatus for playing video - Google Patents

Method and apparatus for playing video

Info

Publication number
WO2022253053A1
Authority
WO
WIPO (PCT)
Prior art keywords
information
video
playback
playback speed
speed
Prior art date
Application number
PCT/CN2022/094784
Other languages
English (en)
French (fr)
Inventor
张雪莲
庄琰
蔡佳
唐少华
王小龙
魏何
Original Assignee
华为技术有限公司
Priority date
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to EP22815100.7A (published as EP4329320A1)
Publication of WO2022253053A1
Priority to US18/521,881 (published as US20240107092A1)

Classifications

    • H04N21/2387: Stream processing in response to a playback request from an end-user, e.g. for trick-play
    • H04N21/462: Content or additional data management, e.g. creating a master electronic program guide from data received from the Internet and a Head-end, controlling the complexity of a video stream by scaling the resolution or bit-rate based on the client capabilities
    • G11B27/005: Reproducing at a different information rate from the information rate of recording
    • H04N21/4532: Management of client data or end-user data involving end-user characteristics, e.g. viewer profile, preferences
    • H04N21/47217: End-user interface for requesting content, additional data or services; for controlling playback functions for recorded or on-demand content, e.g. using progress bars, mode or play-point indicators or bookmarks
    • H04N21/84: Generation or processing of descriptive data, e.g. content descriptors

Definitions

  • the embodiments of the present application relate to the field of video processing, and in particular, to a method and device for playing video.
  • a constant-speed (fixed playback multiplier) function is generally provided on video playback platforms at home and abroad.
  • the user selects the constant speed function on the playback interface and selects the desired playback speed, and the electronic device plays the video according to the playback speed selected by the user.
  • however, for fast-paced clips the constant-speed function may cause dizziness and shrill-sounding speech, while for slow-paced clips it may not meet the user's desired viewing speed, so users have to switch the playback speed manually and frequently to meet their own needs.
  • various video adaptive speed-change solutions exist, such as solutions based on big-data statistics, picture content, speech rate, or audio and picture quality.
  • the solution based on big-data statistics requires a large amount of historical user data as support and cannot cold-start adaptive speed change for newly launched videos.
  • solutions that take the video image content and/or speech rate as reference information determine the final playback speed only from that reference information; the final playback speed is driven entirely by the reference information and does not consider the user's individual needs.
  • therefore, current video adaptive speed-change schemes still need to be improved so that they also meet the individual needs of users.
  • the method for playing video provided in this application realizes adaptive variable-speed playback while also taking the user's playback settings into account, which improves the user's experience of adaptive variable-speed video playback.
  • in a first aspect, a method for playing a video is provided, which is applied to an electronic device. The method may include: acquiring a first playback speed; acquiring first information, where the first information includes image information of the video and/or voice information of the video; and playing the video at a second playback speed, where the second playback speed is obtained based on the first playback speed and the first information.
  • in this way, the second playback speed at which the video is finally played is determined based on both the first playback speed, which reflects the user's individual needs, and the first information.
  • both the video content and the user's needs are thus taken into account: while the overall playback time stays close to what the user wants, the speed is adapted according to the picture content, speech rate and other information, which improves the user's viewing experience.
  • the above-mentioned first playback speed is related to the user's playback settings, so as to meet the personalized needs of the user.
  • the playback speed of the video may represent the playback magnification or the playback duration of each frame of the video, or may represent the ratio of the duration required to play the entire video at this speed to the duration required to play the video at 1x speed.
  • the second playback speed can be represented as a speed sequence giving the playback speed of each frame in the video, and the playback speeds of different frames can be the same or different.
  • the first duration of playing the video at the first playback speed is different from the second duration of playing the video at the second playback speed.
  • the first playback speed can be a fixed speed, set by the user, at which the video is played, and it reflects how long the user wants to spend watching the video.
  • because the second playback speed takes the video picture content, speech rate and other information into account, the playback speed may vary with the video content, so the second duration of playing the entire video will differ from the first duration.
  • the difference between the second duration and the duration of playing the video at the playback speed R0 specified by the user is less than or equal to a threshold, so that the adaptive speed change meets the user's requirement for the overall playback duration and improves the user experience (a sketch of this check follows below).
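  • as an illustration only (not taken from this application), the following sketch models a playback speed as a per-frame magnification sequence and checks whether the resulting overall duration stays within an assumed threshold of the duration implied by the user-specified speed R0; all names and the threshold are hypothetical.

```python
from typing import Sequence

def total_duration(speeds: Sequence[float], frame_duration: float) -> float:
    """Duration of the whole video when frame t is played at magnification speeds[t]."""
    return sum(frame_duration / s for s in speeds)

def meets_duration_requirement(second_speed: Sequence[float], r0: float,
                               frame_duration: float, threshold: float) -> bool:
    """True if the variable-speed duration stays within `threshold` seconds of the
    duration obtained by playing every frame at the user-specified speed R0."""
    t_adaptive = total_duration(second_speed, frame_duration)
    t_user = len(second_speed) * frame_duration / r0
    return abs(t_adaptive - t_user) <= threshold
```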
  • obtaining the first playback speed may specifically be implemented as: displaying the first interface according to the obtained first operation information.
  • the first interface includes at least two options, each option indicating a playback speed; second operation information is obtained; and the first playback speed is determined according to the second operation information and the at least two options.
  • the playback speed indicated by the option the user selects in the first interface through the second operation information is determined as the first playback speed, so that the first playback speed is the playback speed selected by the user.
  • the first operation information may be operation information that triggers display of the first interface, or operation information that triggers selection of the playback speed.
  • the first interface displays at least two selection items indicating different playback speeds, so that the user can select a playback speed on this interface.
  • acquiring the first playback speed may specifically be implemented as: displaying the first interface according to the acquired first operation information.
  • the first interface includes the first speed; according to the acquired second operation information, display the second interface, wherein the second interface includes the second speed; and determine the first playback speed according to the second speed.
  • the second speed in the second interface triggered by the user through the second operation information is determined as the first playback speed, so that the first playback speed is the playback speed selected by the user.
  • the first operation information in this implementation manner may be operation information for invoking a menu
  • the first interface is an interface presenting the current first speed in the current video playback interface.
  • the second operation information is operation information for adjusting the playback speed; it may be selecting a second speed from multiple options, or stepping the speed up or down to determine the second speed.
  • obtaining the first playback speed may specifically be implemented as follows: according to the obtained first operation information, stop playing the previous video and start playing the current video; and determine the first playback speed according to the playback speed of the previous video.
  • in this way, the playback speed of the previous video is determined as the first playback speed, so that the first playback speed is a playback speed the user is accustomed to.
  • the playback speed of the previous video may be the first playback speed of the previous video, or may be the second playback speed of the previous video.
  • obtaining the second playback speed based on the first playback speed and the first information includes: determining a corresponding third playback speed according to each type of information in the first information; and determining the second playback speed according to the first playback speed and all of the third playback speeds.
  • the second playback speed may also be obtained based on the first playback speed and the first information by: determining a corresponding third playback speed according to each type of information in the first information; and determining the second playback speed according to the first playback speed and part of the third playback speeds.
  • in other words, a third playback speed is determined for each type of information in the first information, and the second playback speed is then determined according to the selected third playback speeds and the first playback speed, so that the determined second playback speed conforms both to the user's setting and to the characteristics of each piece of first information of the video to be played, which improves the user's viewing experience. Screening out third playback speeds that obviously do not meet the conditions can also improve the efficiency of determining the second playback speed (a sketch of this step follows below).
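  • a minimal sketch of this step, assuming hypothetical analyzer functions: each type of first information yields its own per-frame third playback speed, and curves that obviously fall outside an assumed plausible range are screened out before the second playback speed is determined.

```python
from typing import Callable, Dict, List

# Hypothetical analyzers: each maps the available first information to a per-frame speed curve.
Analyzer = Callable[[dict], List[float]]

def third_playback_speeds(first_info: dict,
                          analyzers: Dict[str, Analyzer]) -> Dict[str, List[float]]:
    """One third playback speed (per-frame magnification curve) per type of first information."""
    return {name: fn(first_info) for name, fn in analyzers.items() if name in first_info}

def select_candidates(speeds: Dict[str, List[float]],
                      lo: float = 0.25, hi: float = 4.0) -> Dict[str, List[float]]:
    """Screen out curves whose magnifications leave an assumed plausible range [lo, hi]."""
    return {k: v for k, v in speeds.items() if all(lo <= s <= hi for s in v)}
```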
  • the third playback speed corresponding to the first information may include a theoretical playback magnification value or a maximum allowable playback magnification value for each frame in the video to be played determined by the first information.
  • the playback magnification of a frame in the second playback speed is less than or equal to the playback magnification of the same frame in any third playback speed including the maximum allowable value of the playback magnification.
  • third playback speeds are determined from a plurality of pieces of first information and then fused, so that the second playback speed obtained from the fusion reflects the playback-magnification requirements of each piece of first information, and also restrains cases where the playback magnification determined by one piece of first information is so high that it would degrade the viewing experience associated with the others, providing the user with a viewing experience in which the information is complete and the audio and video are comfortable.
  • each piece of first information corresponds to a third playback speed
  • the third playback speeds used to determine the second playback speed are called candidate third playback speeds (all of the third playback speeds, or part of them); the second playback speed is determined according to the candidate third playback speeds and the first playback speed.
  • a fusion operation is performed on the candidate third playback speeds that include the theoretical value of the playback magnification to obtain a fourth playback speed; a fusion operation is performed on the candidate third playback speeds that include the maximum allowable value of the playback magnification to obtain a fifth playback speed; and the fourth playback speed and the fifth playback speed are numerically optimized according to the first playback speed R0 to obtain the second playback speed.
  • This implementation method provides a specific method for obtaining the second playback speed.
  • the third playback speeds are combined through the fusion operation, which improves the accuracy and effectiveness of the theoretical values and maximum allowable values of the playback magnification, and numerical optimization is then performed according to the first playback speed R0 set by the user, so that the final second playback speed not only matches the speed-change ratio the user wants but is also accurate and effective (the pipeline is sketched below).
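  • purely as an illustration of the pipeline, under assumed simplifications: the per-frame mean/minimum fusion and the rescale-and-clamp step below are stand-ins, not the scheme specified by this application.

```python
from typing import List

Curve = List[float]  # per-frame playback magnification

def fuse_theoretical(curves: List[Curve]) -> Curve:
    """Fourth playback speed: here simply the per-frame mean of the theoretical curves."""
    return [sum(vals) / len(vals) for vals in zip(*curves)]

def fuse_max_allowed(curves: List[Curve]) -> Curve:
    """Fifth playback speed: per-frame minimum of the maximum-allowable curves."""
    return [min(vals) for vals in zip(*curves)]

def second_playback_speed(theoretical: List[Curve], max_allowed: List[Curve],
                          r0: float) -> Curve:
    fourth = fuse_theoretical(theoretical)
    fifth = fuse_max_allowed(max_allowed)
    # Crude stand-in for the numerical optimization: rescale the fourth speed so its
    # mean equals R0, then clamp each frame by the fifth speed (the real scheme
    # minimizes an objective function, sketched after the term definitions below).
    mean4 = sum(fourth) / len(fourth)
    return [min(s * r0 / mean4, a) for s, a in zip(fourth, fifth)]
```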
  • numerically optimizing the fourth playback speed and the fifth playback speed according to the first playback speed R0 to obtain the second playback speed may specifically be implemented as: inputting the fourth playback speed, the fifth playback speed and R0 into an objective function for numerical optimization, and taking the playback speed that minimizes the objective function as the second playback speed.
  • the objective function is used to describe the degree to which the playback speed obtained from the fourth playback speed and the fifth playback speed satisfies R0.
  • the smaller the value of the objective function, the closer the playback speed obtained from the fourth playback speed and the fifth playback speed is to R0.
  • the numerical optimization is realized through the objective function, which improves the feasibility and accuracy of the scheme and can ensure that the determined second playback speed is optimal.
  • different playback speeds can be obtained according to the fourth playback speed and the fifth playback speed by adjusting the preset parameters of the objective function.
  • optionally, the third playback speeds corresponding to the image information of the video include third playback speeds, each containing a theoretical value of the playback magnification, determined for the image information at multiple different preset playback speeds. In this case, obtaining the second playback speed according to the candidate third playback speeds and the first playback speed can be specifically implemented as: performing a fusion operation between the third playback speed corresponding to the image information at each preset playback speed and the third playback speeds corresponding to the other first information (or only those that include a theoretical value of the playback magnification), to obtain multiple fourth playback speeds; performing a fusion operation on each third playback speed that includes a maximum allowable value of the playback magnification to obtain the fifth playback speed; and inputting each fourth playback speed, together with the fifth playback speed and the first playback speed R0, into the objective function, and taking the playback speed that minimizes the objective function as the second playback speed.
  • the objective function is used to describe the degree to which the playback speed obtained from the fourth playback speed and the fifth playback speed satisfies R0.
  • This implementation method provides a specific method for obtaining the second playback speed. By configuring multiple different playback speeds in advance, multiple third playback speeds corresponding to the image information of the video are obtained, and then multiple fourth playback speeds are obtained by fusion. Afterwards, the objective function is substituted to obtain the final second playback speed. The solution is simple and efficient, and the processing efficiency and speed are improved.
  • optionally, the above objective function may satisfy the following expression: argmin_S [ E_speed(S, V) + α·E_rate(S, R0) + β·E_smooth(S′, n) + γ·E_A(S, A) ], where:
  • argmin_S means selecting the second playback speed S that minimizes the function value, and α, β, γ are preset parameters.
  • E_speed(S, V) is used to keep low-acceleration segments close to the minimum playback magnification R_min specified by the user; V(t) is the normalized playback magnification of frame t in the fourth playback speed, S(t) is the playback magnification of frame t in the second playback speed, and σ is a preset parameter.
  • E_rate(S, R0) is used to keep the overall speed-change ratio close to R0; T is the total number of frames in the video to be played.
  • E_smooth(S′, n) is used to control the smoothness of the second playback speed; n is the smoothing width of the objective function.
  • E_A(S, A) is used to keep the second playback speed from exceeding the playback magnification of the same frame in the fifth playback speed; it penalizes frames where A(t) > 0 and S(t) > A(t), where A(t) is the playback magnification of frame t in the fifth playback speed. A minimal optimization sketch follows below.
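  • the application does not give the full closed form of each term; the sketch below is therefore only a plausible stand-in (quadratic penalties for the rate, smoothness and upper-bound terms, minimized with SciPy's L-BFGS-B), and the term forms, weights and bounds are all assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def solve_second_speed(v4: np.ndarray, a5: np.ndarray, r0: float,
                       alpha: float = 1.0, beta: float = 10.0, gamma: float = 100.0,
                       n: int = 5) -> np.ndarray:
    """v4: fourth playback speed (per-frame fused theoretical magnification).
    a5: fifth playback speed (per-frame maximum allowable magnification).
    Returns a per-frame second playback speed S minimizing an assumed objective."""
    T = len(v4)

    def objective(s: np.ndarray) -> float:
        e_speed = np.mean((s - v4) ** 2)                 # stay close to the fused theoretical speed
        e_rate = (np.mean(s) - r0) ** 2                  # overall speed-change ratio close to R0
        kernel = np.ones(n) / n
        e_smooth = np.mean((s - np.convolve(s, kernel, mode="same")) ** 2)  # smoothness over width n
        excess = np.where(a5 > 0, np.maximum(s - a5, 0.0), 0.0)
        e_a = np.mean(excess ** 2)                       # do not exceed the fifth playback speed
        return e_speed + alpha * e_rate + beta * e_smooth + gamma * e_a

    res = minimize(objective, x0=np.full(T, r0), method="L-BFGS-B",
                   bounds=[(0.25, 4.0)] * T)
    return res.x
```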
  • optionally, the fusion operation includes selecting, for each frame, a playback magnification between the maximum playback magnification and the minimum playback magnification of that frame among the different third playback speeds, so that multiple third playback speeds are fused into one playback speed.
  • in a possible implementation, the third playback speeds participating in the fusion operation include a maximum allowable value of the playback magnification.
  • in this case the fusion operation includes: if, among the different third playback speeds, the minimum playback magnification of the same frame is a maximum allowable value, selecting that minimum playback magnification of the frame from the third playback speeds participating in the fusion; if, among the different third playback speeds, the minimum playback magnification of the same frame is a theoretical value, selecting a calculated value of that minimum playback magnification and the maximum allowable value of the frame in the third playback speeds participating in the fusion, where the calculated value can be the average value, the maximum value, the minimum value, or another combination.
  • in another possible implementation, the third playback speeds participating in the fusion operation do not include a maximum allowable value of the playback magnification. In this case the fusion operation selects, for each frame, a calculated value of the theoretical values of the playback magnification, where the calculated value can be the average value, the maximum value, the minimum value, or another combination (see the sketch below).
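  • a minimal sketch of these per-frame fusion rules, taking the "calculated value" as the average (one of the combinations the application allows); the curve representation and helper names are illustrative.

```python
from typing import List, Tuple

# One curve per third playback speed; each frame entry is (magnification, is_max_allowed).
Frame = Tuple[float, bool]

def fuse(curves: List[List[Frame]]) -> List[float]:
    """Fuse several third playback speeds into one per-frame magnification curve."""
    fused = []
    for frame_vals in zip(*curves):
        mags = [m for m, _ in frame_vals]
        min_mag = min(mags)
        min_is_max_allowed = next(is_max for m, is_max in frame_vals if m == min_mag)
        max_allowed = [m for m, is_max in frame_vals if is_max]
        if not max_allowed:
            # No maximum-allowable values involved: combine the theoretical values (average here).
            fused.append(sum(mags) / len(mags))
        elif min_is_max_allowed:
            # The per-frame minimum is itself a maximum-allowable value: take it directly.
            fused.append(min_mag)
        else:
            # The minimum is a theoretical value: combine it with the smallest maximum-allowable value.
            fused.append((min_mag + min(max_allowed)) / 2)
    return fused
```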
  • the foregoing first information may further include content that the user is interested in.
  • the content that the user is interested in may include at least one of the following information: character description information of the video, content description information of the video, and content structure information of the video.
  • the third playback speed corresponding to the content that the user is interested in may include a theoretical value of playback magnification for each frame of the video to be played.
  • the character description information of the video is used to indicate the information of the character in the video that the user is interested in, and the character may be an actor or a role played or others.
  • the content description information of the video is used to indicate the plot or content in the video that the user is interested in.
  • the content structure information of the video is used to indicate information about chapters or positions in the video that the user is interested in.
  • optionally, a frame related to content that the user is interested in is played at the second playback speed no faster than it would be played at the first playback speed, so that content the user is interested in is played at a slower speed, which improves the user experience (a minimal sketch of this constraint follows below).
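  • one simple way to express this constraint, assuming a hypothetical per-frame `interesting` mask: frames flagged as interesting are capped at the first playback speed R0.

```python
from typing import List

def cap_interesting_frames(second_speed: List[float], interesting: List[bool],
                           r0: float) -> List[float]:
    """Ensure frames flagged as interesting are not played faster than the first playback speed."""
    return [min(s, r0) if flag else s for s, flag in zip(second_speed, interesting)]
```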
  • the above-mentioned first information may further include first playback mode information of the video to be played, where the first playback mode information is associated with playback size information corresponding to the video.
  • the playback size information corresponding to the video may be used to indicate the display ratio or display size of the video, such as full screen display or small window display.
  • the first playback mode information is used to determine the theoretical value of the playback speed of the video: when the playback size is large, a higher playback speed can be used, and when the playback size is small, a smaller playback speed can be used, so that the user can watch the video content clearly.
  • the above-mentioned first information may further include second play mode information of the video to be played, where the second play mode information is associated with the definition information of the video.
  • the definition information of the video may be used to indicate the playback resolution of the video, such as high-definition mode, Blu-ray mode, stream-saving mode, and the like.
  • the second playback mode information is used to determine the theoretical value of the playback speed of the video: when the video definition is high, a higher playback speed can be used, and when the video definition is low, a smaller playback speed can be used, so that the user can watch the video content clearly.
  • the foregoing first information may further include motion state information of the electronic device.
  • the motion state information may be used to indicate the moving speed of the electronic device or the pose relative to the user, and the like.
  • the motion state information is used to determine the theoretical value of the playback speed of the video.
  • when the electronic device is moving quickly or is at an angle that is inconvenient for viewing, a smaller playback speed can be used; when the electronic device is moving slowly or is at an angle that is convenient for the user to watch, a higher playback speed can be used, so that the user can watch the video content clearly.
  • the first information may further include noise intensity information of the electronic device.
  • the noise intensity information of the electronic equipment may be used to indicate the environmental interference degree of the electronic equipment.
  • the noise intensity information of the electronic device is used to determine the theoretical value of the playback speed of the video: when the noise intensity around the electronic device is high, a lower playback speed can be used, and when the noise intensity is low, a higher playback speed can be used, so that the user can clearly hear the voice in the video.
  • the foregoing first information may further include user viewpoint information.
  • the user's viewpoint information may be used to indicate where the user's gaze falls when watching a video, which reflects the user's interest.
  • the user viewpoint information is used to determine the theoretical value of the playback speed of the video.
  • for example, when the user viewpoint information indicates that the user has been watching the video attentively for a long time, a lower playback speed can be used, and when it indicates that the user is not currently watching the video, a higher playback speed can be used, adjusting the playback speed of the video so that it matches the user's viewpoint information.
  • the foregoing first information may further include connection status information of the audio playback device.
  • the audio playback device may be an earphone or a speaker.
  • the connection status information of the audio playback device is used to indicate whether an audio playback device is connected. When one is connected, the user is highly sensitive to the video's voice and is not easily disturbed by the external environment; when none is connected, the user is less sensitive to the video's voice and is easily interfered with by the external environment.
  • the connection status information of the audio playback device is used to determine the theoretical value of the playback speed of the video.
  • when the connection status information indicates that an audio playback device is connected, a higher playback speed can be used, and when it indicates that no audio playback device is connected, a smaller playback speed can be used, so that the user can clearly hear the voice in the video.
  • the foregoing first information may further include network state information.
  • the network status information is used to indicate the quality or type of the network that the electronic device is connected to. When the electronic device is connected to a high-quality network, video playback is smooth; otherwise, video playback stutters.
  • the network status information is used to determine the theoretical value of the playback speed of the video: when the network status information indicates that the quality of the connected network is high, a higher playback speed can be used, and when it indicates that the quality is low, a smaller playback speed can be used, to avoid stuttering while the user watches the video (a consolidated sketch of such context-based adjustments follows below).
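  • the preceding items follow a common pattern: a context signal (playback size, definition, device motion, ambient noise, headset connection, network quality) nudges the theoretical playback speed up or down. The consolidated mapping below is purely illustrative; all thresholds and factors are assumptions.

```python
def context_speed_factor(full_screen: bool, high_definition: bool,
                         device_moving_fast: bool, noisy_environment: bool,
                         headset_connected: bool, good_network: bool) -> float:
    """Multiplier applied to a baseline theoretical playback speed for the video."""
    factor = 1.0
    factor *= 1.1 if full_screen else 0.9          # larger picture tolerates faster playback
    factor *= 1.1 if high_definition else 0.9      # clearer picture tolerates faster playback
    factor *= 0.8 if device_moving_fast else 1.0   # hard-to-view motion/pose slows playback down
    factor *= 0.9 if noisy_environment else 1.0    # noise makes speech harder to follow
    factor *= 1.1 if headset_connected else 0.95   # headset users are less easily disturbed
    factor *= 1.0 if good_network else 0.9         # poor network: slower speed avoids stutter
    return factor
```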
  • the first information above may further include environment information
  • the environment information may include internal state information of the device playing the video to be played or external environment information of the device playing the video to be played.
  • the third playback speed corresponding to the external environment information includes the maximum allowable value of the playback magnification of each frame in the video to be played determined by the external environment information; the third playback speed corresponding to the internal state information includes the to-be-played video determined by the internal state information The maximum allowable playback magnification of each frame in the video.
  • the third playback speed corresponding to the image information of the video includes a theoretical value of the playback magnification of each frame in the video determined by the moving speed of the object in the screen.
  • the third playback speed corresponding to the voice information of the video includes the maximum allowed value of the playback magnification of each frame in the video to be played determined by the voice speed.
  • optionally, the above-mentioned theoretical value of the playback magnification may be graded: different theoretical values of the playback magnification correspond to different degrees of match with the content that the user is interested in, or to different intervals of the target moving speed.
  • the higher the degree of match with the content that the user is interested in, the smaller the corresponding theoretical value of the playback magnification; the faster the moving speed of the target, the smaller the corresponding theoretical value of the playback magnification.
  • in this way, the determined second playback speed is guaranteed to satisfy the user's interest, or to ensure the user's visual experience (a simple mapping sketch follows below).
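  • an illustrative per-frame mapping consistent with these rules; the interval boundaries and magnification values are assumptions, not taken from this application.

```python
def theoretical_magnification(interest_score: float, motion_speed: float) -> float:
    """interest_score in [0, 1]: how well the frame matches content the user is interested in.
    motion_speed: speed of the moving target in the picture (arbitrary units).
    Higher interest or faster motion gives a smaller theoretical playback magnification."""
    if interest_score > 0.7:
        base = 1.0           # very interesting: play at normal speed
    elif interest_score > 0.3:
        base = 1.5
    else:
        base = 2.0           # uninteresting: allow faster playback
    if motion_speed > 10.0:
        base = min(base, 1.25)   # fast on-screen motion caps the magnification
    elif motion_speed > 5.0:
        base = min(base, 1.5)
    return base
```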
  • optionally, the method for playing a video provided in this application may further include: storing the second playback speed corresponding to the video to be played, so that other devices can obtain the video to be played and the second playback speed and play the video to be played at the second playback speed.
  • an apparatus for playing video is provided, which may include a first acquiring unit, a second acquiring unit, and a playing unit, wherein:
  • the first acquiring unit is configured to acquire the first playback speed.
  • the first playback speed is related to user playback settings.
  • the second acquiring unit is configured to acquire first information, where the first information includes video image information and/or video voice information.
  • the playback unit is used to play the video at a second playback speed, and the second playback speed is obtained based on the first playback speed and the first information.
  • the second playback speed at which the video is finally played is determined based on both the first playback speed, which is related to the user's personalized settings, and the first information, so that the user's need regarding the overall playback time of the video and the look and feel of the picture, speech rate and other content are all taken into account, which improves the user experience when watching the video.
  • the present application provides an electronic device that can implement the functions in the method example described in the first aspect above, and the functions can be implemented by hardware, or by executing corresponding software on the hardware.
  • the hardware or software includes one or more modules with corresponding functions above.
  • the electronic device may exist in the product form of a chip.
  • the structure of the electronic device includes a processor and a transceiver, and the processor is configured to support the electronic device to perform corresponding functions in the foregoing method.
  • the transceiver is used to support communication between the electronic device and other devices.
  • the electronic device may also include a memory, which is used to be coupled with the processor, and stores necessary program instructions and data of the electronic device.
  • a computer-readable storage medium including instructions, which, when run on a computer, cause the computer to execute the method for playing video provided in the above-mentioned first aspect or any possible implementation thereof.
  • a fifth aspect provides a computer program product containing instructions, which when run on a computer, causes the computer to execute the method for playing video provided in the above first aspect or any possible implementation thereof.
  • the present application provides a system-on-a-chip, which includes a processor and may further include a memory, configured to implement corresponding functions in the above methods.
  • the system-on-a-chip may consist of chips, or may include chips and other discrete devices.
  • the present application provides a system for playing video, the system includes a first device, the first device may be the electronic device described in the third aspect, and the electronic device has the functions of the above first aspect and any of its possible implementations.
  • the system for playing video may also include a second device, which is used to obtain the second playback speed of the video to be played from the first device, and play the video to be played at the second playback speed .
  • Figure 1 is a schematic diagram of the survey results
  • FIG. 2 is a schematic structural diagram of an electronic device provided in an embodiment of the present application.
  • FIG. 3 is a schematic diagram of a software structure of an electronic device provided in an embodiment of the present application.
  • FIG. 4 is a schematic diagram of a system architecture provided by an embodiment of the present application.
  • Fig. 5 is a schematic structural diagram of a convolutional neural network (convolutional neuron network, CNN) network provided by the embodiment of the present application;
  • FIG. 6 is a schematic diagram of a hardware structure of a chip provided by an embodiment of the present application.
  • FIG. 7 is a schematic diagram of a setting interface of an electronic device provided in an embodiment of the present application.
  • FIG. 8 is a schematic diagram of a playback interface provided by an embodiment of the present application.
  • FIG. 9 is a schematic diagram of a playback interface provided by an embodiment of the present application.
  • FIG. 10 is a schematic flowchart of a method for playing a video provided in an embodiment of the present application.
  • FIG. 11 is a schematic diagram of a playback interface provided by an embodiment of the present application.
  • Fig. 12 is a schematic diagram of the playback interface provided by the embodiment of the present application.
  • FIG. 13 is a schematic diagram of the playback interface provided by the embodiment of the present application.
  • FIG. 14 is a schematic diagram of the playback interface provided by the embodiment of the present application.
  • Fig. 15 is a schematic diagram of an acquisition method of a third playback speed corresponding to different first information provided in an embodiment of the present application.
  • FIG. 16 is a schematic diagram of a scene where playback speed is fused according to an embodiment of the present application.
  • FIG. 17 is a schematic diagram of a fusion scene of another playback speed provided by the embodiment of the present application.
  • FIG. 18 is a schematic flowchart of a method for playing a video provided in an embodiment of the present application.
  • Fig. 19 is a comparative schematic diagram of a speed change curve provided by the embodiment of the present application.
  • Fig. 20 is a schematic diagram of an adaptive shifting curve provided by the embodiment of the present application.
  • FIG. 21 is another adaptive shift curve provided by the embodiment of the present application.
  • FIG. 22 is a schematic structural diagram of a device for playing video provided in an embodiment of the present application.
  • FIG. 23 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • words such as “first” and “second” are used to distinguish the same or similar items with basically the same function and effect. Those skilled in the art can understand that words such as “first” and “second” do not limit the number and execution order, and words such as “first” and “second” do not necessarily limit the difference.
  • the technical features described in “first” and “second” have no sequence or order of magnitude.
  • words such as “exemplary” or “for example” are used as examples, illustrations or illustrations. Any embodiment or design scheme described as “exemplary” or “for example” in the embodiments of the present application shall not be interpreted as being more preferred or more advantageous than other embodiments or design schemes. To be precise, the use of words such as “exemplary” or “such as” is intended to present related concepts in a concrete manner for easy understanding.
  • At least one can also be described as one or more, and multiple can be two, three, four or more, which is not limited in this application.
  • the network architecture and scenarios described in the embodiments of the present application are for more clearly illustrating the technical solutions of the embodiments of the present application, and do not constitute limitations on the technical solutions provided by the embodiments of the present application.
  • the evolution of the network architecture and the emergence of new business scenarios, the technical solutions provided by the embodiments of the present application are also applicable to similar technical problems.
  • Video refers to a dynamic continuous image sequence and a voice sequence corresponding to the image.
  • the image information of a video refers to a sequence of images (also referred to as frames) included in a video.
  • the image information of the video is a collection of static pictures.
  • the voice information of the video refers to the voice sequence included in the video, and the voice segment corresponding to each image frame in the video is regarded as a voice frame, and all the voice frames in the video constitute the voice information of the video.
  • the playback speed of the image frame can indicate the number of frames played per unit time (frame rate) when playing a video, or the duration of playing one frame.
  • the frame rate determines the duration of playing each frame of the video: the higher the frame rate, the shorter the time for which each frame is displayed.
  • the original playback frame rate of the video can be obtained.
  • the original playback frame rate of the video is an attribute parameter of the video, and the video can be played at the original playback frame rate by default.
  • the video playback speed refers to a playback speed sequence composed of the playback speed (or frame rate) of each frame (image frame and voice frame) in the video, which may be constant or variable.
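  • as a concrete, illustrative example of the relation between playback magnification, frame rate and per-frame duration (the 25 fps figure is only an example): playing a frame at magnification 2.0 halves its display time.

```python
def frame_duration_ms(original_fps: float, magnification: float) -> float:
    """Display time of one frame when played at the given magnification."""
    return 1000.0 / (original_fps * magnification)

# Example: a 25 fps source gives 40 ms per frame at 1x speed and 20 ms per frame at 2x speed.
assert frame_duration_ms(25.0, 1.0) == 40.0
assert frame_duration_ms(25.0, 2.0) == 20.0
```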
  • electronic devices such as terminals
  • electronic devices can play online videos, such as film and television programs, online education videos and surveillance videos, and provide users with a speed-change function during viewing, so that users can increase or decrease the playback speed of a video according to their personal preferences and thereby change its playing time.
  • the electronic device may be a smart phone, a tablet computer, a wearable device, an augmented reality (augmented reality, AR)/virtual reality (virtual reality, VR) device, and the like.
  • the present application does not limit the specific form of the electronic device.
  • Wearable devices can also be called wearable smart devices, a general term for devices designed and developed by applying wearable technology to items worn every day, such as glasses, gloves, watches, clothing and shoes.
  • a wearable device is a portable device that is worn directly on the body or integrated into the user's clothing or accessories. A wearable device is not only a piece of hardware; it also achieves powerful functions through software support, data interaction and cloud interaction.
  • broadly speaking, wearable smart devices include full-featured, large-sized devices that can realize complete or partial functions without relying on a smartphone, such as smart watches or smart glasses, as well as devices that focus on only one type of application function and need to be used together with other devices such as smartphones, for example various smart bracelets and smart jewelry for physical-sign monitoring.
  • the structure of the electronic device may be as shown in FIG. 2 .
  • the electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (universal serial bus, USB) interface 130, a charging management module 140, a power management module 141, and a battery 142 , antenna 1, antenna 2, mobile communication module 150, wireless communication module 160, audio module 170, speaker 170A, receiver 170B, microphone 170C, earphone jack 170D, sensor module 180, button 190, motor 191, indicator 192, camera 193 , a display screen 194, and a subscriber identification module (subscriber identification module, SIM) card interface 195, etc.
  • the sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, bone conduction sensor 180M, etc.
  • the structure shown in this embodiment does not constitute a specific limitation on the electronic device 100 .
  • the electronic device 100 may include more or fewer components than shown, or combine certain components, or separate certain components, or arrange different components.
  • the illustrated components can be realized in hardware, software or a combination of software and hardware.
  • the processor 110 may include one or more processing units, for example: the processor 110 may include an application processor (application processor, AP), a modem processor, a graphics processing unit (graphics processing unit, GPU), an image signal processor (image signal processor, ISP), controller, video codec, digital signal processor (digital signal processor, DSP), baseband processor, and/or neural network processor (neural-network processing unit, NPU), etc. Wherein, different processing units may be independent devices, or may be integrated in one or more processors.
  • the processor 110 may obtain a first playback speed, where optionally the first playback speed is related to the user's playback settings; obtain first information, where the first information includes image information of the video and/or voice information of the video; and obtain a second playback speed based on the first playback speed and the first information.
  • the controller may be the nerve center and command center of the electronic device 100 .
  • the controller can generate an operation control signal according to the instruction opcode and timing signal, and complete the control of fetching and executing the instruction.
  • a memory may also be provided in the processor 110 for storing instructions and data.
  • the memory in processor 110 is a cache memory.
  • the memory may hold instructions or data that the processor 110 has just used or recycled. If the processor 110 needs to use the instruction or data again, it can be called directly from the memory. Repeated access is avoided, and the waiting time of the processor 110 is reduced, thereby improving the efficiency of the system.
  • processor 110 may include one or more interfaces.
  • the interface may include an integrated circuit (inter-integrated circuit, I2C) interface, an integrated circuit built-in audio (inter-integrated circuit sound, I2S) interface, a pulse code modulation (pulse code modulation, PCM) interface, a universal asynchronous transmitter (universal asynchronous receiver/transmitter, UART) interface, mobile industry processor interface (mobile industry processor interface, MIPI), general-purpose input and output (general-purpose input/output, GPIO) interface, subscriber identity module (subscriber identity module, SIM) interface, and /or universal serial bus (universal serial bus, USB) interface, etc.
  • the MIPI interface can be used to connect the processor 110 with peripheral devices such as the display screen 194 and the camera 193 .
  • MIPI interface includes camera serial interface (camera serial interface, CSI), display serial interface (display serial interface, DSI), etc.
  • the processor 110 communicates with the camera 193 through the CSI interface to realize the shooting function of the electronic device 100 .
  • the processor 110 communicates with the display screen 194 through the DSI interface to realize the display function of the electronic device 100 .
  • the GPIO interface can be configured by software.
  • the GPIO interface can be configured as a control signal or as a data signal.
  • the GPIO interface can be used to connect the processor 110 with the camera 193 , the display screen 194 , the wireless communication module 160 , the audio module 170 , the sensor module 180 and so on.
  • the GPIO interface can also be configured as an I2C interface, I2S interface, UART interface, MIPI interface, etc.
  • the USB interface 130 is an interface conforming to the USB standard specification, specifically, it can be a Mini USB interface, a Micro USB interface, a USB Type C interface, and the like.
  • the USB interface 130 can be used to connect a charger to charge the electronic device 100 , and can also be used to transmit data between the electronic device 100 and peripheral devices. It can also be used to connect headphones and play audio through them. This interface can also be used to connect other electronic devices, such as AR devices.
  • the interface connection relationship between the modules shown in this embodiment is only for schematic illustration, and does not constitute a structural limitation of the electronic device 100 .
  • the electronic device 100 may also adopt different interface connection manners in the foregoing embodiments, or a combination of multiple interface connection manners.
  • the power management module 141 is used for connecting the battery 142 , the charging management module 140 and the processor 110 .
  • the power management module 141 receives the input from the battery 142 and/or the charging management module 140 to provide power for the processor 110 , the internal memory 121 , the display screen 194 , the camera 193 , and the wireless communication module 160 .
  • the power management module 141 can also be used to monitor parameters such as battery capacity, battery cycle times, and battery health status (leakage, impedance).
  • the power management module 141 may also be disposed in the processor 110 .
  • the power management module 141 and the charging management module 140 can also be set in the same device.
  • the wireless communication function of the electronic device 100 can be realized by the antenna 1 , the antenna 2 , the mobile communication module 150 , the wireless communication module 160 , a modem processor, a baseband processor, and the like.
  • the electronic device 100 realizes the display function through the GPU, the display screen 194 , and the application processor.
  • the GPU is a microprocessor for image processing, and is connected to the display screen 194 and the application processor. GPUs are used to perform mathematical and geometric calculations for graphics rendering.
  • Processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
  • the display screen 194 is used to display images, videos and the like.
  • the display screen 194 includes a display panel.
  • the display panel can be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a MiniLED, a MicroLED, a Micro-OLED, quantum dot light-emitting diodes (QLED), etc.
  • the electronic device 100 may include 1 or N display screens 194 , where N is a positive integer greater than 1.
  • a series of graphical user interfaces (graphical user interface, GUI) can be displayed on the display screen 194 of the electronic device 100, and these GUIs form the main screen of the electronic device 100.
  • the size of the display screen 194 of the electronic device 100 is fixed, and only limited controls can be displayed on the display screen 194 of the electronic device 100 .
  • a control is a GUI element; it is a software component contained in an application that controls all the data processed by the application and the interactions with that data. A user can interact with a control through direct manipulation so as to read or edit the relevant information of the application.
  • controls may include visual interface elements such as icons, buttons, menus, tabs, text boxes, dialog boxes, status bars, navigation bars, and Widgets.
  • the electronic device 100 can realize the shooting function through the ISP, the camera 193 , the video codec, the GPU, the display screen 194 and the application processor.
  • the ISP is used for processing the data fed back by the camera 193 .
  • the light is transmitted to the photosensitive element of the camera through the lens, and the light signal is converted into an electrical signal, and the photosensitive element of the camera transmits the electrical signal to the ISP for processing, and converts it into an image visible to the naked eye.
  • ISP can also perform algorithm optimization on image noise, brightness, and skin color.
  • ISP can also optimize the exposure, color temperature and other parameters of the shooting scene.
  • the ISP may be located in the camera 193 .
  • Camera 193 is used to capture still images or video.
  • the object generates an optical image through the lens and projects it to the photosensitive element.
  • the photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor.
  • the photosensitive element converts the light signal into an electrical signal, and then transmits the electrical signal to the ISP to convert it into a digital image signal.
  • the ISP outputs the digital image signal to the DSP for processing.
  • DSP converts digital image signals into standard RGB, YUV and other image signals.
  • the electronic device 100 may include 1 or N cameras 193 , where N is a positive integer greater than 1.
  • Digital signal processors are used to process digital signals. In addition to digital image signals, they can also process other digital signals. For example, when the electronic device 100 selects a frequency point, the digital signal processor is used to perform Fourier transform on the energy of the frequency point.
  • Video codecs are used to compress or decompress digital video.
  • the electronic device 100 may support one or more video codecs.
  • the electronic device 100 can play or record videos in various encoding formats, for example: moving picture experts group (moving picture experts group, MPEG) 1, MPEG2, MPEG3, MPEG4 and so on.
  • the NPU is a neural-network (NN) computing processor.
  • Applications such as intelligent cognition of the electronic device 100 can be realized through the NPU, such as image recognition, face recognition, speech recognition, text understanding, and the like.
  • the external memory interface 120 can be used to connect an external memory card, such as a Micro SD card, so as to expand the storage capacity of the electronic device 100.
  • the external memory card communicates with the processor 110 through the external memory interface 120 to implement a data storage function. Such as saving music, video and other files in the external memory card.
  • the internal memory 121 may be used to store computer-executable program codes including instructions.
  • the processor 110 executes various functional applications and data processing of the electronic device 100 by executing instructions stored in the internal memory 121 .
  • the processor 110 may acquire the playback speed of the video played by the electronic device 100 by executing the instructions stored in the internal memory 121 to execute the method for playing a video provided in this application.
  • the internal memory 121 may include an area for storing programs and an area for storing data. Wherein, the stored program area can store an operating system, at least one application program required by a function (such as a sound playing function, an image playing function, etc.) and the like.
  • the storage data area can store data created during the use of the electronic device 100 (such as audio data, phonebook, etc.) and the like.
  • the internal memory 121 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, flash memory device, universal flash storage (universal flash storage, UFS) and the like.
  • the processor 110 executes various functional applications and data processing of the electronic device 100 by executing instructions stored in the internal memory 121 and/or instructions stored in a memory provided in the processor.
  • the electronic device 100 can implement audio functions through the audio module 170 , the speaker 170A, the receiver 170B, the microphone 170C, the earphone interface 170D, and the application processor. For example, voice playback, music playback, recording, etc. in videos.
  • the audio module 170 is used to convert digital audio information into analog audio signal output, and is also used to convert analog audio input into digital audio signal.
  • the audio module 170 may also be used to encode and decode audio signals.
  • the audio module 170 may be set in the processor 110 , or some functional modules of the audio module 170 may be set in the processor 110 .
  • Speaker 170A also referred to as a "horn" is used to convert audio electrical signals into sound signals.
  • Electronic device 100 can listen to music through speaker 170A, or listen to hands-free calls.
  • Receiver 170B, also called an "earpiece", is used to convert audio electrical signals into sound signals.
  • the receiver 170B can be placed close to the human ear to receive the voice.
  • the microphone 170C, also called a "mic" or "mouthpiece", is used to convert sound signals into electrical signals. When making a phone call or sending a voice message, the user can put his mouth close to the microphone 170C to make a sound, and input the sound signal to the microphone 170C.
  • the electronic device 100 may be provided with at least one microphone 170C. In some other embodiments, the electronic device 100 may be provided with two microphones 170C, which may also implement a noise reduction function in addition to collecting sound signals. In some other embodiments, the electronic device 100 can also be provided with three, four or more microphones 170C to collect sound signals, reduce noise, identify sound sources, and realize directional recording functions, etc.
  • the earphone interface 170D is used for connecting wired earphones.
  • the earphone interface 170D may be a USB interface 130, or a 3.5mm open mobile terminal platform (OMTP) standard interface, a cellular telecommunications industry association of the USA (CTIA) standard interface, or other.
  • the pressure sensor 180A is used to sense the pressure signal and convert the pressure signal into an electrical signal.
  • pressure sensor 180A may be disposed on display screen 194 .
  • Pressure sensors 180A come in many types, such as resistive pressure sensors, inductive pressure sensors, and capacitive pressure sensors.
  • a capacitive pressure sensor may be comprised of at least two parallel plates with conductive material.
  • the electronic device 100 determines the intensity of pressure according to the change in capacitance.
  • the electronic device 100 detects the intensity of the touch operation according to the pressure sensor 180A.
  • the electronic device 100 may also calculate the touched position according to the detection signal of the pressure sensor 180A.
  • touch operations acting on the same touch position but with different touch operation intensities may correspond to different operation instructions. For example: when a touch operation with a touch operation intensity less than the first pressure threshold acts on the short message application icon, an instruction to view short messages is executed. When a touch operation whose intensity is greater than or equal to the first pressure threshold acts on the icon of the short message application, the instruction of creating a new short message is executed.
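  • As an illustrative sketch only (the threshold value and instruction names below are assumptions, not taken from this application), this pressure-threshold logic can be expressed as follows:

```python
# Minimal sketch: map the detected touch pressure on the short message icon to an
# operation instruction using a hypothetical first pressure threshold.
FIRST_PRESSURE_THRESHOLD = 0.5  # hypothetical normalized value

def dispatch_touch_on_message_icon(pressure: float) -> str:
    """Return the instruction triggered by a touch on the short message application icon."""
    if pressure < FIRST_PRESSURE_THRESHOLD:
        return "view_short_messages"       # light press: view existing messages
    return "create_new_short_message"      # firm press: create a new message

print(dispatch_touch_on_message_icon(0.3))
print(dispatch_touch_on_message_icon(0.8))
```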
  • the gyro sensor 180B can be used to determine the motion posture of the electronic device 100 .
  • the angular velocity of the electronic device 100 around three axes may be determined by the gyro sensor 180B.
  • the gyro sensor 180B can be used for image stabilization. Exemplarily, when the shutter is pressed, the gyro sensor 180B detects the shaking angle of the electronic device 100, calculates the distance that the lens module needs to compensate according to the angle, and allows the lens to counteract the shaking of the electronic device 100 through reverse movement to achieve anti-shake.
  • the gyro sensor 180B can also be used for navigation and somatosensory game scenes.
  • the gyro sensor 180B can also determine whether the electronic device 100 is in a moving state.
  • the air pressure sensor 180C is used to measure air pressure.
  • the electronic device 100 calculates the altitude based on the air pressure value measured by the air pressure sensor 180C to assist positioning and navigation.
  • the magnetic sensor 180D includes a Hall sensor.
  • the electronic device 100 may use the magnetic sensor 180D to detect the opening and closing of the flip leather case.
  • when the electronic device 100 is a flip phone, the electronic device 100 can detect the opening and closing of the flip cover according to the magnetic sensor 180D.
  • Features such as automatic unlocking upon opening the flip cover can then be set according to the detected opening or closing state of the leather case or the flip cover.
  • the acceleration sensor 180E can detect the acceleration of the electronic device 100 in various directions (generally three axes). When the electronic device 100 is stationary, the magnitude and direction of gravity can be detected. It can also be used to identify the posture of electronic devices, and can be used in applications such as horizontal and vertical screen switching, pedometers, etc. It can also be used to determine whether the electronic device 100 is in a moving state.
  • the distance sensor 180F is used to measure the distance.
  • the electronic device 100 may measure the distance by infrared or laser. In some embodiments, when shooting a scene, the electronic device 100 may use the distance sensor 180F for distance measurement to achieve fast focusing.
  • Proximity light sensor 180G may include, for example, light emitting diodes (LEDs) and light detectors, such as photodiodes.
  • the light emitting diodes may be infrared light emitting diodes.
  • the electronic device 100 emits infrared light through the light emitting diode.
  • Electronic device 100 uses photodiodes to detect infrared reflected light from nearby objects. When sufficient reflected light is detected, it may be determined that there is an object near the electronic device 100 . When insufficient reflected light is detected, the electronic device 100 may determine that there is no object near the electronic device 100 .
  • the electronic device 100 can use the proximity light sensor 180G to detect that the user is holding the electronic device 100 close to the ear to make a call, so as to automatically turn off the screen to save power.
  • the proximity light sensor 180G can also be used in leather case mode, automatic unlock and lock screen in pocket mode.
  • the ambient light sensor 180L is used for sensing ambient light brightness.
  • the electronic device 100 can adaptively adjust the brightness of the display screen 194 according to the perceived ambient light brightness.
  • the ambient light sensor 180L can also be used to automatically adjust the white balance when taking pictures.
  • the ambient light sensor 180L can also cooperate with the proximity light sensor 180G to detect whether the electronic device 100 is in the pocket, so as to prevent accidental touch.
  • the fingerprint sensor 180H is used to collect fingerprints.
  • the electronic device 100 can use the collected fingerprint characteristics to implement fingerprint unlocking, access to application locks, take pictures with fingerprints, answer incoming calls with fingerprints, and the like.
  • the temperature sensor 180J is used to detect temperature.
  • the electronic device 100 uses the temperature detected by the temperature sensor 180J to implement a temperature treatment strategy. For example, when the temperature reported by the temperature sensor 180J exceeds the threshold, the electronic device 100 may reduce the performance of the processor located near the temperature sensor 180J, so as to reduce power consumption and implement thermal protection.
  • when the temperature is lower than another threshold, the electronic device 100 heats the battery 142 to prevent the electronic device 100 from being shut down abnormally due to the low temperature.
  • the electronic device 100 boosts the output voltage of the battery 142 to avoid abnormal shutdown caused by low temperature.
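  • The following is an illustrative sketch of such a temperature policy (the threshold values and action names are assumptions, not values from this application):

```python
# Hypothetical thermal policy sketch mirroring the behaviour described above.
HIGH_TEMP_THRESHOLD = 45.0   # degrees Celsius, assumed
LOW_TEMP_THRESHOLD = 0.0     # degrees Celsius, assumed

def apply_temperature_policy(reported_temp: float) -> str:
    if reported_temp > HIGH_TEMP_THRESHOLD:
        return "reduce_processor_performance"    # lower power consumption, thermal protection
    if reported_temp < LOW_TEMP_THRESHOLD:
        return "heat_battery_and_boost_output"   # avoid abnormal low-temperature shutdown
    return "no_action"

print(apply_temperature_policy(50.0))
print(apply_temperature_policy(-5.0))
```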
  • the touch sensor 180K is also called “touch device”.
  • the touch sensor 180K can be disposed on the display screen 194, and the touch sensor 180K and the display screen 194 form a touch screen, also called a “touch screen”.
  • the touch sensor 180K is used to detect a touch operation on or near it.
  • the touch sensor can pass the detected touch operation to the application processor to determine the type of touch event.
  • Visual output related to the touch operation can be provided through the display screen 194 .
  • the touch sensor 180K may also be disposed on the surface of the electronic device 100 , which is different from the position of the display screen 194 .
  • the bone conduction sensor 180M can acquire vibration signals. In some embodiments, the bone conduction sensor 180M can acquire the vibration signal of the vibrating bone mass of the human voice. The bone conduction sensor 180M can also contact the human pulse and receive the blood pressure beating signal. In some embodiments, the bone conduction sensor 180M can also be disposed in the earphone, combined into a bone conduction earphone.
  • the audio module 170 can analyze the voice signal based on the vibration signal of the vibrating bone mass of the vocal part acquired by the bone conduction sensor 180M, so as to realize the voice function.
  • the application processor can analyze the heart rate information based on the blood pressure beating signal acquired by the bone conduction sensor 180M, so as to realize the heart rate detection function.
  • the keys 190 include a power key, a volume key and the like.
  • the key 190 may be a mechanical key. It can also be a touch button.
  • the electronic device 100 can receive key input and generate key signal input related to user settings and function control of the electronic device 100 .
  • the motor 191 can generate a vibrating reminder.
  • the motor 191 can be used for incoming call vibration prompts, and can also be used for touch vibration feedback.
  • touch operations applied to different applications may correspond to different vibration feedback effects.
  • the motor 191 may also correspond to different vibration feedback effects for touch operations acting on different areas of the display screen 194 .
  • Touch operations in different application scenarios (for example, time reminders, receiving messages, alarm clocks, and games) may also correspond to different vibration feedback effects.
  • the touch vibration feedback effect can also support customization.
  • the indicator 192 can be an indicator light, and can be used to indicate charging status, power change, and can also be used to indicate messages, missed calls, notifications, and the like.
  • an operating system runs on top of the above components.
  • For example, the iOS operating system developed by Apple, the Android open-source operating system developed by Google, or the Windows operating system developed by Microsoft.
  • Applications can be installed and run on this operating system.
  • the operating system of the electronic device 100 may adopt a layered architecture, an event-driven architecture, a micro-kernel architecture, a micro-service architecture, or a cloud architecture or other architectures.
  • the embodiment of the present application takes the Android system with layered architecture as an example to illustrate the software structure of the electronic device 100.
  • FIG. 3 is a block diagram of the software structure of the electronic device 100 according to the embodiment of the present application.
  • the layered architecture divides the software into several layers, and each layer has a clear role and division of labor. Layers communicate through software interfaces.
  • the Android system is divided into four layers, which are, from top to bottom, the application layer, the application framework layer, the Android runtime and system libraries, and the kernel layer.
  • the application layer can consist of a series of application packages.
  • the application package may include applications such as camera, gallery, calendar, call, map, navigation, WLAN, Bluetooth, music, video, and short message.
  • the camera application can access the camera interface management service provided by the application framework layer.
  • the application framework layer provides an application programming interface (application programming interface, API) and a programming framework for applications in the application layer.
  • the application framework layer includes some predefined functions. As shown in Figure 3, the application framework layer can include window manager, content provider, view system, phone manager, resource manager, notification manager, etc.
  • a window manager is used to manage window programs.
  • the window manager can get the size of the display screen, determine whether there is a status bar, lock the screen, capture the screen, etc.
  • Content providers are used to store and retrieve data and make it accessible to applications.
  • Said data may include video, images, audio, calls made and received, browsing history and bookmarks, phonebook, etc.
  • the view system includes visual controls, such as controls for displaying text, controls for displaying pictures, and so on.
  • the view system can be used to build applications.
  • a display interface can consist of one or more views.
  • a display interface including a text message notification icon may include a view for displaying text and a view for displaying pictures.
  • the phone manager is used to provide communication functions of the electronic device 100 . For example, the management of call status (including connected, hung up, etc.).
  • the resource manager provides various resources for the application, such as localized strings, icons, pictures, layout files, video files, and so on.
  • the notification manager enables the application to display notification information in the status bar, which can be used to convey notification-type messages, and can automatically disappear after a short stay without user interaction.
  • the notification manager is used to notify the download completion, message reminder, etc.
  • the notification manager can also present a notification in the top status bar of the system in the form of a chart or scroll-bar text, such as a notification of an application running in the background, or present a notification on the screen in the form of a dialog window.
  • For example, text information is prompted in the status bar, a prompt sound is issued, the electronic device vibrates, the indicator light flashes, and so on.
  • the Android Runtime includes core library and virtual machine. The Android runtime is responsible for the scheduling and management of the Android system.
  • the core library consists of two parts: one part is the functional functions that the Java language needs to call, and the other part is the core library of Android.
  • the application layer and the application framework layer run in virtual machines.
  • the virtual machine executes the java files of the application program layer and the application program framework layer as binary files.
  • the virtual machine is used to perform functions such as object life cycle management, stack management, thread management, security and exception management, and garbage collection.
  • a system library can include multiple function modules. For example: surface manager (surface manager), media library (Media Libraries), 3D graphics processing library (eg: OpenGL ES), 2D graphics engine (eg: SGL), etc.
  • the surface manager is used to manage the display subsystem and provides the fusion of 2D and 3D layers for multiple applications.
  • the media library supports playback and recording of various commonly used audio and video formats, as well as still image files, etc.
  • the media library can support a variety of audio and video encoding formats, such as: moving picture experts group (MPEG) 4, H.264, MP3, advanced audio coding (AAC), adaptive multi-rate (AMR), joint photographic experts group (JPEG), portable network graphics (PNG), etc.
  • the 3D graphics processing library is used to implement 3D graphics drawing, image rendering, compositing, and layer processing, etc.
  • a two-dimensional (2dimensions, 2D) graphics engine is a graphics engine for 2D graphics.
  • the kernel layer is the layer between hardware and software.
  • the kernel layer includes at least a display driver, a camera driver, an audio driver, and a sensor driver.
  • the method for playing a video provided in the embodiment of the present application can be applied to a scene where a user uses an electronic device to play a video.
  • the video playing method provided in the embodiment of the present application may also be applied to a scene where a video server preprocesses a video.
  • the video server can configure an adaptive playback speed for the video it provides, and provide the configured adaptive playback speed when other devices acquire the video, so that other devices can choose to play the acquired video at the adaptive playback speed.
  • the workflow of the software and hardware of the electronic device 100 will be exemplarily described below in conjunction with FIG. 2 and the scene where the user uses the electronic device to play a video.
  • the touch sensor 180K of the electronic device 100 receives the user's touch operation on the playback speed "2.0X" and reports it to the processor 110, so that the processor 110, in response to the above touch operation, causes the video currently played by the electronic device 100 to be displayed on the display screen 194 at a frame rate twice the original playback frame rate.
  • In this example, the user selects a fixed multiple of the frame rate for playing the video through the touch operation, that is, the aforementioned constant speed.
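  • As a simple arithmetic sketch of this constant-speed behaviour (the 25 fps source rate below is an assumed example, not a value from this application), playing at a constant multiple N means displaying frames at N times the original frame rate:

```python
# Constant-speed playback arithmetic: the display frame interval shrinks by the multiple.
original_fps = 25.0
speed_multiple = 2.0                                  # the "2.0X" selected by the user
display_fps = original_fps * speed_multiple           # 50 frames per second
frame_interval_ms = 1000.0 / display_fps              # 20 ms between frames instead of 40 ms
print(display_fps, frame_interval_ms)
```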
  • the constant-speed function may cause dizziness and shrill voices for fast-paced clips in the video.
  • For slow-paced clips, it may not meet the user's viewing-speed requirements, so users need to switch the playback speed manually and frequently to meet their own needs. As a result, many adaptive video speed-change schemes have emerged.
  • the video speed-up solution based on big data statistics requires a large amount of historical user data as support, and it is not possible to cold-start adaptive speed change for newly launched videos.
  • the double-speed video playback solution based on picture content has certain application value for specific scenes that focus on picture information (such as security or sports), but its application value is not high for audio-visual scenes, where there are information-intake and perception requirements for both pictures and voices.
  • the multi-speed video playback solution based on voice speed only adaptively adjusts the video speed ratio according to the fastest speech speed that humans can understand, and does not consider the visual experience of the pictures in the video.
  • the double-speed video playback solution based on voice speed and picture quality considers that the clips with too loud sound and the clips with large picture shaking in the video are secondary and can be accelerated quickly, while other clips can be accelerated slowly.
  • This solution is only applicable when the voice and pictures in the clips skipped at high speed carry almost no information, so its application scenarios are very limited. For most film and television works, the audio and picture quality is high, and this solution cannot change the speed effectively.
  • In the above solutions, the final playback speed is determined entirely by the reference information (the video image content, the voice speed, or both the picture content and the voice speed) and does not consider the playback speed set by the user; therefore, the user's viewing experience needs to be improved.
  • the present application provides a method for playing a video, which specifically includes: determining a second playback speed for playing the video based on the first playback speed related to the user playback setting and the image and/or voice information of the video.
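  • As a minimal, hypothetical sketch of this idea (the combination rule, function name, and parameter values below are assumptions for illustration only, not the exact formula of this application), the final speed can be derived from the user-related speed and content-derived factors:

```python
# Hypothetical sketch: derive a final (second) playback speed from the user-related first
# playback speed and factors derived from the video's image/voice information.
def second_playback_speed(first_speed: float,
                          image_suggested_factor: float,
                          voice_max_speed: float) -> float:
    candidate = first_speed * image_suggested_factor  # scale the user's choice by the image content
    return min(candidate, voice_max_speed)            # never exceed the speech-intelligibility limit

print(second_playback_speed(first_speed=2.0, image_suggested_factor=1.2, voice_max_speed=2.2))
```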
  • the method for playing video provided in the embodiment of the present application involves video processing.
  • data processing methods such as data training, machine learning, and deep learning can be applied to perform symbolized and formalized intelligent information modeling, extraction, preprocessing, and training on the training data (such as the first information in the video), to finally obtain a trained video understanding network. In addition, the method for playing video provided in the embodiment of the present application can use the above trained video understanding network (a video image understanding network or a video speech understanding network): the input data (such as the video to be played in this application) is input into the trained video understanding network to obtain the output data (the third playback speed corresponding to the first information).
  • the training method of the video image understanding network or video speech understanding network and the video playback method provided in the embodiment of the present application are inventions based on the same idea, and can also be understood as two parts of one system, or as two stages of an overall process, such as a model training stage and a model application stage.
  • Neural network is a machine learning model, which is a machine learning technology that simulates the neural network of the human brain to achieve artificial intelligence.
  • the input and output of the neural network can be configured according to actual needs, and the neural network can be trained through sample data so that the error between its output and the real output corresponding to the sample data is minimized.
  • a neural network can be composed of neural units, and a neural unit can refer to an operation unit that takes x_s and an intercept 1 as input, and the output of the operation unit can be: h_{W,b}(x) = f(W^T x) = f(∑_s W_s·x_s + b)
  • where W_s is the weight of x_s, and b is the bias of the neural unit.
  • f is the activation function of the neural unit, which is used to introduce nonlinear characteristics into the neural network to convert the input signal in the neural unit into an output signal. The output signal of this activation function can be used as the input of the next convolutional layer.
  • the activation function may be a sigmoid function.
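  • The following is a small illustrative sketch (the weights and inputs are arbitrary example values) of a single neural unit computing f(∑_s W_s·x_s + b) with a sigmoid activation:

```python
import numpy as np

def neural_unit(x: np.ndarray, w: np.ndarray, b: float) -> float:
    """Single neural unit: h = f(sum_s W_s * x_s + b) with a sigmoid activation f."""
    z = float(np.dot(w, x) + b)
    return 1.0 / (1.0 + np.exp(-z))   # the sigmoid introduces the nonlinearity

print(neural_unit(np.array([0.5, -1.0, 2.0]), np.array([0.1, 0.4, -0.3]), b=0.2))
```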
  • a neural network is a network formed by connecting many of the above-mentioned single neural units, that is, the output of one neural unit can be the input of another neural unit.
  • the input of each neural unit can be connected with the local receptive field of the previous layer to extract the features of the local receptive field.
  • the local receptive field can be an area composed of several neural units.
  • a deep neural network also known as a multilayer neural network
  • DNN can be understood as a neural network with many hidden layers, and there is no special metric for the "many” here.
  • the neural network inside DNN can be divided into three categories: input layer, hidden layer, and output layer.
  • the first layer is the input layer
  • the last layer is the output layer
  • the layers in the middle are all hidden layers.
  • the layers are fully connected, that is, any neuron in the i-th layer must be connected to any neuron in the i+1-th layer.
  • the coefficient from the kth neuron of the (L-1)th layer to the jth neuron of the Lth layer is defined as W_jk^L. It should be noted that the input layer has no W parameter.
  • More hidden layers make the network more capable of describing complex situations in the real world. Theoretically speaking, a model with more parameters is more complex and has a greater "capacity", which means that it can complete more complex learning tasks.
  • Training the deep neural network is the process of learning the weight matrix, and its ultimate goal is to obtain the weight matrix of all layers of the trained deep neural network (the weight matrix formed by the vector W of many layers).
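  • As an illustrative sketch only (the layer sizes and random weights below are arbitrary assumptions), a forward pass through such fully connected layers applies each layer's weight matrix W and bias b in turn:

```python
import numpy as np

def dnn_forward(x: np.ndarray, weights: list, biases: list) -> np.ndarray:
    """Forward pass through fully connected layers; each layer applies its own W and b."""
    a = x
    for W, b in zip(weights, biases):
        a = 1.0 / (1.0 + np.exp(-(W @ a + b)))   # sigmoid activation per layer
    return a

# Two hidden layers of width 4 and an output layer of width 1 (shapes chosen arbitrarily).
rng = np.random.default_rng(0)
Ws = [rng.normal(size=(4, 3)), rng.normal(size=(4, 4)), rng.normal(size=(1, 4))]
bs = [np.zeros(4), np.zeros(4), np.zeros(1)]
print(dnn_forward(np.array([0.2, -0.1, 0.7]), Ws, bs))
```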
  • CNN is a deep neural network with convolutional structure.
  • a convolutional neural network consists of a feature extractor consisting of a convolutional layer and a subsampling layer.
  • the feature extractor can be seen as a filter, and the convolution process can be seen as using a trainable filter to convolve with an input image or convolutional feature map.
  • the convolutional layer refers to the neuron layer that performs convolution processing on the input signal in the convolutional neural network.
  • a neuron can only be connected to some adjacent neurons.
  • a convolutional layer usually contains several feature planes, and each feature plane can be composed of some rectangularly arranged neural units.
  • Neural units of the same feature plane share weights, and the shared weights here are convolution kernels.
  • Shared weights can be understood as a way to extract image information that is independent of location. The underlying principle is that the statistical information of a certain part of the image is the same as that of other parts. That means that the image information learned in one part can also be used in another part. So for all positions on the image, the same learned image information can be used.
  • multiple convolution kernels can be used to extract different image information. Generally, the more the number of convolution kernels, the richer the image information reflected by the convolution operation.
  • the convolution kernel can be initialized in the form of a matrix of random size, and the convolution kernel can obtain reasonable weights through learning during the training process of the convolutional neural network.
  • the direct benefit of sharing weights is to reduce the connections between the layers of the convolutional neural network, while reducing the risk of overfitting.
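  • As an illustrative sketch of the shared-weight convolution described above (the kernel values and input image are arbitrary), the same weight matrix is reused at every position of the input image:

```python
import numpy as np

def conv2d_single_kernel(image: np.ndarray, kernel: np.ndarray, stride: int = 1) -> np.ndarray:
    """Valid 2D convolution with one shared weight matrix (kernel) slid across the image."""
    kh, kw = kernel.shape
    oh = (image.shape[0] - kh) // stride + 1
    ow = (image.shape[1] - kw) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * kernel)   # the same weights are reused at every position
    return out

edge_kernel = np.array([[1.0, 0.0, -1.0]] * 3)                           # simple vertical-edge extractor
print(conv2d_single_kernel(np.random.rand(6, 6), edge_kernel).shape)      # (4, 4)
```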
  • Recurrent neural networks are used to process sequence data.
  • In a traditional neural network model, the layers are fully connected, while the nodes within each layer are unconnected.
  • Although this ordinary neural network solves many problems, it is still powerless for many others. For example, to predict the next word in a sentence, the previous words are generally needed, because the preceding and following words in a sentence are not independent. RNN is called a recurrent neural network because the current output of a sequence is also related to the previous outputs.
  • RNN can process sequence data of any length.
  • the training of RNN is the same as that of traditional CNN or DNN.
  • the error back-propagation algorithm is also used, but with one difference: if the RNN is unrolled, parameters such as W are shared across time steps, which is not the case in the traditional neural networks described above.
  • the output of each step depends not only on the network of the current step, but also depends on the state of the previous several steps of the network.
  • the learning algorithm is the time-based back-propagation algorithm, namely back-propagation through time (BPTT).
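  • The following is a minimal illustrative sketch (arbitrary sizes and random parameters, not from this application) showing that the same parameters are shared by every time step and that each hidden state depends on the previous one:

```python
import numpy as np

def rnn_forward(xs: list, Wxh: np.ndarray, Whh: np.ndarray, b: np.ndarray) -> list:
    """Unrolled RNN: Wxh, Whh, and b are shared by every time step, and each hidden
    state depends on the current input and the previous state."""
    h = np.zeros(Whh.shape[0])
    states = []
    for x in xs:
        h = np.tanh(Wxh @ x + Whh @ h + b)
        states.append(h)
    return states

rng = np.random.default_rng(1)
seq = [rng.normal(size=3) for _ in range(5)]          # a length-5 input sequence
hs = rnn_forward(seq, rng.normal(size=(4, 3)), rng.normal(size=(4, 4)), np.zeros(4))
print(len(hs), hs[-1].shape)
```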
  • an embodiment of the present invention provides a system architecture 500 .
  • the data collection device 560 is used to collect training data.
  • the training data includes the first information of the video (the first information includes the image information of the video and/or the voice information of the video). The data collection device 560 stores the training data in the database 530, and the training device 520 obtains the target model/rule 501 through training based on the training data maintained in the database 530. The target model/rule 501 may be the video understanding network described in the embodiment of the present application: the video to be played is input into the target model/rule 501, and the third playback speed corresponding to the first information of the video to be played can be obtained, where the third playback speed describes the playback magnification of each frame in the video to be played.
  • the video understanding network is obtained through training.
  • the training data maintained in the database 530 may not all be collected by the data collection device 560, but may also be received from other devices.
  • the training device 520 does not necessarily perform the training of the target model/rule 501 entirely based on the training data maintained in the database 530; it may also obtain training data from the cloud or other places for model training, and the above description should not be regarded as a limitation on the embodiments of this application.
  • the target model/rule 501 trained by the training device 520 can be applied to different systems or devices, such as the execution device 510 shown in FIG. 4, which may be an electronic device such as a mobile phone terminal, a tablet computer, a laptop, an AR/VR device, or a vehicle-mounted terminal, or may be a server or a cloud.
  • the execution device 510 is configured with an I/O interface 512 for data interaction with external devices.
  • the user can input data to the I/O interface 512 through the client device 540.
  • the input data described in the embodiment of this application may include the video to be played.
  • the execution device 510 can call data, code, and the like stored in the data storage system 550.
  • the I/O interface 512 returns the processing result, such as the obtained second playback speed of the video to be played, to the client device 540, and the client device 540 plays the video to be played according to the second playback speed, providing the user with a viewing experience in which the information is complete and the audio and picture are comfortable.
  • the training device 520 can generate corresponding target models/rules 501 based on different training data for different first information of videos, and the corresponding target models/rules 501 can be used to obtain the third playback speeds corresponding to the different first information.
  • the user can manually specify the input data, and the manual specification can be operated through the interface provided by the I/O interface 512 .
  • the client device 540 can automatically send the input data to the I/O interface 512 . If the client device 540 is required to automatically send the input data to obtain the user's authorization, the user can set the corresponding authority in the client device 540 .
  • the client device 540 can also be used as a data collection terminal, collecting input data from the input I/O interface 512 and output results from the output I/O interface 512 as shown in FIG. 4 as new sample data, and storing them in the database 530 .
  • the I/O interface 512 may also directly store the input data of the I/O interface 512 and the output results of the I/O interface 512 shown in Figure 4 into the database 530 as new sample data.
  • FIG. 4 is only a schematic diagram of a system architecture provided by an embodiment of the present invention, and the positional relationship between devices, devices, modules, etc. shown in the figure does not constitute any limitation.
  • in FIG. 4, the data storage system 550 is an external memory relative to the execution device 510; in other cases, the data storage system 550 may also be placed in the execution device 510.
  • the executing device 510 and the client device 540 may be deployed centrally as one device.
  • the method and device provided by the embodiment of the present application can also be used to expand the training database: as shown in FIG. 4, the data processed by the execution device 510 can be returned to the database 530, so that the training data maintained by the database 530 is more abundant, thereby providing richer training data for the training work of the training device 520.
  • a target model/rule 501 is trained according to a training device 520 , and the target model/rule 501 may be a video understanding network in this embodiment of the application.
  • the video understanding network provided in the embodiment of the present application may be a convolutional neural network, a recurrent neural network, or others.
  • for the image information of the video, the function of the video image understanding network is to determine whether each image satisfies a condition.
  • the video image understanding network predicts whether each frame of image satisfies the condition, configures a smaller theoretical playback magnification for images that meet the condition and a larger theoretical playback magnification for images that do not meet the condition, and outputs the playback magnification of each frame of image as the third playback speed corresponding to the image information of the video to be played.
  • the specific value of the playback magnification configured according to whether the condition is satisfied can be configured according to actual needs, and is not limited in this embodiment of the present application.
  • the function of the video image understanding network is to determine the degree to which the image satisfies a condition for video image information.
  • the video image understanding network predicts the degree to which each frame of image satisfies the condition, configures a smaller playback magnification for images that fully meet the condition, a larger playback magnification for images that do not meet the condition at all, and an intermediate playback magnification for images that partially meet the condition, and outputs the playback magnification of each frame of image as the third playback speed corresponding to the image information of the video to be played.
  • the specific values of the playback magnification configured in different degrees that meet the conditions can be configured according to actual needs, which is not limited in this embodiment of the present application.
  • the function of the video image understanding network is to predict the moving speed of the object in the image according to the image information.
  • the video image understanding network predicts the moving speed of the target in each frame of image, configures a smaller playback magnification for images in which the target moves faster and a larger playback magnification for images in which the target is relatively static, and outputs the playback magnification of each frame of image as the third playback speed corresponding to the image information of the video to be played.
  • the specific value of the playback magnification configured at different speeds for the movement of the object in the image can be configured according to actual needs, which is not limited in this embodiment of the present application.
  • the third playback speed corresponding to the image information may be a relative suggestion for the playback magnification between image frames, and does not constrain the final variable speed magnification.
  • the target in the image may be the fastest moving target in the image, or the target in the image may be a target in the central area of the image, and the embodiment of the present application does not limit the target in the image.
  • the central area of the image may be configured according to actual requirements, which is not limited in this embodiment of the present application.
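  • As a purely illustrative sketch (the motion scores and magnification values below are assumptions, not values from this application), the per-frame image information can be turned into a relative playback-magnification suggestion in which fast-moving frames receive a smaller magnification and nearly static frames a larger one:

```python
# Hypothetical mapping from a per-frame motion score to a relative playback-magnification
# suggestion (the "third playback speed" derived from image information).
def image_third_speed(motion_scores: list, min_mag: float = 1.0, max_mag: float = 2.0) -> list:
    # score in [0, 1]: 0 = nearly static picture, 1 = fast-moving target
    return [max_mag - (max_mag - min_mag) * s for s in motion_scores]

print(image_third_speed([0.9, 0.5, 0.0]))   # e.g. [1.1, 1.5, 2.0]
```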
  • the function of the video-speech understanding network is to predict the speech rate for the speech information of the video.
  • the video speech understanding network predicts the speech rate in the speech frame corresponding to each image frame, obtains the highest speech rate that humans can comfortably tolerate according to statistics, calculates the highest playable magnification corresponding to each frame, and outputs the highest playback magnification of each frame of speech as the third playback speed corresponding to the voice information of the video to be played.
  • the third playback speed corresponding to the voice information can be an absolute limit on the playback magnification, indicating the highest recommended value for the final playback speed; if the playback magnification exceeds this value, the viewing experience will deteriorate.
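  • The following illustrative sketch (the comfort-rate value and per-frame rates are assumptions, not values from this application) shows the idea of computing, for each frame, the highest playable magnification as the ratio of a statistical comfort limit on speech rate to the detected speech rate:

```python
# Hypothetical speech-rate cap: the highest playable magnification per frame.
MAX_COMFORTABLE_RATE = 6.0   # syllables per second a listener can still follow (assumed)

def speech_third_speed(rates: list) -> list:
    caps = []
    for r in rates:                       # detected speech rate per frame, syllables/second
        caps.append(MAX_COMFORTABLE_RATE / r if r > 0 else float("inf"))  # silence is unconstrained
    return caps

print(speech_third_speed([3.0, 4.5, 0.0]))   # e.g. [2.0, 1.33..., inf]
```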
  • the convolutional neural network is a deep neural network with a convolutional structure, and it is a deep learning architecture.
  • the deep learning architecture refers to performing multiple levels of learning at different abstraction levels through machine learning algorithms.
  • CNN is a feed-forward artificial neural network in which individual neurons can respond to images input into it.
  • a convolutional neural network (CNN) 600 may include an input layer 610 , a convolutional layer/pooling layer 620 (where the pooling layer is optional), and a neural network layer 630 .
  • the convolutional layer/pooling layer 620 may include layers 621-626. For example, in one implementation, layer 621 is a convolutional layer, layer 622 is a pooling layer, layer 623 is a convolutional layer, layer 624 is a pooling layer, layer 625 is a convolutional layer, and layer 626 is a pooling layer; in another implementation, layers 621 and 622 are convolutional layers, layer 623 is a pooling layer, layers 624 and 625 are convolutional layers, and layer 626 is a pooling layer. That is, the output of a convolutional layer can be used as the input of a subsequent pooling layer, or as the input of another convolutional layer to continue the convolution operation.
  • the convolution layer 621 can include many convolution operators, which are also called kernels, and their role in image processing is equivalent to a filter that extracts specific information from the input image matrix.
  • the convolution operator can essentially be a weight matrix, which is usually predefined. During the convolution operation on the image, the weight matrix is usually moved along the horizontal direction on the input image one pixel after another (or two pixels after two pixels, depending on the value of the stride) to complete the work of extracting specific features from the image.
  • the size of the weight matrix should be related to the size of the image. It should be noted that the depth dimension of the weight matrix is the same as the depth dimension of the input image.
  • the weight matrix extends to the entire depth of the input image. Therefore, convolution with a single weight matrix produces a convolutional output with a single depth dimension, but in most cases, instead of using a single weight matrix, multiple weight matrices of the same size (rows × columns), that is, multiple matrices of the same shape, are applied.
  • the output of each weight matrix is stacked to form the depth dimension of the convolution image, where the dimension can be understood as determined by the "multiple" mentioned above.
  • Different weight matrices can be used to extract different features in the image. For example, one weight matrix is used to extract image edge information, another weight matrix is used to extract specific colors of the image, and another weight matrix is used to filter unwanted noise in the image.
  • the multiple weight matrices have the same size (row ⁇ column), and the feature maps extracted by the multiple weight matrices of the same size are also of the same size, and then the extracted multiple feature maps of the same size are combined to form the convolution operation. output.
  • weight values in these weight matrices need to be obtained through a large amount of training in practical applications, and each weight matrix formed by the trained weight values can be used to extract information from the input image, so that the convolutional neural network 600 can make correct predictions.
  • the initial convolutional layer (such as 621) often extracts more general features, which can also be referred to as low-level features;
  • the features extracted by the later convolutional layers (such as 626) become more and more complex, such as features such as high-level semantics, and features with higher semantics are more suitable for the problem to be solved.
  • a convolutional layer may be followed by one pooling layer, or multiple convolutional layers may be followed by one or more pooling layers.
  • the sole purpose of pooling layers is to reduce the spatial size of the image.
  • the pooling layer may include an average pooling operator and/or a maximum pooling operator for sampling an input image to obtain an image of a smaller size.
  • the average pooling operator can calculate the pixel values in the image within a specific range to generate an average value as the result of average pooling.
  • the maximum pooling operator can take the pixel with the largest value within a specific range as the result of maximum pooling. Also, just like the size of the weight matrix used in the convolutional layer should be related to the size of the image, the operators in the pooling layer should also be related to the size of the image.
  • the size of the image output after being processed by the pooling layer may be smaller than the size of the image input to the pooling layer, and each pixel in the image output by the pooling layer represents the average or maximum value of the corresponding sub-region of the image input to the pooling layer.
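  • As an illustrative sketch of the pooling operators described above (the sizes and values are arbitrary), each output pixel is the maximum or the average of the corresponding sub-region of the input image:

```python
import numpy as np

def pool2d(x: np.ndarray, size: int = 2, mode: str = "max") -> np.ndarray:
    """Non-overlapping pooling that shrinks the spatial size; each output pixel is the
    max or average of the corresponding sub-region of the input."""
    h, w = x.shape[0] // size, x.shape[1] // size
    blocks = x[:h * size, :w * size].reshape(h, size, w, size)
    return blocks.max(axis=(1, 3)) if mode == "max" else blocks.mean(axis=(1, 3))

img = np.arange(16, dtype=float).reshape(4, 4)
print(pool2d(img, 2, "max"))    # 2x2 output of maxima
print(pool2d(img, 2, "avg"))    # 2x2 output of averages
```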
  • After being processed by the convolutional layer/pooling layer 620, the convolutional neural network 600 is still not able to output the required output information, because, as mentioned earlier, the convolutional layer/pooling layer 620 only extracts features and reduces the parameters brought by the input image. To generate the final output information (the required class information or other relevant information), the convolutional neural network 600 uses the neural network layer 630 to generate one output or a group of outputs whose number equals the number of required classes. Therefore, the neural network layer 630 may include multiple hidden layers (631, 632 to 63n as shown in FIG. 5) and an output layer 640, and the parameters contained in the multiple hidden layers may be pre-trained based on related training data of a specific task type; for example, the task type may include image recognition, image classification, image super-resolution reconstruction, and so on.
  • the output layer 640 has a loss function similar to categorical cross-entropy and is specifically used to calculate the prediction error.
  • the convolutional neural network 600 shown in FIG. 5 is only an example of a convolutional neural network, and in a specific application, the convolutional neural network may also exist in the form of other network models.
  • a chip hardware structure provided by the embodiment of the present application is introduced below.
  • FIG. 6 is a chip hardware structure provided by an embodiment of the present invention, and the chip includes a neural network processor (NPU) 70 .
  • the chip can be set in the execution device 510 shown in FIG. 4 to complete the computing work of the computing module 511 .
  • the chip can also be set in the training device 520 shown in FIG. 4 to complete the training work of the training device 520 and output the target model/rule 501 .
  • the algorithms of each layer in the convolutional neural network shown in FIG. 5 can be implemented in the chip shown in FIG. 6 .
  • the NPU 70 is mounted on the main central processing unit (central processing unit, CPU) (Host CPU) as a coprocessor, and the tasks are assigned by the Host CPU.
  • the core part of the NPU is the operation circuit 703; the controller 704 controls the operation circuit 703 to extract data from the memory (the weight memory or the input memory) and perform operations.
  • the operation circuit 703 includes multiple processing units (process engine, PE).
  • arithmetic circuit 703 is a two-dimensional systolic array.
  • the arithmetic circuit 703 may also be a one-dimensional systolic array or other electronic circuits capable of performing mathematical operations such as multiplication and addition.
  • arithmetic circuit 703 is a general-purpose matrix processor.
  • the operation circuit 703 fetches the data corresponding to the matrix B from the weight memory 702, and caches it in each PE in the operation circuit.
  • the operation circuit 703 takes the data of matrix A from the input memory 701 and performs matrix operation with the matrix B, and the obtained partial or final result of the matrix is stored in the accumulator 708 (accumulator).
  • the vector calculation unit 707 can perform further processing on the output of the operation circuit 703, such as vector multiplication, vector addition, exponential operation, logarithmic operation, size comparison and so on.
  • the vector calculation unit 707 can be used for network calculations of non-convolutional/non-fully connected layers (fully connected layers, FC) layers in neural networks, such as pooling (Pooling), batch normalization (Batch Normalization), and local response Normalization (Local Response Normalization), etc.
  • the vector calculation unit 707 can store the processed output vectors to the unified buffer 706.
  • the vector calculation unit 707 may apply a non-linear function to the output of the operation circuit 703, such as a vector of accumulated values, to generate activation values.
  • the vector computation unit 707 generates normalized values, merged values, or both.
  • the vector of processed outputs can be used as an activation input to operational circuitry 703, eg, for use in subsequent layers in a neural network.
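  • The following is a simplified software analogue of this flow (a sketch only; it does not model the actual hardware): the matrix multiplication stands in for the work of the operation circuit and accumulator, and the element-wise non-linear function stands in for the processing of the vector calculation unit:

```python
import numpy as np

def npu_like_layer(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    accumulator = A @ B                  # matrix operation, as done by the operation circuit/accumulator
    return np.maximum(accumulator, 0.0)  # non-linear function applied by the vector calculation unit (ReLU here)

print(npu_like_layer(np.random.rand(2, 3), np.random.rand(3, 4)).shape)   # (2, 4)
```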
  • the operations of each layer in the convolutional neural network shown in FIG. 5 can be executed by the operation circuit 703 or the vector calculation unit 707.
  • the algorithms of the computing module 511 and the training device 520 in FIG. 4 can also be executed by the operation circuit 703 or the vector calculation unit 707.
  • the unified memory 706 is used to store input data and output data.
  • the direct memory access controller (DMAC) 705 transfers the input data in the external memory to the input memory 701 and/or the unified memory 706, stores the weight data in the external memory into the weight memory 702, and stores the data in the unified memory 706 into the external memory.
  • a bus interface unit (bus interface unit, BIU) 710 is configured to implement interaction between the main CPU, DMAC and instruction fetch memory 709 through the bus.
  • An instruction fetch buffer 709 connected to the controller 704 is used to store instructions used by the controller 704.
  • the controller 704 is configured to invoke instructions cached in the memory 709 to control the operation process of the computing accelerator.
  • the data here may be explanatory data, for example, the input data or output data of each layer in the convolutional neural network shown in FIG. 5, or other input or output data.
  • the unified memory 706, the input memory 701, the weight memory 702, and the instruction fetch memory 709 are all on-chip (On-Chip) memories
  • the external memory is a memory outside the NPU
  • the external memory can be a double data rate synchronous dynamic random access memory (DDR SDRAM), a high bandwidth memory (HBM), or another readable and writable memory.
  • the operations shown in FIG. 4 and FIG. 5 can be completed through cooperation of the main CPU and the NPU.
  • An embodiment of the present application provides a method for playing a video, which is applied to a process in which a user uses an electronic device to play a video, or a process in which a video service preprocesses a video.
  • the method for playing a video may be executed by the execution device 510 in FIG. 4.
  • the client device 540 plays the video to be played according to the second playback speed obtained in S1002.
  • the method for playing a video may be executed by the client device 540 (such as an electronic device) in FIG. 4 .
  • the embodiment of the present application does not limit the execution subject of the method for playing video.
  • the method for playing video is performed by an electronic device, and the units in the electronic device may be deployed on one device or separately deployed on multiple devices, which is not limited in this embodiment of the present application.
  • the video playback solution provided by this application is called adaptive acceleration or AI acceleration, and electronic devices can provide users with ways to enable the adaptive acceleration function in various ways.
  • the adaptive acceleration function can be set to be enabled by the user, and when the user enables the function, the solution provided by this application will be executed for all videos to perform adaptive acceleration.
  • the adaptive acceleration function setting icon 701 is used to enable or disable adaptive acceleration according to user selection.
  • after the adaptive acceleration function is turned on, the user does not need to enable this function separately when playing each video, and can enjoy the acceleration effect of combining the adaptive acceleration function with the playback rate he chooses.
  • the adaptive acceleration function may be enabled by default, and the user may disable the adaptive acceleration function in the setting interface shown in FIG. 7 .
  • the adaptive acceleration function may be disabled by default, and the user may enable the adaptive acceleration function in the setting interface shown in FIG. 7 .
  • in the video playback software of the electronic device, the user can operate on the video playback interface to enable or disable the adaptive acceleration function for the currently playing video.
  • When the function is enabled, the solution provided by this application is executed for the currently playing video to perform adaptive acceleration.
  • the video A is about to be played or is being played according to the user's operation.
  • the interface of FIG. 8 provides an intelligent adaptive (AI) acceleration icon 801 , and the icon 801 indicates on and off of the adaptive acceleration function through different colors (or fonts, symbols, background colors, etc.).
  • the user performs a click operation on the icon 801 indicating that the adaptive acceleration function is turned off, the electronic device turns on the adaptive acceleration function in response to the click operation, and the icon 801 is displayed in a color that indicates that adaptive acceleration is on, such as red.
  • This application makes no limitation on how the on and off states of the adaptive acceleration function are displayed. Before or after choosing to enable the adaptive acceleration function, the user can click the double-speed icon 802 in the interface to select the desired playback rate, and the electronic device will use the solution provided by this application to adaptively accelerate and play the video in combination with the characteristics of the video content.
  • the video A is about to be played or is being played according to the user's operation.
  • the user clicks the double-speed icon 901 in the interface of (a) in FIG. 9, and the electronic device displays the interface shown in (b) in FIG. 9 in response to the click operation; the interface includes multiple options, and each option indicates a playback speed.
  • the options indicating a playback speed combined with AI acceleration may also include other options not shown in Figure 9, such as 0.5X+AI, 0.75X+AI, 1.25X+AI, 1.5X+AI, 1.75X+AI, 2.25X+AI, 2.5X+AI, 2.75X+AI, 3.0X+AI, and so on.
  • This application does not limit this. These options are used for selecting a constant magnification and performing adaptive acceleration based on that constant magnification.
  • the electronic device may only display the selection items indicating the playback speed and AI acceleration in (b) in FIG. 9 above in response to the click operation. Selections that only indicate constant acceleration are not displayed.
  • the present application does not make a limitation on how to display the options for indicating the adaptive acceleration function provided by the present application.
  • the electronic device executes the solution provided by this application, and the final playback speed of the video is determined based on the first playback speed selected by the user and the first information.
  • the above-mentioned adaptive acceleration function can be turned on or off by combining the solutions in Figure 7 and Figure 8 (or Figure 9). For example, the user turns on the adaptive acceleration function in the overall setting interface of the software, and then, when playing a specific video, turns off the adaptive acceleration function on the playback interface to disable it for that specific video while still using the adaptive acceleration function for other videos.
  • the method for playing a video may include:
  • the electronic device acquires a first playback speed.
  • the electronic device acquires the first playback speed of the video to be played.
  • the video to be played is a video selected by the user on the electronic device and desired to be played. For example, if a user chooses to play a movie video in a certain online audio-visual playing software in the electronic device, the movie video is the video to be played.
  • the video to be played is any video provided by the video server.
  • the first playback speed is related to user playback settings.
  • the first playback speed may be a desired playback speed selected by the user, or a playback speed accustomed to by the user, or a default playback speed of the user may be obtained.
  • the embodiment of the present application does not specifically limit the manner of acquiring the first playback speed.
  • acquiring the first playback speed may specifically be implemented as: displaying the first interface according to the acquired first operation information.
  • the first interface includes at least two options, one option indicates a playback speed; second operation information is obtained; and the first playback speed is determined according to the second operation information and at least two options.
  • the playback speed indicated by the option selected by the user in the first interface through the second operation information is determined as the first playback speed.
  • the first operation information may be the operation information that triggers the display of the first interface, or the first operation may be the operation information that triggers the selection of the playback speed.
  • the first interface is an interface for selecting the playback speed.
  • the electronic device displays the first interface as shown in (b) in FIG. 11 .
  • the first interface includes a plurality of options (such as 2.0X and 1.5X in (b) in Figure 11, and may also include other options not shown in Figure 11, such as 0.5X, 0.75X, 1.25X, 1.75X, 2.25X, 2.5X, 2.75X, 3.0X, etc.; this application does not limit this).
  • the user performs an operation on an option (for example, 2.0X), and the electronic device determines the playback speed indicated by that option as the first playback speed.
  • acquiring the first playback speed may specifically be implemented as: displaying the first interface according to the acquired first operation information.
  • the first interface includes the first speed; according to the acquired second operation information, display the second interface, wherein the second interface includes the second speed; and determine the first playback speed according to the second speed.
  • the second speed in the second interface triggered by the user through the second operation information is determined as the first playback speed, realizing that the first playback speed is the playback speed selected by the user.
  • the first operation information in this implementation manner may be operation information for invoking a menu
  • the first interface is an interface presenting the current first speed in the current video playback interface.
  • the second operation information is operation information for adjusting the playback speed, and the second operation information may be selecting a second speed from multiple options, or the second operation information may be stepping to determine the second speed.
  • the electronic device is playing video A, and the interface does not include any operation items.
  • the user clicks any position on the screen (the first operation) to call up the operation items, and the electronic device displays, according to the click operation, the first interface shown in (b) in Figure 12, which includes the current playback speed 1201 (the first speed) of the video played by the electronic device.
  • the user clicks on the current playback speed 1201 (second operation), and the electronic device displays the display as shown in (c) in Figure 12 according to the second operation.
  • the second interface shown is the playback speed selection interface.
  • the second interface includes a plurality of options, one option corresponds to a playback speed, and the second interface includes the second speed desired by the user (for example, "2.0X ”), the user performs a click operation on the desired “2.0X”, and the electronic device determines that the first playback speed is the second speed “2.0X” according to the click operation.
  • the second speed desired by the user for example, "2.0X ”
  • the user performs a click operation on the desired “2.0X”
  • the electronic device determines that the first playback speed is the second speed “2.0X” according to the click operation.
  • the user performs a click operation (second operation) on the current playback speed 1201 in the first interface as shown in (b) in FIG.
  • the second operation displays the second interface as shown in (d) in Figure 12, the second interface includes the second speed (such as "2.0X") selected by the user through one or more click operations, the electronic device
  • the first playback speed is determined to be the second speed "2.0X”.
  • the user clicks on the current playback speed 1201 (the second operation), and the electronic device presents the user with the In the second interface shown in (e), the second interface includes a progress bar 1202 for selecting the playback speed. The user drags the progress bar 1202 to select the second speed, and the electronic device determines that the first playback speed is the second speed.
  • obtaining the first playback speed may specifically be implemented as follows: according to the obtained first operation information, stop playing the previous video of the video and start playing the video; according to the playback speed of the previous video, , to determine the first playback speed.
  • the playback speed of the original video is determined as the first playback speed, realizing that the first playback speed is the playback speed that the user is used to.
  • the playback speed of the original video may be the first playback speed of the original video, or may be the second playback speed of the original video.
  • the electronic device is playing video A.
  • the user clicks on the switching icon 1301 or selection icon 1302 for switching to the next video (the first operation), and the electronic device stops playing video A and starts playing video B according to the clicking operation.
  • the electronic device may determine the playback speed when playing video A as the first playback speed when playing video B.
  • the electronic device is playing video A, and the interface also includes multiple peripheral videos.
  • the user performs a click operation (first operation) on the surrounding video D in the interface of FIG. 12 , and the electronic device stops playing video C and starts playing video D according to the click operation.
  • the electronic device may determine the playback speed when playing video C as the first playback speed when playing video D.
  • the first playback speed when the user does not select the first playback speed, the first playback speed may be a default video playback speed.
  • the electronic device acquires first information.
  • the first information is the first information of the video to be played.
  • the first information may include image information of the video and/or voice information of the video.
  • the image information may be a sequence of image frames
  • the voice information may be a sequence of voice frames.
  • the electronic device may extract the video to be played into image frames and voice frames, so as to obtain picture information and voice information of the video to be played.
  • the foregoing first information may further include content that the user is interested in.
  • the content that the user is interested in may include at least one of the following information: character description information of the video, content description information of the video, and content structure information of the video.
  • the character description information of the video is used to indicate the information of the character in the video that the user is interested in, and the character may be an actor or a role played or others.
  • the content description information of the video is used to indicate information about the plot or content in the video that the user is interested in, such as specific scenery, specific actions, and the like.
  • the content structure information of the video is used to indicate the chapters or positions in the video that the user is interested in, such as the chapter number or the content related to a specific chapter in a long video organized by chapters.
  • the electronic device may internally acquire the content that the user of the device is interested in, for example, determine the content that the user is interested in according to the user's viewing history.
  • the electronic device can also acquire the content that the user of the device is interested in from the outside. For example, if the user uses the same account to log in to multiple electronic devices, information such as historical viewing records of multiple electronic devices using the same account will be synchronized; or, The user can manually input the content he is interested in.
  • the electronic device can display several content that the user may be interested in for the user to choose, or the user can input the content he is interested in through text, voice, image, etc.
  • the above-mentioned specific way of obtaining the content of interest to the user is merely an example, and is not limited in this application.
  • the above-mentioned first information may further include first playback mode information of the video to be played, where the first playback mode information is associated with playback size information corresponding to the video.
  • the playback size information corresponding to the video may be used to indicate the display ratio or display size of the video, such as full screen display or small window display.
  • the above-mentioned first information may further include second play mode information of the video to be played, where the second play mode information is associated with the definition information of the video.
  • the definition information of the video may be used to indicate the playback resolution of the video, such as high-definition mode, Blu-ray mode, stream-saving mode, and the like.
  • the electronic device can internally read the content that the user is interested in, the first play mode information, and the second play mode information stored in the device.
  • the foregoing first information may further include motion state information of the electronic device.
  • the motion state information may be used to indicate the moving speed or pose information of the electronic device.
  • an electronic device can determine whether the device is moving or the speed of movement through a gyroscope, and determine the angle of the device through an orientation sensor.
  • the foregoing first information may further include noise intensity information of the electronic device.
  • the noise intensity information of the electronic equipment may be used to indicate the environmental interference degree of the electronic equipment.
  • the device may determine noise intensity information of the electronic device through a sound sensor.
  • the foregoing first information may further include user viewpoint information.
  • the user's viewpoint information may be used to indicate where the user's gaze falls when watching a video, which reflects the user's interest.
  • the electronic device may determine whether the human eye is watching the device by using the line-of-sight estimation/viewpoint estimation technology through the image captured by the camera.
  • the foregoing first information may further include connection status information of the audio playback device.
  • the audio playback device may be an earphone or a speaker.
  • the connection status information of the audio playback device is used to indicate whether the audio playback device is connected. When connected, the user has high sensitivity to video and voice and is not easily disturbed by the external environment; when not connected, the user has low sensitivity to video and voice and is easy to interfered by the external environment.
  • the foregoing first information may further include network status information.
  • the network status information is used to indicate the quality or type of the network that the electronic device is connected to. When the electronic device is connected to a high-quality network, the video playback is smooth, otherwise the video playback is stuck.
  • first information described in S1002 of the embodiment of the present application and the method of obtaining each first information are only examples and do not constitute specific limitations. In practical applications, the first information of S1002 can be configured according to actual needs The content of and the manner of obtaining each first information will not be described one by one in this embodiment of the present application.
  • the electronic device plays the video at a second playback speed, where the second playback speed is obtained based on the first playback speed and the first information.
  • the first duration for playing the video at the first playback speed is different from the second duration for playing the video at the second playback speed.
  • obtaining the second playback speed based on the first playback speed and the first information includes: determining a corresponding third playback speed according to each type of information in the first information; The third playback speed determines the second playback speed.
  • the second playback speed is obtained based on the first playback speed and the first information, including: determining a corresponding third playback speed according to each type of information in the first information; and part of the third playback speed to determine the second playback speed.
  • some of the third playback speeds are selected from all the third playback speeds. This screening can filter out the third playback speed that obviously does not meet the conditions, and can improve the efficiency of determining the second playback speed.
  • the embodiment of the present application does not limit the screening rules.
  • the third playback speed corresponding to one piece of first information may include the theoretical value of the playback magnification or the maximum allowable value of the playback magnification of each frame in the video to be played determined by the first information.
  • the playback magnification of a frame in the second playback speed is less than or equal to the playback magnification of the same frame in any third playback speed including the maximum allowable value of the playback magnification.
  • different degrees of the content in line with the user's interest correspond to different theoretical values of the playback magnification, or different intervals of the target moving speed correspond to different theoretical values of the playback magnification.
  • the higher the degree of conforming to the content that the user is interested in the smaller the corresponding theoretical value of the playback magnification; the faster the moving speed of the target, the smaller the corresponding theoretical value of the playback magnification.
  • the content of the third playback speed corresponding to the first information is also different, and examples are given below to illustrate:
  • the third playback speed corresponding to the image information may include a theoretical value of the playback magnification of each frame in the video to be played determined by the moving speed of the object in the image.
  • the playback magnification of images with fast target movements is low, and the playback magnification of images with slow target movements is high.
  • Specific playback magnifications corresponding to different moving speeds can be configured according to actual needs, which is not limited in this embodiment of the present application.
  • the third playback speed corresponding to the image information includes the third playback speed corresponding to the image information at multiple different playback speeds including the theoretical value of the playback magnification.
  • Example 2 the image information at one playback speed corresponds to a third playback speed including the theoretical value of the playback magnification, which is similar to that in Example 1 and will not be repeated here.
  • Example 2 multiple image information at different playback speeds can be obtained through interpolation, and then according to the description in Example 1, the third playback speed corresponding to multiple image information at different playback speeds can be obtained as The third playback speed corresponding to the image information.
  • the following embodiment one describes the specific implementation of the example 2, and details are not repeated here.
  • the third playback speed corresponding to the voice information may include the maximum allowable value of the playback magnification of each frame in the video to be played determined by the voice speed.
  • the third playback speed corresponding to the voice information may be a set of maximum allowable values of the playback magnification to ensure user experience.
  • the voice information of the video may be input into the voice understanding module to obtain a third playback speed corresponding to the voice information of the video.
  • the speech comprehension module may be the video speech comprehension network described in the foregoing embodiments, of course, it may also be other modules, which are not limited in this embodiment of the present application.
  • the speech rate in the speech frame of each image in the video can be predicted first, the highest human comfort speech rate tolerance value can be obtained according to statistics, and the highest playable multiple speed corresponding to each frame can be calculated.
  • the highest human comfortable speech rate tolerance value divided by the speech rate in the speech frame, minus the preset margin is the highest playable multiple speed corresponding to the speech frame.
  • the preset margin may be configured according to actual requirements, which is not limited in this embodiment of the present application.
  • the preset margin can be 0.
  • the third playback speed corresponding to the first information includes the maximum allowable value of the playback magnification of each frame in the video to be played determined by the external environment information.
  • the first information of the external environment may include any one of motion state information of the electronic device, noise intensity information of the electronic device, user viewpoint information, and connection state information of the audio playback device.
  • the electronic device may determine the third playback speed corresponding to the first information in the external environment information according to the policy corresponding to the first information.
  • the strategy may be the maximum allowable value of the playback magnification to ensure the user's visual experience.
  • the configured policy could be as follows:
  • the maximum allowable value of the playback magnification is lower. allowance.
  • a lower maximum allowed value of the playback magnification can be configured so that the user can watch the video clearly.
  • the user viewpoint information indicates that when the user focuses on the device, the user can watch the video played at a higher speed, so a higher maximum allowable value of the playback magnification can be configured.
  • the user viewpoint information indicates that the user of the device is not focusing on the device. , you can configure a lower maximum allowable value of the playback magnification.
  • the content of the strategy configured when acquiring the third playback speed corresponding to the first information of the external environment can be configured according to actual needs, and the embodiments of the present application are only described as examples and do not constitute specific limitations.
  • the first information of the external environment is fixed information, and the playback magnification in the third playback speed corresponding to the first information of the external environment at this moment may be the same.
  • the third playback speed corresponding to the first information may include the maximum allowable value of the playback magnification of each frame in the video to be played determined by the internal state information.
  • the first information of the internal state of the video may include any one of network state information, first play mode information, and second play mode information.
  • the internal state information can be input into the internal state understanding module to obtain the third playback speed corresponding to the internal state information.
  • the electronic device may determine the third playback speed corresponding to the first information according to the policy corresponding to the first information.
  • the configured policy could be as follows:
  • the user can watch videos played at a higher speed, so a higher maximum allowable value of the playback magnification can be configured; when the device is played externally, in order to ensure that the user can hear clearly, a lower maximum allowable value of the playback magnification can be configured .
  • the content of the policy configured when acquiring the third playback speed corresponding to the first information of the internal state can be configured according to actual needs, and this embodiment of the present application is only described as an example and does not constitute a specific limitation.
  • the first information of the internal state is fixed information, and the playback magnification in the third playback speed corresponding to the first information of the internal state at this moment may be the same.
  • the third playback speed corresponding to the first information may include the theoretical value of the playback magnification of each frame in the video to be played determined by the image content corresponding to the first information.
  • the user-personalized first information may include content that the user is interested in.
  • the higher the degree of satisfying the user's interest the lower the playback magnification.
  • Specific playback magnifications corresponding to different degrees of satisfying user interests can be configured according to actual needs, which is not limited in this embodiment of the present application.
  • the speed of playing the frame related to the content of interest to the user in the video at the second playback speed is not faster than the speed of playing the frame at the first playback speed, so as to realize the slow playback of the content of interest to the user and improve the user's visual experience.
  • the third playback speed corresponding to the personalized first information can achieve the effect that the user's favorite star part is played at a slow speed, while other plots are played at a fast speed.
  • the third playback speed corresponding to the first information may include the maximum allowable value of the playback magnification of each frame in the video to be played determined by the first information.
  • the personalized first information in Example 7 may include the user's age and the like.
  • Example 7 the older the user is, in order to prevent the user from getting dizzy watching the video, configure a lower maximum allowable value of playback magnification.
  • the user's personalized first information may also include whether the user accepts to adjust the variable speed magnification according to information such as the internal state of the device and the external environment state.
  • information such as the internal state of the device and the external environment state.
  • it may be determined according to the user's personalized first information Status, external environment status and other information to adjust the playback magnification.
  • the number of playback magnifications included in the third playback speed corresponding to a certain first information is less than the number of frames of the video to be played, it can be aligned by sampling, interpolation, etc., so that the playback magnifications included in the third playback speed The number is equal to the number of frames of the video to be played.
  • the playback magnifications included in the third playback speed are all effective playback magnifications.
  • the effective playback magnification may be a non-zero playback magnification, or the effective playback magnification may be a playback magnification greater than a threshold. If the acquired third playback speed corresponding to the first information includes an invalid playback magnification, it needs to be processed and updated to a valid playback magnification.
  • the processing can be taking the average value of two adjacent frames, or taking the playback magnification of the previous frame, or taking the playback magnification of the next frame, or taking the average value of the effective playback magnification in the third playback speed, or taking other first
  • the effective playback magnification of the position in the three playback speeds, or other processing methods, are not limited in this embodiment of the present application.
  • the playback magnification of this frame in the third playback speed corresponding to the determined voice information may be 0, and the playback magnification is an invalid playback magnification, which needs to be processed as a valid playback magnification.
  • the unit deployed in the cloud can obtain the third playback speed corresponding to the image information and the third playback speed corresponding to the voice information, and store them in the cloud corresponding to the video to be played, and the electronic device will play the video to be played.
  • the cloud may be the source server of the video to be played or other, which is not limited in this embodiment of the present application.
  • a unit in the electronic device may acquire the third playback speed corresponding to the first information other than image information and voice information, so as to meet real-time requirements.
  • FIG. 15 illustrates a manner of acquiring a third playback speed corresponding to different first information.
  • the third playback speed corresponding to the image information of the video to be played may include a theoretical value sequence of screen change speed playback magnification, a specified sequence of star playback magnification theoretical values, a theoretical value of interest plot playback magnification, or others.
  • the third playback speed corresponding to the voice information of the video to be played may include a theoretical value sequence of playback magnification only for speech speed, a maximum allowable value sequence of playback magnification considering background sound, a theoretical value sequence of playback magnification for interesting lines, or others.
  • the third playback speed corresponding to the image information and the third playback speed corresponding to the voice information can be completed through cloud computing.
  • the third playback speed corresponding to the first information of the internal state may be a sequence of the theoretical value of the playback magnification (the maximum allowable value of the playback magnification), and the third playback speed corresponding to the first information of the external environment may be the theoretical value of the playback magnification (the maximum allowable value of the playback magnification). value) sequence, the third playback speed corresponding to the user-personalized first information may be a sequence of theoretical playback magnification values (maximum allowable value of playback magnification), which may be acquired by the video playback device in real time.
  • the electronic device may determine the second playback speed according to the third playback speed and the first playback speed corresponding to each piece of first information.
  • the second playback speed includes the playback magnification of each frame in the video to be played, and the playback magnification of a frame in the second playback speed is less than or equal to the third value corresponding to any first information, including the maximum allowable value of the playback magnification.
  • the playback magnification of the same frame in the playback speed is less than or equal to the third value corresponding to any first information, including the maximum allowable value of the playback magnification.
  • each piece of first information corresponds to a third playback speed
  • a fusion operation is performed on each third playback speed acquired in S1003 to obtain the second playback speed.
  • the merging operation described in this application may include selecting a playback magnification between the maximum playback magnification and the minimum playback magnification of the same frame among different third playback speeds (participating in the fusion of the third playback speed).
  • the third playback speed participating in the fusion operation includes the maximum allowable value of the playback magnification
  • the above fusion operation may include: if different from the third playback speed (the third playback speed participating in the fusion), the minimum The playback magnification is the maximum allowable value of the playback magnification, and the maximum allowable value of the minimum playback magnification of the frame in the third playback speed participating in the fusion is selected; if it is different from the third playback speed (the third playback speed participating in the fusion), the same frame
  • the minimum playback magnification is the theoretical value of the playback magnification, and the calculation value of the minimum allowable maximum playback magnification value and the minimum playback magnification theoretical value of the frame in the third playback speed participating in the fusion is selected, and the calculated value can be an average value, or Maximum value, or minimum value, or other.
  • the third playback speed participating in the fusion operation does not include the maximum allowable value of the playback magnification, which can be understood as only including the theoretical value of the playback magnification.
  • the above-mentioned fusion operation may include: selecting a different third playback speed (participating in In the fused third playback speed), the calculated value of the maximum playback magnification theoretical value and the minimum playback magnification theoretical value of the same frame, the calculation value can be an average value, or a maximum value, or a minimum value, or other.
  • FIG. 16 illustrates a scene where the third playback speed corresponding to the video image information is fused with the third playback speed corresponding to the voice information.
  • the playback magnification of the voice frame without the voice part is an invalid playback magnification
  • the playback magnification of the voice part is an effective playback magnification
  • the voice frame positions of the non-voice part are fused
  • the playback magnification of is the playback magnification of this position in the third playback speed corresponding to the image information.
  • the difference between the playback duration of the video to be played at the second playback speed determined in S1003 and the playback duration of the video to be played at the first playback speed R0 is value, less than or equal to the threshold value.
  • the threshold can be configured according to actual requirements, which is not limited in this application.
  • each first piece of information corresponds to a third playback speed
  • the above-mentioned third playback speed for determining the second playback speed is called an alternative third playback speed (all third playback speeds or part of the third playback speed Three playback speeds)
  • the alternative third playback speed and the first playback speed determine the second playback speed, which can be specifically implemented as: performing a fusion operation on all alternative third playback speeds, or performing a fusion operation on all the third playback speeds that include the theoretical value of the playback magnification
  • the fusion operation is performed on the third playback speed of the candidate to obtain the fourth playback speed; the fusion operation is carried out to the third playback speed of the candidate including the maximum allowable value of the playback magnification to obtain the fifth playback speed; the fourth playback speed, the fifth playback speed , perform numerical optimization according to the first playback speed R 0 to obtain the second playback speed.
  • a numerical optimization method such as stochastic gradient descent may be used to perform numerical optimization to obtain the second playback speed.
  • the fourth playback speed and the fifth playback speed are numerically optimized according to R 0 to obtain the second playback speed.
  • the fourth playback speed, the fifth playback speed and R 0 are input into the objective function Perform numerical optimization, and use the playback speed that minimizes the objective function as the second playback speed.
  • the objective function is used to describe the degree of satisfaction of the playback speed obtained according to the fourth playback speed and the fifth playback speed to R 0 .
  • the fourth playback speed and the fifth playback speed are input into the objective function, and different playback speeds can be obtained by adjusting preset parameters in the objective function.
  • the third playback speed corresponding to the image information of the video includes the third playback speed corresponding to the image information of the video at multiple different playback speeds including the theoretical value of the playback magnification.
  • Playing speed in S1003, according to the third playing speed and the first playing speed, determine the second playing speed, specifically, it can be implemented as: the third playing speed corresponding to the image information of the video at a plurality of different playing speeds, and the other third playing speed respectively
  • the third playback speed corresponding to the information is fused, or the third playback speed corresponding to the image information of the video at multiple different playback speeds is respectively corresponding to the third playback speed including the theoretical value of the playback magnification corresponding to other first information.
  • the fusion operation is carried out at the speed to obtain a plurality of fourth playback speeds; the fusion operation is performed on each third playback speed including the maximum allowable value of the playback magnification to obtain the fifth playback speed; the fourth playback speed is respectively combined with the fifth playback speed and the
  • the first playback speed R 0 is input to the objective function, and the playback speed that minimizes the objective function is used as the second playback speed.
  • the above objective function can satisfy the following expression:
  • argmin s indicates that the second playback speed S is selected to minimize the function value; ⁇ , ⁇ , and ⁇ are preset parameters, which can be configured according to actual needs, and are not limited in this embodiment of the application.
  • is a custom hyperparameter. The larger the value, the more emphasis is placed on the overall magnification approaching R0 during the optimization process.
  • is a custom hyperparameter. The larger the value, the more emphasis is placed on the smoothness of the curve during the optimization process; parameter, the larger the value, the stricter the limit of the maximum allowable playback magnification on the final result.
  • different playback speeds can be obtained according to the fourth playback speed and the fifth playback speed by adjusting the value of the preset parameter.
  • E speed (S,V) is used to control the low acceleration segment close to the minimum playback speed R min specified by the user, R min can be provided by the user, or can be a default value, which can be 1.
  • S(t) is the playback magnification of frame t in the second playback speed.
  • is a preset parameter, and the smaller the value is, the closer the minimum playback magnification of the optimized final result is to R min .
  • E rate (S,R 0 ) is used to control the overall playback rate close to R 0 .
  • T is the total number of frames in the video to be played.
  • E smooth (S′,n) is used to control the smoothness of the second playback speed
  • n is the smoothing width of the objective function.
  • E A (S, A) is used to control the second playback speed not to exceed the playback magnification of the same frame in the fifth playback speed
  • A(t) is the playback magnification of the tth frame in the fifth playback speed.
  • FIG. 17 illustrates a scene where the third playback speed corresponding to the picture information of the video is fused with the third playback speed corresponding to the voice information.
  • the picture information of the video corresponds to a third playback speed.
  • the playback magnification of the voice frame without the voice part is an invalid playback magnification
  • the playback magnification of the voice part is an effective playback magnification.
  • the third playback speed corresponding to the picture information of the video is fused with the third playback speed corresponding to the voice information to obtain the fusion relative playback speed V (the fourth playback speed); the third playback speed that includes the theoretical value of the playback magnification has only the corresponding voice sequence
  • the third playback speed of which is used as the fusion absolute playback speed A (the fifth playback speed), V, A and R 0 are input into the objective function shown in the above formula 2, and the second playback speed obtained after optimization can be as shown in Figure 17 shown.
  • the second playback speed shown in FIG. 17 the picture at the position of the first segment is faster, and the low magnification in the second playback speed prevents dizziness; the second segment and the third segment ensure that the user can hear the voice clearly through the low magnification.
  • the second playback speed of the final video is determined based on the first playback speed related to the user’s playback device and the first information of the video.
  • the variable speed makes the overall playback time of the video close to the user's needs, but also considers the clear picture of the video, moderate speech speed and other factors that are strongly related to the viewing experience during playback, which improves the user's perception experience of adaptive variable-speed video playback.
  • the user watches two video X and video Y with the same total duration on the electronic device, and selects the same first playback speed, and video X includes many war plots, And video Y is a humanities-related documentary.
  • video X includes content of interest
  • the second playback speed of video X is less than the second playback speed of video Y. , the user will spend more time watching video X than video Y.
  • FIG. 18 illustrates a method for playing a video.
  • image information and voice information in the video are used to determine a second playback speed to play the video.
  • the image information is used to The third playback speed corresponding to the image information is determined based on the moving speed of the target, and the third playback speed corresponding to the image information includes the third playback speed corresponding to the image information at multiple different playback speeds including the theoretical value of the playback magnification .
  • the method for playing the video may specifically include:
  • the image information V is generated into K segments of image frame sequences with different playback speeds, and each segment of the image frame sequence is sequentially sent into the rate prediction module speednet in the manner of a sliding window, and sequentially sends segments of W frames, An image speed prediction sequence of K segment image frame sequence is obtained.
  • the playback speeds of the K segment image frame sequences are respectively X 0 ⁇ X K-1 , and X 0 ⁇ X K-1 can be configured according to actual needs, and this application does not limit their values.
  • the speed prediction module speednet is used to predict the moving speed of the target in each image, and the speed prediction module speednet may be a neural network.
  • the image speed prediction sequence is the output of the speed prediction module speednet, and K segment image frame sequences are input into the speed prediction module speednet to generate K image speed prediction sequences.
  • the image speed prediction sequence includes a scalar prediction value between 0 and 1, which represents the speed prediction of the target in the image frame. For example, 1 means that the target in the image is considered to be moving normally by the algorithm, and other values other than 1 indicate that there is a target in the image. The target is considered by the algorithm to be in fast motion.
  • the K image speed prediction sequences are interpolated into sequences of length F by an interpolation method to achieve sequence length alignment.
  • multiple thresholds are sequentially selected from a threshold set between 0 and 1 (for example, the threshold set can be ⁇ 0.1,...,0.9 ⁇ , and the threshold is used to define whether the fast motion , greater than the threshold is considered fast movement, less than the threshold is considered normal movement.
  • each binary sequence is multiplied by its corresponding playback speed (X 0 ⁇ X K- 1 ).
  • the corresponding maximum value is taken at the same frame position, and the K sequences become one sequence, which is called the playback magnification theoretical value sequence.
  • Each value in the playback magnification theoretical value sequence Represents the maximum possible playback magnification of its corresponding image frame that makes the classification network (speed prediction module speednet) judged as a non-accelerated state (the output value of the speed prediction module speednet is 1).
  • select a plurality of (for example, select 9) different thresholds and perform the above operations respectively, and form a total of multiple (for example, 9) playback magnification theoretical value sequences, and the playback magnification theoretical value sequence can be It is understood as the third playback speed corresponding to the image information in the foregoing embodiments.
  • the speech rate estimation module can be used for subtitle speed statistics.
  • the highest human-comfortable speech rate is placed in the speech rate of each speech frame in the speech rate sequence, and the maximum variable speed playback magnification of each speech frame is obtained, and the maximum variable speed magnification sequence of speech is obtained, and finally the speed is changed.
  • the playback magnification that can change the maximum speed cannot be exceeded.
  • the maximum allowable value sequence of playback magnification obtained in S4-A can be understood as the third playback speed corresponding to the voice information in the foregoing embodiments.
  • the values in the 9 fused relative variable speed sequences can be normalized to 0 to 1 respectively;
  • the maximum allowable value sequence and the playback magnification R 0 specified by the user are brought into the objective function for numerical optimization, and the optimization result that makes the objective function the lowest is selected as the final speed change sequence, that is, the second playback speed described in the foregoing embodiments.
  • the objective function may be the objective function shown in the foregoing formula 2 or other, which is not limited in this embodiment of the present application.
  • the normalization value used in the normalization process may be configured according to actual requirements, which is not limited in this embodiment of the present application.
  • the maximum value in the fused relative variable speed sequence may be selected to normalize the fused relative variable speed sequence.
  • Fig. 19 shows the comparison between the shift curve of the image-based adaptive shifting scheme and the adaptive shifting curve of the scheme of the present application.
  • the speed change curve of the adaptive speed change scheme based only on images has a very high playback magnification, and the voice information is almost lost, while the speed change curve of this application has a lower playback rate.
  • the magnification ensures that the user can hear the lines clearly and the voice is clear.
  • Fragment 2 where the picture shakes faster, the playback magnification of this application is lower than 2x, and the overall look and feel is more soothing and natural than the constant 2x.
  • Fig. 20 shows an adaptive speed change curve of the scheme of the present application.
  • the adaptive speed change curve provided by the application means that the playback magnification of this segment 3 is close to 1.0x, and this A line is inaudible (information lost) when played at constant 2x.
  • Fig. 21 shows an adaptive speed change curve of an application scheme.
  • the application For a scene in a fierce war movie with severe screen shaking, watching it at a constant 2x magnification has produced visual dizziness and discomfort, and tense lines cannot be heard clearly, as shown in the figure
  • the application For segment 4 shown in 21, the application’s playback magnification in each segment included in segment 4 is much lower than 2x, which greatly alleviates the feeling of dizziness and discomfort.
  • the above-mentioned electronic device includes corresponding hardware structures and/or software modules for performing each function.
  • the present application can be implemented in the form of hardware or a combination of hardware and computer software in combination with the units and algorithm steps of each example described in the embodiments disclosed herein. Whether a certain function is executed by hardware or computer software drives hardware depends on the specific application and design constraints of the technical solution. Those skilled in the art may use different methods to implement the described functions for each specific application, but such implementation should not be regarded as exceeding the scope of the present application.
  • the embodiment of the present application can divide the functional modules of the video playing device provided by the present application according to the above method example, for example, each functional module can be divided corresponding to each function, or two or more functions can be integrated into one processing module.
  • the above-mentioned integrated modules can be implemented in the form of hardware or in the form of software function modules. It should be noted that the division of modules in the embodiment of the present application is schematic, and is only a logical function division, and there may be other division methods in actual implementation.
  • FIG. 22 shows a possible structural diagram of the device 220 for determining to play a video involved in the above embodiment.
  • the device 220 for determining to play a video may be a functional module or a chip.
  • the apparatus 220 for playing video may include: a first obtaining unit 2201 , a second obtaining unit 2202 , and a playing unit 2203 .
  • the first obtaining unit 2201 is used to execute the process S1001 in FIG. 10 ;
  • the second obtaining unit 2202 is used to execute the process S1002 in FIG. 10 ;
  • the playing unit 2203 is used to execute the process S1003 in FIG. 10 .
  • all relevant content of each step involved in the above-mentioned method embodiment can be referred to the function description of the corresponding function module, and will not be repeated here.
  • FIG. 23 shows a possible structural diagram of the electronic device 230 involved in the above-mentioned embodiment.
  • the electronic device 230 may include: a processing module 2301 and a communication module 2302 .
  • the processing module 2301 is used to control and manage the actions of the electronic device 230, and the communication module 2302 is used to communicate with other devices.
  • the processing module 2301 is configured to execute any one of the processes S1001 to S1003 in FIG. 10 .
  • the electronic device 230 may also include a storage module 2303 for storing program codes and data of the electronic device 230 .
  • the processing module 2301 may be the processor 110 in the physical structure of the electronic device 100 shown in FIG. 2 , and may be a processor or a controller. For example, it may be a CPU, a general processor, DSP, ASIC, FPGA or other programmable logic devices, transistor logic devices, hardware components or any combination thereof. It can implement or execute the various illustrative logical blocks, modules and circuits described in connection with the present disclosure.
  • the processing module 2301 may also be a combination that implements computing functions, for example, a combination of one or more microprocessors, a combination of a DSP and a microprocessor, and the like.
  • the communication module 2302 can be the mobile communication module 150 or the wireless communication module 160 in the physical structure of the electronic device 100 shown in FIG.
  • the above-mentioned communication interface may realize communication with other devices through the above-mentioned components having the function of sending and receiving.
  • the above-mentioned elements having the function of sending and receiving may be realized by an antenna and/or a radio frequency device.
  • the storage module 2303 may be the internal memory 121 in the physical structure of the electronic device 100 shown in FIG. 2 .
  • the device 220 for playing video or the electronic device 230 provided by the embodiment of the present application can be used to implement the corresponding functions in the methods implemented by the above-mentioned embodiments of the present application.
  • the specific technical details are not disclosed, please refer to the various embodiments of the present application.
  • a computer-readable storage medium on which instructions are stored, and when the instructions are executed, the method for playing a video in the foregoing method embodiments is executed.
  • a computer program product containing instructions is provided, and when the computer program product runs on a computer, the computer executes the method for playing video in the above method embodiments when executed.
  • An embodiment of the present application further provides a chip system, where the chip system includes a processor, configured to implement the technical method of the embodiment of the present invention.
  • the system-on-a-chip further includes a memory for storing necessary program instructions and/or data of the embodiments of the present invention.
  • the chip system further includes a memory, which is used for the processor to call the application program code stored in the memory.
  • the system-on-a-chip may consist of one or more chips, and may also include chips and other discrete devices, which is not specifically limited in this embodiment of the present application.
  • the steps of the methods or algorithms described in connection with the disclosure of this application can be implemented in the form of hardware, or can be implemented in the form of a processor executing software instructions.
  • Software instructions can be composed of corresponding software modules, and software modules can be stored in RAM, flash memory, ROM, erasable programmable read-only memory (erasable programmable ROM, EPROM), electrically erasable programmable read-only memory (electrically EPROM, EEPROM), registers, hard disk, removable hard disk, compact disc read-only (CD-ROM), or any other form of storage medium known in the art.
  • An exemplary storage medium is coupled to the processor such the processor can read information from, and write information to, the storage medium.
  • the storage medium may also be a component of the processor.
  • the processor and storage medium can be located in the ASIC.
  • the ASIC may be located in the core network interface device.
  • the processor and the storage medium may also exist in the core network interface device as discrete components.
  • the memory may be coupled to the processor, for example, the memory may exist independently and be connected to the processor through a bus. Memory can also be integrated with the processor.
  • the memory may be used to store application program codes for executing the technical solutions provided by the embodiments of the present application, and the execution is controlled by the processor.
  • the processor is used to execute the application program code stored in the memory, so as to realize the technical solution provided by the embodiment of the present application.
  • the disclosed devices and methods may be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the modules or units is only a logical function division. In actual implementation, there may be other division methods.
  • multiple units or components can be Incorporation or may be integrated into another device, or some features may be omitted, or not implemented.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be through some interfaces, and the indirect coupling or communication connection of devices or units may be in electrical, mechanical or other forms.
  • the unit described as a separate component may or may not be physically separated, and the component displayed as a unit may be one physical unit or multiple physical units, that is, it may be located in one place, or may be distributed to multiple different places . Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units can be implemented in the form of hardware or in the form of software functional units.
  • the integrated unit is realized in the form of a software function unit and sold or used as an independent product, it can be stored in a readable storage medium.
  • the technical solution of the embodiment of the present application is essentially or the part that contributes to the prior art, or all or part of the technical solution can be embodied in the form of a software product, and the software product is stored in a storage medium Among them, several instructions are included to make a device (which may be a single-chip microcomputer, a chip, etc.) or a processor (processor) execute all or part of the steps of the methods described in the various embodiments of the present application.
  • the aforementioned storage media include: U disk, mobile hard disk, read-only memory (Read-Only Memory, ROM), random access memory (Random Access Memory, RAM), magnetic disk or optical disc and other media that can store program codes. .

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Human Computer Interaction (AREA)
  • User Interface Of Digital Computer (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

本申请公开一种播放视频的方法及装置,涉及视频处理领域,实现自适应变速的播放速度,符合用户的播放设置,提高了自适应变速播放视频的用户观感体验。该方案包括:获取第一播放速度;获取第一信息,第一信息包括视频的图像信息和/或视频的语音信息;以第二播放速度播放该视频,第二播放速度基于第一播放速度和第一信息得到。

Description

一种播放视频的方法及装置
本申请要求于2021年05月31日提交国家知识产权局、申请号为202110604488.9、发明名称为“一种播放视频的方法及装置”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请实施例涉及视频处理领域,尤其涉及一种播放视频的方法及装置。
背景技术
随着电子设备的广泛应用,在电子设备上观看视频(例如影视剧视频、在线教学视频等)已经成为人们日常生活不可或缺的项目。而当前,视频倍速播放已成为许多青年用户的使用习惯,经调查,如图1所示的调研结果,约70%的用户会使用倍速功能。
目前国内外视频播放平台普遍上线了恒定倍速功能,用户在播放界面选取恒定倍速功能并选择期望的播放速度,电子设备则根据用户选择的播放速度播放视频。但恒定倍速功能对于视频中较快节奏的片段可能导致画面变得眩晕、语音变得尖锐,对于较慢节奏的片段可能无法满足用户的观看速度需求,用户需要频繁手动切换播放速度以满足自己的需要。
基于恒定倍速功能的不足,业界提出了众多视频自适应变速的方案,例如基于大数据统计、基于画面内容、基于语音速度或基于语音和画面质量进行视频自适应变速的方案。其中,基于大数据统计的视频倍速方案,需要大量用户历史数据作为支撑,对于新上线的视频,无法冷启动自适应变速。基于视频的图像内容、或语音速度、或画面内容和语音速度作为参考信息的视频倍速播放方案,仅仅根据参考信息确定最终播放视频播放速度,最终的播放速度受到参考信息的绝对影响,没有考虑用户的个性化需求。
因此,当前的视频自适应变速方案仍然需要改进,以同时满足用户的个性化需求。
发明内容
本申请提供的播放视频的方法,实现自适应变速的播放速度,同时考虑用户的播放设置,提高了自适应变速播放视频的用户观感体验。
为达到上述目的,本申请采用如下技术方案:
第一方面,提供一种播放视频的方法,应用于电子设备,该方法可以包括:获取第一播放速度;获取第一信息,第一信息包括视频的图像信息和/或视频的语音信息;以第二播放速度播放该视频,第二播放速度基于第一播放速度和第一信息得到。
通过本申请提供的播放视频的方法,基于用户个性化需求相关的第一播放速度以及第一信息确定最终播放视频的第二播放速度,这样一来,兼顾了视频内容以及用户需求实现自适应变速,在视频整体播放时长靠近用户需求的同时,根据视频的画面内容、音速等信息实现自适应变速,提高了用户的观感体验。
在一种可能的实现方式中,上述第一播放速度与用户的播放设置相关,以满足用 户的个性化需求。
其中,视频的播放速度,可以表示播放视频中每一帧图像的播放倍率或播放时长,也可以表示以该速度播放整个视频所需要的时长与以1倍速播放该视频所需要的时长的比值。以第二播放速度播放一个视频时,第二播放速度可以表示一个速度序列,以对应视频中每一帧的播放速度,不同帧的播放速度可以相同,也可以不同。
在一种可能的实现方式中,采用第一播放速度播放该视频的第一时长,与采用第二播放速度播放该视频的第二时长不同,第一播放速度可以是用户设定的固定速度播放视频的速度,其体现了用户希望在多长的时间内观看这个视频,采用第二播放速度播放该视频时,由于同时考虑了视频的画面内容、语速等信息,播放速度可能会根据视频内容变化,因而播放整个视频的第二时长会与第一时长不同。
在一种可能的实现方式中,第二时长与采用用户指定的播放速度R 0播放该视频的时长的差值,小于或等于门限值,以使得自适应的变速满足了用户对整体播放时长的需求,提高了用户体验。
在一种可能的实现方式中,获取第一播放速度具体可以实现为:根据获取的第一操作信息,显示第一界面。其中,第一界面包括至少两个选择项,一个选择项指示一个播放速度;获取第二操作信息;根据第二操作信息和至少两个选择项,确定第一播放速度。在该实现方式中,将用户通过第二操作信息在第一界面中选取的选择项指示的播放速度,确定为第一播放速度,实现了第一播放速度为用户选择的播放速度。
其中,第一操作信息可以为触发显示第一界面的操作信息,或者,第一操作可以为触发进行播放速度选择的操作信息。第一界面显示至少两个指示不同播放速度的选择项,以使得用户可以在此界面选择播放速度。
在另一种可能的实现方式中,获取第一播放速度具体可以实现为:根据获取的第一操作信息,显示第一界面。其中,第一界面包括第一速度;根据获取的第二操作信息,显示第二界面,其中,第二界面包括第二速度;根据第二速度,确定第一播放速度。在该实现方式中,将用户通过第二操作信息触发的第二界面中的第二速度,确定为第一播放速度,实现了第一播放速度为用户选择的播放速度。
其中,该实现方式中的第一操作信息,可以为调起菜单的操作信息,第一界面为视频当前播放界面中呈现了当前的第一速度的界面。第二操作信息为调整播放速度的操作信息,第二操作信息可以为从多个选项中选取第二速度,或者,第二操作信息可以为步进确定第二速度。
在另一种可能的实现方式中,获取第一播放速度具体可以实现为:根据获取的第一操作信息,停止播放该视频的前一视频,开始播放该视频;根据该前一视频的播放速度,确定第一播放速度。在该实现方式中,将用户通过第一操作信息触发的切换播放目标视频时,原播放视频的播放速度确定为第一播放速度,实现了第一播放速度为用户习惯的播放速度。
其中,原播放视频的播放速度可以为原视频时第一播放速度,或者,也可以为原视频的第二播放速度。
在另一种可能的实现方式中,第二播放速度基于第一播放速度和第一信息得到包括:根据第一信息中的每一种信息确定对应的第三播放速度;根据第一播放速度和所 有第三播放速度,确定第二播放速度。通过多种第一信息,分别确定每一种信息对应的第三播放速度,再根据第三播放速度和第一播放速度确定第二播放速度,使得确定的第二播放速度既符合用户设置也符合待播放视频的各个第一信息上的特征,可以很好的提高用户的观感体验。
在另一种可能的实现方式中,第二播放速度基于第一播放速度和第一信息得到,包括:根据第一信息中的每一种信息确定对应的第三播放速度;根据第一播放速度和部分第三播放速度,确定第二播放速度。通过多种第一信息,分别确定每一种信息对应的第三播放速度,再根据筛选出的部分第三播放速度和第一播放速度确定第二播放速度,使得确定的第二播放速度既符合用户设置也符合待播放视频的各个第一信息上的特征,可以很好的提高用户的观感体验。该筛选可以将明显不符合条件的第三播放速度过滤掉,可以提高确定第二播放速度的效率。
在另一种可能的实现方式中,一个第一信息对应的第三播放速度可以包括该第一信息决定的对待播放视频中每一帧的播放倍率理论值或播放倍率最大允许值。第二播放速度中对一帧的播放倍率小于或等于任一包含播放倍率最大允许值的第三播放速度中对同一帧的播放倍率。通过多个第一信息确定第三播放速度,并且将多个第三播放速度融合,可以使得最终融合得到的第二播放速度体现了每个第一信息在播放倍率中的需求,也可以牵制某个第一信息确定的播放倍率过高而降低其他第一信息的观感体验,为用户提供信息完整、音画舒适的观感体验。
在另一种可能的实现方式中,每个第一信息对应一个第三播放速度,将上述确定第二播放速度的第三播放速度称之为备选第三播放速度(所有第三播放速度或部分第三播放速度),根据备选第三播放速度以及第一播放速度,确定第二播放速度,具体可以实现为:对所有备选第三播放速度进行融合操作,或者,对包括播放倍率理论值的备选第三播放速度进行融合操作,得到第四播放速度;对包括播放倍率最大允许值的备选第三播放速度进行融合操作,得到第五播放速度;将第四播放速度、第五播放速度,按照第一播放速度R 0进行数值优化,得到第二播放速度。该实现方式提供了一种具体的获取第二播放速度的方法,先通过融合操作将第三播放速度进行融合,以提高播放速度中播放倍率理论值以及播放倍率最大允许值的准确性以及有效性,然后再按照用户设置相关的第一播放速度R 0进行数值优化,使得最终得到的第二播放速度不仅可以满足用户的期望变速倍率,也具有高的准确性和有效性。
在另一种可能的实现方式中,将第四播放速度、第五播放速度,按照第一播放速度R 0进行数值优化,得到第二播放速度,具体可以实现为:将第四播放速度、第五播放速度以及R 0,输入目标函数进行数值优化,将使得该目标函数最小的播放速度作为第二播放速度。其中,该目标函数用于描述根据第四播放速度、第五播放速度得到的播放速度对R 0的满足程度。目标函数值越小,根据第四播放速度、第五播放速度得到的播放速度越接近R 0通过目标函数实现数值优化,提高了方案的可行性以及准确度,可以保证确定的第二播放速度最优。
其中,可以通过调整目标函数的预设参数,根据第四播放速度、第五播放速度得到不同的播放速度。
在另一种可能的实现方式中,视频的图像信息对应的第三播放速度,包括视频的 图像信息在多个不同播放速度下对应的包括播放理论值的第三播放速度,根据备选第三播放速度以及第一播放速度,获取第二播放速度,具体可以实现为:将视频的图像信息在多个不同播放速度下对应的第三播放速度,分别与其他第一信息对应的第三播放速度进行融合操作,或者,将视频的图像信息在多个不同播放速度下对应的第三播放速度,分别与其他第一信息对应的包括播放倍率理论值的第三播放速度进行融合操作,得到多个第四播放速度;对包括播放倍率最大允许值的每个第三播放速度进行融合操作,得到第五播放速度;将第四播放速度、分别与第五播放速度以及第一播放速度R 0,输入目标函数,将使得该目标函数最小的播放速度作为第二播放速度。该目标函数用于描述根据第四播放速度、第五播放速度得到的播放速度对R 0的满足程度。目标函数值越小,根据第四播放速度、第五播放速度得到的播放速度越接近R 0。该实现方式提供了一种具体的获取第二播放速度的方法,通过预先配置多个不同播放速度,得到视频的图像信息对应的多个第三播放速度,进而融合得到多个第四播放速度,之后代入目标函数获取最终的第二播放速度,方案实现简单高效,提高了处理效率以及速度。
在另一种可能的实现方式中，上述目标函数可以满足如下表达式：argmin_S E_speed(S,V)+βE_rate(S,R_0)+αE_smooth(S′,n)+δE_A(S,A)。其中，各项的具体表达式在原公开文本中以公式图像给出：
argmin_S表示选取第二播放速度S使函数值最低，α、β、δ为预设参数。
E_speed(S,V)用于控制低加速段接近用户指定的最低播放倍率R_min；该项涉及第四播放速度中第t帧的归一化播放倍率、第二播放速度中第t帧的播放倍率S(t)，以及预设参数γ。
E_rate(S,R_0)用于控制整体变速倍率接近R_0；其中T为待播放视频中的画面总帧数。
E_smooth(S′,n)用于控制第二播放速度的平滑性；其中n为目标函数的平滑宽度。
E_A(S,A)用于控制第二播放速度不超过第五播放速度中相同帧的播放倍率，仅在A(t)>0且S(t)>A(t)时产生惩罚；其中A(t)为第五播放速度中第t帧的播放倍率。
在另一种可能的实现方式中,上述融合操作包括选取不同第三播放速度中,相同帧的最大播放倍率与最小的播放倍率之间的播放倍率,以实现将多个第三播放速度融合为一个播放速度。
在另一种可能的实现方式中,参与融合操作的第三播放速度包括播放倍率最大允许值,上述融合操作包括:若不同第三播放速度中,相同帧的最小的播放倍率为播放倍率最大允许值,选取参与融合的第三播放速度中该帧的最小的播放倍率最大允许值;若不同第三播放速度中,相同帧的最小的播放倍率为播放倍率理论值,选取参与融合的第三播放速度中该帧的最小的播放倍率最大允许值与最小的播放倍率的计算值,该计算值可以为平均值,或者最大值,或者最小值,或者其他。
在另一种可能的实现方式中,参与融合操作的第三播放速度不包括播放倍率最大 允许值,上述融合操作为选取不同第三播放速度中,相同帧的最大播放倍率理论值与最小的播放倍率理论值的计算值,该计算值可以为平均值,或者最大值,或者最小值,或者其他。
在另一种可能的实现方式中,上述第一信息还可以包括用户感兴趣的内容。其中,用户感兴趣的内容可以包括以下信息中的至少一种:视频的人物描述信息、视频的内容描述信息、视频的内容结构信息。相应的,用户感兴趣的内容对应的第三播放速度中可以包括对待播放视频中每一帧的播放倍率理论值。
其中,视频的人物描述信息用于指示用户对视频中感兴趣的人物的信息,该人物可以为演员或者扮演的角色或者其他。视频的内容描述信息用于指示用户对视频中感兴趣的情节或内容的信息。视频的内容结构信息用于指示用户对视频中感兴趣的章节或位置的信息。
在另一种可能的实现方式中,以第二播放速度播放视频中与用户感兴趣的内容相关的帧时的速度不快于以第一播放速度中播放该帧的速度,以实现用户感兴趣的内容慢速播放,提高用户观感体验。
另一种可能的实现方式中,上述第一信息还可以包括该待播放视频的第一播放模式信息,其中,该第一播放模式信息与视频所对应的播放尺寸信息相关联。其中,视频所对应的播放尺寸信息,可以用于指示视频的显示比例或显示尺寸,例如全屏显示或小窗口显示等。相应的,第一播放模式信息用于决定对视频的播放速度的理论值,播放尺寸大时可以采用较大的播放速度,播放尺寸小时可以采用较小的播放速度,以使得用户可以清晰观看视频内容。
另一种可能的实现方式中,上述第一信息还可以包括待播放的视频的第二播放模式信息,其中,该第二播放模式信息与视频的清晰度信息相关联。其中,视频的清晰度信息,可以用于指示视频播放分辨率,例如高清模式、蓝光模式、省流模式等。相应的,第二播放模式信息用于决定对视频的播放速度的理论值,视频的清晰度高时可以采用较大的播放速度,视频的清晰度低时可以采用较小的播放速度,以使得用户可以清晰观看视频内容。
另一种可能的实现方式中,上述第一信息还可以包括电子设备的运动状态信息。其中,运动状态信息,可以用于指示电子设备的移动速度或相对于用户的位姿等。相应的,运动状态信息用于决定对视频的播放速度的理论值,电子设备移动速度高时或者与处于不便于用户观看的角度时可以采用较小的播放速度,电子设备移动速度低时或者处于便于用户观看的角度时可以采用较大的播放速度,以使得用户可以清晰观看视频内容。
另一种可能的实现方式中,上述第一信息还可以包括电子设备的噪声强度信息。其中,电子设备的噪声强度信息,可以用于指示电子设备的环境干扰程度。相应的,电子设备的噪声强度信息用于决定对视频的播放速度的理论值,电子设备的噪声强度高时可以采用较小的播放速度,电子设备的噪声强度低时可以采用较大的播放速度,以使得用户可以清晰听清视频中的语音。
另一种可能的实现方式中,上述第一信息还可以包括用户视点信息。其中,用户视点信息,可以用于指示用户观看视频时视线的落点,体现了用户的兴趣。相应的, 用户视点信息用于决定对视频的播放速度的理论值,用户视点信息指示用户长时间观看视频时可以采用较小的播放速度,用户视点信息指示用户存在未观看视频时可以采用较大的播放速度,以使得播放视频的速度满足用户的视点信息。
另一种可能的实现方式中，上述第一信息还可以包括音频播放设备的连接状态信息。其中，音频播放设备可以为耳机或者音箱。音频播放设备的连接状态信息用于指示音频播放设备是否连接，当连接时用户对视频语音的敏感度高，不易受到外界环境的干扰；当未连接时，用户对视频语音的敏感度低，容易受到外界环境的干扰。相应的，音频播放设备的连接状态信息用于决定对视频的播放速度的理论值，音频播放设备的连接状态信息为连接时，可以采用较大的播放速度，音频播放设备的连接状态信息为未连接时可以采用较小的播放速度，以使得用户可以清晰听清视频中的语音。
另一种可能的实现方式中,上述第一信息还可以包括网络状态信息。网络状态信息用于指示电子设备接入的网络的质量或类型,当电子设备接入高质量的网络时,播放视频顺畅,否则播放视频卡顿。相应的,网络状态信息用于决定对视频的播放速度的理论值,网络状态信息指示电子设备连接的网络质量高时,可以采用较大的播放速度,网络状态信息指示电子设备连接的网络质量低时,可以采用较小的播放速度,以避免用户观看视频时出现卡顿的情况。
在另一种可能的实现方式中,上述第一信息还可以包括环境信息,该环境信息可以包括播放待播放视频的设备的内部状态信息或播放待播放视频的设备的外部环境信息。外部环境信息对应的第三播放速度包括由该外部环境信息决定的待播放视频中每一帧的播放倍率最大允许值;内部状态信息对应的第三播放速度包括由该内部状态信息决定的待播放视频中每一帧的播放倍率最大允许值。
在另一种可能的实现方式中,视频的图像信息对应的第三播放速度包括由画面中目标运动速度决定的视频中每一帧的播放倍率理论值。视频的语音信息对应的第三播放速度包括由语音速度决定的待播放视频中每一帧的播放倍率最大允许值。
在另一种可能的实现方式中,上述播放倍率理论值可以包括:符合用户感兴趣内容的不同程度对应不同的播放倍率理论值,或者,目标移动速度处于不同区间对应不同的播放倍率理论值。其中,符合用户感兴趣内容的程度越高,对应的播放倍率理论值越小;目标移动速度越快,对应的播放倍率理论值越小。以保证确定的播放速度满足用户的兴趣,或,保证确定的第二播放速度可以保证用户的视觉感受。
在另一种可能的实现方式中,本申请提供的播放视频的方法还可以包括:将第二播放速度与该待播放视频对应存储,以便于其他设备获取该待播放视频以及该第二播放速度,采用该第二播放速度播放该待播放视频。
第二方面,提供一种播放视频的装置,该装置可以包括第一获取单元、第二获取单元以及播放单元。其中:
第一获取单元，用于获取第一播放速度。可选的，第一播放速度与用户播放设置相关。
第二获取单元，用于获取第一信息，第一信息包括视频的图像信息和/或视频的语音信息。
播放单元,用于以第二播放速度播放视频,第二播放速度基于第一播放速度和第 一信息得到。
通过本申请提供的播放视频的装置,基于用户个性化设置相关的第一播放速度以及第一信息确定最终播放视频的第二播放速度,这样一来,兼顾了用户对视频整体播放时长的需求和视频的画面、语速等内容的播放观感,提高了用户观看视频时的体验。
需要说明的是,第二方面的各个单元具体实现同第一方面的方法描述,这里不再赘述。
第三方面,本申请提供了一种电子设备,该电子设备可以实现上述所述第一方面描述的方法示例中的功能,所述功能可以通过硬件实现,也可以通过硬件执行相应的软件实现。所述硬件或软件包括一个或多个上述功能相应的模块。该电子设备可以以芯片的产品形态存在。
在一种可能的实现方式中,该电子设备的结构中包括处理器和收发器,该处理器被配置为支持该电子设备执行上述方法中相应的功能。该收发器用于支持该电子设备与其他设备之间的通信。该电子设备还可以包括存储器,该存储器用于与处理器耦合,其保存该电子设备必要的程序指令和数据。
第四方面,提供了一种计算机可读存储介质,包括指令,当其在计算机上运行时,使得计算机执行上述第一方面或其任一种可能的实现方式提供的播放视频的方法。
第五方面,提供了一种包含指令的计算机程序产品,当其在计算机上运行时,使得计算机执行上述第一方面或其任一种可能的实现方式提供的播放视频的方法。
第六方面,本申请提供了一种芯片系统,该芯片系统包括处理器,还可以包括存储器,用于实现上述方法中相应的功能。该芯片系统可以由芯片构成,也可以包含芯片和其他分立器件。
第七方面,本申请提供一种播放视频的系统,该系统包括第一设备,该第一设备可以为第三方面描述的电子设备,该电子设备具备上述第一方面以及任一可能的实现方式的功能。
在一种可能的实现方式中,该播放视频的系统还可以包括第二设备,该第二设备用于从第一设备获取待播放视频的第二播放速度,采用第二播放速度播放待播放视频。
其中,需要说明的是,上述各个方面中的任意一个方面的各种可能的实现方式,在方案不矛盾的前提下,均可以进行组合。
附图说明
图1为一种调研结果示意图;
图2为本申请实施例提供的一种电子设备的结构示意图;
图3为本申请实施例提供的一种电子设备的软件结构示意图;
图4为本申请实施例提供的一种系统架构示意图;
图5为本申请实施例提供的一种卷积神经网络(convolutional neuron network,CNN)网络的结构示意图;
图6为本申请实施例提供的一种芯片的硬件结构示意图;
图7为本申请实施例提供的电子设备的设置界面示意图;
图8为本申请实施例提供的播放界面示意图;
图9为本申请实施例提供的播放界面示意图;
图10为本申请实施例提供的一种播放视频的方法的流程示意图;
图11为本申请实施例提供的播放界面示意图;
图12为本申请实施例提供的播放界面示意图;
图13为本申请实施例提供的播放界面示意图;
图14为本申请实施例提供的播放界面示意图;
图15为本申请实施例提供的一种不同第一信息对应的第三播放速度的获取方式示意图;
图16为本申请实施例提供的一种播放速度进行融合的场景示意图;
图17为本申请实施例提供的另一种播放速度进行融合的场景示意图;
图18为本申请实施例提供的一种播放视频的方法的流程示意图;
图19为本申请实施例提供的一种变速曲线对比示意图;
图20为本申请实施例提供的一种自适应变速曲线示意图;
图21为本申请实施例提供的另一种自适应变速曲线;
图22为本申请实施例提供的一种播放视频的装置的结构示意图;
图23为本申请实施例提供的一种电子设备的结构示意图。
具体实施方式
在本申请实施例中,为了便于清楚描述本申请实施例的技术方案,采用了“第一”、“第二”等字样对功能和作用基本相同的相同项或相似项进行区分。本领域技术人员可以理解“第一”、“第二”等字样并不对数量和执行次序进行限定,并且“第一”、“第二”等字样也并不限定一定不同。该“第一”、第二”描述的技术特征间无先后顺序或者大小顺序。
在本申请实施例中,“示例性的”或者“例如”等词用于表示作例子、例证或说明。本申请实施例中被描述为“示例性的”或者“例如”的任何实施例或设计方案不应被解释为比其它实施例或设计方案更优选或更具优势。确切而言,使用“示例性的”或者“例如”等词旨在以具体方式呈现相关概念,便于理解。
在本申请实施例中,至少一个还可以描述为一个或多个,多个可以是两个、三个、四个或者更多个,本申请不做限制。
此外,本申请实施例描述的网络架构以及场景是为了更加清楚的说明本申请实施例的技术方案,并不构成对于本申请实施例提供的技术方案的限定,本领域普通技术人员可知,随着网络架构的演变和新业务场景的出现,本申请实施例提供的技术方案对于类似的技术问题,同样适用。
在描述本申请的实施例之前,此处先对本申请涉及的名词统一进行解释说明,后续不再一一进行说明。
视频,是指动态的连续图像序列以及与图像对应的语音序列。
视频的图像信息,是指视频中包括的图像(也称为画面)的序列。视频的图像信息是一个一个静态画面的集合。
视频的语音信息,是指视频中包括的语音序列,将视频中每帧图像对应的语音片段,作为一个语音帧,视频中所有语音帧构成该视频的语音信息。语音帧与图像帧一一对应,当播放某一图像帧时,其对应的语音帧同步播放。
图像帧的播放速度,可以表示播放视频时,单位时间内播放的帧数量(帧率),也可以表示播放一帧的时长。以一个播放速度播放一个视频时,播放该视频中每一帧图像的时长可以相同,也可以不同。根据人的眼睛对光的闪烁感知能力,可以得到视频的原始播放帧率。视频的原始播放帧率是视频的属性参数,视频可以默认按照原始播放帧率播放。
视频播放速度,是指视频中每一帧(图像帧和语音帧)的播放速度(或帧率)组成的播放速度序列,可以是恒定的,也可以变化。
为了下述各实施例的描述清楚简洁,首先给出相关技术的简要介绍:
近年来,电子设备(例如终端)的功能越来越丰富,给用户带来了更好的使用体验。例如,电子设备可以观看在线视频,例如影视节目视频、在线教育视频、监控视频等,并在观看过程中,向用户提供倍速功能,实现用户按照个人喜好增加或降低视频的播放速度,改变视频的播放时长。
其中,电子设备可以是智能手机、平板电脑、可穿戴设备、增强现实(augmented reality,AR)/虚拟现实(virtual reality,VR)设备等。本申请对电子设备的具体形态不予限定。可穿戴设备也可以称为穿戴式智能设备,是应用穿戴式技术对日常穿戴进行智能化设计、开发出可以穿戴的设备的总称,如眼镜、手套、手表、服饰及鞋等。可穿戴设备即直接穿在身上,或是整合到用户的衣服或配件的一种便携式设备。可穿戴设备不仅仅是一种硬件设备,更是通过软件支持以及数据交互、云端交互来实现强大的功能。广义穿戴式智能设备包括功能全、尺寸大、可不依赖智能手机实现完整或者部分的功能,例如:智能手表或智能眼镜等,以及只专注于某一类应用功能,需要和其它设备如智能手机配合使用,如各类进行体征监测的智能手环、智能首饰等。
在本申请中,电子设备的结构可以如图2所示。如图2所示,电子设备100可以包括处理器110,外部存储器接口120,内部存储器121,通用串行总线(universal serial bus,USB)接口130,充电管理模块140,电源管理模块141,电池142,天线1,天线2,移动通信模块150,无线通信模块160,音频模块170,扬声器170A,受话器170B,麦克风170C,耳机接口170D,传感器模块180,按键190,马达191,指示器192,摄像头193,显示屏194,以及用户标识模块(subscriber identification module,SIM)卡接口195等。其中传感器模块180可以包括压力传感器180A,陀螺仪传感器180B,气压传感器180C,磁传感器180D,加速度传感器180E,距离传感器180F,接近光传感器180G,指纹传感器180H,温度传感器180J,触摸传感器180K,环境光传感器180L,骨传导传感器180M等。
可以理解的是,本实施例示意的结构并不构成对电子设备100的具体限定。在另一些实施例中,电子设备100可以包括比图示更多或更少的部件,或者组合某些部件,或者拆分某些部件,或者不同的部件布置。图示的部件可以以硬件,软件或软件和硬件的组合实现。
处理器110可以包括一个或多个处理单元,例如:处理器110可以包括应用处理器(application processor,AP),调制解调处理器,图形处理器(graphics processing unit,GPU),图像信号处理器(image signal processor,ISP),控制器,视频编解码器,数字信号处理器(digital signal processor,DSP),基带处理器,和/或神经网络处理器 (neural-network processing unit,NPU)等。其中,不同的处理单元可以是独立的器件,也可以集成在一个或多个处理器中。例如,在本申请中,处理器110可以获取第一播放速度,可选的,第一播放速度与用户播放设置相关;获取第一信息,第一信息包括视频的图像信息和/或视频的语音信息;以及基于第一播放速度和第一信息得到第二播放速度。
其中,控制器可以是电子设备100的神经中枢和指挥中心。控制器可以根据指令操作码和时序信号,产生操作控制信号,完成取指令和执行指令的控制。
处理器110中还可以设置存储器,用于存储指令和数据。在一些实施例中,处理器110中的存储器为高速缓冲存储器。该存储器可以保存处理器110刚用过或循环使用的指令或数据。如果处理器110需要再次使用该指令或数据,可从所述存储器中直接调用。避免了重复存取,减少了处理器110的等待时间,因而提高了系统的效率。
在一些实施例中,处理器110可以包括一个或多个接口。接口可以包括集成电路(inter-integrated circuit,I2C)接口,集成电路内置音频(inter-integrated circuit sound,I2S)接口,脉冲编码调制(pulse code modulation,PCM)接口,通用异步收发传输器(universal asynchronous receiver/transmitter,UART)接口,移动产业处理器接口(mobile industry processor interface,MIPI),通用输入输出(general-purpose input/output,GPIO)接口,用户标识模块(subscriber identity module,SIM)接口,和/或通用串行总线(universal serial bus,USB)接口等。
MIPI接口可以被用于连接处理器110与显示屏194,摄像头193等外围器件。MIPI接口包括摄像头串行接口(camera serial interface,CSI),显示屏串行接口(display serial interface,DSI)等。在一些实施例中,处理器110和摄像头193通过CSI接口通信,实现电子设备100的拍摄功能。处理器110和显示屏194通过DSI接口通信,实现电子设备100的显示功能。
GPIO接口可以通过软件配置。GPIO接口可以被配置为控制信号,也可被配置为数据信号。在一些实施例中,GPIO接口可以用于连接处理器110与摄像头193,显示屏194,无线通信模块160,音频模块170,传感器模块180等。GPIO接口还可以被配置为I2C接口,I2S接口,UART接口,MIPI接口等。
USB接口130是符合USB标准规范的接口,具体可以是Mini USB接口,Micro USB接口,USB Type C接口等。USB接口130可以用于连接充电器为电子设备100充电,也可以用于电子设备100与外围设备之间传输数据。也可以用于连接耳机,通过耳机播放音频。该接口还可以用于连接其他电子设备,例如AR设备等。
可以理解的是,本实施例示意的各模块间的接口连接关系,只是示意性说明,并不构成对电子设备100的结构限定。在本申请另一些实施例中,电子设备100也可以采用上述实施例中不同的接口连接方式,或多种接口连接方式的组合。
电源管理模块141用于连接电池142,充电管理模块140与处理器110。电源管理模块141接收电池142和/或充电管理模块140的输入,为处理器110,内部存储器121,显示屏194,摄像头193,和无线通信模块160等供电。电源管理模块141还可以用于监测电池容量,电池循环次数,电池健康状态(漏电,阻抗)等参数。在其他一些实施例中,电源管理模块141也可以设置于处理器110中。在另一些实施例中,电源管理 模块141和充电管理模块140也可以设置于同一个器件中。
电子设备100的无线通信功能可以通过天线1,天线2,移动通信模块150,无线通信模块160,调制解调处理器以及基带处理器等实现。
电子设备100通过GPU,显示屏194,以及应用处理器等实现显示功能。GPU为图像处理的微处理器,连接显示屏194和应用处理器。GPU用于执行数学和几何计算,用于图形渲染。处理器110可包括一个或多个GPU,其执行程序指令以生成或改变显示信息。
显示屏194用于显示图像,视频等。显示屏194包括显示面板。显示面板可以采用液晶显示屏(liquid crystal display,LCD),有机发光二极管(organic light-emitting diode,OLED),有源矩阵有机发光二极体或主动矩阵有机发光二极体(active-matrix organic light emitting diode的,AMOLED),柔性发光二极管(flex light-emitting diode,FLED),Miniled,MicroLed,Micro-oled,量子点发光二极管(quantum dot light emitting diodes,QLED)等。在一些实施例中,电子设备100可以包括1个或N个显示屏194,N为大于1的正整数。
电子设备100的显示屏194上可以显示一系列图形用户界面(graphical user interface,GUI),这些GUI都是该电子设备100的主屏幕。一般来说,电子设备100的显示屏194的尺寸是固定的,只能在该电子设备100的显示屏194中显示有限的控件。控件是一种GUI元素,它是一种软件组件,包含在应用程序中,控制着该应用程序处理的所有数据以及关于这些数据的交互操作,用户可以通过直接操作(direct manipulation)来与控件交互,从而对应用程序的有关信息进行读取或者编辑。一般而言,控件可以包括图标、按钮、菜单、选项卡、文本框、对话框、状态栏、导航栏、Widget等可视的界面元素。
电子设备100可以通过ISP,摄像头193,视频编解码器,GPU,显示屏194以及应用处理器等实现拍摄功能。
ISP用于处理摄像头193反馈的数据。例如,拍照时,打开快门,光线通过镜头被传递到摄像头感光元件上,光信号转换为电信号,摄像头感光元件将所述电信号传递给ISP处理,转化为肉眼可见的图像。ISP还可以对图像的噪点,亮度,肤色进行算法优化。ISP还可以对拍摄场景的曝光,色温等参数优化。在一些实施例中,ISP可以设置在摄像头193中。
摄像头193用于捕获静态图像或视频。物体通过镜头生成光学图像投射到感光元件。感光元件可以是电荷耦合器件(charge coupled device,CCD)或互补金属氧化物半导体(complementary metal-oxide-semiconductor,CMOS)光电晶体管。感光元件把光信号转换成电信号,之后将电信号传递给ISP转换成数字图像信号。ISP将数字图像信号输出到DSP加工处理。DSP将数字图像信号转换成标准的RGB,YUV等格式的图像信号。在一些实施例中,电子设备100可以包括1个或N个摄像头193,N为大于1的正整数。
数字信号处理器用于处理数字信号,除了可以处理数字图像信号,还可以处理其他数字信号。例如,当电子设备100在频点选择时,数字信号处理器用于对频点能量进行傅里叶变换等。
视频编解码器用于对数字视频压缩或解压缩。电子设备100可以支持一种或多种视频编解码器。这样,电子设备100可以播放或录制多种编码格式的视频,例如:动态图像专家组(moving picture experts group,MPEG)1,MPEG2,MPEG3,MPEG4等。
NPU为神经网络(neural-network,NN)计算处理器,通过借鉴生物神经网络结构,例如借鉴人脑神经元之间传递模式,对输入信息快速处理,还可以不断的自学习。通过NPU可以实现电子设备100的智能认知等应用,例如:图像识别,人脸识别,语音识别,文本理解等。
外部存储器接口120可以用于连接外部存储卡,例如Micro SD卡,实现扩展电子设备100的存储能力。外部存储卡通过外部存储器接口120与处理器110通信,实现数据存储功能。例如将音乐,视频等文件保存在外部存储卡中。
内部存储器121可以用于存储计算机可执行程序代码,所述可执行程序代码包括指令。处理器110通过运行存储在内部存储器121的指令,从而执行电子设备100的各种功能应用以及数据处理。例如,在本实施例中,处理器110可以通过执行存储在内部存储器121中的指令,执行本申请提供的播放视频的方法,获取电子设备100播放的视频的播放速度。内部存储器121可以包括存储程序区和存储数据区。其中,存储程序区可存储操作系统,至少一个功能所需的应用程序(比如声音播放功能,图像播放功能等)等。存储数据区可存储电子设备100使用过程中所创建的数据(比如音频数据,电话本等)等。此外,内部存储器121可以包括高速随机存取存储器,还可以包括非易失性存储器,例如至少一个磁盘存储器件,闪存器件,通用闪存存储器(universal flash storage,UFS)等。处理器110通过运行存储在内部存储器121的指令,和/或存储在设置于处理器中的存储器的指令,执行电子设备100的各种功能应用以及数据处理。
电子设备100可以通过音频模块170,扬声器170A,受话器170B,麦克风170C,耳机接口170D,以及应用处理器等实现音频功能。例如视频中语音播放、音乐播放,录音等。
音频模块170用于将数字音频信息转换成模拟音频信号输出,也用于将模拟音频输入转换为数字音频信号。音频模块170还可以用于对音频信号编码和解码。在一些实施例中,音频模块170可以设置于处理器110中,或将音频模块170的部分功能模块设置于处理器110中。
扬声器170A,也称“喇叭”,用于将音频电信号转换为声音信号。电子设备100可以通过扬声器170A收听音乐,或收听免提通话。
受话器170B,也称“听筒”,用于将音频电信号转换成声音信号。当电子设备100接听电话或语音信息时,可以通过将受话器170B靠近人耳接听语音。
麦克风170C,也称“话筒”,“传声器”,用于将声音信号转换为电信号。当拨打电话或发送语音信息时,用户可以通过人嘴靠近麦克风170C发声,将声音信号输入到麦克风170C。电子设备100可以设置至少一个麦克风170C。在另一些实施例中,电子设备100可以设置两个麦克风170C,除了采集声音信号,还可以实现降噪功能。在另一些实施例中,电子设备100还可以设置三个,四个或更多麦克风170C,实现采集声音信号,降噪,还可以识别声音来源,实现定向录音功能等。
耳机接口170D用于连接有线耳机。耳机接口170D可以是USB接口130,也可以 是3.5mm的开放移动电子设备平台(open mobile terminal platform,OMTP)标准接口,美国蜂窝电信工业协会(cellular telecommunications industry association of the USA,CTIA)标准接口,或者其他。
压力传感器180A用于感受压力信号,可以将压力信号转换成电信号。在一些实施例中,压力传感器180A可以设置于显示屏194。压力传感器180A的种类很多,如电阻式压力传感器,电感式压力传感器,电容式压力传感器等。电容式压力传感器可以是包括至少两个具有导电材料的平行板。当有力作用于压力传感器180A,电极之间的电容改变。电子设备100根据电容的变化确定压力的强度。当有触摸操作作用于显示屏194,电子设备100根据压力传感器180A检测所述触摸操作强度。电子设备100也可以根据压力传感器180A的检测信号计算触摸的位置。在一些实施例中,作用于相同触摸位置,但不同触摸操作强度的触摸操作,可以对应不同的操作指令。例如:当有触摸操作强度小于第一压力阈值的触摸操作作用于短消息应用图标时,执行查看短消息的指令。当有触摸操作强度大于或等于第一压力阈值的触摸操作作用于短消息应用图标时,执行新建短消息的指令。
陀螺仪传感器180B可以用于确定电子设备100的运动姿态。在一些实施例中,可以通过陀螺仪传感器180B确定电子设备100围绕三个轴(即,x,y和z轴)的角速度。陀螺仪传感器180B可以用于拍摄防抖。示例性的,当按下快门,陀螺仪传感器180B检测电子设备100抖动的角度,根据角度计算出镜头模组需要补偿的距离,让镜头通过反向运动抵消电子设备100的抖动,实现防抖。陀螺仪传感器180B还可以用于导航,体感游戏场景。陀螺仪传感器180B还可以判断电子设备100是否处于移动状态。
气压传感器180C用于测量气压。在一些实施例中,电子设备100通过气压传感器180C测得的气压值计算海拔高度,辅助定位和导航。
磁传感器180D包括霍尔传感器。电子设备100可以利用磁传感器180D检测翻盖皮套的开合。在一些实施例中,当电子设备100是翻盖机时,电子设备100可以根据磁传感器180D检测翻盖的开合。进而根据检测到的皮套的开合状态或翻盖的开合状态,设置翻盖自动解锁等特性。
加速度传感器180E可检测电子设备100在各个方向上(一般为三轴)加速度的大小。当电子设备100静止时可检测出重力的大小及方向。还可以用于识别电子设备姿态,应用于横竖屏切换,计步器等应用。还可以用于判断电子设备100是否处于移动状态。
距离传感器180F,用于测量距离。电子设备100可以通过红外或激光测量距离。在一些实施例中,拍摄场景,电子设备100可以利用距离传感器180F测距以实现快速对焦。
接近光传感器180G可以包括例如发光二极管(LED)和光检测器,例如光电二极管。发光二极管可以是红外发光二极管。电子设备100通过发光二极管向外发射红外光。电子设备100使用光电二极管检测来自附近物体的红外反射光。当检测到充分的反射光时,可以确定电子设备100附近有物体。当检测到不充分的反射光时,电子设备100可以确定电子设备100附近没有物体。电子设备100可以利用接近光传感器180G检测用户手持电子设备100贴近耳朵通话,以便自动熄灭屏幕达到省电的目的。接近光传感器180G也可用于皮套模式,口袋模式自动解锁与锁屏。
环境光传感器180L用于感知环境光亮度。电子设备100可以根据感知的环境光亮度自适应调节显示屏194亮度。环境光传感器180L也可用于拍照时自动调节白平衡。环境光传感器180L还可以与接近光传感器180G配合,检测电子设备100是否在口袋里,以防误触。
指纹传感器180H用于采集指纹。电子设备100可以利用采集的指纹特性实现指纹解锁,访问应用锁,指纹拍照,指纹接听来电等。
温度传感器180J用于检测温度。在一些实施例中,电子设备100利用温度传感器180J检测的温度,执行温度处理策略。例如,当温度传感器180J上报的温度超过阈值,电子设备100执行降低位于温度传感器180J附近的处理器的性能,以便降低功耗实施热保护。在另一些实施例中,当温度低于另一阈值时,电子设备100对电池142加热,以避免低温导致电子设备100异常关机。在其他一些实施例中,当温度低于又一阈值时,电子设备100对电池142的输出电压执行升压,以避免低温导致的异常关机。
触摸传感器180K,也称“触控器件”。触摸传感器180K可以设置于显示屏194,由触摸传感器180K与显示屏194组成触摸屏,也称“触控屏”。触摸传感器180K用于检测作用于其上或附近的触摸操作。触摸传感器可以将检测到的触摸操作传递给应用处理器,以确定触摸事件类型。可以通过显示屏194提供与触摸操作相关的视觉输出。在另一些实施例中,触摸传感器180K也可以设置于电子设备100的表面,与显示屏194所处的位置不同。
骨传导传感器180M可以获取振动信号。在一些实施例中,骨传导传感器180M可以获取人体声部振动骨块的振动信号。骨传导传感器180M也可以接触人体脉搏,接收血压跳动信号。在一些实施例中,骨传导传感器180M也可以设置于耳机中,结合成骨传导耳机。音频模块170可以基于所述骨传导传感器180M获取的声部振动骨块的振动信号,解析出语音信号,实现语音功能。应用处理器可以基于所述骨传导传感器180M获取的血压跳动信号解析心率信息,实现心率检测功能。
按键190包括开机键,音量键等。按键190可以是机械按键。也可以是触摸式按键。电子设备100可以接收按键输入,产生与电子设备100的用户设置以及功能控制有关的键信号输入。
马达191可以产生振动提示。马达191可以用于来电振动提示,也可以用于触摸振动反馈。例如,作用于不同应用(例如拍照,音频播放等)的触摸操作,可以对应不同的振动反馈效果。作用于显示屏194不同区域的触摸操作,马达191也可对应不同的振动反馈效果。不同的应用场景(例如:时间提醒,接收信息,闹钟,游戏等)也可以对应不同的振动反馈效果。触摸振动反馈效果还可以支持自定义。
指示器192可以是指示灯,可以用于指示充电状态,电量变化,也可以用于指示消息,未接来电,通知等。
另外,在上述部件之上,运行有操作系统。例如苹果公司所开发的iOS操作系统,谷歌公司所开发的Android开源操作系统,微软公司所开发的Windows操作系统等。在该操作系统上可以安装运行应用程序。
电子设备100的操作系统可以采用分层架构,事件驱动架构,微核架构,微服务架构,或云架构或其他架构。本申请实施例以分层架构的Android系统为例,示例性 说明电子设备100的软件结构。
图3是本申请实施例的电子设备100的软件结构框图。
分层架构将软件分成若干个层,每一层都有清晰的角色和分工。层与层之间通过软件接口通信。在一些实施例中,将Android系统分为四层,从上至下分别为应用程序层,应用程序框架层,安卓运行时(Android runtime)和系统库,以及内核层。
应用程序层可以包括一系列应用程序包。如图3所示,应用程序包可以包括相机,图库,日历,通话,地图,导航,WLAN,蓝牙,音乐,视频,短信息等应用程序。例如,在拍照时,相机应用可以访问应用程序框架层提供的相机接口管理服务。
应用程序框架层为应用程序层的应用程序提供应用编程接口(application programming interface,API)和编程框架。应用程序框架层包括一些预先定义的函数。如图3所示,应用程序框架层可以包括窗口管理器,内容提供器,视图系统,电话管理器,资源管理器,通知管理器等。
窗口管理器用于管理窗口程序。窗口管理器可以获取显示屏大小,判断是否有状态栏,锁定屏幕,截取屏幕等。
内容提供器用来存放和获取数据,并使这些数据可以被应用程序访问。所述数据可以包括视频,图像,音频,拨打和接听的电话,浏览历史和书签,电话簿等。
视图系统包括可视控件,例如显示文字的控件,显示图片的控件等。视图系统可用于构建应用程序。显示界面可以由一个或多个视图组成的。例如,包括短信通知图标的显示界面,可以包括显示文字的视图以及显示图片的视图。
电话管理器用于提供电子设备100的通信功能。例如通话状态的管理(包括接通,挂断等)。
资源管理器为应用程序提供各种资源,比如本地化字符串,图标,图片,布局文件,视频文件等等。
通知管理器使应用程序可以在状态栏中显示通知信息,可以用于传达告知类型的消息,可以短暂停留后自动消失,无需用户交互。比如通知管理器被用于告知下载完成,消息提醒等。通知管理器还可以是以图表或者滚动条文本形式出现在系统顶部状态栏的通知,例如后台运行的应用程序的通知,还可以是以对话窗口形式出现在屏幕上的通知。例如在状态栏提示文本信息,发出提示音,电子设备振动,指示灯闪烁等。
Android Runtime包括核心库和虚拟机。Android runtime负责安卓系统的调度和管理。
核心库包含两部分:一部分是java语言需要调用的功能函数,另一部分是安卓的核心库。
应用程序层和应用程序框架层运行在虚拟机中。虚拟机将应用程序层和应用程序框架层的java文件执行为二进制文件。虚拟机用于执行对象生命周期的管理,堆栈管理,线程管理,安全和异常的管理,以及垃圾回收等功能。
系统库可以包括多个功能模块。例如:表面管理器(surface manager),媒体库(Media Libraries),三维图形处理库(例如:OpenGL ES),2D图形引擎(例如:SGL)等。
表面管理器用于对显示子系统进行管理,并且为多个应用程序提供了2D和3D图层的融合。
媒体库支持多种常用的音频,视频格式回放和录制,以及静态图像文件等。媒体库可以支持多种音视频编码格式,例如:动态图像专家组(moving picture experts group,MPEG)4,H.264,MP3,高级音频编码(advanced audio coding,AAC),自适应多速率(adaptive multi rate,AMR),联合摄影制图专家组(joint photo graphic experts group,JPEG),可移植网络图形格式(portable network graphic format,PNG)等。
三维图形处理库用于实现三维图形绘图,图像渲染,合成,和图层处理等。
二维(2dimensions,2D)图形引擎是2D绘图的绘图引擎。
内核层是硬件和软件之间的层。内核层至少包含显示驱动,摄像头驱动,音频驱动,传感器驱动。
需要说明的是，本申请实施例虽然以Android系统为例进行说明，但是其基本原理同样适用于基于iOS、Windows等操作系统的电子设备。
下面将结合附图,对本申请中的技术方案进行描述。
本申请实施例提供的播放视频的方法,可以应用在用户使用电子设备播放视频的场景。或者,本申请实施例提供的播放视频的方法还可以应用在视频服务器预处理视频的场景。视频服务器可向其提供的视频配置自适应的播放速度,在其他设备获取视频时提供配置的自适应的播放速度,用于其他设备可以选择按照该自适应的播放速度播放获取的视频。
下面结合图2和用户使用电子设备播放视频的场景,示例性说明电子设备100软件以及硬件的工作流程。
示例性的，在电子设备100播放视频的界面，电子设备100触摸传感器180K接收到用户对播放速度“2.0X”的触摸操作，上报给处理器110，使得处理器110响应于上述触摸操作，将电子设备100当前播放的视频，按照原始播放帧率的两倍帧率，在显示屏194上显示。在该示例中，用户通过触摸操作，选取固定两倍帧率播放视频，即前述的恒定倍速。恒定倍速功能对于视频中较快节奏的片段可能导致画面变得眩晕、语音变得尖锐，对于较慢节奏的片段可能无法满足用户的观看速度需求，用户需要频繁手动切换倍速以满足自己的需要。因此，众多视频自适应变速的方案应运而生。
基于大数据统计、基于画面内容、基于语音速度或基于语音和画面质量进行视频自适应变速的方案。其中,基于大数据统计的视频倍速方案,需要大量用户历史数据作为支撑,对于新上线的视频,无法冷启动自适应变速。
基于画面内容的视频倍速播放方案,对于特定的关注画面信息的场景(例如安防、体育等场景)有一定应用价值,但对于影音场景(对画面、语音都有信息摄入及观感需求)应用价值不高。
基于语音速度的视频倍速播放方案,仅按照人能听懂的最快吐词速度自适应调整视频变速倍率,无法考虑视频中画面的观感体验。
基于语音速度和画面质量的视频倍速播放方案,认为视频中声音过于嘈杂的片段以及画面晃动较大的片段是次要的,可以快加速,其他片段则慢加速,但该方案虽然考虑了声音和画面两种信息,但被快加速略过的语音和画面几乎无信息量,应用场景很有限,对于大多数影视作品,音画质量都较高,该方案并不能有效变速。
并且,上述自适应的视频倍速播放方案,最终的播放速度受到参考信息(视频的 图像内容、或语音速度、或画面内容和语音速度)的绝对影响,并未考虑用户播放设置的相关的播放速度,因此,用户的观感体验有待提高。
基于此,本申请提供了一种播放视频的方法,具体包括:以用户播放设置相关的第一播放速度,以及视频的图像和/或语音的信息,确定播放视频的第二播放速度。这样一来,兼顾了视频内容以及用户需求实现自适应变速,提高了自适应变速播放视频的用户观感体验。
下面从模型训练侧和模型应用侧对本申请提供的方法进行描述:
本申请实施例提供的播放视频的方法,涉及视频的处理,具体可以应用数据训练、机器学习、深度学习等数据处理方法,对训练数据(如视频中的第一信息)进行符号化和形式化的智能信息建模、抽取、预处理、训练等,最终得到训练好的视频理解网络;并且,本申请实施例提供的播放视频的方法可以运用上述训练好的视频理解网络(视频图像理解网络、视频语音理解网络),将输入数据(如本申请中的待播放视频)输入到所述训练好的视频理解网络中,得到输出数据(第一信息对应的第三播放速度)。需要说明的是,本申请实施例提供的视频图像理解网络或视频语音理解网络的训练方法和视频播放方法是基于同一个构思产生的发明,也可以理解为一个系统中的两个部分,或一个整体流程的两个阶段:如模型训练阶段和模型应用阶段。
由于本申请实施例涉及大量神经网络的应用,为了便于理解,下面先对本申请实施例涉及的相关术语及神经网络等相关概念进行介绍。
(1)神经网络(neural network,NN)
神经网络是机器学习模型,是一种模拟人脑的神经网络以能够实现类人工智能的机器学习技术。可以根据实际需求配置神经网络的输入及输出,并通过样本数据对神经网络训练,以使得其输出与样本数据对应的真实输出的误差最小。神经网络可以是由神经单元组成的,神经单元可以是指以x s和截距1为输入的运算单元,该运算单元的输出可以为:
f(∑_{s=1}^{n} W_s·x_s + b)
其中，s=1、2、……n，n为大于1的自然数，W_s为x_s的权重，b为神经单元的偏置。f为神经单元的激活函数（activation functions），用于将非线性特性引入神经网络中，来将神经单元中的输入信号转换为输出信号。该激活函数的输出信号可以作为下一层卷积层的输入。激活函数可以是sigmoid函数。神经网络是将许多个上述单一的神经单元联结在一起形成的网络，即一个神经单元的输出可以是另一个神经单元的输入。每个神经单元的输入可以与前一层的局部接受域相连，来提取局部接受域的特征，局部接受域可以是由若干个神经单元组成的区域。
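为便于理解上述神经单元的计算过程，下面给出一段示意代码（仅为理解用的草图，函数名与取值均为说明性假设，并非对本申请实现方式的限定）：

```python
import numpy as np

def neural_unit(x, w, b):
    """单个神经单元：先做加权求和（对应 ∑ W_s·x_s + b），再经过激活函数f（此处以sigmoid为例）。"""
    z = np.dot(w, x) + b
    return 1.0 / (1.0 + np.exp(-z))      # sigmoid激活，将结果映射到(0, 1)

x = np.array([0.2, 0.5, 0.1])            # 输入x_s
w = np.array([0.4, -0.3, 0.8])           # 权重W_s
print(neural_unit(x, w, b=0.1))          # 神经单元的输出
```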
(2)深度神经网络
深度神经网络(deep neural network,DNN),也称多层神经网络,可以理解为具有很多层隐含层的神经网络,这里的“很多”并没有特别的度量标准。从DNN按不同层的位置划分,DNN内部的神经网络可以分为三类:输入层,隐含层,输出层。一般来说第一层是输入层,最后一层是输出层,中间的层数都是隐含层。层与层之间是全连接的,也就是说,第i层的任意一个神经元一定与第i+1层的任意一个神经元相连。虽然DNN看起来很复杂,但是就每一层的工作来说,其实并不复杂,简单来说就是 如下线性关系表达式:
y = α(W·x + b)
其中，x是输入向量，y是输出向量，b是偏移向量，W是权重矩阵（也称系数），α()是激活函数。每一层仅仅是对输入向量x经过如此简单的操作得到输出向量y。由于DNN层数多，则系数W和偏移向量b的数量也就很多了。这些参数在DNN中的定义如下所述：以系数W为例：假设在一个三层的DNN中，第二层的第4个神经元到第三层的第2个神经元的线性系数定义为W^3_{24}，上标3代表系数W所在的层数，而下标对应的是输出的第三层索引2和输入的第二层索引4。总结就是：第L-1层的第k个神经元到第L层的第j个神经元的系数定义为W^L_{jk}。
需要注意的是,输入层是没有W参数的。在深度神经网络中,更多的隐含层让网络更能够刻画现实世界中的复杂情形。理论上而言,参数越多的模型复杂度越高,“容量”也就越大,也就意味着它能完成更复杂的学习任务。训练深度神经网络的也就是学习权重矩阵的过程,其最终目的是得到训练好的深度神经网络的所有层的权重矩阵(由很多层的向量W形成的权重矩阵)。
(3)卷积神经网络
CNN是一种带有卷积结构的深度神经网络。卷积神经网络包含了一个由卷积层和子采样层构成的特征抽取器。该特征抽取器可以看作是滤波器,卷积过程可以看作是使用一个可训练的滤波器与一个输入的图像或者卷积特征平面(feature map)做卷积。卷积层是指卷积神经网络中对输入信号进行卷积处理的神经元层。在卷积神经网络的卷积层中,一个神经元可以只与部分邻层神经元连接。一个卷积层中,通常包含若干个特征平面,每个特征平面可以由一些矩形排列的神经单元组成。同一特征平面的神经单元共享权重,这里共享的权重就是卷积核。共享权重可以理解为提取图像信息的方式与位置无关。这其中隐含的原理是:图像的某一部分的统计信息与其他部分是一样的。即意味着在某一部分学习的图像信息也能用在另一部分上。所以对于图像上的所有位置,都能使用同样的学习得到的图像信息。在同一卷积层中,可以使用多个卷积核来提取不同的图像信息,一般地,卷积核数量越多,卷积操作反映的图像信息越丰富。
卷积核可以以随机大小的矩阵的形式初始化,在卷积神经网络的训练过程中卷积核可以通过学习得到合理的权重。另外,共享权重带来的直接好处是减少卷积神经网络各层之间的连接,同时又降低了过拟合的风险。
(4)循环神经网络(recurrent neural networks,RNN)是用来处理序列数据的。在传统的神经网络模型中,是从输入层到隐含层再到输出层,层与层之间是全连接的,而对于每一层层内之间的各个节点是无连接的。这种普通的神经网络虽然解决了很多难题,但是却仍然对很多问题却无能无力。例如,你要预测句子的下一个单词是什么,一般需要用到前面的单词,因为一个句子中前后单词并不是独立的。RNN之所以称为循环神经网路,即一个序列当前的输出与前面的输出也有关。具体的表现形式为网络会对前面的信息进行记忆并应用于当前输出的计算中,即隐含层本层之间的节点不再无连接而是有连接的,并且隐含层的输入不仅包括输入层的输出还包括上一时刻隐含层的输出。理论上,RNN能够对任何长度的序列数据进行处理。对于RNN的训练和对传统的CNN或DNN的训练一样。同样使用误差反向传播算法,不过有一点区别:即,如果将RNN进行网络展开,那么其中的参数,如W,是共享的;而如上举例上 述的传统神经网络却不是这样。并且在使用梯度下降算法中,每一步的输出不仅依赖当前步的网络,还依赖前面若干步网络的状态。该学习算法称为基于时间的反向传播算法Back propagation Through Time(BPTT)。
既然已经有了卷积神经网络,为什么还要循环神经网络?原因很简单,在卷积神经网络中,有一个前提假设是:元素之间是相互独立的,输入与输出也是独立的,比如猫和狗。但现实世界中,很多元素都是相互连接的,比如股票随时间的变化,再比如一个人说了:我喜欢旅游,其中最喜欢的地方是云南,以后有机会一定要去。这里填空,人类应该都知道是填“云南”。因为人类会根据上下文的内容进行推断,但如何让机器做到这一步?RNN就应运而生了。RNN旨在让机器像人一样拥有记忆的能力。因此,RNN的输出就需要依赖当前的输入信息和历史的记忆信息。
(6)损失函数
在训练深度神经网络的过程中,因为希望深度神经网络的输出尽可能的接近真正想要预测的值,所以可以通过比较当前网络的预测值和真正想要的目标值,再根据两者之间的差异情况来更新每一层神经网络的权重向量(当然,在第一次更新之前通常会有初始化的过程,即为深度神经网络中的各层预先配置参数),比如,如果网络的预测值高了,就调整权重向量让它预测低一些,不断的调整,直到深度神经网络能够预测出真正想要的目标值或与真正想要的目标值非常接近的值。因此,就需要预先定义“如何比较预测值和目标值之间的差异”,这便是损失函数(loss function)或目标函数(objective function),它们是用于衡量预测值和目标值的差异的重要方程。其中,以损失函数举例,损失函数的输出值(loss)越高表示差异越大,那么深度神经网络的训练就变成了尽可能缩小这个loss的过程。
下面介绍本申请实施例提供的系统架构。
参见附图4,本发明实施例提供了一种系统架构500。如所述系统架构500所示,数据采集设备560用于采集训练数据,本申请实施例中训练数据包括:视频的第一信息(第一信息包括视频的图像信息和/或视频的语音信息);并将训练数据存入数据库530,训练设备520基于数据库530中维护的训练数据训练得到目标模型/规则501,该目标模型/规则501可以为本申请实施例中描述的视频理解网络,即将待播放视频输入该目标模型/规则501,即可得到待播放视频的第一信息对应的第三播放速度,该第三播放速度用于描述待播放视频中每个帧的播放倍率。在本申请提供的实施例中,该视频理解网络络是通过训练得到的。需要说明的是,在实际的应用中,所述数据库530中维护的训练数据不一定都来自于数据采集设备560的采集,也有可能是从其他设备接收得到的。另外需要说明的是,训练设备520也不一定完全基于数据库530维护的训练数据进行目标模型/规则501的训练,也有可能从云端或其他地方获取训练数据进行模型训练,上述描述不应该作为对本申请实施例的限定。
根据训练设备520训练得到的目标模型/规则501可以应用于不同的系统或设备中,如应用于图4所示的执行设备510,所述执行设备510可以是电子设备,如手机终端,平板电脑,笔记本电脑,AR/VR,车载终端等,还可以是服务器或者云端等。在附图4中,执行设备510配置有I/O接口512,用于与外部设备进行数据交互,用户可以通过客户设备540向I/O接口512输入数据,所述输入数据在本申请实施例中可以包括 待播放视频。
在执行设备510的计算模块511执行计算等相关的处理过程中,执行设备510可以调用数据存储系统550中的数据、代码等以用于相应的处理,也可以将相应处理得到的数据、指令等存入数据存储系统550中。
最后,I/O接口512将处理结果,如得到的待播放视频的第二播放速度返回给客户设备540,由客户设备540按照该第二播放速度播放待播放视频,为用户提供信息完整、音画舒适的观感体验。
值得说明的是,训练设备520可以针对视频不同的第一信息,基于不同的训练数据生成相应的目标模型/规则501,该相应的目标模型/规则501即可以用于获取不同的第一信息对应的第三播放速度。
在图4中所示情况下,用户可以手动给定输入数据,该手动给定可以通过I/O接口512提供的界面进行操作。另一种情况下,客户设备540可以自动地向I/O接口512发送输入数据,如果要求客户设备540自动发送输入数据需要获得用户的授权,则用户可以在客户设备540中设置相应权限。客户设备540也可以作为数据采集端,采集如图4所示输入I/O接口512的输入数据及输出I/O接口512的输出结果作为新的样本数据,并存入数据库530。当然,也可以不经过客户设备540进行采集,而是由I/O接口512直接将如图4所示输入I/O接口512的输入数据及输出I/O接口512的输出结果,作为新的样本数据存入数据库530。
值得注意的是,图4仅是本发明实施例提供的一种系统架构的示意图,图中所示设备、器件、模块等之间的位置关系不构成任何限制,例如,在图4中,数据存储系统550相对执行设备510是外部存储器,在其它情况下,也可以将数据存储系统550置于执行设备510中。在一些实施例中,执行设备510与客户设备540可以集中部署,作为一个设备。
本申请实施例提供的方法和装置还可以用于扩充训练数据库,如图4所示执行设备510的I/O接口512可以将经执行设备510处理过的视频以及处理结果,作为训练数据对发送给数据库530,以使得数据库530维护的训练数据更加丰富,从而为训练设备520的训练工作提供更丰富的训练数据。
如图4所示，根据训练设备520训练得到目标模型/规则501，该目标模型/规则501在本申请实施例中可以是视频理解网络。本申请实施例提供的视频理解网络可以是卷积神经网络、循环神经网络，或者其他。
示例性的,当目标模型/规则501为视频图像理解网络时,视频图像理解网络的功能为针对视频的图像信息,确定图像是否满足条件。视频图像理解网络将预测出每一帧图像是否满足条件,对满足条件的图像配置较小的播放倍率理论值,不满足条件的图像配置较大的播放倍率理论值,将每一帧图像的播放倍率作为待播放视频的图像对应的第三播放速度输出。
需要说明的是,对于视频图像信息对应的第三播放速度中,是否满足条件配置的播放倍率具体值,可以根据实际需求配置,本申请实施例不予限定。
示例性的,当目标模型/规则501为视频图像理解网络时,视频图像理解网络的功能为针对视频图像信息,确定图像满足条件的程度。视频图像理解网络将预测出每一 帧图像满足条件的程度,对完全满足条件的图像配置较小的播放倍率,对完全不满足条件的图像配置较大的播放倍率,对部分满足条件的图像配置中等的播放倍率,将每一帧图像的播放倍率作为待播放视频的图像信息对应的第三播放速度输出。
需要说明的是,对于视频图像信息对应的第三播放速度中,满足条件的不同程度配置的播放倍率具体值,可以根据实际需求配置,本申请实施例不予限定。
示例性的,当目标模型/规则501为视频图像理解网络,条件为图像中目标的运动速率大于阈值时,视频图像理解网络的功能为针对图像信息,进行图像中目标的运动速度预测。视频图像理解网络将预测出每一帧图像中目标的运动快慢,对目标运动较快的图像配置较小的播放倍率,目标较静止的图像配置较大的播放倍率,将每一帧图像面的播放倍率作为待播放视频的图像信息对应的第三播放速度输出。
需要说明的是,对于图像信息对应的第三播放速度中,图像中目标的运动不同速度配置的播放倍率具体值,可以根据实际需求配置,本申请实施例不予限定。
还需要说明的是,图像信息对应的第三播放速度,可以是对图像帧之间播放倍率快慢的相对建议,不约束最终变速倍率。
其中,图像中的目标可以为图像中运动最快的目标,或者,图像中的目标可以为图像中心区域内的目标,本申请实施例对于图像中目标不予限定。图像中心区域可以根据实际需求配置,本申请实施例对此也不予限定。
示例性的,当目标模型/规则501为视频语音理解网络时,视频语音理解网络的功能为针对视频的语音信息进行语速预测。视频语音理解网络预测出每个图像帧对应语音帧中的语速,根据统计得到人类最高舒适语速耐受值,计算出每一帧对应的最高可播放倍率,将每一帧语音的最高播放倍率作为待播放视频的语音信息对应的第三播放速度输出。
需要说明的是,语音信息对应的第三播放速度,可以是对播放倍率的绝对限制,表明最终播放速度的最高建议值,超过该播放倍率,视频将观感不佳。
如前文的基础概念介绍所述,卷积神经网络是一种带有卷积结构的深度神经网络,是一种深度学习(deep learning)架构,深度学习架构是指通过机器学习的算法,在不同的抽象层级上进行多个层次的学习。作为一种深度学习架构,CNN是一种前馈(feed-forward)人工神经网络,该前馈人工神经网络中的各个神经元可以对输入其中的图像作出响应。
如图5所示,卷积神经网络(CNN)600可以包括输入层610,卷积层/池化层620(其中池化层为可选的),以及神经网络层630。
卷积层/池化层620:
卷积层:
如图5所示卷积层/池化层620可以包括如示例621-626层,举例来说:在一种实现中,621层为卷积层,622层为池化层,623层为卷积层,624层为池化层,625为卷积层,626为池化层;在另一种实现方式中,621、622为卷积层,623为池化层,624、625为卷积层,626为池化层。即卷积层的输出可以作为随后的池化层的输入,也可以作为另一个卷积层的输入以继续进行卷积操作。
下面将以卷积层621为例,介绍一层卷积层的内部工作原理。
卷积层621可以包括很多个卷积算子,卷积算子也称为核,其在图像处理中的作用相当于一个从输入图像矩阵中提取特定信息的过滤器,卷积算子本质上可以是一个权重矩阵,这个权重矩阵通常被预先定义,在对图像进行卷积操作的过程中,权重矩阵通常在输入图像上沿着水平方向一个像素接着一个像素(或两个像素接着两个像素……这取决于步长stride的取值)的进行处理,从而完成从图像中提取特定特征的工作。该权重矩阵的大小应该与图像的大小相关,需要注意的是,权重矩阵的纵深维度(depth dimension)和输入图像的纵深维度是相同的,在进行卷积运算的过程中,权重矩阵会延伸到输入图像的整个深度。因此,和一个单一的权重矩阵进行卷积会产生一个单一纵深维度的卷积化输出,但是大多数情况下不使用单一权重矩阵,而是应用多个尺寸(行×列)相同的权重矩阵,即多个同型矩阵。每个权重矩阵的输出被堆叠起来形成卷积图像的纵深维度,这里的维度可以理解为由上面所述的“多个”来决定。不同的权重矩阵可以用来提取图像中不同的特征,例如一个权重矩阵用来提取图像边缘信息,另一个权重矩阵用来提取图像的特定颜色,又一个权重矩阵用来对图像中不需要的噪点进行模糊化等。该多个权重矩阵尺寸(行×列)相同,经过该多个尺寸相同的权重矩阵提取后的特征图的尺寸也相同,再将提取到的多个尺寸相同的特征图合并形成卷积运算的输出。
这些权重矩阵中的权重值在实际应用中需要经过大量的训练得到,通过训练得到的权重值形成的各个权重矩阵可以用来从输入图像中提取信息,从而使得卷积神经网络600进行正确的预测。
当卷积神经网络600有多个卷积层的时候,初始的卷积层(例如621)往往提取较多的一般特征,该一般特征也可以称之为低级别的特征;随着卷积神经网络600深度的加深,越往后的卷积层(例如626)提取到的特征越来越复杂,比如高级别的语义之类的特征,语义越高的特征越适用于待解决的问题。
池化层:
由于常常需要减少训练参数的数量,因此卷积层之后常常需要周期性的引入池化层,在如图5中620所示例的621-626各层,可以是一层卷积层后面跟一层池化层,也可以是多层卷积层后面接一层或多层池化层。在图像处理过程中,池化层的唯一目的就是减少图像的空间大小。池化层可以包括平均池化算子和/或最大池化算子,以用于对输入图像进行采样得到较小尺寸的图像。平均池化算子可以在特定范围内对图像中的像素值进行计算产生平均值作为平均池化的结果。最大池化算子可以在特定范围内取该范围内值最大的像素作为最大池化的结果。另外,就像卷积层中用权重矩阵的大小应该与图像尺寸相关一样,池化层中的运算符也应该与图像的大小相关。通过池化层处理后输出的图像尺寸可以小于输入池化层的图像的尺寸,池化层输出的图像中每个像素点表示输入池化层的图像的对应子区域的平均值或最大值。
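为便于理解上述池化操作，下面给出一段2x2池化的示意代码（仅为理解用的草图，实际的池化窗口大小、步长等均可根据需要配置）：

```python
import numpy as np

def pool_2x2(img, mode="max"):
    """2x2池化示意：对输入图像的每个2x2子区域取最大值（最大池化）或平均值（平均池化），
    输出尺寸缩小为输入的一半。"""
    h, w = img.shape
    out = np.zeros((h // 2, w // 2))
    for i in range(0, h // 2 * 2, 2):
        for j in range(0, w // 2 * 2, 2):
            block = img[i:i + 2, j:j + 2]
            out[i // 2, j // 2] = block.max() if mode == "max" else block.mean()
    return out

img = np.arange(16, dtype=float).reshape(4, 4)
print(pool_2x2(img, "max"))    # 每个2x2区域的最大值
print(pool_2x2(img, "mean"))   # 每个2x2区域的平均值
```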
神经网络层630:
在经过卷积层/池化层620的处理后,卷积神经网络600还不足以输出所需要的输出信息。因为如前所述,卷积层/池化层620只会提取特征,并减少输入图像带来的参数。然而为了生成最终的输出信息(所需要的类信息或其他相关信息),卷积神经网络600需要利用神经网络层630来生成一个或者一组所需要的类的数量的输出。因此, 在神经网络层630中可以包括多层隐含层(如图5所示的631、632至63n)以及输出层640,该多层隐含层中所包含的参数可以根据具体的任务类型的相关训练数据进行预先训练得到,例如该任务类型可以包括图像识别,图像分类,图像超分辨率重建等等。
在神经网络层630中的多层隐含层之后,也就是整个卷积神经网络600的最后层为输出层640,该输出层640具有类似分类交叉熵的损失函数,具体用于计算预测误差,一旦整个卷积神经网络600的前向传播(如图5由610至640方向的传播为前向传播)完成,反向传播(如图5由640至610方向的传播为反向传播)就会开始更新前面提到的各层的权重值以及偏差,以减少卷积神经网络600的损失,及卷积神经网络600通过输出层输出的结果和理想结果之间的误差。
需要说明的是,如图5所示的卷积神经网络600仅作为一种卷积神经网络的示例,在具体的应用中,卷积神经网络还可以以其他网络模型的形式存在。
下面介绍本申请实施例提供的一种芯片硬件结构。
图6为本发明实施例提供的一种芯片硬件结构,该芯片包括神经网络处理器(NPU)70。该芯片可以被设置在如图4所示的执行设备510中,用以完成计算模块511的计算工作。该芯片也可以被设置在如图4所示的训练设备520中,用以完成训练设备520的训练工作并输出目标模型/规则501。如图5所示的卷积神经网络中各层的算法均可在如图6所示的芯片中得以实现。
如图6所示，NPU 70作为协处理器挂载到主中央处理器(central processing unit,CPU)(Host CPU)上，由Host CPU分配任务。NPU的核心部分为运算电路703，控制器704控制运算电路703提取存储器(权重存储器或输入存储器)中的数据并进行运算。
在一些实现中,运算电路703内部包括多个处理单元(process engine,PE)。在一些实现中,运算电路703是二维脉动阵列。运算电路703还可以是一维脉动阵列或者能够执行例如乘法和加法这样的数学运算的其它电子线路。在一些实现中,运算电路703是通用的矩阵处理器。
举例来说,假设有输入矩阵A,权重矩阵B,输出矩阵C。运算电路703从权重存储器702中取矩阵B相应的数据,并缓存在运算电路中每一个PE上。运算电路703从输入存储器701中取矩阵A数据与矩阵B进行矩阵运算,得到的矩阵的部分结果或最终结果,保存在累加器708(accumulator)中。
向量计算单元707可以对运算电路703的输出做进一步处理,如向量乘,向量加,指数运算,对数运算,大小比较等等。例如,向量计算单元707可以用于神经网络中非卷积/非全连接层(fully connected layers,FC)层的网络计算,如池化(Pooling),批归一化(Batch Normalization),局部响应归一化(Local Response Normalization)等。
在一些实现中，向量计算单元707能将经处理的输出的向量存储到统一缓存器706。例如，向量计算单元707可以将非线性函数应用到运算电路703的输出，例如累加值的向量，用以生成激活值。在一些实现中，向量计算单元707生成归一化的值、合并值，或二者均有。在一些实现中，处理过的输出的向量能够用作到运算电路703的激活输入，例如用于在神经网络中的后续层中的使用。
例如,如图5所示的卷积神经网络中各层的算法均可以由703或707执行。图4中计算模块511、训练设备520的算法均可以由703或707执行。
统一存储器706用于存放输入数据以及输出数据。
存储单元访问控制器(direct memory access controller,DMAC)705直接将外部存储器中的输入数据搬运到输入存储器701和/或统一存储器706、将外部存储器中的权重数据存入权重存储器702，以及将统一存储器706中的数据存入外部存储器。
总线接口单元(bus interface unit,BIU)710,用于通过总线实现主CPU、DMAC和取指存储器709之间进行交互。
与控制器704连接的取指存储器(instruction fetch buffer)709,用于存储控制器704使用的指令。
控制器704，用于调用取指存储器709中缓存的指令，实现控制该运算加速器的工作过程。
示例性的，这里的数据仅为示例说明，可以是图5所示的卷积神经网络中各层的输入或输出数据，或者，可以是图4中计算模块511、训练设备520的输入或输出数据。
一般地,统一存储器706,输入存储器701,权重存储器702以及取指存储器709均为片上(On-Chip)存储器,外部存储器为该NPU外部的存储器,该外部存储器可以为双倍数据率同步动态随机存储器(double data rate synchronous dynamic random access memory,DDR SDRAM)、高带宽存储器(high bandwidth memory,HBM)或其他可读可写的存储器。
可选的,图4和图5中的程序算法是由主CPU和NPU共同配合完成的。
本申请实施例提供一种播放视频的方法,应用于用户使用电子设备播放视频的过程中,或者视频服务预处理视频的过程中。
一种可能的实现方式中，该播放视频的方法可以由图4中的执行设备510（例如提供视频的服务器）执行，执行设备510中的计算模块511可以用于执行下述S1001至S1003中确定第二播放速度的过程，由客户设备540按照确定的第二播放速度播放待播放视频。
另一种可能的实现方式中,该播放视频的方法可以由图4中的客户设备540(例如电子设备)执行。
需要说明的是,本申请实施例对于该播放视频的方法的执行主体不予限定,下述实施例中由电子设备执行该播放视频的方法,该电子设备中的单元可以部署于一个设备上,也可以分离的部署在多个设备中,本申请实施例对此不予限定。
具体的,将本申请提供的播放视频的方案称之为自适应加速或AI加速,电子设备可以通过多种方式向用户提供开启自适应加速功能的方式。
在一种可能的实现方式中,电子设备的视频播放软件中,可以由用户设置开启自适应加速功能,当用户开启该功能时,对所有视频执行本申请提供的方案进行自适应加速。
示例性的,如图7所示的某软件的用户设置界面中,自适应加速功能设置图标701用于根据用户选择,开启或关闭自适应加速。当开启自适应加速功能时,用户播放每 一个视频时,无需再单独选择开启该功能,即可享受自适应加速功能与自己选择的播放速率结合之后的加速效果。
在一种可能的实现方式中,电子设备的视频播放软件中,可以默认开启自适应加速功能,由用户在图7示意的设置界面中关闭自适应加速功能。
另一种可能的实现方式中,电子设备的视频播放软件中,可以默认关闭自适应加速功能,由用户在图7示意的设置界面中开启自适应加速功能。
另一种可能的实现方式中,电子设备的视频播放软件中,可以在视频播放界面,由用户操作,开启或关闭对当前播放的视频的自适应加速功能,当用户开启该功能时,对当前播放视频执行本申请提供的方案进行自适应加速。
示例性的,在图8示意的电子设备的界面中,根据用户的操作即将播放或正在播放视频A。图8的界面中提供了智能自适应(AI)加速图标801,图标801通过不同的颜色(或字体、符号、背景色等)指示自适应加速功能的开启与关闭。例如,当用颜色来区分自适应加速功能的开启与关闭时,用户对指示自适应加速功能的关闭的图标801进行一次点击操作,电子设备响应该点击操作,打开自适应加速功能,将图标801显示为指示自适应加速功能开启的颜色,如红色。用户对指示自适应加速功能的开启的图标801进行一次点击操作,电子设备响应该点击操作,关闭自适应加速功能,将图标801显示为指示自适应加速功能关闭的颜色,如白色。本申请对于如何显示自适应加速功能的开启与关闭不做限定。在用户选择开启自适应加速功能之前或之后,可以点击界面中的倍速图标802,以选择自己希望的播放速率,电子设备就会采用本申请的提供的方案在用户选择的播放速率的基础上结合视频内容特点对视频进行自适应加速并播放。
示例性的,在图9中的(a)示意的电子设备的界面中,根据用户的操作即将播放或正在播放视频A。当用户期望对视频A进行自适应加速时,用户在图9中的(a)的界面中对倍速图标901进行点击操作,电子设备响应该点击操作,显示如图9中的(b)所示的界面,该界面中包括多个选择项,一个选择项指示一个播放速度。如图9中的(b)所示的界面中,仅有播放速度的选择项(例如2.0X、1.0X等,还可以包括图9未示出的其他选择项,例如0.5X、0.75X、1.25X、1.5X、1.75X、2.25X、2.5X、2.75X、3.0X等,本申请对此不做限制),用于选择恒定倍率不进行自适应加速。如图9中的(b)所示的界面中,播放速度和AI指示的选择项(例如2.0+AI、1.0+AI等,还可以包括图9未示出的其他选择项,例如0.5X+AI、0.75X+AI、1.25X+AI、1.5X+AI、1.75X+AI、2.25X+AI、2.5X+AI、2.75X+AI、3.0X+AI等,本申请对此不做限制),用于选择恒定倍率且基于恒定倍率进行自适应加速。用户在图9中的(b)的界面中对选择项“2.0+AI”进行点击操作,以选择2.0X的播放速度且进行自适应加速。当用户在图9中的(a)界面中对倍速图标901进行点击操作,电子设备响应该点击操作,也可以只显示上述图9中的(b)中指示播放速度和AI加速的选择项,不显示仅用于指示恒定加速的选择项。对于如何显示用于指示本申请提供的自适应加速功能的选择项,本申请不做限定。
当用户操作开启自适应加速功能后,电子设备执行本申请提供的方案,对视频最后的播放速度是基于用户选择的第一播放速度以及第一信息决定的。
在另一种可能的实现方式中,上述自适应加速功能的开启或关闭,可以结合图7和图8(或图9)的方案,例如,用户在软件的整体设置界面打开了自适应加速功能,然后播放某一特定视频时又可以在该界面关闭自适应加速功能,以对该特定视频关闭自适应加速功能,而对其他的视频仍然采用自适应加速功能。
如图10所示,本申请实施例提供的播放视频的方法可以包括:
S1001、电子设备获取第一播放速度。
具体的,S1001中电子设备获取待播放视频的第一播放速度。一种可能的实现方式中,待播放视频是用户在电子设备上选取的期望播放的视频。例如,用户在电子设备中某一在线影音播放软件中,选择播放电影视频,该电影视频则为待播放视频。
另一种可能的实现方式中,待播放视频是视频服务器提供的任一视频。
其中,第一播放速度与用户播放设置相关。
一种可能的实现方式中,第一播放速度可以为用户选取的期望的播放速度,或者用户习惯的播放速度,获取用户默认的播放速度等。本申请实施例对于第一播放速度的获取方式不进行具体限定。
一种可能的实现方式中,获取第一播放速度具体可以实现为:根据获取的第一操作信息,显示第一界面。其中,第一界面包括至少两个选择项,一个选择项指示一个播放速度;获取第二操作信息;根据第二操作信息和至少两个选择项,确定第一播放速度。在该实现方式中,将用户通过第二操作信息在第一界面中选取的选择项指示的播放速度,确定为第一播放速度。
其中,第一操作信息可以为触发显示第一界面的操作信息,或者,第一操作可以为触发进行播放速度选择的操作信息。第一界面为进行播放速度选择的界面。
示例性的,在图11中的(a)示意的电子设备的界面中,根据用户的操作即将播放或正在播放视频A。用户在在图11中的(a)的界面中,对倍速图标1101进行点击操作(第一操作),电子设备显示如图11中的(b)示意的第一界面,该第一界面中包括多个选择项(如图11中的(b)中的2.0X、1.5X等,还可以包括图11未示出的其他选择项,例如0.5X、0.75X、1.25X、1.5X、1.75X、2.25X、2.5X、2.75X、3.0X等,本申请对此不做限制)。用户在图11中的(b)示意的第一界面中,对期望的播放速度对应的选择项(例如2.0X)进行点击操作(第二操作),电子设备根据该第二操作信息以及该第二操作信息在第一界面中选取的选择项“2.0X”,确定第一播放速度为“2.0X”。
在另一种可能的实现方式中,获取第一播放速度具体可以实现为:根据获取的第一操作信息,显示第一界面。其中,第一界面包括第一速度;根据获取的第二操作信息,显示第二界面,其中,第二界面包括第二速度;根据第二速度,确定第一播放速度。在该实现方式中,将用户通过第二操作信息触发的第二界面中的第二速度,确定为第一播放速度,实现了第一播放速度为用户选择的播放速度。
其中,该实现方式中的第一操作信息,可以为调起菜单的操作信息,第一界面为视频当前播放界面中呈现了当前的第一速度的界面。第二操作信息为调整播放速度的操作信息,第二操作信息可以为从多个选项中选取第二速度,或者,第二操作信息可以为步进确定第二速度。
示例性的，在图12中的(a)示意的电子设备的界面中，电子设备正在播放视频A，该界面中不包括任何操作项。用户在图12中的(a)的界面中，对屏幕任一位置进行点击操作（第一操作），调起操作项，电子设备根据该点击操作，显示如图12中的(b)所示的第一界面，该第一界面中包括电子设备播放视频A的当前播放速度1201（第一速度）。用户在如图12中的(b)所示的第一界面中，对当前播放速度1201进行点击操作（第二操作），电子设备根据该第二操作，显示如图12中的(c)所示的第二界面，该第二界面为播放速度选择界面，第二界面中包括多个选择项，一个选择项对应一个播放速度，第二界面中包括用户期望的第二速度（例如“2.0X”），用户对期望的“2.0X”进行点击操作，电子设备根据该点击操作，确定第一播放速度为该第二速度“2.0X”。
示例性的,用户在如图12中的(b)所示的第一界面中,对当前播放速度1201进行点击操作(第二操作),每点击一次代表播放速度步进一档,电子设备根据该第二操作,显示如图12中的(d)所示的第二界面,该第二界面中包括用户通过一次或多次点击操作选取的第二速度(例如“2.0X”),电子设备确定第一播放速度为该第二速度“2.0X”。
需要说明的是,用户在如图12中的(b)所示的第一界面中,对当前播放速度1201进行点击操作(第二操作),电子设备根据该第二操作向用户呈现如图12中的(e)所示的第二界面,第二界面中包括用于选取播放速度的进度条1202,用户对该进度条1202进行拖动选取第二速度,电子设备确定第一播放速度为该第二速度。
在另一种可能的实现方式中,获取第一播放速度具体可以实现为:根据获取的第一操作信息,停止播放该视频的前一视频,开始播放该视频;根据该前一视频的播放速度,确定第一播放速度。在该实现方式中,将用户通过第一操作信息触发的切换播放目标视频时,原播放视频的播放速度确定为第一播放速度,实现了第一播放速度为用户习惯的播放速度。
其中，原播放视频的播放速度可以为原视频的第一播放速度，或者，也可以为原视频的第二播放速度。
示例性的,在图13示意的电子设备的界面中,电子设备正在播放视频A。用户在图13的界面中,对切换下一视频的切换图标1301或者选集图标1302,进行点击操作(第一操作),电子设备根据该点击操作,停止播放视频A,开始播放视频B。电子设备可以将播放视频A时的播放速度确定为播放视频B时的第一播放速度。
示例性的，在图14示意的电子设备的界面中，电子设备正在播放视频C，该界面中还包括多个周边视频。用户在图14的界面中，对周边视频D进行点击操作（第一操作），电子设备根据该点击操作，停止播放视频C，开始播放视频D。电子设备可以将播放视频C时的播放速度确定为播放视频D时的第一播放速度。
需要说明的是,上述图11至图14示意的选择第一播放速度的示例中,均已开启了自适应加速功能,具体的开启自适应加速功能的方式,已在前述内容中进行了详细说明,此处不再赘述。
在另一种可能的实现方式中,当用户未选择第一播放速度时,第一播放速度可以为视频默认的播放速度。
需要说明的是,上述获取第一播放速度的示例仅为举例说明,并不构成具体限定。
S1002、电子设备获取第一信息。
其中,第一信息为待播放视频的第一信息。第一信息可以包括视频的图像信息和/或视频的语音信息。图像信息可以为图像帧序列,语音信息可以为语音帧序列。
具体的,电子设备可以将待播放视频抽取为图像帧和语音帧,以获取待播放视频的画面信息和语音信息。
另一种可能的实现方式,上述第一信息还可以包括用户感兴趣的内容。其中,用户感兴趣的内容可以包括以下信息中的至少一种:视频的人物描述信息、视频的内容描述信息、视频的内容结构信息。
其中,视频的人物描述信息用于指示用户对视频中感兴趣的人物的信息,该人物可以为演员或者扮演的角色或者其他。视频的内容描述信息用于指示用户对视频中感兴趣的情节或内容的信息,例如特定的风景、特定的动作等。视频的内容结构信息用于指示用户对视频中感兴趣的章节或位置的信息,例如分篇章组织的长视频中的章节序号或与特定章节相关的内容等。
具体的,电子设备可以从内部获取该设备的用户感兴趣的内容,例如根据用户的历史观看记录等来确定用户感兴趣的内容。或者,电子设备也可以从外部获取该设备的用户感兴趣的内容,例如,用户使用同一账户登陆多个电子设备,使用同一账户的多个电子设备的历史观看记录等信息会进行同步;或者,用户可以手动输入自己感兴趣的内容,例如可以是电子设备显示若干用户可能感兴趣的内容供用户选择,也可以是用户通过文字、语音、图像等方式输入自己感兴趣的内容。上述用户感兴趣内容的具体获取方式,仅仅是示例性的,本申请不加限定。
另一种可能的实现方式中,上述第一信息还可以包括该待播放视频的第一播放模式信息,其中,该第一播放模式信息与视频所对应的播放尺寸信息相关联。其中,视频所对应的播放尺寸信息,可以用于指示视频的显示比例或显示尺寸,例如全屏显示或小窗口显示等。
另一种可能的实现方式,上述第一信息还可以包括待播放的视频的第二播放模式信息,其中,该第二播放模式信息与视频的清晰度信息相关联。其中,视频的清晰度信息,可以用于指示视频播放分辨率,例如高清模式、蓝光模式、省流模式等。
示例性的,电子设备可以在内部读取该设备中存储的用户感兴趣的内容、第一播放模式信息、第二播放模式信息。
另一种可能的实现方式,上述第一信息还可以包括电子设备的运动状态信息。其中,运动状态信息,可以用于指示电子设备的移动速度或位姿信息。
例如,电子设备可以通过陀螺仪确定设备是否移动中或移动速度,通过方向传感器确定设备的角度。
另一种可能的实现方式,上述第一信息还可以包括电子设备的噪声强度信息。其中,电子设备的噪声强度信息,可以用于指示电子设备的环境干扰程度。
例如,设备可以通过声音传感器确定电子设备的噪声强度信息。
另一种可能的实现方式中,上述第一信息还可以包括用户视点信息。其中,用户视点信息,可以用于指示用户观看视频时视线的落点,体现了用户的兴趣。
例如,电子设备可以通过摄像头捕获的图像,利用视线估计/视点估计技术确定人眼是否注视设备,具体方案本申请实施例不予限定也不进行赘述。
另一种可能的实现方式中,上述第一信息还可以包括音频播放设备的连接状态信息。其中,音频播放设备可以为耳机或者音箱。音频播放设备的连接状态信息用于指示音频播放设备是否连接,当连接时用户对视频语音的敏感度高,不易受到外界环境的干扰;当未连接时,用户对视频语音的敏感度低,容易受到外界环境的干扰。
另一种可能的实现方式,上述第一信息还可以包括网络状态信息。网络状态信息用于指示电子设备接入的网络的质量或类型,当电子设备接入高质量的网络时,播放视频顺畅,否则播放视频卡顿。
需要说明的是,本申请实施例S1002中描述的第一信息以及获取各第一信息的方式仅为示例说明,并不构成具体限定,在实际应用中,可以根据实际需求配置S1002的第一信息的内容,以及获取各第一信息的方式,本申请实施例不再一一赘述。
S1003、电子设备以第二播放速度播放该视频,第二播放速度基于第一播放速度和第一信息得到。
在一种可能的实现方式中,采用第一播放速度播放该视频的第一时长,与采用第二播放速度播放该视频的第二时长不同。
在一种可能的实现方式中,第二播放速度基于第一播放速度和第一信息得到包括:根据第一信息中的每一种信息确定对应的第三播放速度;根据第一播放速度和所有第三播放速度,确定第二播放速度。
在另一种可能的实现方式中,第二播放速度基于第一播放速度和第一信息得到,包括:根据第一信息中的每一种信息确定对应的第三播放速度;根据第一播放速度和部分第三播放速度,确定第二播放速度。其中,部分第三播放速度,是从所有第三播放速度中筛选出的。该筛选可以将明显不符合条件的第三播放速度过滤掉,可以提高确定第二播放速度的效率。本申请实施例对于筛选规则不予限定。
一个第一信息对应的第三播放速度可以包括该第一信息决定的对待播放视频中每一帧的播放倍率理论值或播放倍率最大允许值。第二播放速度中对一帧的播放倍率小于或等于任一包含播放倍率最大允许值的第三播放速度中对同一帧的播放倍率。
示例性的,符合用户感兴趣内容的不同程度对应不同的播放倍率理论值,或者,目标移动速度处于不同区间对应不同的播放倍率理论值。其中,符合用户感兴趣内容的程度越高,对应的播放倍率理论值越小;目标移动速度越快,对应的播放倍率理论值越小。
需要说明的是,配置播放倍率理论值的方案可以根据实际需求配置,本申请实施例对此不予限定。
具体的,当第一信息的内容不同时,第一信息对应的第三播放速度的内容也不同,下面分别示例说明:
示例1、对于视频的图像信息,图像信息对应的第三播放速度可以包括由图像中目标运动速度决定的待播放视频中每一帧的播放倍率理论值。
示例性的,在示例1中的图像信息对应的第三播放速度中,目标运动较快的图像播放倍率低,目标运动较慢的图像播放倍率高。具体的不同移动速度对应的播放倍率, 可以根据实际需求配置,本申请实施例对此不予限定。
示例2、对于视频的图像信息,图像信息对应的第三播放速度包括图像信息在多个不同播放速度下对应的包括播放倍率理论值的第三播放速度。
其中，在示例2中，图像信息在一个播放速度下对应的包括播放倍率理论值的第三播放速度，与示例1中的类似，不再赘述。
具体的,在示例2中,可以通过插值的方式,得到多个不同播放速度下的图像信息,然后按照示例1的描述,得到多个不同播放速度下的图像信息对应的第三播放速度,作为该图像信息对应的第三播放速度。下面实施例一对该示例2的具体实现进行了说明,此处不进行赘述。
需要说明的是,本申请实施例对于多个不同播放速度的数量以及取值均不进行具体限定。
示例3、对于视频的语音信息,语音信息对应的第三播放速度可以包括由语音速度决定的待播放视频中每一帧的播放倍率最大允许值。
其中,语音信息对应的第三播放速度可以为保证用户观感体验的播放倍率最大允许值的集合。
具体的,可以将视频的语音信息输入语音理解模块,得到视频的语音信息对应的第三播放速度。语音理解模块可以为前述实施例中描述的视频语音理解网络,当然,也可以为其他模块,本申请实施例对此不予限定。
具体的,在示例3中,可以先预测出视频中每个图像的语音帧中的语音语速,根据统计得到人类最高舒适语速耐受值,计算每一帧对应的最高可播放倍速。
例如,人类最高舒适语速耐受值除以语音帧中的语音语速,减去预设余量,则为语音帧对应的最高可播放倍速。其中,预设余量可以根据实际需求配置,本申请实施例对此不予限定。例如,该预设余量可以为0。
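作为上述计算方式的一个示意，下面的代码片段按照“最高舒适语速除以每帧语速再减去预设余量”的思路计算每帧的播放倍率最大允许值（草图性质，变量名与无效值的表示方式均为说明性假设）：

```python
def max_speed_per_frame(speech_rates, max_comfort_rate, margin=0.0):
    """speech_rates: 每个语音帧的语速（如字/秒），无语音的帧记为0；
    max_comfort_rate: 统计得到的人类最高舒适语速耐受值；margin: 预设余量。
    返回每帧的播放倍率最大允许值，无语音的帧先记为无效值0，留待后续处理。"""
    result = []
    for r in speech_rates:
        if r <= 0:
            result.append(0.0)                            # 无效播放倍率
        else:
            result.append(max_comfort_rate / r - margin)  # 最高舒适语速除以当前语速，再减去预设余量
    return result
```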
示例4、针对视频的外部环境的第一信息,第一信息对应的第三播放速度包括由该外部环境信息决定的待播放视频中每一帧的播放倍率最大允许值。
具体的,外部环境的第一信息可以包括电子设备的运动状态信息、电子设备的噪声强度信息、用户视点信息、音频播放设备的连接状态信息中的任一项。示例4中电子设备可以根据第一信息对应的策略,确定外部环境信息中的第一信息对应的第三播放速度。该策略可以为保证用户观感体验的播放倍率最大允许值。
例如,在示例4中,配置的策略可以如下:
设备的移动速度越高,为了防止用户观看视频眩晕,则配置较低的播放倍率最大允许值,设备的移动速度越低,用户可以观看较高速播放的视频,则可以配置较高的播放倍率最大允许值。设备的角度相对于用户倾斜时,可以配置较低的播放倍率最大允许值,以使得用户可以清晰地观看视频。
设备的噪声强度越大,为了保证用户的观感体验,则可以配置较低的播放倍率最大允许值,设备的噪声强度越小,用户可以观看较高速播放的视频,则可以配置较高的播放倍率最大允许值。
用户视点信息指示用户专注看设备时,用户可以观看较高速播放的视频,则可以配置较高的播放倍率最大允许值,用户视点信息指示设备的用户未专注看设备时,为 了避免用户遗漏视频内容,则可以配置较低的播放倍率最大允许值。
当然,获取外部环境的第一信息对应的第三播放速度时配置的策略的内容,可以根据实际需求配置,本申请实施例仅示例描述,并不构成具体限定。在某一时刻,外部环境的第一信息是固定的信息,该时刻外部环境的第一信息对应的第三播放速度中的播放倍率可以相同。
示例5、针对视频的内部状态的第一信息,第一信息对应的第三播放速度可以包括由该内部状态信息决定的待播放视频中每一帧的播放倍率最大允许值。
具体的,视频的内部状态的第一信息可以包括网络状态信息、第一播放模式信息、第二播放模式信息中的任一项。
可以将内部状态信息输入内部状态理解模块,得到内部状态信息对应的第三播放速度。示例5中电子设备可以根据第一信息对应的策略,确定第一信息对应的第三播放速度。
例如,在示例5中,配置的策略可以如下:
设备的屏幕越大,用户可以观看较高速播放的视频,则可以配置较高的播放倍率最大允许值;设备的屏幕越小,为了防止用户观看视频眩晕,则配置较低的播放倍率最大允许值。
设备的连接了耳机时,用户可以观看较高速播放的视频,则可以配置较高的播放倍率最大允许值;设备外放时,为了保证用户可以听清,则配置较低的播放倍率最大允许值。
设备的播放音量越大时,用户可以观看较高速播放的视频,则可以配置较高的播放倍率最大允许值;设备的播放音量越小时,为了保证用户可以听清,则配置较低的播放倍率最大允许值。
设备的播放清晰度越高时,用户可以观看较高速播放的视频,则可以配置较高的播放倍率最大允许值;设备的播放清晰度越低时,为了保证用户可以看清,则配置较低的播放倍率最大允许值。
设备的网络质量越高时,用户可以观看较高速播放的视频,则可以配置较高的播放倍率最大允许值;设备的网络质量越低时,为了保证用户可以看清以防卡顿,则配置较低的播放倍率最大允许值。
当然,获取内部状态的第一信息对应的第三播放速度时配置的策略的内容,可以根据实际需求配置,本申请实施例仅示例描述,并不构成具体限定。在某一时刻,内部状态的第一信息是固定的信息,该时刻内部状态的第一信息对应的第三播放速度中的播放倍率可以相同。
示例6、针对用户个性化的第一信息,第一信息对应的第三播放速度可以包括由第一信息对应的图像内容决定的待播放视频中每一帧的播放倍率理论值。
用户个性化的第一信息可以包括用户感兴趣的内容。
示例性的,在示例6中的个性化的第一信息对应的第三播放速度中,满足用户兴趣的程度越高,播放倍率越低。具体的满足用户兴趣的不同程度对应的播放倍率,可以根据实际需求配置,本申请实施例对此不予限定。
以第二播放速度播放视频中与用户感兴趣的内容相关的帧时的速度不快于以第一 播放速度中播放该帧的速度,以实现用户感兴趣的内容慢速播放,提高用户观感体验。
比如,个性化的第一信息对应的第三播放速度可以实现用户喜爱的明星部分慢速播放,而其他情节快速播放的效果。
示例7、针对个性化的第一信息,第一信息对应的第三播放速度可以包括由第一信息决定的待播放视频中每一帧的播放倍率最大允许值。
其中,示例7中个性化的第一信息可以包括用户年龄等。
例如，在示例7中，用户的年龄越大，为了防止用户观看视频眩晕，则配置较低的播放倍率最大允许值。
当然,上述示例只是通过举例的形式,对不同第一信息对应的第三播放速度的示例说明,并不构成具体限定。需要说明的是,上述各个示例描述的方式可以组合成多种方案,也可以单独使用,本申请实施例不再一一赘述。
可选的,用户的个性化的第一信息还可以包括用户是否接受根据设备内部状态、外部环境状态等信息调整变速倍率,S1003中可以根据该用户个性化的第一信息,决定是否根据设备内部状态、外部环境状态等信息调整播放倍率。
需要说明的是,若某一第一信息对应的第三播放速度中包括的播放倍率数量小于待播放视频的帧数,可以通过采样、插值等方式对齐,使得第三播放速度中包括的播放倍率数量等于待播放视频的帧数。
需要说明的是，第三播放速度中包括的播放倍率，均为有效播放倍率。有效播放倍率可以为非0的播放倍率，或者，有效播放倍率可以为大于门限的播放倍率。若获取的某一第一信息对应的第三播放速度中包括无效播放倍率，需对其进行处理更新为有效播放倍率。该处理可以为取相邻两帧的平均值，或者，取前一帧的播放倍率，或者，取后一帧的播放倍率，或者取第三播放速度中有效播放倍率的平均值，或者取其他第三播放速度中该位置的有效播放倍率，或其他处理方式，本申请实施例对此不予限定。
例如,语音信息中某一语音帧中不存在语音,确定的语音信息对应的第三播放速度中该帧的播放倍率可能为0,该播放倍率则为无效播放倍率,需处理为有效播放倍率。
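下面给出将某一第三播放速度对齐到视频帧数、并将无效播放倍率更新为有效播放倍率的一个示意片段（草图性质，插值方式与无效值处理策略均为可替换的假设，此处以线性插值和取有效播放倍率平均值为例）：

```python
import numpy as np

def align_and_fix(rates, num_frames):
    """先用线性插值把某一第三播放速度对齐到视频帧数num_frames，
    再把无效播放倍率（此处假设以0表示）替换为序列中有效播放倍率的平均值。"""
    rates = np.asarray(rates, dtype=float)
    x_old = np.linspace(0.0, 1.0, len(rates))
    x_new = np.linspace(0.0, 1.0, num_frames)
    aligned = np.interp(x_new, x_old, rates)          # 长度对齐
    valid = aligned[aligned > 0]
    if valid.size > 0:
        aligned[aligned <= 0] = valid.mean()          # 无效值更新为有效播放倍率的平均值
    return aligned
```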
需要说明的是,S1003中可以由部署于云端的单元,获取图像信息对应的第三播放速度、语音信息对应的第三播放速度,并存储于云端与待播放视频对应,电子设备播放该待播放视频时,直接获取使用即可。其中,该云端可以为待播放视频的源服务器或其他,本申请实施例对此不予限定。
可选的,S1003中可以由电子设备中的单元,获取图像信息、语音信息之外的第一信息对应的第三播放速度,以满足实时性需求。
示例性的,图15示意了不同第一信息对应的第三播放速度的获取方式。
如图15所示,待播放视频的图像信息对应的第三播放速度可以包括画面变化速度播放倍率理论值序列、指定的明星播放倍率理论值序列、兴趣情节播放倍率理论值或其他。待播放视频的语音信息对应的第三播放速度可以包括仅语速播放倍率理论值序列、考虑背景音播放倍率最大允许值序列、兴趣台词播放倍率理论值序列或其他。图像信息对应的第三播放速度、语音信息对应的第三播放速度可以通过云计算完成。
内部状态的第一信息对应的第三播放速度可以为播放倍率理论值(播放倍率最大允许值)序列,外部环境的第一信息对应的第三播放速度可以为播放倍率理论值(播放倍率最大允许值)序列,用户个性化的第一信息对应的第三播放速度可以为播放倍率理论值(播放倍率最大允许值)序列,可以由视频播放设备实时完成获取。
具体的,S1003中电子设备可以根据每个第一信息对应的第三播放速度以及第一播放速度,确定第二播放速度。
其中,该第二播放速度包括对待播放视频中每一帧的播放倍率,第二播放速度中对一帧的播放倍率小于或等于任一第一信息对应的,包含播放倍率最大允许值的第三播放速度中对同一帧的播放倍率。
一种可能的实现方式中,每个第一信息对应一个第三播放速度,将S1003中获取的每个第三播放速度进行融合操作,得到第二播放速度。
其中,本申请描述的融合操作,可以包括选取不同第三播放速度(参与融合第三播放速度)中,相同帧的最大播放倍率与最小的播放倍率之间的播放倍率。
一种可能的实现方式中，参与融合操作的第三播放速度包括播放倍率最大允许值，上述融合操作可以包括：若不同第三播放速度（参与融合的第三播放速度）中，相同帧的最小的播放倍率为播放倍率最大允许值，选取参与融合的第三播放速度中该帧的最小的播放倍率最大允许值；若不同第三播放速度（参与融合的第三播放速度）中，相同帧的最小的播放倍率为播放倍率理论值，选取参与融合的第三播放速度中该帧的最小的播放倍率最大允许值与最小的播放倍率理论值的计算值，该计算值可以为平均值，或者最大值，或者最小值，或者其他。
在另一种可能的实现方式中,参与融合操作的第三播放速度不包括播放倍率最大允许值,可以理解为仅包括播放倍率理论值,上述融合操作可以包括:选取不同第三播放速度(参与融合的第三播放速度)中,相同帧的最大播放倍率理论值与最小的播放倍率理论值的计算值,该计算值可以为平均值,或者最大值,或者最小值,或者其他。
图16示意了视频图像信息对应的第三播放速度与语音信息对应的第三播放速度进行融合操作的场景。如图16所示,语音信息对应的第三播放速度中,无语音部分的语音帧的播放倍率是无效播放倍率,有语音部分的播放倍率是有效播放倍率,无语音部分的语音帧位置融合后的播放倍率,是图像信息对应的第三播放速度中该位置的播放倍率。
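以图16所示的场景为例，下面给出一种融合操作的示意实现（草图性质，假设无效播放倍率以0表示，且两条序列已对齐到相同的帧数）：

```python
def fuse(image_rates, speech_rates):
    """逐帧融合：有语音的帧取两条序列中较小的播放倍率（语音侧给出的是最大允许值），
    无语音的帧（语音侧为无效值0）直接采用图像信息对应的播放倍率。"""
    fused = []
    for img_r, sp_r in zip(image_rates, speech_rates):
        if sp_r <= 0:            # 无语音部分：语音侧为无效播放倍率
            fused.append(img_r)
        else:                    # 有语音部分：不超过语音的播放倍率最大允许值
            fused.append(min(img_r, sp_r))
    return fused
```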
另一种可能的实现方式中,为了保证用户的观感体验,采用S1003中确定的第二播放速度播放待播放视频的播放时长,与采用第一播放速度R 0播放待播放视频的播放时长的差值,小于或等于门限值。该门限值可以根据实际需求配置,本申请对此不予限定。
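作为理解上述门限约束的一个示意，可以按照如下方式比较两种播放速度下的整体播放时长（草图性质，假设视频的原始播放帧率为恒定值fps）：

```python
def total_duration(per_frame_speed, fps):
    """按逐帧播放倍率播放整段视频所需的时长：每帧原始时长为1/fps，再除以该帧的播放倍率。"""
    return sum((1.0 / fps) / s for s in per_frame_speed)

def within_threshold(second_speed, r0, fps, threshold):
    """判断采用第二播放速度播放的整体时长，与按固定倍率R0播放的时长之差是否不超过门限值。"""
    t2 = total_duration(second_speed, fps)
    t_r0 = total_duration([r0] * len(second_speed), fps)
    return abs(t2 - t_r0) <= threshold
```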
一种可能的实现方式中,每个第一信息对应一个第三播放速度,将上述确定第二播放速度的第三播放速度称之为备选第三播放速度(所有第三播放速度或部分第三播放速度),根据备选第三播放速度以及第一播放速度,确定第二播放速度,具体可以实现为:对所有备选第三播放速度进行融合操作,或者,对包括播放倍率理论值的备选第三播放速度进行融合操作,得到第四播放速度;对包括播放倍率最大允许值的备 选第三播放速度进行融合操作,得到第五播放速度;将第四播放速度、第五播放速度,按照第一播放速度R 0进行数值优化,得到第二播放速度。
其中,可以采用随机梯度下降等数值优化方式,进行数值优化,得到第二播放速度。
示例性的,将第四播放速度、第五播放速度,按照R 0进行数值优化,得到第二播放速度,具体可以实现为:将第四播放速度、第五播放速度以及R 0,输入目标函数进行数值优化,将使得目标函数最小的播放速度作为第二播放速度。
其中,目标函数用于描述根据第四播放速度、第五播放速度得到的播放速度对R 0的满足程度。目标函数值越小,根据第四播放速度、第五播放速度以得到的播放速度越接近R 0
具体的,将第四播放速度、第五播放速度以输入目标函数,通过调整目标函数中的预设参数,可以得到的不同的播放速度。
另一种可能的实现方式中,对应于S1003中的示例2,视频的图像信息对应的第三播放速度,包括视频的图像信息在多个不同播放速度下对应的包括播放倍率理论值的第三播放速度,S1003中根据第三播放速度以及第一播放速度,确定第二播放速度,具体可以实现为:将视频的图像信息在多个不同播放速度下对应的第三播放速度,分别与其他第一信息对应的第三播放速度进行融合操作,或者,将视频的图像信息在多个不同播放速度下对应的第三播放速度,分别与其他第一信息对应的包括播放倍率理论值的第三播放速度进行融合操作,得到多个第四播放速度;对包括播放倍率最大允许值的每个第三播放速度进行融合操作,得到第五播放速度;将第四播放速度、分别与第五播放速度以及第一播放速度R 0,输入目标函数,将使得该目标函数最小的播放速度作为第二播放速度。
示例性的,上述目标函数可以满足如下表达式:
argmin_S E_speed(S,V)+βE_rate(S,R_0)+αE_smooth(S′,n)+δE_A(S,A)     式2
其中，各项的具体表达式在原公开文本中以公式图像给出：
argmin_S表示选取第二播放速度S使函数值最低；α、β、δ为预设参数，可以根据实际需求配置，本申请实施例不予限定。β为自定义超参，该值越大表示优化过程越侧重考虑整体倍率趋近于R_0；α为自定义超参，该值越大表示优化过程越侧重曲线的平滑性；δ为自定义超参，该值越大表示播放倍率最大允许值对最终结果的限制越严格。
在采用目标函数进行数值优化的过程中，可以通过调整预设参数的取值，以根据第四播放速度、第五播放速度得到不同的播放速度。
E_speed(S,V)用于控制低加速段接近用户指定的最低播放速度R_min。R_min可以由用户提供，或者可以为默认值，该默认值可以为1。该项涉及第四播放速度中第t帧的归一化播放倍率与第二播放速度中第t帧的播放倍率S(t)；γ为预设参数，该值越小，优化最终结果的最低播放倍率越接近R_min。
E_rate(S,R_0)用于控制整体播放倍率接近R_0；其中T为待播放视频中的画面总帧数。
E_smooth(S′,n)用于控制第二播放速度的平滑性；其中n为目标函数的平滑宽度。
E_A(S,A)用于控制第二播放速度不超过第五播放速度中相同帧的播放倍率，仅在A(t)>0且S(t)>A(t)时产生惩罚；A(t)为所述第五播放速度中第t帧的播放倍率。
需要说明的是,上述目标函数仅为示例说明,并不构成具体限定。在实际应用中,可以根据实际需求配置目标函数的内容。
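下面给出按照式2的思路进行数值优化的一个示意实现。需要说明的是，原公开文本中各能量项的具体表达式以公式图像给出，此处各项的数学形式（平方误差、差分平滑等）以及优化方式（数值梯度下降）均为说明性假设，并非对本申请的限定：

```python
import numpy as np

def optimize_speed(V, A, R0, Rmin=1.0, alpha=1.0, beta=1.0, delta=10.0,
                   gamma=0.5, n=5, iters=300, lr=0.05):
    """数值优化得到第二播放速度S的示意实现（梯度为数值近似，效率不高，仅供理解）。
    V: 第四播放速度（第t帧的归一化播放倍率，0~1）；A: 第五播放速度（播放倍率最大允许值，0表示该帧无约束）；
    R0: 与用户播放设置相关的第一播放速度。返回优化后的S及其目标函数值。"""
    V = np.asarray(V, dtype=float)
    A = np.asarray(A, dtype=float)
    T = len(V)
    S = np.full(T, float(R0))                        # 以R0作为初值

    def objective(S):
        # E_speed：假设形式——建议倍率越低（V越小）的帧，S越应接近Rmin
        target = Rmin + (R0 - Rmin) * V ** gamma
        e_speed = np.mean((S - target) ** 2)
        # E_rate：整体倍率趋近R0
        e_rate = (np.mean(S) - R0) ** 2
        # E_smooth：平滑宽度n内的差分平方
        width = min(n, T - 1)
        e_smooth = np.mean([np.mean((S[i:] - S[:-i]) ** 2)
                            for i in range(1, width + 1)]) if width > 0 else 0.0
        # E_A：仅当A(t)>0且S(t)>A(t)时产生惩罚
        over = (A > 0) & (S > A)
        e_a = np.mean((S[over] - A[over]) ** 2) if over.any() else 0.0
        return e_speed + beta * e_rate + alpha * e_smooth + delta * e_a

    eps = 1e-3
    for _ in range(iters):                           # 数值梯度 + 梯度下降
        base = objective(S)
        grad = np.zeros(T)
        for t in range(T):
            S_try = S.copy()
            S_try[t] += eps
            grad[t] = (objective(S_try) - base) / eps
        S = np.clip(S - lr * grad, Rmin, None)
    return S, objective(S)
```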
图17示意了视频的画面信息对应的第三播放速度与语音信息对应的第三播放速度进行融合操作的场景。在该场景中,视频的画面信息对应一个第三播放速度。如图15所示,语音信息对应的第三播放速度中,无语音部分的语音帧的播放倍率是无效播放倍率,有语音部分的播放倍率是有效播放倍率。视频的画面信息对应的第三播放速度与语音信息对应的第三播放速度进行融合操作,得到融合相对播放速度V(第四播放速度);包括播放倍率理论值的第三播放速度只有语音序列对应的第三播放速度,将其作为融合绝对播放速度A(第五播放速度),将V、A以及R 0输入上述式2所示的目标函数,优化后得到的第二播放速度可以如图17所示。在图17所示的第二播放速度中,第一片段位置画面较快,第二播放速度中的低倍率防止眩晕;第二片段、第三片段通过低倍率保证用户听清语音。
通过本申请提供的播放视频的方法，基于用户播放设置相关的第一播放速度以及视频的第一信息确定最终播放视频的第二播放速度，这样一来，兼顾了视频内容以及用户需求实现自适应变速，使得视频的整体播放时长接近用户需求，但是播放时也考虑了视频的画面清晰、语速适度等与观看体验强相关的因素，提高了自适应变速播放视频的用户观感体验。
示例性的，假设用户对战争情节感兴趣，用户在电子设备中观看两个总时长相同的视频X和视频Y，且选取了相同的第一播放速度，视频X中包括了众多的战争情节，而视频Y是人文相关的纪录片，按照本申请提供的方案进行自适应加速后，由于视频X包括用户感兴趣的内容，播放视频X的第二播放速度就小于播放视频Y的第二播放速度，用户观看视频X的时长将大于视频Y的时长。
示例性的,图18示意了一种播放视频的方法,该方法中以视频中的图像信息和语音信息确定第二播放速度以播放视频,在图18示意的播放视频的方法中,以图像信息中目标的移动速度来确定图像信息对应的第三播放速度,且图像信息对应的第三播放速度,包括所述图像信息在多个不同播放速度下对应的包括播放倍率理论值的第三播放速度。该播放视频的方法具体可以包括:
S1、将待播放视频抽取为图像信息V和语音信息A。
S2-V、以抽帧方式,将图像信息V生成K段不同播放速度的图像帧序列,每段图像帧序列以滑窗方式,依次将连续的W帧的片段送入速率预测模块speednet中,得到K段图像帧序列的图像速度预测序列。
其中,K段图像帧序列的播放速度分别为X 0~X K-1,X 0~X K-1可以根据实际需求配置,本申请对其取值不予限定。
具体的,速率预测模块speednet用于预测每个图像中目标的移动速度,速率预测模块speednet可以为神经网络。
图像速度预测序列是速率预测模块speednet的输出结果,K段图像帧序列输入速率预测模块speednet,生成K个图像速度预测序列。图像速度预测序列中包括0~1之间的标量预测值,该值表示图像帧中目标的速度预测,例如,1表示图像中目标被算法认为运动正常,1之外的其他值表示图像中存在目标被算法认为处于快速运动状态。
S3-V、对K个图像速度预测序列进行长度对齐。
具体的,以待播放视频的帧数F为长度基准,通过插值法将K个图像速度预测序列插值为长度均为F的序列,实现序列长度对齐。
S4-V、根据阈值获取播放倍率理论值序列。
示例性的，在S4-V中，依次从0~1之间的阈值集合（例如阈值集合可以为{0.1,……,0.9}）中取多个阈值，该阈值用于界定图像帧在对应播放速度下是否仍被判别为非加速状态，大于阈值认为运动正常（非加速状态），小于阈值认为处于快速运动状态。
对选取的每个阈值进行如下操作:
将K个图像速度预测序列,与阈值作比较,得到K个二进制序列(大于阈值的为1,小于阈值的为0),每个二进制序列乘以其对应的播放速度(X 0~X K-1中的值)。乘以各自对应的播放速度后的K个序列,相同帧位置取对应的最大值,K个序列变为1个序列,称为播放倍率理论值序列,该播放倍率理论值序列中的每一个值代表了其对应图像帧使分类网络(速率预测模块speednet)判别为非加速状态(速率预测模块speednet输出值为1)的最大可能播放倍率。
对{0.1,……,0.9}中选取多个(例如选取9个)不同的阈值,分别进行上述操作,共形成多个(例如9个)播放倍率理论值序列,该播放倍率理论值序列可以理解为前述实施例中,图像信息对应的第三播放速度。
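下面给出S4-V中“与阈值比较得到二进制序列、乘以对应播放速度、相同帧位置取最大值”这一过程的示意实现（草图性质，K、阈值集合与播放速度取值均为示例假设）：

```python
import numpy as np

def rate_sequence_for_threshold(pred_seqs, speeds, threshold):
    """pred_seqs: K个长度均为F的图像速度预测序列（取值0~1）；speeds: 对应的K个播放速度X_0~X_{K-1}；
    threshold: 阈值。返回一条播放倍率理论值序列：逐帧取仍被判为非加速状态的最大播放倍率。"""
    pred = np.asarray(pred_seqs, dtype=float)                    # 形状 (K, F)
    binary = (pred > threshold).astype(float)                    # 大于阈值记1，否则记0
    scaled = binary * np.asarray(speeds, dtype=float)[:, None]   # 每条二进制序列乘以对应播放速度
    return scaled.max(axis=0)                                    # 相同帧位置取最大值，K条合成1条

# 对阈值集合中的每个阈值分别执行，可得到多条播放倍率理论值序列
# （pred_seqs、speeds 假设为前述步骤已得到的输入）
thresholds = [round(0.1 * i, 1) for i in range(1, 10)]
# theory_seqs = [rate_sequence_for_threshold(pred_seqs, speeds, th) for th in thresholds]
```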
S2-A、将语音信息输入到语速估计模块中,估计出视频中人物说话片段的语速,形成语速序列。
其中,语速估计模块可以用于字幕速度统计。
S3-A、统计得到人类舒适的最高语速,根据语速序列计算得到语音最大可变速倍率序列。
具体的，在S3-A中将人类舒适的最高语速除以语速序列中每个语音帧的语速，得到每个语音帧最大可以变速的播放倍率，得到语音最大可变速倍率序列，最终变速不能超过该最大可以变速的播放倍率。
S4-A、将语音最大可变速倍率序列长度对齐为视频帧数F,得到播放倍率最大允许值序列。
S4-A中得到的播放倍率最大允许值序列可以理解为前述实施例中,语音信息对应的第三播放速度。
S5、将播放倍率理论值序列和播放倍率最大允许值序列对齐。
在S5中，画面信息对应的播放倍率理论值序列之外，不存在其他第一信息对应的播放倍率理论值序列，仅存在S4-A中得到的语音信息对应的播放倍率最大允许值序列，可以将S4-V中得到的多个播放倍率理论值序列，分别与S4-A中得到的播放倍率最大允许值序列融合，得到多个融合相对可变速序列，将该多个融合相对可变速序列理解为前述实施例中描述的多个第四播放速度。在对其融合过程中，对存在语音部分，取两个序列中的最小有效值作为该处播放倍率，形成融合相对可变速序列。
在S5中,仅存在S4-A中得到的语音信息对应的播放倍率最大允许值序列,可以将S4-A中得到的播放倍率最大允许值序列理解为前述实施例中描述的第五播放速度。
例如,9个播放倍率理论值序列分别与播放倍率最大允许值序列对齐融合,得到9个融合相对可变速序列,该融合相对可变速序列是可以达到音画协同播放倍率序列。
S6、根据9条融合相对可变速序列及播放倍率最大允许值序列,选取最终的变速序列。
具体的,在S6中,可以对9条融合相对可变速序列中的数值,分别归一化到0~1之间;然后将归一化后的9条融合相对可变速序列,分别与播放倍率最大允许值序列以及用户指定的播放倍率R 0,带入到目标函数中进行数值优化,选择使得目标函数最低的优化结果作为最终的变速序列,即前述实施例中描述的第二播放速度。该目标函数可以为前述式2示意的目标函数或其他,本申请实施例对此不予限定。
其中,对于归一化过程采用的归一化值,可以根据实际需求配置,本申请实施例对此不予限定。
示例性的,可以选取融合相对可变速序列中的最大值,对融合相对可变速序列进行归一化。
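S6中“归一化后分别代入目标函数、选取使目标函数最低的优化结果”这一步骤，可以用如下示意代码表达（草图性质，其中optimize_speed沿用前文示例中的假设实现）：

```python
import numpy as np

def select_final_sequence(fused_seqs, A, R0):
    """fused_seqs: 多条融合相对可变速序列；A: 播放倍率最大允许值序列；R0: 用户指定的播放倍率。
    对每条候选序列先以其最大值归一化，再进行数值优化，选取目标函数值最低的优化结果作为最终变速序列。"""
    best_S, best_val = None, float("inf")
    for seq in fused_seqs:
        V = np.asarray(seq, dtype=float)
        V = V / V.max()                                  # 以序列中的最大值进行归一化
        S, val = optimize_speed(V, A, R0)                # 沿用前文示例中的优化草图
        if val < best_val:
            best_S, best_val = S, val
    return best_S
```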
S7、根据最终的变速序列,对视频进行播放,形成自适应变速效果。
图19示意了仅基于图像的自适应变速方案的变速曲线与本申请方案的自适应变速曲线对比。如图19所示，在图像较慢但存在语音的片段1，仅基于图像的自适应变速方案的变速曲线播放倍率很高，语音信息几乎丢失，而本申请的变速曲线具有更低的播放倍率，保证用户能听清台词，保证了语音清晰。在画面抖动较快的片段2，本申请的播放倍率均低于2x，整体观感比恒定2x更舒缓自然。
图20示意了一种本申请方案的自适应变速曲线，对于图像几乎不变化但语速极快的片段3，本申请提供的自适应变速曲线在该片段3播放倍率接近于1.0x，而此段台词在恒定2x播放时已听不清（信息丢失）。
图21示意了一种本申请方案的自适应变速曲线，对于情节激烈的战争片中画面晃动剧烈的片段，使用恒定2x倍率观看已产生了视觉眩晕不适感，紧张的台词也听不清，如图21所示片段4，本申请在该片段4包括的各个片段中播放倍率远低于2x，极大缓解了眩晕不适感。
上述主要从电子设备的工作原理角度对本申请实施例提供的方案进行了介绍。可以理解的是,上述电子设备为了实现上述功能,其包含了执行各个功能相应的硬件结构和/或软件模块。本领域技术人员应该很容易意识到,结合本文中所公开的实施例描述的各示例的单元及算法步骤,本申请能够以硬件或硬件和计算机软件的结合形式来实现。某个功能究竟以硬件还是计算机软件驱动硬件的方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来 实现所描述的功能,但是这种实现不应认为超出本申请的范围。
本申请实施例可以根据上述方法示例对执行本申请提供的播放视频的装置进行功能模块的划分,例如,可以对应各个功能划分各个功能模块,也可以将两个或两个以上的功能集成在一个处理模块中。上述集成的模块既可以采用硬件的形式实现,也可以采用软件功能模块的形式实现。需要说明的是,本申请实施例中对模块的划分是示意性的,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式。
在采用对应各个功能划分各个功能模块的情况下,图22示出了上述实施例中所涉及的确定播放视频的装置220的一种可能的结构示意图。该确定播放视频的装置220可以为功能模块或者芯片。如图22所示,播放视频的装置220可以包括:第一获取单元2201、第二获取单元2202、播放单元2203。其中,第一获取单元2201用于执行图10中的过程S1001;第二获取单元2202用于执行图10中的过程S1002;播放单元2203用于执行图10中的过程S1003。其中,上述方法实施例涉及的各步骤的所有相关内容均可以援引到对应功能模块的功能描述,在此不再赘述。
在采用集成的单元的情况下,图23示出了上述实施例中所涉及的电子设备230的一种可能的结构示意图。电子设备230可以包括:处理模块2301、通信模块2302。处理模块2301用于对电子设备230的动作进行控制管理,通信模块2302用于与其他设备通信。例如,处理模块2301用于执行图10中的过程S1001至S1003中任一过程。电子设备230还可以包括存储模块2303,用于存储电子设备230的程序代码和数据。
其中,处理模块2301可以为图2所示的电子设备100的实体结构中的处理器110,可以是处理器或控制器。例如可以是CPU,通用处理器,DSP,ASIC,FPGA或者其他可编程逻辑器件、晶体管逻辑器件、硬件部件或者其任意组合。其可以实现或执行结合本申请公开内容所描述的各种示例性的逻辑方框,模块和电路。处理模块2301也可以是实现计算功能的组合,例如包含一个或多个微处理器组合,DSP和微处理器的组合等等。通信模块2302可以为图2所示的电子设备100的实体结构中的移动通信模块150或无线通信模块160,通信模块2302可以是通信端口,或者可以是收发器、收发电路或通信接口等。或者,上述通信接口可以通过上述具有收发功能的元件,实现与其他设备的通信。上述具有收发功能的元件可以由天线和/或射频装置实现。存储模块2303可以是图2所示的电子设备100的实体结构中的内部存储器121。
如前述,本申请实施例提供的播放视频的装置220或电子设备230可以用于实施上述本申请各实施例实现的方法中相应的功能,为了便于说明,仅示出了与本申请实施例相关的部分,具体技术细节未揭示的,请参照本申请各实施例。
作为本实施例的另一种形式,提供一种计算机可读存储介质,其上存储有指令,该指令被执行时执行上述方法实施例中的播放视频的方法。
作为本实施例的另一种形式,提供一种包含指令的计算机程序产品,当该计算机程序产品在计算机上运行时,使得该计算机执行时执行上述方法实施例中的播放视频的方法。
本申请实施例再提供一种芯片系统,该芯片系统包括处理器,用于实现本发明实施例的技术方法。在一种可能的设计中,该芯片系统还包括存储器,用于保存本发明实施例必要的程序指令和/或数据。在一种可能的设计中,该芯片系统还包括存储器, 用于处理器调用存储器中存储的应用程序代码。该芯片系统,可以由一个或多个芯片构成,也可以包含芯片和其他分立器件,本申请实施例对此不作具体限定。
结合本申请公开内容所描述的方法或者算法的步骤可以硬件的方式来实现,也可以是由处理器执行软件指令的方式来实现。软件指令可以由相应的软件模块组成,软件模块可以被存放于RAM、闪存、ROM、可擦除可编程只读存储器(erasable programmable ROM,EPROM)、电可擦可编程只读存储器(electrically EPROM,EEPROM)、寄存器、硬盘、移动硬盘、只读光盘(CD-ROM)或者本领域熟知的任何其它形式的存储介质中。一种示例性的存储介质耦合至处理器,从而使处理器能够从该存储介质读取信息,且可向该存储介质写入信息。当然,存储介质也可以是处理器的组成部分。处理器和存储介质可以位于ASIC中。另外,该ASIC可以位于核心网接口设备中。当然,处理器和存储介质也可以作为分立组件存在于核心网接口设备中。或者,存储器可以与处理器耦合,例如存储器可以是独立存在,通过总线与处理器相连接。存储器也可以和处理器集成在一起。存储器可以用于存储执行本申请实施例提供的技术方案的应用程序代码,并由处理器来控制执行。处理器用于执行存储器中存储的应用程序代码,从而实现本申请实施例提供的技术方案。
通过以上的实施方式的描述,所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,仅以上述各功能模块的划分进行举例说明,实际应用中,可以根据需要而将上述功能分配由不同的功能模块完成,即将装置的内部结构划分成不同的功能模块,以完成以上描述的全部或者部分功能。
在本申请所提供的几个实施例中,应该理解到,所揭露的装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,所述模块或单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个装置,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是一个物理单元或多个物理单元,即可以位于一个地方,或者也可以分布到多个不同地方。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能单元的形式实现。
所述集成的单元如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个可读取存储介质中。基于这样的理解,本申请实施例的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的全部或部分可以以软件产品的形式体现出来,该软件产品存储在一个存储介质中,包括若干指令用以使得一个设备(可以是单片机,芯片等)或处理器(processor)执行本申请各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(Read-Only Memory,ROM)、随机存取存储器(Random Access Memory,RAM)、磁碟或者光 盘等各种可以存储程序代码的介质。
以上所述,仅为本申请的具体实施方式,但本申请的保护范围并不局限于此,任何在本申请揭露的技术范围内的变化或替换,都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应以所述权利要求的保护范围为准。

Claims (27)

  1. 一种播放视频的方法,应用于电子设备,其特征在于,所述方法包括:
    获取第一播放速度;
    获取第一信息,所述第一信息包括所述视频的图像信息和/或所述视频的语音信息;
    以第二播放速度播放所述视频,所述第二播放速度基于所述第一播放速度和所述第一信息得到。
  2. 根据权利要求1所述的方法,其特征在于,采用所述第一播放速度播放所述视频的第一时长,与采用所述第二播放速度播放所述视频的第二时长不同。
  3. 根据权利要求1或2所述的方法,其特征在于,所述获取第一播放速度包括:
    根据获取的第一操作信息,显示第一界面,其中,所述第一界面包括至少两个选择项,一个所述选择项指示一个播放速度;
    获取第二操作信息;
    根据所述第二操作信息和所述至少两个选择项,确定所述第一播放速度。
  4. 根据权利要求1或2所述的方法,其特征在于,所述获取第一播放速度包括:
    根据获取的第一操作信息,显示第一界面,其中,所述第一界面包括第一速度;
    根据获取的第二操作信息,显示第二界面,其中,所述第二界面包括第二速度;
    根据所述第二速度,确定所述第一播放速度。
  5. 根据权利要求1或2所述的方法,其特征在于,所述获取第一播放速度包括:
    根据获取的第一操作信息,停止播放所述视频的前一视频,开始播放所述视频;
    根据所述前一视频的播放速度,确定所述第一播放速度。
  6. 根据权利要求1-5中任一项所述的方法,其特征在于,所述第二播放速度基于所述第一播放速度和所述第一信息得到包括:
    根据所述第一信息中的每一种信息确定对应的第三播放速度;
    根据所述第一播放速度和所有所述第三播放速度,确定所述第二播放速度。
  7. 根据权利要求1-5中任一项所述的方法,其特征在于,所述第二播放速度基于所述第一播放速度和所述第一信息得到包括:
    根据所述第一信息中的每一种信息确定对应的第三播放速度;
    根据所述第一播放速度和部分所述第三播放速度,确定所述第二播放速度。
  8. 根据权利要求1-7中任一项所述的方法,其特征在于,所述第一信息还包括用户感兴趣的内容,所述用户感兴趣的内容包括以下信息中的至少一种:所述视频的人物描述信息、所述视频的内容描述信息、所述视频的内容结构信息。
  9. 根据权利要求8所述的方法,其特征在于,以所述第二播放速度播放所述视频中与所述用户感兴趣的内容相关的帧时的速度,不快于以所述第一播放速度播放所述相关的帧时的速度。
  10. 根据权利要求1-9中任一项所述的方法,其特征在于,所述第一信息还包括所述视频的播放模式信息,其中,所述播放模式信息与所述视频所对应的播放尺寸信息相关联。
  11. 根据权利要求1-10中任一项所述的方法,其特征在于,所述第一信息还包括所述视频的播放模式信息,其中,所述播放模式信息与所述视频的清晰度信息相关联。
  12. 根据权利要求1-11中任一项所述的方法,其特征在于,所述第一信息还包括所述电子设备的运动状态信息。
  13. 根据权利要求1-12中任一项所述的方法,其特征在于,所述第一信息还包括所述电子设备的噪声强度信息。
  14. 根据权利要求1-13中任一项所述的方法,其特征在于,所述第一信息还包括用户视点信息。
  15. 根据权利要求1-14中任一项所述的方法,其特征在于,所述第一信息还包括音频播放设备的连接状态信息。
  16. 根据权利要求1-15中任一项所述的方法,其特征在于,所述第一信息还包括网络状态信息。
  17. 一种视频播放装置,其特征在于,所述视频播放装置包括:
    第一获取单元,所述第一获取单元用于获取第一播放速度;
    第二获取单元,所述第二获取单元用于获取第一信息,所述第一信息包括所述视频的图像信息和/或所述视频的语音信息;
    播放单元,所述播放单元用于以第二播放速度播放所述视频,所述第二播放速度基于所述第一播放速度和所述第一信息得到。
  18. 根据权利要求17所述的装置,其特征在于,采用所述第一播放速度播放所述视频的第一时长,与采用所述第二播放速度播放所述视频的第二时长不同。
  19. 根据权利要求17或18所述的装置,其特征在于,所述第一获取单元具体用于:
    根据获取的第一操作信息,显示第一界面,其中,所述第一界面包括至少两个选择项,一个所述选择项指示一个播放速度;
    获取第二操作信息;
    根据所述第二操作信息和所述至少两个选择项,确定所述第一播放速度。
  20. 根据权利要求17或18所述的装置,其特征在于,所述第一获取单元具体用于:
    根据获取的第一操作信息,显示第一界面,其中,所述第一界面包括第一速度;
    根据获取的第二操作信息,显示第二界面,其中,所述第二界面包括第二速度;
    根据所述第二速度,确定所述第一播放速度。
  21. 根据权利要求17或18所述的装置,其特征在于,所述第一获取单元具体用于:
    根据获取的第一操作信息,停止播放所述视频的前一视频,开始播放所述视频;
    根据所述前一视频的播放速度,确定所述第一播放速度。
  22. 根据权利要求17-21中任一项所述的装置,其特征在于,还包括处理单元,用于:
    根据所述第一信息中的每一种信息确定对应的第三播放速度;
    根据所述第一播放速度和所有所述第三播放速度,确定所述第二播放速度。
  23. 根据权利要求17-21中任一项所述的装置,其特征在于,还包括处理单元,用于:
    根据所述第一信息中的每一种信息确定对应的第三播放速度;
    根据所述第一播放速度和部分所述第三播放速度,确定所述第二播放速度。
  24. 根据权利要求17-23中任一项所述的装置,其特征在于,所述第一信息还包括 用户感兴趣的内容,所述用户感兴趣的内容包括以下信息中的至少一种:所述视频的人物描述信息、所述视频的内容描述信息、所述视频的内容结构信息;所述播放单元播放所述视频中与所述用户感兴趣的内容相关的帧时的速度不快于以所述第一播放速度播放所述相关的帧时的速度。
  25. 一种电子设备,其特征在于,所述电子设备包括:处理器和存储器;
    所述存储器与所述处理器连接;
    所述存储器用于存储计算机指令,当所述处理器执行所述计算机指令时,所述电子设备执行如权利要求1至16中任一项所述的播放视频的方法。
  26. 一种计算机可读存储介质,其特征在于,包括指令,当其在计算机上运行时,使得计算机执行权利要求1至16中任一项所述的播放视频的方法。
  27. 一种包含指令的计算机程序产品,其特征在于,当其在计算机上运行时,使得计算机执行如权利要求1至16中任一项所述的播放视频的方法。
