WO2020020156A1 - Video processing method and apparatus, terminal device, server and storage medium - Google Patents

Video processing method and apparatus, terminal device, server and storage medium

Info

Publication number
WO2020020156A1
WO2020020156A1 (PCT/CN2019/097292, CN2019097292W)
Authority
WO
WIPO (PCT)
Prior art keywords
video processing
video
package
target
processing package
Prior art date
Application number
PCT/CN2019/097292
Other languages
English (en)
French (fr)
Inventor
李雪
熊唯
Original Assignee
腾讯科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司
Publication of WO2020020156A1
Priority to US17/010,549 (published as US11854263B2)
Priority to US18/493,730 (published as US20240054784A1)

Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F18/00 Pattern recognition
            • G06F18/20 Analysing
              • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
                • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
              • G06F18/22 Matching criteria, e.g. proximity measures
        • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
          • G06V20/00 Scenes; Scene-specific elements
            • G06V20/40 Scenes; Scene-specific elements in video content
              • G06V20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
              • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
          • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
            • G06V30/10 Character recognition
              • G06V30/19 Recognition using electronic means
                • G06V30/192 Recognition using electronic means using simultaneous comparisons or correlations of the image signals with a plurality of references
                  • G06V30/194 References adjustable by an adaptive method, e.g. learning
    • H ELECTRICITY
      • H04 ELECTRIC COMMUNICATION TECHNIQUE
        • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
          • H04N23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
            • H04N23/80 Camera processing pipelines; Components thereof

Definitions

  • the present application relates to the field of information processing technologies, and in particular, to a video processing method and device, a terminal device, a server, and a storage medium.
  • When shooting a video, the user can select video processing methods such as filters, beauty, special effects, and background music to process the captured video.
  • Each time the user selects a video processing method, taking a filter as an example, it is necessary to first open the filter option and then select the desired one from a variety of filter effects.
  • The operation process is tedious and takes a long time, and the efficiency is even lower when many types of video processing methods need to be selected.
  • That is, the current terminal device provides low matching efficiency for video processing methods when shooting video, so how to improve the matching efficiency of video processing methods is a technical problem that needs to be considered.
  • the embodiments of the present application provide a video processing method and device, a terminal device, a server, and a storage medium, which are used to solve the technical problem of low matching efficiency of video processing modes in related technologies.
  • In a first aspect, a video processing method includes:
  • the terminal device obtains scene description information of a target shooting scene;
  • the terminal device matches a target video processing package corresponding to the target shooting scene according to the scene description information, where the target video processing package includes at least one video processing mode for processing a video in a predetermined processing mode;
  • the terminal device processes the target video shot for the target shooting scene according to the target video processing package.
  • the terminal device can automatically match the corresponding target video processing package according to the scene description information, eliminating the need for manual selection by the user as in the related art, thereby improving the matching efficiency of video processing methods.
  • multiple video processing methods can be matched at one time, which can further improve matching efficiency.
  • Moreover, since the target video processing package is dynamically matched based on the scene description information, the matched target video processing package can fit the actual video content as closely as possible, improving the accuracy of video processing and keeping the result consistent with the actual needs of users.
  • The terminal device matching the target video processing package corresponding to the target shooting scene according to the scene description information includes:
  • the terminal device performs word vector representation on the scene description information to obtain video content feature variables;
  • the terminal device inputs the video content feature variables into a pre-established package recommendation model for package matching to obtain at least one recommended video processing package that matches the scene description information;
  • the terminal device determines the target video processing package according to the at least one recommended video processing package.
  • In the embodiments of the present application, the terminal device can match a target video processing package through a pre-established package recommendation model, which achieves high matching efficiency. Moreover, the package recommendation model can recommend multiple video processing packages at one time, so that the terminal device can choose one of them as the final target video processing package according to the actual situation, enhancing the applicability and universality of the solution.
  • Further, the package recommendation model can be a model established by deep learning based on the video content of a large number of published videos, so by inputting scene description information indicating the actual situation of the shooting scene into the package recommendation model, the matched target video processing package can be as close as possible to the actual video content currently being shot, maximizing the accuracy and effectiveness of the video effect processing and meeting the user's actual use needs as far as possible.
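  • Purely as an illustration of this flow, the following is a minimal Python sketch in which toy hash-based word vectors and a cosine-similarity lookup stand in for a trained package recommendation model; all package names and functions are hypothetical, not part of the application.

```python
import hashlib
import numpy as np

DIM = 64

def word_vector(word: str) -> np.ndarray:
    # Toy stand-in for a trained word embedding: derive a deterministic
    # pseudo-random vector from a hash of the word.
    seed = int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16) % (2**32)
    rng = np.random.default_rng(seed)
    return rng.standard_normal(DIM)

def content_feature_variable(scene_description: str) -> np.ndarray:
    # Word vector representation of the scene description information,
    # averaged into a single video content feature variable.
    vectors = [word_vector(w) for w in scene_description.split()]
    return np.mean(vectors, axis=0)

# Hypothetical candidate packages with embeddings derived offline.
PACKAGE_EMBEDDINGS = {
    "cute_baby_indoor": content_feature_variable("living room baby standing"),
    "scenic_outdoor": content_feature_variable("sunny forest waterfall"),
}

def recommend_packages(scene_description: str, top_k: int = 2) -> list:
    # Stand-in for the package recommendation model: rank candidate
    # packages by cosine similarity to the content feature variable.
    q = content_feature_variable(scene_description)
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    ranked = sorted(PACKAGE_EMBEDDINGS.items(),
                    key=lambda kv: cos(q, kv[1]), reverse=True)
    return [name for name, _ in ranked[:top_k]]

print(recommend_packages("baby standing in living room"))
```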
  • the terminal device determining the target video processing package according to the at least one recommended video processing package includes:
  • the terminal device determines the most frequently used video processing package among the at least one recommended video processing package as the target video processing package; or
  • the terminal device determines the recommended video processing package with the highest similarity to a priority video processing package among the at least one recommended video processing package as the target video processing package, wherein the priority video processing package is a video processing package matched according to user attribute information.
  • In the embodiments of the present application, the terminal device can select a suitable package from the multiple video processing packages recommended by the package recommendation model according to the actual use scenario, which enhances the diversity of the solution, broadens its scope of application, and strengthens its applicability.
  • In addition, a priority video processing package that matches the user attribute information can also be considered together; that is, the actual use demand of the user is taken as a selection factor when selecting the target video processing package, which improves the determination of the target video processing package so that the result can be as close as possible to the actual preferences of the user.
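  • A minimal sketch of these two selection rules, assuming hypothetical usage counts and a pairwise similarity function (neither is specified by the application):

```python
def pick_most_used(recommended, usage_count):
    # Rule 1: the most frequently used recommended package is the target.
    return max(recommended, key=lambda pkg: usage_count.get(pkg, 0))

def pick_closest_to_priority(recommended, priority_pkg, similarity):
    # Rule 2: the recommended package most similar to the priority
    # package (itself matched from user attribute information) is the target.
    return max(recommended, key=lambda pkg: similarity(pkg, priority_pkg))

# Example with toy data:
packages = ["cute_baby_indoor", "scenic_outdoor"]
print(pick_most_used(packages, {"cute_baby_indoor": 120, "scenic_outdoor": 40}))
```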
  • the terminal device determining the target video processing package according to the at least one recommended video processing package includes:
  • the terminal device determines whether there is, among the at least one recommended video processing package, a recommended video processing package whose similarity to a priority video processing package is greater than or equal to a predetermined similarity, where the priority video processing package is a video processing package matched according to user attribute information;
  • if there is, the terminal device determines the recommended video processing package with the highest similarity as the target video processing package;
  • if there is not, the terminal device determines the priority video processing package as the target video processing package.
  • In this way, the user's own preference setting is taken as an important selection condition, so that the finally determined target video processing package can be as close as possible to the user's actual use requirements.
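  • The threshold-based variant could look as follows; the threshold value 0.8 and the toy token-overlap similarity are assumptions for illustration only.

```python
def pick_with_threshold(recommended, priority_pkg, similarity, threshold=0.8):
    # If some recommended package is similar enough to the priority package,
    # take the most similar one; otherwise fall back to the priority package.
    best = max(recommended, key=lambda pkg: similarity(pkg, priority_pkg))
    return best if similarity(best, priority_pkg) >= threshold else priority_pkg

# Toy similarity: fraction of shared tokens in the package names.
def toy_similarity(a, b):
    ta, tb = set(a.split("_")), set(b.split("_"))
    return len(ta & tb) / len(ta | tb)

print(pick_with_threshold(["cute_baby_indoor", "scenic_outdoor"],
                          "cute_baby_outdoor", toy_similarity))
```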
  • The terminal device matching the target video processing package corresponding to the target shooting scene according to the scene description information includes:
  • the terminal device inputs the scene description information into a preset correspondence between a set of scenes and a set of video processing packages to perform a matching search, so as to obtain the video processing package with the highest matching degree with the scene description information;
  • the terminal device determines the video processing package with the highest matching degree as the target video processing package.
  • In the embodiments of the present application, the terminal device can automatically match an appropriate video processing package for different video shooting scenarios through the preset correspondence. Since the correspondence can be personalized, customized, and modified or updated by the user at any time, it can meet the actual shooting needs of the user to a large extent.
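  • A minimal sketch of such a preset correspondence lookup, with hypothetical scene keywords and package names:

```python
from typing import Optional

# Hypothetical preset correspondence between scene keywords and video
# processing packages; the user could customize and update this table.
PRESET_PACKAGES = {
    frozenset({"baby", "living", "room"}): "cute_baby_indoor",
    frozenset({"forest", "waterfall"}): "scenic_outdoor",
}

def match_preset(scene_description: str) -> Optional[str]:
    # Pick the preset scene with the highest keyword overlap with the
    # scene description information.
    words = set(scene_description.lower().split())
    best_key = max(PRESET_PACKAGES, key=lambda k: len(k & words))
    return PRESET_PACKAGES[best_key] if best_key & words else None

print(match_preset("baby standing in the living room"))  # -> cute_baby_indoor
```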
  • The terminal device matching the target video processing package corresponding to the target shooting scene according to the scene description information includes:
  • the terminal device determines a respective target video processing package for before and after the shooting object changes;
  • correspondingly, the terminal device processing the target video shot for the target shooting scene according to the target video processing package includes:
  • the terminal device processes the captured video with the corresponding target video processing package before and after the change of the shooting object.
  • In this way, when the shooting scene changes, the terminal device can perform corresponding processing with different video processing packages before and after the change, which can improve the effectiveness of video processing.
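  • As an illustrative sketch, the per-segment processing could be organized as below; scene_changed, match_package, and apply_package are hypothetical callbacks standing in for scene-change detection, package matching, and frame processing.

```python
def process_video(frames, scene_changed, match_package, apply_package):
    # Re-match the target video processing package whenever the shooting
    # object changes, and process each frame with the package that is
    # current at that moment.
    current_pkg, processed = None, []
    for frame in frames:
        if current_pkg is None or scene_changed(frame):
            current_pkg = match_package(frame)
        processed.append(apply_package(frame, current_pkg))
    return processed
```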
  • The terminal device obtaining the scene description information of the target shooting scene includes:
  • the terminal device obtains a preview video of the target shooting scene, or the terminal device obtains the target video actually shot for the target shooting scene;
  • the terminal device performs image recognition on the video sequence frames of the preview video or of the target video to obtain the key feature information of each frame, where the key feature information is feature information of the shooting object that occupies the largest area in each frame and/or is visually presented at the forefront position;
  • the terminal device determines the foregoing scene description information according to the key feature information of all frames.
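  • The following sketch illustrates one way this per-frame extraction could look; the stubbed-out detect_objects stands in for a real image recognition model (for example an SSD detector, as FIG. 7 suggests), and all labels and numbers are made up.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class DetectedObject:
    label: str    # e.g. "child", "cat"
    area: float   # pixel area of the bounding box
    depth: float  # smaller = closer to the camera (more in front)

def detect_objects(frame):
    # Stub standing in for a real image recognition model
    # (for example an SSD detector).
    return [DetectedObject("child", 52000.0, 1.0),
            DetectedObject("cat", 8000.0, 3.5)]

def key_feature(frame) -> str:
    # Key feature information of one frame: the object occupying the
    # largest area and/or visually presented at the forefront position.
    return max(detect_objects(frame), key=lambda o: (o.area, -o.depth)).label

def scene_description(frames) -> str:
    # Determine the scene description from the key feature information
    # of all frames, here by majority vote over per-frame key subjects.
    return Counter(key_feature(f) for f in frames).most_common(1)[0][0]

print(scene_description([None] * 30))  # -> "child"
```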
  • In a second aspect, a video processing method includes:
  • the modeling server receives scene description information of the target shooting scene sent by the terminal device;
  • the modeling server performs word vector representation on the scene description information to obtain video content feature variables;
  • the modeling server inputs the video content feature variables into a pre-established package recommendation model for package matching to obtain a target video processing package that matches the target shooting scene, wherein the target video processing package includes at least one video processing mode for processing the video in a predetermined processing mode;
  • the modeling server sends the target video processing package to the terminal device or the cloud server, so that the terminal device or the cloud server processes the target video shot for the target shooting scene according to the target video processing package.
  • The modeling server inputting the video content feature variables into a pre-established package recommendation model for package matching to obtain a target video processing package that matches the target shooting scene includes:
  • the modeling server analyzes the video content feature variables to determine the environment and/or the object categories and quantities corresponding to the target shooting scene;
  • the modeling server matches a set of video processing methods for the determined environment and/or each determined type of object;
  • the modeling server uses the video processing package composed of the matched multiple video processing modes as the target video processing package.
  • In the embodiments of the present application, the modeling server can match a corresponding set of video processing methods for different objects, so that different types of objects receive targeted, differentiated processing, maximizing the diversity of video processing.
  • In addition, the set of video processing methods determined for each type of object is also recommended by the trained package recommendation model, so it can meet the needs of the public as far as possible and ensure the universality of the solution.
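  • As an illustration, matching method sets per environment and per object category and taking their union as the target package could look like this; the tables and method names are hypothetical.

```python
# Hypothetical correspondence between recognized environments / object
# categories and sets of video processing methods.
ENV_METHODS = {"living_room": {"filter:warm", "music:lullaby"}}
OBJECT_METHODS = {"baby": {"beauty:3", "special_effect:bubbles"},
                  "cat": {"sticker:paw"}}

def build_target_package(environment, object_categories):
    # The union of the method sets matched for the environment and for
    # each object category forms the target video processing package.
    methods = set(ENV_METHODS.get(environment, set()))
    for category in object_categories:
        methods |= OBJECT_METHODS.get(category, set())
    return methods

print(build_target_package("living_room", ["baby", "cat"]))
```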
  • the above method further includes:
  • the modeling server obtains user attribute information and / or historical viewing information of the user corresponding to the terminal device;
  • the modeling server performs word vector representation on the user attribute information and / or the historical viewing information to obtain auxiliary feature variables;
  • the modeling server inputting the video content feature variables into a pre-established package recommendation model for package matching to obtain a target video processing package that matches the scene description information includes:
  • the modeling server inputs the video content feature variables and the auxiliary feature variables into the package recommendation model and performs package matching to obtain the target video processing package.
  • In the embodiments of the present application, the modeling server uses the user attribute information and the historical viewing information as auxiliary recommendation factors, which ensures that the user's actual situation is taken into account as much as possible during recommendation, thereby achieving accurate recommendation of the package.
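  • The application does not specify how the auxiliary feature variables enter the model; a simple assumption is concatenation with the video content feature variable, as sketched below.

```python
import numpy as np

def combined_features(content_vec, user_attr_vec, history_vec):
    # Feed the auxiliary feature variables (user attributes, historical
    # viewing) into the model together with the video content feature
    # variable; simple concatenation is assumed here.
    return np.concatenate([content_vec, user_attr_vec, history_vec])

x = combined_features(np.ones(64), np.ones(8), np.ones(16))
print(x.shape)  # (88,)
```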
  • the above package recommendation model is established as follows:
  • the modeling server selects multiple videos from the published videos as video training samples
  • the modeling server marks the shooting environment and / or shooting object of each video training sample based on the image recognition result of the video sequence frame included in each video training sample to obtain the video content label of each video training sample;
  • the modeling server extracts the video processing package used for each video training sample
  • the modeling server uses the video content label of each video training sample and the corresponding video processing package as training features, and inputs them into a preset network model for training and learning to obtain the package recommendation model.
  • The modeling server inputting the video content label of each video training sample and the corresponding video processing package into a preset network model for training and learning to obtain the package recommendation model includes:
  • the modeling server determines a recommendation score value of each video training sample according to its historical interaction data, wherein the historical interaction data of a video training sample is used to indicate the interaction between users and the video training sample;
  • the modeling server associates the recommendation score value of each video training sample with the corresponding video processing package according to a predetermined association rule, and performs training and learning to obtain the package recommendation model.
  • The modeling server associating the recommendation score value of each video training sample with the corresponding video content label according to a predetermined association rule and performing training and learning to obtain the package recommendation model includes:
  • the modeling server performs association training on each video content label and the corresponding video processing package in the preset network model according to the principle that the higher the recommendation score value, the larger the training weight of the corresponding video processing package, to obtain the package recommendation model; or
  • the modeling server first determines the target video content labels whose recommendation score value is greater than or equal to a predetermined score value, and then, according to the principle that the higher the recommendation score value, the larger the training weight of the corresponding video processing package, performs association training on each target video content label and the corresponding video processing package to obtain the package recommendation model.
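  • Since logistic regression is later named as one possible preset network model, the following scikit-learn sketch shows score-weighted association training; the weight-proportional-to-score scheme and the optional score cut-off mirror the two variants above, but the concrete weighting is an assumption.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_package_model(X, package_ids, recommendation_scores, min_score=None):
    # X: video content label features; package_ids: the package used by
    # each training sample; recommendation_scores: per-sample scores
    # derived from historical interaction data.
    X = np.asarray(X)
    y = np.asarray(package_ids)
    w = np.asarray(recommendation_scores, dtype=float)
    if min_score is not None:
        # Optional variant: keep only samples whose score reaches the
        # predetermined value before weighting.
        keep = w >= min_score
        X, y, w = X[keep], y[keep], w[keep]
    model = LogisticRegression(max_iter=1000)
    # Higher recommendation score -> larger training weight.
    model.fit(X, y, sample_weight=w)
    return model

X = [[1, 0], [1, 1], [0, 1], [0, 0]]
y = ["pkg_a", "pkg_a", "pkg_b", "pkg_b"]
scores = [10.0, 3.0, 8.0, 1.0]
model = train_package_model(X, y, scores)
print(model.predict([[1, 0]]))  # expected: ['pkg_a']
```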
  • In a third aspect, an embodiment of the present application provides a video processing method.
  • the method includes:
  • the cloud server receives the scene description information of the target shooting scene and the target video shot for the target shooting scene sent by the terminal device;
  • the cloud server receives the target video processing package sent by the terminal device or the modeling server, where the target video processing package is a video processing package corresponding to the target shooting scene matched according to the scene description information, and the target video processing package includes at least one video processing mode for processing the video in a predetermined processing mode;
  • the cloud server processes the target video according to the target video processing package.
  • An embodiment of the present application further provides a terminal device.
  • the terminal device includes:
  • An obtaining module configured to obtain scene description information of a target shooting scene
  • a matching module configured to match a target video processing package corresponding to the target shooting scene according to the scene description information, wherein the target video processing package includes at least one video processing mode for processing a video in a predetermined processing mode;
  • the processing module is configured to process the target video shot for the target shooting scene according to the target video processing package.
  • the above matching module is set as: performing word vector representation on the scene description information to obtain video content feature variables; inputting the video content feature variables into a pre-established package recommendation model for package matching to obtain at least one recommended video processing package that matches the scene description information; and determining the target video processing package according to the at least one recommended video processing package.
  • the above matching module is set as: determining the most frequently used video processing package among the at least one recommended video processing package as the target video processing package; or determining the recommended video processing package with the highest similarity between the at least one recommended video processing package and the priority video processing package as the above-mentioned target video processing package, where the priority video processing package is a video processing package matched according to user attribute information.
  • the above matching module is set as: determining whether there is a recommended video processing package whose similarity to the priority video processing package is greater than or equal to a predetermined similarity; if there is, determining the recommended video processing package with the highest similarity as the target video processing package; if there is not, determining the above-mentioned priority video processing package as the above-mentioned target video processing package.
  • the above matching module is set as: inputting the scene description information into the preset correspondence between a set of scenes and a set of video processing packages for a matching search, so that the video processing package with the highest matching degree is determined as the target video processing package.
  • the above matching module is set as: determining a respective target video processing package for before and after the shooting object changes; correspondingly, the above processing module is set as: the captured video is processed with the corresponding target video processing package before and after the subject changes.
  • the above obtaining module is set as: obtaining a preview video of the target shooting scene or the target video actually shot for the target shooting scene; performing image recognition on the video sequence frames to obtain the key feature information of each frame; and determining the above scene description information according to the key feature information of all frames.
  • An embodiment of the present application further provides a server.
  • the server includes:
  • a receiving module configured to receive scene description information of a target shooting scene sent by a terminal device
  • the first obtaining module is configured to perform word vector representation on the scene description information to obtain video content feature variables;
  • the matching module is configured to input the video content feature variables into a pre-established package recommendation model for package matching to obtain a target video processing package that matches the target shooting scene, wherein the target video processing package includes at least one video processing mode for processing the video in a predetermined processing mode;
  • the sending module is configured to send the target video processing package to the terminal device or the cloud server, so that the terminal device or the cloud server processes the target video captured for the target shooting scene according to the target video processing package.
  • the above matching module is set as: analyzing the video content feature variables to determine the environment and/or the object categories and quantities corresponding to the target shooting scene; matching a set of video processing methods for the determined environment and/or each type of object; and using the video processing package composed of the matched multiple video processing methods as the target video processing package.
  • the server further includes a second obtaining module and a third obtaining module; wherein:
  • the second obtaining module is configured to obtain user attribute information and / or historical viewing information of a user corresponding to the terminal device;
  • the third obtaining module is configured to perform word vector representation on the user attribute information and / or the historical viewing information to obtain auxiliary feature variables;
  • the matching module is configured to input the video content feature variables and the auxiliary feature variables into the package recommendation model for package matching to obtain the target video processing package.
  • the server further includes a model building module, which is set to:
  • the model building module is set to: select multiple videos from the published videos as video training samples; mark the shooting environment and/or shooting object of each video training sample based on the image recognition result of its video sequence frames to obtain the video content label of each video training sample; extract the video processing package used for each video training sample; and use the video content label of each video training sample and the corresponding video processing package as training features to input into a preset network model for training and learning to obtain the package recommendation model.
  • the above model building module is set as:
  • the recommendation score value of each video training sample is determined according to its historical interaction data, wherein the historical interaction data of a video training sample is used to indicate the interaction between users and the video training sample;
  • according to a predetermined association rule, the recommendation score value of each video training sample is associated with the corresponding video processing package, and training and learning are performed to obtain the package recommendation model.
  • the above model building module is set as:
  • according to the principle that the higher the recommendation score value, the larger the training weight of the corresponding video processing package, each video content label and the corresponding video processing package are associatively trained in the preset network model to obtain the package recommendation model; or the target video content labels whose recommendation score value is greater than or equal to a predetermined score value are determined first, and association training is then performed on those labels and the corresponding video processing packages to obtain the package recommendation model.
  • a server includes:
  • a first receiving module configured to receive scene description information of a target shooting scene and a target video shot for the target shooting scene sent by the terminal device;
  • the second receiving module is configured to receive the target video processing package sent by the terminal device or the modeling server, where the target video processing package is a video processing package corresponding to the target shooting scene matched according to the scene description information, and the target video processing package includes at least one video processing mode for processing a video in a predetermined processing mode;
  • the processing module is configured to process the target video according to the target video processing package.
  • a video processing device includes:
  • a memory configured to store program instructions;
  • a processor configured to call the program instructions stored in the memory and perform, according to the obtained program instructions, the steps included in any one of the foregoing methods.
  • a storage medium stores computer-executable instructions, and the computer-executable instructions are used to cause a computer to perform the steps included in any one of the foregoing methods of the first aspect, or the steps included in any one of the foregoing methods of the second aspect, or the steps included in any one of the foregoing methods of the third aspect.
  • a video processing device includes at least one processor and a storage medium. When the instructions included in the storage medium are executed by the at least one processor, the steps included in any one of the foregoing methods of the first aspect may be performed, or the steps included in any one of the foregoing methods of the second aspect, or the steps included in any one of the foregoing methods of the third aspect.
  • a chip system includes a processor and may further include a memory, and is used to implement any one of the foregoing methods of the first aspect, or any one of the foregoing methods of the second aspect, or any one of the foregoing methods of the third aspect.
  • the chip system may be composed of chips, or may include chips and other discrete devices.
  • FIG. 1 is a schematic diagram of selecting a filter in the related art.
  • FIG. 2A is a schematic diagram of an application scenario of a video processing method according to an embodiment of the present application.
  • FIG. 2B is a schematic diagram of another application scenario of a video processing method according to an embodiment of the present application.
  • FIG. 2C is a schematic diagram of another application scenario of a video processing method according to an embodiment of the present application.
  • FIG. 3 is a flowchart of a video processing method according to an embodiment of the present application.
  • FIG. 4 is a schematic diagram of a user shooting a video of a target shooting scene using a terminal device according to an embodiment of the present application.
  • FIG. 5 is another flowchart of a video processing method according to an embodiment of the present application.
  • FIG. 6 is another flowchart of a video processing method according to an embodiment of the present application.
  • FIG. 7 is a schematic diagram of a network architecture of an SSD in an embodiment of the present application.
  • FIG. 8 is another schematic diagram of a video processing method according to an embodiment of the present application.
  • FIG. 9 is a structural block diagram of a terminal device according to an embodiment of the present application.
  • FIG. 10 is a structural block diagram of a server in an embodiment of the present application.
  • FIG. 11 is another structural block diagram of a server in an embodiment of the present application.
  • FIG. 12 is a schematic structural diagram of a video processing apparatus according to an embodiment of the present application.
  • FIG. 13 is another schematic structural diagram of a video processing apparatus according to an embodiment of the present application.
  • FIG. 14 is another schematic structural diagram of a video processing apparatus according to an embodiment of the present application.
  • In the embodiments of the present application, "a plurality" may mean at least two, for example, two, three, or more, which is not limited in the embodiments of the present application.
  • Short videos, that is, short-form videos, are a form of Internet content transmission.
  • They are videos transmitted on new Internet media with a duration of less than 5 minutes (for example, ranging from several seconds to several minutes), and are high-frequency-push video content suitable for viewing on mobile devices during short leisure periods.
  • The content of short videos covers topics such as skill sharing, cute baby records, beauty and makeup, humor, sports and slimming, fashion trends, cute pet records, social hot spots, and food recommendations.
  • Short video platforms can randomly recommend some popular short videos, or make targeted recommendations based on user-defined viewing preferences. For example, for a mother user whose customized viewing preferences are short videos of the cute baby record and beauty categories, the platform will recommend as many short videos of these two topics to the user as possible.
  • Generally, users may choose some video processing methods to perform special processing on the video before shooting it, so as to obtain a satisfactory video effect. For example, when watching a short video of a baby posted by someone else, a user finds the actions and music of the video very interesting and wants to imitate this effect to publish a short video of their own, so they open the short video APP and aim at their own baby to shoot. Before clicking the shooting button, the user can select the required filter, special effect, beauty, music, and other effects from the operation selection area shown in FIG. 1. Taking filter selection as an example, after the user clicks the filter button (as shown in the left image in FIG. 1), more filters pop up in the operation selection area (as shown in the right image in FIG. 1) for the user to choose.
  • The filters may be arranged in a recommended filter area and a default sort area (for example, sorted by the short video APP system's default order), and the user selects the currently needed filter effect from either area to complete the filter selection.
  • The selection of special effects, beauty, music, and other effects is similar to the filter selection. It can be seen that the entire selection process requires many user steps and is relatively complicated, and it takes even longer when filters, special effects, beauty, music, and other effects all need to be selected.
  • In view of this, the applicant further explored the characteristics of existing short videos and found that most short videos are motivated by imitation. The so-called imitation is to imitate the scenes, characters, and actions in a published video to achieve a similar video effect; that is, the content of the video to be shot and the video being imitated are roughly the same, which means that the environment of the shooting scene and the type and number of shooting objects are roughly the same. In other words, the environment of the shooting scene and the subjects in it are roughly the same, and the actions of the subjects are also roughly the same.
  • The so-called subjects are people, animals, or other still objects, such as a baby, a baby and a mother, a kitten, a mobile phone, or a plant.
  • Therefore, the applicant considered matching the corresponding video processing package according to the scene description information of the shooting scene, because the scene description information can sufficiently describe the general environment of the shooting scene and the relevant situation of the shooting subjects. Automatically matching the video processing package through the scene description information can make the matching result as close as possible to the current actual video content, meeting the actual shooting needs of users to a certain extent, so that the final processed video can meet their requirements.
  • the embodiments of the present application provide a video processing method to improve the matching efficiency and accuracy of the video processing methods.
  • In this method, the scene description information of the target shooting scene is obtained first, and the target video processing package corresponding to the target shooting scene is then automatically matched based on the scene description information. The manual selection operation of the user is thus omitted, which improves matching efficiency to a certain extent; and because the matching is based on the scene description information, the pertinence and accuracy of the matching can also be improved, so that the matching result is as consistent as possible with the current actual video, thereby satisfying the user's actual needs.
  • the target video processing package includes at least one video processing method for processing the video in a predetermined processing mode.
  • For example, the target video processing package includes multiple video processing methods: a "small forest" filter, beauty level 3, big-eye and face-slimming level 2, a "bubbles" special effect, and the music "good doll". The target video processing package is then used to process the target video shot for the target shooting scene to obtain the processed target video. Taking filters as an example, before the target video is processed by the target video processing package it has no filter effect, and after processing it has the filter effect.
  • In this way, the captured target video can implement various video effects according to the target video processing package automatically recommended by the terminal device.
  • FIG. 2A is an application scenario to which the video processing method in the embodiment of the present application can be applied, and the application scenario includes a terminal device 21 and a server 22.
  • An APP capable of shooting video can be installed in the terminal device 21.
  • For ease of description, the APP with a video shooting function is referred to as a video APP in the embodiments of the present application; that is, a client of the video APP is installed and runs on the terminal device 21.
  • The server 22 refers to the server corresponding to the client of the video APP, such as an application server. The application server may provide the corresponding installation package and update package for installing and updating the video APP.
  • the client of the video app can interact with the corresponding application server.
  • In a specific operation, the user can use the terminal device 21 to aim its camera (front or rear) at the target shooting scene to obtain scene description information of the target shooting scene, and a target video processing package corresponding to the target shooting scene is then matched according to the obtained scene description information.
  • The matching process can be performed independently by the client of the video APP in the terminal device 21, or the terminal device 21 can report the obtained scene description information to the server 22 so that the target video processing package is matched by the server 22 (the application server).
  • The terminal device 21 also shoots the target shooting scene to obtain the target video, and finally the target video is processed with the matched target video processing package to obtain the processed target video. The process of processing the target video with the target video processing package may be performed by the terminal device 21 or by the server 22.
  • If it is performed by the terminal device 21, the terminal device 21 can send the processed target video to the server 22 via the client of the video APP, and the server 22 finally publishes the processed target video. Before the release, the server 22 can also review the video; if it involves video content that is not suitable for network transmission, publication can be prohibited.
  • FIG. 2B is another application scenario to which the video processing method in the embodiment of the present application can be applied.
  • the application scenario includes a terminal device 21, a server 22, and a server 23.
  • the terminal device 21 and the server 22 are the same as those in FIG. 2A.
  • The server 23 may be a cloud server, which is used to process the target video according to the target video processing package. The cloud server and the server 22 may be one server, or may be separate servers as shown in FIG. 2B.
  • If the server 23 and the server 22 are different servers, after the server 23 processes the target video, the processed target video can be sent to the server 22 and then reviewed and published by the server 22 (that is, the application server), or it can be published directly by the server 23 itself.
  • In the latter case, the server 22 and the server 23 can pre-establish a related protocol that allows the server 23 to publish videos, so as to avoid the adverse effects of illegal release by the server 23.
  • FIG. 2C is another application scenario to which the video processing method in the embodiment of the present application can be applied.
  • The application scenario includes a terminal device 21, a server 22, a server 23, and a server 24, among which the terminal device 21, the server 22, and the server 23 have been described in the foregoing.
  • As for the server 24, it refers to a modeling server for establishing a package recommendation model; that is, the server 24 can establish the package recommendation model and send the established model to the terminal device 21, or, after receiving the scene description information sent by the terminal device 21, directly match the target video processing package based on the scene description information and the package recommendation model.
  • The server 24 can selectively establish a communication connection with the terminal device 21 or with the other servers as required. FIG. 2C takes the case where the server 24 establishes communication connections with the terminal device 21, the server 22, and the server 23 as an example for illustration.
  • In the application scenarios listed above, the operations of obtaining the scene description information of the target shooting scene and shooting the target shooting scene to obtain the target video are executed by the terminal device 21; the operation of matching the target video processing package according to the scene description information can be performed by the terminal device 21, the server 22 (application server), the server 23 (cloud server), or the server 24 (modeling server); and the operation of processing the target video according to the target video processing package can be performed by the terminal device 21, the server 22 (application server), or the server 23 (cloud server).
  • In addition, the server 22 (application server), the server 23 (cloud server), and the server 24 (modeling server) can be three separate servers, or any two or all three of them can be deployed as one server.
  • In a specific implementation process, the aforementioned terminal device 21 may be a mobile phone, a tablet computer, a personal digital assistant (PDA), a notebook computer, a vehicle-mounted device, a smart wearable device (such as a smart watch or a smart bracelet), a personal computer, or any other device on which the client of the video APP can run.
  • The aforementioned server 22, server 23, and server 24 may all be personal computers, large or medium-sized computers, computer clusters, and so on.
  • Although the embodiments of the present application provide the method operation steps shown in the following embodiments or drawings, the method may include more or fewer operation steps based on conventional practice or without creative labor. Among steps in which no necessary causal relationship logically exists, the execution order of these steps is not limited to that provided by the embodiments of the present application.
  • When the method is executed in an actual processing procedure or by a device, it may be executed sequentially or in parallel (for example, in a parallel-processor or multi-threaded processing environment) according to the method shown in the embodiments or the accompanying drawings.
  • Please refer to FIG. 3, which is a flowchart of a video processing method provided by an embodiment of the present application. The flow of the method is described as follows.
  • Step 31 The terminal device obtains scene description information of the target shooting scene.
  • The shooting scene refers to the scene at which video shooting is aimed. For example, when shooting a baby who is learning to walk in the living room, the picture composed of the living room environment and all the objects in it (such as the baby and the sofa) can be understood as the shooting scene; that is, a shooting scene can be understood as the environment targeted when shooting a video together with the set of all shooting objects included in that environment.
  • The target shooting scene may be a name that refers to a specific scene; for example, the shooting scene at which the final video shooting is aimed may be called the target shooting scene.
  • In addition, the shooting scene may change. For example, for a 15-second short video, the first 8 seconds capture a baby learning to walk in the living room, and the next 7 seconds capture a mother in the kitchen holding the baby while it learns to walk.
  • In practice, whether the shooting scene has changed can be determined based on whether the shooting picture has changed to a predetermined degree. If the shooting scene changes from the baby learning to walk in the living room to the mother holding the baby in the kitchen, since the background (from the living room to the kitchen) and the subject (from the baby alone to the baby and the mother) have both changed substantially, it can be considered that the shooting scene has changed.
  • In contrast, if the picture changes from the baby learning to walk while holding one side of the sofa to the baby learning to walk while holding the other side of the sofa, since there is only a small change in the environment, it can be considered that the scene has not changed.
  • the scene description information refers to information used to describe the relevant situation of the shooting scene.
  • Specifically, the scene description information refers to feature description information describing the environment of the shooting scene and/or the shooting objects in the shooting scene. The environment of the shooting scene may be referred to as the shooting environment.
  • The feature description information of the shooting environment may include, for example, the shooting time, the shooting geographic location, the device information of the shooting device used (for example, a mobile phone of a certain brand), and the shooting parameter information, that is, all information that can describe the current shooting environment. The feature description information of a shooting object is all attribute information that can objectively describe the current actual state of the shooting object, such as its type, height, skin color, hair color (for example, a white cat), expressions, and actions.
  • For example, for the shooting scene of a baby learning to walk in the living room, the scene description information may include the light intensity of the living room, the shooting time, the approximate color of the background wall of the living room (such as a white wall), the approximate shape and size of the living room (such as a rectangular living room approximately 3 meters long and 2 meters wide), the types of items included in the living room and the main characteristics of each item (such as a blue sofa and a white coffee table), the approximate height of the baby (such as 90 cm) and its skin color, the approximate posture of the baby (such as standing, sitting, or reclining), the baby's hairstyle (such as a bald head, short hair, or two pigtails), the style of clothes the baby wears, whether the baby is holding something, and so on.
  • For another example, for an outdoor shooting scene, the scene description information may include the ambient light intensity, the current weather conditions (such as snow, rain, or sufficient sunlight), the shooting time, the shooting location (for example, a certain scenic area), the shooting objects that occupy a large proportion of the shooting scene (for example, a forest, a waterfall, or a flowing river), and the approximate shape and color of the shooting objects.
  • the scene description information of the shooting scene may include the shape, color, and number of layers of the cake.
  • For another example, for shooting a singing performance, the scene description information of the shooting scene may include the light intensity and sound intensity of the environment, the shooting time, the shooting location, the music style of the song being sung, the speaking speed of the singer, and so on.
  • It can be seen that, through the scene description information, not only the environment of the shooting scene but also all the shooting objects in the shooting scene can be determined.
  • The shooting objects can be dynamic or static: dynamic shooting objects include, for example, people and animals, such as babies, mothers, and kittens, while static shooting objects include, for example, a mobile phone or a pot of green plants.
  • That is, from the scene description information, the approximate environment of the target shooting scene and the composition of the shooting subjects can be known.
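  • Purely for illustration, the scene description information could be organized as a structure like the following; the application does not prescribe any particular data layout, so all field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ShootingEnvironment:
    shooting_time: str = ""
    location: str = ""
    device_info: str = ""
    light_intensity: float = 0.0
    weather: str = ""

@dataclass
class ShootingObject:
    category: str = ""          # e.g. "baby", "cat", "sofa"
    is_dynamic: bool = False    # people/animals vs. still objects
    attributes: List[str] = field(default_factory=list)  # e.g. ["standing"]

@dataclass
class SceneDescription:
    environment: ShootingEnvironment
    objects: List[ShootingObject]
```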
  • In a specific implementation process, the different kinds of feature description information included in the scene description information in the embodiments of the present application can be obtained in different ways. The different kinds of feature description information referred to here include at least the feature description information of the shooting environment and the feature description information of the shooting object; for ease of understanding, examples are given below.
  • For the feature description information of the shooting environment, the shooting device can obtain the shooting time and shooting geographic location in real time; for example, the video was shot in a certain scenic area at 16:32 on June 6, 2018.
  • the shooting time and shooting geographical location can give a general understanding of the situation of the current target shooting scene in time and space.
  • The shooting time and shooting geographic location can be uploaded to the background or the cloud, and the current actual weather conditions can then be determined through a network search; for example, the determined weather is "sunny, with a temperature of 28°C to 33°C". That is, the relevant feature description information of the target shooting scene can be obtained online through some objective information that the shooting device itself can detect, combined with a network search.
  • Alternatively, a match can be searched directly among the videos already published (including reviewed and released) in the system based on the shooting time and shooting geographic location; a video whose shooting time is close (for example, within 10 minutes) and whose shooting geographic location is close (for example, within 2 kilometers) can be matched, and the weather corresponding to that video determines the current actual weather conditions. That is, the current actual weather conditions can be determined by the video platform system itself, and the public sharing mechanism of video data can facilitate direct interaction among multiple users.
  • Of course, the weather can also be determined directly without a network search: the shooting device can detect the current temperature, humidity, and light intensity through built-in sensors, and then approximately determine the current actual weather conditions from these parameters.
  • Taking the shooting device as a mobile phone as an example, it is mainly considered that the shooting capabilities of different brands and models of mobile phones may differ, and that the shooting parameters set by each user when shooting may also differ, so the scene description information of different target shooting scenes may also differ. Taking these differences into account can make the determination of the scene description information more accurate, which facilitates the subsequent matching when determining the target video processing package.
  • For the feature description information of the shooting object, since the shooting object is actually present in the target shooting scene, its feature description information is objective and real information.
  • In a specific implementation process, image recognition and image feature extraction can be performed on the video sequence frames of the preview video of the target shooting scene or of the officially shot video, and the feature description information of each shooting object can be obtained through image processing.
  • In the embodiments of the present application, a preview video of the target shooting scene is obtained through the camera of the terminal device, and the scene description information is then obtained from the preview video.
  • the scene description information may also be obtained according to the target video obtained in real time during the formal shooting of the video.
  • In a specific implementation process, image recognition can be performed on the video sequence frames of the preview video or of the target video to obtain the key feature information of each frame, and the scene description information is finally determined according to the key feature information of all frames. That is, the scene description information can be obtained from the preview video before the video is officially shot, or from the video already shot, after or at the same time as the shooting.
  • In other words, the determination time of the scene description information is not particularly limited.
  • In the former case, the target video processing package can be determined in advance according to the scene description information, and during the video shooting process each frame or each group of consecutive frames obtained can then be processed with the target video processing package as it is captured; the processed video is naturally obtained when shooting is completed, so that shooting and processing are performed as synchronously as possible, ensuring the timeliness of video processing and improving processing efficiency.
  • In the latter case, the scene description information is determined based on the actually captured video, so the obtained scene description information can reflect the current actual scene to the greatest extent and with higher accuracy, avoiding as far as possible the situation where the scene description information is not updated after the scene changes. This can further improve the accuracy of determining the target video processing package, so that the finally determined package is as close as possible to the current actual shooting scene, improving the effectiveness and accuracy of video processing and meeting the actual needs of users.
  • That is, two alternative methods are provided, which increases the diversity of the solutions in the embodiments of the present application, enables them to be applied to different application scenarios, and thereby improves the universality of the solution.
  • As mentioned above, the key feature information in the embodiments of the present application is the feature information of the shooting object that occupies the largest area in each frame of the video image and/or is visually presented at the forefront position.
  • For example, a user is using a mobile phone to shoot a video of the child opposite, and the target shooting scene where the child is located also includes a cat on the right rear side of the child.
  • In the picture, the child occupies the largest area and is also located at the position closest to the shooting user, so the child can be regarded as the key shooting object (or main shooting object) in the target shooting scene, and the feature information of the child can be determined as the key feature information of the target shooting scene; the cat is far from the camera and is not considered.
  • According to the feature information of the child, it can be determined that there is a child in the target shooting scene and that the child is in an upright posture, so the key feature information of the target shooting scene can be determined as "a standing child". Finally, if the key feature information is directly used as the scene description information, it can be known from the scene description information that there is a child in an upright state in the target shooting scene.
  • Step 32 The terminal device matches the target video processing package corresponding to the target shooting scene according to the obtained scene description information, wherein the target video processing package includes at least one video processing mode for processing the video in a predetermined processing mode.
  • In a specific implementation process, the terminal device may match the target video processing package corresponding to the target shooting scene according to a preset package recommendation strategy.
  • Simply understood, the target video processing package in the embodiments of the present application is a set of at least one video processing method, by which the video can be processed in a predetermined processing mode to obtain the corresponding video processing effect.
  • For example, the target video processing package includes multiple video processing methods: the "small forest" filter, beauty level 3, big-eye and face-slimming level 2, the "bubbles" special effect, and the music "good doll". The target video processing package is then used to process the target video captured for the target shooting scene to obtain the processed target video. Taking filters as an example, before the target video is processed by the target video processing package it has no filter effect, and after processing it has the filter effect, so that the target video can implement various video effects according to the target video processing package automatically recommended by the terminal device. The foregoing is an example of the target video processing package in the embodiments of the present application.
  • In a possible implementation, the target video processing package is determined in combination with a pre-established package recommendation model.
  • the specific process is as shown in steps 511 to 518 in FIG. 5, which will be described in detail below.
  • Step 511 The terminal device determines a package recommendation model established in advance.
  • the pre-established package recommendation model refers to a data model for recommending a video processing package established in advance.
  • For example, the package recommendation model may be a model established by deep learning based on the video content of a large number of published videos. The package recommendation model may be established by the application server corresponding to the video APP, or by a special modeling server. No matter which way it is established, the established package recommendation model can be embedded in the client of the video APP as an embedded function, so that after the client of the video APP is installed in the terminal device, the terminal device can obtain the package recommendation model.
  • The package recommendation model in the embodiments of the present application may be a model obtained by deep learning with a multi-task network model over multiple published videos. Therefore, when determining a target video processing package according to the scene description information and the package recommendation model, the video processing packages used by the published videos are referred to as much as possible, so that the determined target video processing package is as close as possible to the public's usage habits and preferences, and the processed target video can be as popular as possible.
  • the package recommendation model may be established according to the method shown in FIG. 6.
  • the package recommendation model may be established by a terminal device, a modeling server, or a cloud server.
  • Step 61 Select multiple videos from the published videos as video training samples.
  • the video source is not particularly limited.
•   All video training samples may be derived from the same video APP, or from different video APPs.
  • the video training sample may include a large number of published videos, such as 100,000 short videos.
•   To enable the package recommendation model to make recommendations for as many different kinds of video content as possible, videos from the most recent time period (for example, within a week) may be selected as training samples.
  • Step 62 Perform image recognition on the video sequence frames included in each video training sample to obtain the image recognition result of each video training sample.
  • the image recognition mentioned here is mainly to identify the basic features included in each frame of the image, such as color features and shape features, and so on.
•   Step 63 Based on the image recognition result of each video training sample, mark the shooting environment and/or shooting object of each video training sample to obtain a video content label of each video training sample.
•   Specifically, the similarities and deviations of the color features, shape features, and other features between the frames can be determined by specific data processing methods, so as to determine the shooting environment and/or shooting object of each video training sample. For example, for one of the video training samples, it may be determined that the shooting environment is a lawn in sunny weather and that the shooting objects are a child and a middle-aged woman.
•   The determined shooting environment and/or shooting object are then marked, and the video content tags of each video training sample are obtained from these marks.
•   For the above example, the obtained video content tags are "sunny lawn" and "a child and a middle-aged woman".
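•   As a minimal sketch (not from the disclosure) of how per-frame recognition results might be aggregated into such tags in step 63, assuming the dictionary keys "environment" and "objects" as hypothetical names:

```python
# Illustrative sketch: aggregate per-frame image recognition results into
# video content tags. The input structure is an assumption.
def video_content_tags(frame_results):
    """frame_results: list of dicts like
    {"environment": "sunny lawn", "objects": ["child", "middle-aged woman"]}."""
    envs, objs = {}, {}
    for r in frame_results:
        envs[r["environment"]] = envs.get(r["environment"], 0) + 1
        for o in r["objects"]:
            objs[o] = objs.get(o, 0) + 1
    env = max(envs, key=envs.get)                               # dominant environment
    majority = [o for o, n in objs.items() if n >= len(frame_results) / 2]
    return [env] + majority                                     # e.g. ["sunny lawn", "child", "middle-aged woman"]
```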
  • Step 64 Extract a video processing package used for each video training sample.
•   Since each video training sample is a video that a user has posted on the network, the user generally processes it before publishing; that is, video effects have already been added to the original video. Each video training sample can therefore be analyzed to determine the video processing methods used for it, and the set of all video processing methods used for a video training sample is determined as the video processing package corresponding to that sample.
•   Then, the video content label and the corresponding video processing package of each video training sample can be used as training features and input into a preset network model for training and learning, and the package recommendation model in the embodiment of the present application is obtained from the final training and learning results.
•   In a specific implementation process, existing learning models can be used for modeling, such as logistic regression methods, decision tree methods, or other preset network models, to train and learn the video content labels and corresponding video processing packages; the preset network model is not limited in the embodiment of the present application.
•   Step 65 Determine the recommendation score value of each video training sample according to its historical interaction data.
•   Step 66 According to a predetermined association rule, associate the recommendation score value of each video training sample with the corresponding video processing package, and then perform training and learning to obtain the package recommendation model.
  • the historical interaction data of the video can be used to represent the popularity of the video
  • the historical interaction data can be used to indicate the interaction between all users and the video training sample, such as the user's viewing behavior and social behavior.
•   For a video training sample, the historical interaction data can be the video's viewing data and social data.
•   The viewing data of a video may include the total number of users who viewed it, the total number of views, the total duration of each view, and all other data related to the user's viewing.
•   The social data of a video may include the number of likes, the number of retweets, the number of comments, the number of downloads, and all other data related to the user's social behavior.
•   Based on the historical interaction data, the recommendation score value of each video training sample can be calculated.
•   The recommendation score value is equivalent to the popularity of the video training sample.
•   The higher the recommendation score value, the greater the popularity, indicating that more users like the video.
•   In this way, the trained package recommendation model can meet the needs of the public as much as possible, increasing the applicability and universality of the model.
•   When calculating the recommendation score value, the following methods can be used: 1) for viewing data, for example, +1 point for each view whose viewing duration exceeds 10 seconds, and +1 point when the same user has watched more than a predetermined number of times (for example, 3 times); 2) for social data, for example, +1 point for each like, +1 point for each retweet, +1 point for each download, +1 point for each comment, and +2 points when the number of words in a comment exceeds a predetermined number of words (for example, 30 words). The viewing-data points and the social-data points are then summed to obtain the final recommendation score value.
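•   A minimal sketch of this scoring, using the example thresholds from the text (10 seconds, 3 repeat views, 30 words); the input field names are assumptions, and the reading that a long comment counts 2 points in total is one interpretation of the example rules:

```python
# Sketch of the recommendation score value described above. Inputs assumed:
# views is a list of (user_id, duration_seconds); comments is a list of strings.
def recommendation_score(views, likes, retweets, downloads, comments):
    score = 0
    per_user = {}
    for user, duration in views:
        if duration > 10:                 # +1 point per view longer than 10 seconds
            score += 1
        per_user[user] = per_user.get(user, 0) + 1
    # +1 point for each user who watched more than 3 times
    score += sum(1 for n in per_user.values() if n > 3)
    score += likes + retweets + downloads  # +1 point each
    for text in comments:
        score += 2 if len(text.split()) > 30 else 1  # long comments: 2 points
    return score
```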
•   For step 66, in the specific implementation process, either of the following two methods may be adopted according to actual usage requirements.
•   Step 661 According to the principle that the higher the recommendation score value, the larger the training weight of the corresponding video processing package, train each video content label in association with the corresponding video processing package in a preset network model to obtain the package recommendation model.
•   That is, for each video training sample, a corresponding recommendation score value is obtained; then, following the principle that a higher recommendation score value corresponds to a larger training weight, each video content label is trained in association with its corresponding video processing package in the preset network model. A larger recommendation score value indicates greater popularity, so training such a sample with a larger weight makes its video processing package more prominent in the model.
  • the video processing package corresponding to the video training sample will also be added to the recommendation pool of the package recommendation model, so that in the subsequent recommendation process using the package recommendation model, it can be recommended to the user as preferentially as possible.
  • all video training samples are input into a preset network model for training, which can make the samples as comprehensive as possible, thereby improving the universality of the package recommendation model.
•   Step 662 Determine target video content labels whose recommendation score value is greater than or equal to a predetermined score value.
•   Step 663 According to the principle that the higher the recommendation score value, the larger the training weight of the corresponding video processing package, train each target video content label in association with the corresponding video processing package in a preset network model to obtain the package recommendation model.
•   The method of steps 662 to 663 is equivalent to using the predetermined score value to filter out the less popular video training samples, which are then no longer input into the preset network model for training. A lower recommendation score indicates that the corresponding video training sample is less likely to win the recognition and liking of most users and belongs to a very small group; even if such samples were used for training, the resulting package recommendation model would rarely recommend their video processing packages to users. Therefore, in order to reduce the amount of data for model training and learning, and also to improve the effectiveness of the package recommendation model, the filtering of step 662 can be performed first.
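•   The following sketch illustrates steps 662 to 663 under assumed data structures (the tuple layout is hypothetical): low-score samples are filtered out, and the remainder are weighted in proportion to their score:

```python
# Sketch: filter out samples below the predetermined score (step 662) and
# weight the survivors by their recommendation score (step 663).
def build_training_set(samples, min_score):
    """samples: list of (content_label, processing_package, score) tuples."""
    kept = [s for s in samples if s[2] >= min_score]   # step 662: filter
    if not kept:
        return [], []
    top = max(s[2] for s in kept)
    features = [(label, package) for label, package, _ in kept]
    weights = [score / top for _, _, score in kept]    # step 663: weights
    return features, weights
```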
  • Step 512 The terminal device represents the scene description information with a word vector to obtain a video content feature variable.
•   Specifically, the video content can also be determined from the scene description information through deep learning, so as to obtain the video content feature variables.
•   For example, the video sequence frames corresponding to the scene description information (such as the aforementioned video sequence frames of the preview video or of the target video) can be input into a detection network model, which automatically identifies the objects and their positions, classifies and marks each object, and finally tags the scene description information based on the recognition results. That is, the terminal device can represent the scene description information as word vectors through the detection network model to obtain the video content feature variables; the detection network model is capable of identifying and processing video content features.
•   In the embodiment of the present application, an SSD network framework is used as an example to illustrate the process of obtaining the video content feature variables; see the schematic diagram of the SSD network structure shown in FIG. 7.
  • the training process of the SSD network structure is as follows:
•   The base network in the SSD network architecture is a VGG-16 network; since a faster processing rate is required in the embodiments of the present application to meet the recognition requirements of dynamic video, the VGG-16 network is replaced with the more lightweight MobileNet network.
  • a prediction target frame is set on each newly added feature map, and the position of the photographic object is predicted through the prediction target frame.
•   Then, a category is predicted for each prediction target frame, and each prediction target frame is compared with the actual labeled frame of the shooting object to calculate a loss.
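•   As a rough, non-authoritative counterpart to this structure (not the exact network of the disclosure), torchvision's SSDLite with a MobileNetV3 backbone can stand in for an SSD whose VGG-16 base network has been replaced by a lightweight MobileNet, assuming torch and torchvision are installed:

```python
# Stand-in only: torchvision's SSDLite + MobileNetV3 detector, used here to
# illustrate detecting objects and their positions in one video frame.
import torch
from torchvision.models.detection import ssdlite320_mobilenet_v3_large

model = ssdlite320_mobilenet_v3_large(weights="DEFAULT").eval()

frame = torch.rand(3, 320, 320)            # stand-in for one video frame in [0, 1]
with torch.no_grad():
    detections = model([frame])[0]         # dict with "boxes", "labels", "scores"
keep = detections["scores"] > 0.5          # discard low-confidence prediction boxes
print(detections["labels"][keep], detections["boxes"][keep])
```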
  • the above description is based on an example in which package matching is performed on a terminal device.
•   When the package matching operation is performed by another device (for example, an application server or a modeling server), it may be performed in a similar manner.
  • Step 513 The terminal device inputs the feature variables of the video content into the package recommendation model for package matching to obtain at least one recommended video processing package that matches the scene description information.
•   That is, the package recommendation model can take the video content feature variables as an input variable to match the corresponding video processing packages; for example, "a baby" can be input into the package recommendation model as the video content feature variable.
•   In the embodiment of the present application, the package recommendation model can recommend one or more video processing packages; for example, according to the video processing packages corresponding to the three video training samples with the highest recommendation score values, three recommended video processing packages can be obtained.
•   If the package recommendation model recommends only one video processing package, that video processing package can be directly used as the final target video processing package.
•   If the package recommendation model recommends multiple video processing packages, one of the following methods can be selected according to actual usage requirements to determine the final target video processing package.
  • Step 514 is executed, that is, the terminal device determines, as the target video processing package, the most frequently used video processing package among the at least one recommended video processing package.
•   The frequency of use here can be measured by the recommendation score value; that is, the video processing package corresponding to the maximum recommendation score value is determined as the target video processing package, so that the processed target video can match the preferences of most users.
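•   Step 514 then reduces to a one-line selection, sketched below with an assumed tuple layout:

```python
# Sketch of step 514: pick the recommended package whose recommendation
# score (used as a proxy for usage frequency) is largest.
def pick_most_used(recommended):
    """recommended: list of (package, recommendation_score) tuples."""
    return max(recommended, key=lambda pair: pair[1])[0]
```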
  • Step 515 is executed, that is, the terminal device first determines a video processing package matching the user attribute information according to the user attribute information.
•   The video processing package matching the user attribute information is referred to in the embodiment of the present application as the priority video processing package. The terminal device then determines, from the at least one recommended video processing package, the one with the highest similarity to the priority video processing package, and finally determines that most similar video processing package as the final target video processing package.
•   The user attribute information may refer to the user's own preference settings and related information provided when the user first uses the video APP or registers, such as gender, age, life stage (unmarried, in a relationship, married, pregnant, or having children), career, video effect preferences (such as which filters, beauty levels, and special effects are preferred), video theme preferences (such as a liking for cute-baby videos or beauty-themed videos), and so on.
•   From the user attribute information, the video effects the user likes can be roughly known, and these factors can then be combined to estimate the priority video processing package matching the user.
•   In this second solution, based on the multiple video processing packages recommended by the package recommendation model and combined with the user's actual preferences, the package that best matches those preferences (that is, the most similar one) is selected as the final target video processing package, which can meet the differentiated needs of users and match their actual needs as closely as possible.
  • the terminal device first determines a priority video processing package that matches user attribute information.
  • Step 516 The terminal device determines whether at least one of the recommended video processing packages has a recommended video processing package whose similarity with the priority video processing package is greater than or equal to a predetermined similarity.
  • Step 517 If so, the terminal device determines the recommended video processing package with the highest similarity as the target video processing package.
  • Step 518 If not, the terminal device directly determines the priority video processing package as the target video processing package.
  • the priority video processing package can be directly used as the final target video processing package temporarily.
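•   Steps 516 to 518 can be sketched as follows, assuming a hypothetical similarity measure (here, the Jaccard overlap of the processing methods contained in two packages):

```python
# Sketch of steps 516-518. similarity() is an assumed measure, not the
# disclosure's definition of package similarity.
def similarity(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 0.0

def choose_target(recommended, priority, threshold=0.6):
    best = max(recommended, key=lambda p: similarity(p, priority))
    if similarity(best, priority) >= threshold:
        return best       # step 517: use the most similar recommended package
    return priority       # step 518: fall back to the priority package
```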
•   For the case where package matching is performed by the modeling server, the modeling server may represent the scene description information as word vectors to obtain the video content feature variables, analyze the video content feature variables to determine the environment and/or object categories and quantities corresponding to the target shooting scene, match a set of video processing methods for the determined environment and/or each category of objects, and then use the video processing package formed by the multiple matched sets of video processing methods as the final target video processing package.
•   That is, a set of corresponding video processing methods can be matched for each kind of object, so that different categories of objects can be processed differently in a targeted way, maximizing the diversity of video processing; and because the set of video processing methods determined for each category of object is also recommended by the trained package recommendation model, it can also meet the needs of the public as much as possible and ensure the universality of the solution.
•   For example, if the environment and objects determined for the target shooting scene fall into four categories, matching a set of video processing methods for each category yields four sets, and the collection of these four sets of video processing methods is determined as the final target video processing package for recommendation.
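•   A minimal sketch of composing the target package from per-category sets of methods; the category-to-methods mapping below is invented for illustration and does not come from the disclosure:

```python
# Sketch: one set of video processing methods per detected category; their
# union forms the target video processing package. CATEGORY_METHODS is assumed.
CATEGORY_METHODS = {
    "sun":   ["filter:warm"],
    "lawn":  ["filter:fresh-green"],
    "child": ["beauty:soft", "effect:bubbles"],
    "woman": ["beauty:level3", "effect:big-eyes"],
}

def compose_target_package(categories):
    package = []
    for c in categories:
        package.extend(CATEGORY_METHODS.get(c, []))
    return package

print(compose_target_package(["sun", "lawn", "child", "woman"]))  # four sets combined
```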
•   In addition, the user attribute information and/or historical viewing information of the user can also be obtained and respectively represented as word vectors to obtain auxiliary feature variables.
•   Then, the video content feature variables obtained from the scene description information and the auxiliary feature variables obtained here are input together into the package recommendation model for package matching to obtain the recommended target video processing package. It can be seen that, in the case of package matching by the modeling server, the modeling server finally recommends only one video processing package.
•   Using user attribute information and historical viewing information as auxiliary recommendation factors also ensures that the user's actual circumstances are taken into account as much as possible during recommendation, so that the package can be recommended accurately.
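•   One way this combination could look, as a toy sketch only (embed() is a stand-in for a real word-vector lookup, and the vector sizes are arbitrary assumptions):

```python
# Sketch: combine content and auxiliary feature variables before the model call.
import numpy as np

def embed(text, dim=32):
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(dim)        # toy "word vector", stable within one run

def model_input(scene_description, user_attributes, viewing_history):
    content = embed(scene_description)                     # video content feature variable
    auxiliary = np.concatenate([embed(user_attributes),    # auxiliary feature variables
                                embed(viewing_history)])
    return np.concatenate([content, auxiliary])            # fed to the recommendation model
```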
•   In another embodiment, the target video processing package is determined by using a preset correspondence set between scenes and video processing packages.
  • the specific process is shown in steps 521 to 523 in FIG. 5, which will be described in detail below.
•   Step 521 The terminal device determines a preset correspondence set between scenes and video processing packages.
•   In the embodiment of the present application, the user can set the correspondence between scenes and video processing packages in advance, such as package A for a baby, package B for baby + mother, package C for baby + daddy, package D for a pet (kitten or puppy), and so on. During the setting process, the user can also shoot short videos or set scenes via the preview video, and then set the corresponding video processing package for each scene.
  • the corresponding relationship may also be configured by a video APP by default.
  • Step 522 The terminal device enters the scene description information into the aforementioned corresponding set for matching search, so as to obtain a video processing package with the highest degree of matching with the scene description information.
  • Step 523 The terminal device uses the video processing package with the highest matching degree as the target video processing package.
•   Through the preset correspondence, an appropriate video processing package can be automatically matched for different video shooting scenes, which can meet the user's actual shooting needs; moreover, since the correspondence can be personalized and customized by the user and modified or updated at any time, it can meet the user's actual shooting needs to a large extent.
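•   Steps 521 to 523 amount to a lookup in a user-editable mapping; the sketch below uses simple token overlap as an assumed matching criterion (the disclosure does not specify one):

```python
# Sketch of steps 521-523: match scene description information against a
# preset scene-to-package mapping. PRESET and the overlap rule are assumed.
PRESET = {
    "baby": "package A",
    "baby mother": "package B",
    "baby daddy": "package C",
    "kitten puppy pet": "package D",
}

def match_package(scene_description):
    words = set(scene_description.lower().split())
    overlap = lambda key: len(words & set(key.split()))
    best = max(PRESET, key=overlap)                   # step 522: best-matching scene
    return PRESET[best] if overlap(best) else None    # step 523: highest-degree match

print(match_package("a baby and a mother on the lawn"))  # package B
```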
  • Step 33 The terminal device obtains a target video obtained by shooting the target shooting scene.
  • the target video may be obtained according to the actual situation.
  • Step 34 The terminal device processes the target video according to the target video processing package to obtain a processed target video.
•   After the target video is processed with the target video processing package, it can have the corresponding video effects, which improves the degree of beautification of the video.
  • Step 35 The terminal device sends the processed target video to the corresponding application server, and the application server can receive the processed target video.
  • Step 36 The application server reviews the received processed target video and publishes it after the review is passed.
•   In order to achieve social sharing, the processed target video can also be posted on the network.
•   The publishing process is described in steps 35 and 36; specifically, the video publishing process in the related art can be used, and the description will not be expanded here.
•   In addition, when the scene description information indicates that the shooting object in the target shooting scene changes, the target video processing packages before and after the change are determined separately, and the captured video is then processed with the corresponding target video processing package before and after the shooting object changes. In this way, different video processing packages are used for corresponding processing before and after the scene change, which can improve the effectiveness of video processing.
  • Step 81 The terminal device sends the scene description information to the modeling server.
  • the modeling server can receive scene description information.
  • Step 82 The modeling server matches the target video processing package corresponding to the target shooting scene according to the scene description information. Specifically, the recommendation can be made through the package recommendation model in the modeling server.
  • Step 83 The modeling server sends the matched target video processing package to the cloud server.
  • the cloud server can receive the target video processing package.
  • Step 84 The terminal device sends the obtained target video to the cloud server.
  • the cloud server can receive the target video.
  • Step 85 The cloud server processes the target video according to the target video processing package to obtain a processed target video.
•   In a specific implementation process, the cloud server may receive the target video processing package first and then the target video, or receive the target video first and then the target video processing package, or receive the target video and the target video processing package at the same time. After the cloud server has received both the target video and the target video processing package, it processes the target video according to the target video processing package to obtain the processed target video.
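•   A minimal sketch of this order-independent handling (the class, method names, and apply_package() helper are all assumptions for illustration):

```python
# Sketch: accept the target video and the target video processing package in
# either order and process once both have arrived.
def apply_package(video, package):
    return f"{video} processed with {package}"   # placeholder for real processing

class CloudSession:
    def __init__(self):
        self.video = None
        self.package = None

    def on_video(self, video):
        self.video = video
        return self._maybe_process()

    def on_package(self, package):
        self.package = package
        return self._maybe_process()

    def _maybe_process(self):
        # Process only when both inputs are present, regardless of arrival order.
        if self.video is not None and self.package is not None:
            return apply_package(self.video, self.package)
        return None   # still waiting for the other input
```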
  • Step 86 The cloud server sends the processed target video to the application server.
  • the application server can receive the processed target video.
  • Step 87 The application server reviews the processed target video and publishes it after passing the review.
•   In addition, the user can choose whether to enable the function of automatically matching video processing packages. If the user turns the function off, the user is allowed to set the various video processing methods manually, and the video processing package composed of the user-set video processing methods is used to process the video captured by the user.
  • an embodiment of the present application provides a terminal device, which may be, for example, the terminal device 21 in the foregoing FIG. 2A to FIG. 2C.
  • the terminal device may be a hardware structure, a software module, or a hardware structure plus a software module.
  • the terminal device may be implemented by a chip system, and the chip system may be composed of a chip, and may also include a chip and other discrete devices.
•   The terminal device in the embodiment of the present application may include an obtaining module 91, a matching module 92, and a processing module 93, wherein:
  • the obtaining module 91 is configured to obtain scene description information of a target shooting scene
•   the matching module 92 is configured to match, according to the scene description information, the target video processing package corresponding to the target shooting scene, wherein the target video processing package includes at least one video processing method for processing the video in a predetermined processing mode;
  • the processing module 93 is configured to process the target video shot for the target shooting scene according to the target video processing package.
•   In a possible design, the matching module 92 is configured to:
•   represent the scene description information as word vectors to obtain video content feature variables; input the video content feature variables into a pre-established package recommendation model for package matching to obtain at least one recommended video processing package matching the scene description information; and determine the target video processing package according to the at least one recommended video processing package.
•   In a possible design, the matching module 92 is configured to:
•   determine the most frequently used of the at least one recommended video processing package as the target video processing package; or determine, as the target video processing package, the one of the at least one recommended video processing package with the greatest similarity to a priority video processing package, wherein the priority video processing package is a video processing package matched according to user attribute information.
•   In a possible design, the matching module 92 is configured to:
•   judge whether the at least one recommended video processing package includes a recommended video processing package whose similarity to the priority video processing package is greater than or equal to a predetermined similarity; if so, determine the recommended video processing package with the greatest similarity as the target video processing package; if not, determine the priority video processing package as the target video processing package.
•   In a possible design, the matching module 92 is configured to:
•   input the scene description information into a preset correspondence set between scenes and video processing packages for matching search to obtain the video processing package with the highest matching degree with the scene description information, and determine the video processing package with the highest matching degree as the target video processing package.
•   In a possible design, the matching module 92 is configured to: when the scene description information indicates that the shooting object in the target shooting scene changes, determine the target video processing packages before and after the change separately;
•   and the processing module 93 is configured to process the captured video with the corresponding target video processing package before and after the shooting object changes.
•   In a possible design, the obtaining module 91 is configured to: obtain a preview video of the target shooting scene, or obtain the target video actually shot of the target shooting scene; perform image recognition on the video sequence frames of the preview video or of the target video to obtain key feature information of each frame, wherein the key feature information is the feature information of the shooting object that occupies the largest area in each frame and/or is visually presented at the forefront; and determine the scene description information according to the key feature information of all frames.
  • the division of the modules in the embodiments of the present application is schematic and is only a logical function division. In actual implementation, there may be another division manner.
•   In addition, the functional modules in the embodiments of the present application may be integrated into one processing unit, or each module may exist alone physically, or two or more modules may be integrated into one module.
  • the above integrated modules may be implemented in the form of hardware or software functional modules.
  • the server may be, for example, the server 24 in the foregoing FIG. 2A to FIG. 2C, that is, a modeling server.
  • the server may be a hardware structure, a software module, or a hardware structure plus a software module.
•   Alternatively, the server may be implemented by a chip system, and the chip system may be composed of a chip, or may include a chip and other discrete devices.
•   The server in the embodiment of the present application may include a receiving module 101, a first obtaining module 102, a matching module 103, and a sending module 104, wherein:
  • the receiving module 101 is configured to receive scene description information of a target shooting scene sent by a terminal device;
  • the first obtaining module 102 is configured to represent the scene description information with a word vector to obtain a video content feature variable
•   the matching module 103 is configured to input the video content feature variables into a pre-established package recommendation model for package matching to obtain a target video processing package matching the target shooting scene, wherein the target video processing package includes at least one video processing method for processing the video in a predetermined processing mode;
  • the sending module 104 is configured to send the target video processing package to the terminal device or the cloud server, so that the terminal device or the cloud server processes the target video shot for the target shooting scene according to the target video processing package.
•   In a possible design, the matching module 103 is configured to:
•   analyze the video content feature variables to determine the environment and/or object categories and quantities corresponding to the target shooting scene; match a set of video processing methods for the determined environment and/or each category of objects; and use the video processing package composed of the multiple matched sets of video processing methods as the target video processing package.
  • the server further includes a second obtaining module and a third obtaining module; wherein:
•   a second obtaining module, configured to obtain user attribute information and/or historical viewing information of the user corresponding to the terminal device;
•   a third obtaining module, configured to represent the user attribute information and/or the historical viewing information as word vectors respectively to obtain auxiliary feature variables;
  • the matching module 103 is configured to input video content feature variables and auxiliary feature variables together into a package recommendation model to perform package matching to obtain a target video processing package.
•   In a possible design, the server further includes a model establishment module, which is configured to:
•   select multiple videos from published videos as video training samples; based on the image recognition result of the video sequence frames included in each video training sample, mark the shooting environment and/or shooting object of each video training sample to obtain a video content label of each video training sample; extract the video processing package used by each video training sample; and use the video content label and the corresponding video processing package of each video training sample as training features to be input into a preset network model for training and learning to obtain the package recommendation model.
•   In a possible design, the model establishment module is configured to:
•   determine the recommendation score value of each video training sample according to its historical interaction data, wherein the historical interaction data of a video training sample is used to indicate the interaction between users and the video training sample;
•   associate, according to a predetermined association rule, the recommendation score value of each video training sample with the corresponding video processing package, and then perform training and learning to obtain the package recommendation model.
•   In a possible design, the model establishment module is configured to:
•   train, according to the principle that a higher recommendation score value corresponds to a larger training weight of the corresponding video processing package, each video content label in association with the corresponding video processing package in a preset network model to obtain the package recommendation model; or
•   determine target video content labels whose recommendation score value is greater than or equal to a predetermined score value, and then train each target video content label in association with the corresponding video processing package to obtain the package recommendation model.
  • the division of the modules in the embodiments of the present application is schematic and is only a logical function division. In actual implementation, there may be another division manner.
•   In addition, the functional modules in the embodiments of the present application may be integrated into one processing unit, or each module may exist alone physically, or two or more modules may be integrated into one module.
  • the above integrated modules may be implemented in the form of hardware or software functional modules.
  • the server may be, for example, the server 23 in the foregoing FIG. 2A to FIG. 2C, that is, a cloud server.
  • the server may be a hardware structure, a software module, or a hardware structure plus a software module.
•   Alternatively, the server may be implemented by a chip system, and the chip system may be composed of a chip, or may include a chip and other discrete devices.
•   The server in the embodiment of the present application may include a first receiving module 111, a second receiving module 112, and a processing module 113, wherein:
  • the first receiving module 111 is configured to receive scene description information of a target shooting scene and a target video shot for the target shooting scene sent by the terminal device;
•   the second receiving module 112 is configured to receive a target video processing package sent by the terminal device or the modeling server, wherein the target video processing package is a video processing package corresponding to the target shooting scene matched according to the scene description information, and the target video processing package includes at least one video processing method for processing the video in a predetermined processing mode;
  • the processing module 113 is configured to process the target video according to the target video processing package.
  • the division of the modules in the embodiments of the present application is schematic and is only a logical function division. In actual implementation, there may be another division manner.
•   In addition, the functional modules in the embodiments of the present application may be integrated into one processing unit, or each module may exist alone physically, or two or more modules may be integrated into one module.
  • the above integrated modules may be implemented in the form of hardware or software functional modules.
  • the embodiment of the present application also provides another video processing device.
  • the video processing device may be a terminal device, such as a smart phone, a tablet computer, a PDA, a notebook computer, a vehicle-mounted device, a smart wearable device, and the like.
•   The video processing device can implement the functions of the terminal device in the foregoing video processing methods shown in FIG. 3 and FIG. 5; or the video processing device may also be a device capable of supporting the terminal device in realizing the functions of the terminal device in the foregoing video processing method.
  • the video processing device may be a hardware structure, a software module, or a hardware structure plus a software module.
  • the video processing device may be implemented by a chip system, and the chip system may be composed of a chip, and may also include a chip and other discrete devices.
•   As shown in FIG. 12, the video processing device in the embodiment of the present application includes at least one processor 121 and a memory 122 connected to the at least one processor. The specific connection medium between the processor 121 and the memory 122 is not limited in the embodiment of the present application; in FIG. 12, the connection through the bus 120 is taken as an example, and the bus 120 is indicated by a thick line. The connection manner between other components is only schematically described and is not limiting.
•   The bus 120 can be divided into an address bus, a data bus, a control bus, and the like; for ease of representation, only one thick line is used in FIG. 12, but this does not mean that there is only one bus or one type of bus.
  • the memory 122 stores instructions that can be executed by the at least one processor 121.
•   The at least one processor 121 can execute the steps included in the foregoing video processing method by executing the instructions stored in the memory 122.
•   The processor 121 is the control center of the video processing device. It can use various interfaces and lines to connect the various parts of the entire video processing device and, by running or executing the instructions stored in the memory 122 and calling the data stored in the memory 122, perform the various functions of the video processing device and process data, thereby monitoring the video processing device as a whole.
•   Optionally, the processor 121 may include one or more processing units, and the processor 121 may integrate an application processor and a modem processor, wherein the application processor mainly handles the operating system, user interface, application programs, and the like, while the modem processor mainly handles wireless communication. It can be understood that the foregoing modem processor may not be integrated into the processor 121.
  • the processor 121 and the memory 122 may be implemented on the same chip, and in some embodiments, they may also be implemented on separate chips.
•   The processor 121 may be a general-purpose processor, such as a central processing unit (CPU), a digital signal processor, an application-specific integrated circuit (ASIC), a field-programmable gate array or other programmable logic device, or a discrete gate, transistor logic device, or discrete hardware component, and can implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of the present application.
•   A general-purpose processor may be a microprocessor or any conventional processor.
  • the steps of the method disclosed in combination with the embodiments of the present application may be directly implemented by a hardware processor, or may be performed by a combination of hardware and software modules in the processor.
  • the memory 122 is a non-volatile computer-readable storage medium and can be used to store non-volatile software programs, non-volatile computer executable programs, and modules.
•   The memory 122 may include at least one type of storage medium, for example, a flash memory, a hard disk, a multimedia card, a card-type memory, a random access memory (RAM), a static random access memory (SRAM), a programmable read-only memory (PROM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a magnetic memory, a magnetic disk, an optical disc, and the like.
•   The memory 122 may also be any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.
  • the memory 122 in the embodiment of the present application may also be a circuit or any other device capable of implementing a storage function, for storing program instructions and / or data.
•   The video processing device may further include an input unit 133, a display unit 134, a radio frequency unit 135, an audio circuit 136, a speaker 137, a microphone 138, a wireless fidelity (WiFi) module 139, a Bluetooth module 1310, a power supply 1311, an external interface 1312, a headphone jack 1312, and other components.
  • FIG. 13 is only an example of a video processing device, and does not constitute a limitation on the video processing device.
•   That is, the video processing device may include more or fewer components than shown in the figure, or some components may be combined, or different component arrangements may be used.
  • the input unit 133 may be used to receive inputted numeric or character information, and generate key signal inputs related to user settings and function control of the video processing device.
  • the input unit 133 may include a touch screen 1331 and other input devices 1332.
•   The touch screen 1331 can collect the user's touch operations on or near it (such as operations performed by the user on or near the touch screen 1331 with a finger, a joint, a stylus, or the like); that is, the touch screen 1331 can be used to detect touch pressure as well as the touch input position and touch input area, and drive the corresponding connection device according to a preset program.
•   The touch screen 1331 can detect the user's touch operation on the touch screen 1331, convert the touch operation into a touch signal and send it to the processor 121 (which can also be understood as sending the touch information of the touch operation to the processor 121), and can receive and execute commands sent from the processor 121.
  • the touch information may include at least one of pressure magnitude information and pressure duration information.
  • the touch screen 1331 can provide an input interface and an output interface between the video processing device and a user.
  • the touch screen 1331 can be implemented in various types such as resistive, capacitive, infrared, and surface acoustic wave.
  • the input unit 133 may also include other input devices 1332.
•   Specifically, the other input devices 1332 may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys and switch keys), a trackball, a mouse, and a joystick.
  • the display unit 134 may be configured to display information input by the user or information provided to the user and various menus of the video processing device.
  • the touch screen 1331 may cover the display unit 134, and when the touch screen 1331 detects a touch operation on or near the touch screen 1331, it is transmitted to the processor 121 to determine pressure information of the touch operation.
  • the touch screen 1331 and the display unit 134 may be integrated into one component to implement input, output, and display functions of the video processing device.
  • the embodiment of the present application uses the touch screen 1331 to represent the function set of the touch screen 1331 and the display unit 134 as an example for illustration.
•   Of course, in some embodiments, the touch screen 1331 and the display unit 134 may also be used as two independent components.
•   The display unit 134 can be used as both an input device and an output device; when used as an output device, it can be used to display images, for example, to implement the playing of various videos.
•   The display unit 134 may include at least one of a liquid crystal display (LCD), a thin-film transistor liquid crystal display (TFT-LCD), an organic light-emitting diode (OLED) display, and an active-matrix organic light-emitting diode (AMOLED) display.
•   In some embodiments, the video processing device may include two or more display units (or other display devices); for example, the video processing device may include an external display unit (not shown in FIG. 13) and an internal display unit (not shown in FIG. 13).
  • the radio frequency unit 135 may be used for receiving and transmitting information or receiving and transmitting signals during a call.
  • the radio frequency circuit includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier (LNA), a duplexer, and the like.
  • the radio frequency unit 135 can also communicate with network devices and other devices through wireless communication.
•   Wireless communication can use any communication standard or protocol, including but not limited to Global System for Mobile communications (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), e-mail, Short Messaging Service (SMS), and the like.
  • the audio circuit 136, the speaker 137, and the microphone 138 may provide an audio interface between the user and the video processing device.
•   Specifically, the audio circuit 136 can convert received audio data into an electrical signal and transmit it to the speaker 137, and the speaker 137 converts the electrical signal into a sound signal for output.
  • the microphone 138 converts the collected sound signal into an electrical signal, which is received by the audio circuit 136 and converted into audio data.
•   After the audio data is output to the processor 121 for processing, it may be transmitted, for example, to another electronic device via the radio frequency unit 135, or output to the memory 122 for further processing.
  • the audio circuit may also include a headphone jack 1312 for providing a connection interface between the audio circuit and the headphones.
  • WiFi is a short-range wireless transmission technology.
  • the video processing device can help users send and receive email, browse web pages, and access streaming media through the WiFi module 139. It provides users with wireless broadband Internet access.
•   Although FIG. 13 shows the WiFi module 139, it can be understood that it is not a necessary component of the video processing device and can be omitted as needed without changing the essence of the application.
  • Bluetooth is a short-range wireless communication technology.
  • the use of Bluetooth technology can effectively simplify the communication between mobile communication terminal devices such as palmtops, laptops, and mobile phones. It can also successfully simplify the communication between these devices and the Internet.
•   Through the Bluetooth module 1310, data transmission between the video processing device and the Internet becomes more rapid and efficient, broadening the road for wireless communication.
  • Bluetooth technology is an open solution that enables wireless transmission of voice and data.
•   Although FIG. 13 shows the Bluetooth module 1310, it can be understood that it is not a necessary component of the video processing device and can be omitted as needed without changing the essence of the application.
  • the video processing apparatus may further include a power source 1311 (such as a battery) for receiving external power or supplying power to various components within the video processing apparatus.
  • the power source 1311 can be logically connected to the processor 121 through a power management system, so as to implement functions such as management of charging, discharging, and power consumption management through the power management system.
  • the video processing device may also include an external interface 1312.
•   Specifically, the external interface 1312 may include a standard Micro USB interface or a multi-pin connector, and can be used to connect the video processing device to other devices for communication, or to connect a charger to charge the video processing device.
  • the video processing apparatus in the embodiments of the present application may further include other possible functional modules such as a camera, a flash, and the like, and details are not described herein again.
•   FIG. 14 shows a schematic structural diagram of a video processing device provided by an embodiment of the present application, which may be, for example, the server 22, the server 23, or the server 24 in the foregoing FIG. 2A to FIG. 2C.
•   The video processing device includes a processor 1401, a system memory 1404 including a random access memory 1402 and a read-only memory 1403, and a system bus 1405 connecting the system memory 1404 and the processor 1401.
•   The video processing device also includes a basic input/output system (I/O system) 1406 that helps transfer information between the devices within the computer, and a mass storage device 1407 for storing an operating system 1413, application programs 1414, and other program modules 1415.
•   The processor 1401 is the control center of the video processing device. It can use various interfaces and lines to connect the various parts of the entire video processing device and, by running or executing the instructions stored in the memory (such as the random access memory 1402 and the read-only memory 1403) and calling the data stored in the memory, perform the various functions of the video processing device and process data, thereby monitoring the video processing device as a whole.
•   Optionally, the processor 1401 may include one or more processing units, and the processor 1401 may integrate an application processor and a modem processor, wherein the application processor mainly handles the operating system, user interface, application programs, and the like, while the modem processor mainly handles wireless communication. It can be understood that the foregoing modem processor may not be integrated into the processor 1401.
  • the processor 1401 and the memory may be implemented on the same chip, and in some embodiments, they may also be implemented separately on separate chips.
•   The processor 1401 may be a general-purpose processor, such as a central processing unit (CPU), a digital signal processor, an application-specific integrated circuit (ASIC), a field-programmable gate array or other programmable logic device, or a discrete gate, transistor logic device, or discrete hardware component, and can implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of the present application.
  • a general-purpose processor may be a microprocessor or any conventional processor. The steps of the method disclosed in combination with the embodiments of the present application may be directly implemented and executed by a hardware processor, or may be executed and completed by a combination of hardware and software modules in the processor.
  • the memory can be used to store non-volatile software programs, non-volatile computer executable programs, and modules.
•   The memory may include at least one type of storage medium, for example, a flash memory, a hard disk, a multimedia card, a card-type memory, a RAM, a static random access memory (SRAM), a programmable read-only memory (PROM), a ROM, an electrically erasable programmable read-only memory (EEPROM), a magnetic memory, a magnetic disk, an optical disc, and the like.
•   The memory may also be any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.
  • the memory in the embodiment of the present application may also be a circuit or any other device capable of implementing a storage function, for storing program instructions and / or data.
  • the basic input / output system 1406 includes a display 1408 for displaying information and an input device 1409 such as a mouse, a keyboard, or the like for a user to input information.
  • the display 1408 and the input device 1409 are both connected to the processor 1401 through a basic input / output system 1406 connected to the system bus 1405.
  • the basic input / output system 1406 may further include an input / output controller for receiving and processing input from a plurality of other devices such as a keyboard, a mouse, or an electronic stylus. Similarly, the input-output controller also provides output to a display, printer, or other type of output device.
  • the mass storage device 1407 is connected to the processor 1401 through a mass storage controller (not shown) connected to the system bus 1405.
•   The mass storage device 1407 and its associated computer-readable medium provide non-volatile storage for the video processing device; that is, the mass storage device 1407 may include a computer-readable medium (not shown) such as a hard disk or a CD-ROM drive.
  • Computer-readable media may include computer storage media and communication media.
  • Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data.
  • Computer storage media include RAM, ROM, EPROM, EEPROM, flash memory, or other solid-state storage technologies, CD-ROM, DVD or other optical storage, tape cartridges, magnetic tape, disk storage, or other magnetic storage devices.
•   Of course, the above-mentioned system memory 1404 and mass storage device 1407 may be collectively referred to as a memory.
•   The video processing device may also run via a remote computer connected to a network such as the Internet; that is, the video processing device can be connected to the network 1412 through a network interface unit 1411 connected to the system bus 1405, or the network interface unit 1411 can be used to connect to other types of networks or remote computer systems (not shown).
  • an embodiment of the present application further provides a storage medium that stores computer instructions.
  • the computer instructions When the computer instructions are run on a computer, the computer is caused to execute the steps of the foregoing video processing method.
  • an embodiment of the present application further provides a video processing device.
•   The video processing device includes at least one processor and a readable storage medium; when the instructions included in the readable storage medium are executed by the at least one processor, the steps of the foregoing video processing method can be performed.
  • an embodiment of the present application further provides a chip system.
  • the chip system includes a processor, and may further include a memory, for implementing the steps of the foregoing video processing method.
  • the chip system can be composed of chips, and can also include chips and other discrete devices.
•   In some possible implementations, various aspects of the video processing method provided in the present application may also be implemented in the form of a program product including program code; when the program product runs on a computer, the program code is used to cause the computer to execute the steps of the video processing method according to the various exemplary embodiments of the present application described above.
  • the embodiments of the present application may be provided as a method, a system, or a computer program product. Therefore, this application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) containing computer-usable program code.
•   These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to work in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device, the instruction device implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
•   These computer program instructions can also be loaded onto a computer or other programmable data processing device, so that a series of operation steps are performed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
•   This solution obtains scene description information of a target shooting scene, matches a target video processing package corresponding to the target shooting scene according to the scene description information, and processes the target video according to the target video processing package.
  • the corresponding target video processing package is automatically matched according to the scene description information, eliminating the manual selection operation by the user as in the related art, thereby improving the matching efficiency of the video processing mode.
  • multiple video processing methods can be matched at one time, which can further improve matching efficiency.
  • the target video processing package is dynamically and correspondingly matched based on the scene description information, the matched target video processing package can be matched with the actual video content as much as possible, thereby improving the accuracy of video processing.

Abstract

The present application discloses a video processing method and apparatus, a terminal device, a server, and a storage medium, belonging to the field of information processing technologies, and used to solve the technical problem in the related art that the matching efficiency of video processing methods is low. In the method, a corresponding target video processing package can be automatically matched according to scene description information, eliminating the manual selection operation of the related art and thereby improving the matching efficiency of video processing methods. Moreover, multiple video processing methods can be matched at one time, further improving matching efficiency. In addition, since the target video processing package is dynamically matched based on the scene description information, the matched target video processing package can conform to the actual video content as much as possible, thereby improving the accuracy of video processing so as to match the user's actual needs as closely as possible.

Description

Video processing method and apparatus, terminal device, server, and storage medium
This application claims priority to the Chinese patent application filed with the China National Intellectual Property Administration on July 23, 2018, with priority number 2018108143463 and entitled "Video processing method and apparatus, terminal device, server, and storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of information processing technologies, and in particular to a video processing method and apparatus, a terminal device, a server, and a storage medium.
Background
With the popularization of terminal devices and the acceleration of networks, high-traffic content has gradually gained favor on major social platforms, for example short videos. A variety of applications (APPs) already support the shooting and publishing of short videos; users can record memorable scenes as videos and publish them, thereby sharing the videos over the network.
In order to obtain better video effects, before shooting a video the user can select video processing methods such as filters, beauty, special effects, and background music to process the captured video. In the related art, each time the user selects a video processing method, taking a filter as an example, the user needs to open the filter option and then choose the desired one from multiple filter effects. This operation process is cumbersome and time-consuming, and the efficiency is even lower when many kinds of video processing methods are selected. In other words, the matching efficiency of the video processing methods provided by the terminal device when shooting videos is currently low, so how to improve the matching efficiency of video processing methods is a problem worth considering.
Summary
Embodiments of the present application provide a video processing method and apparatus, a terminal device, a server, and a storage medium, which are used to solve the technical problem in the related art that the matching efficiency of video processing methods is low.
In one aspect, a video processing method is provided, the method including:
obtaining, by a terminal device, scene description information of a target shooting scene;
matching, by the terminal device according to the scene description information, a target video processing package corresponding to the target shooting scene, wherein the target video processing package includes at least one video processing method for processing video in a predetermined processing mode;
processing, by the terminal device according to the target video processing package, a target video shot for the target shooting scene.
In this solution, the terminal device can automatically match the corresponding target video processing package according to the scene description information, eliminating the manual selection operation of the related art and thereby improving the matching efficiency of video processing methods. Moreover, multiple video processing methods can be matched at one time, further improving matching efficiency. In addition, since the target video processing package is dynamically matched based on the scene description information, the matched target video processing package can conform to the actual video content as much as possible, thereby improving the accuracy of video processing so as to match the user's actual needs as closely as possible.
In a possible design, matching, by the terminal device according to the scene description information, the target video processing package corresponding to the target shooting scene includes:
representing, by the terminal device, the scene description information as word vectors to obtain video content feature variables;
inputting, by the terminal device, the video content feature variables into a pre-established package recommendation model for package matching to obtain at least one recommended video processing package matching the scene description information;
determining, by the terminal device, the target video processing package according to the at least one recommended video processing package.
In this solution, the terminal device can match the target video processing package through the pre-established package recommendation model, achieving high matching efficiency. Moreover, the package recommendation model can recommend multiple video processing packages at one time, which allows the terminal device to select one of them as the finally used target video processing package according to its actual situation, thereby enhancing the applicability and universality of the solution.
Furthermore, the package recommendation model may be a model established by deep learning on the video content of a large number of published videos; by inputting the scene description information, which reflects the actual situation of the shooting scene, into the package recommendation model, the matched target video processing package can conform to the actually captured video content as much as possible, improving the accuracy and effectiveness of the target video effect processing and satisfying the user's actual usage needs as much as possible.
In a possible design, determining, by the terminal device, the target video processing package according to the at least one recommended video processing package includes:
determining, by the terminal device, the most frequently used of the at least one recommended video processing package as the target video processing package; or
determining, by the terminal device, the one of the at least one recommended video processing package with the greatest similarity to a priority video processing package as the target video processing package, wherein the priority video processing package is a video processing package matched according to user attribute information.
In this solution, the terminal device can, according to the actual usage scenario, adopt different selection methods to select a suitable package from the multiple video processing packages recommended by the package recommendation model to process the target video, which enhances the diversity of the solution and makes its scope of application wider and its applicability stronger.
Moreover, since the user attribute information is considered, the priority video processing package matched with the user attribute information can also be taken into account when selecting a package; that is, the user's actual usage needs serve as a selection factor for the target video processing package, which can, to a certain extent, improve the pertinence of determining the target video package, so that the determination result conforms to the user's actual preferences as much as possible.
在一种可能的设计中,终端设备根据上述至少一个推荐的视频处理套餐,确定上述目标视频处理套餐,包括:
终端设备判断上述至少一个推荐的视频处理套餐中是否有与上述优先视频处理套餐之间的相似度大于等于预定相似度的推荐的视频处理套餐,其中,上述优先视频处理套餐是根据用户属性信息匹配的视频处理套餐;
若有,则终端设备将相似度最大的推荐的视频处理套餐确定为上述目标视频处理套餐;
若没有,则终端设备将上述优先视频处理套餐确定为上述目标视频处理套餐。
在该方案中,将用户自身的偏好设置作为一个重要的选择条件,这样可以使得最终得到的目标视频处理方案尽量符合用户的实际使用需求。
在一种可能的设计中,终端设备根据上述场景描述信息,匹配与上述目标拍摄场景对应的目标视频处理套餐,包括:
终端设备将上述场景描述信息输入预先设置的场景与视频处理套餐对应集合进行匹配查找,以得到与上述场景描述信息匹配度最高的视频处理套餐;
终端设备将上述匹配度最高的视频处理套餐确定为上述目标视频处理套餐。
在该方案中,终端设备通过预置的前述对应关系即可以针对不同的视频拍摄场景自动匹配出相适应的视频处理套餐,可以满足用户的实际拍摄需求,并且可以由用户进行个性化定制,以便于随时修改和更新,所以能够较大程度上满足用户的实际拍摄需求。
在一种可能的设计中,终端设备根据上述场景描述信息,匹配与上述目标拍摄场景对应的目标视频处理套餐,包括:
在上述场景描述信息表明上述目标拍摄场景中的拍摄对象发生变化时,终端设备分别确定拍摄对象发生变化前后的目标视频处理套餐;
终端设备根据上述目标视频处理套餐,对针对上述目标拍摄场景拍摄的目标视频进行处理,包括:
终端设备在拍摄对象发生变化前后分别以对应的目标视频处理套餐对拍摄得到的视频进行处理。
在该方案中,终端设备可以针对场景变化前后分别以不同的视频处理套餐进行对应处理,这样可以提高视频处理的有效性。
在一种可能的设计中,终端设备获得目标拍摄场景的场景描述信息,包括:
终端设备获得上述目标拍摄场景的预览视频,或者,终端设备获得上述目标拍摄场景实际拍摄的上述目标视频;
终端设备对上述预览视频的视频序列帧或对上述目标视频的视频序列帧进行图像识别,以获得每帧的关键特征信息,其中,关键特征信息为 每帧中所占面积最大和/或视觉呈现于最前位置的拍摄对象的特征信息;
终端设备根据所有帧的关键特征信息,确定上述场景描述信息。
另一方面,提供一种视频处理方法,该方法包括:
建模服务器接收终端设备发送的目标拍摄场景的场景描述信息;
建模服务器将上述场景描述信息进行词向量表示,以得到视频内容特征变量;
建模服务器将上述视频内容特征变量输入预先建立的套餐推荐模型进行套餐匹配,以得到与上述目标拍摄场景匹配的目标视频处理套餐,其中,上述目标视频处理套餐包括以预定处理模式对视频进行处理的至少一种视频处理方式;
建模服务器将上述目标视频处理套餐发送给上述终端设备或云端服务器,以使上述终端设备或上述云端服务器根据上述目标视频处理套餐对针对上述目标拍摄场景拍摄的目标视频进行处理。
在一种可能的设计中,建模服务器将上述视频内容特征变量输入预先建立的套餐推荐模型进行套餐匹配,以得到与上述目标拍摄场景匹配的目标视频处理套餐,包括:
建模服务器对上述视频内容特征变量进行分析,确定与上述目标拍摄场景对应的环境和/或物体类别及数量;
建模服务器针对确定出的环境和/或不同类别的物体,分别匹配一套视频处理方式;
建模服务器将匹配出的多套视频处理方式所组成的视频处理套餐作为上述目标视频处理套餐。
在该方案中,建模服务器可以针对不同的物体分别匹配一套对应的视频处理方式,这样可以对不同类别的物体进行针对性地差异处理,以尽量提升视频处理的多样性,并且由于针对每种类别的物体所确定出的一套视频处理方式也是由训练好的套餐推荐模型推荐的,所以也可以尽量符合大众需求,确保方案的普适性。
在一种可能的设计中,上述方法还包括:
建模服务器获得上述终端设备对应用户的用户属性信息和/或历史观看信息;
建模服务器将上述用户属性信息和/或上述历史观看信息分别进行词向量表示,以得到辅助特征变量;
建模服务器将上述视频内容特征变量输入预先建立的套餐推荐模型进行套餐匹配,以得到与上述场景描述信息匹配的目标视频处理套餐,包括:
建模服务器将上述视频内容特征变量和上述辅助特征变量一起输入上述套餐推荐模型进行套餐匹配,以得到上述目标视频处理套餐。
在该方案中,建模服务器采用将用户属性信息和历史观看信息作为辅助推荐因素的方式,也是确保在推荐的过程中尽量将用户的实际因素考虑在内,进而实现套餐的准确推荐。
在一种可能的设计中,上述套餐推荐模型按照以下方式建立:
建模服务器从已发布的视频中选择多个视频作为视频训练样本;
建模服务器基于每个视频训练样本包括的视频序列帧的图像识别结果,对每个视频训练样本的拍摄环境和/或拍摄对象进行标记,以得到每个 视频训练样本的视频内容标签;
建模服务器提取每个视频训练样本使用的视频处理套餐;
建模服务器将每个视频训练样本的视频内容标签和对应的视频处理套餐作为训练特征输入预设网络模型进行训练学习,以获得上述套餐推荐模型。
在一种可能的设计中,建模服务器将每个视频训练样本的视频内容标签和对应的视频处理套餐输入预设网络模型进行训练学习,以获得上述套餐推荐模型,包括:
建模服务器根据每个视频训练样本的历史交互数据确定其推荐积分值,其中,视频训练样本的历史交互数据用于表明用户与该视频训练样本之间的交互情况;
建模服务器按照预定关联规则将每个视频训练样本的推荐积分值和对应的视频处理套餐建立关联后进行训练学习,以获得上述套餐推荐模型。
在一种可能的设计中,建模服务器按照预定关联规则将每个视频训练样本的推荐积分值和对应的视频内容标签建立关联后进行训练学习,以获得上述套餐推荐模型,包括:
建模服务器按照推荐积分值越高对应的视频处理套餐的训练权重越大的原则,在上述预设网络模型内将每个视频内容标签和对应的视频处理套餐进行关联训练,以得到上述套餐推荐模型;或
建模服务器确定推荐积分值大于等于预定积分值的目标视频内容标签,再按照推荐积分值越高对应的视频处理套餐的训练权重越大的原则,在上述预设网络模型内将每个目标视频内容标签和对应的视频处理套餐 进行关联训练,以得到上述套餐推荐模型。
本发明实施例提供一种视频处理方法,该方法包括:
云端服务器接收终端设备发送的目标拍摄场景的场景描述信息以及针对上述目标拍摄场景拍摄的目标视频;
云端服务器接收上述终端设备或建模服务器发送的目标视频处理套餐,其中,上述目标视频处理套餐是根据上述场景描述信息匹配的与上述目标拍摄场景对应的视频处理套餐,上述目标视频处理套餐包括以预定处理模式对视频进行处理的至少一种视频处理方式;
云端服务器根据上述目标视频处理套餐对上述目标视频进行处理。
本发明实施例还提供一种终端设备,该终端设备包括:
获得模块,设置为获得目标拍摄场景的场景描述信息;
匹配模块,设置为根据上述场景描述信息,匹配与上述目标拍摄场景对应的目标视频处理套餐,其中,上述目标视频处理套餐包括以预定处理模式对视频进行处理的至少一种视频处理方式;
处理模块,设置为根据上述目标视频处理套餐,对针对上述目标拍摄场景拍摄的目标视频进行处理。
在一种可能的设计中,上述匹配模块设置为:
将上述场景描述信息进行词向量表示,以得到视频内容特征变量;
将上述视频内容特征变量输入预先建立的套餐推荐模型进行套餐匹配,以得到与上述场景描述信息匹配的至少一个推荐的视频处理套餐;
根据上述至少一个推荐的视频处理套餐,确定上述目标视频处理套餐。
在一种可能的设计中,上述匹配模块设置为:
确定上述至少一个推荐的视频处理套餐中使用频率最高的为上述目 标视频处理套餐;或
确定上述至少一个推荐的视频处理套餐中与优先视频处理套餐之间相似度最大的为上述目标视频处理套餐,其中,上述优先视频处理套餐是根据用户属性信息匹配的视频处理套餐。
在一种可能的设计中,上述匹配模块设置为:
判断上述至少一个推荐的视频处理套餐中是否有与上述优先视频处理套餐之间的相似度大于等于预定相似度的推荐的视频处理套餐,其中,上述优先视频处理套餐是根据用户属性信息匹配的视频处理套餐;
若有,则将相似度最大的推荐的视频处理套餐确定为上述目标视频处理套餐;
若没有,则将上述优先视频处理套餐确定为上述目标视频处理套餐。
在一种可能的设计中,上述匹配模块设置为:
将上述场景描述信息输入预先设置的场景与视频处理套餐对应集合进行匹配查找,以得到与上述场景描述信息匹配度最高的视频处理套餐;
将上述匹配度最高的视频处理套餐确定为上述目标视频处理套餐。
在一种可能的设计中,上述匹配模块设置为:
在上述场景描述信息表明上述目标拍摄场景中的拍摄对象发生变化时,分别确定拍摄对象发生变化前后的目标视频处理套餐;
上述处理模块设置为:
在拍摄对象发生变化前后分别以对应的目标视频处理套餐对拍摄得到的视频进行处理。
在一种可能的设计中,上述获得模块设置为:
获得上述目标拍摄场景的预览视频,或者,获得上述目标拍摄场景实际拍摄的上述目标视频;
对上述预览视频的视频序列帧或对上述目标视频的视频序列帧进行图像识别,以获得每帧的关键特征信息,其中,关键特征信息为每帧中所占面积最大和/或视觉呈现于最前位置的拍摄对象的特征信息;
根据所有帧的关键特征信息,确定上述场景描述信息。
本发明实施例还提供一种服务器,该服务器包括:
接收模块,设置为接收终端设备发送的目标拍摄场景的场景描述信息;
第一获得模块,设置为将上述场景描述信息进行词向量表示,以得到视频内容特征变量;
匹配模块,设置为将上述视频内容特征变量输入预先建立的套餐推荐模型进行套餐匹配,以得到与上述目标拍摄场景匹配的目标视频处理套餐,其中,上述目标视频处理套餐包括以预定处理模式对视频进行处理的至少一种视频处理方式;
发送模块,设置为将上述目标视频处理套餐发送给上述终端设备或云端服务器,以使上述终端设备或上述云端服务器根据上述目标视频处理套餐对针对上述目标拍摄场景拍摄的目标视频进行处理。
在一种可能的设计中,上述匹配模块设置为:
对上述视频内容特征变量进行分析,确定与上述目标拍摄场景对应的环境和/或物体类别及数量;
针对确定出的环境和/或不同类别的物体,分别匹配一套视频处理方式;
将匹配出的多套视频处理方式所组成的视频处理套餐作为上述目标视频处理套餐。
在一种可能的设计中,上述服务器还包括第二获得模块和第三获得模块;其中:
上述第二获得模块,设置为获得上述终端设备对应用户的用户属性信息和/或历史观看信息;
上述第三获得模块,设置为将上述用户属性信息和/或上述历史观看信息分别进行词向量表示,以得到辅助特征变量;
上述匹配模块,设置为将上述视频内容特征变量和上述辅助特征变量一起输入上述套餐推荐模型进行套餐匹配,以得到上述目标视频处理套餐。
在一种可能的设计中,上述服务器还包括模型建立模块,设置为:
从已发布的视频中选择多个视频作为视频训练样本;
基于每个视频训练样本包括的视频序列帧的图像识别结果,对每个视频训练样本的拍摄环境和/或拍摄对象进行标记,以得到每个视频训练样本的视频内容标签;
提取每个视频训练样本使用的视频处理套餐;
将每个视频训练样本的视频内容标签和对应的视频处理套餐作为训练特征输入预设网络模型进行训练学习,以获得上述套餐推荐模型。
在一种可能的设计中,上述模型建立模块设置为:
根据每个视频训练样本的历史交互数据确定其推荐积分值,其中,视频训练样本的历史交互数据用于表明用户与该视频训练样本之间的交互情况;
按照预定关联规则将每个视频训练样本的推荐积分值和对应的视频处理套餐建立关联后进行训练学习,以获得上述套餐推荐模型。
在一种可能的设计中,上述模型建立模块设置为:
按照推荐积分值越高对应的视频处理套餐的训练权重越大的原则,在上述预设网络模型内将每个视频内容标签和对应的视频处理套餐进行关联训练,以得到上述套餐推荐模型;或
确定推荐积分值大于等于预定积分值的目标视频内容标签,再按照推荐积分值越高对应的视频处理套餐的训练权重越大的原则,在上述预设网络模型内将每个目标视频内容标签和对应的视频处理套餐进行关联训练,以得到上述套餐推荐模型。
第六方面,提供一种服务器,该服务器包括:
第一接收模块,设置为接收终端设备发送的目标拍摄场景的场景描述信息以及针对上述目标拍摄场景拍摄的目标视频;
第二接收模块,设置为接收上述终端设备或建模服务器发送的目标视频处理套餐,其中,上述目标视频处理套餐是根据上述场景描述信息匹配的与上述目标拍摄场景对应的视频处理套餐,上述目标视频处理套餐包括以预定处理模式对视频进行处理的至少一种视频处理方式;
处理模块,设置为根据上述目标视频处理套餐对上述目标视频进行处理。
第七方面,提供一种视频处理装置,该视频处理装置包括:
存储器,用于存储程序指令;
处理器,用于调用上述存储器中存储的程序指令,按照获得的程序指令执行如第一方面中任一上述的方法包括的步骤,或者执行如第二方面中任一上述的方法包括的步骤,或者执行第三方面中任一上述的方法包括的步骤。
第八方面,提供一种存储介质,上述存储介质存储有计算机可执行指令,上述计算机可执行指令用于使计算机执行如第一方面中任一上述的方法包括的步骤,或者执行如第二方面中任一上述的方法包括的步骤,或者执行第三方面中任一上述的方法包括的步骤。
第九方面,提供一种视频处理装置,该视频处理装置包括至少一个处 理器及存储介质,当该存储介质中包括的指令被该至少一个处理器执行时,可以执行第一方面中任一上述的方法包括的步骤,或者执行第二方面中任一上述的方法包括的步骤,或者执行第三方面中任一上述的方法包括的步骤。
第十方面,提供一种芯片系统,该芯片系统包括处理器,还可以包括存储器,用于实现第一方面中任一上述的方法,或者实现第二方面中任一上述的方法,或者实现第三方面中任一上述的方法。该芯片系统可以由芯片构成,也可以包含芯片和其他分立器件。
应当理解的是,以上的一般描述和后文的细节描述仅是示例性和解释性的,并不能限制本公开。
附图说明
为了更清楚地说明本申请实施例或相关技术中的技术方案,下面将对实施例或相关技术描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本申请实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据提供的附图获得其他的附图。
图1为相关技术中选择滤镜的操作示意图;
图2A为本申请实施例中的视频处理方法的应用场景示意图;
图2B为本申请实施例中的视频处理方法的另一应用场景示意图;
图2C为本申请实施例中的视频处理方法的另一应用场景示意图;
图3为本申请实施例中的视频处理方法的流程图;
图4为本申请实施例中的用户使用终端设备对目标拍摄场景拍摄视频的示意图;
图5为本申请实施例中的视频处理方法的另一流程图;
图6为本申请实施例中的视频处理方法的另一流程图;
图7为本申请实施例中的SSD的网络架构示意图;
图8为本申请实施例中视频处理方法的另一示意图;
图9为本申请实施例中的终端设备的结构框图;
图10为本申请实施例中的服务器的结构框图;
图11为本申请实施例中的服务器的另一结构框图;
图12为本申请实施例中的视频处理装置的结构示意图;
图13本申请实施例中的视频处理装置的另一结构示意图;
图14为本申请实施例中的视频处理装置的另一结构示意图。
具体实施方式
为使本申请的目的、技术方案和优点更加清楚明白,下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例仅仅是本申请一部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。在不冲突的情况下,本申请中的实施例及实施例中的特征可以相互任意组合。并且,虽然在流程图中示出了逻辑顺序,但是在某些情况下,可以以不同于此处的顺序执行所示出或描述的步骤。
本申请的说明书和权利要求书及上述附图中的术语“第一”和“第二”是用于区别不同对象,而非用于描述特定顺序。此外,术语“包括”以及它们任何变形,意图在于覆盖不排他的保护。例如包含了一系列步骤或单元的过程、方法、系统、产品或设备没有限定于已列出的步骤或单元,而是可选地还包括没有列出的步骤或单元,或可选地还包括对于这些过程、方法、产品或设备固有的其它步骤或单元。
本申请实施例中,“多个”可以表示至少两个,例如可以是两个、三 个或者更多个,本申请实施例不做限制。
另外,本文中术语“和/或”,仅仅是一种描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B这三种情况。另外,本文中字符“/”,在不做特别说明的情况下,一般表示前后关联对象是一种“或”的关系。
以下对本文中涉及的部分用语进行说明,以便于本领域技术人员理解。
短视频,即短片视频,是一种互联网内容传播方式,一般是在互联网新媒体上传播的时长在5分钟以内(例如几秒到几分钟不等)的视频传播内容,是适合在移动状态和短时休闲状态下观看的、高频推送的视频内容。短视频的内容涵盖技能分享、萌娃记录、美妆美容、幽默搞怪、运动瘦身、时尚潮流、萌宠记录、社会热点、美食推荐等主题。
用户可以在各大短视频平台上观看各种主题的短视频,一般来说,短视频平台可以随机推荐一些热门的短视频,或者也可以根据用户定制的观看偏好针对性推荐,比如一个妈妈用户定制的观看偏好是萌娃记录和美妆美容类的短视频,那么平台在推荐时就会向该用户尽量多地推荐这两种主题的短视频。
在实际中,短视频的创造动机大致包括两种,一是原创型,二是模仿型,这两者的关系是:用户在观看了自己觉得有趣或者有意义的原创型短视频之后,可能激发自身的创造欲而模仿该原创型短视频来拍摄类似的视频,对于这类模仿拍摄的视频就称作模仿型视频。根据有效的调查数据显示,目前各大短视频平台上模仿型视频的比例是比较大的,约占到视频总量的50%-80%。
如前所述,用户在拍摄视频前可能会选择一些视频处理方式来对视频进行特殊处理,进而可以获得满意的视频效果。例如,用户在观看一条别人发的宝宝短视频时,觉得视频的动作和音乐都很有意思,就想模仿着这 个效果自己发布一条短视频,于是,打开短视频APP对准自己的宝宝准备拍摄,在点击拍摄按键之前,用户可以从如图1所示的操作选择区中选择需要的滤镜、特效、美颜、音乐及其它效果。以选择滤镜为例,用户在点击滤镜按键(如图1中的左图所示)后,操作选择区会弹出更多的滤镜(如图1中的右图所示)供用户选择,其中主要分为两大部分,分别为推荐滤镜区和默认排序滤镜区,其中的推荐滤镜区是按照例如短视频APP中滤镜的使用频率由高到低的顺序排列,或者是按照用户自己以前拍摄视频时使用频率由高到低的顺序排列,而默认排序滤镜则是随机排列的,例如是短视频APP的系统默认排序,用户可以在推荐滤镜区或者默认排序滤镜区中选择一种自己当前需要的滤镜效果,进而完成滤镜的选择。对于特效、美颜、音乐以及其它效果的选择均是与滤镜选择类似的,可见整个选择过程需要用户较多的操作步骤,比较繁琐,尤其是在需要依次选择滤镜、特效、美颜、音乐及其它效果时,则需要花费更长的时间。
在具体实践过程中,本申请的申请人发现,目前在视频拍摄前的视频处理方式的选择过程中,需要用户花费较多的操作和时间才能完成选择,整个过程完全是依靠用户的手动操作完成,耗时较长,选择视频处理方式的匹配效率较低。为此,本申请人考虑到,可以借助于终端设备自身的设备能力来为用户匹配视频处理方式,也就是说,通过终端设备为用户自动推荐视频处理方式,这样的话就无需用户再手动进行选择,可以减少用户的操作,并且,可以直接匹配出由多种视频处理方式组成的视频处理套餐,例如可以同时匹配出包括滤镜、美颜、特效和音乐在内的视频处理套餐,相对于用户一种一种依次选择的方式来说,可以提高效率。
在考虑由终端设备自动推荐的基础上,如何实现准确的推荐呢,基于此,本申请人进一步地挖掘现有短视频的特点,从而发现大多数短视频是始发于模仿的动机,所谓模仿就是效仿已发布视频中的场景、人物和动作 来实现类似的视频效果,也就是说,将要拍摄的视频和被模仿的视频的内容大致是相同的,而对于视频内容相同,就是指拍摄场景的环境、物体类型和数量是大致相同的,换句话说,就是拍摄场景的环境和拍摄场景中的拍摄对象是大致相同的,并且拍摄对象的动作也是大致相同的,这里所谓的拍摄对象是人、动物或者其它静物,例如是一个宝宝,或者是宝宝和妈妈,或者是小猫,或者是一部手机,或者是一株植物,等等,基于这些挖掘发现,本申请人就考虑到根据拍摄场景的场景描述信息来匹配对应的视频处理套餐,因为拍摄场景信息能够用于来描述拍摄场景的大致环境和拍摄对象的相关情况,所以通过场景描述信息来实现视频处理套餐的自动匹配可以使得匹配出来的结果能够尽量与当前实际拍摄的视频内容符合,这可以在一定程度上满足用户的实际拍摄需求,使得最后处理出来的视频能够符合用户的要求。另外,那么为了能够使得模仿拍摄的视频能够得到大多数人的喜欢,并且由于是模仿拍摄,那么说明用户本身也是比较喜欢被模仿的视频的整体视频效果的,所以在模仿拍摄时也就可以直接使用被模仿的视频的一些视频处理效果,即可以根据已发布视频的视频处理效果来确定当前实际需要的视频处理方式,这部分内容将在后文详细介绍。需要说明的是,即使对于原创视频,也可以根据场景描述信息找出比较相近的拍摄场景而进行匹配,从而将原创视频视为模仿型视频匹配出对应的视频处理套餐,上述原理同样适用。
根据上述分析,在相关技术中的视频处理方式的匹配效率较低的前提下,本申请实施例提供一种视频处理方法,用来提高视频处理方式的匹配效率和准确性。在该方法中,需要先获得目标拍摄场景的场景描述信息,再基于该场景描述信息自动匹配出与该目标拍摄场景对应的目标视频处理套餐,这样省去了用户的手动选择操作,可以在一定程度上提高匹配效率,并且由于是基于场景描述信息对应匹配的,所以可以在一定程度上提 高匹配的针对性和准确性,尽量使得匹配结果是与当前实际拍摄的视频相符合的,进而满足用户的实际需求。而该目标视频处理套餐中包括以预定处理模式对视频进行处理的至少一种视频处理方式,例如该目标视频处理套餐包括滤镜为“小森林”、美颜程度为3级,大眼瘦脸程度为2级,特效为“泡泡”,音乐为“乖娃娃”的多种视频处理方式,再以获得的目标视频处理套餐对针对目标拍摄场景拍摄得到的目标视频进行处理,进而获得处理后的目标视频,以滤镜为例,在以目标视频处理套餐对目标视频进行处理之前,该目标视频是没有任何滤镜效果的,而在处理之后,该目标视频就具有了滤镜效果了,进而可以使得拍摄得到的目标视频能够根据终端设备自动推荐的目标视频处理套餐实现多种视频效果。
下面对本申请实施例的技术方案能够适用的应用场景做一些简单介绍,需要说明的是,以下介绍的应用场景仅用于说明本申请实施例而非限定。在具体实施时,可以根据实际需要灵活地应用本申请实施例提供的技术方案。
图2A为本申请实施例中的视频处理方法能够适用的一种应用场景,在该应用场景中包括终端设备21和服务器22。在终端设备21中可以安装有能够拍摄视频的APP,为了便于描述,本申请实施例中将具有视频拍摄功能的APP称作视频APP,也就是说,终端设备21中安装并运行有视频APP的客户端,例如抖音、美拍、微视等视频APP的客户端,而服务器22是指与视频APP的客户端对应的服务端,例如是应用服务器,应用服务器可以为视频APP的安装和更新提供对应的安装包和更新包,在视频APP的运行过程中,视频APP的客户端可以与对应的应用服务器进行交互。在图2A所示的应用场景中,用户可以使用终端设备21利用其自身的摄像头(前置或后置)对目标拍摄场景进行拍摄以获得目标拍摄场景的场景描述信息,进而再根据获得的场景描述信息匹配与该目标拍摄场景对应 的目标视频处理套餐。
对于匹配的过程,可以由终端设备21中的视频APP的客户端独立进行,或者也可以由终端设备21将获得的场景描述信息上报给服务器22以通过服务器22来匹配目标视频处理套餐,即可以由服务器22(应用服务器)匹配。在匹配目标视频处理套餐之前、之后甚至与此同时,终端设备21还可以针对目标拍摄场景进行拍摄以得到目标视频,最后,再以匹配得到的目标视频处理套餐对拍摄得到的目标视频进行处理,进而获得处理后的目标视频,而利用目标视频处理套餐对目标视频进行处理的过程,可以由终端设备21执行,或者也可以由服务器22执行,若是由终端设备21执行的话,在对目标视频进行处理之后,终端设备21可以经由视频APP的客户端将处理后的目标视频发送给服务器22,最后再由服务器22将处理后的目标视频进行发布,在发布之前,服务器22还可以对视频进行审核,如果是涉及不利于网络传播的视频内容的话则可以禁止发布。
图2B为本申请实施例中的视频处理方法能够适用的另一种应用场景,在该应用场景中包括终端设备21、服务器22和服务器23,其中的终端设备21和服务器22与图2A中的一样,而对于服务器23来说,可以是云端服务器,用于根据目标视频处理套餐对目标视频进行处理,在实际中,该云端服务器和服务器22可以是一个服务器,或者也可以如图2B所示的为分离的不同服务器,当服务器23和服务器22为不同服务器时,在服务器23对目标视频进行处理之后,可以将处理后的目标视频发送给服务器22,以通过服务器22(即应用服务器)进行审核并发布,或者也可以直接由服务器22自身进行发布,对于由服务器23进行发布的情形,服务器22和服务器23可以预先建立允许服务器23发布视频的相关协议,以避免由于服务器23违规发布而造成的不良影响。
图2C为本申请实施例中的视频处理方法能够适用的另一种应用场景,在该应用场景中包括终端设备21、服务器22、服务器23和服务器24,其中的终端设备21、服务器22和服务器23已经在前面进行了相关描述,对于服务器24来说,是指用于建立套餐推荐模型的建模服务器,即服务器24可以建立套餐推荐模型,并且可以将建立的套餐推荐模型发送给终端设备21、服务器22和服务器23,以便于终端设备21、服务器22和服务器23能够利用该套餐推荐模型和场景描述信息来匹配出目标视频处理套餐,或者,服务器24自身也可以利用建立的套餐推荐模型,在接收终端设备21发送的场景描述信息之后,直接基于该场景描述信息和套餐推荐模型来匹配目标视频处理套餐。在具体实施过程中,根据由谁来利用套餐推荐模型和场景描述信息匹配目标视频处理套餐的不同,服务器24和终端设备21或不同的服务器之间可以选择性地建立通信连接,在图2C中是以服务器24和终端设备21、服务器22和服务器23均建立通信连接为例进行图示说明。
以上举例说明了一些可能的应用场景,此外还存在另外一些可能的应用场景,例如只包括终端设备21、服务器22和服务器24的应用场景,等等,在具体实施过程中,可以根据实际的网络部署选择不同的应用场景来实施本申请实施例中的技术方案。总的来说,获得目标拍摄场景的场景描述信息以及针对目标拍摄场景进行拍摄得到目标视频的操作由终端设备21执行,根据场景描述信息匹配目标视频处理套餐的操作可以由终端设备21或服务器22(应用服务器)或服务器23(云端服务器)或服务器24(建模服务器)执行,根据目标视频处理套餐对目标视频进行处理的操作可以由终端设备21或服务器22(应用服务器)或服务器23(云端服务器)执行,并且在一些可能的网络架构中,服务器22(应用服务器)、服务器23(云端服务器)和服务器24(建模服务器)可以是分离存在的三台服务器,或者其中的任意两个或三个可以部署成一台服务器。
前述的终端设备21,可以是手机、平板电脑、掌上电脑(Personal Digital Assistant,PDA),笔记本电脑、车载设备、智能穿戴式设备(例如智能手表和智能手环)、个人计算机,等等,无论是哪种设备,在该设备中均可以运行视频APP,即可以运行视频APP的客户端。前述的服务器22、服务器23和服务器24均可以是个人计算机、大中型计算机、计算机集群,等等。
为进一步说明本申请实施例提供的技术方案,下面结合附图以及具体实施方式对此进行详细的说明。虽然本申请实施例提供了如下述实施例或附图所示的方法操作步骤,但基于常规或者无需创造性的劳动在所述方法中可以包括更多或者更少的操作步骤。在逻辑上不存在必要因果关系的步骤中,这些步骤的执行顺序不限于本申请实施例提供的执行顺序。所述方法在实际的视频处理过程中或者由装置执行时,可以按照实施例或者附图所示的方法顺序执行或者并行执行(例如并行处理器或者多线程处理的应用环境)。
请参见图3所示的本申请实施例提供的视频处理方法的流程图,该方法的流程描述如下。
步骤31:终端设备获得目标拍摄场景的场景描述信息。
其中,拍摄场景,是指拍摄视频时所针对的场面,例如拍摄客厅中正在学走路的宝宝,那么客厅这个环境和客厅中所包括的所有物体(例如宝宝和沙发)所组成的画面即可以理解为是拍摄场景,换句话说,可以将拍摄场景理解为是拍摄视频时所针对的环境以及该环境中所包括的所有拍摄对象的集合。而目标拍摄场景可以是指针对某一特定场景的称呼,例如将要进行最终视频拍摄的拍摄场景称作目标拍摄场景。
在拍摄视频的过程中,拍摄场景可能会发生转变,例如对于一段15秒的短视频,前8秒拍摄的是客厅中正在学走路的宝宝,后7秒拍摄的是 厨房中妈妈拉着宝宝学走路的画面,在实际中可以根据拍摄画面是否发生了预定程度的变化来确定是否发生了拍摄场景的切换,继续前述例子,当拍摄画面从客厅中正在学走路的宝宝变化为厨房中妈妈拉着宝宝学走路,由于背景(从客厅变化为厨房)以及拍摄对象(从宝宝一个人变成了宝宝和妈妈两个人)均发生了实质性改变,则可以认为拍摄场景发生了变化,若是从客厅中扶着沙发边缘一侧学走路的宝宝变化为扶着沙发边缘另一侧学走路的宝宝,由于只是环境的少许变化,此时则可以认为场景并未发生切换。
本申请实施例中,场景描述信息是指用于描述拍摄场景的相关情况的信息,具体来说,场景描述信息是指对拍摄场景的环境和/或拍摄场景中的拍摄对象的特征进行描述的特征描述信息,为了便于描述,例如可以将拍摄场景的环境称作拍摄环境。其中,拍摄环境的特征描述信息例如可以包括拍摄时间、拍摄地理位置、拍摄时所使用的拍摄设备的设备信息(例如为**品牌的**型号的手机)和拍摄参数信息等能够用于描述当前的拍摄环境的所有信息,而拍摄对象的特征描述信息即为能够刻画出拍摄对象当前的实际状态的所有属性信息,例如拍摄对象的种类、身高、肤色、毛发颜色(例如是一只白色的猫)、表情、动作等等能够客观描述拍摄对象的所有属性信息。
例如,对于一个宝宝在客厅的拍摄场景来说,该拍摄场景的场景描述信息可以包括客厅的光照强度,拍摄时间、客厅背景墙面的大致颜色(例如白色墙面),客厅的大概形状和大概尺寸(例如是大概长3米宽2米的长方形的客厅),客厅中所包括的物品种类及每种物品的主要特征(例如包括一个蓝色的沙发和一个白色的茶几),宝宝的大概身高(例如90厘米)和肤色,宝宝的大概动作(例如是站立、坐或者仰趟),宝宝的发型(例如是光头、短发或者扎着两个小辫子),宝宝穿的衣服款式,宝宝手上是 否拿着东西,等等。
又例如,对于拍摄主题为风景的一拍摄场景来说,该拍摄场景的场景描述信息可以包括环境的光照强度,当前的天气情况(例如是下雪、下雨或者阳光充足),拍摄时间、拍摄地点(例如是某风景区)、拍摄场景中占较大比例的拍摄物体(例如是一片树林、瀑布或者奔流的河水),拍摄物体的大致形状和颜色,等等。
又例如,对于拍摄主题为一块蛋糕的拍摄场景来说,该拍摄场景的场景描述信息可以包括蛋糕的形状、颜色和层数,用于放置蛋糕的承载面(例如桌面或者专用于盛放蛋糕的果盘)的形状和颜色,当前环境的光照强度,等等。
再例如,对于拍摄主题为一个正在唱歌的歌手来说,该拍摄场景的场景描述信息可以包括环境的光照强度和声音强度,拍摄时间、拍摄地点、所唱歌曲的音乐风格,歌手唱歌的语速,歌手的肤色、发型和服装造型,歌手的动作,等等。
以上是以列举的几个具体的拍摄场景来对本申请中的场景描述信息进行解释说明,总的来说,通过场景描述信息不仅可以确定出拍摄场景的环境,还可以确定出拍摄场景中的所有拍摄对象,以及每个拍摄对象的具体特征,等等。其中,拍摄对象可以是动态或静态的,动态的拍摄对象例如包括人和动物,例如宝宝、妈妈和小猫,而静态的拍摄对象例如是手机或者一盆绿植,等等,可见,通过场景描述信息可以知晓目标拍摄场景的大致环境和拍摄对象的组成。
本申请实施例中的场景描述信息所包括的不同种类的特征描述信息可以通过不同的方式获得,此处所指的不同种类的特征描述信息至少包括拍摄环境的特征描述信息和拍摄对象的特征描述信息,为了便于理解,以下举例说明。
1)拍摄环境的特征描述信息。
在获得目标拍摄场景的预览视频或者正式拍摄视频的过程中,拍摄设备可以实时获得拍摄时间和拍摄地理位置,例如是在2018年6月6日16时32分于某某风景区拍摄的,通过拍摄时间和拍摄地理位置则可以在时间和空间上对当前的目标拍摄场景的情况有一个大概的了解。
可以将拍摄时间和拍摄地理位置上传到后台或者云端,进而可以通过网络查找的方式确定出当前的实际天气情况,例如确定出的天气是“晴天,温度为28℃~33°”,也就是说,通过拍摄设备自身能够检测到的一些客观信息再结合网络查找的方式即可在线得到目标拍摄场景的相关的特征描述信息;或者,也可以直接以拍摄时间和拍摄地理位置与系统中已经发布(包括正在审核和正在发布过程中)的视频进行匹配查找,进而可以将拍摄时间匹配(例如拍摄时间间隔10分钟之内)和拍摄地理位置匹配(例如拍摄地理位置之间的距离相隔在2千米之内)的视频所对应的天气确定出当前的实际天气情况,也就是说,可以通过视频平台系统本身来确定当前的实际天气情况,利用视频数据公开共享的分享机制可以方便多个用户之间的直接交互。
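为便于理解上述“以拍摄时间和拍摄地理位置与已发布视频进行匹配查找”的过程,下面给出一个示意性的Python草图;其中已发布视频的数据结构、字段名以及哈弗辛(haversine)距离计算均为说明用的示例假设,并非本申请的正式实现,时间间隔10分钟、距离2千米两个阈值取自上文示例。

```python
import math
from datetime import timedelta

def haversine_km(lat1, lon1, lat2, lon2):
    # 计算两个经纬度坐标之间的球面距离(单位:千米)
    r = 6371.0
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = (math.sin(dp / 2) ** 2
         + math.cos(math.radians(lat1)) * math.cos(math.radians(lat2)) * math.sin(dl / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def match_weather(shoot_time, lat, lon, published_videos):
    """published_videos: [{'time': datetime, 'lat': float, 'lon': float, 'weather': str}, ...]"""
    for v in published_videos:
        if (abs(v['time'] - shoot_time) <= timedelta(minutes=10)          # 拍摄时间匹配
                and haversine_km(lat, lon, v['lat'], v['lon']) <= 2.0):   # 拍摄地理位置匹配
            return v['weather']   # 以匹配到的已发布视频对应的天气作为当前实际天气
    return None                   # 未匹配到时可回退到网络查找或传感器估计
```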
另外,对于天气的确定方式,也可以不借助网络查找的方式而直接获得,例如,拍摄设备可以通过内置传感器检测当前的温度、湿度和光照强度,进而再通过这些参数值近似地确定出当前的实际天气情况。
以及,对于拍摄设备的设备信息和拍摄参数信息,以拍摄设备是手机为例,主要是考虑到不同品牌的不同型号的手机所具备的拍摄能力可能有所不同,以及每个用户在使用拍摄设备进行拍摄时所设置的拍摄参数也可能不同,正是由于可能存在这些差异,所以使得不同的目标拍摄场景的场景描述信息也可能存在差异,将这些差异考虑在内可以使得对于场景描述信息的确定更加准确,以便于后续在确定目标视频处理套餐时的匹配性更好。
2)拍摄对象的特征描述信息。
拍摄对象是实际存在于目标拍摄场景中的,所以拍摄对象的特征描述信息是客观的真实信息。对于该种特征描述信息,可以对拍摄得到的目标拍摄场景的预览视频或者正式拍摄视频的各视频帧进行图像识别和图像特征提取,以利用图像处理的方式来获得各拍摄对象的特征描述信息。
在具体实施过程中,可以是在正式开始拍摄视频之前通过终端设备的摄像头获得目标拍摄场景的预览视频,然后通过预览视频来获得场景描述信息。或者,也可以是在正式拍摄视频的过程中根据实时拍摄得到的目标视频来获得场景描述信息。具体来说,可以对预览视频的视频序列帧或对目标视频的视频序列帧进行图像识别,进而获得每帧图像的关键特征信息,最后再根据所有帧的关键特征信息来确定出场景描述信息,也就是说,可以在视频正式拍摄之前,就通过预览视频的方式获得场景描述信息,或者也可以是在正式拍摄视频之后或者同时,通过已经拍摄得到的视频来获得场景描述信息,可见,本申请实施例中对于场景描述信息的确定时刻可以不做特别要求。
对于在视频正式拍摄前就获得场景描述信息的方式,则可以提前根据场景描述信息确定出目标视频处理套餐,进而可以在视频拍摄的过程中,针对获得的每一帧或连续的多帧图像以目标视频处理套餐进行实时处理,那么在视频拍摄完毕时自然也就可以得到处理后的视频,这样可以使得视频的拍摄和处理是尽量同步进行的,从而可以确保视频处理的及时性,提高视频处理的效率。
对于在视频开始拍摄后再获得场景描述信息的方式,则可以在视频拍摄完毕之后再统一对视频中的每帧视频图像以目标视频处理套餐进行处理,或者也可以在一旦确定场景描述信息并根据场景描述信息确定出目标 视频处理套餐之后则对已经拍摄的视频进行及时处理,以及对后面拍摄的视频的每帧图像或者连续的多帧图像进行实时处理。在该种方式中,是根据实际拍摄得到的视频来确定场景描述信息,那么所获得的场景描述信息就可以最大程度上体现当前实际拍摄的场景,准确性更高,这样可以尽量避免由于场景发生变化所导致的场景描述信息更新不及时的情形,进而可以提高目标视频处理套餐确定的准确性,使得最终确定出的目标视频处理套餐是尽量与当前实际拍摄的场景相符合的,以提高视频处理的有效性和准确性,满足用户的实际需求。
本申请实施例中,根据场景描述信息的获取时刻不同,提供了两种可选的方式,可以提高本申请实施例中方案的多样性,进而使得本申请实施例中的方案能够应用于不同的应用场景,进而提高了方案的普适性。
另外,本申请实施例中的关键特征信息为每帧视频图像中所占面积最大和/或视觉呈现于最前位置的拍摄对象的特征信息,例如参考图4所示,此时用户正使用手机为对面的小孩拍摄视频,而在该小孩所在的目标拍摄场景中还包括位于小孩右后侧的猫,而在整个目标拍摄场景中,小孩所占面积最大并且也是位于场景中距离拍摄用户最近的最前位置,所以此时可以将该小孩看作是该目标拍摄场景中的关键拍摄对象,或者称作主要拍摄对象,则可以将该小孩的特征信息确定为是该目标拍摄场景的关键特征信息,而由于小猫距离镜头较远所以则不予考虑。根据该小孩的特征信息可以确定出该目标拍摄场景中有一个小孩,并且该小孩是处于立正的姿态,所以可以将该目标拍摄场景的关键特征信息确定为“一个立正的小孩”,最后则可以将该关键特征信息直接作为场景描述信息,那么通过这样的场景描述信息则可以知晓该目标拍摄场景中有一个处于立正状态的小孩。
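下面给出根据每帧检测结果提取关键特征信息、进而汇总得到场景描述信息的一个示意性Python草图;其中检测结果的数据结构为示例假设,“视觉呈现于最前位置”的判断在此简化为检测框面积最大,实际实现也可以结合深度信息等其他依据。

```python
from collections import Counter

def key_feature_of_frame(detections):
    """detections: [{'label': '小孩', 'box': (x1, y1, x2, y2), 'pose': '立正'}, ...]"""
    def area(d):
        x1, y1, x2, y2 = d['box']
        return max(0, x2 - x1) * max(0, y2 - y1)
    if not detections:
        return None
    key = max(detections, key=area)          # 取所占面积最大的拍摄对象为关键拍摄对象
    pose = key.get('pose', '')
    return "一个" + (pose + "的" if pose else "") + key['label']

def scene_description(frames_detections):
    # 汇总所有帧的关键特征信息(此处简单取出现次数最多者)作为场景描述信息
    feats = [key_feature_of_frame(d) for d in frames_detections]
    feats = [f for f in feats if f]
    return Counter(feats).most_common(1)[0][0] if feats else ""
```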
步骤32:终端设备根据获得场景描述信息,匹配与目标拍摄场景对应的目标视频处理套餐,其中,该目标视频处理套餐包括以预定处理模式对 视频进行处理的至少一种视频处理方式。
在获得场景描述信息之后,终端设备可以按照预设套餐推荐策略来匹配与目标拍摄场景对应的目标视频处理套餐。本申请实施例中的目标视频处理套餐,简单理解,就是至少一种视频处理方式的集合,通过该至少一种视频处理方式可以对视频以预定处理模式进行处理,并且在处理之后可以获得对应的视频处理效果,例如该目标视频处理套餐包括滤镜为“小森林”、美颜程度为3级,大眼瘦脸程度为2级,特效为“泡泡”,音乐为“乖娃娃”的多种视频处理方式,再以获得的目标视频处理套餐对针对目标拍摄场景拍摄得到的目标视频进行处理,进而获得处理后的目标视频,以滤镜为例,在以目标视频处理套餐对目标视频进行处理之前,该目标视频是没有任何滤镜效果的,而在处理之后,该目标视频就具有了滤镜效果了,进而可以使得拍摄得到的目标视频能够根据终端设备自动推荐的目标视频处理套餐实现多种视频效果。
为了对本申请实施例中的方案进行说明,以下结合图5对本申请实施例中确定目标视频处理套餐的方式进行说明。
第一种方式
在第一种方式中,是结合预先建立的套餐推荐模型来确定目标视频套餐,具体的流程如图5中的步骤511-步骤518,以下详细说明。
步骤511:终端设备确定预先建立的套餐推荐模型。
该预先建立的套餐推荐模型是指提前建立的用于推荐视频处理套餐的数据模型,该套餐推荐模型可以是根据大量已发布视频的视频内容进行深度学习而建立的模型,该套餐推荐模型可以由视频APP对应的应用服务器建立,或者也可以由专门的建模服务器建立,无论通过哪种方式建立,建立好的套餐推荐模型均可以嵌入视频APP的客户端,作为视频APP的一种嵌入功能使用,所以在该视频APP的客户端被安装在终端设备中后, 终端设备就可以获得该套餐推荐模型。
本申请实施例中的套餐推荐模型可以是针对已发布的多个视频利用多任务网络模型进行深度学习后得到的模型,所以在根据场景描述信息与套餐推荐模型来确定目标视频处理套餐时,可以尽量参考已发布的视频所使用的视频处理套餐,这样确定出的目标视频处理套餐能够尽量符合大众的使用习惯和爱好,进而可以使得处理后的目标视频能够尽量令大众喜欢。
在具体实施过程中,可以按照图6所示的方法建立该套餐推荐模型。可选地,上述套餐推荐模型可以由终端设备或建模服务器或云端服务器建立。
步骤61:从已发布的视频中选择多个视频作为视频训练样本。
在确定视频训练样本时,对视频来源不作特别限定,例如所有的视频训练样本均来源于同一视频APP,或者也可以来自于不同的视频APP。并且为了确保学习得到的套餐推荐模型的准确性和普适性,视频训练样本可以包括海量的已发布的视频,例如10万条短视频。另外,为了使得建立的套餐推荐模型能够尽量为不同视频内容进行套餐推荐,在选择视频训练样本时可以尽量选择多种主体的视频,以及,为了能够尽量使得建立的套餐推荐模型能够涵盖最近时间段内的用户使用的视频处理套餐,可以选择最近时间段内(例如一周内)的视频作为视频训练样本。
步骤62:对每个视频训练样本包括的视频序列帧进行图像识别,以获得每个视频训练样本的图像识别结果。这里所说的图像识别主要是对每帧图像所包括的基础特征进行识别,例如颜色特征和形状特征,等等。
步骤63:基于每个视频训练样本的图像识别结果,对每个视频训练样本的拍摄环境和/或拍摄对象进行标记,得到每个视频训练样本的视频内容标签。
根据每个视频训练样本的图像识别结果,则可以比较每帧图像之间的颜色特征、形状特征及其它特征之间的相似和偏差,再通过特定的数据处理方法以确定出每个视频训练样本的拍摄环境和/或拍摄对象,例如对于其中一个视频训练样本,确定出该视频训练样本的拍摄环境是晴朗天气下的草坪,以及确定出的拍摄对象是一个小孩和一个中年女人,可以对确定出的拍摄环境和/或拍摄对象进行标记,根据标记得到每个视频训练样本的视频内容标签,以前述例子中的视频训练样本来说,得到的视频内容标签就是“晴天的草坪”和“一个小孩一个中年女人”。
步骤64:提取每个视频训练样本所使用的视频处理套餐。
因为每个视频训练样本都是用户已经发布在网络中的视频,在发布之前用户一般对其进行了一些处理的,也就是说,这些视频训练样本都是在原始视频的基础上添加了某些视频效果的,所以可以对每个视频训练样本进行分析,进而确定出每个视频训练样本对应的视频效果所使用的视频处理方式,再将每个视频训练样本所使用的所有的视频处理方式的集合确定为该视频训练样本对应的视频处理套餐。
在获得每个视频训练样本的视频内容标签和对应的视频处理套餐之后,则可以将每个视频训练样本的视频内容标签和对应的视频处理套餐作为训练特征输入预设网络模型进行训练学习,并根据最终的训练学习结果得到本申请实施例中的套餐推荐模型。在具体实施过程中,可以采用现有的学习模型来建立模型,例如采用logistic回归方法、决策树法或者其它预设网络模型来对视频内容标签和对应的视频处理套餐进行训练学习,具体采用何种预设网络模型本申请实施例不作限制。
步骤65:根据每个视频训练样本的历史交互数据确定该视频训练样本的推荐积分值。
步骤66:按照预定关联规则将每个视频训练样本的推荐积分值和对应的视频处理套餐建立关联后进行训练学习,以获得套餐推荐模型。
针对不同的视频训练样本,也有各自的热度,而这里所说的热度是表明视频受用户欢迎的程度,例如在本申请实施例中可以用视频的历史交互数据来表征视频受欢迎的程度,而历史交互数据可以用于表明所有用户与该视频训练样本之间的交互情况,例如用户的观看行为以及社交行为,对应到视频训练样本来说则可以是视频的观看数据和社交数据。其中,视频的观看数据可以包括被观看的总用户数、被观看的总次数,每次被观看的总时长等等所有与用户的观看相关的数据,视频的社交数据可以包括点赞次数、转发次数、评论次数、下载次数等等所有与用户进行社交行为相关的数据。
根据每个视频训练样本的历史交互数据,可以计算每个视频训练样本的推荐积分值,推荐积分值相当于是视频训练样本的热度,推荐积分值越高的表明热度越大,说明受用户喜欢的程度越大,由于喜欢的用户较多,那么说明该视频训练样本的整体视频效果也是能够得到大多数人的认可和喜欢的,所以对于不同推荐积分值的视频训练样本可以设置对应不同的训练权重,以尽量突出热度较高的视频训练样本,进而可以使得训练得到的套餐推荐模型能够尽量符合大众需求,增加模型的适用性和普适性。
对于推荐积分值的计算,例如可以采用以下方式:1)针对观看数据,假设观看时间大于10秒的+1分,一次观看次数+1分,同一用户观看的次数大于预定次数(例如3次)的+1分;2)针对社交数据,假设被点赞一次+1分,被转发一次+1分,被下载一次+1分,被评论一次+1分,评论一次的字数超过预定字数(例如30字)+2分。再将观看数据和社交数据分别得到的分值进行相加,则可以得到最终的推荐积分值。
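按照上文列举的示例计分规则,推荐积分值的计算可以写成如下Python草图;其中观看数据与社交数据的字段结构为示例假设,“评论超过30字+2分”在此理解为在评论+1分之外额外加2分(属示例性假设)。

```python
from collections import Counter

def recommend_score(views, social):
    """views: [{'user': 用户id, 'duration': 观看秒数}, ...];
    social: {'likes': n, 'shares': n, 'downloads': n, 'comments': [评论文本, ...]}"""
    score = 0
    for v in views:
        score += 1                      # 一次观看 +1 分
        if v['duration'] > 10:
            score += 1                  # 观看时长大于 10 秒 +1 分
    per_user = Counter(v['user'] for v in views)
    score += sum(1 for c in per_user.values() if c > 3)   # 同一用户观看超过 3 次 +1 分
    score += social.get('likes', 0)     # 被点赞一次 +1 分
    score += social.get('shares', 0)    # 被转发一次 +1 分
    score += social.get('downloads', 0) # 被下载一次 +1 分
    for text in social.get('comments', []):
        score += 1                      # 被评论一次 +1 分
        if len(text) > 30:
            score += 2                  # 评论超过 30 字,额外 +2 分(示例假设)
    return score
```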
对于步骤66的具体实施方式,在具体实施过程中按照实际使用需求可以采用以下两种方式中的任意一种。
方式一
步骤661:按照推荐积分值越高对应的视频处理套餐的训练权重越大的原则,在预设网络模型内将每个视频内容标签和对应的视频处理套餐进行关联训练,以得到套餐推荐模型。
也就是说,针对所有的视频训练样本,均可以得到一个对应的推荐积分值,再按照推荐积分值越高对应的训练权重越大的训练原则,在预设网络模型内将每个视频内容标签和对应的视频处理套餐进行关联训练,由于推荐积分值越大的表明是越受大众用户喜欢的,所以将其以更大的训练权重进行训练的话,则可以将其更为突出,那么这类视频训练样本对应的视频处理套餐也会被加入套餐推荐模型的推荐池中,以便于后续在利用套餐推荐模型进行推荐的过程中,能够尽量优先地推荐给用户。并且,是将所有的视频训练样本都输入了预设网络模型进行训练,这样可以使得样本尽量全面,进而提高套餐推荐模型的普适性。
方式二
步骤662:确定推荐积分值大于等于预定积分值的目标视频内容标签。
步骤663:按照推荐积分值越高对应的视频处理套餐的训练权重越大的原则,在预设网络模型内将每个目标视频内容标签和对应的视频处理套餐进行关联训练,以得到套餐推荐模型。
通过步骤662-步骤663的方式,相当于是先以预定积分值将部分热度较低的视频训练样本过滤掉,而过滤掉的这部分视频训练样本就不再输入预设网络模型进行训练,因为推荐积分值较低则表明对应的视频训练样本是较难得到大部分用户的认可和喜欢的,属于极小众的样本,所以即使将这些作为样本进行训练,那么训练得到的套餐推荐模型中也很难将其对应的视频处理套餐推荐给用户使用,所以为了减少模型训练和学习的数据量,同时也为了提高套餐推荐模型的有效性,可以先进行步骤662的过滤处理。
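下面以一个示意性的Python草图说明上述两种方式的共同骨架:方式二先以预定积分值过滤样本,方式一不过滤;二者均按“推荐积分值越高训练权重越大”的原则为样本赋权。样本的数据结构与预设网络模型的训练接口均为示例假设,并非本申请的正式实现。

```python
import numpy as np

def build_training_set(samples, min_score=None):
    """samples: [{'label_vec': np.ndarray, 'package_id': int, 'score': int}, ...]"""
    if min_score is not None:                      # 方式二:过滤推荐积分值低于预定积分值的样本
        samples = [s for s in samples if s['score'] >= min_score]
    scores = np.array([s['score'] for s in samples], dtype=float)
    total = scores.sum()
    # 推荐积分值越高,对应样本的训练权重越大(此处简单按积分值归一化)
    weights = scores / total if total > 0 else np.full(len(scores), 1.0 / max(len(scores), 1))
    x = np.stack([s['label_vec'] for s in samples])
    y = np.array([s['package_id'] for s in samples])
    return x, y, weights

# 训练时可将 weights 作为逐样本损失权重传入预设网络模型,例如在 PyTorch 中:
# loss = (F.cross_entropy(logits, y, reduction='none') * w).sum()
```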
步骤512:终端设备将场景描述信息进行词向量表示,以得到视频内容特征变量。
在套餐匹配的过程中,针对场景描述信息也可以通过深度学习的方式来确定视频内容,进而得到视频内容特征变量。具体来说,可以将场景描述信息对应的视频序列帧(例如前述的预览视频的视频序列帧或目标视频的视频序列帧)输入到检测网络模型,该检测网络模型可以自动识别出每帧图像中的物体以及物体位置,并对每个物体进行分类标记,最后再通过识别结果对场景描述信息添加标签,也就是说,终端设备可以通过检测网络模型对场景描述信息进行词向量表示,以得到检测网络模型能够识别和处理的视频内容特征。
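将场景标签进行词向量表示的一种最简化做法如下述Python草图所示(各标签词向量取平均作为视频内容特征变量);其中词表与词向量矩阵在实际系统中通常来自训练好的嵌入层,这里以固定随机矩阵示意,词表内容亦为示例假设。

```python
import numpy as np

VOCAB = {'宝宝': 0, '妈妈': 1, '猫': 2, '客厅': 3, '草坪': 4, '晴天': 5}   # 示例词表
EMBED = np.random.RandomState(0).randn(len(VOCAB), 16)                     # 示例词向量矩阵

def scene_to_feature(tags):
    """将检测网络输出的场景标签映射为视频内容特征变量(词向量取平均)。"""
    vecs = [EMBED[VOCAB[t]] for t in tags if t in VOCAB]
    return np.mean(vecs, axis=0) if vecs else np.zeros(EMBED.shape[1])

feature = scene_to_feature(['晴天', '草坪', '宝宝', '妈妈'])   # shape: (16,)
```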
目前有很多检测网络模型,比如R-CNN、faster R-CNN、yolo、SSD,等等,在本申请实施例中以SSD网络框架为例来说明得到视频内容特征变量的过程,请参见图7所示的SSD网络结构示意图,SSD网络结构的训练流程如下:
(1)首先通过基础网络对视频序列帧进行基础特征提取,例如对颜色特征、形状特征等基础特征进行提取。如图7所示,SSD网络架构中的基础网络是VGG-16网络,而由于在本申请实施例中需要较快的处理速率以满足动态视频的识别需求,可以将SSD网络架构中的VGG-16网络以更为轻量级的mobilenet网络替换。
(2)添加全连接层和卷积层获取特征图,即基于提取的基础特征生成特征图像。
(3)在新增的各个特征图上设置预测目标框,通过预测目标框对拍摄对象的位置进行预测。
(4)预测每个预测目标框的类别,并把每个预测目标框和拍摄对象的实际标记框作对比,计算偏差(loss)。
(5)再通过迭代训练的方式,不断学习,让预测标记框对应的物体类别、框大小及位置与实际标记框尽可能接近。
以上流程不断迭代学习,当达到预定标准时,可以认为得到了场景描述信息所针对的视频序列帧的视频内容特征变量。
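作为参考,torchvision中现成的SSDLite+MobileNetV3检测模型与上文“以轻量级mobilenet网络替换VGG-16骨干网”的思路接近,下面给出一个推理阶段的Python草图;该草图假设所安装的torchvision版本提供ssdlite320_mobilenet_v3_large接口,仅用于说明,并非本申请训练流程的正式实现。

```python
import torch
import torchvision
from torchvision.transforms.functional import to_tensor

# 加载带预训练权重的 SSDLite + MobileNetV3 检测模型(权重参数写法以所装版本为准)
model = torchvision.models.detection.ssdlite320_mobilenet_v3_large(weights="DEFAULT")
model.eval()

def detect_objects(frame_rgb, score_thresh=0.5):
    """frame_rgb: HxWx3 的 RGB numpy 图像;返回 [(类别id, 置信度, 边框)] 列表。"""
    with torch.no_grad():
        pred = model([to_tensor(frame_rgb)])[0]     # 输出包含 boxes / labels / scores
    return [(int(l), float(s), b.tolist())
            for b, l, s in zip(pred['boxes'], pred['labels'], pred['scores'])
            if float(s) >= score_thresh]
```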
以上是以终端设备进行套餐匹配为例进行介绍,当匹配套餐的操作是由其它(例如应用服务器或者建模服务器)执行时,也可以按照类似的方式进行。
步骤513:终端设备将视频内容特征变量输入套餐推荐模型进行套餐匹配,以得到与场景描述信息匹配的至少一个推荐的视频处理套餐。
在得到了视频内容特征变量之后,套餐推荐模型则可以将该视频内容特征变量作为一个输入变量输入来匹配对应的视频处理套餐,例如,以“一个宝宝”作为视频内容特征变量输入套餐推荐模型,套餐推荐模型可以推荐一个或多个推荐的视频处理套餐,例如可以按照前述的推荐积分值最高的三个视频训练样本对应的视频处理套餐,即可以得到三个推荐的视频处理套餐。
若套餐推荐模型只推荐了一个视频处理套餐,那么则可以直接将该一个视频处理套餐作为最终的目标视频处理套餐。
若套餐推荐模型推荐了多个视频处理套餐,那么则可以按照实际使用需求选择以下方式中的其中一种来确定最终的目标视频处理套餐。
方案一:执行步骤514,即,终端设备将至少一个推荐的视频处理套餐中使用频率最高的确定为目标视频处理套餐。这里的使用频率可以以推荐积分值来进行衡量,即将推荐积分值最大对应的视频处理套餐确定目标视频处理套餐,这样可以使得处理后的目标视频能够符合大众用户的喜好。
方案二:执行步骤515,即,终端设备先根据用户属性信息确定出与该用户属性信息匹配的视频处理套餐,为了便于描述,本申请实施例中与用户属性信息匹配的视频处理套餐称作优先视频处理套餐,再从至少一个推荐的视频处理套餐中确定出与该优先视频处理套餐之间相似度最大的视频处理套餐,最后再将该相似度最大的视频处理套餐确定为最终的目标视频处理套餐。
其中,用户属性信息可以是指用户在首次使用视频APP时或者在注册时填写的用户自身的偏好设置以及自身的相关信息,例如性别、年龄、人生阶段(未婚、已婚、怀孕、生娃或恋爱)、职业、视频效果偏好(比如喜欢哪种滤镜、美颜级别和哪种特效,等等)、视频主题偏好(比如喜欢萌娃主题和美妆美容主题类型的视频),等等。也就是说,通过用户属性信息可以大致知晓用户喜欢的视频效果,进而可以结合这些因素估计出与该用户匹配的优先视频处理套餐。
也就是说,在方案二中可以在套餐推荐系统推荐的多个视频处理套餐的基础上,再结合到用户自身的实际喜好来选择一个与用户实际喜好最相符的(即相似度最大的)的一个作为最终的目标视频处理套餐,这样可以满足用户的差异化需求,并且可以尽量与用户的实际需求相匹配。
方案三:
终端设备先确定出与用户属性信息匹配的优先视频处理套餐。
步骤516:终端设备判断至少一个推荐的视频处理套餐中是否有与该优先视频处理套餐之间的相似度大于等于预定相似度的推荐的视频处理套餐。
步骤517:若有的话,终端设备则将相似度最大的推荐的视频处理套餐确定为目标视频处理套餐。
步骤518:若没有的话,终端设备则直接将优先视频处理套餐确定为目标视频处理套餐。
也就是说,针对套餐推荐模型推荐的多个视频处理套餐,可以先以预定相似度与优先视频处理套餐进行筛选,如果相似度低于预定相似度的话,则表明推荐的视频处理套餐与用户的实际需求差异很大,如果这时候勉强使用推荐的视频处理套餐对目标视频进行处理的话,那么处理后的视频效果在很大程度上也是用户自己不满意的,所以此时为了尽量满足用户当前的实际需求,若推荐的所有视频处理套餐均不满足前述条件,则可以直接将优先视频处理套餐临时作为最终的目标视频处理套餐。
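步骤516-步骤518的选择逻辑可以概括为下述Python草图;其中套餐以所含处理方式集合表示、相似度用Jaccard系数衡量均为示例假设,预定相似度阈值也仅作演示。

```python
def jaccard(p1, p2):
    # 示例相似度:按两个套餐所含视频处理方式集合的重合度计算
    s1, s2 = set(p1['modes']), set(p2['modes'])
    return len(s1 & s2) / len(s1 | s2) if (s1 | s2) else 0.0

def select_target_package(recommended, preferred, min_sim=0.8):
    """recommended: 套餐推荐模型给出的候选套餐列表; preferred: 根据用户属性信息
    匹配的优先视频处理套餐; min_sim: 预定相似度。"""
    best = max(recommended, key=lambda p: jaccard(p, preferred), default=None)
    if best is not None and jaccard(best, preferred) >= min_sim:
        return best          # 有达到预定相似度的候选:取相似度最大者(步骤517)
    return preferred         # 否则直接以优先视频处理套餐作为目标套餐(步骤518)
```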
另外,对于建模服务器根据场景描述信息匹配目标视频处理套餐的情形来说,建模服务器可以将场景描述信息进行词向量表示以得到视频内容特征变量,再对视频内容特征变量进行分析,以确定出与目标拍摄场景对应的环境和/或物体类别及数量,然后再针对确定出的环境和/或不同类别的物体,分别匹配一套视频处理方式,然后再将匹配出的多套视频处理方式所组成的视频处理套餐作为最终的目标视频处理套餐,也就是说可以针对不同的物体分别匹配一套对应的视频处理方式,这样可以对不同类别的物体进行针对性地差异处理,以尽量提升视频处理的多样性,并且由于针对每种类别的物体所确定出的一套视频处理方式也是由训练好的套餐推荐模型推荐的,所以也可以尽量符合大众需求,确保方案的普适性。
例如,检测出目标拍摄场景的环境是晴朗的天空,以及其中包括的拍摄对象有一个小孩、一个中年女人和一只猫,那么就可以针对晴朗的天空、小孩、中年女人和猫分别匹配一套对应的视频处理方式,则可以得到四套视频处理方式,然后再将这四套视频处理方式的集合确定为是最终的目标视频处理套餐进行推荐。
另外,在建模服务器进行推荐的过程中,还可以获得用户的用户属性信息和/或历史观看信息,再将用户属性信息和/或历史观看信息分别进行词向量表示,以得到辅助特征变量,最后再将前述根据场景描述信息得到的视频内容特征变量和这里得到的辅助特征变量一起输入套餐推荐模型进行套餐匹配,进而得到推荐的目标视频处理套餐。可见,对于建模服务器进行套餐匹配推荐的情形来说,建模服务器最后推荐的始终只有一种视频处理套餐,采用将用户属性信息和历史观看信息作为辅助推荐因素的方式,也是确保在推荐的过程中尽量将用户的实际因素考虑在内,进而实现套餐的准确推荐。
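将视频内容特征变量与辅助特征变量一起输入套餐推荐模型,在实现上可以简单地进行向量拼接,如下述示意性Python草图所示;向量的维度与来源均为示例假设,仅用于说明输入的组织方式。

```python
import numpy as np

def build_model_input(content_vec, user_attr_vec=None, history_vec=None):
    """拼接视频内容特征变量与用户属性/历史观看的辅助特征变量,作为套餐推荐模型的输入。"""
    parts = [content_vec]
    if user_attr_vec is not None:
        parts.append(user_attr_vec)      # 用户属性信息的词向量表示
    if history_vec is not None:
        parts.append(history_vec)        # 历史观看信息的词向量表示
    return np.concatenate(parts)
```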
第二种方式
在第二种方式中,是结合预先设置的场景与视频处理套餐对应集合来确定目标视频套餐,具体的流程如图5中的步骤521-步骤523,以下详细说明。
步骤521:终端设备确定预先设置的场景与视频处理套餐对应集合。
也就是说,用户可以预先设置场景与视频处理套餐的对应关系,比如宝宝用套餐A,宝宝+妈妈用套餐B,宝宝+爸爸用套餐C,宠物(小猫或小狗)用套餐D,等等,当然在设置的过程中,也可以预先拍摄一段视频或者以预览视频的方式来设置场景,进而再针对每种场景设置对应的视频处理套餐。在另外一种可能的实施方式中,该对应关系也可以是由视频APP默认配置的。
步骤522:终端设备将场景描述信息输入前述对应集合进行匹配查找,以得到与该场景描述信息匹配度最高的视频处理套餐。
步骤523:终端设备将匹配度最高的视频处理套餐作为目标视频处理套餐。
在第二种方式中,通过预置的前述对应关系即可以针对不同的视频拍摄场景自动匹配出相适应的视频处理套餐,可以满足用户的实际拍摄需求,并且可以由用户进行个性化定制,以便于随时修改和更新,所以能够较大程度上满足用户的实际拍摄需求。
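第二种方式中的匹配查找可以用一个简单的映射结构实现,如下述示意性Python草图;其中对应集合的内容取自上文示例,匹配度以标签重合度(Jaccard系数)衡量,属示例性假设。

```python
SCENE_PACKAGE_MAP = {            # 用户预先设置(或 APP 默认配置)的对应集合,内容为示例
    frozenset(['宝宝']): '套餐A',
    frozenset(['宝宝', '妈妈']): '套餐B',
    frozenset(['宝宝', '爸爸']): '套餐C',
    frozenset(['小猫']): '套餐D',
}

def lookup_package(scene_tags):
    """在对应集合中查找与场景描述信息匹配度最高的视频处理套餐(按标签重合度衡量)。"""
    tags = set(scene_tags)
    def overlap(key):
        return len(tags & key) / len(tags | key)
    best_key = max(SCENE_PACKAGE_MAP, key=overlap)
    return SCENE_PACKAGE_MAP[best_key] if overlap(best_key) > 0 else None

print(lookup_package(['宝宝', '妈妈']))   # -> 套餐B
```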
步骤33:终端设备获得针对目标拍摄场景进行拍摄得到的目标视频。
另外,在前述匹配视频推荐套餐之前、之后或同时,可以根据实际情形获得目标视频。
步骤34:终端设备根据目标视频处理套餐对目标视频进行处理,获得处理后的目标视频。
在以目标视频处理套餐对目标视频进行处理之后,目标视频就可以具备对应的视频效果,提升视频的美化程度。
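作为套餐处理效果的一个极简示意,下面的Python草图用OpenCV逐帧施加“提亮+暖色调”这一简化滤镜;真实套餐中的滤镜、美颜、特效、音乐等处理远比此复杂,这里的参数与实现均为示例假设。

```python
import cv2
import numpy as np

def apply_package(in_path, out_path, brightness=1.1, warm_shift=12):
    """逐帧施加简化的滤镜效果(提亮并加暖色调),示意套餐处理的过程。"""
    cap = cv2.VideoCapture(in_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    out = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*'mp4v'), fps, (w, h))
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frame = cv2.convertScaleAbs(frame, alpha=brightness, beta=0)            # 提亮
        frame[:, :, 2] = np.clip(frame[:, :, 2].astype(int) + warm_shift, 0, 255)  # BGR 红通道加暖
        out.write(frame)
    cap.release()
    out.release()
```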
步骤35:终端设备将处理后的目标视频发送给对应的应用服务器,应用服务器则可以接收处理后的目标视频。
步骤36:应用服务器对接收的处理后的目标视频进行审核,并在审核通过后进行发布。
最后,针对处理后的目标视频,为了实现社交分享,还可以将其在网络上进行发布,发布的过程具体如步骤35和步骤36所述,具体来说可以执行相关技术中的视频发布流程,此处就不展开说明了。
另外,考虑到拍摄场景会发生变化的情形,在本申请实施例中,可以在场景描述信息表明目标拍摄场景中的拍摄对象发生变化时,分别确定拍摄对象发生变化前后的目标视频处理套餐,进而再在拍摄对象发生变化前后分别以对应的目标视频处理套餐对拍摄得到的视频进行处理。这样可以针对场景变化前后分别以不同的视频处理套餐进行对应处理,这样可以提高视频处理的有效性。
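拍摄对象是否发生实质性变化,可以用前后场景标签集合的重合度来近似判断,如下述示意性Python草图(阈值与标签表示均为示例假设);重合度低于阈值时视为场景切换,并重新匹配目标视频处理套餐。

```python
def detect_scene_change(prev_tags, cur_tags, min_jaccard=0.5):
    """以前后场景标签集合的重合度衡量拍摄场景是否发生实质性变化:
    背景或拍摄对象构成变化较大(重合度低于阈值)时视为场景切换。"""
    s1, s2 = set(prev_tags), set(cur_tags)
    if not s1 and not s2:
        return False
    jac = len(s1 & s2) / len(s1 | s2)
    return jac < min_jaccard

# 使用示例:从“客厅+宝宝”切换到“厨房+宝宝+妈妈”时重新匹配目标视频处理套餐
changed = detect_scene_change(['客厅', '宝宝'], ['厨房', '宝宝', '妈妈'])
```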
如前所述,匹配套餐的操作和对目标视频进行处理的操作可能由不同的执行主体执行,为了便于本领域技术人员理解,以下以图8所示的流程图对其中一种可能的实施方式进行说明。
步骤81:终端设备将场景描述信息发送给建模服务器。建模服务器可以接收到场景描述信息。
步骤82:建模服务器根据场景描述信息,匹配与目标拍摄场景对应的目标视频处理套餐。具体来说,可以通过建模服务器中的套餐推荐模型进行推荐。
步骤83:建模服务器将匹配出来的目标视频处理套餐发送给云端服务器。云端服务器可以接收到目标视频处理套餐。
步骤84:终端设备将获得的目标视频发送给云端服务器。云端服务器可以接收到目标视频。
步骤85:云端服务器根据目标视频处理套餐,对目标视频进行处理,以获得处理后的目标视频。
需要说明的是,上述步骤83、步骤84之间的顺序可以改变,即,云端服务器可以先接收目标视频处理套餐,再接收目标视频,或者先接收目标视频,再接收目标视频处理套餐,或者同时接收目标视频与目标视频处理套餐。云端服务器接收到目标视频与目标视频处理套餐后,再根据目标视频处理套餐对目标视频进行处理,获得处理后的目标视频。
步骤86:云端服务器将处理后的目标视频发送应用服务器。应用服务器可以接收到处理后的目标视频。
步骤87:应用服务器对处理后的目标视频进行审核,并在审核通过后发布。
需要说明的是,上述每一种实施例中,用户都可以自己选择是否启用自动匹配视频处理套餐的功能,如果用户关闭该功能,则允许用户自行设定各种视频处理方式,并利用用户设定的视频处理方式组成的视频处理套餐,处理用户拍摄的视频,本领域技术人员可以在前述公开的每一个实施例的基础上,具体实现这一方案,本申请实施例不再一一详细描述。
基于同一申请构思,本申请实施例提供一种终端设备,该终端设备例如可以是前述图2A-图2C中的终端设备21。该终端设备可以是硬件结构、软件模块、或硬件结构加软件模块。该终端设备可以由芯片系统实现,芯片系统可以由芯片构成,也可以包含芯片和其他分立器件。
请参见图9所示,本申请实施例中的终端设备可以包括获得模块91、匹配模块92和处理模块93。其中:
获得模块91,设置为获得目标拍摄场景的场景描述信息;
匹配模块92,设置为根据场景描述信息,匹配与目标拍摄场景对应的目标视频处理套餐,其中,目标视频处理套餐包括以预定处理模式对视频进行处理的至少一种视频处理方式;
处理模块93,设置为根据目标视频处理套餐,对针对目标拍摄场景拍摄的目标视频进行处理。
在一种可能的实施方式中,匹配模块92设置为:
将场景描述信息进行词向量表示,以得到视频内容特征变量;
将视频内容特征变量输入预先建立的套餐推荐模型进行套餐匹配,以得到与场景描述信息匹配的至少一个推荐的视频处理套餐;
根据至少一个推荐的视频处理套餐,确定目标视频处理套餐。
在一种可能的实施方式中,匹配模块92设置为:
确定至少一个推荐的视频处理套餐中使用频率最高的为目标视频处理套餐;或
确定至少一个推荐的视频处理套餐中与优先视频处理套餐之间相似度最大的为目标视频处理套餐,其中,优先视频处理套餐是根据用户属性 信息匹配的视频处理套餐。
在一种可能的实施方式中,匹配模块92设置为:
判断至少一个推荐的视频处理套餐中是否有与优先视频处理套餐之间的相似度大于等于预定相似度的推荐的视频处理套餐,其中,优先视频处理套餐是根据用户属性信息匹配的视频处理套餐;
若有,则将相似度最大的推荐的视频处理套餐确定为目标视频处理套餐;
若没有,则将优先视频处理套餐确定为目标视频处理套餐。
在一种可能的实施方式中,匹配模块92设置为:
将场景描述信息输入预先设置的场景与视频处理套餐对应集合进行匹配查找,以得到与场景描述信息匹配度最高的视频处理套餐;
将匹配度最高的视频处理套餐确定为目标视频处理套餐。
在一种可能的实施方式中,匹配模块92设置为:
在场景描述信息表明目标拍摄场景中的拍摄对象发生变化时,分别确定拍摄对象发生变化前后的目标视频处理套餐;
处理模块93,设置为在拍摄对象发生变化前后分别以对应的目标视频处理套餐对拍摄得到的视频进行处理。
在一种可能的实施方式中,获得模块91设置为:
获得目标拍摄场景的预览视频,或者,获得目标拍摄场景实际拍摄的目标视频;
对预览视频的视频序列帧或对目标视频的视频序列帧进行图像识别,以获得每帧的关键特征信息,其中,关键特征信息为每帧中所占面积最大和/或视觉呈现于最前位置的拍摄对象的特征信息;
根据所有帧的关键特征信息,确定场景描述信息。
其中,前述图3、图5所示的视频处理方法实施例涉及的各步骤的相关内容均可以援引到本申请实施例中的对应功能模块的功能描述,在此不再赘述。
本申请实施例中对模块的划分是示意性的,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,另外,在本申请各个实施例中的各功能模块可以集成在一个处理器中,也可以是单独物理存在,也可以两个或两个以上模块集成在一个模块中。上述集成的模块既可以采用硬件的形式实现,也可以采用软件功能模块的形式实现。
基于同一申请构思,本申请实施例提供一种服务器,该服务器例如可以是前述图2A-图2C中的服务器24,即建模服务器。该服务器可以是硬件结构、软件模块、或硬件结构加软件模块。该终端设备可以由芯片系统实现,芯片系统可以由芯片构成,也可以包含芯片和其他分立器件。
请参见图10所示,本申请实施例中的服务器可以包括接收模块101、第一获得模块102、匹配模块103和发送模块104。其中:
接收模块101,设置为接收终端设备发送的目标拍摄场景的场景描述信息;
第一获得模块102,设置为将场景描述信息进行词向量表示,以得到视频内容特征变量;
匹配模块103,设置为将视频内容特征变量输入预先建立的套餐推荐模型进行套餐匹配,以得到与目标拍摄场景匹配的目标视频处理套餐,其中,目标视频处理套餐包括以预定处理模式对视频进行处理的至少一种视频处理方式;
发送模块104,设置为将目标视频处理套餐发送给终端设备或云端服务器,以使终端设备或云端服务器根据目标视频处理套餐对针对目标拍摄场景拍摄的目标视频进行处理。
在一种可能的实施方式中,匹配模块103设置为:
对视频内容特征变量进行分析,确定与目标拍摄场景对应的环境和/或物体类别及数量;
针对确定出的环境和/或不同类别的物体,分别匹配一套视频处理方式;
将匹配出的多套视频处理方式所组成的视频处理套餐作为目标视频处理套餐。
在一种可能的实施方式中,服务器还包括第二获得模块和第三获得模块;其中:
第二获得模块,设置为获得终端设备对应用户的用户属性信息和/或历史观看信息;
第三获得模块,设置为将用户属性信息和/或历史观看信息分别进行词向量表示,以得到辅助特征变量;
匹配模块103,设置为将视频内容特征变量和辅助特征变量一起输入套餐推荐模型进行套餐匹配,以得到目标视频处理套餐。
在一种可能的实施方式中,服务器还包括模型建立模块,设置为:
从已发布的视频中选择多个视频作为视频训练样本;
基于每个视频训练样本包括的视频序列帧的图像识别结果,对每个视频训练样本的拍摄环境和/或拍摄对象进行标记,以得到每个视频训练样本的视频内容标签;
提取每个视频训练样本使用的视频处理套餐;
将每个视频训练样本的视频内容标签和对应的视频处理套餐作为训 练特征输入预设网络模型进行训练学习,以获得套餐推荐模型。
在一种可能的实施方式中,模型建立模块设置为:
根据每个视频训练样本的历史交互数据确定其推荐积分值,其中,视频训练样本的历史交互数据用于表明用户与该视频训练样本之间的交互情况;
按照预定关联规则将每个视频训练样本的推荐积分值和对应的视频处理套餐建立关联后进行训练学习,以获得套餐推荐模型。
在一种可能的实施方式中,模型建立模块设置为:
按照推荐积分值越高对应的视频处理套餐的训练权重越大的原则,在预设网络模型内将每个视频内容标签和对应的视频处理套餐进行关联训练,以得到套餐推荐模型;或
确定推荐积分值大于等于预定积分值的目标视频内容标签,再按照推荐积分值越高对应的视频处理套餐的训练权重越大的原则,在预设网络模型内将每个目标视频内容标签和对应的视频处理套餐进行关联训练,以得到套餐推荐模型。
其中,前述图6所示的视频处理方法实施例涉及的各步骤的相关内容均可以援引到本申请实施例中的对应功能模块的功能描述,在此不再赘述。
本申请实施例中对模块的划分是示意性的,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,另外,在本申请各个实施例中的各功能模块可以集成在一个处理器中,也可以是单独物理存在,也可以两个或两个以上模块集成在一个模块中。上述集成的模块既可以采用硬件的形式实现,也可以采用软件功能模块的形式实现。
基于同一申请构思,本申请实施例提供一种服务器,该服务器例如可 以是前述图2A-图2C中的服务器23,即云端服务器。该服务器可以是硬件结构、软件模块、或硬件结构加软件模块。该终端设备可以由芯片系统实现,芯片系统可以由芯片构成,也可以包含芯片和其他分立器件。
请参见图11所示,本申请实施例中的服务器可以包括第一接收模块111、第二接收模块112、和处理模块113。其中:
第一接收模块111,设置为接收终端设备发送的目标拍摄场景的场景描述信息以及针对目标拍摄场景拍摄的目标视频;
第二接收模块112,设置为接收终端设备或建模服务器发送的目标视频处理套餐,其中,目标视频处理套餐是根据场景描述信息匹配的与目标拍摄场景对应的视频处理套餐,目标视频处理套餐包括以预定处理模式对视频进行处理的至少一种视频处理方式;
处理模块113,设置为根据目标视频处理套餐对目标视频进行处理。
本申请实施例中对模块的划分是示意性的,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,另外,在本申请各个实施例中的各功能模块可以集成在一个处理器中,也可以是单独物理存在,也可以两个或两个以上模块集成在一个模块中。上述集成的模块既可以采用硬件的形式实现,也可以采用软件功能模块的形式实现。
基于同一申请构思,本申请实施例还提供另一种视频处理装置,该视频处理装置可以是终端设备,例如智能手机、平板电脑、PDA,笔记本电脑、车载设备、智能穿戴式设备等等,能够实现前述的图3和图5所示的视频处理方法中终端设备的功能;或者,该视频处理装置也可以是能够支持终端设备实现前述的视频处理方法中终端设备的功能的装置。该视频处理装置可以是硬件结构、软件模块、或硬件结构加软件模块。该视频处理装置可以由芯片系统实现,芯片系统可以由芯片构成,也可以包含芯片和其他分立器件。
如图12所示,本申请实施例中的视频处理装置包括至少一个处理器121,以及与至少一个处理器连接的存储器122,本申请实施例中不限定处理器121与存储器122之间的具体连接介质,图12中是以处理器121和存储器122之间通过总线120连接为例,总线120在图12中以粗线表示,其它部件之间的连接方式,仅是进行示意性说明,并不引以为限。总线120可以分为地址总线、数据总线、控制总线等,为便于表示,图12中仅用一条粗线表示,但并不表示仅有一根总线或一种类型的总线。
在本申请实施例中,存储器122存储有可被至少一个处理器121执行的指令,至少一个处理器121通过执行存储器122存储的指令,可以执行前述的视频处理方法中所包括的步骤。
其中,处理器121是视频处理装置的控制中心,可以利用各种接口和线路连接整个视频处理装置的各个部分,通过运行或执行存储在存储器122内的指令以及调用存储在存储器122内的数据,执行视频处理装置的各种功能和处理数据,从而对视频处理装置进行整体监控。可选的,处理器121可包括一个或多个处理单元,处理器121可集成应用处理器和调制解调处理器,其中,应用处理器主要处理操作系统、用户界面和应用程序等,调制解调处理器主要处理无线通信。可以理解的是,上述调制解调处理器也可以不集成到处理器121中。在一些实施例中,处理器121和存储器122可以在同一芯片上实现,在一些实施例中,它们也可以在独立的芯片上分别实现。
处理器121可以是通用处理器,例如中央处理器(CPU)、数字信号处理器、专用集成电路(Application Specific Integrated Circuit,ASIC)、现场可编程门阵列或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、 分立硬件组件,可以实现或者执行本申请实施例中公开的各方法、步骤及逻辑框图。通用处理器可以是微处理器或者任何常规的处理器等。结合本申请实施例所公开的方法的步骤可以直接体现为硬件处理器执行完成,或者用处理器中的硬件及软件模块组合执行完成。
存储器122作为一种非易失性计算机可读存储介质,可用于存储非易失性软件程序、非易失性计算机可执行程序以及模块。存储器122可以包括至少一种类型的存储介质,例如可以包括闪存、硬盘、多媒体卡、卡型存储器、随机访问存储器(Random Access Memory,RAM)、静态随机访问存储器(Static Random Access Memory,SRAM)、可编程只读存储器(Programmable Read Only Memory,PROM)、只读存储器(Read Only Memory,ROM)、带电可擦除可编程只读存储器(Electrically Erasable Programmable Read-Only Memory,EEPROM)、磁性存储器、磁盘、光盘等等。存储器122是能够用于携带或存储具有指令或数据结构形式的期望的程序代码并能够由计算机存取的任何其他介质,但不限于此。本申请实施例中的存储器122还可以是电路或者其它任意能够实现存储功能的装置,用于存储程序指令和/或数据。
请参见图13所示的视频处理装置的另一结构示意图,该视频处理装置还可以包括输入单元133、显示单元134、射频单元135、音频电路136、扬声器137、麦克风138、无线保真(Wireless Fidelity,WiFi)模块139、蓝牙模块1310、电源1311、外部接口1312、耳机插孔1312等部件。本领域技术人员可以理解的是,图13仅仅是视频处理装置的举例,并不构成对视频处理装置的限定,视频处理装置可以包括比图示更多或更少的部件,或者组合某些部件,或者不同的部件。
输入单元133可用于接收输入的数字或字符信息,以及产生与视频处理装置的用户设置以及功能控制有关的键信号输入。例如,输入单元133 可包括触摸屏1331以及其它输入设备1332。触摸屏1331可收集用户在其上或附近的触摸操作(比如用户使用手指、关节、触笔等任何适合的物体在触摸屏1331上或在触摸屏1331附近的操作),即触摸屏1331可用于检测触摸压力以及触摸输入位置和触摸输入面积,并根据预先设定的程序驱动相应的连接装置。触摸屏1331可以检测用户对触摸屏1331的触控操作,将触控操作转换为触控信号发送给处理器121,或者理解为可将触控操作的触控信息发送给处理器121,并能接收处理器121发来的命令并加以执行。触控信息至少可以包括压力大小信息和压力持续时长信息中的至少一种。触摸屏1331可以提供视频处理装置和用户之间的输入界面和输出界面。此外,可以采用电阻式、电容式、红外线以及表面声波等多种类型实现触摸屏1331。除了触摸屏1331,输入单元133还可以包括其它输入设备1332。比如,其它输入设备1332可以包括但不限于物理键盘、功能键(比如音量控制按键、开关按键等)、轨迹球、鼠标、操作杆等中的一种或多种。
显示单元134可用于显示由用户输入的信息或提供给用户的信息以及视频处理装置的各种菜单。触摸屏1331可覆盖显示单元134,当触摸屏1331检测到在其上或附近的触控操作后,传送给处理器121以确定触控操作的压力信息。在本申请实施例中,触摸屏1331与显示单元134可以集成为一个部件而实现视频处理装置的输入、输出、显示功能。为便于描述,本申请实施例以触摸屏1331代表触摸屏1331和显示单元134的功能集合为例进行示意性说明,当然在某些实施例中,触摸屏1331与显示单元134也可以作为两个独立的部件。
当显示单元134和触摸板以层的形式彼此叠加以形成触摸屏1331时,显示单元134可以用作输入装置和输出装置,在作为输出装置时,可以用于显示图像,例如实现对各种视频的播放。显示单元134可以包括液晶显 示器(Liquid Crystal Display,LCD)、薄膜晶体管液晶显示器(Thin Film Transistor Liquid Crystal Display,TFT-LCD)、有机发光二极管(Organic Light Emitting Diode,OLED)显示器、有源矩阵有机发光二极体(Active Matrix Organic Light Emitting Diode,AMOLED)显示器、平面转换(In-Plane Switching,IPS)显示器、柔性显示器、3D显示器等等中的至少一种。这些显示器中的一些可以被构造为透明状以允许用户从外部观看,这可以称为透明显示器,根据特定想要的实施方式,视频处理装置可以包括两个或更多显示单元(或其它显示装置),例如,视频处理装置可以包括外部显示单元(图13未示出)和内部显示单元(图13未示出)。
射频单元135可用于收发信息或通话过程中信号的接收和发送。通常,射频电路包括但不限于天线、至少一个放大器、收发信机、耦合器、低噪声放大器(Low Noise Amplifier,LNA)、双工器等。此外,射频单元135还可以通过无线通信与网络设备和其它设备通信。无线通信可以使用任一通信标准或协议,包括但不限于全球移动通讯系统(Global System of Mobile communication,GSM)、通用分组无线服务(General Packet Radio Service,GPRS)、码分多址(Code Division Multiple Access,CDMA)、宽带码分多址(Wideband Code Division Multiple Access,WCDMA)、长期演进(Long Term Evolution,LTE)、电子邮件、短消息服务(Short Messaging Service,SMS)等。
音频电路136、扬声器137、麦克风138可提供用户与视频处理装置之间的音频接口。音频电路136可将接收到的音频数据转换后的电信号,传输到扬声器137,由扬声器137转换为声音信号输出。另一方面,麦克风138将收集的声音信号转换为电信号,由音频电路136接收后转换为音频数据,再将音频数据输出处理器121处理后,经射频单元135以发送给比如另一电子设备,或者将音频数据输出至存储器122以便进一步处理, 音频电路也可以包括耳机插孔1312,用于提供音频电路和耳机之间的连接接口。
WiFi属于短距离无线传输技术,视频处理装置通过WiFi模块139可以帮助用户收发电子邮件、浏览网页和访问流式媒体等,它为用户提供了无线的宽带互联网访问。虽然图13示出了WiFi模块139,但是可以理解的是,其并不属于视频处理装置的必须构成,完全可以根据需要在不改变申请的本质的范围内而省略。
蓝牙是一种短距离无线通讯技术。利用蓝牙技术,能够有效地简化掌上电脑、笔记本电脑和手机等移动通信终端设备之间的通信,也能够成功地简化以上这些设备与因特网(Internet)之间的通信,视频处理装置通过蓝牙模块1310使视频处理装置与因特网之间的数据传输变得更加迅速高效,为无线通信拓宽道路。蓝牙技术是能够实现语音和数据无线传输的开放性方案。虽然图13示出了蓝牙模块1310,但是可以理解的是,其并不属于视频处理装置的必须构成,完全可以根据需要在不改变申请的本质的范围内而省略。
视频处理装置还可以包括电源1311(比如电池),其用于接收外部电力或为视频处理装置内的各个部件供电。电源1311可以通过电源管理系统与处理器121逻辑相连,从而通过电源管理系统实现管理充电、放电、以及功耗管理等功能。
视频处理装置还可以包括外部接口1312,该外部接口1312可以包括标准的Micro USB接口,也可以包括多针连接器,可以用于连接视频处理装置与其它设备进行通信,也可以用于连接充电器为视频处理装置充电。
尽管未示出,本申请实施例中的视频处理装置还可以包括摄像头、闪光灯等其它可能的功能模块,在此不再赘述。
基于同一申请构思,本申请实施例还提供另一种视频处理装置,请参见图14,其示出了本申请一个实施例提供的视频处理装置的结构示意图,该视频处理装置例如可以是图2A-图2C中的服务器22、服务器23或服务器24。具体来讲:
该视频处理装置包括处理器1401、包含随机存取存储器1402和只读存储器1403的系统存储器1404,以及连接系统存储器1404和处理器1401的系统总线1405。该视频处理装置还包括帮助计算机内的各个器件之间传输信息的基本输入/输出系统(I/O系统)1406,和用于存储操作系统1413、应用程序1414和其他程序模块1415的大容量存储设备1407。
处理器1401是视频处理装置的控制中心,可以利用各种接口和线路连接整个视频处理装置的各个部分,通过运行或执行存储在存储器(例如随机存取存储器1402和只读存储器1403)内的指令以及调用存储在存储器内的数据,执行视频处理装置的各种功能和处理数据,从而对视频处理装置进行整体监控。
可选的,处理器1401可包括一个或多个处理单元,处理器1401可集成应用处理器和调制解调处理器,其中,应用处理器主要处理操作系统、用户界面和应用程序等,调制解调处理器主要处理无线通信。可以理解的是,上述调制解调处理器也可以不集成到处理器1401中。在一些实施例中,处理器1401和存储器可以在同一芯片上实现,在一些实施例中,它们也可以在独立的芯片上分别实现。
处理器1401可以是通用处理器,例如中央处理器(CPU)、数字信号处理器、专用集成电路(Application Specific Integrated Circuit,ASIC)、现场可编程门阵列或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件,可以实现或者执行本申请实施例中公开的各方法、步骤及逻辑框图。通用处理器可以是微处理器或者任何常规的处理器等。结合本 申请实施例所公开的方法的步骤可以直接体现为硬件处理器执行完成,或者用处理器中的硬件及软件模块组合执行完成。
存储器作为一种非易失性计算机可读存储介质,可用于存储非易失性软件程序、非易失性计算机可执行程序以及模块。存储器可以包括至少一种类型的存储介质,例如可以包括闪存、硬盘、多媒体卡、卡型存储器、RAM、静态随机访问存储器(Static Random Access Memory,SRAM)、可编程只读存储器(Programmable Read Only Memory,PROM)、ROM、带电可擦除可编程只读存储器(Electrically Erasable Programmable Read-Only Memory,EEPROM)、磁性存储器、磁盘、光盘等等。存储器是能够用于携带或存储具有指令或数据结构形式的期望的程序代码并能够由计算机存取的任何其他介质,但不限于此。本申请实施例中的存储器还可以是电路或者其它任意能够实现存储功能的装置,用于存储程序指令和/或数据。
基本输入/输出系统1406包括有用于显示信息的显示器1408和用于用户输入信息的诸如鼠标、键盘之类的输入设备1409。其中显示器1408和输入设备1409都通过连接到系统总线1405的基本输入/输出系统1406连接到处理器1401。所述基本输入/输出系统1406还可以包括输入输出控制器以用于接收和处理来自键盘、鼠标、或电子触控笔等多个其他设备的输入。类似地,输入输出控制器还提供输出到显示屏、打印机或其他类型的输出设备。
所述大容量存储设备1407通过连接到系统总线1405的大容量存储控制器(未示出)连接到处理器1401。所述大容量存储设备1407及其相关联的计算机可读介质为该视频处理装置提供非易失性存储。也就是说,大容量存储设备1407可以包括诸如硬盘或者CD-ROM驱动器之类的计算机可读介质(未示出)。
不失一般性,计算机可读介质可以包括计算机存储介质和通信介质。计算机存储介质包括以用于存储诸如计算机可读指令、数据结构、程序模块或其他数据等信息的任何方法或技术实现的易失性和非易失性、可移动和不可移动介质。计算机存储介质包括RAM、ROM、EPROM、EEPROM、闪存或其他固态存储技术,CD-ROM、DVD或其他光学存储、磁带盒、磁带、磁盘存储或其他磁性存储设备。当然,本领域技术人员可知所述计算机存储介质不局限于上述几种。上述的系统存储器1404和大容量存储设备1407可以统称为存储器。
根据本申请的各种实施例,该视频处理装置还可以通过诸如因特网等网络连接到网络上的远程计算机运行。也即该视频处理装置可以通过连接在所述系统总线1405上的网络接口单元1411连接到网络1412,或者说,也可以使用网络接口单元1411来连接到其他类型的网络或远程计算机系统(未示出)。
基于同一申请构思,本申请实施例还提供一种存储介质,该存储介质存储有计算机指令,当该计算机指令在计算机上运行时,使得计算机执行如前述的视频处理方法的步骤。
基于同一申请构思,本申请实施例还提供一种视频处理装置,该视频处理装置包括至少一个处理器及可读存储介质,当该可读存储介质中包括的指令被该至少一个处理器执行时,可以执行如前述的视频处理方法的步骤。
基于同一申请构思,本申请实施例还提供一种芯片系统,该芯片系统包括处理器,还可以包括存储器,用于实现如前述的视频处理方法的步骤。该芯片系统可以由芯片构成,也可以包含芯片和其他分立器件。
在一些可能的实施方式中,本申请提供的视频处理方法的各个方面还可以实现为一种程序产品的形式,其包括程序代码,当所述程序产品在计算机上运行时,所述程序代码用于使所述计算机执行前文述描述的根据本申请各种示例性实施方式的视频处理方法中的步骤。
本领域内的技术人员应明白,本申请的实施例可提供为方法、系统、或计算机程序产品。因此,本申请可采用完全硬件实施例、完全软件实施例、或结合软件和硬件方面的实施例的形式。而且,本申请可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质(包括但不限于磁盘存储器和光学存储器等)上实施的计算机程序产品的形式。
本申请是参照根据本申请实施例的方法、设备(系统)、和计算机程序产品的流程图和/或方框图来描述的。应理解可由计算机程序指令实现流程图和/或方框图中的每一流程和/或方框、以及流程图和/或方框图中的流程和/或方框的结合。可提供这些计算机程序指令到通用计算机、专用计算机、嵌入式处理机或其他可编程数据处理设备的处理器以产生一个机器,使得通过计算机或其他可编程数据处理设备的处理器执行的指令产生用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的装置。
这些计算机程序指令也可存储在能引导计算机或其他可编程数据处理设备以特定方式工作的计算机可读存储器中,使得存储在该计算机可读存储器中的指令产生包括指令装置的制造品,该指令装置实现在流程图一 个流程或多个流程和/或方框图一个方框或多个方框中指定的功能。
这些计算机程序指令也可装载到计算机或其他可编程数据处理设备上,使得在计算机或其他可编程设备上执行一系列操作步骤以产生计算机实现的处理,从而在计算机或其他可编程设备上执行的指令提供用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的步骤。
显然,本领域的技术人员可以对本申请进行各种改动和变型而不脱离本申请的精神和范围。这样,倘若本申请的这些修改和变型属于本申请权利要求及其等同技术的范围之内,则本申请也意图包含这些改动和变型在内。
工业实用性
本方案获得目标拍摄场景的场景描述信息,根据所述场景描述信息匹配与所述目标拍摄场景对应的目标视频处理套餐,并根据所述目标视频处理套餐对针对所述目标拍摄场景拍摄的目标视频进行处理。在上述方法中,根据场景描述信息自动匹配出对应的目标视频处理套餐,省去了如相关技术中用户手动进行选择的操作,进而可以提高视频处理方式的匹配效率。同时,可以一次性匹配多种视频处理方式,可以进一步地提高匹配效率。并且,由于目标视频处理套餐是基于场景描述信息动态对应地匹配出来的,所以可以使得匹配出的目标视频处理套餐能够尽量与实际的视频内容相符合,进而可以提高视频处理的准确性。

Claims (26)

  1. 一种视频处理方法,所述方法包括:
    终端设备获得目标拍摄场景的场景描述信息;
    所述终端设备根据所述场景描述信息,匹配与所述目标拍摄场景对应的目标视频处理套餐,其中,所述目标视频处理套餐包括以预定处理模式对视频进行处理的至少一种视频处理方式;
    所述终端设备根据所述目标视频处理套餐,对针对所述目标拍摄场景拍摄的目标视频进行处理。
  2. 如权利要求1所述的方法,其中,所述终端设备根据所述场景描述信息,匹配与所述目标拍摄场景对应的目标视频处理套餐,包括:
    所述终端设备将所述场景描述信息进行词向量表示,以得到视频内容特征变量;
    所述终端设备将所述视频内容特征变量输入预先建立的套餐推荐模型进行套餐匹配,以得到与所述场景描述信息匹配的至少一个推荐的视频处理套餐;
    所述终端设备根据所述至少一个推荐的视频处理套餐,确定所述目标视频处理套餐。
  3. 如权利要求2所述的方法,其中,所述终端设备根据所述至少一个推荐的视频处理套餐,确定所述目标视频处理套餐,包括:
    所述终端设备确定所述至少一个推荐的视频处理套餐中使用频率最高的为所述目标视频处理套餐;或
    所述终端设备确定所述至少一个推荐的视频处理套餐中与优先视频处理套餐之间相似度最大的为所述目标视频处理套餐,其中,所述优先视频处理套餐是根据用户属性信息匹配的视频处理套餐。
  4. 如权利要求2所述的方法,其中,所述终端设备根据所述至少一个推荐的视频处理套餐,确定所述目标视频处理套餐,包括:
    所述终端设备判断所述至少一个推荐的视频处理套餐中是否有与优先视频处理套餐之间的相似度大于等于预定相似度的推荐的视频处理套餐,其中,所述优先视频处理套餐是根据用户属性信息匹配的视频处理套餐;
    若有,则所述终端设备将相似度最大的推荐的视频处理套餐确定为所述目标视频处理套餐;
    若没有,则所述终端设备将所述优先视频处理套餐确定为所述目标视频处理套餐。
  5. 如权利要求1所述的方法,其中,所述终端设备根据所述场景描述信息,匹配与所述目标拍摄场景对应的目标视频处理套餐,包括:
    所述终端设备将所述场景描述信息输入预先设置的场景与视频处理套餐对应集合进行匹配查找,以得到与所述场景描述信息匹配度最高的视频处理套餐;
    所述终端设备将所述匹配度最高的视频处理套餐确定为所述目标视频处理套餐。
  6. 如权利要求1-5中任一所述的方法,其中,所述终端设备根据所述场景描述信息,匹配与所述目标拍摄场景对应的目标视频处理套餐,包括:
    在所述场景描述信息表明所述目标拍摄场景中的拍摄对象发生变化时,所述终端设备分别确定拍摄对象发生变化前后的目标视频处理套餐;
    所述终端设备根据所述目标视频处理套餐,对针对所述目标拍摄场景拍摄的目标视频进行处理,包括:
    所述终端设备在拍摄对象发生变化前后分别以对应的目标视频处理套餐对拍摄得到的视频进行处理。
  7. 一种视频处理方法,所述方法包括:
    建模服务器接收终端设备发送的目标拍摄场景的场景描述信息;
    所述建模服务器将所述场景描述信息进行词向量表示,以得到视频内容特征变量;
    所述建模服务器将所述视频内容特征变量输入预先建立的套餐推荐模型进行套餐匹配,以得到与所述目标拍摄场景匹配的目标视频处理套餐,其中,所述目标视频处理套餐包括以预定处理模式对视频进行处理的至少一种视频处理方式;
    所述建模服务器将所述目标视频处理套餐发送给所述终端设备或云端服务器,以使所述终端设备或所述云端服务器根据所述目标视频处理套餐对针对所述目标拍摄场景拍摄的目标视频进行处理。
  8. 如权利要求7所述的方法,其中,所述建模服务器将所述视频内容特征变量输入预先建立的套餐推荐模型进行套餐匹配,以得到与所述目标拍摄场景匹配的目标视频处理套餐,包括:
    所述建模服务器对所述视频内容特征变量进行分析,确定与所述目标拍摄场景对应的环境和/或物体类别及数量;
    所述建模服务器针对确定出的环境和/或不同类别的物体,分别匹配一套视频处理方式;
    所述建模服务器将匹配出的多套视频处理方式所组成的视频处理套餐作为所述目标视频处理套餐。
  9. 如权利要求7所述的方法,其中,所述方法还包括:
    所述建模服务器获得所述终端设备对应用户的用户属性信息和/或历史观看信息;
    所述建模服务器将所述用户属性信息和/或所述历史观看信息分别进行词向量表示,以得到辅助特征变量;
    所述建模服务器将所述视频内容特征变量输入预先建立的套餐推荐模型进行套餐匹配,以得到与所述场景描述信息匹配的目标视频处理套餐,包括:
    所述建模服务器将所述视频内容特征变量和所述辅助特征变量一起输入所述套餐推荐模型进行套餐匹配,以得到所述目标视频处理套餐。
  10. 如权利要求7-9中任一所述的方法,其中,所述套餐推荐模型按照以下方式建立:
    所述建模服务器从已发布的视频中选择多个视频作为视频训练样本;
    所述建模服务器基于每个视频训练样本包括的视频序列帧的图像识别结果,对每个视频训练样本的拍摄环境和/或拍摄对象进行标记,以得到每个视频训练样本的视频内容标签;
    所述建模服务器提取每个视频训练样本使用的视频处理套餐;
    所述建模服务器将每个视频训练样本的视频内容标签和对应的视频 处理套餐作为训练特征输入预设网络模型进行训练学习,以获得所述套餐推荐模型。
  11. 如权利要求10所述的方法,其中,所述建模服务器将每个视频训练样本的视频内容标签和对应的视频处理套餐输入预设网络模型进行训练学习,以获得所述套餐推荐模型,包括:
    所述建模服务器根据每个视频训练样本的历史交互数据确定其推荐积分值,其中,视频训练样本的历史交互数据用于表明用户与该视频训练样本之间的交互情况;
    所述建模服务器按照预定关联规则将每个视频训练样本的推荐积分值和对应的视频处理套餐建立关联后进行训练学习,以获得所述套餐推荐模型。
  12. 一种终端设备,所述终端设备包括:
    获得模块,设置为获得目标拍摄场景的场景描述信息;
    匹配模块,设置为根据所述场景描述信息,匹配与所述目标拍摄场景对应的目标视频处理套餐,其中,所述目标视频处理套餐包括以预定处理模式对视频进行处理的至少一种视频处理方式;
    处理模块,设置为根据所述目标视频处理套餐,对针对所述目标拍摄场景拍摄的目标视频进行处理。
  13. 根据权利要求12所述的终端设备,所述匹配模块还设置为:
    将所述场景描述信息进行词向量表示,以得到视频内容特征变量;
    将所述视频内容特征变量输入预先建立的套餐推荐模型进行套餐匹配,以得到与所述场景描述信息匹配的至少一个推荐的视频处理套餐;
    根据所述至少一个推荐的视频处理套餐,确定所述目标视频处理套餐。
  14. 根据权利要求13所述的终端设备,所述匹配模块还设置为:
    确定所述至少一个推荐的视频处理套餐中使用频率最高的为所述目标视频处理套餐;或
    确定所述至少一个推荐的视频处理套餐中与优先视频处理套餐之间相似度最大的为所述目标视频处理套餐,其中,所述优先视频处理套餐是根据用户属性信息匹配的视频处理套餐。
  15. 根据权利要求13所述的终端设备,所述匹配模块还设置为:
    判断所述至少一个推荐的视频处理套餐中是否有与优先视频处理套餐之间的相似度大于等于预定相似度的推荐的视频处理套餐,其中,所述优先视频处理套餐是根据用户属性信息匹配的视频处理套餐;
    若有,则将相似度最大的推荐的视频处理套餐确定为所述目标视频处理套餐;
    若没有,则将所述优先视频处理套餐确定为所述目标视频处理套餐。
  16. 根据权利要求12所述的终端设备,所述匹配模块还设置为:
    将所述场景描述信息输入预先设置的场景与视频处理套餐对应集合进行匹配查找,以得到与所述场景描述信息匹配度最高的视频处理套餐;
    将所述匹配度最高的视频处理套餐确定为所述目标视频处理套餐。
  17. 根据权利要求13至16任意一项所述的终端设备,所述匹配模块还设置为:
    在所述场景描述信息表明所述目标拍摄场景中的拍摄对象发生变化时,分别确定拍摄对象发生变化前后的目标视频处理套餐;
    所述处理模块还设置为:
    在拍摄对象发生变化前后分别以对应的目标视频处理套餐对拍摄得 到的视频进行处理。
  18. 根据权利要求12所述的终端设备,所述获得模块还设置为:
    获得所述目标拍摄场景的预览视频,或者,获得所述目标拍摄场景实际拍摄的所述目标视频;
    对所述预览视频的视频序列帧或对所述目标视频的视频序列帧进行图像识别,以获得每帧的关键特征信息,其中,关键特征信息为每帧中所占面积最大和/或视觉呈现于最前位置的拍摄对象的特征信息;
    根据所有帧的关键特征信息,确定所述场景描述信息。
  19. 一种服务器,所述服务器包括:
    接收模块,设置为接收终端设备发送的目标拍摄场景的场景描述信息;
    第一获得模块,设置为将所述场景描述信息进行词向量表示,以得到视频内容特征变量;
    匹配模块,设置为将所述视频内容特征变量输入预先建立的套餐推荐模型进行套餐匹配,以得到与所述目标拍摄场景匹配的目标视频处理套餐,其中,所述目标视频处理套餐包括以预定处理模式对视频进行处理的至少一种视频处理方式;
    发送模块,设置为将所述目标视频处理套餐发送给所述终端设备或云端服务器,以使所述终端设备或所述云端服务器根据所述目标视频处理套餐对针对所述目标拍摄场景拍摄的目标视频进行处理。
  20. 根据权利要求19所述的服务器,所述匹配模块还设置为:
    对所述视频内容特征变量进行分析,确定与所述目标拍摄场景对应的环境和/或物体类别及数量;
    针对确定出的环境和/或不同类别的物体,分别匹配一套视频处理方式;
    将匹配出的多套视频处理方式所组成的视频处理套餐作为所述目标视频处理套餐。
  21. 根据权利要求19所述的服务器,所述服务器还包括第二获得模块和第三获得模块;其中:
    所述第二获得模块,设置为获得所述终端设备对应用户的用户属性信息和/或历史观看信息;
    所述第三获得模块,设置为将所述用户属性信息和/或所述历史观看信息分别进行词向量表示,以得到辅助特征变量;
    所述匹配模块,设置为将所述视频内容特征变量和所述辅助特征变量一起输入所述套餐推荐模型进行套餐匹配,以得到所述目标视频处理套餐。
  22. 根据权利要求19至21任意一项所述的服务器,所述服务器还包括模型建立模块,设置为:
    从已发布的视频中选择多个视频作为视频训练样本;
    基于每个视频训练样本包括的视频序列帧的图像识别结果,对每个视频训练样本的拍摄环境和/或拍摄对象进行标记,以得到每个视频训练样本的视频内容标签;
    提取每个视频训练样本使用的视频处理套餐;
    将每个视频训练样本的视频内容标签和对应的视频处理套餐作为训练特征输入预设网络模型进行训练学习,以获得所述套餐推荐模型。
  23. 根据权利要求22所述的服务器,所述模型建立模块设置为:
    根据每个视频训练样本的历史交互数据确定其推荐积分值,其中,视频训练样本的历史交互数据用于表明用户与该视频训练样本之间的交互情况;
    按照预定关联规则将每个视频训练样本的推荐积分值和对应的视频处理套餐建立关联后进行训练学习,以获得所述套餐推荐模型。
  24. 根据权利要求23所述的服务器,所述模型建立模块还设置为:
    按照推荐积分值越高对应的视频处理套餐的训练权重越大的原则,在所述预设网络模型内将每个视频内容标签和对应的视频处理套餐进行关联训练,以得到所述套餐推荐模型;或
    确定推荐积分值大于等于预定积分值的目标视频内容标签,再按照推荐积分值越高对应的视频处理套餐的训练权重越大的原则,在所述预设网络模型内将每个目标视频内容标签和对应的视频处理套餐进行关联训练,以得到所述套餐推荐模型。
  25. 一种视频处理装置,其中,所述装置包括:
    存储器,用于存储程序指令;
    处理器,用于调用所述存储器中存储的程序指令,按照获得的程序指令执行权利要求1-6任一所述的方法包括的步骤,或者执行如权利要求7-11任一所述的方法包括的步骤。
  26. 一种存储介质,其中,所述存储介质存储有计算机可执行指令,所述计算机可执行指令用于使计算机执行权利要求1-6任一所述的方法包括的步骤,或者执行如权利要求7-11任一所述的方法包括的步骤。
PCT/CN2019/097292 2018-07-23 2019-07-23 一种视频处理方法及装置、终端设备、服务器及存储介质 WO2020020156A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US17/010,549 US11854263B2 (en) 2018-07-23 2020-09-02 Video processing method and apparatus, terminal device, server, and storage medium
US18/493,730 US20240054784A1 (en) 2018-07-23 2023-10-24 Video processing package

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810814346.3A CN110163050B (zh) 2018-07-23 2018-07-23 一种视频处理方法及装置、终端设备、服务器及存储介质
CN201810814346.3 2018-07-23

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/010,549 Continuation US11854263B2 (en) 2018-07-23 2020-09-02 Video processing method and apparatus, terminal device, server, and storage medium

Publications (1)

Publication Number Publication Date
WO2020020156A1 true WO2020020156A1 (zh) 2020-01-30

Family

ID=67645134

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/097292 WO2020020156A1 (zh) 2018-07-23 2019-07-23 一种视频处理方法及装置、终端设备、服务器及存储介质

Country Status (3)

Country Link
US (2) US11854263B2 (zh)
CN (1) CN110163050B (zh)
WO (1) WO2020020156A1 (zh)


Families Citing this family (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110598043B (zh) * 2019-09-24 2024-02-09 腾讯科技(深圳)有限公司 一种视频处理方法、装置、计算机设备以及存储介质
CN110942005A (zh) * 2019-11-21 2020-03-31 网易(杭州)网络有限公司 物体识别方法及装置
US11157744B2 (en) * 2020-01-15 2021-10-26 International Business Machines Corporation Automated detection and approximation of objects in video
KR102550378B1 (ko) * 2020-02-27 2023-07-04 베이징 바이두 넷컴 사이언스 앤 테크놀로지 코., 엘티디. 짧은 비디오의 생성 방법, 플랫폼, 전자 기기, 저장 매체 및 컴퓨터 프로그램 제품
CN111327968A (zh) * 2020-02-27 2020-06-23 北京百度网讯科技有限公司 短视频的生成方法、平台、电子设备及存储介质
CN111277761B (zh) * 2020-03-05 2022-03-01 北京达佳互联信息技术有限公司 视频拍摄方法、装置、系统、电子设备和存储介质
CN111541936A (zh) * 2020-04-02 2020-08-14 腾讯科技(深圳)有限公司 视频及图像处理方法、装置、电子设备、存储介质
CN111683280A (zh) * 2020-06-04 2020-09-18 腾讯科技(深圳)有限公司 视频处理方法、装置及电子设备
CN111757175A (zh) * 2020-06-08 2020-10-09 维沃移动通信有限公司 视频处理方法及装置
CN112069358B (zh) * 2020-08-18 2022-03-25 北京达佳互联信息技术有限公司 信息推荐方法、装置及电子设备
CN112235603B (zh) * 2020-10-15 2022-04-05 脸萌有限公司 视频分发系统、方法、计算设备、用户设备及视频播放方法
CN112243065B (zh) * 2020-10-19 2022-02-01 维沃移动通信有限公司 视频录制方法及装置
CN112312053B (zh) * 2020-10-29 2023-05-23 维沃移动通信有限公司 视频录制方法及装置
CN112689200B (zh) * 2020-12-15 2022-11-11 万兴科技集团股份有限公司 视频编辑方法、电子设备及存储介质
CN116584103A (zh) * 2021-04-27 2023-08-11 深圳市大疆创新科技有限公司 拍摄方法、装置及存储介质、终端设备
CN113364911B (zh) * 2021-06-11 2023-03-07 上海兴容信息技术有限公司 一种预设终端的检测方法和系统
CN113727025B (zh) * 2021-08-31 2023-04-14 荣耀终端有限公司 一种拍摄方法、设备和存储介质
CN114286181B (zh) * 2021-10-25 2023-08-15 腾讯科技(深圳)有限公司 一种视频优化方法、装置、电子设备和存储介质
CN115002337B (zh) * 2021-11-30 2023-04-11 荣耀终端有限公司 视频处理方法及装置
CN114697761B (zh) * 2022-04-07 2024-02-13 脸萌有限公司 一种处理方法、装置、终端设备及介质
CN117197791A (zh) * 2022-06-01 2023-12-08 华为技术有限公司 一种数据处理方法、装置、设备及系统

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101247482A (zh) * 2007-05-16 2008-08-20 北京思比科微电子技术有限公司 一种实现动态图像处理的方法和装置
CN103413270A (zh) * 2013-08-15 2013-11-27 北京小米科技有限责任公司 一种图像的处理方法、装置和终端设备
US20140176732A1 (en) * 2012-12-21 2014-06-26 Google Inc. Recommending transformations for photography
CN104581199A (zh) * 2014-12-12 2015-04-29 百视通网络电视技术发展有限责任公司 视频处理系统及其处理方法
CN106547908A (zh) * 2016-11-25 2017-03-29 三星电子(中国)研发中心 一种信息推送方法和系统
CN107256221A (zh) * 2017-04-26 2017-10-17 苏州大学 基于多特征融合的视频描述方法
CN107730461A (zh) * 2017-09-29 2018-02-23 北京金山安全软件有限公司 图像处理方法、装置、设备及介质

Family Cites Families (48)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8208694B2 (en) * 2006-06-06 2012-06-26 Thomas Jelonek Method and system for image and video analysis, enhancement and display for communication
JP5298831B2 (ja) * 2008-12-19 2013-09-25 富士ゼロックス株式会社 画像処理装置及びプログラム
JP5636807B2 (ja) * 2010-08-12 2014-12-10 富士ゼロックス株式会社 画像処理装置及びプログラム
JP5776419B2 (ja) * 2011-07-29 2015-09-09 ブラザー工業株式会社 画像処理装置、画像処理プラグラム
US20130128059A1 (en) * 2011-11-22 2013-05-23 Sony Mobile Communications Ab Method for supporting a user taking a photo with a mobile device
WO2014024197A1 (en) * 2012-08-09 2014-02-13 Winkapp Ltd. A method and system for linking printed objects with electronic content
EP2797031A3 (en) * 2013-04-22 2017-07-12 ESSILOR INTERNATIONAL (Compagnie Générale d'Optique) Optical character recognition of text in an image according to a prioritized processing sequence
US9143542B1 (en) * 2013-06-05 2015-09-22 Google Inc. Media content collaboration
US9779527B2 (en) * 2013-08-15 2017-10-03 Xiaomi Inc. Method, terminal device and storage medium for processing image
JP6288816B2 (ja) * 2013-09-20 2018-03-07 カシオ計算機株式会社 画像処理装置、画像処理方法及びプログラム
KR102327779B1 (ko) * 2014-02-21 2021-11-18 삼성전자주식회사 이미지 처리 방법 및 장치
CN105225212B (zh) * 2014-06-27 2018-09-28 腾讯科技(深圳)有限公司 一种图片处理方法和装置
US9225897B1 (en) * 2014-07-07 2015-12-29 Snapchat, Inc. Apparatus and method for supplying content aware photo filters
JP6024719B2 (ja) * 2014-09-09 2016-11-16 カシオ計算機株式会社 検出装置、検出方法、及びプログラム
CN105574006A (zh) * 2014-10-10 2016-05-11 阿里巴巴集团控股有限公司 建立拍照模板数据库、提供拍照推荐信息的方法及装置
CN104360847A (zh) * 2014-10-27 2015-02-18 元亨利包装科技(上海)有限公司 一种用于处理图像的方法与设备
US9591349B2 (en) * 2014-12-23 2017-03-07 Intel Corporation Interactive binocular video display
US20160196584A1 (en) * 2015-01-06 2016-07-07 Facebook, Inc. Techniques for context sensitive overlays
CN106161990B (zh) * 2015-04-28 2019-11-12 腾讯科技(北京)有限公司 一种图像处理方法和装置
KR20160146281A (ko) * 2015-06-12 2016-12-21 삼성전자주식회사 전자 장치 및 전자 장치에서 이미지 표시 방법
JP6179569B2 (ja) * 2015-08-20 2017-08-16 カシオ計算機株式会社 画像処理装置、画像処理方法及びプログラム
US9898847B2 (en) * 2015-11-30 2018-02-20 Shanghai Sunson Activated Carbon Technology Co., Ltd. Multimedia picture generating method, device and electronic device
CN105812660A (zh) * 2016-03-15 2016-07-27 深圳市至壹科技开发有限公司 基于地理位置的视频处理方法
US9930218B2 (en) * 2016-04-04 2018-03-27 Adobe Systems Incorporated Content aware improvement of captured document images
CN107770580B (zh) * 2016-08-19 2020-11-17 北京市商汤科技开发有限公司 视频图像处理方法、装置和终端设备
CN106657810A (zh) * 2016-09-26 2017-05-10 维沃移动通信有限公司 一种视频图像的滤镜处理方法和装置
CN108605085B (zh) * 2016-11-08 2020-12-01 华为技术有限公司 一种获取拍摄参考数据的方法、移动终端
US10515108B2 (en) * 2016-12-30 2019-12-24 Facebook, Inc. Dynamically ranking media effects based on user and device characteristics
US10733372B2 (en) * 2017-01-10 2020-08-04 Microsoft Technology Licensing, Llc Dynamic content generation
CN107123081A (zh) * 2017-04-01 2017-09-01 北京小米移动软件有限公司 图像处理方法、装置及终端
CN108229278B (zh) * 2017-04-14 2020-11-17 深圳市商汤科技有限公司 人脸图像处理方法、装置和电子设备
KR101932844B1 (ko) * 2017-04-17 2018-12-27 주식회사 하이퍼커넥트 영상 통화 장치, 영상 통화 방법 및 영상 통화 중개 방법
US10078909B1 (en) * 2017-05-16 2018-09-18 Facebook, Inc. Video stream customization using graphics
US10163022B1 (en) * 2017-06-22 2018-12-25 StradVision, Inc. Method for learning text recognition, method for recognizing text using the same, and apparatus for learning text recognition, apparatus for recognizing text using the same
JP7149692B2 (ja) * 2017-08-09 2022-10-07 キヤノン株式会社 画像処理装置、画像処理方法
CN110599557B (zh) * 2017-08-30 2022-11-18 深圳市腾讯计算机系统有限公司 图像描述生成方法、模型训练方法、设备和存储介质
CN109474844B (zh) * 2017-09-08 2020-08-18 腾讯科技(深圳)有限公司 视频信息处理方法及装置、计算机设备
CN110019903A (zh) * 2017-10-10 2019-07-16 阿里巴巴集团控股有限公司 图像处理引擎组件的生成方法、搜索方法及终端、系统
CN110069650B (zh) * 2017-10-10 2024-02-09 阿里巴巴集团控股有限公司 一种搜索方法和处理设备
CN107888843A (zh) * 2017-10-13 2018-04-06 深圳市迅雷网络技术有限公司 用户原创内容的混音方法、装置、存储介质及终端设备
CN107911739A (zh) * 2017-10-25 2018-04-13 北京川上科技有限公司 一种视频获取方法、装置、终端设备及存储介质
CN107680128B (zh) * 2017-10-31 2020-03-27 Oppo广东移动通信有限公司 图像处理方法、装置、电子设备及计算机可读存储介质
CN108012081B (zh) * 2017-12-08 2020-02-04 北京百度网讯科技有限公司 智能美颜方法、装置、终端和计算机可读存储介质
US10909737B2 (en) * 2017-12-18 2021-02-02 Adobe Inc. Using layer blocks to apply effects to image content
CN108235118A (zh) * 2018-01-29 2018-06-29 北京奇虎科技有限公司 一种视频调色处理方法和装置
CN108304835B (zh) * 2018-01-30 2019-12-06 百度在线网络技术(北京)有限公司 文字检测方法和装置
US10692467B2 (en) * 2018-05-04 2020-06-23 Microsoft Technology Licensing, Llc Automatic application of mapping functions to video signals based on inferred parameters
CN109089133B (zh) * 2018-08-07 2020-08-11 北京市商汤科技开发有限公司 视频处理方法及装置、电子设备和存储介质


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111294610A (zh) * 2020-02-20 2020-06-16 北京奇艺世纪科技有限公司 一种视频处理方法及装置
CN111294610B (zh) * 2020-02-20 2022-03-08 北京奇艺世纪科技有限公司 一种视频处理方法及装置
CN111611941A (zh) * 2020-05-22 2020-09-01 腾讯科技(深圳)有限公司 特效处理方法及相关设备
CN111611941B (zh) * 2020-05-22 2023-09-19 腾讯科技(深圳)有限公司 特效处理方法及相关设备

Also Published As

Publication number Publication date
CN110163050B (zh) 2022-09-27
US20200404173A1 (en) 2020-12-24
CN110163050A (zh) 2019-08-23
US20240054784A1 (en) 2024-02-15
US11854263B2 (en) 2023-12-26


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 19841257; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 19841257; Country of ref document: EP; Kind code of ref document: A1)