WO2019105440A1 - 视频编辑推送方法、系统及智能移动终端 - Google Patents

视频编辑推送方法、系统及智能移动终端 Download PDF

Info

Publication number
WO2019105440A1
WO2019105440A1 PCT/CN2018/118373 CN2018118373W WO2019105440A1 WO 2019105440 A1 WO2019105440 A1 WO 2019105440A1 CN 2018118373 W CN2018118373 W CN 2018118373W WO 2019105440 A1 WO2019105440 A1 WO 2019105440A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
environment
editing
classification
classification information
Prior art date
Application number
PCT/CN2018/118373
Other languages
English (en)
French (fr)
Inventor
周宇涛
Original Assignee
广州市百果园信息技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 广州市百果园信息技术有限公司 filed Critical 广州市百果园信息技术有限公司
Priority to US16/767,677 priority Critical patent/US11393205B2/en
Publication of WO2019105440A1 publication Critical patent/WO2019105440A1/zh

Links

Images

Classifications

    • GPHYSICS
    • G11INFORMATION STORAGE
    • G11BINFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/02Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
    • G11B27/031Electronic editing of digitised analogue information signals, e.g. audio or video signals
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44012Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving rendering scenes according to scene graphs, e.g. MPEG-4 scene graphs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/254Fusion techniques of classification results, e.g. of results related to same input data
    • G06F18/256Fusion techniques of classification results, e.g. of results related to same input data of results relating to different input data, e.g. multimodal recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/809Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of classification results, e.g. where the classifiers operate on the same input data
    • G06V10/811Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of classification results, e.g. where the classifiers operate on the same input data the classifiers operating on different input data, e.g. multi-modal recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/41Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/46Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44008Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/472End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N21/47205End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for manipulating displayed content, e.g. interacting with MPEG-4 objects, editing locally
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/222Studio circuitry; Studio devices; Studio equipment
    • H04N5/262Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects ; Cameras specially adapted for the electronic generation of special effects

Definitions

  • the embodiment of the invention relates to the field of live broadcast, in particular to a video editing push method, system and intelligent mobile terminal.
  • Video editing refers to the process of recording the desired image with a camera and then using the video editing software on the computer to make the image into a disc.
  • real-time video editing has become a demand for development, and editing short video shots by smart mobile terminals has become a new demand.
  • the inventor of the present invention found in the research that due to the diversity of editing materials, the user can obtain and select a large amount of video editing material data, and the user needs to determine the editing material to be used in the current video editing process from a large amount of data. It takes a lot of time to filter, and in the network environment to obtain editing materials, it requires more user data consumption to browse and filter the editing materials. Therefore, in the prior art, the method of determining video editing material by means of mass browsing by users is inefficient, and improvement is urgently needed.
  • the embodiment of the invention provides a video editing and pushing method, system and intelligent mobile terminal capable of acquiring video editing material according to the category of the environment in which the video is edited.
  • a technical solution adopted by the embodiment of the present invention is to provide a video editing push method, which includes the following steps:
  • the acquiring at least one frame picture of the edit video according to the editing instruction, inputting the frame picture into a preset environment classification model, and acquiring the image frame image output by the environment classification model includes the following steps:
  • the classification information having the highest distribution rate among the plurality of classification information is marked as the classification information of the captured video according to the statistical result.
  • the acquiring at least one frame picture of the edit video according to the editing instruction, inputting the frame picture into a preset environment classification model, and acquiring the image frame image output by the environment classification model includes the following steps:
  • the plurality of classification information is counted, and the statistical classification directory is marked as the classification information of the edited video according to the statistical result.
  • the step of acquiring the video editing material according to the classification information to match the video editing material to the image environment of the frame picture comprises the following steps:
  • the method further includes the following steps:
  • the frame picture acquiring the same environment information is in the segment length of the edited video
  • the plurality of classification information is counted, and the statistical classification directory is marked as the classification information of the edited video according to the statistical result.
  • the step of acquiring the video editing material according to the classification information to match the video editing material to the image environment of the frame picture comprises the following steps:
  • the environment classification model is specifically a convolutional neural network model trained to a convergence state, and the convolutional neural network model is trained to classify an image environment.
  • an embodiment of the present invention further provides a video editing push system, including:
  • An obtaining module configured to acquire an editing instruction to be executed by a user
  • a processing module configured to acquire at least one frame image of the edited video according to the editing instruction, input the frame image into a preset environment classification model, and acquire, by the environment classification model, a representation of the frame image Classification information of the environment;
  • an execution module configured to acquire video editing material according to the classification information, so that the video editing material is matched to an image environment of the frame image.
  • the video editing system further includes:
  • a first acquiring submodule configured to periodically acquire a plurality of frame images of the captured video in a video capturing state
  • a first statistic sub-module configured to input the plurality of frame images into the environment classification model, acquire a plurality of classification information corresponding to the plurality of frame images, and perform statistics on the plurality of classification information;
  • the first processing submodule is configured to mark, according to the statistical result, the classification information with the highest distribution rate among the plurality of classification information as the classification information of the captured video.
  • the video editing system further includes:
  • a second acquiring submodule configured to periodically acquire a plurality of frame images of the edited video
  • a first comparison submodule configured to compare whether the environmental information of the image representation in the plurality of frame pictures is consistent
  • the second processing sub-module is configured to perform statistics on the plurality of classification information when the environmental information of the image representation in the plurality of frame images is inconsistent, and mark the statistical classification directory as the classification information of the edited video according to the statistical result.
  • the video editing system further includes:
  • a third obtaining submodule configured to separately acquire the video editing material according to the statistical classification directory
  • a first sorting sub-module configured to acquire an ingestion duration of the environmental information represented by the statistical classification catalog in the edited video, and perform a power-down arrangement on the video editing material according to the ingestion duration.
  • the video editing system further includes:
  • a third processing submodule configured to: when the environment information of the image representation in the plurality of frame pictures is inconsistent, acquire a frame picture of the same environment information in a segment length of the edited video;
  • the second statistic sub-module is configured to perform statistics on the multiple classification information, and mark the statistical classification directory as the classification information of the edited video according to the statistical result.
  • the video editing system further includes:
  • a fourth obtaining submodule configured to acquire an editing period position of the edited video
  • a first determining submodule configured to determine that the editing period position is within a duration of a certain segment of the edit video
  • a fifth acquiring submodule configured to acquire video editing material corresponding to the segment duration environment information.
  • the environment classification model is specifically a convolutional neural network model trained to a convergence state, and the convolutional neural network model is trained to classify an image environment.
  • an embodiment of the present invention further provides an intelligent mobile terminal, including:
  • One or more processors are One or more processors;
  • One or more applications wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, the one or more programs configured to perform the The video editing push method.
  • the beneficial effects of the embodiment of the present invention are: by inputting the frame picture image extracted in the edited video into the environment classification model that has been trained to obtain the classification information of the environment in which the frame picture is located, and using the classification information as a condition, Obtaining the same type of video editing material as the user can select and use, and the method can analyze the environmental factors in the frame picture and filter the massive video editing material by using the environmental factor as a filtering condition.
  • Providing the user with video editing materials suitable for the environment enables the user to obtain the video editing material with high adaptability conveniently and quickly, and improves the efficiency of pushing. At the same time, it also improves the accuracy of video editing material delivery and reduces the consumption of user data traffic.
  • FIG. 1 is a schematic flowchart of a video editing and pushing method according to an embodiment of the present invention
  • FIG. 2 is a schematic flowchart of unifying multiple classification information according to an embodiment of the present invention.
  • FIG. 3 is a schematic flowchart of processing of multiple classification information according to an embodiment of the present invention.
  • FIG. 4 is a schematic flow chart of sorting video editing materials according to an embodiment of the present invention.
  • FIG. 5 is a schematic flowchart of obtaining a duration of an environmental information segment according to an embodiment of the present invention.
  • FIG. 6 is a schematic flowchart of adjusting a push result of a video editing material according to an editing position according to an embodiment of the present invention
  • FIG. 7 is a block diagram showing a basic structure of a video editing push system according to an embodiment of the present invention.
  • FIG. 8 is a block diagram showing the basic structure of an intelligent mobile terminal according to an embodiment of the present invention.
  • the environment classification model in the present embodiment is a convolutional neural network model in deep learning technology.
  • Convolutional neural networks are mainly used to identify two-dimensional graphics of displacement, scaling, and other forms of distortion invariance. Since the feature detection layer of the convolutional neural network learns through the training data, when the convolutional neural network is used, the feature extraction of the display is avoided, and learning is implicitly performed from the training data.
  • the basic structure of a convolutional neural network consists of two layers, one of which is a feature extraction layer, the input of each neuron is connected to the local acceptance domain of the previous layer, and the local features are extracted. Once the local feature is extracted, its positional relationship with other features is also determined; the second is the feature mapping layer, each computing layer of the network is composed of multiple feature maps, and each feature map is a plane. The weights of all neurons on the plane are equal.
  • the feature mapping structure uses a small sigmoid function that affects the function kernel as the activation function of the convolutional network, so that the feature map has displacement invariance.
  • the convolutional neural network model is trained to classify the image environment, and the training process is: collecting the image as a training sample, and marking the environmental information in the image by artificial marking before training (eg, rain, fog) , sunny, sand or dim, etc.) and use this information as the desired output of the image.
  • the labeled training sample is input into the convolutional neural network model, and the input post-convolution neural network model outputs the excitation output of the training sample data, and the comparison between the expected output and the excitation output is consistent.
  • the volume is adjusted by the inverse algorithm.
  • the weight of the neural network model is used to correct the output of the convolutional neural network model.
  • the training forget-me is then input into the convolutional neural network model to obtain a new round of excitation output, and then compared, and the loop is repeated until the desired output coincides with the excitation output.
  • a large number of training samples such as 1 million pictures, are used during training.
  • the environment information is identified as an environment classification model, and the corresponding video editing material is obtained according to the recognition result.
  • FIG. 1 is a video of the embodiment. Edit the basic flow diagram of the push method.
  • the video editing push method includes the following steps:
  • the user edits the captured or locally stored video using the smart mobile terminal, and receives a click or slide instruction sent by the user through a finger or a stylus before entering the editing state. After receiving the user's click or slide command, enter the acquisition phase of the video editing material.
  • S2000 Obtain at least one frame image of the edited video according to the editing instruction, input the frame image into a preset environment classification model, and acquire a classification of the environment in which the environment classification model outputs the image of the frame image information;
  • At least one frame picture of the edited video is obtained.
  • the edited video is a video to be edited stored in a local storage space of the user's smart mobile terminal, or is a video that is being recorded and recorded through a camera of the smart mobile terminal, that is, the edited video can be already photographed or Video material that is still in the shooting state.
  • the edited video is composed of a number of frame picture images, and is randomly extracted after entering the acquisition stage of the video editing material, or the frame picture is extracted by a timing extraction method (for example, one frame picture is extracted every 1 s).
  • the acquired frame picture is input into the environment classification model to obtain classification information indicating the environment in which the frame picture is located.
  • the video editing material After obtaining the classification information capable of characterizing the environment in which the edited video is located, the video editing material is acquired according to the classification information.
  • the video editing material is a material dedicated to video or image editing.
  • the content of the editing material according to each video is given an attached description of the environment to which the material is applied (for example, by setting an index label on the material).
  • the acquisition information is used as the search key, the video editing material that can be associated with the image environment of the frame screen can be recalled.
  • an intelligent mobile terminal can be directly filtered and extracted in the local storage space when acquiring.
  • the classification information is sent to the server through the server, and the server is in the network database.
  • a search recall is performed, and the retrieved data is sent to the smart mobile terminal.
  • the video editing material fits in the image environment of the frame picture. For example, when detecting the thunderstorm weather of the frame picture environment, the user is prompted to download video editing materials such as lightning, wind and umbrella, which cooperate with the rainy environment; when the frame picture environment is detected to be bright and sunny, the sun, smile or sunshine are pushed to the user. Quotations, etc., but the video editing material is not limited to this. Depending on the application scenario, it is possible to develop editing material that matches any environmental information for the user to select.
  • the video editing material can be (not limited to): video material, text material, image material, or music material.
  • the above embodiment obtains the classification information of the environment in which the frame picture is located by inputting the frame picture image extracted from the edited video into the environment classification model that has been trained, and obtains the classification information based on the classification information.
  • this method can analyze the environmental factors in the frame picture and filter the massive video editing material with the environment factor as the filtering condition, and provide the user with suitable
  • the video editing material of the environment enables the user to obtain the video editing material with high adaptability conveniently and quickly, and improves the efficiency of pushing. At the same time, it also improves the accuracy of video editing material delivery and reduces the consumption of user data traffic.
  • FIG. 2 is a schematic flowchart of unified multi-classification information according to the embodiment.
  • step S2000 specifically includes the following steps:
  • the edited video is a video that is being recorded by the camera of the smart mobile terminal, that is, the edited video is the video material in the shooting state.
  • a frame picture image is acquired by accessing the buffer space in a video capturing state, and a plurality of frame pictures of the captured video are acquired by a method of timing extraction (eg, one frame picture is extracted every 1 s).
  • S2120 Input the plurality of frame screens into the environment classification model, acquire a plurality of classification information corresponding to the plurality of frame images, and perform statistics on the plurality of classification information;
  • a plurality of frame image images are sequentially input into the environment classification model, and a plurality of classification information corresponding to the plurality of frame images are acquired.
  • S2130 Mark the classification information with the highest distribution rate among the plurality of classification information as the classification information of the captured video according to the statistical result.
  • the classification information with the highest proportion of the statistical results is obtained.
  • the proportion of the environmental classification belonging to the snow in the classification information is five, then the snowy environment classification information is the highest proportion.
  • the classification information category is marked as the classification information of the captured video, so as to facilitate the acquisition of the video editing material when editing the captured video.
  • the editing time of the user can be effectively saved. Statistics are performed on multiple classification results to facilitate the determination of the main environmental information of the edited video.
  • the duration of editing the video is long, and the switching time of each environment is relatively long in a large time span. Therefore, when editing the video, several environmental information needs to be separately performed.
  • FIG. 3 is a schematic diagram of a processing flow of the multi-category information in the embodiment.
  • step S2000 specifically includes the following steps:
  • a plurality of frame images of the edited video are obtained by a method of timing extraction (for example, extracting one frame picture every 5 s).
  • the edit video in the present embodiment can be video material that has been photographed or is still in a photographing state.
  • S2220 Align whether the environmental information represented by the image in the plurality of frame pictures is consistent
  • a plurality of frame image images are sequentially input into the environment classification model, and a plurality of classification information corresponding to the plurality of frame images are acquired.
  • the combined statistics are performed according to the type of the classification information.
  • the statistics are classified according to the type of the classification information, and the statistical classification directory is the classification type of the edited video. After the statistics are completed, the statistical classification directory is marked as the editing video. Classification information.
  • FIG. 4 is a schematic flowchart of sorting video editing materials according to an embodiment.
  • step S3000 specifically includes the following steps:
  • the video editing materials are respectively obtained. For example, there are four scenes in the environment for editing the video in the editing video, and the environment classification information of the corresponding editing video is also four categories, and four sets of video editing materials are respectively obtained through four categories.
  • the number of limited materials in each set of materials is 3, and 12 pieces of video editing material are obtained according to the classified information of the edited video.
  • S3120 Acquire an ingestion duration of the environment information represented by the statistical classification directory in the edited video, and perform a power-down arrangement on the video editing material according to the ingestion duration.
  • the ingestion time of the environment information represented by the classification directory in the editing video is obtained. Since the extraction of the frame picture is in accordance with the timing extraction method, when the environment information in the editing video changes, the statistics are obtained before and after the statistics. Inconsistent time between classification information, you can get the length of time an environment is in the edit video.
  • the time length of the environmental information represented by each category catalog in the edited video is counted, and the video editing material is laid down according to the length of the ingestion.
  • the environment classification information of the corresponding editing video is also four categories, and four sets of video editing materials are respectively obtained through four categories, assuming that the number of materials in each group is limited. For 3 pieces, 12 pieces of video editing material are obtained according to the classified information of the edited video.
  • the environment of the first classification directory is 20s in the edited video
  • the environment of the second category is 50s in the edited video
  • the environment of the third category is edited.
  • the duration of the ingestion in the video is 30s
  • the environment characterized by the fourth category catalogue is 15s in the edit video.
  • the video editing materials are sorted. Due to the long environmental information, the area that can be selected as the editing area and the editing action actually has a high probability, so according to the intake time
  • the video editing materials are arranged in a power-down manner to achieve accurate push and improve editing efficiency.
  • FIG. 5 is a schematic flowchart of obtaining a duration of an environmental information segment according to an embodiment.
  • step S3000 the following steps are further included:
  • the frame picture acquiring the same environment information is in the segment length of the edited video. Specifically, after the video editing material is acquired, the length of the segment of the environment information represented by the classification directory in the edited video is obtained. Since the extraction of the frame image is performed according to the timing, when the environment information in the edited video changes, the statistics are obtained before and after the statistics.
  • the time between two inconsistent classification information intervals you can get the length of the clip in an environment in the edit video. For example, two inconsistent classification information before and after detection occur at 1:20s and 1:50s, respectively, which proves that the segmentation time of the environmental information represented by the previous classification information in the edited video is between 1:20s-1:50s. .
  • S3220 Perform statistics on the multiple classification information, and mark the statistical classification directory as the classification information of the edited video according to the statistical result.
  • the statistics are classified according to the type of the classification information, and the statistical classification directory is the classification type of the edited video. After the statistics are completed, the statistical classification directory is marked as the editing video. Classification information.
  • FIG. 6 is a schematic flowchart of adjusting a push result of a video editing material according to an editing position according to an embodiment.
  • step S3000 specifically includes the following steps:
  • the time at which the editing position occurs is compared with the length of the segment obtained by the duration of the environmental information, and it is determined that the time at which the editing position occurs is within the length of the segment in which the environmental information of the editing video continues to be ingested. If the editing time occurs at 1:25s, and the environment information of 1:10s-1:40s in the editing video is "when snowing", it is determined that the editing time occurs in the "snowing" environment.
  • FIG. 7 is a basic structural block diagram of a video editing and pushing system according to this embodiment.
  • a video editing push system includes an acquisition module 2100, a processing module 2200, and an execution module 2300.
  • the obtaining module 2100 is configured to acquire an editing instruction to be executed by the user;
  • the processing module 2200 is configured to acquire at least one frame image of the editing video according to the editing instruction, input the frame image into a preset environment classification model, and acquire an environment classification model.
  • the output classification information indicating the environment in which the frame picture is located;
  • the execution module 2300 is configured to acquire the video editing material according to the classification information, so that the video editing material is matched to the image environment of the frame picture.
  • the frame image extracted in the edited video is input into the environment classification model that has been trained, and the environment classification information of the frame image is obtained, and the environment classification information is used as a condition, and the The classification information is matched or the same type of video editing material is selected for the user to use.
  • This method can analyze the environmental factors in the frame picture and filter the massive editing materials with the environment factor as the filtering condition to provide suitable for the user.
  • the video editing material in the environment enables the user to quickly and easily obtain the editing material with higher adaptability, thereby improving the efficiency of pushing. At the same time, it also improves the accuracy of editing materials and reduces the consumption of user data traffic.
  • the video editing system further includes: a first acquisition sub-module, a first statistical sub-module, and a first processing sub-module.
  • the first obtaining sub-module is configured to periodically acquire a plurality of frame images of the captured video in a video capturing state;
  • the first statistical sub-module is configured to separately input the multiple frame images into the environment classification model, and obtain corresponding multiple frame images.
  • the plurality of classification information is used to perform statistics on the plurality of classification information.
  • the first processing sub-module is configured to mark, according to the statistical result, the classification information with the highest distribution rate among the plurality of classification information as the classification information of the captured video.
  • the video editing system further includes: a second acquisition sub-module, a first comparison sub-module, and a second processing sub-module.
  • the second obtaining sub-module is configured to periodically acquire a plurality of frame images of the edited video;
  • the first comparing sub-module is configured to compare whether the environment information represented by the image in the multiple frame images is consistent;
  • the second processing sub-module is used to When the environmental information of the image representation in the plurality of frame pictures is inconsistent, the plurality of classification information is counted, and the statistical classification directory is marked as the classification information of the edited video according to the statistical result.
  • the video editing system further includes: a third acquisition sub-module and a first sequencing sub-module.
  • the third obtaining sub-module is configured to respectively acquire video editing materials according to the statistical classification directory;
  • the first sorting sub-module is configured to obtain the ingestion duration of the environmental information represented by the statistical classification directory in the editing video, and compare the video according to the ingestion duration. Edit the material for a power-down arrangement.
  • the video editing system further includes: a third processing sub-module and a second statistical sub-module.
  • the third processing sub-module is configured to: when the environment information represented by the image in the multiple frame images is inconsistent, the frame image of acquiring the same environment information is used to edit the segment length of the video; the second statistical sub-module is configured to perform the plurality of classification information. Statistics, according to the statistical results, the statistical classification directory is marked as the classification information of the edited video.
  • the video editing system further includes: a fourth obtaining submodule, a first determining submodule, and a fifth obtaining submodule, wherein the fourth obtaining submodule is configured to acquire an editing period position of the edited video; The sub-module is configured to determine that the editing period position is within a certain segment duration of the editing video; and the fifth obtaining sub-module is configured to acquire the video editing material corresponding to the segment duration environment information.
  • the environmental classification model is specifically a convolutional neural network model trained to a converged state, and the convolutional neural network model is trained to classify the image environment.
  • FIG. 8 is a schematic diagram of a basic structure of an intelligent mobile terminal according to an embodiment of the present disclosure.
  • all the programs in the video editing and pushing method in the embodiment are stored in the memory 1520 of the smart mobile terminal, and the processor 1580 can call the program in the memory 1520 to execute the video editing and pushing. All the features listed in the method.
  • the function of the smart mobile terminal is described in detail in the video editing and pushing method in this embodiment, and details are not described herein.
  • the time axis representing the duration of the template video is overlaid on the progress bar of the editing video, and the template video can be intuitively obtained in the editing video by observing the relative positional relationship between the time axis and the progress bar.
  • the addition of the location, the editing area is reduced, and the area occupied by the editing area is reduced.
  • the user can add the position of the template video in the edited video, and the simplified edited area provides a sufficient space for the design of the timeline container, thus facilitating user adjustment. Editing, reducing the difficulty of editing control, improving the accuracy of editing and the success rate of operation.
  • the embodiment of the present invention further provides an intelligent mobile terminal.
  • the terminal may be any terminal device including a smart mobile terminal, a tablet computer, a PDA (Personal Digital Assistant), a POS (Point of Sales), an in-vehicle computer, and the terminal is an intelligent mobile terminal as an example:
  • FIG. 8 is a block diagram showing a partial structure of an intelligent mobile terminal related to a terminal provided by an embodiment of the present invention.
  • the smart mobile terminal includes: a radio frequency (RF) circuit 1510 , a memory 1520 , an input unit 1530 , a display unit 1540 , a sensor 1550 , an audio circuit 1560 , and a wireless fidelity (Wi-Fi) module 1570 . , processor 1580, and power supply 1590 and other components.
  • RF radio frequency
  • the smart mobile terminal structure shown in FIG. 8 does not constitute a limitation on the smart mobile terminal, and may include more or less components than those illustrated, or combine some components or different components. Arrangement.
  • the RF circuit 1510 can be used for receiving and transmitting signals during the transmission or reception of information or during a call. Specifically, after receiving the downlink information of the base station, the processing is processed by the processor 1580. In addition, the data designed for the uplink is sent to the base station.
  • RF circuitry 1510 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a Low Noise Amplifier (LNA), a duplexer, and the like. In addition, RF circuitry 1510 can also communicate with the network and other devices via wireless communication.
  • the above wireless communication may use any communication standard or protocol, including but not limited to Global System of Mobile communication (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (Code Division). Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), E-mail, Short Messaging Service (SMS), and the like.
  • GSM Global System of Mobile communication
  • GPRS General Packet Radio Service
  • CDMA Code Division Multiple Access
  • WCDMA Wideband Code Division Multiple Access
  • LTE Long Term Evolution
  • E-mail Short Messaging Service
  • the memory 1520 can be used to store software programs and modules, and the processor 1580 executes various functional applications and data processing of the smart mobile terminal by running software programs and modules stored in the memory 1520.
  • the memory 1520 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application required for at least one function (such as a voiceprint playing function, an image playing function, etc.), and the like; the storage data area may be stored. Data created according to the use of the smart mobile terminal (such as audio data, phone book, etc.).
  • memory 1520 can include high speed random access memory, and can also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device.
  • the input unit 1530 can be configured to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the smart mobile terminal.
  • the input unit 1530 may include a touch panel 1531 and other input devices 1532.
  • the touch panel 1531 also referred to as a touch screen, can collect touch operations on or near the user (such as the user using a finger, a stylus, or the like on the touch panel 1531 or near the touch panel 1531. Operation), and drive the corresponding connecting device according to a preset program.
  • the touch panel 1531 may include two parts: a touch detection device and a touch controller.
  • the touch detection device detects the touch orientation of the user, and detects a signal brought by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts the touch information into contact coordinates, and sends the touch information.
  • the processor 1580 is provided and can receive commands from the processor 1580 and execute them.
  • the touch panel 1531 can be implemented in various types such as resistive, capacitive, infrared, and surface acoustic waves.
  • the input unit 1530 may also include other input devices 1532.
  • other input devices 1532 may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control buttons, switch buttons, etc.), trackballs, mice, joysticks, and the like.
  • the display unit 1540 can be used to display information input by the user or information provided to the user as well as various menus of the smart mobile terminal.
  • the display unit 1540 can include a display panel 1541.
  • the display panel 1541 can be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED), or the like.
  • the touch panel 1531 may cover the display panel 1541. After the touch panel 1531 detects a touch operation on or near the touch panel 1531, the touch panel 1531 transmits to the processor 1580 to determine the type of the touch event, and then the processor 1580 according to the touch event. The type provides a corresponding visual output on display panel 1541.
  • the touch panel 1531 and the display panel 1541 are two independent components to implement the input and input functions of the smart mobile terminal, in some embodiments, the touch panel 1531 and the display panel 1541 may be Integrate to realize the input and output functions of intelligent mobile terminals.
  • the smart mobile terminal may also include at least one type of sensor 1550, such as a light sensor, motion sensor, and other sensors.
  • the light sensor may include an ambient light sensor and a proximity sensor, wherein the ambient light sensor may adjust the brightness of the display panel 1541 according to the brightness of the ambient light, and the proximity sensor may close the display panel 1541 when the smart mobile terminal moves to the ear. And / or backlight.
  • the accelerometer sensor can detect the magnitude of acceleration in all directions (usually three axes). When it is stationary, it can detect the magnitude and direction of gravity. It can be used to identify the posture of smart mobile terminals (such as horizontal and vertical screen switching).
  • An audio circuit 1560, a speaker 1561, and a microphone 1562 can provide an audio interface between the user and the smart mobile terminal.
  • the audio circuit 1560 can transmit the converted electrical data of the received audio data to the speaker 1561, and convert it into a voiceprint signal output by the speaker 1561.
  • the microphone 1562 converts the collected voiceprint signal into an electrical signal by the audio.
  • Circuit 1560 is converted to audio data upon receipt, processed by audio data output processor 1580, transmitted via RF circuitry 1510 to, for example, another smart mobile terminal, or output audio data to memory 1520 for further processing.
  • Wi-Fi is a short-range wireless transmission technology.
  • the smart mobile terminal can help users to send and receive emails, browse web pages and access streaming media through the Wi-Fi module 1570. It provides users with wireless broadband Internet access.
  • FIG. 8 shows the Wi-Fi module 1570, it can be understood that it does not belong to the essential configuration of the smart mobile terminal, and can be omitted as needed within the scope of not changing the essence of the invention.
  • the processor 1580 is a control center of the smart mobile terminal that connects various portions of the entire smart mobile terminal using various interfaces and lines, by running or executing software programs and/or modules stored in the memory 1520, and by calling them stored in the memory 1520.
  • the processor 1580 may include one or more processing units; preferably, the processor 1580 may integrate an application processor and a modem processor, where the application processor mainly processes an operating system, a user interface, an application, and the like.
  • the modem processor primarily handles wireless communications. It will be appreciated that the above described modem processor may also not be integrated into the processor 1580.
  • the smart mobile terminal also includes a power source 1590 (such as a battery) for supplying power to various components.
  • a power source 1590 such as a battery
  • the power source can be logically connected to the processor 1580 through a power management system to manage functions such as charging, discharging, and power management through the power management system. .
  • the smart mobile terminal may further include a camera, a Bluetooth module, and the like, and details are not described herein again.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Signal Processing (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Human Computer Interaction (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

本发明实施例公开了一种视频编辑推送方法、系统及智能移动终端,包括下述步骤:获取用户待执行的编辑指令;根据所述编辑指令获取编辑视频的至少一个帧画面,将所述帧画面输入到预设的环境分类模型中,并获取所述环境分类模型输出的表征所述帧画面所处环境的分类信息;根据所述分类信息获取视频编辑素材,以使所述视频编辑素材配合于所述帧画面的图像环境。通过将编辑视频中抽取的帧画面图像,输入到已经训练完成的环境分类模型中,获取该帧画面所处环境的分类信息,并以该分类信息为条件,获取与该分类信息相配合或者相同类型的视频编辑素材,供用户选择使用。

Description

视频编辑推送方法、系统及智能移动终端 技术领域
本发明实施例涉及直播领域,尤其是一种视频编辑推送方法、系统及智能移动终端。
背景技术
视频编辑是指先用摄影机摄录下预期的影像,再在电脑上用视频编辑软件将影像制作成碟片的编辑过程。但是随着智能移动终端的处理能力越来越好,即时视频编辑成为发展的需求,通过智能移动终端对拍摄的短视频进行编辑成为新的需求。
现有技术中,用户使用移动终端进行视频编辑时,需要在本地存储空间或从网络服务器下载相关的视频编辑素材,用户通过素材的缩略图或者动态展示窗口浏览编辑素材,选定编辑素材后通过调取该编辑素材来完成视频素材的添加。
但是,本发明创造的发明人在研究中发现,由于编辑素材的多样性,用户能够获取和选择的视频编辑素材数据很多,用户需要从大量数据中确定当前视频编辑过程中所要使用的编辑素材,需要耗费较多的时间进行筛选,在网络环境下获取编辑素材,则更需要用户耗费较多的数据流量对编辑素材进行浏览筛选。因此,现有技术中,通过用户海量浏览的方式确定视频编辑素材的方法效率低下,急需改进。
发明内容
本发明实施例提供一种能够根据编辑视频所处环境的类别进行视频编辑素材获取的视频编辑推送方法、系统及智能移动终端。
为解决上述技术问题,本发明创造的实施例采用的一个技术方案是:提供一种视频编辑推送方法,包括下述步骤:
获取用户待执行的编辑指令;
根据所述编辑指令获取编辑视频的至少一个帧画面,将所述帧画面输入到预设的环境分类模型中,并获取所述环境分类模型输出的表征所述帧画面所处环境的分类信息;
根据所述分类信息获取视频编辑素材,以使所述视频编辑素材配合于所述帧画面的图像环境。
可选地,所述根据所述编辑指令获取编辑视频的至少一个帧画面,将所述帧画面输入到预设的环境分类模型中,并获取所述环境分类模型输出的表征所述帧画面所处 环境的分类信息的步骤,具体包括下述步骤:
在视频拍摄状态下定时获取拍摄视频的多个帧画面;
将所述多个帧画面分别输入到所述环境分类模型中,获取所述多个帧画面对应的多个分类信息,并对所述多个分类信息进行统计;
根据统计结果将所述多个分类信息中分布率最高的分类信息标记为所述拍摄视频的分类信息。
可选地,所述根据所述编辑指令获取编辑视频的至少一个帧画面,将所述帧画面输入到预设的环境分类模型中,并获取所述环境分类模型输出的表征所述帧画面所处环境的分类信息的步骤,具体包括下述步骤:
定时获取所述编辑视频的多个帧画面;
比对所述多个帧画面中图像表征的环境信息是否一致;
当所述多个帧画面中图像表征的环境信息不一致时,对多个分类信息进行统计,根据统计结果将统计分类目录标记为所述编辑视频的分类信息。
可选地,所述根据所述分类信息获取视频编辑素材,以使所述视频编辑素材配合于所述帧画面的图像环境的步骤,具体包括下述步骤:
根据所述统计分类目录分别获取所述视频编辑素材;
获取所述统计分类目录表征的环境信息在所述编辑视频中的摄入时长,并根据所述摄入时长对所述视频编辑素材进行降幂排布。
可选地,所述比对所述多个帧画面中图像表征的环境信息是否一致的步骤之后,还包括下述步骤:
当所述多个帧画面中图像表征的环境信息不一致时,获取相同环境信息的帧画面在所述编辑视频的片段时长;
对所述多个分类信息进行统计,根据统计结果将统计分类目录标记为所述编辑视频的分类信息。
可选地,所述根据所述分类信息获取视频编辑素材,以使所述视频编辑素材配合于所述帧画面的图像环境的步骤,具体包括下述步骤:
获取所述编辑视频的编辑时段位置;
确定所述编辑时段位置位于所述编辑视频的某一所述片段时长内;
获取与所述片段时长环境信息相对应的视频编辑素材。
可选地,所述环境分类模型具体为训练至收敛状态的卷积神经网络模型,所述卷积神经网络模型被训练用于对图像环境进行分类。
为解决上述技术问题,本发明实施例还提供一种视频编辑推送系统,包括:
获取模块,用于获取用户待执行的编辑指令;
处理模块,用于根据所述编辑指令获取编辑视频的至少一个帧画面,将所述帧画面输入到预设的环境分类模型中,并获取所述环境分类模型输出的表征所述帧画面所处环境的分类信息;
执行模块,用于根据所述分类信息获取视频编辑素材,以使所述视频编辑素材配合于所述帧画面的图像环境。
可选地,所述视频编辑系统还包括:
第一获取子模块,用于在视频拍摄状态下定时获取拍摄视频的多个帧画面;
第一统计子模块,用于将所述多个帧画面分别输入到所述环境分类模型中,获取所述多个帧画面对应的多个分类信息,并对所述多个分类信息进行统计;
第一处理子模块,用于根据统计结果将所述多个分类信息中分布率最高的分类信息标记为所述拍摄视频的分类信息。
可选地,所述视频编辑系统还包括:
第二获取子模块,用于定时获取所述编辑视频的多个帧画面;
第一比对子模块,用于比对所述多个帧画面中图像表征的环境信息是否一致;
第二处理子模块,用于当所述多个帧画面中图像表征的环境信息不一致时,对多个分类信息进行统计,根据统计结果将统计分类目录标记为所述编辑视频的分类信息。
可选地,所述视频编辑系统还包括:
第三获取子模块,用于根据所述统计分类目录分别获取所述视频编辑素材;
第一排序子模块,用于获取所述统计分类目录表征的环境信息在所述编辑视频中的摄入时长,并根据所述摄入时长对所述视频编辑素材进行降幂排布。
可选地,所述视频编辑系统还包括:
第三处理子模块,用于当所述多个帧画面中图像表征的环境信息不一致时,获取相同环境信息的帧画面在所述编辑视频的片段时长;
第二统计子模块,用于对所述多个分类信息进行统计,根据统计结果将统计分类目录标记为所述编辑视频的分类信息。
可选地,所述视频编辑系统还包括:
第四获取子模块,用于获取所述编辑视频的编辑时段位置;
第一确定子模块,用于确定所述编辑时段位置位于所述编辑视频的某一所述片段时长内;
第五获取子模块,用于获取与所述片段时长环境信息相对应的视频编辑素材。
可选地,所述环境分类模型具体为训练至收敛状态的卷积神经网络模型,所述卷积神经网络模型被训练用于对图像环境进行分类。
为解决上述技术问题,本发明实施例还提供一种智能移动终端,包括:
一个或多个处理器;
存储器;
一个或多个应用程序,其中所述一个或多个应用程序被存储在所述存储器中并被配置为由所述一个或多个处理器执行,所述一个或多个程序配置用于执行上述所述的视频编辑推送方法。
本发明实施例的有益效果是:通过将编辑视频中抽取的帧画面图像,输入到已经训练完成的环境分类模型中,获取该帧画面所处环境的分类信息,并以该分类信息为条件,获取与该分类信息相配合或者相同类型的视频编辑素材,供用户选择使用,采用该方法能够通过分析帧画面中环境因素,并以该环境因素为过滤条件,对海量视频编辑素材进行初步筛选,提供给用户适合于在该环境的视频编辑素材,能够使用户方便快捷的获得适配度较高的视频编辑素材,提高了推送的效率。同时,也提高了视频编辑素材的投放精准度,减少用户数据流量的消耗。
附图说明
为了更清楚地说明本发明实施例中的技术方案,下面将对实施例描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本发明的一些实施例,对于本领域技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。
图1为本发明实施例视频编辑推送方法的基本流程示意图;
图2为本发明实施例统一多个分类信息的一种流程示意图;
图3为本发明实施例多分类信息的一种处理流程示意图;
图4为本发明实施例对视频编辑素材进行排序的一种流程示意图;
图5为本发明实施例获取环境信息片段时长的流程示意图;
图6为本发明实施例根据编辑位置调整视频编辑素材推送结果的一种流程示意图;
图7为本发明实施例视频编辑推送系统基本结构框图;
图8为本发明实施例智能移动终端基本结构框图。
Detailed Description of the Embodiments

To enable a person skilled in the art to better understand the solutions of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings of the embodiments of the present invention.

Some of the flows described in the specification, claims and accompanying drawings of the present invention include a plurality of operations appearing in a specific order, but it should be clearly understood that these operations may be performed out of the order in which they appear herein, or performed in parallel. The operation numbers, such as 101 and 102, are only used to distinguish different operations; the numbers themselves do not imply any execution order. In addition, these flows may include more or fewer operations, and these operations may be performed in order or in parallel. It should be noted that the terms "first", "second" and the like herein are used to distinguish different messages, devices, modules and so on; they do not imply any sequence, nor do they require that "first" and "second" be of different types.

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings of the embodiments of the present invention. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.

Embodiment
It should be noted that the environment classification model in this embodiment is a convolutional neural network model from deep learning technology.

A convolutional neural network is mainly used to recognize two-dimensional patterns that are invariant to displacement, scaling and other forms of distortion. Since the feature detection layers of a convolutional neural network learn from training data, explicit feature extraction is avoided when the network is used; instead, features are learned implicitly from the training data.

The basic structure of a convolutional neural network includes two kinds of layers. The first is the feature extraction layer: the input of each neuron is connected to a local receptive field of the previous layer, and the local feature is extracted; once the local feature has been extracted, its positional relationship to other features is also determined. The second is the feature mapping layer: each computational layer of the network consists of multiple feature maps, each feature map is a plane, and all neurons in the plane share the same weights. The feature mapping structure uses a sigmoid function with a small influence-function kernel as the activation function of the convolutional network, so that the feature maps are shift-invariant.

In this embodiment, the convolutional neural network model is trained to classify image environments. The training process is as follows: images are collected as training samples and, before training, are manually annotated with the environment information in the image (for example rain, fog, clear sky, dust storm or dim light), and this information is taken as the expected output of the image. The annotated training samples are input into the convolutional neural network model, which then produces the excitation output for the training sample data. The expected output is compared with the excitation output; if they are inconsistent, the weights of the convolutional neural network model are adjusted by the back-propagation algorithm to correct the output of the model. The training sample is then input into the convolutional neural network model again to obtain a new excitation output, which is compared once more, and this loop is repeated until the expected output and the excitation output are consistent. Training uses a large number of training samples, for example one million images; when the convolutional neural network model has been trained to convergence, its recognition success rate is extremely high and it can quickly recognize the environment information in an image to be recognized.
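The disclosure does not fix a framework, network topology or label set; the following is a minimal sketch of the training loop described above, assuming PyTorch, a ResNet-18 backbone chosen arbitrarily for illustration, and a small example label set (rain, fog, clear, dust, dim). The class names and helper names are assumptions, not part of the original disclosure.

```python
import torch
import torch.nn as nn
from torchvision import models

# Hypothetical label set; the text only gives examples such as rain, fog, clear sky, dust storm, dim light.
ENV_CLASSES = ["rain", "fog", "clear", "dust", "dim"]

def build_env_classifier(num_classes=len(ENV_CLASSES)):
    # Any CNN backbone would do; ResNet-18 is an arbitrary choice for the sketch.
    model = models.resnet18(weights=None)
    model.fc = nn.Linear(model.fc.in_features, num_classes)
    return model

def train(model, loader, epochs=10, lr=1e-3, device="cpu"):
    """Compare the expected output with the model output and back-propagate until convergence."""
    model.to(device).train()
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    for _ in range(epochs):
        for images, labels in loader:  # labels are the manually annotated environment categories
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)  # mismatch between expected and excitation output
            loss.backward()                          # adjust weights via back-propagation
            optimizer.step()
    return model
```

In practice the loop would stop on a validation criterion rather than a fixed epoch count; the fixed count here only keeps the sketch short.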
After the convolutional neural network model has been trained to convergence, it serves as the environment classification model for recognizing the environment information in the video being edited, and the corresponding video editing material is acquired according to the recognition result. For details, please refer to FIG. 1, which is a schematic flowchart of the basic flow of the video editing push method of this embodiment.

As shown in FIG. 1, the video editing push method includes the following steps:

S1000: acquiring an editing instruction to be executed by a user.

When the user uses the intelligent mobile terminal to edit a video that has been shot or is stored locally, before the editing state is entered, a tap or swipe instruction sent by the user with a finger or a stylus is received. After the user's tap or swipe instruction is received, the video editing material acquisition stage is entered.
S2000: acquiring, according to the editing instruction, at least one frame picture of the video being edited, inputting the frame picture into a preset environment classification model, and acquiring classification information output by the environment classification model and representing the environment in which the frame picture is located.

After the video editing material acquisition stage is entered according to the user instruction, at least one frame picture of the video being edited is acquired.

Specifically, in this embodiment the video being edited is a to-be-edited video stored in the local storage space of the user's intelligent mobile terminal, or a video that is currently being shot and recorded through the camera of the intelligent mobile terminal; that is, the video being edited may be video material that has already been shot or is still being shot.

The video being edited consists of a number of frame picture images. After the video editing material acquisition stage is entered, frame pictures are extracted either randomly or at regular intervals (for example, one frame picture every 1 s).

The acquired frame picture is input into the environment classification model to obtain the classification information representing the environment in which the frame picture is located.
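As a concrete illustration of the timed extraction and classification just described, the sketch below samples one frame per second with OpenCV and runs each sample through the classifier from the previous sketch. The 1 s interval matches the example in the text; the preprocessing pipeline and helper names are illustrative assumptions.

```python
import cv2
import torch
from torchvision import transforms

# Preprocessing is an assumption; the disclosure does not specify input size or normalization.
preprocess = transforms.Compose([
    transforms.ToPILImage(),
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

def classify_sampled_frames(video_path, model, interval_s=1.0):
    """Extract one frame every `interval_s` seconds and return the environment label of each sample."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0
    step = max(int(fps * interval_s), 1)
    labels, index = [], 0
    model.eval()
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % step == 0:
            rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            with torch.no_grad():
                logits = model(preprocess(rgb).unsqueeze(0))
            labels.append(ENV_CLASSES[int(logits.argmax())])
        index += 1
    cap.release()
    return labels  # e.g. ["rain", "rain", "clear", ...]
```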
S3000: acquiring video editing material according to the classification information, so that the video editing material matches the image environment of the frame picture.

After the classification information representing the environment of the video being edited is obtained, video editing material is acquired according to the classification information. Video editing material is material specifically used for editing videos or pictures; when it is produced, each piece of video editing material is given, according to its content, an auxiliary description of the environments it is suitable for (for example, by setting an index tag on the material). When material is acquired, the classification information is used as the retrieval keyword, so that the video editing material associated with it and able to match the image environment of the frame picture can be recalled.

The intelligent mobile terminal can acquire the video editing material in several ways: first, by filtering and extracting directly in the local storage space; second, by sending the classification information to the server side, where the material is retrieved and recalled from a network database and the retrieved data is then sent to the intelligent mobile terminal.

The video editing material matches the image environment of the frame picture. For example, when the frame picture environment is detected to be thundery and rainy, video editing material that matches a rainy environment, such as lightning, strong wind and umbrellas, is pushed to the user; when the frame picture environment is detected to be bright and clear, the sun, smiling faces, cheerful quotations and the like are pushed to the user. The video editing material is not limited to these examples; depending on the application scenario, editing material matching any environment information can be developed for the user to select and use.

The video editing material can be (but is not limited to): video material, text material, image material or music material.
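A minimal sketch of the tag-based recall on the local path, assuming the materials are described by a list of records with an `env_tags` field; the record structure, field names and the example library contents are illustrative assumptions, and the server-side path would issue the same classification keyword as a query instead.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class EditingMaterial:
    name: str
    kind: str                                            # "video", "text", "image" or "music"
    env_tags: List[str] = field(default_factory=list)    # environments the material is suited to

# Hypothetical local library; real materials would carry index tags assigned at production time.
LOCAL_LIBRARY = [
    EditingMaterial("lightning_overlay", "video", ["rain"]),
    EditingMaterial("umbrella_sticker", "image", ["rain"]),
    EditingMaterial("sunny_quote", "text", ["clear"]),
]

def recall_materials(classification, library=LOCAL_LIBRARY):
    """Use the classification information as the retrieval key against the index tags."""
    return [m for m in library if classification in m.env_tags]

# recall_materials("rain") -> [lightning_overlay, umbrella_sticker]
```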
In the above embodiment, a frame picture image extracted from the video being edited is input into a trained environment classification model to obtain the classification information of the environment in which the frame picture is located, and, with this classification information as a condition, video editing material that matches or is of the same type as the classification information is acquired for the user to select and use. With this method, the environmental factor in the frame picture is analyzed and used as a filtering condition to pre-screen a massive amount of video editing material, and the user is provided with video editing material suited to that environment, so that the user can conveniently and quickly obtain well-matched video editing material, which improves push efficiency. Meanwhile, it also improves the targeting accuracy of video editing material delivery and reduces the consumption of the user's data traffic.

In some embodiments, a plurality of frame picture images of the video being edited are acquired during frame extraction. When the environment scene changes in the video being edited and the environment information of the acquired frame pictures therefore differs, the classification results of the extracted frame pictures need to be unified. For details, please refer to FIG. 2, which is a schematic flowchart of unifying a plurality of pieces of classification information in this embodiment.

As shown in FIG. 2, step S2000 specifically includes the following steps:
S2110: acquiring, at regular intervals, a plurality of frame pictures of the video being shot while in the video shooting state.

In this embodiment, the video being edited is a video currently being shot and recorded through the camera of the intelligent mobile terminal; that is, the video being edited is video material in the shooting state.

Frame picture images are acquired by accessing the cache space while in the video shooting state, and a plurality of frame pictures of the video being shot are acquired by timed extraction (for example, one frame picture every 1 s).

S2120: inputting the plurality of frame pictures into the environment classification model respectively, acquiring a plurality of pieces of classification information corresponding to the plurality of frame pictures, and compiling statistics on the plurality of pieces of classification information.

In the order of their acquisition, the plurality of frame picture images are input one by one into the environment classification model to obtain the plurality of pieces of classification information corresponding to the plurality of frame pictures.

After the plurality of pieces of classification information are obtained, statistics are compiled on the plurality of frame pictures according to the kinds of classification information.

S2130: marking, according to the statistical result, the classification information with the highest distribution rate among the plurality of pieces of classification information as the classification information of the video being shot.

After the statistics are compiled, the kind of classification information with the highest proportion in the statistical result is obtained (for example, if among ten frame pictures the "snowing" environment classification accounts for five, the "snowing" environment classification information is the kind with the highest proportion), and that classification information is marked as the classification information of the video being shot, so that the video editing material can be conveniently acquired when the shot video is edited.
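Selecting the classification with the highest distribution rate amounts to a majority vote over the per-frame labels. The sketch below expresses it with `collections.Counter`, reusing the label list returned by the sampling sketch above; the function name is an assumption.

```python
from collections import Counter

def dominant_classification(frame_labels):
    """Return the classification with the highest proportion among the sampled frames."""
    if not frame_labels:
        return None
    counts = Counter(frame_labels)        # e.g. Counter({"snow": 5, "clear": 3, "rain": 2})
    label, _ = counts.most_common(1)[0]
    return label

# dominant_classification(["snow"] * 5 + ["clear"] * 3 + ["rain"] * 2) -> "snow"
```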
In this embodiment, classifying the environment information of the video while it is still being shot effectively saves the user's editing time, and compiling statistics on the plurality of classification results makes it convenient to determine the main environment information of the video being edited.

In some embodiments, the video being edited is relatively long, and within such a large time span each environment also persists for a relatively long time before switching. Therefore, during video editing, the segments belonging to the several kinds of environment information need to be edited separately. For details, please refer to FIG. 3, which is a schematic flowchart of processing a plurality of pieces of classification information in this embodiment.

As shown in FIG. 3, step S2000 specifically includes the following steps:

S2210: acquiring a plurality of frame pictures of the video being edited at regular intervals.

A plurality of frame pictures of the video being edited are acquired by timed extraction (for example, one frame picture every 5 s). In this embodiment, the video being edited may be video material that has already been shot or is still being shot.

S2220: comparing whether the environment information represented by the images in the plurality of frame pictures is consistent.

In the order of their acquisition, the plurality of frame picture images are input one by one into the environment classification model to obtain the plurality of pieces of classification information corresponding to the plurality of frame pictures.

After the plurality of pieces of classification information are obtained, they are compared for consistency; when they are inconsistent, they are merged and counted according to the kinds of classification information.

S2230: when the environment information represented by the images in the plurality of frame pictures is inconsistent, compiling statistics on the plurality of pieces of classification information, and marking, according to the statistical result, the statistical classification catalog as the classification information of the video being edited.

When, according to the comparison result, the images in the plurality of frame pictures represent more than one kind of environment information, statistics are compiled according to the kinds of classification information; the statistical classification catalog is then the set of classification categories of the video being edited, and after the statistics are compiled the statistical classification catalog is marked as the classification information of the video being edited.
In some embodiments, when the video being edited has a plurality of pieces of classification information and the classification catalog obtained from the statistics is used as the classification information of the video being edited, the video editing material corresponding to the plurality of pieces of classification information needs to be sorted after recall and then presented. For details, please refer to FIG. 4, which is a schematic flowchart of sorting the video editing material in this embodiment.

As shown in FIG. 4, step S3000 specifically includes the following steps:

S3110: acquiring the video editing material respectively according to the statistical classification catalog.

The video editing material is acquired separately according to the classification catalog. For example, if the environment in the video being edited switches between four scenes in total, the environment classification information of the video being edited likewise consists of four catalog entries, and four groups of video editing material are acquired through the four catalog entries. Assuming each group is limited to 3 items of material, twelve items of video editing material are obtained in total according to the classification information of the video being edited.

S3120: acquiring the recorded duration, in the video being edited, of the environment information represented by the statistical classification catalog, and arranging the video editing material in descending order according to the recorded duration.

After the video editing material is acquired, the recorded duration, in the video being edited, of the environment information represented by each catalog entry is obtained. Since frame pictures are extracted at regular intervals, when the environment information in the video being edited changes, the recorded duration of a given environment in the video being edited can be obtained by measuring the time interval between two successive inconsistent pieces of classification information.

The recorded duration in the video being edited of the environment information represented by each catalog entry is thus calculated, and the video editing material is arranged in descending order according to the recorded duration.

For example, the environment in the video being edited switches between four scenes in total, so the environment classification information of the video being edited likewise consists of four catalog entries, and four groups of video editing material are acquired through the four catalog entries. Assuming each group is limited to 3 items of material, twelve items of video editing material are obtained in total according to the classification information of the video being edited.

According to the duration statistics, the environment represented by the first catalog entry is recorded for 20 s in the video being edited, the environment represented by the second catalog entry for 50 s, the environment represented by the third catalog entry for 30 s, and the environment represented by the fourth catalog entry for 15 s. When sorting, the 3 items of video editing material matching the second catalog entry (the longest recorded duration) are therefore displayed in the first three positions, and so on, with the 3 items of video editing material matching the fourth catalog entry placed last.

By sorting the video editing material according to how long each kind of environment information lasts in the video being edited, and because environment information with a longer duration is more likely to contain the region where editing actions actually take place, arranging the video editing material in descending order of recorded duration achieves precise pushing and improves editing efficiency.
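The sketch below shows one way to group the recalled material by catalog entry and present the groups in descending order of recorded duration, reusing the recall helper above. The per-group limit of 3 and the duration figures in the comment mirror the example in the text; the data shapes are assumptions.

```python
def rank_materials_by_duration(durations_s, per_class_limit=3, library=LOCAL_LIBRARY):
    """durations_s maps each catalog entry (environment label) to its recorded duration in seconds."""
    ranked = []
    # Environments recorded for longer are more likely to contain the actual editing position,
    # so their material groups are presented first.
    for env, _ in sorted(durations_s.items(), key=lambda kv: kv[1], reverse=True):
        ranked.extend(recall_materials(env, library)[:per_class_limit])
    return ranked

# Mirroring the example durations {entry1: 20, entry2: 50, entry3: 30, entry4: 15},
# the materials of entry2 come first, then entry3, entry1 and finally entry4.
```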
In some embodiments, when the video being edited is relatively long, the video editing material pushed to the user needs to be adjusted once the position where the editing actually takes place is known. For details, please refer to FIG. 5, which is a schematic flowchart of acquiring the segment duration of environment information in this embodiment.

As shown in FIG. 5, after the step of comparing whether the environment information represented by the images in the plurality of frame pictures is consistent, the following steps are further included:

S3210: when the environment information represented by the images in the plurality of frame pictures is inconsistent, acquiring the segment duration, within the video being edited, of the frame pictures having the same environment information.

When, according to the comparison result, the images in the plurality of frame pictures represent more than one kind of environment information, the segment duration, within the video being edited, of the frame pictures having the same environment information is acquired. Specifically, after the video editing material is acquired, the segment duration in the video being edited of the environment information represented by each catalog entry is obtained. Since frame pictures are extracted at regular intervals, when the environment information in the video being edited changes, the segment duration of a given environment in the video being edited can be obtained by measuring the time interval between two successive inconsistent pieces of classification information. For example, if two successive inconsistent pieces of classification information are detected at 1 min 20 s and 1 min 50 s respectively, the segment during which the environment information represented by the earlier classification information occurs in the video being edited lies between 1 min 20 s and 1 min 50 s.
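A small sketch of turning the timed per-frame labels into (label, start, end) segments by detecting where two consecutive classifications differ; the 5 s sampling interval is the example value given above, and the tuple layout is an assumption.

```python
def label_segments(frame_labels, interval_s=5.0):
    """Collapse per-frame labels sampled every `interval_s` seconds into (label, start_s, end_s) segments."""
    segments = []
    if not frame_labels:
        return segments
    start, current = 0.0, frame_labels[0]
    for i, label in enumerate(frame_labels[1:], start=1):
        if label != current:                       # two successive inconsistent classifications
            segments.append((current, start, i * interval_s))
            start, current = i * interval_s, label
    segments.append((current, start, len(frame_labels) * interval_s))
    return segments

# label_segments(["snow"] * 16 + ["rain"] * 6, 5.0)
# -> [("snow", 0.0, 80.0), ("rain", 80.0, 110.0)]
```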
S3220: compiling statistics on the plurality of pieces of classification information, and marking, according to the statistical result, the statistical classification catalog as the classification information of the video being edited.

When, according to the comparison result, the images in the plurality of frame pictures represent more than one kind of environment information, statistics are compiled according to the kinds of classification information; the statistical classification catalog is then the set of classification categories of the video being edited, and after the statistics are compiled the statistical classification catalog is marked as the classification information of the video being edited.
Please refer to FIG. 6, which is a schematic flowchart of adjusting the video editing material push result according to the editing position in this embodiment.

As shown in FIG. 6, step S3000 specifically includes the following steps:

S3310: acquiring the editing period position of the video being edited.

When video editing is performed on an intelligent mobile terminal, the editing position of the video has to be selected, that is, at which moment or over which length of the video the editing is performed. After the user selects and confirms the editing position, the moment at which the editing position confirmed by the user lies is read.

S3320: determining that the editing period position falls within one of the segment durations of the video being edited.

The moment at which the editing position occurs is compared with the segment durations derived from how long each kind of environment information persists, to determine within which continuously recorded environment-information segment of the video being edited the editing position falls. For example, if the editing moment is at 1 min 25 s and the environment information of the video being edited from 1 min 10 s to 1 min 40 s is "snowing", it is determined that the editing moment occurs in the "snowing" environment.

S3330: acquiring the video editing material corresponding to the environment information of that segment duration. After it is determined within which continuously recorded environment-information segment of the video being edited the editing moment falls, video editing material matching the environment information represented by that segment duration is acquired for the user to select and use.
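As a sketch of S3310 to S3330 under the same assumptions as the previous snippets, the helper below locates the segment containing the editing moment and reuses the tag-based recall; the function names and example values are illustrative.

```python
def materials_for_edit_position(edit_time_s, segments, library=LOCAL_LIBRARY):
    """Find the environment segment containing the editing moment and recall matching material."""
    for label, start_s, end_s in segments:
        if start_s <= edit_time_s < end_s:
            return recall_materials(label, library)
    return []

# With a segment ("snow", 70.0, 100.0), an edit at 85 s (1 min 25 s) recalls the "snow" materials.
```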
The classification information of the video being edited is acquired first, statistics are then compiled on the classification information, and the recorded duration in the video being edited of the environment information represented by each piece of classification information is obtained. Once the moment at which editing actually takes place is known, the segment duration containing that moment is identified by comparison, and the video editing material matching the environment information within that segment duration is called up. Rapid pushing of video editing material is thereby achieved.
To solve the above technical problem, an embodiment of the present invention further provides a video editing push system. For details, please refer to FIG. 7, which is a basic structural block diagram of the video editing push system of this embodiment.

As shown in FIG. 7, a video editing push system includes: an acquisition module 2100, a processing module 2200 and an execution module 2300. The acquisition module 2100 is configured to acquire an editing instruction to be executed by a user; the processing module 2200 is configured to acquire, according to the editing instruction, at least one frame picture of the video being edited, input the frame picture into a preset environment classification model, and acquire classification information output by the environment classification model and representing the environment in which the frame picture is located; the execution module 2300 is configured to acquire video editing material according to the classification information, so that the video editing material matches the image environment of the frame picture.
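Purely as an illustration of how the three modules might be wired together in software, the sketch below maps the acquisition, processing and execution modules onto small classes that reuse the helpers introduced earlier; the class and method names are assumptions and do not describe the actual implementation of modules 2100, 2200 and 2300.

```python
class AcquisitionModule:
    """Corresponds to module 2100: receives the user's editing instruction."""
    def get_instruction(self, user_event):
        return user_event  # e.g. the tap or swipe that starts editing

class ProcessingModule:
    """Corresponds to module 2200: samples frames and classifies their environment."""
    def __init__(self, model):
        self.model = model
    def classify(self, video_path):
        return dominant_classification(classify_sampled_frames(video_path, self.model))

class ExecutionModule:
    """Corresponds to module 2300: recalls material matching the classification."""
    def recommend(self, classification):
        return recall_materials(classification)
```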
The video editing push system of this embodiment inputs a frame picture image extracted from the video being edited into a trained environment classification model, obtains the environment classification information of that frame picture and, with that environment classification information as a condition, acquires video editing material that matches or is of the same type as the classification information for the user to select and use. With this method, the environmental factor in the frame picture is analyzed and used as a filtering condition to pre-screen a massive amount of editing material, and the user is provided with video editing material suited to that environment, so that the user can conveniently and quickly obtain well-matched editing material, which improves push efficiency. Meanwhile, it also improves the targeting accuracy of editing material delivery and reduces the consumption of the user's data traffic.

In some embodiments, the video editing system further includes: a first acquisition sub-module, a first statistics sub-module and a first processing sub-module. The first acquisition sub-module is configured to acquire, at regular intervals, a plurality of frame pictures of the video being shot while in the video shooting state; the first statistics sub-module is configured to input the plurality of frame pictures into the environment classification model respectively, acquire a plurality of pieces of classification information corresponding to the plurality of frame pictures, and compile statistics on the plurality of pieces of classification information; the first processing sub-module is configured to mark, according to the statistical result, the classification information with the highest distribution rate among the plurality of pieces of classification information as the classification information of the video being shot.

In some embodiments, the video editing system further includes: a second acquisition sub-module, a first comparison sub-module and a second processing sub-module. The second acquisition sub-module is configured to acquire a plurality of frame pictures of the video being edited at regular intervals; the first comparison sub-module is configured to compare whether the environment information represented by the images in the plurality of frame pictures is consistent; the second processing sub-module is configured to, when the environment information represented by the images in the plurality of frame pictures is inconsistent, compile statistics on the plurality of pieces of classification information and mark, according to the statistical result, the statistical classification catalog as the classification information of the video being edited.

In some embodiments, the video editing system further includes: a third acquisition sub-module and a first sorting sub-module. The third acquisition sub-module is configured to acquire the video editing material respectively according to the statistical classification catalog; the first sorting sub-module is configured to acquire the recorded duration, in the video being edited, of the environment information represented by the statistical classification catalog, and arrange the video editing material in descending order according to the recorded duration.

In some embodiments, the video editing system further includes: a third processing sub-module and a second statistics sub-module. The third processing sub-module is configured to, when the environment information represented by the images in the plurality of frame pictures is inconsistent, acquire the segment duration, within the video being edited, of the frame pictures having the same environment information; the second statistics sub-module is configured to compile statistics on the plurality of pieces of classification information and mark, according to the statistical result, the statistical classification catalog as the classification information of the video being edited.

In some embodiments, the video editing system further includes: a fourth acquisition sub-module, a first determination sub-module and a fifth acquisition sub-module. The fourth acquisition sub-module is configured to acquire the editing period position of the video being edited; the first determination sub-module is configured to determine that the editing period position falls within one of the segment durations of the video being edited; the fifth acquisition sub-module is configured to acquire the video editing material corresponding to the environment information of that segment duration.

In some embodiments, the environment classification model is specifically a convolutional neural network model trained to convergence, the convolutional neural network model being trained to classify image environments.
This embodiment further provides an intelligent mobile terminal. For details, please refer to FIG. 8, which is a basic structural schematic diagram of the intelligent mobile terminal of this embodiment.

It should be noted that, in this embodiment, the memory 1520 of the intelligent mobile terminal stores all the programs used to implement the video editing push method of this embodiment, and the processor 1580 can call the programs in the memory 1520 to perform all the functions listed for the video editing push method above. Since the functions implemented by the intelligent mobile terminal have been described in detail for the video editing push method of this embodiment, they are not repeated here.

When the intelligent mobile terminal performs video editing, the time axis representing the duration of the template video is overlaid on the progress bar of the video being edited, so that the position at which the template video is added in the video being edited can be obtained intuitively by observing the relative position between the time axis and the progress bar; this streamlines the editing area and reduces the area it occupies. At the same time, by adjusting the relative position of the time axis on the progress bar, the user can adjust the position at which the template video is added in the video being edited. The streamlined editing area provides sufficient space for the design of the time-axis container, which makes it convenient for the user to adjust the editing, lowers the difficulty of editing control, and improves editing accuracy and the operation success rate.
An embodiment of the present invention further provides an intelligent mobile terminal. As shown in FIG. 8, for ease of description only the parts related to the embodiment of the present invention are shown; for specific technical details not disclosed, please refer to the method part of the embodiments of the present invention. The terminal may be any terminal device, including an intelligent mobile terminal, a tablet computer, a PDA (Personal Digital Assistant), a POS (Point of Sales) terminal, an in-vehicle computer and the like; the following takes an intelligent mobile terminal as an example.

FIG. 8 is a block diagram of part of the structure of an intelligent mobile terminal related to the terminal provided by the embodiment of the present invention. Referring to FIG. 8, the intelligent mobile terminal includes components such as a radio frequency (RF) circuit 1510, a memory 1520, an input unit 1530, a display unit 1540, a sensor 1550, an audio circuit 1560, a wireless fidelity (Wi-Fi) module 1570, a processor 1580 and a power supply 1590. A person skilled in the art will understand that the intelligent mobile terminal structure shown in FIG. 8 does not constitute a limitation on the intelligent mobile terminal, which may include more or fewer components than shown, combine certain components, or arrange the components differently.

The components of the intelligent mobile terminal are described in detail below with reference to FIG. 8:
The RF circuit 1510 may be used for receiving and sending signals while information is received and sent or during a call. In particular, after receiving downlink information from a base station, it passes the information to the processor 1580 for processing, and it sends the designed uplink data to the base station. Generally, the RF circuit 1510 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier (LNA), a duplexer and so on. In addition, the RF circuit 1510 may also communicate with networks and other devices via wireless communication. The wireless communication may use any communication standard or protocol, including but not limited to the Global System of Mobile communication (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), e-mail, Short Messaging Service (SMS) and so on.

The memory 1520 may be used to store software programs and modules. By running the software programs and modules stored in the memory 1520, the processor 1580 executes the various functional applications and data processing of the intelligent mobile terminal. The memory 1520 may mainly include a program storage area and a data storage area, where the program storage area may store the operating system, the application programs required for at least one function (such as a voiceprint playback function, an image playback function and the like) and so on, and the data storage area may store data created according to the use of the intelligent mobile terminal (such as audio data, a phone book and the like) and so on. In addition, the memory 1520 may include a high-speed random access memory and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or other volatile solid-state storage devices.

The input unit 1530 may be used to receive input digit or character information and to generate key signal inputs related to the user settings and function control of the intelligent mobile terminal. Specifically, the input unit 1530 may include a touch panel 1531 and other input devices 1532. The touch panel 1531, also called a touch screen, can collect touch operations performed by the user on or near it (such as operations performed by the user on or near the touch panel 1531 with a finger, a stylus or any other suitable object or accessory) and drive the corresponding connection device according to a preset program. Optionally, the touch panel 1531 may include two parts: a touch detection device and a touch controller. The touch detection device detects the touch orientation of the user, detects the signal brought by the touch operation and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into contact coordinates, sends them to the processor 1580, and can receive and execute commands sent by the processor 1580. In addition, the touch panel 1531 may be implemented in multiple types such as resistive, capacitive, infrared and surface acoustic wave. Besides the touch panel 1531, the input unit 1530 may also include other input devices 1532. Specifically, the other input devices 1532 may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys, a power key and the like), a trackball, a mouse, a joystick and so on.

The display unit 1540 may be used to display information input by the user or information provided to the user, as well as the various menus of the intelligent mobile terminal. The display unit 1540 may include a display panel 1541, which may optionally be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED) display or the like. Further, the touch panel 1531 may cover the display panel 1541. When the touch panel 1531 detects a touch operation on or near it, it transmits the operation to the processor 1580 to determine the type of the touch event, and the processor 1580 then provides the corresponding visual output on the display panel 1541 according to the type of the touch event. Although in FIG. 8 the touch panel 1531 and the display panel 1541 are implemented as two independent components to realize the input and output functions of the intelligent mobile terminal, in some embodiments the touch panel 1531 and the display panel 1541 may be integrated to realize the input and output functions of the intelligent mobile terminal.
The intelligent mobile terminal may also include at least one sensor 1550, such as a light sensor, a motion sensor and other sensors. Specifically, the light sensor may include an ambient light sensor and a proximity sensor, where the ambient light sensor can adjust the brightness of the display panel 1541 according to the brightness of the ambient light, and the proximity sensor can turn off the display panel 1541 and/or the backlight when the intelligent mobile terminal is moved to the ear. As one kind of motion sensor, the accelerometer sensor can detect the magnitude of acceleration in all directions (generally three axes), can detect the magnitude and direction of gravity when stationary, and can be used in applications that recognize the attitude of the intelligent mobile terminal (such as landscape/portrait switching, related games and magnetometer attitude calibration), vibration-recognition-related functions (such as a pedometer and tapping) and so on. Other sensors that may also be configured on the intelligent mobile terminal, such as a gyroscope, a barometer, a hygrometer, a thermometer and an infrared sensor, are not described in detail here.

The audio circuit 1560, a loudspeaker 1561 and a microphone 1562 can provide an audio interface between the user and the intelligent mobile terminal. The audio circuit 1560 can transmit the electrical signal converted from the received audio data to the loudspeaker 1561, which converts it into a voiceprint signal for output; on the other hand, the microphone 1562 converts the collected voiceprint signal into an electrical signal, which is received by the audio circuit 1560 and converted into audio data; after the audio data is output to the processor 1580 for processing, it is sent via the RF circuit 1510 to, for example, another intelligent mobile terminal, or the audio data is output to the memory 1520 for further processing.

Wi-Fi is a short-range wireless transmission technology. Through the Wi-Fi module 1570, the intelligent mobile terminal can help the user send and receive e-mails, browse web pages, access streaming media and so on, providing the user with wireless broadband Internet access. Although FIG. 8 shows the Wi-Fi module 1570, it can be understood that it is not an essential component of the intelligent mobile terminal and may be omitted as needed within the scope that does not change the essence of the invention.

The processor 1580 is the control center of the intelligent mobile terminal. It connects all parts of the whole intelligent mobile terminal through various interfaces and lines, and performs the various functions of the intelligent mobile terminal and processes data by running or executing the software programs and/or modules stored in the memory 1520 and calling the data stored in the memory 1520, thereby monitoring the intelligent mobile terminal as a whole. Optionally, the processor 1580 may include one or more processing units; preferably, the processor 1580 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, the user interface, application programs and so on, and the modem processor mainly handles wireless communication. It can be understood that the modem processor may alternatively not be integrated into the processor 1580.

The intelligent mobile terminal also includes a power supply 1590 (such as a battery) supplying power to the various components. Preferably, the power supply may be logically connected to the processor 1580 through a power management system, so that functions such as charging management, discharging management and power consumption management are implemented through the power management system.

Although not shown, the intelligent mobile terminal may further include a camera, a Bluetooth module and so on, which are not described in detail here.
It should be noted that the specification of the present invention and its accompanying drawings present preferred embodiments of the present invention; however, the present invention can be implemented in many different forms and is not limited to the embodiments described in this specification. These embodiments are not intended as additional limitations on the content of the present invention; they are provided so that the understanding of the disclosure of the present invention will be more thorough and comprehensive. Moreover, the above technical features may be further combined with one another to form various embodiments not listed above, all of which are regarded as within the scope described in the specification of the present invention. Further, a person of ordinary skill in the art can make improvements or changes based on the above description, and all such improvements and changes shall fall within the protection scope of the appended claims of the present invention.

Claims (10)

  1. A video editing push method, characterized by including the following steps:
    acquiring an editing instruction to be executed by a user;
    acquiring, according to the editing instruction, at least one frame picture of a video being edited, inputting the frame picture into a preset environment classification model, and acquiring classification information output by the environment classification model and representing the environment in which the frame picture is located;
    acquiring video editing material according to the classification information, so that the video editing material matches the image environment of the frame picture.
  2. The video editing push method according to claim 1, characterized in that the step of acquiring, according to the editing instruction, at least one frame picture of the video being edited, inputting the frame picture into the preset environment classification model, and acquiring the classification information output by the environment classification model and representing the environment in which the frame picture is located specifically includes the following steps:
    acquiring, at regular intervals, a plurality of frame pictures of the video being shot while in the video shooting state;
    inputting the plurality of frame pictures into the environment classification model respectively, acquiring a plurality of pieces of classification information corresponding to the plurality of frame pictures, and compiling statistics on the plurality of pieces of classification information;
    marking, according to the statistical result, the classification information with the highest distribution rate among the plurality of pieces of classification information as the classification information of the video being shot.
  3. The video editing push method according to claim 1, characterized in that the step of acquiring, according to the editing instruction, at least one frame picture of the video being edited, inputting the frame picture into the preset environment classification model, and acquiring the classification information output by the environment classification model and representing the environment in which the frame picture is located specifically includes the following steps:
    acquiring a plurality of frame pictures of the video being edited at regular intervals;
    comparing whether the environment information represented by the images in the plurality of frame pictures is consistent;
    when the environment information represented by the images in the plurality of frame pictures is inconsistent, compiling statistics on the plurality of pieces of classification information, and marking, according to the statistical result, the statistical classification catalog as the classification information of the video being edited.
  4. The video editing push method according to claim 3, characterized in that the step of acquiring video editing material according to the classification information, so that the video editing material matches the image environment of the frame picture, specifically includes the following steps:
    acquiring the video editing material respectively according to the statistical classification catalog;
    acquiring the recorded duration, in the video being edited, of the environment information represented by the statistical classification catalog, and arranging the video editing material in descending order according to the recorded duration.
  5. The video editing push method according to claim 3, characterized in that after the step of comparing whether the environment information represented by the images in the plurality of frame pictures is consistent, the method further includes the following steps:
    when the environment information represented by the images in the plurality of frame pictures is inconsistent, acquiring the segment duration, within the video being edited, of the frame pictures having the same environment information;
    compiling statistics on the plurality of pieces of classification information, and marking, according to the statistical result, the statistical classification catalog as the classification information of the video being edited.
  6. The video editing push method according to claim 5, characterized in that the step of acquiring video editing material according to the classification information, so that the video editing material matches the image environment of the frame picture, specifically includes the following steps:
    acquiring the editing period position of the video being edited;
    determining that the editing period position falls within one of the segment durations of the video being edited;
    acquiring the video editing material corresponding to the environment information of that segment duration.
  7. The video editing push method according to any one of claims 1 to 6, characterized in that the environment classification model is specifically a convolutional neural network model trained to convergence, the convolutional neural network model being trained to classify image environments.
  8. A video editing push system, characterized by including:
    an acquisition module, configured to acquire an editing instruction to be executed by a user;
    a processing module, configured to acquire, according to the editing instruction, at least one frame picture of the video being edited, input the frame picture into a preset environment classification model, and acquire classification information output by the environment classification model and representing the environment in which the frame picture is located;
    an execution module, configured to acquire video editing material according to the classification information, so that the video editing material matches the image environment of the frame picture.
  9. The video editing push system according to claim 8, characterized in that the video editing system further includes:
    a first acquisition sub-module, configured to acquire, at regular intervals, a plurality of frame pictures of the video being shot while in the video shooting state;
    a first statistics sub-module, configured to input the plurality of frame pictures into the environment classification model respectively, acquire a plurality of pieces of classification information corresponding to the plurality of frame pictures, and compile statistics on the plurality of pieces of classification information;
    a first processing sub-module, configured to mark, according to the statistical result, the classification information with the highest distribution rate among the plurality of pieces of classification information as the classification information of the video being shot.
  10. An intelligent mobile terminal, characterized by including:
    one or more processors;
    a memory; and
    one or more application programs, wherein the one or more application programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs being configured to perform the video editing push method according to any one of claims 1 to 7.
PCT/CN2018/118373 2017-11-30 2018-11-30 Video editing push method, system and intelligent mobile terminal WO2019105440A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US16/767,677 US11393205B2 (en) 2017-11-30 2018-11-30 Method of pushing video editing materials and intelligent mobile terminal

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201711239996.1A 2017-11-30 2017-11-30 Video editing push method, system and intelligent mobile terminal
CN201711239996.1 2017-11-30

Publications (1)

Publication Number Publication Date
WO2019105440A1 true WO2019105440A1 (zh) 2019-06-06

Family

ID=61961986

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/118373 2017-11-30 Video editing push method, system and intelligent mobile terminal

Country Status (3)

Country Link
US (1) US11393205B2 (zh)
CN (1) CN107959883B (zh)
WO (1) WO2019105440A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112738647A (zh) * 2020-12-28 2021-04-30 中山大学 Video description method and system based on a multi-level encoder-decoder

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107959883B (zh) * 2017-11-30 2020-06-09 广州市百果园信息技术有限公司 Video editing push method, system and intelligent mobile terminal
CN109002857B (zh) * 2018-07-23 2020-12-29 厦门大学 Deep-learning-based video style transformation and automatic generation method and system
CN109102543B (zh) * 2018-08-17 2021-04-02 深圳蓝胖子机器智能有限公司 Object positioning method, device and storage medium based on image segmentation
CN110891191B (zh) * 2018-09-07 2022-06-07 阿里巴巴(中国)有限公司 Material selection method, apparatus and storage medium
JPWO2020195557A1 (ja) * 2019-03-22 2021-12-23 富士フイルム株式会社 Image processing device, image processing method, imaging device and program
CN111866404B (zh) * 2019-04-25 2022-04-29 华为技术有限公司 Video editing method and electronic device
WO2020216096A1 (zh) * 2019-04-25 2020-10-29 华为技术有限公司 Video editing method and electronic device
CN110475157A (zh) * 2019-07-19 2019-11-19 平安科技(深圳)有限公司 Multimedia information display method, apparatus, computer device and storage medium
CN111491213B (zh) * 2020-04-17 2022-03-08 维沃移动通信有限公司 Video processing method, video processing apparatus and electronic device
CN111667310B (zh) * 2020-06-04 2024-02-20 上海燕汐软件信息科技有限公司 Data processing method, apparatus and device for salesperson training
CN112118397B (zh) * 2020-09-23 2021-06-22 腾讯科技(深圳)有限公司 Video synthesis method, related apparatus, device and storage medium
CN112672209A (zh) * 2020-12-14 2021-04-16 北京达佳互联信息技术有限公司 Video editing method and video editing apparatus
CN115442539B (zh) * 2021-06-04 2023-11-07 北京字跳网络技术有限公司 Video editing method, apparatus, device and storage medium
CN115052198B (zh) * 2022-05-27 2023-07-04 广东职业技术学院 Image synthesis method, apparatus and system for a smart farm
CN115065864B (zh) * 2022-06-13 2024-05-10 广州博冠信息科技有限公司 Game video production method, sharing method, apparatus, medium and electronic device

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102663794A (zh) * 2012-03-29 2012-09-12 清华大学 Image synthesis method and apparatus
US20140101138A1 (en) * 2012-10-10 2014-04-10 Canon Kabushiki Kaisha Information processing apparatus capable of displaying list of multiple contents, control method therefor, and storage medium
CN104703043A (zh) * 2015-03-26 2015-06-10 努比亚技术有限公司 Method and apparatus for adding video special effects
CN105072337A (zh) * 2015-07-31 2015-11-18 小米科技有限责任公司 Picture processing method and apparatus
CN106779073A (zh) * 2016-12-27 2017-05-31 西安石油大学 Media information classification method and apparatus based on a deep neural network
CN107040795A (zh) * 2017-04-27 2017-08-11 北京奇虎科技有限公司 Live video monitoring method and apparatus
CN107295362A (zh) * 2017-08-10 2017-10-24 上海六界信息技术有限公司 Image-based live content screening method, apparatus, device and storage medium
CN107959883A (zh) * 2017-11-30 2018-04-24 广州市百果园信息技术有限公司 Video editing push method, system and intelligent mobile terminal

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8606084B2 (en) * 2001-06-27 2013-12-10 Knapp Investment Company Limited Method and system for providing a personal video recorder utilizing network-based digital media content


Also Published As

Publication number Publication date
CN107959883A (zh) 2018-04-24
CN107959883B (zh) 2020-06-09
US11393205B2 (en) 2022-07-19
US20200293784A1 (en) 2020-09-17


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18883456

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18883456

Country of ref document: EP

Kind code of ref document: A1