WO2021135286A1 - Video processing method, video search method, terminal device, and computer-readable storage medium - Google Patents

Video processing method, video search method, terminal device, and computer-readable storage medium

Info

Publication number
WO2021135286A1
WO2021135286A1 PCT/CN2020/111032 CN2020111032W
Authority
WO
WIPO (PCT)
Prior art keywords
video
target
human body
processing method
video processing
Prior art date
Application number
PCT/CN2020/111032
Other languages
English (en)
French (fr)
Inventor
薛凯文
赖长明
徐永泽
Original Assignee
深圳Tcl新技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳Tcl新技术有限公司 filed Critical 深圳Tcl新技术有限公司
Priority to EP20908901.0A priority Critical patent/EP4086786A4/en
Priority to US17/758,179 priority patent/US12001479B2/en
Publication of WO2021135286A1 publication Critical patent/WO2021135286A1/zh

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/7867Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, title and artist information, manually generated time, location and usage information, user ratings
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7837Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using objects detected or recognised in the video content
    • G06F16/784Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using objects detected or recognised in the video content the detected or recognised objects being people
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/50Extraction of image or video features by performing operations within image blocks; by using histograms, e.g. histogram of oriented gradients [HoG]; by summing image-intensity values; Projection analysis
    • G06V10/507Summing image-intensity values; Histogram projection analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761Proximity, similarity or dissimilarity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/41Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/46Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition

Definitions

  • This application relates to the field of image processing technology, and in particular to a video processing method, a video search method, a terminal device, and a computer-readable storage medium.
  • the main purpose of the embodiments of the present application is to provide a video processing method, which aims to solve the prior-art technical problem that, when highlights are obtained by searching keywords, the keywords are weakly related to the segments and the description is inaccurate.
  • an embodiment of the present application provides a video processing method, including the following content: editing a video to be edited according to scenes to obtain a target video; acquiring feature parameters of the target video; generating keywords of the target video according to the feature parameters; and storing the keywords in association with the target video.
  • the step of obtaining the characteristic parameters of the target video includes:
  • the step of acquiring the sub-characteristic parameters of the plurality of image frames includes:
  • the sub-characteristic parameters are obtained according to the behavior characteristics and the human body characteristics of the person corresponding to the person information.
  • the step of generating keywords of the target video according to the characteristic parameters includes:
  • the behavior feature category and the identity information are set as keywords of the target video.
  • the step of obtaining the identity information corresponding to the human body feature further includes:
  • the identity information is acquired according to the preset human body characteristics corresponding to the human body characteristics.
  • the step of editing the video to be edited according to the scene and obtaining the target video includes:
  • the adjacent image frame with the scene change is used as the segmented frame
  • the step of determining whether there is a scene change in the adjacent image frame according to the grayscale image of the adjacent image frame includes:
  • an embodiment of the present application also provides a video search method, including the following content: acquiring a target keyword input from a search interface; and searching a preset database for a target video according to the target keyword, and displaying the target video associated with the target keyword.
  • An embodiment of the present application also provides a terminal device.
  • the terminal device includes a processor, a memory, and a video processing program or a video search program that is stored on the memory and can run on the processor.
  • the video processing program is executed by the processor to implement the steps of the video processing method described above
  • the video search program is executed by the processor to implement the steps of the video search method described above.
  • the embodiment of the present application also provides a computer-readable storage medium storing a video processing program or a video search program; the video processing program, when executed by a processor, implements the steps of the video processing method described above, and the video search program, when executed by the processor, implements the steps of the video search method described above.
  • the video processing method proposed in the embodiments of the application edits according to scene changes, which ensures that the target video is within a single scene and effectively improves the accuracy of identifying feature parameters in the target video; corresponding keywords are generated from the feature parameters of the target video, so that the target video and its keywords are strongly related and the description is accurate.
  • FIG. 1 is a schematic diagram of a terminal structure of a hardware operating environment involved in a solution of an embodiment of the application;
  • FIG. 2 is a schematic flowchart of a first embodiment of a video processing method of this application
  • FIG. 3 is a schematic flowchart of a second embodiment of a video processing method of this application.
  • FIG. 4 is a schematic flowchart of a third embodiment of a video processing method of this application.
  • FIG. 5 is a schematic flowchart of a fourth embodiment of a video processing method of this application.
  • FIG. 6 is a schematic flowchart of a fifth embodiment of a video processing method of this application.
  • FIG. 7 is a schematic flowchart of a sixth embodiment of a video processing method of this application.
  • FIG. 8 is a schematic flowchart of a seventh embodiment of a video processing method of this application.
  • FIG. 9 is a schematic flowchart of an eighth embodiment of a video processing method according to this application.
  • the main solution of the embodiments of this application is: edit the video to be edited according to scenes to obtain a target video; acquire feature parameters of the target video; generate keywords of the target video according to the feature parameters; and store the keywords in association with the target video.
  • the embodiments of the application provide a solution that edits according to scene changes, which ensures that the target video is within a single scene, effectively improves the accuracy of identifying feature parameters in the target video, and generates corresponding keywords from the feature parameters of the target video, so that the target video and its keywords are strongly related and the description is accurate.
  • FIG. 1 is a schematic diagram of the terminal structure of the hardware operating environment involved in the solution of the embodiment of the application.
  • the execution subject of the embodiments of the present application may be a PC, or a mobile or non-mobile terminal device such as a smartphone, a tablet computer, or a portable computer.
  • the terminal device may include: a processor 1001, such as a CPU, a communication bus 1002, and a memory 1003.
  • the communication bus 1002 is configured to realize the connection and communication between these components.
  • the memory 1003 may be a high-speed RAM memory, or a stable memory (non-volatile memory), such as a magnetic disk memory.
  • the memory 1003 may also be a storage device independent of the aforementioned processor 1001.
  • the structure of the terminal device shown in FIG. 1 does not constitute a limitation on the terminal, and may include more or fewer components than shown in the figure, or a combination of certain components, or different component arrangements.
  • the memory 1003 as a computer storage medium may include an operating system, a video processing program, or a video search program, and the processor 1001 may be configured to call the video processing program stored in the memory 1003 and execute The following steps:
  • the keyword is stored in association with the target video.
  • processor 1001 may be configured to call a video processing program stored in the memory 1003, and execute the following steps:
  • processor 1001 may be configured to call a video processing program stored in the memory 1003, and execute the following steps:
  • the sub-characteristic parameters are obtained according to the behavior characteristics and the human body characteristics of the person corresponding to the person information.
  • processor 1001 may be configured to call a video processing program stored in the memory 1003, and execute the following steps:
  • the behavior feature category and the identity information are set as keywords of the target video.
  • processor 1001 may be configured to call a video processing program stored in the memory 1003, and execute the following steps:
  • the identity information is acquired according to the preset human body characteristics corresponding to the human body characteristics.
  • processor 1001 may be configured to call a video processing program stored in the memory 1003, and execute the following steps:
  • the adjacent image frame with the scene change is used as the segmented frame
  • processor 1001 may be configured to call a video processing program stored in the memory 1003, and execute the following steps:
  • processor 1001 may be configured to call the search program of the video stored in the memory 1003, and execute the following steps:
  • Search for a target video in a preset database according to the target keyword and display the target video associated with the target keyword.
  • FIG. 2 is a schematic flowchart of the first embodiment of the video processing method of this application.
  • the video processing method includes the following steps:
  • step S100 the video to be edited is edited according to the scene, and the target video is obtained;
  • the execution subject is a terminal device.
  • the video to be edited can be any editable video such as movies, TV shows, and recorded videos.
  • the preset frame rate refers to the number of video frames extracted per unit time, which can be set according to requirements, such as 50 frames per minute. It can be understood that the higher the preset frame rate, the higher the editing accuracy.
  • the scene can be determined according to the content changes of adjacent image frames among the above-mentioned multiple image frames.
  • the segmented frame corresponding to the video to be edited is determined, and then the target video is obtained.
  • the target video may be a video where any scene in the video to be edited is located.
  • the duration of the target video is determined by the scene of the video to be edited, such as 3 minutes.
  • the video to be edited can be edited into multiple target videos of different scenes.
  • the video to be edited can be edited through any of ffmpeg, shotdetect, and pyscenedetect.
  • considering both speed and accuracy, the ffmpeg method is preferred for editing.
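  • As an illustration of this editing step, the following is a minimal sketch (not taken from the patent) of cutting a source video into per-scene clips by calling the ffmpeg command-line tool from Python, given split timestamps in seconds produced by the scene-change detection described later; the file names and stream-copy settings are assumptions.

```python
# Minimal sketch: cut a video at scene-change timestamps with the ffmpeg CLI.
# `split_points` would come from the scene-change detection described below.
import subprocess

def cut_scenes(src, split_points, out_pattern="scene_{:03d}.mp4"):
    bounds = [0.0] + sorted(split_points) + [None]        # segment boundaries
    for i in range(len(bounds) - 1):
        cmd = ["ffmpeg", "-y", "-i", src, "-ss", str(bounds[i])]
        if bounds[i + 1] is not None:
            cmd += ["-to", str(bounds[i + 1])]            # segment end, if any
        cmd += ["-c", "copy", out_pattern.format(i)]      # copy streams, no re-encode
        subprocess.run(cmd, check=True)
```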
  • Step S200 Acquire characteristic parameters of the target video
  • the characteristic parameters may include one or more of scene parameters, person information, and sound parameters. Because the target video is edited according to the scene, the scene parameters are relatively stable, such as a playground, a bus, an indoor setting, or a beach; person information can include a person's behavior characteristics and identity information; sound parameters can include one or more of key information in speech, volume, tone, and noise. Scene parameters and person information can be recognized through image recognition technology, and sound parameters can be recognized through speech recognition technology.
  • Step S300 generating keywords of the target video according to the characteristic parameters
  • the acquired characteristic parameters can be matched with the pre-stored characteristic parameters in the database of the terminal device.
  • when the matching degree is high, the keywords corresponding to the characteristic parameters are obtained, and the keywords of the target video are then generated.
  • keywords corresponding to the scene can be generated according to the scene parameters, such as "beach"; keywords for the person's behavior characteristics and identity information can be generated from the person information, for example "sunbathing" as a behavior keyword and the name of a public figure as an identity keyword; and sound keywords, such as "noisy", can be generated from the sound parameters. From this information, a keyword such as "a public figure is sunbathing on a noisy beach" can be derived.
  • Step S400 Associate and save the keyword with the target video.
  • after the keywords of the target video are generated, the keywords are associated with the target video, and the target video together with its associated keywords is stored on the terminal device; they can also be stored in a cloud database.
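  • A minimal sketch of this association step, using Python's standard sqlite3 module as the local database; the table and column names are illustrative assumptions, not part of the patent.

```python
# Minimal sketch: store a target video path together with its generated keywords.
import sqlite3

def save_keywords(db_path, video_path, keywords):
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS videos "
                "(path TEXT PRIMARY KEY, keywords TEXT)")
    con.execute("INSERT OR REPLACE INTO videos VALUES (?, ?)",
                (video_path, " ".join(keywords)))
    con.commit()
    con.close()
```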
  • editing according to scene changes ensures that the target video is within a single scene, which effectively improves the accuracy of identifying feature parameters in the target video; corresponding keywords are generated from the feature parameters of the target video, so that the target video and its keywords are strongly related and the description is accurate.
  • Fig. 3 is a schematic flowchart of a second embodiment of a video processing method of this application. Fig. 3 is also a detailed flowchart of step S200. Based on the above-mentioned first embodiment, step S200 includes:
  • Step S210: extract multiple image frames of the target video;
  • Step S220: obtain sub-feature parameters of the multiple image frames;
  • Step S230: acquire feature parameters of the target video according to the sub-feature parameters.
  • extracting multiple image frames from the target video at a predetermined frame rate can reduce the number of video frames processed by the terminal device, thereby improving the efficiency of acquiring the content of the target video.
  • the sub-feature parameters of each image frame can be identified one by one. Since the image frame loses sound information, the sub-feature parameters include at least one of scene parameters and character information.
  • since the target video is within a single scene and the scene parameter is fixed, the person information of the sub-feature parameters is mainly obtained from each image frame.
  • the above-mentioned multiple image frames are input to the neural network model, and the characters and scenes in the multiple image frames can be extracted through a three-dimensional convolution network to obtain character information and scene information.
  • the scene parameter of any one of the multiple image frames can be used as the scene parameter of the feature parameters of the target video; the target video contains only one behavior, so the person information of each sub-feature parameter can be integrated to obtain the person information of the feature parameters.
  • the behavior feature of squatting may include three sub-behavior features of the target character standing, the target character bending his legs, and the target character squatting.
  • the facial features of the target person in each image frame can be obtained, and the average of each facial feature, such as the distance between the eyes, the eye size, and the thickness of the lips, can be calculated to obtain the overall facial features across the image frames.
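  • A minimal sketch of the per-frame averaging described above; the assumption is that each frame yields a fixed-length vector of facial measurements (e.g. eye spacing, eye size, lip thickness).

```python
# Minimal sketch: average per-frame facial feature vectors over the extracted frames.
import numpy as np

def average_face_features(per_frame_features):
    # per_frame_features: list of equal-length numeric feature vectors, one per frame
    return np.mean(np.asarray(per_frame_features, dtype=float), axis=0)
```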
  • the feature parameters may include at least one of scene parameters and character information.
  • FIG. 4 is a schematic flowchart of the third embodiment of the video processing method of this application. Based on the above second embodiment, FIG. 4 is also a detailed flowchart of step S220 in FIG. 3; the step of obtaining the sub-feature parameters of the multiple image frames includes:
  • Step S221 acquiring person information in a plurality of the image frames
  • Step S222 Acquire the sub-feature parameter according to the behavior feature and the human body feature of the person corresponding to the person information.
  • the sub-feature parameters include character information.
  • the character information can include the character’s behavioral characteristics and the human body characteristics.
  • the behavioral characteristics can be any behavior of human activity, such as waving, arguing, or running, and can include the actions of the target person in each image frame;
  • the human body characteristics can include at least one of the facial features, iris features, and body shape features of the target person in each image frame.
  • a neural network can identify the target person in an image frame, the position coordinates of the target person in the image frame, the time point at which the target person's behavior starts, and the time point at which it ends; when there are multiple people in an image frame, there can be more than one target person.
  • FIG. 5 is a schematic flowchart of a fourth embodiment of a video processing method according to this application.
  • FIG. 5 is also a detailed flowchart of step S300 in FIG. 4. Based on the above third embodiment, step S300 includes:
  • Step S310 Obtain the behavior feature category corresponding to the behavior feature
  • Step S320 obtaining identity information corresponding to the human body characteristics
  • Step S330 Set the behavior feature category and the identity information as keywords of the target video.
  • the behavior characteristic category may be any of human actions, such as dancing, squatting, skating, etc.; the identity information may include one or more of the name, gender, and age of the public figure.
  • after the behavior characteristics of the target person in the target video are acquired, they are classified to obtain the behavior feature category corresponding to the behavior characteristics.
  • the extracted multiple image frames can be input into the neural network model; the model identifies the position coordinates of the target person in each image frame, a three-dimensional convolution network then extracts the behavior characteristics of the target person according to those position coordinates and obtains corresponding weights, and the behavior feature category corresponding to the target person's behavior characteristics is calculated from the behavior characteristics and the corresponding weights.
  • the neural network model can be trained on tens of thousands of image frames with known behavior characteristics; through a loss function, the behavior feature category computed by the model is compared with the actual behavior characteristics, and the parameters of the model are continuously optimized to improve the accuracy with which the model recognizes the behavior characteristics of people in image frames.
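  • The patent does not specify the network architecture; the following is only a schematic sketch of a tiny three-dimensional convolutional classifier (in PyTorch) that maps a stack of frames to behavior-category scores, to illustrate the kind of model this paragraph refers to.

```python
# Schematic sketch (not the patent's actual model): a tiny 3D-convolutional
# classifier that maps a clip of frames to behavior-category scores.
import torch
import torch.nn as nn

class TinyBehaviorNet(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(3, 16, kernel_size=3, padding=1),  # input (B, 3, T, H, W)
            nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),                     # pool over time and space
        )
        self.classifier = nn.Linear(16, num_classes)

    def forward(self, clips):                            # clips: (B, 3, T, H, W)
        x = self.features(clips).flatten(1)
        return self.classifier(x)                        # per-category scores

# example: scores = TinyBehaviorNet()(torch.randn(1, 3, 8, 112, 112))
```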
  • the human body features of the target person in the target video can be compared with the human body features of people with known identity information pre-stored on the terminal device. For example, facial features are compared with preset facial features; when the correlation is higher than a preset value and is the highest, the identity information of the person with the high correlation is used as the identity information of the target person. Iris features and body shape features are handled similarly and are not repeated here.
  • the behavior feature category is set as the behavior keyword of the target video, and the identity information of the target person in the target video is set as the person keyword of the target video.
  • for example, in a target video a man takes a box from a woman. By extracting three image frames, it can be determined that the target persons are a man and a woman, and the behavior characteristics of the man and of the woman can be identified respectively; the identity information of a target person, such as Will Smith, can be identified through any of the target person's human body features, such as the face, iris, or body shape.
  • the behavior feature category and the identity information of the target person are acquired from the behavior characteristics and human body features of the person information of the target video, and the behavior feature category and the identity information are set as keywords of the target video; the behavior and identity information of the people in the target video are thus recognized and converted into keywords that accurately summarize them.
  • FIG. 6 is a schematic flowchart of a fifth embodiment of a video processing method according to this application.
  • FIG. 6 is also a detailed flowchart of step S320 in FIG. 5.
  • step S320 includes:
  • Step S321 comparing the human body characteristics with the preset human body characteristics, and obtaining a comparison result
  • Step S322 Acquire a preset human body feature corresponding to the human body feature according to the comparison result
  • Step S323 Acquire the identity information according to the preset human body characteristics corresponding to the human body characteristics.
  • the human body features may include one or more of facial features, iris features, and body shape features.
  • the preset human body feature corresponds to the human body feature: if the human body feature is a facial feature, the corresponding preset feature is a preset facial feature; if it is an iris feature, the corresponding preset feature is a preset iris feature; if it is a body shape feature, the corresponding preset feature is a preset body shape feature; and if there are multiple human body features, there are correspondingly multiple preset human body features. Facial features are taken as an example below.
  • the facial features of the person information are compared with the preset facial features in the database in the terminal device, wherein the identity information of the person corresponding to the preset facial features is known.
  • the comparison result can be determined according to whether the difference between the feature value of the facial feature and the feature value of the preset facial feature is greater than a preset difference, where the comparison result is either a successful match or a failed match.
  • when the comparison result is a successful match, the identity information of the preset facial feature corresponding to the facial feature is used as the identity information of the person having the facial feature.
  • the feature value can be a 128-dimensional vector of a human face. The 128-dimensional vector of the target person in the target video can be obtained from the facial features, and the vector difference between it and the 128-dimensional vector of a preset facial feature is computed to obtain a difference value; when the difference value is less than or equal to a preset value, the identity information corresponding to that preset facial feature is used as the identity information corresponding to the human body feature. If the differences between the facial feature and all preset facial features in the database are greater than the preset value, the target person corresponding to the facial feature is not a known public figure, and identity information corresponding to the facial feature, such as "elderly woman", can be derived from gender and age.
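  • A minimal sketch of the 128-dimensional comparison described above; the Euclidean distance metric and the threshold value are illustrative assumptions.

```python
# Minimal sketch: match a target person's 128-d face vector against preset vectors
# of known people; accept the closest match only if it is within a preset value.
import numpy as np

def match_identity(face_vec, preset_vecs, preset_names, max_diff=0.6):
    diffs = np.linalg.norm(np.asarray(preset_vecs) - np.asarray(face_vec), axis=1)
    best = int(np.argmin(diffs))
    return preset_names[best] if diffs[best] <= max_diff else None  # None: unknown person
```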
  • the identity information of the preset human body characteristics corresponding to the human body characteristics is used as the identity information of the person of the human body characteristics.
  • by comparing the human body features with the preset human body features, a comparison result is obtained, the preset human body feature corresponding to the human body feature is determined, and the identity information of that preset human body feature is used as the identity information of the person having the human body feature, so the identity information of the target person in the target video can be accurately identified.
  • FIG. 7 is a schematic flowchart of a sixth embodiment of a video processing method of this application, and FIG. 7 is also a detailed flowchart of step S100 in FIG. 6, based on any of the above-mentioned first to fifth embodiments.
  • step S100 includes:
  • Step S110 sequentially extracting multiple image frames of the to-be-edited video according to a preset frame rate
  • the preset frame rate can be set according to the needs of the designer, and the accuracy of the editing and the efficiency of the editing need to be considered comprehensively, such as 30 frames per minute.
  • the multiple image frames of the video to be edited are sequentially extracted according to the preset frame rate. It can be understood that the multiple image frames are acquired in order and at the same time interval.
  • Step S120 converting the image frame into a corresponding grayscale image
  • Step S130 Determine whether there is a scene change in the adjacent image frame according to the grayscale image of the adjacent image frame;
  • Step S140 when there is a scene change in adjacent image frames, use the adjacent image frame with the scene change as a segmented frame;
  • Step S150 Edit the video to be edited according to the divided frames to obtain the target video.
  • Each extracted image frame is converted into a grayscale image, and whether there is a scene change can be determined by comparing the amount of content change between the grayscale images of adjacent image frames. It is understandable that when the content change between the grayscale images of adjacent image frames is greater than a set value, the scene is considered to have changed; when it is less than or equal to the set value, the scene is considered unchanged.
  • the adjacent image frame with the scene change is used as the segmented frame.
  • the previous image frame can be set as the ending segmented frame of the previous target video, and the next image frame can be set as the starting segmented frame of the next target video. It is understandable that there may be multiple segmented frames in the video to be edited, which can then be segmented into target videos of different scenes.
  • the video to be edited is clipped according to the divided frames, and then the target video is obtained.
  • Fig. 8 is a schematic flowchart of a seventh embodiment of a video processing method according to this application.
  • Fig. 8 is also a detailed flowchart of step S130 in Fig. 7.
  • step S130 includes:
  • Step S131 extracting image blocks in grayscale images corresponding to adjacent image frames, respectively, where the positions and sizes of the image blocks extracted in the adjacent image frames are the same;
  • image blocks are extracted from grayscale images corresponding to adjacent image frames, where the coordinates of the upper left corner of the image block are randomly generated, and the size of the image block is also randomly generated. It can be understood that the positions and sizes of the image blocks extracted in adjacent image frames are the same, which is beneficial for subsequent comparison.
  • Step S132 acquiring the number of pixels in each preset grayscale range in each image block
  • Step S133 Obtain the absolute value of the difference between the corresponding numbers of adjacent image frames in each preset grayscale range
  • Step S134 summing the absolute values of each of the differences to obtain a sum
  • An image block is composed of pixels.
  • an image block of 10 pixels by 10 pixels includes 100 pixels.
  • the pixel has a gray value, which can be an integer between 0 and 255.
  • the preset grayscale range can be set according to requirements, such as 0-4, 5-9, 10-14, and so on. It can be understood that the smaller the preset grayscale range, the higher the accuracy but the lower the processing speed.
  • Each pixel corresponds to a preset grayscale range, and the number of pixels in each preset grayscale range for each image block can be obtained respectively.
  • Step S135 Determine whether the sum value is greater than a preset threshold value, wherein when the sum value is greater than the preset threshold value, it is determined that there is a scene change in the adjacent image frames.
  • the preset threshold may be a critical value set by a designer to determine whether the scene changes. It is determined whether the sum value is greater than a preset threshold to determine whether there is a scene change in adjacent image frames. When the sum value is less than or equal to the preset threshold, it is determined that there is no scene change in adjacent image frames; when the sum value is greater than the preset threshold, it is determined that there is a scene change in adjacent image frames.
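  • A minimal sketch of the block-histogram comparison of steps S131 to S135, using OpenCV and NumPy; the block position and size, the width of the preset grayscale ranges, and the threshold value are illustrative assumptions.

```python
# Minimal sketch: compare the same block of two adjacent frames by counting pixels
# per grayscale range and summing the absolute count differences.
import cv2
import numpy as np

def scene_changed(frame_a, frame_b, x, y, w, h, bin_width=5, threshold=500):
    gray_a = cv2.cvtColor(frame_a, cv2.COLOR_BGR2GRAY)[y:y + h, x:x + w]
    gray_b = cv2.cvtColor(frame_b, cv2.COLOR_BGR2GRAY)[y:y + h, x:x + w]
    bins = np.arange(0, 256 + bin_width, bin_width)      # preset grayscale ranges
    hist_a, _ = np.histogram(gray_a, bins=bins)          # pixel counts per range
    hist_b, _ = np.histogram(gray_b, bins=bins)
    return int(np.abs(hist_a - hist_b).sum()) > threshold
```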
  • by acquiring image blocks of adjacent image frames, counting the pixels of each block in each preset grayscale range, computing the absolute differences of these counts, and summing them, a sum value is obtained; whether there is a scene change between adjacent image frames is determined from the relationship between the sum value and the preset threshold. Determining scene changes through random block extraction and one-by-one comparison makes the video editing accurate and comprehensive.
  • FIG. 9 is a schematic flowchart of an eighth embodiment of a video search method according to this application.
  • the video search method includes:
  • Step S500 obtain the target keyword input from the search interface
  • Step S600 Search for a target video in a preset database according to the target keyword, and display the target video associated with the target keyword.
  • the target video is obtained based on the above-mentioned video processing method, that is, a long video is divided into multiple short videos according to scenes; the target video is stored in a preset database, and the keywords associated with the target video are also stored in the preset database.
  • the terminal device can output the search interface on the current interface, and obtain the target keywords input by the user through the search interface.
  • the target keyword may be a query sentence input by the user.
  • the preset database may include at least one of a cloud database and a local database.
  • the terminal device can perform a matching search in the preset database according to the target keyword, find the associated keywords corresponding to the target keyword, obtain the corresponding target videos according to the associated keywords, and display the target videos on the current display interface in order of matching similarity.
  • by acquiring the target keyword input by the user, searching the preset database for target videos according to the target keyword, and displaying the target videos corresponding to the target keyword, and because the keywords associated with each target video are strongly relevant and describe it accurately, entering a target keyword retrieves the corresponding target video with high search accuracy.
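  • A minimal sketch of this search step, reusing the illustrative sqlite3 table from the earlier storage sketch; ranking by matching similarity is simplified here to a substring match, which is an assumption rather than the patent's matching scheme.

```python
# Minimal sketch: return stored videos whose keywords contain the target keyword.
import sqlite3

def search_videos(db_path, target_keyword):
    con = sqlite3.connect(db_path)
    rows = con.execute(
        "SELECT path, keywords FROM videos WHERE keywords LIKE ?",
        (f"%{target_keyword}%",)).fetchall()
    con.close()
    return rows    # list of (video path, keywords) to display
```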
  • an embodiment of the present application also provides a terminal device.
  • the terminal device includes a processor, a memory, and a video processing program or a video search program that is stored on the memory and can run on the processor.
  • the video processing program is executed by the processor to realize the content of the video processing method embodiment described above
  • the video search program is executed by the processor to realize the content of the video search method embodiment described above .
  • the embodiment of the present application also provides a computer-readable storage medium storing a video processing program or a video search program; the video processing program, when executed by a processor, implements the content of the video processing method embodiments described above, and the video search program, when executed by the processor, implements the content of the video search method embodiments described above.
  • the technical solution of this application, in essence or the part contributing to the prior art, can be embodied in the form of a software product; the computer software product is stored in a computer-readable storage medium as described above (such as a ROM/RAM, magnetic disk, or optical disk) and includes several instructions to cause a terminal device (which can be a mobile phone, a computer, a server, an air conditioner, a network device, etc.) to execute the methods described in the various embodiments of the present application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Computational Linguistics (AREA)
  • Television Signal Processing For Recording (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A video processing method, comprising: editing a video to be edited according to scenes and obtaining a target video (S100); acquiring feature parameters of the target video (S200); generating keywords of the target video according to the feature parameters (S300); and storing the keywords in association with the target video (S400).

Description

Video processing method, video search method, terminal device, and computer-readable storage medium
This application claims priority to Chinese patent application No. 201911424339.3, filed on December 30, 2019 and entitled "Video processing method, video search method and terminal device", which is hereby incorporated by reference in its entirety.
Technical Field
This application relates to the field of image processing technology, and in particular to a video processing method, a video search method, a terminal device, and a computer-readable storage medium.
Background
With the popularization of the Internet, it has become increasingly easy for viewers to obtain movies and TV series. Because movies and TV series are long, viewers sometimes only want to watch certain highlight segments; when highlights are obtained by searching keywords, the keywords are often weakly related to the segments and describe them inaccurately.
The above content is only intended to assist in understanding the technical solution of this application and does not constitute an admission that it is prior art.
Technical Solution
The main purpose of the embodiments of this application is to provide a video processing method, which aims to solve the prior-art technical problem that, when highlights are obtained by searching keywords, the keywords are often weakly related to the segments and the description is inaccurate.
To solve the above problem, an embodiment of this application provides a video processing method, including the following:
editing a video to be edited according to scenes, and obtaining a target video;
acquiring feature parameters of the target video;
generating keywords of the target video according to the feature parameters;
storing the keywords in association with the target video.
In an embodiment, the step of acquiring the feature parameters of the target video includes:
extracting multiple image frames of the target video;
acquiring sub-feature parameters of the multiple image frames;
acquiring the feature parameters of the target video according to the sub-feature parameters.
In an embodiment, the step of acquiring the sub-feature parameters of the multiple image frames includes:
acquiring person information in the multiple image frames;
acquiring the sub-feature parameters according to the behavior characteristics and human body features of the person corresponding to the person information.
In an embodiment, the step of generating keywords of the target video according to the feature parameters includes:
acquiring the behavior feature category corresponding to the behavior characteristics;
acquiring identity information corresponding to the human body features;
setting the behavior feature category and the identity information as keywords of the target video.
In an embodiment, the step of acquiring the identity information corresponding to the human body features further includes:
comparing the human body features with preset human body features, and obtaining a comparison result;
acquiring the preset human body feature corresponding to the human body feature according to the comparison result;
acquiring the identity information according to the preset human body feature corresponding to the human body feature.
In an embodiment, the step of editing the video to be edited according to scenes and obtaining the target video includes:
sequentially extracting multiple image frames of the video to be edited at a preset frame rate;
converting the image frames into corresponding grayscale images;
determining whether there is a scene change between adjacent image frames according to the grayscale images of the adjacent image frames;
when there is a scene change between adjacent image frames, using the adjacent image frames with the scene change as segmented frames;
editing the video to be edited according to the segmented frames to obtain the target video.
In an embodiment, the step of determining whether there is a scene change between adjacent image frames according to the grayscale images of the adjacent image frames includes:
extracting image blocks from the grayscale images corresponding to the adjacent image frames respectively, where the positions and sizes of the image blocks extracted from the adjacent image frames are the same;
acquiring the number of pixels within each preset grayscale range in each image block;
acquiring the absolute value of the difference between the corresponding numbers of the adjacent image frames within each preset grayscale range;
summing the absolute values of the differences to obtain a sum value;
determining whether the sum value is greater than a preset threshold, where it is determined that there is a scene change between the adjacent image frames when the sum value is greater than the preset threshold.
In addition, to solve the above problem, an embodiment of this application further provides a video search method, including the following:
acquiring a target keyword input from a search interface;
searching a preset database for a target video according to the target keyword, and displaying the target video corresponding to the target keyword.
An embodiment of this application further provides a terminal device. The terminal device includes a processor, a memory, and a video processing program or a video search program stored in the memory and executable on the processor; the video processing program, when executed by the processor, implements the steps of the video processing method described above, and the video search program, when executed by the processor, implements the steps of the video search method described above.
An embodiment of this application further provides a computer-readable storage medium storing a video processing program or a video search program; the video processing program, when executed by a processor, implements the steps of the video processing method described above, and the video search program, when executed by the processor, implements the steps of the video search method described above.
In the video processing method proposed in the embodiments of this application, editing is performed according to scene changes, which ensures that the target video is within a single scene and effectively improves the accuracy of identifying the feature parameters in the target video; corresponding keywords are generated from the feature parameters of the target video, so that the target video and the keywords are strongly related and the description is accurate.
Brief Description of the Drawings
FIG. 1 is a schematic diagram of the structure of a terminal in the hardware operating environment involved in the solutions of the embodiments of this application;
FIG. 2 is a schematic flowchart of a first embodiment of the video processing method of this application;
FIG. 3 is a schematic flowchart of a second embodiment of the video processing method of this application;
FIG. 4 is a schematic flowchart of a third embodiment of the video processing method of this application;
FIG. 5 is a schematic flowchart of a fourth embodiment of the video processing method of this application;
FIG. 6 is a schematic flowchart of a fifth embodiment of the video processing method of this application;
FIG. 7 is a schematic flowchart of a sixth embodiment of the video processing method of this application;
FIG. 8 is a schematic flowchart of a seventh embodiment of the video processing method of this application;
FIG. 9 is a schematic flowchart of an eighth embodiment of the video processing method of this application.
The realization of the purpose, functional characteristics, and advantages of this application will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Embodiments of the Invention
It should be understood that the specific embodiments described herein are only used to explain this application and are not intended to limit it.
The main solution of the embodiments of this application is: editing a video to be edited according to scenes and obtaining a target video; acquiring feature parameters of the target video; generating keywords of the target video according to the feature parameters; and storing the keywords in association with the target video.
In the prior art, when highlights are obtained by searching keywords, there is often the technical problem that the keywords are weakly related to the segments and the description is inaccurate.
The embodiments of this application provide a solution in which editing is performed according to scene changes, which ensures that the target video is within a single scene and effectively improves the accuracy of identifying the feature parameters in the target video; corresponding keywords are generated from the feature parameters of the target video, so that the target video and the keywords are strongly related and the description is accurate.
As shown in FIG. 1, FIG. 1 is a schematic diagram of the structure of a terminal in the hardware operating environment involved in the solutions of the embodiments of this application.
The execution subject of the embodiments of this application may be a PC, or a mobile or non-mobile terminal device such as a smartphone, a tablet computer, or a portable computer.
As shown in FIG. 1, the terminal device may include: a processor 1001, such as a CPU, a communication bus 1002, and a memory 1003. The communication bus 1002 is configured to realize connection and communication between these components. The memory 1003 may be a high-speed RAM memory, or a stable memory (non-volatile memory), such as a magnetic disk memory. The memory 1003 may also be a storage device independent of the aforementioned processor 1001.
Those skilled in the art can understand that the structure of the terminal device shown in FIG. 1 does not constitute a limitation on the terminal, which may include more or fewer components than shown, or combine certain components, or have a different arrangement of components.
As shown in FIG. 1, the memory 1003, as a computer storage medium, may include an operating system, a video processing program, or a video search program, and the processor 1001 may be configured to call the video processing program stored in the memory 1003 and execute the following steps:
editing a video to be edited according to scenes, and obtaining a target video;
acquiring feature parameters of the target video;
generating keywords of the target video according to the feature parameters;
storing the keywords in association with the target video.
Further, the processor 1001 may be configured to call the video processing program stored in the memory 1003 and execute the following steps:
extracting multiple image frames of the target video;
acquiring sub-feature parameters of the multiple image frames;
acquiring the feature parameters of the target video according to the sub-feature parameters.
Further, the processor 1001 may be configured to call the video processing program stored in the memory 1003 and execute the following steps:
acquiring person information in the multiple image frames;
acquiring the sub-feature parameters according to the behavior characteristics and human body features of the person corresponding to the person information.
Further, the processor 1001 may be configured to call the video processing program stored in the memory 1003 and execute the following steps:
acquiring the behavior feature category corresponding to the behavior characteristics;
acquiring identity information corresponding to the human body features;
setting the behavior feature category and the identity information as keywords of the target video.
Further, the processor 1001 may be configured to call the video processing program stored in the memory 1003 and execute the following steps:
comparing the human body features with preset human body features, and obtaining a comparison result;
acquiring the preset human body feature corresponding to the human body feature according to the comparison result;
acquiring the identity information according to the preset human body feature corresponding to the human body feature.
Further, the processor 1001 may be configured to call the video processing program stored in the memory 1003 and execute the following steps:
sequentially extracting multiple image frames of the video to be edited at a preset frame rate;
converting the image frames into corresponding grayscale images;
determining whether there is a scene change between adjacent image frames according to the grayscale images of the adjacent image frames;
when there is a scene change between adjacent image frames, using the adjacent image frames with the scene change as segmented frames;
editing the video to be edited according to the segmented frames to obtain the target video.
Further, the processor 1001 may be configured to call the video processing program stored in the memory 1003 and execute the following steps:
extracting image blocks from the grayscale images corresponding to the adjacent image frames respectively, where the positions and sizes of the image blocks extracted from the adjacent image frames are the same;
acquiring the number of pixels within each preset grayscale range in each image block;
acquiring the absolute value of the difference between the corresponding numbers of the adjacent image frames within each preset grayscale range;
summing the absolute values of the differences to obtain a sum value;
determining whether the sum value is greater than a preset threshold, where it is determined that there is a scene change between the adjacent image frames when the sum value is greater than the preset threshold.
Further, the processor 1001 may be configured to call the video search program stored in the memory 1003 and execute the following steps:
acquiring a target keyword input from a search interface;
searching a preset database for a target video according to the target keyword, and displaying the target video associated with the target keyword.
Based on the above terminal structure, a first embodiment of this application is proposed. Referring to FIG. 2, FIG. 2 is a schematic flowchart of the first embodiment of the video processing method of this application. The video processing method includes the following steps:
Step S100: editing a video to be edited according to scenes, and obtaining a target video;
In this embodiment, the execution subject is a terminal device. The video to be edited can be any editable video, such as a movie, a TV series, or a recorded video.
After the video to be edited is obtained, multiple image frames of the video to be edited are extracted at a preset frame rate, where the preset frame rate refers to the number of video frames extracted per unit time and can be set as required, for example 50 frames per minute. It can be understood that the higher the preset frame rate, the higher the editing accuracy.
The scene can be determined according to the content changes between adjacent image frames among the multiple image frames. When the scene changes, the segmented frames corresponding to the video to be edited are determined, and the target video is then obtained. It can be understood that the target video can be the video of any scene in the video to be edited. The duration of the target video is determined by the scene of the video to be edited, for example 3 minutes.
Optionally, the video to be edited can be edited into multiple target videos of different scenes.
Optionally, the video to be edited can be edited with any of ffmpeg, shotdetect, and pyscenedetect. Considering both speed and accuracy, the ffmpeg method is preferred for editing.
Step S200: acquiring feature parameters of the target video;
The video to be edited is edited according to scene changes to obtain the target video. The feature parameters may include one or more of scene parameters, person information, and sound parameters. Because the target video is edited according to scenes, the scene parameters are relatively stable, such as a playground, a bus, an indoor setting, or a beach; person information can include a person's behavior characteristics and identity information; sound parameters can include one or more of key information in speech, volume, tone, and noise. Scene parameters and person information can be recognized through image recognition technology, and sound parameters can be recognized through speech recognition technology.
Step S300: generating keywords of the target video according to the feature parameters;
The acquired feature parameters can be matched with the feature parameters pre-stored in the database of the terminal device; when the matching degree is high, the keywords corresponding to the feature parameters are obtained, and the keywords of the target video are then generated.
Optionally, keywords corresponding to the scene can be generated according to the scene parameters, such as "beach"; keywords for the person's behavior characteristics and identity information can be generated from the person information, for example "sunbathing" as a behavior keyword and a certain public figure as an identity keyword; and sound keywords, such as "noisy", can be generated from the sound parameters. From this information, a keyword such as "a certain public figure is sunbathing on a noisy beach" can be derived.
Step S400: storing the keywords in association with the target video.
After the keywords of the target video are generated, the keywords are associated with the target video, and the target video together with the keywords associated with it is stored on the terminal device; they can also be stored in a cloud database.
In this embodiment, editing is performed according to scene changes, which ensures that the target video is within a single scene and effectively improves the accuracy of identifying the feature parameters in the target video; corresponding keywords are generated from the feature parameters of the target video, so that the target video and the keywords are strongly related and the description is accurate.
Referring to FIG. 3, FIG. 3 is a schematic flowchart of the second embodiment of the video processing method of this application, and FIG. 3 is also a detailed flowchart of step S200. Based on the above first embodiment, step S200 includes:
Step S210: extracting multiple image frames of the target video;
Step S220: acquiring sub-feature parameters of the multiple image frames;
Step S230: acquiring the feature parameters of the target video according to the sub-feature parameters.
In this embodiment, extracting multiple image frames from the target video at a predetermined frame rate can reduce the number of video frames processed by the terminal device, thereby improving the efficiency of acquiring the content of the target video.
The sub-feature parameters of each image frame can be identified one by one. Since an image frame carries no sound information, the sub-feature parameters include at least one of scene parameters and person information.
Optionally, since the target video is within a single scene and the scene parameter is fixed, the person information of the sub-feature parameters is mainly obtained from each image frame.
Optionally, the multiple image frames are input into a neural network model, and features of the people and scenes in the multiple image frames can be extracted through a three-dimensional convolution network to obtain the person information and scene information.
Since the target video is within a single scene, the scene parameter of any one of the multiple image frames can be used as the scene parameter of the feature parameters of the target video; the target video contains only one behavior, so the person information of the sub-feature parameters can be integrated to obtain the person information of the feature parameters. For example, the behavior characteristic of squatting can consist of three sub-behavior characteristics: the target person standing, the target person bending the legs, and the target person squatting. As another example, after the facial features of the target person in each image frame are obtained, the average of each facial feature, such as the distance between the eyes, the eye size, and the thickness of the lips, can be calculated to obtain the overall facial features across the image frames.
According to the sub-feature parameters, the feature parameters can include at least one of scene parameters and person information.
In this embodiment, multiple image frames are extracted from the target video, sub-feature parameters are extracted from each image frame, and the feature parameters are obtained; combining the sub-feature parameters of multiple image frames enables the feature parameters to reflect the key information of the target video as a whole, so that the target video is understood more accurately.
Referring to FIG. 4, FIG. 4 is a schematic flowchart of the third embodiment of the video processing method of this application. Based on the above second embodiment, FIG. 4 is also a detailed flowchart of step S220 in FIG. 3; acquiring the sub-feature parameters of the multiple image frames includes:
Step S221: acquiring person information in the multiple image frames;
Step S222: acquiring the sub-feature parameters according to the behavior characteristics and human body features of the person corresponding to the person information.
Since searching for short videos or highlights by person information better matches how users actually search, the sub-feature parameters in this embodiment include person information.
The person information can include a person's behavior characteristics and human body features. The behavior characteristics can be any behavior of human activity, such as waving, arguing, or running, and can include the actions of the target person in each image frame; the human body features can include at least one of the facial features, iris features, and body shape features of the target person in each image frame.
A neural network can identify the target person in an image frame, the position coordinates of the target person in the image frame, the time point at which the target person's behavior starts, and the time point at which it ends; when there are multiple people in an image frame, there can be more than one target person.
In this embodiment, person information is acquired from the multiple image frames, and the sub-feature parameters are acquired according to the behavior characteristics and human body features of the corresponding person; using highly recognizable person information as feature parameters better matches the user's search logic and makes the understanding of the target video more vivid.
Referring to FIG. 5, FIG. 5 is a schematic flowchart of the fourth embodiment of the video processing method of this application, and FIG. 5 is also a detailed flowchart of step S300 in FIG. 4. Based on the above third embodiment, step S300 includes:
Step S310: acquiring the behavior feature category corresponding to the behavior characteristics;
Step S320: acquiring identity information corresponding to the human body features;
Step S330: setting the behavior feature category and the identity information as keywords of the target video.
In this embodiment, the behavior feature category can be any kind of human action, such as dancing, squatting, or skating; the identity information can include one or more of a public figure's name, gender, and age.
After the behavior characteristics of the target person in the target video are acquired, they are classified to obtain the behavior feature category corresponding to the behavior characteristics.
Optionally, the extracted multiple image frames can be input into the neural network model; the model identifies the position coordinates of the target person in each image frame, a three-dimensional convolution network then extracts the behavior characteristics of the target person according to those position coordinates and obtains corresponding weights, and the behavior feature category corresponding to the target person's behavior characteristics is calculated from the behavior characteristics and the corresponding weights.
Optionally, the neural network model can be trained on tens of thousands of image frames with known behavior characteristics; through a loss function, the behavior feature category computed by the model is compared with the actual behavior characteristics, and the parameters of the model are continuously optimized to improve the accuracy with which the model recognizes the behavior characteristics of people in image frames.
The human body features of the target person in the target video can be compared with the human body features of people with known identity information pre-stored on the terminal device. For example, facial features are compared with preset facial features; when the correlation is higher than a preset value and is the highest, the identity information of the person with the high correlation is used as the identity information of the target person, and the identity information of the target person is obtained. Iris features and body shape features are handled similarly and are not repeated here.
The behavior feature category is set as the behavior keyword of the target video, and the identity information of the target person in the target video is set as the person keyword of the target video. For example, in a target video a man takes a box from a woman. By extracting three image frames, it can be determined that the target persons are a man and a woman, and the behavior characteristics of the man and of the woman can be identified respectively; the identity information of a target person, such as Will Smith, can be identified through any of the target person's human body features, such as the face, iris, or body shape. The behavior feature category of the target person is set as the behavior keyword of the target video, namely taking a box; the identity information of the target persons is set as the person keywords of the target video, such as Will Smith and a woman. Combining the behavior keyword and the person keywords of the target video, the keyword of the target video can be derived as "Will Smith takes a box from a woman".
Optionally, the time points at which the target person's behavior starts and ends can also be added to the keywords of the target video, for example "11 minutes 13 seconds to 12 minutes 14 seconds, Will Smith takes a box from a woman".
In this embodiment, the behavior feature category and the identity information of the target person are acquired from the behavior characteristics and human body features of the person information of the target video, the behavior feature category and the identity information are set as keywords of the target video, and the behavior and identity information of the people in the target video are recognized and converted into keywords of the target video, which can accurately summarize the behavior and identity of the people in the target video.
Referring to FIG. 6, FIG. 6 is a schematic flowchart of the fifth embodiment of the video processing method of this application, and FIG. 6 is also a detailed flowchart of step S320 in FIG. 5. Based on the above fourth embodiment, step S320 includes:
Step S321: comparing the human body features with preset human body features, and obtaining a comparison result;
Step S322: acquiring the preset human body feature corresponding to the human body feature according to the comparison result;
Step S323: acquiring the identity information according to the preset human body feature corresponding to the human body feature.
After the human body features of the person information in the target video are acquired, the human body features can include one or more of facial features, iris features, and body shape features. The preset human body feature corresponds to the human body feature: if the human body feature is a facial feature, the corresponding preset feature is a preset facial feature; if it is an iris feature, the corresponding preset feature is a preset iris feature; if it is a body shape feature, the corresponding preset feature is a preset body shape feature; and if there are multiple human body features, there are correspondingly multiple preset human body features. Facial features are taken as an example for explanation.
The facial features of the person information are compared with the preset facial features in the database of the terminal device, where the identity information of the people corresponding to the preset facial features is known.
The comparison result can be determined according to whether the difference between the feature value of the facial feature and the feature value of the preset facial feature is greater than a preset difference, where the comparison result is either a successful match or a failed match.
When the comparison result is a successful match, the identity information of the preset facial feature corresponding to the facial feature is used as the identity information of the person having the facial feature.
Optionally, the feature value can be a 128-dimensional vector of a human face. The 128-dimensional vector of the target person in the target video can be obtained from the facial features, and the vector difference between the target person's 128-dimensional vector and the 128-dimensional vector of a preset facial feature is computed to obtain a difference value; when the difference value is less than or equal to a preset value, the identity information corresponding to the preset facial feature is used as the identity information corresponding to the human body feature. If the differences between the facial feature and all preset facial features in the database are greater than the preset value, the target person corresponding to the facial feature is not a known public figure, and identity information corresponding to the facial feature, such as "elderly woman", can be derived from gender and age.
Further, the identity information of the preset human body feature corresponding to the human body feature is used as the identity information of the person having the human body feature.
In this embodiment, by comparing the human body features with the preset human body features, a comparison result is obtained, the preset human body feature corresponding to the human body feature is determined, and the identity information of the preset human body feature is used as the identity information of the person having the human body feature, so the identity information of the target person in the target video can be accurately identified.
Referring to FIG. 7, FIG. 7 is a schematic flowchart of the sixth embodiment of the video processing method of this application, and FIG. 7 is also a detailed flowchart of step S100 in FIG. 6. Based on any of the above first to fifth embodiments, step S100 includes:
Step S110: sequentially extracting multiple image frames of the video to be edited at a preset frame rate;
The preset frame rate can be set according to the designer's needs, balancing editing accuracy against editing efficiency, for example 30 frames per minute. Multiple image frames of the video to be edited are extracted sequentially at the preset frame rate; it can be understood that the multiple image frames are acquired in order and at equal time intervals.
Step S120: converting the image frames into corresponding grayscale images;
Step S130: determining whether there is a scene change between adjacent image frames according to the grayscale images of the adjacent image frames;
Step S140: when there is a scene change between adjacent image frames, using the adjacent image frames with the scene change as segmented frames;
Step S150: editing the video to be edited according to the segmented frames to obtain the target video.
Each extracted image frame is converted into a grayscale image, and whether there is a scene change can be determined by comparing the amount of content change between the grayscale images of adjacent image frames. It can be understood that when the content change between the grayscale images of adjacent image frames is greater than a set value, the scene is considered to have changed; when it is less than or equal to the set value, the scene is considered unchanged.
When there is a scene change between adjacent image frames, the adjacent image frames with the scene change are used as segmented frames: the previous image frame can be set as the ending segmented frame of the previous target video, and the next image frame can be set as the starting segmented frame of the next target video. It can be understood that there may be multiple segmented frames in the video to be edited, which can then be segmented into target videos of different scenes.
The video to be edited is edited according to the segmented frames, and the target video is then obtained.
In this embodiment, multiple image frames of the video to be edited are extracted and converted into grayscale images, whether the scene changes is determined from the grayscale images of adjacent image frames, the adjacent image frames with a scene change are used as segmented frames, and the video to be edited is edited according to the segmented frames to obtain the target video, making the editing accurate, simple, and efficient.
Referring to FIG. 8, FIG. 8 is a schematic flowchart of the seventh embodiment of the video processing method of this application, and FIG. 8 is also a detailed flowchart of step S130 in FIG. 7. Based on the above sixth embodiment, step S130 includes:
Step S131: extracting image blocks from the grayscale images corresponding to the adjacent image frames respectively, where the positions and sizes of the image blocks extracted from the adjacent image frames are the same;
In this embodiment, image blocks are extracted from the grayscale images corresponding to adjacent image frames, where the coordinates of the upper-left corner of the image block are randomly generated and the size of the image block is also randomly generated. It can be understood that the positions and sizes of the image blocks extracted from adjacent image frames are the same, which facilitates the subsequent comparison.
Step S132: acquiring the number of pixels within each preset grayscale range in each image block;
Step S133: acquiring the absolute value of the difference between the corresponding numbers of the adjacent image frames within each preset grayscale range;
Step S134: summing the absolute values of the differences to obtain a sum value;
An image block is composed of pixels; for example, an image block of 10 by 10 pixels contains 100 pixels. A pixel has a grayscale value, which can be an integer between 0 and 255. The preset grayscale ranges can be set as required, such as 0-4, 5-9, 10-14, and so on. It can be understood that the smaller the preset grayscale range, the higher the accuracy but the lower the speed.
Each pixel corresponds to one preset grayscale range, and the number of pixels of each image block within each preset grayscale range can be obtained respectively.
After the number of pixels of each image block within each preset grayscale range is obtained, the difference between the corresponding pixel counts of the image blocks of adjacent image frames within each preset grayscale range is calculated, the absolute value of each difference is obtained, and the absolute values of the differences are summed to obtain a sum value.
Optionally, the sum value can be calculated by drawing a histogram of each image block and using the number of pixels within each preset grayscale range in the histogram and the absolute values of the differences between those numbers.
Step S135: determining whether the sum value is greater than a preset threshold, where it is determined that there is a scene change between the adjacent image frames when the sum value is greater than the preset threshold.
The preset threshold can be a critical value set by the designer for determining whether the scene changes. Whether the sum value is greater than the preset threshold is judged to determine whether there is a scene change between adjacent image frames: when the sum value is less than or equal to the preset threshold, it is determined that there is no scene change between the adjacent image frames; when the sum value is greater than the preset threshold, it is determined that there is a scene change.
In this embodiment, image blocks of adjacent image frames are acquired, the number of pixels of each block within each preset grayscale range and the absolute values of the differences between those numbers are calculated, and a sum value is obtained; whether there is a scene change between adjacent image frames is determined from the relationship between the sum value and the preset threshold. Determining scene changes through random extraction and one-by-one comparison makes the video editing accurate and comprehensive.
Referring to FIG. 9, FIG. 9 is a schematic flowchart of the eighth embodiment of this application, a video search method. The video search method includes:
Step S500: acquiring a target keyword input from a search interface;
Step S600: searching a preset database for a target video according to the target keyword, and displaying the target video associated with the target keyword.
In this embodiment, the target video is obtained based on the above video processing method, that is, a long video is divided into multiple short videos according to scenes; the target video is stored in a preset database, and the keywords associated with the target video are also stored in the preset database.
The terminal device can output a search interface on the current interface and acquire the target keyword input by the user through the search interface. The target keyword can be a query sentence input by the user. The preset database can include at least one of a cloud database and a local database.
The terminal device can perform a matching search in the preset database according to the target keyword, find the associated keywords corresponding to the target keyword in the preset database, obtain the corresponding target videos according to the associated keywords, and display the target videos corresponding to the keyword on the current display interface in order of matching similarity.
In this embodiment, the target keyword input by the user is acquired, the preset database is searched for target videos according to the target keyword, and the target videos corresponding to the target keyword are displayed; because the keywords associated with each target video are strongly relevant and describe it accurately, entering a target keyword retrieves the corresponding target video with high search accuracy.
In addition, an embodiment of this application further provides a terminal device. The terminal device includes a processor, a memory, and a video processing program or a video search program stored in the memory and executable on the processor; the video processing program, when executed by the processor, implements the content of the video processing method embodiments described above, and the video search program, when executed by the processor, implements the content of the video search method embodiments described above.
An embodiment of this application further provides a computer-readable storage medium storing a video processing program or a video search program; the video processing program, when executed by a processor, implements the content of the video processing method embodiments described above, and the video search program, when executed by the processor, implements the content of the video search method embodiments described above.
The serial numbers of the above embodiments of this application are for description only and do not represent the superiority or inferiority of the embodiments.
It should be noted that, in this document, the terms "include", "comprise", or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or apparatus that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article, or apparatus that includes that element.
Through the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus a necessary general-purpose hardware platform, and of course also by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of this application, in essence or the part contributing to the prior art, can be embodied in the form of a software product; the computer software product is stored in a computer-readable storage medium as described above (such as a ROM/RAM, magnetic disk, or optical disk) and includes several instructions to cause a terminal device (which can be a mobile phone, a computer, a server, an air conditioner, a network device, etc.) to execute the methods described in the various embodiments of this application.
The above are only preferred embodiments of this application and do not thereby limit the patent scope of this application. Any equivalent structural or equivalent process transformation made using the contents of the specification and drawings of this application, or any direct or indirect application in other related technical fields, is likewise included in the patent protection scope of this application.

Claims (14)

  1. A video processing method, wherein the video processing method comprises the following steps:
    editing a video to be edited according to scenes, and obtaining a target video;
    acquiring feature parameters of the target video;
    generating keywords of the target video according to the feature parameters; and
    storing the keywords in association with the target video.
  2. The video processing method according to claim 1, wherein the step of acquiring the feature parameters of the target video comprises:
    extracting multiple image frames of the target video;
    acquiring sub-feature parameters of the multiple image frames; and
    acquiring the feature parameters of the target video according to the sub-feature parameters.
  3. The video processing method according to claim 2, wherein the step of acquiring the sub-feature parameters of the multiple image frames comprises:
    acquiring person information in the multiple image frames; and
    acquiring the sub-feature parameters according to the behavior characteristics and human body features of the person corresponding to the person information.
  4. The video processing method according to claim 3, wherein the step of generating keywords of the target video according to the feature parameters comprises:
    acquiring a behavior feature category corresponding to the behavior characteristics;
    acquiring identity information corresponding to the human body features; and
    setting the behavior feature category and the identity information as keywords of the target video.
  5. The video processing method according to claim 4, further comprising:
    adding a behavior start time point and a behavior end time point of a target person to the keywords of the target video.
  6. The video processing method according to claim 3, wherein the step of acquiring the identity information corresponding to the human body features further comprises:
    comparing the human body features with preset human body features, and obtaining a comparison result;
    acquiring the preset human body feature corresponding to the human body feature according to the comparison result; and
    acquiring the identity information according to the preset human body feature corresponding to the human body feature.
  7. The video processing method according to claim 6, wherein the human body features comprise one or more of facial features, iris features, and body shape features, and the preset human body features correspond to the human body features.
  8. The video processing method according to claim 6, further comprising:
    determining that the comparison result is a successful match, and using the identity information of the preset human body feature corresponding to the human body feature as the identity information of the person having the human body feature.
  9. The video processing method according to claim 1, wherein the step of editing the video to be edited according to scenes and obtaining the target video comprises:
    sequentially extracting multiple image frames of the video to be edited at a preset frame rate;
    converting the image frames into corresponding grayscale images;
    determining whether there is a scene change between adjacent image frames according to the grayscale images of the adjacent image frames;
    when there is a scene change between adjacent image frames, using the adjacent image frames with the scene change as segmented frames; and
    editing the video to be edited according to the segmented frames to obtain the target video.
  10. The video processing method according to claim 9, wherein the step of determining whether there is a scene change between adjacent image frames according to the grayscale images of the adjacent image frames comprises:
    extracting image blocks from the grayscale images corresponding to the adjacent image frames respectively, where the positions and sizes of the image blocks extracted from the adjacent image frames are the same;
    acquiring the number of pixels within each preset grayscale range in each image block;
    acquiring the absolute value of the difference between the corresponding numbers of the adjacent image frames within each preset grayscale range;
    summing the absolute values of the differences to obtain a sum value; and
    determining whether the sum value is greater than a preset threshold, wherein it is determined that there is a scene change between the adjacent image frames when the sum value is greater than the preset threshold.
  11. The video processing method according to claim 1, wherein the feature parameters comprise one or more of scene parameters, person information, and sound parameters.
  12. A video search method, wherein the video search method comprises the following steps:
    acquiring a target keyword input from a search interface; and
    searching a preset database for a target video according to the target keyword, and displaying the target video associated with the target keyword, wherein the target video is obtained based on the video processing method according to any one of claims 1 to 4, 6, and 9 to 10.
  13. A terminal device, wherein the terminal device comprises a processor, a memory, and a video processing program or a video search program stored in the memory and executable on the processor; the video processing program, when executed by the processor, implements the steps of the video processing method according to any one of claims 1 to 4, 6, and 9 to 10, and the video search program, when executed by the processor, implements the steps of the video search method according to claim 12.
  14. A computer-readable storage medium, wherein the computer-readable storage medium stores a video processing program or a video search program; the video processing program, when executed by a processor, implements the steps of the video processing method according to any one of claims 1 to 4, 6, and 9 to 10, and the video search program, when executed by the processor, implements the steps of the video search method according to claim 12.
PCT/CN2020/111032 2019-12-30 2020-08-25 Video processing method, video search method, terminal device, and computer-readable storage medium WO2021135286A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP20908901.0A EP4086786A4 (en) 2019-12-30 2020-08-25 VIDEO PROCESSING METHOD, VIDEO SEARCHING METHOD, TERMINAL AND COMPUTER-READABLE STORAGE METHOD
US17/758,179 US12001479B2 (en) 2019-12-30 2020-08-25 Video processing method, video searching method, terminal device, and computer-readable storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911424339.3 2019-12-30
CN201911424339.3A CN111177470B (zh) 2019-12-30 2019-12-30 Video processing method, video search method and terminal device

Publications (1)

Publication Number Publication Date
WO2021135286A1 true WO2021135286A1 (zh) 2021-07-08

Family

ID=70646548

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/111032 WO2021135286A1 (zh) 2019-12-30 2020-08-25 Video processing method, video search method, terminal device, and computer-readable storage medium

Country Status (4)

Country Link
US (1) US12001479B2 (zh)
EP (1) EP4086786A4 (zh)
CN (1) CN111177470B (zh)
WO (1) WO2021135286A1 (zh)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111177470B (zh) 2019-12-30 2024-04-30 深圳Tcl新技术有限公司 Video processing method, video search method and terminal device
CN111711771B (zh) * 2020-05-20 2022-09-30 北京奇艺世纪科技有限公司 Image selection method and apparatus, electronic device, and storage medium
CN114697700A (zh) * 2020-12-28 2022-07-01 北京小米移动软件有限公司 Video editing method, video editing apparatus, and storage medium
CN113542818B (zh) * 2021-07-16 2023-04-25 北京字节跳动网络技术有限公司 Video display method, video editing method, and apparatus
CN114139015A (zh) * 2021-11-30 2022-03-04 招商局金融科技有限公司 Video storage method, apparatus, device, and medium based on key event recognition
CN116431857B (zh) * 2023-06-14 2023-09-05 山东海博科技信息系统股份有限公司 Video processing method and system for unmanned scenes

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120230588A1 (en) * 2009-11-13 2012-09-13 JVC Kenwood Corporation Image processing device, image processing method and image processing program
CN109508406A (zh) * 2018-12-12 2019-03-22 北京奇艺世纪科技有限公司 Information processing method and apparatus, and computer-readable storage medium
CN110309353A (zh) * 2018-02-06 2019-10-08 上海全土豆文化传播有限公司 Video indexing method and apparatus
CN110401873A (zh) * 2019-06-17 2019-11-01 北京奇艺世纪科技有限公司 Video editing method and apparatus, electronic device, and computer-readable medium
CN111177470A (zh) * 2019-12-30 2020-05-19 深圳Tcl新技术有限公司 Video processing method, video search method and terminal device

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8341152B1 (en) * 2006-09-12 2012-12-25 Creatier Interactive Llc System and method for enabling objects within video to be searched on the internet or intranet
US9099161B2 (en) * 2011-01-28 2015-08-04 Apple Inc. Media-editing application with multiple resolution modes
CN102682308B (zh) * 2011-03-17 2014-05-28 株式会社理光 Image processing method and image processing device
US8515241B2 (en) * 2011-07-07 2013-08-20 Gannaway Web Holdings, Llc Real-time video editing
US9440152B2 (en) * 2013-05-22 2016-09-13 Clip Engine LLC Fantasy sports integration with video content
CN103914561B (zh) * 2014-04-16 2018-04-13 北京酷云互动科技有限公司 Image search method and apparatus
CN110582025B (zh) * 2018-06-08 2022-04-01 北京百度网讯科技有限公司 Method and device for processing video

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120230588A1 (en) * 2009-11-13 2012-09-13 JVC Kenwood Corporation Image processing device, image processing method and image processing program
CN110309353A (zh) * 2018-02-06 2019-10-08 上海全土豆文化传播有限公司 Video indexing method and apparatus
CN109508406A (zh) * 2018-12-12 2019-03-22 北京奇艺世纪科技有限公司 Information processing method and apparatus, and computer-readable storage medium
CN110401873A (zh) * 2019-06-17 2019-11-01 北京奇艺世纪科技有限公司 Video editing method and apparatus, electronic device, and computer-readable medium
CN111177470A (zh) * 2019-12-30 2020-05-19 深圳Tcl新技术有限公司 Video processing method, video search method and terminal device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP4086786A4 *

Also Published As

Publication number Publication date
EP4086786A4 (en) 2024-02-21
CN111177470A (zh) 2020-05-19
CN111177470B (zh) 2024-04-30
US20230044146A1 (en) 2023-02-09
US12001479B2 (en) 2024-06-04
EP4086786A1 (en) 2022-11-09

Similar Documents

Publication Publication Date Title
WO2021135286A1 (zh) Video processing method, video search method, terminal device, and computer-readable storage medium
CN107797984B (zh) Intelligent interaction method, device, and storage medium
CN109146892B (zh) Aesthetics-based image cropping method and apparatus
KR20210144625A (ko) Video data processing method, apparatus, and readable storage medium
CN106682632B (zh) Method and apparatus for processing face images
CN111160264B (zh) Cartoon character identity recognition method based on generative adversarial networks
KR102124466B1 (ko) Apparatus and method for generating storyboards for webtoon production
CN104170374A (zh) Modifying the appearance of participants during a video conference
CN110796089B (zh) Method and device for training a face-swapping model
CN110288513B (zh) Method, apparatus, device, and storage medium for changing face attributes
CN107133567B (zh) Method and apparatus for selecting placement positions of patch-style overlay advertisements
CN109886223B (zh) Face recognition method, base-library enrollment method, apparatus, and electronic device
US20220392128A1 (en) Beauty processing method, electronic device, and computer-readable storage medium
CN114723888B (zh) Three-dimensional hair strand model generation method, apparatus, device, storage medium, and product
CN116308530A (zh) Advertisement placement method, apparatus, device, and readable storage medium
CN114677402A (zh) Poster text layout and poster generation method and related apparatus
CN113850083A (zh) Method, apparatus, device, and computer storage medium for determining broadcast style
CN112973122A (zh) Game character makeup method and apparatus, and electronic device
CN110580297A (zh) Merchant and dish matching method and apparatus based on dish images, and electronic device
CN114742991A (zh) Poster background image selection, model training, and poster generation methods and related apparatus
CN111046232B (zh) Video classification method, apparatus, and system
CN110781345B (zh) Method for obtaining a video description generation model, and video description generation method and apparatus
JP2021043841A (ja) Virtual character generation device and program
CN112714362B (zh) Method, apparatus, electronic device, and medium for determining attributes
CN113269141B (zh) Image processing method and apparatus

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20908901

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2020908901

Country of ref document: EP

Effective date: 20220801