WO2019023953A1 - Video editing method and video editing system based on a smart terminal - Google Patents

Video editing method and video editing system based on a smart terminal

Info

Publication number
WO2019023953A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
portrait
feature
character
picture
Prior art date
Application number
PCT/CN2017/095540
Other languages
English (en)
Chinese (zh)
Inventor
覃桐
Original Assignee
深圳传音通讯有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳传音通讯有限公司 filed Critical 深圳传音通讯有限公司
Priority to PCT/CN2017/095540 priority Critical patent/WO2019023953A1/fr
Publication of WO2019023953A1 publication Critical patent/WO2019023953A1/fr

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40: Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43: Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/44: Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs

Definitions

  • the present invention relates to the field of smart devices, and in particular, to a video editing method and a video editing system based on a smart terminal.
  • The image recognition algorithm distinguishes back views of the target person from front views and compares them separately, providing a video editing method tailored to the user's needs; it applies not only to video programs but also to videos from the user's daily life.
  • The method can edit a video that contains a specific character, a video that contains multiple characters, or a video that excludes a certain character, and can provide video containing the back of the target person, saving time while remaining accurate. Filtering the extracted content through interaction with the user improves accuracy further.
  • An object of the present invention is to provide a video editing method and a video editing system based on a smart terminal, which can produce a dedicated video clip that contains, or excludes, one or more specified characters according to the user's needs, and which is fast, convenient, precise, and time-saving.
  • the invention discloses a video editing method based on a smart terminal, comprising the following steps:
  • the step of acquiring a person portrait picture having a portrait element and extracting the person portrait feature of the person portrait element comprises:
  • the body contour feature of the character portrait element is extracted as a first body contour feature
  • the facial portrait feature of the person portrait element is extracted as a first facial portrait feature
  • the step of acquiring the video clip of the character to be clipped that matches the character portrait feature comprises:
  • The video clips, or the remaining video clips in the to-be-edited video other than those clips, are spliced together.
  • The foregoing steps may specifically include the following.
  • Between the step of acquiring the video segments of the to-be-edited video whose characters match the character portrait feature and the step of splicing the video segments (or the remaining segments other than them), the video editing method further includes:
  • the video clip is pushed to the user and filtered by the user to remove irrelevant video clips.
  • the invention also discloses a video editing system based on a smart terminal, comprising:
  • a video acquisition module which acquires a video file to be edited and stores the video file in the smart terminal
  • a portrait feature extraction module acquiring a portrait image of a person having a portrait element, and extracting a portrait feature of the portrait element
  • a video segment acquisition module configured to connect to the video acquisition module and the portrait feature extraction module, and acquire a video segment of the to-be-edited video that includes a character that matches the character portrait feature;
  • a video splicing module, connected to the video segment acquisition module, which splices the video clips or the remaining video segments in the to-be-edited video other than those segments.
  • the portrait feature extraction module comprises:
  • a picture obtaining unit which acquires a portrait picture of a person having a portrait element and stores it in the smart terminal;
  • a portrait element identification unit connected to the picture acquisition unit to identify a person portrait element in the person portrait picture
  • a portrait feature extraction unit, coupled to the portrait element recognition unit, which extracts the body contour feature of the person portrait element as a first body contour feature and the facial portrait feature of the person portrait element as a first facial portrait feature.
  • the video segment obtaining module includes:
  • an element extraction unit, which splits the to-be-edited video into frames, obtains each frame picture, and extracts the portrait elements in each frame picture, including character back-view elements and character face elements;
  • a feature extraction unit, connected to the element extraction unit, which extracts the body contour feature of the character back-view element as a second body contour feature, and extracts the facial portrait feature of the character face element as a second facial portrait feature;
  • a back-view picture acquisition unit, connected to the feature extraction unit, which compares the second body contour feature with the first body contour feature and, when the similarity is greater than or equal to a first similarity threshold, takes the picture corresponding to the second body contour feature as a character back-view picture;
  • a front-view picture acquisition unit, connected to the feature extraction unit, which compares the second facial portrait feature with the first facial portrait feature and, when the similarity is greater than or equal to a second similarity threshold, takes the picture corresponding to the second facial portrait feature as a character front-view picture;
  • a cutting unit, connected to the back-view picture acquisition unit and the front-view picture acquisition unit, which cuts the character back-view pictures and character front-view pictures from the to-be-edited video to form video segments.
  • the video splicing module comprises:
  • a separating unit separating audio information and video information in the video segment to form an audio portion and a video portion
  • a splicing unit connected to the separating unit, splicing the audio portion and the video portion separately to form a complete audio portion and a complete video portion;
  • a synchronization unit, coupled to the splicing unit, which synchronizes the complete audio portion with the complete video portion.
  • the video editing system further includes:
  • a video clip screening module, which pushes the video clips to the user; the user filters them to eliminate irrelevant video fragments.
  • The back of the target person can be identified, so as to obtain video clips that contain, or exclude, one or more characters.
  • FIG. 1 is a flow chart showing a video editing method in accordance with a preferred embodiment of the present invention
  • FIG. 2 is a flow chart showing a method for extracting a portrait feature of a video editing method in accordance with a preferred embodiment of the present invention
  • FIG. 3 is a flow chart showing a method for acquiring a video clip by a video editing method in accordance with a preferred embodiment of the present invention
  • FIG. 4 is a schematic flow chart of a method for splicing the video clips, or the remaining video clips in the to-be-edited video other than those clips, in accordance with a preferred embodiment of the present invention;
  • FIG. 5 is a schematic flow chart of a video editing method according to another preferred embodiment of the present invention.
  • Figure 6 is a block diagram showing the structure of a video editing system in accordance with a preferred embodiment of the present invention.
  • FIG. 7 is a schematic structural diagram of a portrait feature extraction module of a video editing system in accordance with a preferred embodiment of the present invention.
  • FIG. 8 is a schematic structural diagram of a video clip acquiring module of a video editing system in accordance with a preferred embodiment of the present invention.
  • FIG. 9 is a schematic structural diagram of a video splicing module of a video editing system in accordance with a preferred embodiment of the present invention.
  • FIG. 10 is a schematic structural diagram of a system of a video editing system in accordance with another preferred embodiment of the present invention.
  • the mobile terminal can be implemented in various forms.
  • The terminal described in the present invention may include mobile terminals such as a mobile phone, a smart phone, a notebook computer, a PDA (Personal Digital Assistant), a PAD (tablet), a PMP (Portable Multimedia Player), and a navigation device, as well as fixed terminals such as a digital TV and a desktop computer.
  • FIG. 1 is a schematic flowchart of a video editing method based on a smart terminal according to a preferred embodiment of the present invention.
  • the video editing method specifically includes the following steps:
  • S100 Acquire a video file to be edited and store it in the smart terminal.
  • To implement video editing, the video file to be edited must first be obtained.
  • the method for obtaining the video file to be edited includes importing the video in the smart terminal, and importing the video from outside the smart terminal and storing it in the smart terminal.
  • The imported video file to be edited must contain the target person the user wants to edit around. If the user imports the wrong video, i.e. one that does not include the target person, the subsequent image recognition will return no video clips, and the user is reminded that no relevant video was obtained and asked to check whether the to-be-edited video file or the target portrait picture was imported incorrectly.
  • the method of obtaining a portrait picture includes not only importing the picture in the smart terminal, but also importing the picture from outside the smart terminal and storing it in the smart terminal.
  • The imported portrait picture must match the user's needs: if the user wants video clips of the target person's frontal portrait, the provided portrait picture must contain the target person's facial element; if the user wants video clips containing the target person's back, the provided portrait picture must contain the target person's back.
  • When there is one target person, the provided portrait picture must contain the target person alone; when the number of target persons is greater than one, the user needs to provide a corresponding portrait picture for each target person. No person other than the target persons may appear in any of the pictures, though a single picture containing only several of the target persons together is acceptable.
  • The characters in the video frames need to be compared against the extracted portrait features to obtain video segments containing the characters corresponding to those features.
  • A matching strategy must be established according to the user's needs, covering the distinction between the target person's frontal picture and back picture, the number of target persons, and the distinction between the target person and other characters.
  • If the user wants only frames containing the target person's frontal portrait, only frontal portrait pictures of the target person need to be obtained; if the user wants only frames of the target person's back, only back pictures of the target person need to be acquired;
  • if both are needed, there should be a logical OR relationship between pictures containing the target person's frontal portrait and pictures containing the target person's back.
  • When there are multiple target persons, the relationship between the characters needs to be considered.
  • When the user requires all target persons to appear simultaneously, the relationship between the portrait features of the target persons should be logical AND; when the user requires any one of the target persons to appear, the relationship should be logical OR.
  • The logical relationship between the target persons can be determined by the user's needs; for example, two of them may be in a logical AND relationship while a third is in a logical OR relationship with those two.
  • Taking a TV drama as an example: if two characters should appear at the same time, the relationship between the two should be logical AND, so that only frames in which both characters appear, whether from the back or the front, are selected.
  • In addition, the number of characters extracted from a frame should be consistent with the required number of target persons in that frame. A minimal sketch of this boolean frame-selection logic follows.
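As an illustration of the AND/OR strategy above, the following Python sketch combines per-person recognition results for a single frame. The function name, the person IDs, and the mode names are illustrative assumptions, not the patent's implementation:

```python
# Hypothetical sketch of the frame-selection logic described above.
# `frame_matches` maps each target person's ID to whether that person
# was recognised in the frame.

def frame_selected(frame_matches: dict, mode: str = "all") -> bool:
    """Decide whether a frame belongs to the clip.

    mode="all"  -> logical AND: every target person must appear together.
    mode="any"  -> logical OR: at least one target person appears.
    mode="none" -> exclude frames containing any target person.
    """
    if mode == "all":
        return all(frame_matches.values())
    if mode == "any":
        return any(frame_matches.values())
    if mode == "none":
        return not any(frame_matches.values())
    raise ValueError(f"unknown mode: {mode}")

# Example: the user wants frames where persons A and B appear together.
print(frame_selected({"A": True, "B": True}, mode="all"))   # True
print(frame_selected({"A": True, "B": False}, mode="all"))  # False
```

Mixed requirements, such as two persons ANDed together with a third ORed in, can be expressed by composing these predicates.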
  • The process of obtaining the video clips in the to-be-edited video whose characters match the target person's portrait features is as follows: the to-be-edited video is split into frames and each frame picture is acquired; image transform, image enhancement, image recognition, and image segmentation techniques extract the portrait elements in each frame; portrait features are extracted from those elements by sampling and compared with the portrait features extracted from the target person's picture; if the two match, the frame is a frame containing the target person, and consecutive matching frames form a video segment.
  • After the video clips of the characters matching the target person's portrait features are obtained, the clips need to be spliced in a defined order. This can be chronological order; the order of the number of characters in the frame, from few to many or from many to few (where "few to many" means changing from frames of the target person alone toward frames that include other people, and vice versa); or the order of the target person's proportion of the video frame, from small to large or from large to small. The latter two orderings should be supplemented by chronological order; ordering by the target person's proportion of the frame is taken as the example here.
  • The character's proportion of the video frame can be calculated by dividing the area of the portrait element in the frame by the area of the frame; this calculation should be done after each frame picture has been identified.
  • If a frame with a high character proportion lies within a selected video segment, the segment is treated as one body, regardless of the proportions of the other frames in the segment. This is equivalent to representing each segment by the maximum proportion among its frames and splicing segments in descending order of that maximum, breaking ties in chronological order, as sketched below.
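A minimal sketch of this ordering rule, under the assumption that each portrait element is described by an axis-aligned bounding box (the patent does not fix a representation):

```python
# Compute the target person's proportion of a frame, then order segments
# by each segment's maximum proportion, breaking ties chronologically.

def portrait_ratio(person_box, frame_w, frame_h):
    """Area of the portrait element's bounding box divided by the frame area."""
    x, y, w, h = person_box
    return (w * h) / float(frame_w * frame_h)

def splice_order(segments):
    """`segments` is a list of (start_time, per_frame_ratios). Each segment is
    treated as one body, represented by its maximum ratio; equal maxima fall
    back to chronological order."""
    return sorted(segments, key=lambda s: (-max(s[1]), s[0]))

segments = [(12.0, [0.10, 0.35]), (3.5, [0.35, 0.20]), (40.2, [0.08])]
print(splice_order(segments))
# Both 0.35-segments tie on the maximum, so the earlier one (start 3.5) comes first.
```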
  • In this way the user can quickly and accurately obtain a dedicated video that contains, or excludes, one or more characters according to the user's needs, and video segments containing the back of the target person can be identified accurately.
  • the step of acquiring a portrait image of a person with a portrait element and extracting the portrait feature of the portrait element includes:
  • S201 Acquire a portrait picture of a person having a portrait element and store it in the smart terminal.
  • The method of obtaining the portrait picture includes both importing a picture already in the smart terminal and importing a picture from outside the smart terminal; either way, the picture is stored in the smart terminal.
  • Image transforms such as the Fourier transform, the Walsh-Hadamard transform, and the discrete Karhunen-Loève transform convert the image from the spatial domain to the frequency domain; image enhancement then strengthens the high-frequency components of the frequency-domain image, which sharpens the image edges. A minimal NumPy sketch of this step follows.
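The following sketch shows one way this frequency-domain edge enhancement could look; NumPy, the cutoff radius, and the gain factor are illustrative choices of ours, not elements of the patent:

```python
import numpy as np

def enhance_edges(gray: np.ndarray, radius: float = 30.0, boost: float = 2.0):
    """Fourier-transform a grayscale image, amplify high-frequency components,
    and transform back, which sharpens the edges."""
    spectrum = np.fft.fftshift(np.fft.fft2(gray))       # DC component centred
    h, w = gray.shape
    yy, xx = np.ogrid[:h, :w]
    dist = np.sqrt((yy - h / 2) ** 2 + (xx - w / 2) ** 2)
    gain = np.where(dist > radius, boost, 1.0)          # boost high frequencies only
    out = np.fft.ifft2(np.fft.ifftshift(spectrum * gain)).real
    return np.clip(out, 0, 255).astype(np.uint8)
```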
  • After enhancement, image recognition extracts the portrait elements in the picture through feature extraction, index building, and query steps, and image segmentation then isolates the portrait elements.
  • The feature extraction here relies on an external portrait database: recognition models for different portrait elements are built by sampling the portrait elements in the database, so that different portrait elements can be distinguished. For example, a recognition model for facial portraits is built by sampling the many facial portraits in the database, and during recognition any portion matching that model is considered a facial portrait element.
  • After the portrait element is extracted, the portrait features need to be extracted, and the type of portrait picture must be distinguished. When the portrait element is the character's back, the contour features of the back should be extracted, including the body contour and the proportions of each body part, to form the first body contour feature.
  • When the portrait element is the character's frontal portrait, the facial portrait features should be extracted, including facial skin colour, the shape, size, and relative positions of the facial features, and any distinctive facial marks, such as a black mole at the corner of the mouth. The face can also be uniformly sampled, recording the size of the facial portrait within the picture. A hedged sketch of this extraction follows.
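The patent names no concrete library, so in the sketch below OpenCV's Haar cascade face detector stands in for the recognition model built from the portrait database, and a normalised grey-level histogram stands in for the uniformly sampled facial portrait feature; both substitutions are our assumptions:

```python
import cv2
import numpy as np

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def first_facial_feature(portrait_bgr: np.ndarray) -> np.ndarray:
    """Locate the facial portrait element and return a simple feature vector."""
    gray = cv2.cvtColor(portrait_bgr, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        raise ValueError("no facial portrait element found in the picture")
    x, y, w, h = faces[0]
    face = cv2.resize(gray[y:y + h, x:x + w], (64, 64))   # normalise the size
    hist = cv2.calcHist([face], [0], None, [64], [0, 256]).flatten()
    return hist / hist.sum()                              # stand-in sampled feature
```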
  • The step of acquiring the video clips in the to-be-edited video that contain the character matching the character portrait feature may specifically include:
  • S301: Split the to-be-edited video into frames, obtain each frame picture, and extract the portrait elements in each frame picture, including character back-view elements and character face elements.
  • To recognise the portraits of persons in the to-be-edited video, the video must first be split into individual frame pictures and the portrait elements extracted from each frame. First, image transforms such as the Fourier transform, the Walsh-Hadamard transform, and the discrete Karhunen-Loève transform convert the image from the spatial domain to the frequency domain, and image enhancement strengthens the high-frequency components to sharpen the image edges. After enhancement, image recognition identifies the portrait elements through feature extraction, index building, and query steps, and image segmentation finally extracts the portrait elements, which include the characters' back-view elements and facial elements.
  • As before, the feature extraction relies on the external portrait database: recognition models for different portrait elements are built by sampling the portrait elements in the database, and during recognition any portion matching the facial portrait model is considered a facial portrait element.
  • The back-view elements are sampled to extract the body contour and the proportions of each part, forming the second body contour feature; the extraction method here should be consistent with the method used to extract the first body contour feature. After the character's face element is extracted from a frame of the to-be-edited video, the face is sampled and the facial skin colour, facial feature shapes, positional relationships, and distinctive facial marks are extracted to form the second facial portrait feature, again consistent with the method used for the first facial portrait feature. Since the target person may wear sunglasses, the sunglasses portion should be deleted when extracting the facial portrait features, and only the remaining facial portion considered.
  • The step of deleting the sunglasses portion includes: sampling a large sunglasses database to form features such as contour lines and colours and build a sunglasses model; indexing the image against the sunglasses model; and, when a portion matching the model is found, treating that portion as sunglasses and deleting it. A hedged sketch follows.
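OpenCV happens to ship an eyes-with-glasses cascade, which the sketch below uses as a stand-in for the patent's sunglasses model (an assumption, not the patent's database); masking the detected region before comparison approximates "considering only the remaining facial portion":

```python
import cv2
import numpy as np

glasses_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_eye_tree_eyeglasses.xml")

def mask_sunglasses(face_gray: np.ndarray) -> np.ndarray:
    """Zero out regions matching the stand-in sunglasses model so that only
    the remaining facial portion contributes to the feature comparison."""
    masked = face_gray.copy()
    for (x, y, w, h) in glasses_cascade.detectMultiScale(face_gray, 1.1, 5):
        masked[y:y + h, x:x + w] = 0   # drop the region from comparison
    return masked
```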
  • S303: Compare the second body contour feature with the first body contour feature, and when the similarity is greater than or equal to the first similarity threshold, obtain the picture corresponding to the second body contour feature.
  • An index is built from the first body contour feature, the second body contour feature is scaled to the same size as the first, and the scaled second feature is sampled and compared against the index, for example checking whether the body contour lines are consistent and whether the proportions of each body part are consistent. Taking the first similarity threshold as 90%: when the degree of coincidence is greater than or equal to this threshold, the similarity between the second and first body contour features is considered to meet the threshold, the persons corresponding to the two features are considered the same person, and the picture corresponding to the second body contour feature is obtained as a character back-view picture.
  • The first similarity threshold can be adjusted up or down to meet a given recognition accuracy. Since back-view recognition is difficult and error-prone, a higher threshold improves precision; to prevent frames from being missed when the threshold is set high, frames whose similarity falls between 85% and 90% can be popped up for the user to decide whether to keep them, reducing omissions. The resulting decision bands are sketched below.
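The three-way decision can be sketched as follows. The similarity score itself (contour or histogram comparison) is assumed to be computed elsewhere, and the band edges mirror the 85%-90% values given above:

```python
def back_view_decision(similarity: float,
                       accept: float = 0.90,    # first similarity threshold
                       ask_user: float = 0.85) -> str:
    """Classify a candidate back-view frame by its similarity score."""
    if similarity >= accept:
        return "accept"     # same person: keep the frame automatically
    if similarity >= ask_user:
        return "ask_user"   # borderline: pop the frame up for confirmation
    return "reject"

print(back_view_decision(0.93))  # accept
print(back_view_decision(0.87))  # ask_user
print(back_view_decision(0.60))  # reject
```

The same bands apply to the facial comparison in S304, shifted down to 85% and 80%.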
  • S304: Compare the second facial portrait feature with the first facial portrait feature, and when the similarity is greater than or equal to the second similarity threshold, obtain the picture corresponding to the second facial portrait feature.
  • An index is built from the first facial portrait feature, the second facial portrait feature is scaled to the same size as the first, and the scaled second feature is compared against the index, for example checking whether the facial skin colour is the same, whether the facial feature shapes and positional relationships are consistent, and whether distinctive facial marks, such as a black mole at the corner of the mouth, match. Taking the second similarity threshold as 85%: when the degree of coincidence is greater than or equal to this threshold, the person corresponding to the second facial portrait feature is considered the same person as that of the first facial portrait feature, and the picture corresponding to the second facial portrait feature is obtained as a character front-view picture.
  • The second similarity threshold can likewise be adjusted up or down to meet a given recognition accuracy. Since facial portrait features are more numerous and their comparison more accurate, the facial comparison threshold (the second similarity threshold) is set slightly lower than the body contour comparison threshold (the first similarity threshold). Frames whose similarity falls between 80% and 85% can be popped up for the user to decide whether to keep them, reducing omissions.
  • After the character back-view and front-view pictures are acquired, it is checked whether adjacent frames were also acquired; a run of consecutive acquired frames is regarded as one body and cut from the to-be-edited video to form a video segment, as sketched below.
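A minimal sketch of this cutting rule, where runs of consecutive matched frame indices become one segment (the data layout is an illustrative assumption):

```python
def frames_to_segments(matched_frames):
    """Group consecutive matched frame indices into (start, end) segments,
    both ends inclusive, mirroring the 'treated as one body' rule above."""
    segments = []
    for idx in sorted(matched_frames):
        if segments and idx == segments[-1][1] + 1:
            segments[-1] = (segments[-1][0], idx)   # extend the current run
        else:
            segments.append((idx, idx))             # start a new segment
    return segments

print(frames_to_segments([3, 4, 5, 9, 10, 42]))  # [(3, 5), (9, 10), (42, 42)]
```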
  • the step of splicing the video clips or the remaining video clips other than the video clips in the video to be clipped includes:
  • S401: Separate the audio information and the video information in the video clips (or in the remaining clips of the to-be-edited video other than them) to form an audio portion and a video portion.
  • The audio information in each clip needs to be separated from the video information, after which the video information contains no audio; the positional relationship between the audio information and the video information must be recorded, and the audio and video information are extracted to form an audio part and a video part.
  • The audio portions and the video portions then need to be spliced in order, forming a complete audio portion composed entirely of the audio parts and a complete video portion composed entirely of the video parts.
  • Finally, the complete audio portion is synchronized with the complete video portion according to the recorded positional relationships between the audio and video information, forming the final complete video. A hedged sketch of this separate-splice-synchronize flow follows.
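The patent names no library; the sketch below uses MoviePy's 1.x `moviepy.editor` API as an assumed stand-in, and the file names are illustrative:

```python
from moviepy.editor import (VideoFileClip, concatenate_videoclips,
                            concatenate_audioclips)

def splice_segments(paths, out_path="final.mp4"):
    """Separate each segment's audio from its video, splice both tracks in
    order, then re-attach the audio so the final video stays in sync."""
    clips = [VideoFileClip(p) for p in paths]
    complete_audio = concatenate_audioclips([c.audio for c in clips])
    complete_video = concatenate_videoclips([c.without_audio() for c in clips])
    complete_video.set_audio(complete_audio).write_videofile(out_path)

splice_segments(["segment_01.mp4", "segment_02.mp4"])  # illustrative names
```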
  • S500 Push the video clip to a user, and the user performs screening to remove irrelevant video clips.
  • By pushing the obtained video clips to the user for screening, the user can perform delete operations to eliminate misidentified, unrelated video clips.
  • a smart terminal-based video editing system 100 in accordance with a preferred embodiment of the present invention specifically includes the following components:
  • To implement video editing, the video acquisition module 11 must first obtain the video file to be edited; the methods include importing a video already in the smart terminal as well as importing a video from outside the smart terminal and storing it in the smart terminal.
  • As in the method described above, the imported video file must contain the target person the user wants to edit around; if it does not, the subsequent image recognition returns no video clips, and the user is reminded that no relevant video was obtained and asked to check whether the video file or the target portrait picture was imported incorrectly.
  • The portrait feature extraction module 13 is configured to acquire a character portrait picture having a portrait element and extract the character portrait feature of the portrait element after the to-be-edited video is acquired.
  • the method of obtaining a portrait picture includes not only importing the picture in the smart terminal, but also importing the picture from outside the smart terminal and storing it in the smart terminal.
  • The imported portrait picture must match the user's needs: if the user wants video clips of the target person's frontal portrait, the provided portrait picture must contain the target person's facial element; if the user wants video clips containing the target person's back, the provided portrait picture must contain the target person's back.
  • When there is one target person, the provided portrait picture must contain the target person alone; when the number of target persons is greater than one, the user needs to provide a corresponding portrait picture for each target person. No person other than the target persons may appear in any of the pictures, though a single picture containing only several of the target persons together is acceptable.
  • The video segment acquisition module 12 is connected to the video acquisition module 11 and the portrait feature extraction module 13. After the to-be-edited video file and the portrait picture are obtained and the character portrait features extracted, the characters in the video frames are compared against the extracted features to obtain the video clips containing the corresponding characters.
  • A matching strategy must be established according to the user's needs, covering the distinction between the target person's frontal picture and back picture, the number of target persons, and the distinction between the target person and other characters.
  • If the user wants only frames containing the target person's frontal portrait, only frontal portrait pictures of the target person need to be obtained;
  • if the user wants only frames of the target person's back, only back pictures of the target person need to be acquired;
  • if both are needed, there should be a logical OR relationship between pictures containing the target person's frontal portrait and pictures containing the target person's back.
  • When there are multiple target persons, the relationship between the characters needs to be considered.
  • When the user requires all target persons to appear simultaneously, the relationship between the portrait features of the target persons should be logical AND; when the user requires any one of the target persons to appear, the relationship should be logical OR.
  • The logical relationship between the target persons can be determined by the user's needs; for example, two of them may be in a logical AND relationship while a third is in a logical OR relationship with those two.
  • The process of obtaining the video clips in the to-be-edited video whose characters match the target person's portrait features is as follows: the to-be-edited video is split into frames and each frame picture is acquired; image transform, image enhancement, image recognition, and image segmentation techniques extract the portrait elements in each frame; portrait features are extracted from those elements by sampling and compared with the portrait features extracted from the target person's picture; if the two match, the frame is a frame containing the target person, and consecutive matching frames form a video segment.
  • The video splicing module 14 is connected to the video segment acquisition module 12. After the video clips of the characters matching the target person's portrait features are acquired, the clips must be spliced in a defined order: chronological order; the order of the number of characters in the frame, from few to many or from many to few (where "few to many" means changing from frames of the target person alone toward frames that include other people, and vice versa); or the order of the target person's proportion of the video frame, from small to large or from large to small, the latter two orderings being supplemented by chronological order. Ordering by the target person's proportion of the frame is taken as the example here.
  • The character's proportion of the video frame can be calculated by dividing the area of the portrait element in the frame by the area of the frame; this calculation should be done after each frame picture has been identified.
  • If a frame with a high character proportion lies within a selected video segment, the segment is treated as one body regardless of the proportions of its other frames and is spliced accordingly, preventing ratio-based splicing from breaking a segment apart. This is equivalent to representing each segment by the maximum proportion among its frames and splicing segments in descending order of that maximum; when the maxima are equal, the segments are spliced in chronological order.
  • the portrait feature extraction module 13 specifically includes:
  • The picture acquisition unit: after the to-be-edited video is obtained, in order to realise a video clip centred on the target person, a portrait picture of the person having a portrait element must be acquired; the methods include importing a picture already in the smart terminal as well as importing a picture from outside the smart terminal and storing it in the smart terminal.
  • The portrait element identification unit is connected to the picture acquisition unit. After the person portrait picture is acquired, the portrait element must be extracted, since the picture may contain a distracting background. First, image transforms such as the Fourier transform, the Walsh-Hadamard transform, and the discrete Karhunen-Loève transform convert the image from the spatial domain to the frequency domain; the high-frequency components of the frequency-domain image are then strengthened, sharpening the image edges.
  • After enhancement, image recognition extracts the character portrait element through feature extraction, index building, and query steps, and image segmentation then isolates the portrait element.
  • The feature extraction here relies on the external portrait database: recognition models for different portrait elements are built by sampling the portrait elements in the database, and during recognition any portion matching the facial portrait model is considered a facial portrait element.
  • The portrait feature extraction unit is connected to the portrait element recognition unit. After the portrait element is extracted, the portrait features must be extracted, distinguishing the type of portrait picture: when the portrait element is the character's back, the body contour and the proportions of each part are extracted to form the first body contour feature; when the portrait element is the character's frontal portrait, the facial portrait features are extracted, including facial skin colour, the size and positional relationships of the facial features, and distinctive facial marks such as a black mole at the corner of the mouth. The face can also be uniformly sampled, recording the size of the facial portrait within the picture.
  • the video segment obtaining module 12 specifically includes:
  • The element extraction unit: to recognise the portraits of persons in the to-be-edited video, the video is first split into individual frame pictures and the portrait elements extracted from each frame. Image transforms such as the Fourier transform, the Walsh-Hadamard transform, and the discrete Karhunen-Loève transform convert the image from the spatial domain to the frequency domain; image enhancement strengthens the high-frequency components to sharpen the image edges; image recognition then identifies the portrait elements through feature extraction, index building, and query steps; and image segmentation finally extracts the portrait elements, which include the characters' back-view elements and facial elements.
  • As described above, the feature extraction relies on the external portrait database: recognition models built by sampling the database's portrait elements distinguish the different element types, and any portion matching the facial portrait model is considered a facial portrait element.
  • The feature extraction unit is connected to the element extraction unit. After a character back-view element is extracted from a frame of the to-be-edited video, the element is sampled and the body contour, the proportions of each part, and similar features are extracted to form the second body contour feature, by a method consistent with that used for the first body contour feature. After a character face element is extracted, it is sampled and the facial skin colour, facial feature shapes, positional relationships, and distinctive facial marks such as a black mole at the corner of the mouth are extracted to form the second facial portrait feature, by a method consistent with that used for the first facial portrait feature.
  • Since the target person may wear sunglasses, the sunglasses portion must be deleted when extracting the facial portrait features, and only the remaining facial portion considered. The deletion step includes: sampling a large sunglasses database to form features such as contour lines and colours and build a sunglasses model; indexing the image against the sunglasses model; and, when a portion matching the model is found, treating that portion as sunglasses and deleting it.
  • The back-view picture acquisition unit is connected to the feature extraction unit. After the second and first body contour features are acquired, an index is built from the first body contour feature, the second is scaled to the same size as the first, and the scaled second feature is sampled and compared against the index, for example checking whether the body contour lines and the proportions of each body part are consistent. Taking the first similarity threshold as 90%: when the degree of coincidence meets the threshold, the persons corresponding to the two features are considered the same person, and the picture corresponding to the second body contour feature is obtained as a character back-view picture.
  • As before, the first similarity threshold can be adjusted to meet a given recognition accuracy. Because back-view recognition is difficult and error-prone, a higher threshold improves precision; to prevent frames from being missed when the threshold is high, frames whose similarity falls between 85% and 90% can be popped up for the user to decide whether to keep them, reducing omissions.
  • The front-view picture acquisition unit is connected to the feature extraction unit. After the second and first facial portrait features are acquired, an index is built from the first facial portrait feature, the second is scaled to the same size as the first, and the scaled second feature is compared against the index, for example checking whether the facial skin colour is the same, whether the facial feature shapes and positional relationships are consistent, and whether distinctive facial marks such as a black mole at the corner of the mouth match. Taking the second similarity threshold as 85%: when the degree of coincidence meets the threshold, the person corresponding to the second facial portrait feature is considered the same person as that of the first, and the picture corresponding to the second facial portrait feature is obtained as a character front-view picture.
  • The second similarity threshold can likewise be adjusted to meet a given recognition accuracy. Since facial portrait features are more numerous and their comparison more accurate, the second similarity threshold is set slightly lower than the first similarity threshold, and, as with the facial comparison generally, a higher threshold still improves precision; frames whose similarity falls between 80% and 85% can be popped up for the user to decide whether to keep them, reducing omissions.
  • The cutting unit is connected to the back-view picture acquisition unit and the front-view picture acquisition unit. After the character back-view and front-view pictures are acquired, it checks whether adjacent frames were also acquired; a run of consecutive acquired frames is regarded as one body and cut from the to-be-edited video to form a video segment.
  • the video splicing module 14 specifically includes:
  • The separating unit, after the video clips (or the remaining clips of the to-be-edited video other than them) are acquired, separates the audio information in each segment from the video information, so that the video information contains no audio; the positional relationship between the audio information and the video information is recorded, and the audio and video information are extracted to form an audio part and a video part.
  • The splicing unit then splices the audio portions and the video portions in order, forming a complete audio portion composed entirely of the audio parts and a complete video portion composed entirely of the video parts.
  • The synchronization unit, after the complete audio portion and the complete video portion are acquired, synchronizes them according to the recorded positional relationships between the audio and video information to form the final complete video.
  • the video editing system 100 further includes the following components:
  • The video segment screening module 15, to further improve accuracy after the video segments are obtained, pushes them to the user for screening; the user may perform delete operations to remove misidentified, unrelated video segments.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

The invention relates to a video editing method and a video editing system based on a smart terminal. The video editing method comprises the following steps: acquiring a video file to be edited and storing it in a smart terminal; acquiring a portrait picture having a portrait element and extracting a portrait feature of the portrait element; acquiring, in the video to be edited, video clips containing a person matching the portrait feature; and splicing the acquired video clips, or the remaining video clips in the video to be edited other than the acquired ones. The above technical solution can perform intelligent editing according to a user's needs, using a portrait picture imported by the user, to form a complete video that contains or excludes one or more specific persons, and provides an interaction interface for the user to re-select the extracted video clips, thereby improving accuracy.
PCT/CN2017/095540 2017-08-02 2017-08-02 Video editing method and video editing system based on a smart terminal WO2019023953A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2017/095540 WO2019023953A1 (fr) 2017-08-02 2017-08-02 Video editing method and video editing system based on a smart terminal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2017/095540 WO2019023953A1 (fr) 2017-08-02 2017-08-02 Video editing method and video editing system based on a smart terminal

Publications (1)

Publication Number Publication Date
WO2019023953A1 (fr)

Family

ID=65232285

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/095540 WO2019023953A1 (fr) 2017-08-02 2017-08-02 Video editing method and video editing system based on a smart terminal

Country Status (1)

Country Link
WO (1) WO2019023953A1 (fr)


Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103827913A (zh) * 2011-09-27 2014-05-28 三星电子株式会社 Apparatus and method for clipping and sharing content in a portable terminal
CN102521565A (zh) * 2011-11-23 2012-06-27 浙江晨鹰科技有限公司 Clothing recognition method and system for low-resolution video
JP2013196518A (ja) * 2012-03-21 2013-09-30 Casio Comput Co Ltd Image processing apparatus, image processing method, and program
CN103577063A (zh) * 2012-07-23 2014-02-12 Lg电子株式会社 Mobile terminal and control method thereof
CN104820711A (zh) * 2015-05-19 2015-08-05 深圳久凌软件技术有限公司 Video retrieval method for human-shaped targets in complex scenes
CN106021496A (zh) * 2016-05-19 2016-10-12 海信集团有限公司 Video search method and video search apparatus
CN106534967A (zh) * 2016-10-25 2017-03-22 司马大大(北京)智能系统有限公司 Video editing method and apparatus

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111953919A (zh) * 2019-05-17 2020-11-17 成都鼎桥通信技术有限公司 Video recording method and apparatus for a handheld terminal in a video individual call
CN111953919B (zh) * 2019-05-17 2022-11-04 成都鼎桥通信技术有限公司 Video recording method and apparatus for a handheld terminal in a video individual call


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17920352

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17920352

Country of ref document: EP

Kind code of ref document: A1