CN111556335A - Video sticker processing method and device

Info

Publication number
CN111556335A
Authority
CN
China
Prior art keywords
target
sticker
video
text
voice recognition
Prior art date
Legal status
Pending
Application number
CN202010297623.5A
Other languages
Chinese (zh)
Inventor
林倩雅
夏天
何雷米一阳
陈斯
黄子汕
刘荣潺
Current Assignee
Good Morning Technology Guangzhou Co ltd
Original Assignee
Good Morning Technology Guangzhou Co ltd
Priority date
Filing date
Publication date
Application filed by Good Morning Technology Guangzhou Co ltd
Priority to CN202010297623.5A (CN111556335A)
Priority to US16/935,167 (US11218648B2)
Publication of CN111556335A

Classifications

    • H04N 5/265 Mixing
    • H04N 5/278 Subtitling
    • G11B 27/036 Insert-editing
    • G11B 27/28 Indexing; addressing; timing or synchronising by using information signals recorded by the same method as the main recording
    • H04N 21/233 Processing of audio elementary streams
    • H04N 21/234 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N 21/23418 Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics
    • H04N 21/4394 Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
    • H04N 21/44008 Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • H04N 21/47205 End-user interface for manipulating displayed content, e.g. interacting with MPEG-4 objects, editing locally
    • H04N 21/4884 Data services, e.g. news ticker, for displaying subtitles
    • G06T 7/70 Determining position or orientation of objects or cameras
    • G06V 20/40 Scenes; scene-specific elements in video content
    • G06V 40/172 Human faces: classification, e.g. identification
    • G10L 15/26 Speech to text systems
    • G10L 25/57 Speech or voice analysis techniques specially adapted for comparison or discrimination, for processing of video signals
    • G06T 2207/10016 Video; image sequence
    • G06T 2207/30201 Face
    • G06V 2201/09 Recognition of logos


Abstract

The invention discloses a video sticker processing method and device. The method comprises the following steps: performing face recognition and voice recognition on a video to be processed, so as to obtain face position data when the face recognition succeeds and a voice recognition text when the voice recognition succeeds; matching the voice recognition text with the description text of each sticker in a sticker library to obtain a target sticker, and obtaining a target video frame according to the voice recognition text; and adding the target sticker at a default position or a target position of the target video frame, wherein the target position is calculated from the face position data. The invention can automatically determine the target sticker and its adding position from the face recognition result and the voice recognition result of the video to be processed, realizing intelligent selection and placement of the target sticker and improving video sticker processing efficiency.

Description

Video sticker processing method and device
Technical Field
The invention relates to the technical field of video processing, in particular to a method and a device for processing video stickers.
Background
As video-based social networking has emerged as a new form of internet interaction, a variety of video editing applications have appeared. To make videos more entertaining, users often add stickers to them with video editing software. In practice, the user manually selects a target sticker from a sticker library according to personal preference, manually selects a target video frame from the frames of the video, and manually adjusts the placement of the target sticker after it is added to the target video frame, so that the sticker is rendered in the target video frame during playback. Because the prior art relies on these manual operations, video sticker processing is time-consuming and inefficient.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a video sticker processing method and device that can automatically determine a target sticker and its adding position from the face recognition result and the voice recognition result of a video to be processed, realizing intelligent selection and placement of the target sticker and improving video sticker processing efficiency.
In order to solve the above technical problem, in a first aspect, an embodiment of the present invention provides a video sticker processing method, including:
respectively carrying out face recognition and voice recognition on a video to be processed so as to obtain face position data when the face recognition is successful and obtain a voice recognition text when the voice recognition is successful;
matching the voice recognition text with the description text of each sticker in a sticker library to obtain a target sticker, and acquiring a target video frame according to the voice recognition text;
adding the target sticker at a default position or a target position of the target video frame; wherein the target position is calculated from the face position data.
Further, the face recognition and the voice recognition are respectively performed on the video to be processed to obtain face position data when the face recognition is successful, and obtain a voice recognition text when the voice recognition is successful, specifically:
sequentially carrying out face recognition on the video frames of the video to be processed, and obtaining the face position data of the corresponding video frame when the face recognition of one video frame is successful;
and performing voice recognition on the video to be processed, and converting the recognized voice data into text data when the voice recognition is successful to obtain the voice recognition text.
Further, matching the voice recognition text with the description text of each sticker in a sticker library to obtain a target sticker, and acquiring a target video frame according to the voice recognition text, specifically:
matching text words obtained by word segmentation processing of the voice recognition text with description texts of each sticker in the sticker library to obtain the target sticker;
and acquiring the appearance time of the voice recognition text in the video to be processed, and taking the video frame with the playing time corresponding to the appearance time as the target video frame.
Further, the adding of the target sticker at the default position or the target position of the target video frame further includes:
removing the target sticker when its appearance duration at the default position or the target position reaches a preset threshold.
Further, after the performing face recognition and voice recognition on the video to be processed respectively to obtain face position data when the face recognition is successful and obtaining a voice recognition text when the voice recognition is successful, the method further includes:
and adding the voice recognition text at the subtitle position of the target video frame.
In a second aspect, an embodiment of the present invention provides a video sticker processing apparatus, including:
the face and voice recognition module is used for respectively carrying out face recognition and voice recognition on the video to be processed so as to obtain face position data when the face recognition is successful and obtain a voice recognition text when the voice recognition is successful;
the target sticker acquisition module is used for matching the voice recognition text with the description text of each sticker in the sticker library to obtain a target sticker, and for acquiring a target video frame according to the voice recognition text;
the target sticker adding module is used for adding the target sticker at the default position or the target position of the target video frame, wherein the target position is calculated from the face position data.
Further, the face recognition and the voice recognition are respectively performed on the video to be processed to obtain face position data when the face recognition is successful, and obtain a voice recognition text when the voice recognition is successful, specifically:
sequentially carrying out face recognition on the video frames of the video to be processed, and obtaining the face position data of the corresponding video frame when the face recognition of one video frame is successful;
and performing voice recognition on the video to be processed, and converting the recognized voice data into text data when the voice recognition is successful to obtain the voice recognition text.
Further, matching the voice recognition text with the description text of each sticker in a sticker library to obtain a target sticker, and acquiring a target video frame according to the voice recognition text, specifically:
matching text words obtained by word segmentation processing of the voice recognition text with description texts of each sticker in the sticker library to obtain the target sticker;
and acquiring the appearance time of the voice recognition text in the video to be processed, and taking the video frame with the playing time corresponding to the appearance time as the target video frame.
Further, the target sticker adding module is further configured to remove the target sticker when the appearance duration of the target sticker at the default position or the target position reaches a preset threshold.
Furthermore, the video sticker processing device further comprises a voice recognition text adding module, which is used for adding the voice recognition text at the subtitle position of the target video frame after the face recognition and voice recognition have been performed on the video to be processed, that is, after the face position data is obtained when the face recognition is successful and the voice recognition text is obtained when the voice recognition is successful.
The embodiment of the invention has the following beneficial effects:
the method comprises the steps of respectively carrying out face recognition and voice recognition on videos to be processed to obtain face position data when the face recognition is successful, obtaining voice recognition texts when the voice recognition is successful, further matching the voice recognition texts with description texts of all stickers in a sticker library to obtain target stickers, obtaining target video frames according to the voice recognition texts, adding the target stickers at default positions of the target video frames or target positions obtained by calculation according to the face position data, and finishing video sticker processing. Compared with the prior art, the embodiment of the invention carries out face recognition and voice recognition on the video to be processed, so that when the voice recognition is successful, the voice recognition text is matched with the description text of each sticker in the sticker library to obtain the target sticker, the target video frame is obtained according to the voice recognition text, when the face recognition is failed, the target sticker is added at the default position of the target video frame according to the default position preset aiming at the target sticker, when the face recognition is successful, the target position is calculated according to the face position data, and the target sticker is added at the target position of the target video frame. The embodiment of the invention can automatically determine the target paster and the adding position thereof according to the face recognition result and the voice recognition result of the video to be processed, realize intelligent selection and placement of the target paster and improve the processing efficiency of the video paster.
Drawings
FIG. 1 is a flowchart illustrating a video sticker processing method according to a first embodiment of the present invention;
FIG. 2 is another flow chart illustrating a video sticker processing method according to a first embodiment of the present invention;
FIG. 3 is a schematic diagram of a video sticker processing apparatus according to a second embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a preferred embodiment in the second embodiment of the present invention.
Detailed Description
The technical solutions in the present invention will be described clearly and completely with reference to the accompanying drawings, and it is obvious that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that, the step numbers in the text are only for convenience of explanation of the specific embodiments, and do not serve to limit the execution sequence of the steps. The method provided by the embodiment can be executed by the relevant server, and the server is taken as an example for explanation below.
Please refer to fig. 1-2.
As shown in fig. 1-2, the first embodiment provides a video sticker processing method, including steps S1-S3:
and S1, respectively carrying out face recognition and voice recognition on the video to be processed so as to obtain face position data when the face recognition is successful and obtain a voice recognition text when the voice recognition is successful.
And S2, matching the voice recognition text with the description text of each sticker in the sticker library to obtain a target sticker, and acquiring a target video frame according to the voice recognition text.
S3, adding the target sticker at a default position or a target position of the target video frame; wherein the target position is calculated from the face position data.
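For orientation only, steps S1-S3 can be sketched in Python as follows; every helper function in the sketch (run_face_recognition, run_speech_recognition, match_sticker, occurrence_time, frame_at, compute_target_position_from, compute_default_position_for, add_sticker) is a hypothetical placeholder rather than part of the disclosed implementation.

```python
def process_video_stickers(video, sticker_library):
    """End-to-end sketch of steps S1-S3; all helpers are placeholders."""
    # S1: face recognition and voice recognition on the video to be processed.
    face_position = run_face_recognition(video)   # None when face recognition fails
    speech_text = run_speech_recognition(video)   # None when voice recognition fails
    if speech_text is None:
        return video                              # no text: exit sticker processing

    # S2: match the voice recognition text against each sticker's description
    # text, and take the frame whose playing time corresponds to the text's
    # occurrence time as the target video frame.
    target_sticker = match_sticker(speech_text, sticker_library)
    target_frame = frame_at(video, occurrence_time(video, speech_text))

    # S3: add the sticker at the target position (face recognized) or at a
    # default position (face recognition failed).
    if face_position is not None:
        position = compute_target_position_from(face_position)
    else:
        position = compute_default_position_for(video)
    add_sticker(target_frame, target_sticker, position)
    return video
```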
As an example, a user uploads a video to be processed through a user terminal, and when the server receives the video, it performs face recognition and voice recognition on it respectively. If the face recognition succeeds, face position data is obtained; if the voice recognition succeeds, a voice recognition text is obtained. The user terminal includes mobile phones, computers, tablets, and other communication devices that can connect to the server.
In a preferred embodiment of this embodiment, after obtaining the speech recognition text, the server may issue the speech recognition text to the user terminal, so that the user may confirm the speech recognition text through the user terminal.
When the voice recognition succeeds, the voice recognition text is matched with the description text of each sticker in the sticker library; the sticker whose description text successfully matches the voice recognition text is the target sticker. Meanwhile, the target video frame is obtained according to the voice recognition text.
In a preferred embodiment of this embodiment, after obtaining the target sticker, the server may issue the target sticker to the user terminal, so that the user may confirm the target sticker through the user terminal. After the target video frame is obtained, the server can issue the target video frame to the user terminal, so that the user can confirm the target video frame through the user terminal.
After the target sticker and the target video frame are obtained, the adding position of the target sticker is determined in combination with the face recognition result: when the face recognition fails, the target sticker is added at a default position preset for the target sticker in the target video frame; when the face recognition succeeds, the target position is calculated from the face position data and the target sticker is added at the target position of the target video frame.
The setting process of the default position may proceed as follows: when face recognition of the video to be processed fails, that is, no face can be recognized or the width of the face rectangle is smaller than 30% of the width of the phone screen, a default 300 x 380 rectangle is first placed at the center of the phone screen, its inscribed ellipse is then drawn, the points on the inscribed ellipse are taken as default effective points, and finally one default effective point is randomly selected from all the default effective points as the default position.
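As an illustration only, this default-position rule can be sketched as follows; the eight-point sampling of the ellipse and the pixel coordinate system are assumptions of the sketch, since the disclosure does not fix them.

```python
import math
import random

def compute_default_position(screen_w, screen_h, num_points=8):
    """Random default effective point on the inscribed ellipse of a 300 x 380
    rectangle centered on the phone screen (point count is an assumption)."""
    cx, cy = screen_w / 2, screen_h / 2   # center of the default rectangle
    a, b = 300 / 2, 380 / 2               # semi-axes of the inscribed ellipse
    default_effective_points = [
        (cx + a * math.cos(2 * math.pi * k / num_points),
         cy + b * math.sin(2 * math.pi * k / num_points))
        for k in range(num_points)
    ]
    return random.choice(default_effective_points)
```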
The calculation process of the target position may proceed as follows: when face recognition of the video to be processed succeeds, that is, the width of the face rectangle is greater than 30% of the width of the phone screen, the face rectangle is widened by 40%, its upper half is extended by 60%, and its lower half is extended by 30%, so that the width of the resulting rectangle is not less than 65% of the width of the phone screen. The inscribed ellipse of this rectangle is drawn, and the points on the ellipse (equally divided into 8-10 points) are spare points for the target sticker; spare points outside the phone screen are unavailable points, and spare points inside the phone screen are available points. A default sticker (whose width is greater than 45% of the width of the face rectangle) is then placed at each available point; if the placement area of the default sticker exceeds 20% of the phone screen, the corresponding available point is an invalid point, otherwise it is an effective point. Finally, one effective point is randomly selected from all the effective points as the target position. When the number of effective points is less than 3, a safe-area rectangle centered on the screen, with width equal to 80% of the screen width and height equal to 70% of the screen height, is used instead; it is then judged whether the height above or below the center point of this rectangle exceeds 5% of the height of the phone screen, and if so, the effective point on the opposite side is taken as the target position.
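A minimal sketch of this calculation follows, under the percentages stated above; sticker_area_at stands in for the 20%-of-screen placement test, and the fewer-than-three-effective-points fallback is omitted for brevity.

```python
import math
import random

def compute_target_position(face_rect, screen_w, screen_h,
                            sticker_area_at, num_points=8):
    """Sketch of the target-position rule; face_rect is (x, y, w, h) with a
    top-left origin, and sticker_area_at is a caller-supplied placeholder."""
    x, y, w, h = face_rect
    new_w = max(w * 1.4, 0.65 * screen_w)  # widen by 40%, floor at 65% of screen
    upper = (h / 2) * 1.6                  # upper half extended by 60%
    lower = (h / 2) * 1.3                  # lower half extended by 30%
    cx = x + w / 2                         # horizontal center of the face
    cy = (y + h / 2) - upper + (upper + lower) / 2
    a, b = new_w / 2, (upper + lower) / 2  # semi-axes of the inscribed ellipse
    spare_points = [
        (cx + a * math.cos(2 * math.pi * k / num_points),
         cy + b * math.sin(2 * math.pi * k / num_points))
        for k in range(num_points)
    ]
    # Spare points outside the phone screen are unavailable.
    available = [(px, py) for px, py in spare_points
                 if 0 <= px <= screen_w and 0 <= py <= screen_h]
    # An available point is effective if a default sticker placed there
    # covers no more than 20% of the screen area.
    effective = [p for p in available
                 if sticker_area_at(p, new_w) <= 0.2 * screen_w * screen_h]
    return random.choice(effective) if effective else (cx, cy)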
The selection of the target sticker's rotation angle may proceed as follows: if the target sticker is added on the left side of the phone screen, the rotation angle is a random clockwise angle of 0-45 degrees; if it is added on the right side, the rotation angle is a random counterclockwise angle of 0-45 degrees.
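As a sketch (taking positive angles as clockwise, which is an assumed sign convention):

```python
import random

def sticker_rotation_angle(position_x, screen_w):
    """Random 0-45 degree rotation: clockwise on the left half of the
    screen, counterclockwise on the right half."""
    angle = random.uniform(0, 45)
    return angle if position_x < screen_w / 2 else -angle
```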
In a preferred embodiment of this embodiment, after adding the target sticker at the default position or the target position of the target video frame, the server may issue the target video frame with the target sticker added to the user terminal, so that the user may confirm the video sticker processing through the user terminal.
In this embodiment, face recognition and voice recognition are respectively performed on the video to be processed, so that the face position data is obtained when the face recognition succeeds and the voice recognition text is obtained when the voice recognition succeeds; the voice recognition text is matched with the description text of each sticker in the sticker library to obtain the target sticker, and the target video frame is obtained according to the voice recognition text; the target sticker is then added at the default position of the target video frame or at the target position calculated from the face position data, completing the video sticker processing.
In this embodiment, face recognition and voice recognition are performed on the video to be processed, so that when the voice recognition succeeds, the voice recognition text is matched with the description text of each sticker in the sticker library to obtain the target sticker, and the target video frame is obtained according to the voice recognition text; when the face recognition fails, the target sticker is added at a default position preset for the target sticker in the target video frame, and when the face recognition succeeds, the target position is calculated from the face position data and the target sticker is added at the target position of the target video frame. This embodiment can automatically determine the target sticker and its adding position from the face recognition result and the voice recognition result of the video to be processed, realizing intelligent selection and placement of the target sticker and improving video sticker processing efficiency.
In an embodiment, the performing face recognition and voice recognition on the video to be processed respectively to obtain face position data when the face recognition is successful and obtain a voice recognition text when the voice recognition is successful specifically includes: sequentially carrying out face recognition on video frames of a video to be processed, and obtaining face position data corresponding to the video frames when the face recognition of one video frame is successful; and performing voice recognition on the video to be processed, and converting the recognized voice data into text data when the voice recognition is successful to obtain a voice recognition text.
As an example, a user records a video to be processed through a user terminal and uploads its video frames as they are captured. On receiving them, the server performs face recognition on the frames in order of arrival; if face recognition succeeds on some frame, face recognition of the video is deemed successful and the face position data of that frame is obtained, and if it fails on every frame, face recognition of the video is deemed failed. When the user finishes recording and the last video frame is uploaded, the server performs voice recognition on the video; if the voice recognition succeeds, the recognized voice data is converted into text data to obtain the voice recognition text, and if it fails, the video sticker processing is exited.
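The early-stopping scan just described can be sketched as follows; detect_face is a hypothetical detector passed in by the caller, not a disclosed component.

```python
def face_position_from_frames(frames, detect_face):
    """Scan frames in arrival order and stop at the first frame whose face
    recognition succeeds, so the remaining frames need no recognition;
    detect_face returns a face rectangle, or None on failure."""
    for frame in frames:
        face_rect = detect_face(frame)
        if face_rect is not None:
            return face_rect          # face position data of this frame
    return None                       # face recognition failed for the video
```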
Because this embodiment performs face recognition on the video frames sequentially and obtains the face position data as soon as recognition succeeds on one frame, recognition can run on frames as they arrive while the user is still recording, and no recognition is needed on the remaining frames once the face position data is obtained. This greatly shortens the face recognition time for the video to be processed and improves video sticker processing efficiency.
In a preferred embodiment, the matching the speech recognition text with the description text of each sticker in the sticker library to obtain a target sticker, and obtaining a target video frame according to the speech recognition text specifically includes: matching text words obtained by performing word segmentation processing on the voice recognition text with description texts of each sticker in a sticker library to obtain target stickers; acquiring the appearance time of the voice recognition text in the video to be processed, and taking the video frame with the playing time corresponding to the appearance time as a target video frame.
As an example, after obtaining the voice recognition text, the server performs word segmentation on it to obtain a set of text words and matches each text word against the description text of each sticker in the sticker library. If the description text of some sticker matches a text word, one sticker is randomly selected from the matched stickers as the target sticker; if no sticker's description text matches any text word, the video sticker processing is exited.
For example, segmenting the voice recognition text "好开心" ("so happy") from front to back yields the text word set {"好", "开", "心", "好开", "开心", "好开心"}; each of these text words is matched against the description text of each sticker in the sticker library. If the description text of some sticker matches a text word, one sticker is randomly selected from the matched stickers as the target sticker; if no sticker's description text matches any text word, the video sticker processing is exited.
In a preferred implementation of this embodiment, a sticker is randomly selected from the matching results of the text word with the longest text length as the target sticker.
For example, one sticker is randomly selected from the matching results of "好开心", the text word with the longest text length, as the target sticker.
As an example, after obtaining the voice recognition text, the server performs word segmentation on it to obtain a set of text words and matches the text words against the description text of each sticker in the sticker library one by one, in order of text length from longest to shortest. If the description text of some sticker matches the current text word, one sticker is randomly selected from the matched stickers as the target sticker; if no sticker's description text matches any text word, the video sticker processing is exited.
For example, segmenting the voice recognition text "好开心" ("so happy") from front to back yields the text word set {"好开心", "开心", "好", "开", "心"}; these are matched in turn against the description text of each sticker in the sticker library, and as soon as the description text of some sticker matches the current text word, one sticker is randomly selected from the matched stickers as the target sticker. If no sticker's description text matches any text word, the video sticker processing is exited.
By segmenting the voice recognition text and matching the resulting text words against the description text of each sticker in the sticker library to obtain the target sticker, this embodiment effectively increases the sticker matching success rate and thus improves video sticker processing efficiency.
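As an illustration, the longest-match-first variant above can be sketched as follows; the segment function, the description_text field, and substring matching are assumptions of the sketch, not the disclosed implementation.

```python
import random

def match_target_sticker(speech_text, sticker_library, segment):
    """Longest-match-first sticker matching; segment is a hypothetical
    word-segmentation function returning candidate text words."""
    text_words = segment(speech_text)
    for word in sorted(set(text_words), key=len, reverse=True):
        matched = [s for s in sticker_library
                   if word in s["description_text"]]
        if matched:
            return random.choice(matched)  # random pick among matched stickers
    return None                            # no match: exit sticker processing
```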
In a preferred embodiment of this embodiment, after obtaining the text word set, the server may issue the text word set to the user terminal, so that the user may confirm the text word set through the user terminal.
The data structure of the issued text word set may take the form { (text word 1, startTime, endTime), (text word 2, startTime, endTime), … }, where startTime denotes the start time of the corresponding text word and endTime denotes its end time.
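For example, an issued text word set might look like the following; the words and times are purely illustrative.

```python
# Illustrative payload for the issued text word set; times are in seconds.
text_word_set = [
    ("好开心", 1.20, 1.85),  # (text word, startTime, endTime)
    ("开心", 1.45, 1.85),
    ("好", 1.20, 1.40),
]
```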
In a preferred embodiment of this embodiment, after obtaining the matching sticker, the server may issue the matching sticker to the user terminal, so that the user may confirm the matching sticker through the user terminal.
The data structure of the issued matching stickers may take the form { (text word 1: matching sticker 1), (text word 2: matching sticker 2), … }.
In a preferred embodiment, adding the target sticker at the default position or the target position of the target video frame further comprises: removing the target sticker when its appearance duration at the default position or the target position reaches a preset threshold.
For example, after the target sticker is added at the default position or the target position of the target video frame, its appearance duration at that position is monitored, and once the duration reaches a preset threshold the target sticker is removed from the target video frame. The preset threshold is set in advance according to actual needs, for example 2 seconds.
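A minimal sketch of this check follows; the added_at field and the remove_sticker call are assumptions of the sketch.

```python
def maybe_remove_target_sticker(sticker, current_time, threshold_s=2.0):
    """Remove the target sticker once its appearance duration at the default
    or target position reaches the preset threshold (2 seconds, as in the
    example above)."""
    if current_time - sticker.added_at >= threshold_s:
        remove_sticker(sticker)  # hypothetical removal call
```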
By removing the target sticker once its appearance duration at the default position or the target position reaches the preset threshold, this embodiment prevents the sticker from lingering too long and blocking the video content.
In a preferred embodiment, after the performing face recognition and voice recognition on the video to be processed respectively to obtain face position data when the face recognition is successful and obtain a voice recognition text when the voice recognition is successful, the method further includes: and adding voice recognition text at the position of the subtitle of the target video frame.
According to the embodiment, the voice recognition text is added to the subtitle position of the target video frame, so that the adding position of the subtitle can be automatically determined according to the voice recognition text, and the video editing processing efficiency is improved.
Please refer to fig. 3-4.
As shown in fig. 3, a second embodiment provides a video-sticker processing apparatus comprising: the face and voice recognition module 21 is used for performing face recognition and voice recognition on the video to be processed respectively so as to obtain face position data when the face recognition is successful and obtain a voice recognition text when the voice recognition is successful; the target sticker acquiring module 22 is configured to match the voice recognition text with the description text of each sticker in the sticker library to obtain a target sticker, and acquire a target video frame according to the voice recognition text; a target sticker adding module 23, configured to add a target sticker at a default position or a target position of a target video frame; wherein the target position is obtained by calculation according to the face position data.
As an example, a user uploads a video to be processed through a user terminal, and when the video is received, the face and voice recognition module 21 performs face recognition and voice recognition on it respectively. If the face recognition succeeds, face position data is obtained; if the voice recognition succeeds, a voice recognition text is obtained. The user terminal includes mobile phones, computers, tablets, and other communication devices that can connect to the server.
In a preferred embodiment of this embodiment, after obtaining the speech recognition text, the face and speech recognition module 21 may issue the speech recognition text to the user terminal, so that the user may confirm the speech recognition text through the user terminal.
When the voice recognition succeeds, the target sticker acquisition module 22 matches the voice recognition text with the description text of each sticker in the sticker library; the sticker whose description text successfully matches the voice recognition text is the target sticker. Meanwhile, the target sticker acquisition module 22 acquires the target video frame according to the voice recognition text.
In a preferred embodiment of this embodiment, after the target sticker is obtained, the target sticker acquisition module 22 may issue it to the user terminal, so that the user may confirm the target sticker through the user terminal. After the target video frame is obtained, the target sticker acquisition module 22 may issue it to the user terminal, so that the user may confirm the target video frame through the user terminal.
After the target sticker and the target video frame are obtained, the target sticker adding module 23 determines the adding position of the target sticker in combination with the face recognition result: when the face recognition fails, the target sticker is added at a default position preset for the target sticker in the target video frame; when the face recognition succeeds, the target position is calculated from the face position data and the target sticker is added at the target position of the target video frame.
The setting process of the default position may proceed as follows: when face recognition of the video to be processed fails, that is, no face can be recognized or the width of the face rectangle is smaller than 30% of the width of the phone screen, a default 300 x 380 rectangle is first placed at the center of the phone screen, its inscribed ellipse is then drawn, the points on the inscribed ellipse are taken as default effective points, and finally one default effective point is randomly selected from all the default effective points as the default position.
The calculation process of the target position may proceed as follows: when face recognition of the video to be processed succeeds, that is, the width of the face rectangle is greater than 30% of the width of the phone screen, the face rectangle is widened by 40%, its upper half is extended by 60%, and its lower half is extended by 30%, so that the width of the resulting rectangle is not less than 65% of the width of the phone screen. The inscribed ellipse of this rectangle is drawn, and the points on the ellipse (equally divided into 8-10 points) are spare points for the target sticker; spare points outside the phone screen are unavailable points, and spare points inside the phone screen are available points. A default sticker (whose width is greater than 45% of the width of the face rectangle) is then placed at each available point; if the placement area of the default sticker exceeds 20% of the phone screen, the corresponding available point is an invalid point, otherwise it is an effective point. Finally, one effective point is randomly selected from all the effective points as the target position. When the number of effective points is less than 3, a safe-area rectangle centered on the screen, with width equal to 80% of the screen width and height equal to 70% of the screen height, is used instead; it is then judged whether the height above or below the center point of this rectangle exceeds 5% of the height of the phone screen, and if so, the effective point on the opposite side is taken as the target position.
The selection of the target sticker's rotation angle may proceed as follows: if the target sticker is added on the left side of the phone screen, the rotation angle is a random clockwise angle of 0-45 degrees; if it is added on the right side, the rotation angle is a random counterclockwise angle of 0-45 degrees.
In a preferred embodiment of this embodiment, after the target sticker is added at the default position or the target position of the target video frame, the target sticker adding module 23 may issue the target video frame with the target sticker added to the user terminal, so that the user may confirm the video sticker processing through the user terminal.
In this embodiment, the face and voice recognition module 21 performs face recognition and voice recognition on the video to be processed, so that face position data is obtained when the face recognition succeeds and the voice recognition text is obtained when the voice recognition succeeds; the target sticker acquisition module 22 then matches the voice recognition text with the description text of each sticker in the sticker library to obtain the target sticker and acquires the target video frame according to the voice recognition text; finally, the target sticker adding module 23 adds the target sticker at the default position of the target video frame or at the target position calculated from the face position data, completing the video sticker processing.
In this embodiment, face recognition and voice recognition are performed on the video to be processed, so that when the voice recognition succeeds, the voice recognition text is matched with the description text of each sticker in the sticker library to obtain the target sticker, and the target video frame is obtained according to the voice recognition text; when the face recognition fails, the target sticker is added at a default position preset for the target sticker in the target video frame, and when the face recognition succeeds, the target position is calculated from the face position data and the target sticker is added at the target position of the target video frame. This embodiment can automatically determine the target sticker and its adding position from the face recognition result and the voice recognition result of the video to be processed, realizing intelligent selection and placement of the target sticker and improving video sticker processing efficiency.
In an embodiment, the performing face recognition and voice recognition on the video to be processed respectively to obtain face position data when the face recognition is successful and obtain a voice recognition text when the voice recognition is successful specifically includes: sequentially carrying out face recognition on video frames of a video to be processed, and obtaining face position data corresponding to the video frames when the face recognition of one video frame is successful; and performing voice recognition on the video to be processed, and converting the recognized voice data into text data when the voice recognition is successful to obtain a voice recognition text.
As an example, a user records a video to be processed through a user terminal and uploads its video frames as they are captured. On receiving them, the face and voice recognition module 21 performs face recognition on the frames in order of arrival; if face recognition succeeds on some frame, face recognition of the video is deemed successful and the face position data of that frame is obtained, and if it fails on every frame, face recognition of the video is deemed failed. When the user finishes recording and the last video frame is uploaded, the face and voice recognition module 21 performs voice recognition on the video; if the voice recognition succeeds, the recognized voice data is converted into text data to obtain the voice recognition text, and if it fails, the video sticker processing is exited.
Because this embodiment uses the face and voice recognition module 21 to perform face recognition on the video frames sequentially and obtains the face position data as soon as recognition succeeds on one frame, recognition can run on frames as they arrive while the user is still recording, and no recognition is needed on the remaining frames once the face position data is obtained. This greatly shortens the face recognition time for the video to be processed and improves video sticker processing efficiency.
In a preferred embodiment, the matching the speech recognition text with the description text of each sticker in the sticker library to obtain a target sticker, and obtaining a target video frame according to the speech recognition text specifically includes: matching text words obtained by performing word segmentation processing on the voice recognition text with description texts of each sticker in a sticker library to obtain target stickers; acquiring the appearance time of the voice recognition text in the video to be processed, and taking the video frame with the playing time corresponding to the appearance time as a target video frame.
As an example, after obtaining the voice recognition text, the target sticker acquisition module 22 performs word segmentation on it to obtain a set of text words and matches each text word against the description text of each sticker in the sticker library. If the description text of some sticker matches a text word, one sticker is randomly selected from the matched stickers as the target sticker; if no sticker's description text matches any text word, the video sticker processing is exited.
For example, segmenting the voice recognition text "好开心" ("so happy") from front to back yields the text word set {"好", "开", "心", "好开", "开心", "好开心"}; each of these text words is matched against the description text of each sticker in the sticker library. If the description text of some sticker matches a text word, one sticker is randomly selected from the matched stickers as the target sticker; if no sticker's description text matches any text word, the video sticker processing is exited.
In a preferred implementation of this embodiment, a sticker is randomly selected from the matching results of the text word with the longest text length as the target sticker.
For example, one sticker is randomly selected from the matching results of "好开心", the text word with the longest text length, as the target sticker.
As an example, after obtaining the voice recognition text, the target sticker acquisition module 22 performs word segmentation on it to obtain a set of text words and matches the text words against the description text of each sticker in the sticker library one by one, in order of text length from longest to shortest. If the description text of some sticker matches the current text word, one sticker is randomly selected from the matched stickers as the target sticker; if no sticker's description text matches any text word, the video sticker processing is exited.
For example, segmenting the voice recognition text "好开心" ("so happy") from front to back yields the text word set {"好开心", "开心", "好", "开", "心"}; these are matched in turn against the description text of each sticker in the sticker library, and as soon as the description text of some sticker matches the current text word, one sticker is randomly selected from the matched stickers as the target sticker. If no sticker's description text matches any text word, the video sticker processing is exited.
In this embodiment, the target sticker acquisition module 22 segments the voice recognition text and matches the resulting text words against the description text of each sticker in the sticker library to obtain the target sticker, which effectively increases the sticker matching success rate and thus improves video sticker processing efficiency.
In a preferred embodiment of this embodiment, after obtaining the text word set, the target sticker acquiring module 22 may issue the text word set to the user terminal, so that the user may confirm the text word set through the user terminal.
The data structure of the issued text word set may take the form { (text word 1, startTime, endTime), (text word 2, startTime, endTime), … }, where startTime denotes the start time of the corresponding text word and endTime denotes its end time.
In a preferred implementation of this embodiment, after the matching stickers are obtained, the target sticker obtaining module 22 may issue the matching stickers to the user terminal, so that the user can confirm them through the user terminal.
The data structure of the issued matching stickers may be of the form { (text word 1: matching sticker 1), (text word 2: matching sticker 2), … }.
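A corresponding minimal sketch of the matching-sticker payload, with the sticker identifiers invented purely for illustration:

    # Hypothetical mapping from confirmed text words to matched stickers.
    matching_stickers = {
        "good happy": "sticker_0042",
        "good": "sticker_0007",
    }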
In a preferred embodiment, the target sticker adding module 23 is further configured to remove the target sticker when the duration of its appearance at the default position or the target position reaches a preset threshold.
For example, after the target sticker is added at the default position or the target position of the target video frame, the duration of its appearance at that position is monitored; if the duration reaches a preset threshold, the target sticker is removed from the target video frame. The preset threshold is set in advance according to actual needs, for example 2 seconds.
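A minimal sketch of this removal logic, assuming the appearance duration is tracked as a frame count at a known frame rate (the function name and signature are illustrative):

    def frames_with_sticker(start_frame: int, fps: float,
                            threshold_s: float = 2.0) -> range:
        # Convert the preset duration threshold (e.g. 2 seconds) into a
        # frame count; the sticker is removed once these frames have played.
        n_frames = int(threshold_s * fps)
        return range(start_frame, start_frame + n_frames)

At 30 frames per second with the 2-second threshold, the target sticker would occupy 60 frames and be removed thereafter.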
In this embodiment, the target sticker adding module 23 removes the target sticker once its appearance duration at the default position or the target position reaches the preset threshold, which prevents the target sticker from staying at that position too long and blocking the video content.
In a preferred embodiment, as shown in fig. 4, the video sticker processing apparatus further includes a speech recognition text adding module 24, configured to perform face recognition and speech recognition on the video to be processed respectively, so as to obtain face position data when the face recognition is successful, and to add the speech recognition text at the subtitle position of the target video frame once the speech recognition text is obtained upon successful speech recognition.
In this embodiment, the speech recognition text adding module 24 adds the speech recognition text at the subtitle position of the target video frame, so that the adding position of the subtitle is determined automatically from the speech recognition text, improving the efficiency of video editing.
In summary, the embodiment of the present invention has the following advantages:
the method performs face recognition and voice recognition on the video to be processed respectively, so as to obtain face position data when the face recognition is successful and a voice recognition text when the voice recognition is successful; the voice recognition text is then matched with the description text of each sticker in a sticker library to obtain a target sticker, and a target video frame is obtained according to the voice recognition text; the target sticker is added at the default position of the target video frame, or at a target position calculated from the face position data, completing the video sticker processing. Specifically, when the face recognition fails, the target sticker is added at the default position preset for the target sticker; when the face recognition succeeds, the target position is calculated from the face position data and the target sticker is added at the target position of the target video frame. The embodiment of the invention can thus automatically determine the target sticker and its adding position according to the face recognition result and the voice recognition result of the video to be processed, realizing intelligent selection and placement of the target sticker and improving the processing efficiency of the video sticker.
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention.
It will be understood by those skilled in the art that all or part of the processes of the above embodiments may be implemented by a computer program instructing related hardware; the computer program may be stored in a computer-readable storage medium and, when executed, may include the processes of the above embodiments. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), or the like.

Claims (10)

1. A video sticker processing method is characterized by comprising the following steps:
respectively carrying out face recognition and voice recognition on a video to be processed so as to obtain face position data when the face recognition is successful and obtain a voice recognition text when the voice recognition is successful;
matching the voice recognition text with the description text of each sticker in a sticker library to obtain a target sticker, and acquiring a target video frame according to the voice recognition text;
adding the target sticker at the default position or the target position of the target video frame; wherein the target position is calculated from the face position data.
2. The video sticker processing method of claim 1, wherein the face recognition and the voice recognition are performed on the video to be processed, respectively, to obtain face position data when the face recognition is successful and obtain a voice recognition text when the voice recognition is successful, specifically:
sequentially carrying out face recognition on the video frames of the video to be processed, and obtaining the face position data of the corresponding video frame when the face recognition of one video frame is successful;
and performing voice recognition on the video to be processed, and converting the recognized voice data into text data when the voice recognition is successful to obtain the voice recognition text.
3. The video sticker processing method according to claim 1, wherein the matching of the voice recognition text with the description text of each sticker in a sticker library to obtain a target sticker, and obtaining a target video frame according to the voice recognition text specifically comprises:
matching text words obtained by word segmentation processing of the voice recognition text with description texts of each sticker in the sticker library to obtain the target sticker;
and acquiring the appearance time of the voice recognition text in the video to be processed, and taking the video frame with the playing time corresponding to the appearance time as the target video frame.
4. The video sticker processing method of claim 1, wherein said adding the target sticker at a default position or a target position of the target video frame further comprises:
and when the appearance time of the target sticker at the default position or the target position reaches a preset threshold value, removing the target sticker.
5. The video sticker processing method of claim 1, wherein after the performing face recognition and voice recognition on the video to be processed respectively to obtain face position data when the face recognition is successful and obtain the voice recognition text when the voice recognition is successful, further comprising:
and adding the voice recognition text at the subtitle position of the target video frame.
6. A video sticker processing apparatus, comprising:
the face and voice recognition module is used for respectively carrying out face recognition and voice recognition on the video to be processed so as to obtain face position data when the face recognition is successful and obtain a voice recognition text when the voice recognition is successful;
the target sticker obtaining module is used for matching the voice recognition text with the description text of each sticker in the sticker library to obtain a target sticker and obtaining a target video frame according to the voice recognition text;
the target sticker adding module is used for adding the target sticker at the default position or the target position of the target video frame; wherein the target position is calculated from the face position data.
7. The video sticker processing apparatus according to claim 6, wherein the video to be processed is subjected to face recognition and voice recognition respectively, so as to obtain face position data when the face recognition is successful and obtain voice recognition text when the voice recognition is successful, specifically:
sequentially carrying out face recognition on the video frames of the video to be processed, and obtaining the face position data of the corresponding video frame when the face recognition of one video frame is successful;
and performing voice recognition on the video to be processed, and converting the recognized voice data into text data when the voice recognition is successful to obtain the voice recognition text.
8. The video sticker processing apparatus of claim 6, wherein the matching of the speech recognition text with the description text of each sticker in a sticker library to obtain a target sticker, and obtaining a target video frame according to the speech recognition text specifically comprises:
matching text words obtained by word segmentation processing of the voice recognition text with description texts of each sticker in the sticker library to obtain the target sticker;
and acquiring the appearance time of the voice recognition text in the video to be processed, and taking the video frame with the playing time corresponding to the appearance time as the target video frame.
9. The video sticker processing apparatus of claim 6, wherein the target sticker adding module is further configured to remove the target sticker when the duration of appearance of the target sticker at the default position or the target position reaches a preset threshold.
10. The video sticker processing apparatus of claim 6, further comprising a speech recognition text adding module, configured to perform face recognition and speech recognition on the video to be processed respectively to obtain face position data when the face recognition is successful, and add the speech recognition text at the subtitle position of the target video frame after obtaining the speech recognition text when the speech recognition is successful.
CN202010297623.5A 2020-04-15 2020-04-15 Video sticker processing method and device Pending CN111556335A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010297623.5A CN111556335A (en) 2020-04-15 2020-04-15 Video sticker processing method and device
US16/935,167 US11218648B2 (en) 2020-04-15 2020-07-21 Video sticker processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010297623.5A CN111556335A (en) 2020-04-15 2020-04-15 Video sticker processing method and device

Publications (1)

Publication Number Publication Date
CN111556335A true CN111556335A (en) 2020-08-18

Family

ID=72004362

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010297623.5A Pending CN111556335A (en) 2020-04-15 2020-04-15 Video sticker processing method and device

Country Status (2)

Country Link
US (1) US11218648B2 (en)
CN (1) CN111556335A (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114125485B (en) * 2021-11-30 2024-04-30 北京字跳网络技术有限公司 Image processing method, device, equipment and medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4310916B2 (en) * 2000-11-08 2009-08-12 コニカミノルタホールディングス株式会社 Video display device
JP2014085796A (en) * 2012-10-23 2014-05-12 Sony Corp Information processing device and program
KR102108893B1 (en) * 2013-07-11 2020-05-11 엘지전자 주식회사 Mobile terminal
US10446189B2 (en) * 2016-12-29 2019-10-15 Google Llc Video manipulation with face replacement

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105791692A (en) * 2016-03-14 2016-07-20 腾讯科技(深圳)有限公司 Information processing method and terminal
CN106210545A (en) * 2016-08-22 2016-12-07 北京金山安全软件有限公司 Video shooting method and device and electronic equipment
CN109660855A (en) * 2018-12-19 2019-04-19 北京达佳互联信息技术有限公司 Paster display methods, device, terminal and storage medium

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11705120B2 (en) * 2019-02-08 2023-07-18 Samsung Electronics Co., Ltd. Electronic device for providing graphic data based on voice and operating method thereof
CN113613067A (en) * 2021-08-03 2021-11-05 北京字跳网络技术有限公司 Video processing method, device, equipment and storage medium
WO2023011146A1 (en) * 2021-08-03 2023-02-09 北京字跳网络技术有限公司 Video processing method and apparatus, device, and storage medium
CN113613067B (en) * 2021-08-03 2023-08-22 北京字跳网络技术有限公司 Video processing method, device, equipment and storage medium
WO2023160515A1 (en) * 2022-02-25 2023-08-31 北京字跳网络技术有限公司 Video processing method and apparatus, device and medium

Also Published As

Publication number Publication date
US11218648B2 (en) 2022-01-04
US20210329176A1 (en) 2021-10-21

Similar Documents

Publication Publication Date Title
CN111556335A (en) Video sticker processing method and device
CN110446115B (en) Live broadcast interaction method and device, electronic equipment and storage medium
CN109473123B (en) Voice activity detection method and device
EP3993434A1 (en) Video processing method, apparatus and device
CN104618803B (en) Information-pushing method, device, terminal and server
JP6968908B2 (en) Context acquisition method and context acquisition device
CN111785279A (en) Video speaker identification method and device, computer equipment and storage medium
CN108920640B (en) Context obtaining method and device based on voice interaction
CN106982344B (en) Video information processing method and device
CN111401238A (en) Method and device for detecting character close-up segments in video
WO2023151424A1 (en) Method and apparatus for adjusting playback rate of audio picture of video
US20220201357A1 (en) Limited-level picture detection method, device, display device and readable storage medium
CN113705300A (en) Method, device and equipment for acquiring phonetic-to-text training corpus and storage medium
CN105100647A (en) Subtitle correction method and terminal
CN112235632A (en) Video processing method and device and server
CN114120969A (en) Method and system for testing voice recognition function of intelligent terminal and electronic equipment
US11889127B2 (en) Live video interaction method and apparatus, and computer device
US20160142456A1 (en) Method and Device for Acquiring Media File
CN115225962B (en) Video generation method, system, terminal equipment and medium
CN115150660B (en) Video editing method based on subtitles and related equipment
KR20230106170A (en) Data processing method and apparatus, device, and medium
CN111128190B (en) Expression matching method and system
CN113613070A (en) Face video processing method and device, electronic equipment and storage medium
CN112487247A (en) Video processing method and video processing device
CN111013138A (en) Voice control method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200818