CN110602424A - Video processing method and electronic equipment - Google Patents

Video processing method and electronic equipment

Info

Publication number
CN110602424A
CN110602424A
Authority
CN
China
Prior art keywords
audio data
processing
data
subject
subject object
Prior art date
Legal status
Pending
Application number
CN201910803481.2A
Other languages
Chinese (zh)
Inventor
沈军行
Current Assignee
Vivo Mobile Communication Co Ltd
Original Assignee
Vivo Mobile Communication Co Ltd
Priority date
Filing date
Publication date
Application filed by Vivo Mobile Communication Co Ltd filed Critical Vivo Mobile Communication Co Ltd
Priority to CN201910803481.2A
Publication of CN110602424A

Classifications

    • G06V 10/267: Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds
    • G06V 20/40: Scenes; scene-specific elements in video content
    • G10L 17/00: Speaker identification or verification
    • G10L 21/007: Changing voice quality, e.g. pitch or formants, characterised by the process used
    • G10L 21/02: Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L 21/0272: Voice signal separating
    • H04N 21/439: Processing of audio elementary streams
    • H04N 21/44: Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N 21/44008: Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • H04N 5/91: Television signal processing for recording
    • H04N 5/9202: Transformation of the television signal for recording, the multiplexed additional signal being a sound signal

Abstract

The embodiment of the invention discloses a video processing method and electronic equipment. The video processing method comprises the following steps: acquiring first image data and first audio data of first video data; performing focusing processing on pixels of at least one subject object in the first image data through a preset object separation network to obtain at least one second image data; performing focusing processing on audio data matched with the at least one subject object in the first audio data through a preset voice separation network to obtain at least one second audio data; and encoding and compressing the second image data and the second audio data to obtain second video data. The embodiment of the invention can focus both the image and the audio of each subject object.

Description

Video processing method and electronic equipment
Technical Field
The embodiment of the invention relates to the technical field of artificial intelligence, in particular to a video processing method and electronic equipment.
Background
At present, image and audio separation technologies are widely applied. After an image or audio is separated, focusing processing may be performed on the separated image or audio to achieve focusing of the image or the audio.
However, existing approaches simply focus the image or the audio in isolation; they do not take into account that the image and the audio corresponding to each subject object in a video may differ, and they do not focus the image and the audio corresponding to each subject object separately.
Disclosure of Invention
The embodiment of the invention provides a video processing method and electronic equipment, aiming to solve the problem that the image and the audio corresponding to each subject object cannot be focused separately.
In order to solve the technical problem, the invention is realized as follows:
in a first aspect, an embodiment of the present invention provides a video processing method, where the video processing method includes:
acquiring first image data and first audio data of first video data;
performing focusing processing on pixels of at least one subject object in the first image data through a preset object separation network to obtain at least one second image data;
performing focusing processing on audio data matched with the at least one subject object in the first audio data through a preset voice separation network to obtain at least one second audio data;
and encoding and compressing the second image data and the second audio data to obtain second video data.
In a second aspect, an embodiment of the present invention provides an electronic device, including:
the acquisition module is used for acquiring first image data and first audio data of the first video data;
the first focusing module is used for performing focusing processing on the pixels of at least one subject object in the first image data through a preset object separation network to obtain at least one second image data;
the second focusing module is used for performing focusing processing on the audio data matched with the at least one subject object in the first audio data through a preset voice separation network to obtain at least one second audio data;
and the coding module is used for coding and compressing the second image data and the second audio data to obtain second video data.
In a third aspect, an embodiment of the present invention provides an electronic device, which includes a processor, a memory, and a computer program stored on the memory and executable on the processor, where the computer program, when executed by the processor, implements the steps of the video processing method according to the first aspect.
In a fourth aspect, an embodiment of the present invention further provides an electronic device, including:
a touch screen, wherein the touch screen comprises a touch sensitive surface and a display screen;
one or more processors;
one or more memories;
one or more sensors;
and one or more computer programs, wherein the one or more computer programs are stored in the one or more memories, the one or more computer programs comprising instructions which, when executed by the electronic device, cause the electronic device to perform the steps of the video processing method of the first aspect.
In a fifth aspect, an embodiment of the present invention further provides a computer non-transitory storage medium, where a computer program is stored in the computer non-transitory storage medium, where the computer program is executed by a computing device to implement the steps of the video processing method according to the first aspect.
In a sixth aspect, an embodiment of the present invention further provides a computer program product, which is characterized in that when the computer program product runs on a computer, the computer is caused to execute the video processing method according to the first aspect.
In the embodiment of the invention, through a preset object separation network, focusing processing is performed on the pixels of at least one subject object in the first image data of the electronic equipment to obtain at least one second image data; and through a preset voice separation network, focusing processing is performed on the audio data matched with the at least one subject object in the first audio data of the electronic equipment to obtain at least one second audio data, thereby achieving focusing processing of the image data and the audio data of each subject object.
Drawings
Fig. 1 is a flowchart of a video processing method according to an embodiment of the present invention;
FIG. 2 is a schematic view of multi-user focusing according to an embodiment of the present invention;
fig. 3 is a schematic view of a video processing method according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of another video processing method according to an embodiment of the present invention;
fig. 5 is a schematic diagram of an electronic device according to an embodiment of the present invention;
fig. 6 is a schematic diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a flowchart of a video processing method according to an embodiment of the present invention. As shown in fig. 1, the video processing method may include:
step 101: acquiring first image data and first audio data of first video data;
step 102: performing focusing processing on pixels of at least one subject object in the first image data through a preset object separation network to obtain at least one second image data;
step 103: performing focusing processing on audio data matched with the at least one subject object in the first audio data through a preset voice separation network to obtain at least one second audio data;
step 104: encoding and compressing the second image data and the second audio data to obtain second video data.
In the embodiment of the invention, through a preset object separation network, focusing processing is performed on the pixels of at least one subject object in the first image data of the electronic equipment to obtain at least one second image data; through a preset voice separation network, focusing processing is performed on the audio data matched with the at least one subject object in the first audio data of the electronic equipment to obtain at least one second audio data; thereby the image data and the audio data of each subject object can both be focused.
In this embodiment of the present invention, the first image data in step 101 is the image data in the first video data, and the first audio data in step 101 is the audio data in the first video data; the first image data may be captured by a camera or other image acquisition equipment, and the first audio data may be captured by a microphone or other audio acquisition equipment.
In this embodiment of the present invention, the focusing processing on the pixels of the at least one subject object in step 102 includes:
identifying the pixels of non-subject objects in each frame of image of the first image data based on the pixels of the at least one subject object; and
performing Gaussian filter processing on the pixels of the non-subject objects based on a predetermined Gaussian filter processing coefficient; or
performing grayscale processing on the pixels of the non-subject objects based on a predetermined grayscale processing coefficient.
In the embodiment of the invention, the pixels of the non-subject objects are identified in each frame of image according to the pixels of the at least one subject object, and blurring or color-retention processing is performed on the pixels of the non-subject objects, thereby achieving the focusing processing on the pixels of the subject object. Focusing processing of pixels means that the pixels of the subject object are emphasized while the display of the pixels of the non-subject objects is weakened. Blurring applies Gaussian filtering to the pixels of the non-subject objects: Ib' = GaussBlur(Ib, alpha). Color retention keeps the original color for the pixels of the subject object and grays out the pixels of the non-subject objects: Ib' = Gray(Ib, alpha). Here Ib denotes the pixels of the non-subject objects, Ib' the corresponding pixels in the second image data, and alpha an adjustment parameter.
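As a minimal illustrative sketch (not part of the patent disclosure), the blurring and color-retention operations could be written with OpenCV and NumPy as follows; the function name focus_image, the subject_mask input, and the mapping of the adjustment parameter alpha to a Gaussian sigma are assumptions made for illustration:

```python
import cv2
import numpy as np

def focus_image(frame, subject_mask, alpha=1.0, mode="blur"):
    """Weaken non-subject pixels so the subject object stands out.

    frame: HxWx3 BGR image; subject_mask: HxW boolean array, True where a
    pixel belongs to the subject object (e.g. output of a segmentation net);
    alpha: the patent's adjustment parameter, here scaling blur strength.
    """
    out = frame.copy()
    if mode == "blur":
        # Ib' = GaussBlur(Ib, alpha): blur the whole frame once, then copy
        # the blurred values back only into the non-subject region.
        sigma = max(5.0 * alpha, 0.1)
        blurred = cv2.GaussianBlur(frame, (0, 0), sigma)
        out[~subject_mask] = blurred[~subject_mask]
    else:
        # Ib' = Gray(Ib, alpha): color retention, i.e. the subject keeps its
        # original color while non-subject pixels are grayed out.
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        out[~subject_mask] = cv2.cvtColor(gray, cv2.COLOR_GRAY2BGR)[~subject_mask]
    return out
```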
The alpha parameter may be a fixed value or a variable value. For example, when the first video data is recorded video data, the alpha parameter is a fixed value, generally preset before recording; once the video has been recorded, the value of alpha cannot be changed. When the first video data is video data being recorded, the value of alpha can be changed according to the user's settings, so that the second video data better meets the user's needs and the user experience is improved.
In an embodiment of the present invention, after identifying the pixels of the non-subject objects from the first image data, the video processing method further includes adjusting the predetermined Gaussian filter processing coefficient or the predetermined grayscale processing coefficient based on the acquired image brightness of the non-subject objects, specifically including:
acquiring the image brightness of the non-subject objects;
and adjusting the predetermined Gaussian filter processing coefficient or the predetermined grayscale processing coefficient according to the image brightness.
If the image brightness of the non-subject objects is high, the predetermined Gaussian filter processing coefficient or the predetermined grayscale processing coefficient may be increased to darken the image of the non-subject objects, thereby strengthening the focusing processing on the pixels of the subject object.
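A sketch of this brightness-driven adjustment, under the assumption that the coefficient simply grows with the mean luminance of the non-subject region (the linear scaling rule and the function name adjust_coefficient are hypothetical):

```python
import cv2
import numpy as np

def adjust_coefficient(frame, subject_mask, base_coeff):
    """Increase the processing coefficient when the non-subject area is bright.

    Mean luminance of the non-subject pixels (0..255) scales base_coeff
    linearly up to 2x, so a brighter background is weakened more strongly.
    """
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    background = gray[~subject_mask]
    brightness = float(background.mean()) if background.size else 0.0
    return base_coeff * (1.0 + brightness / 255.0)
```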
In this embodiment of the present invention, the focusing processing on the audio data matched with the at least one subject object in the first audio data in step 103 includes:
identifying the audio data of non-subject objects from the first audio data based on the audio data of the at least one subject object;
and attenuating the audio data of the non-subject objects based on a predetermined attenuation coefficient.
In the embodiment of the invention, the audio data of non-subject objects is identified from each frame of audio according to the audio data of the at least one subject object, and suppression/attenuation processing is performed on the audio data of the non-subject objects, thereby achieving the focusing processing on the audio data of the subject object; focusing processing of audio data means attenuating the audio data of the non-subject objects so that the audio data of the subject object stands out. The attenuation is Ab' = beta × Ab, where Ab is the audio data of the non-subject objects, beta is an attenuation coefficient between 0 and 1, and beta = 0 corresponds to complete suppression.
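A minimal sketch of this attenuation step, assuming the separation network has already split one frame of audio into subject audio An and background audio Ab as NumPy arrays:

```python
import numpy as np

def focus_audio(subject_audio, background_audio, beta=0.2):
    """Attenuate non-subject audio: Ab' = beta * Ab, output A' = An + beta * Ab.

    beta is an attenuation coefficient in [0, 1]; beta = 0 suppresses the
    background completely, beta = 1 leaves it unchanged.
    """
    assert 0.0 <= beta <= 1.0
    an = np.asarray(subject_audio, dtype=np.float32)
    ab = np.asarray(background_audio, dtype=np.float32)
    return an + beta * ab
```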
The beta parameter may likewise be a fixed value or a variable value. For example, when the first video data is recorded video data, the beta parameter is a fixed value, generally preset before recording; once the video has been recorded, the value of beta cannot be changed. When the first video data is video data being recorded, the value of beta can be changed according to the user's settings, so that the second video data better meets the user's needs and the user experience is improved.
In this embodiment of the present invention, the focusing processing on the audio data of the at least one subject object in step 103 may also include:
replacing the audio data of the subject object with preset audio data.
In the embodiment of the present invention, the manner of focusing on the audio data of the subject object is not limited to suppression/attenuation or replacement with preset audio data (for example, a virtual sound); any processing that highlights the audio data of the subject object relative to the audio data of the non-subject objects falls within the protection scope of the embodiment of the present invention, and details are not repeated here.
In an embodiment of the present invention, after acquiring the first image data and the first audio data of the first video data, the video processing method further includes:
determining a target pixel of each subject object based on a selection input of at least one subject object by a user;
determining target audio data matched with each subject object;
and establishing a mapping relation between the target pixel of each subject object and the target audio data matched with each subject object, and storing the mapping relation into the second video data.
The target pixels are all the pixels of a subject object, and the target audio data is all the audio data of that subject object.
When separating the subject objects, based on a user's selection input of at least one subject object, the pixels of a plurality of subject objects, for example I0, I1, …, In, are segmented from the first image data, and the audio data matching each subject object, for example A0, A1, …, An, is determined; I0 and A0 correspond to the same subject object, I1 and A1 correspond to the same subject object, …, and In and An correspond to the same subject object. A mapping relation between the pixels and the audio data of the same subject object is established and stored in the second video data. The audio data matching a subject object refers to the audio data generated by that subject object, i.e., the audio data corresponding to it.
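One way to represent this mapping, sketched here with hypothetical names (SubjectTrack, tracks); the patent does not prescribe a data structure:

```python
from dataclasses import dataclass, field
from typing import Dict, List
import numpy as np

@dataclass
class SubjectTrack:
    """One subject object: its per-frame pixel masks (Ix) and matched audio (Ax)."""
    subject_id: int
    pixel_masks: List[np.ndarray] = field(default_factory=list)
    audio_frames: List[np.ndarray] = field(default_factory=list)
    weight: float = 1.0  # optional importance coefficient, described below

# The Ix <-> Ax mapping is then a dictionary keyed by subject id, serialized
# alongside the second video data (e.g. for subjects I0..I2 / A0..A2).
tracks: Dict[int, SubjectTrack] = {i: SubjectTrack(subject_id=i) for i in range(3)}
```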
After the mapping relationship is established, different importance coefficients (i.e., weight values) may also be set for different subject objects: different first weight values for the pixels of different subject objects, and different second weight values for the audio data of different subject objects. After the weight values are set, they are stored together with the mapping in the second video data. Setting different weight values for the subject objects can further enrich the visual effect during subsequent playback and improve the user experience.
In an embodiment of the present invention, determining target audio data matching each subject object includes:
screening target audio data matched with the audio characteristics of each subject object from at least one pre-stored audio data;
alternatively, the audio data selected by the user is determined as the target audio data matched with each subject object.
It should be noted that in the first way of determining the target audio data matched with a subject object, the electronic device performs the identification: the audio data of each subject object is stored in advance and then screened; since the audio features (for example, voiceprints) of the subject objects differ, voiceprint identification can be performed for each, and the target audio data matching the audio features of each subject object is screened out. In the second way, the audio data selected by the user is determined as the target audio data matched with each subject object.
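For the first way, a sketch of the screening step, assuming voiceprint embeddings have already been extracted for the separated streams and the enrolled subjects (the embedding extractor is outside this sketch, and cosine similarity is an assumed matching criterion):

```python
import numpy as np

def match_audio_to_subject(audio_embedding, enrolled_embeddings):
    """Return the id of the pre-stored voiceprint closest to a separated stream.

    audio_embedding: 1-D voiceprint vector of one separated audio stream;
    enrolled_embeddings: {subject_id: 1-D vector} stored in advance.
    """
    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    return max(enrolled_embeddings,
               key=lambda sid: cosine(audio_embedding, enrolled_embeddings[sid]))
```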
According to the embodiment of the invention, by matching the audio data of the subject object, the pixels of the same subject object are associated with its audio data, which facilitates subsequently processing the pixels and the audio data of a subject object at the same time, can further enrich the visual effect during subsequent playback, and improves the user experience.
In an embodiment of the invention, the selection input comprises at least one of: single click, double click, long press, etc.
In the embodiment of the present invention, after focusing processing is performed on the pixels of at least one subject object in the first image data through the preset object separation network to obtain at least one second image data, the video processing method further includes:
performing Gaussian filter processing or grayscale processing on the target pixels of at least two subject objects based on first weight values, in a case where at least two of the at least one subject object are selected;
playing the second image data after the Gaussian filter processing or grayscale processing;
wherein the predetermined Gaussian filter processing coefficient or predetermined grayscale processing coefficient of each of the at least one subject object corresponds to a different first weight value; and/or,
in the embodiment of the present invention, after performing focusing processing on audio data matched with at least one main object in first audio data through a preset voice separation network to obtain at least one second audio data, the video processing method further includes:
performing mixing processing on target audio data of at least two subject objects based on the second weight value in a case where at least two subject objects of the at least one subject object are selected;
playing the second audio data subjected to the audio mixing processing;
and the preset attenuation coefficient of each main body object in the at least one main body object corresponds to a different second weight value.
In the embodiment of the invention, after the user taps to select a certain subject object, the pixels of the non-subject objects other than the pixels Ix of that subject object (one of I0, I1, …, In) are blurred or decolored, and the audio data other than the audio data Ax matched with that subject object (one of A0, A1, …, An) is suppressed. When the user selects a plurality of subject objects, different importance coefficients, for example c0, c1, c2, …, cn, may be set for them; for the pixels of the subject objects, the degree of blurring or color retention of the different subject objects may be controlled based on the importance coefficients; for the audio data of the subject objects, weighted mixing may be performed based on the importance coefficients (A = c0×A0 + c1×A1 + c2×A2 + … + cn×An).
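A sketch of the weighted mixing, assuming the separated per-subject audio frames are equal-length NumPy arrays:

```python
import numpy as np

def weighted_mix(audio_tracks, coefficients):
    """A = c0*A0 + c1*A1 + ... + cn*An: mix subject audio by importance.

    audio_tracks: list of equal-length arrays A0..An;
    coefficients: matching list of importance coefficients c0..cn.
    """
    assert len(audio_tracks) == len(coefficients)
    mixed = np.zeros_like(np.asarray(audio_tracks[0], dtype=np.float32))
    for c, a in zip(coefficients, audio_tracks):
        mixed += c * np.asarray(a, dtype=np.float32)
    return mixed
```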
It should be noted that, for video data being recorded, different importance coefficients may be set for different subject objects during recording, so that the second video data is processed in real time for the subject object selected by the user. Taking pixels as an example, when the user selects the pixels Ix of a subject object I, all pixels other than Ix (including the pixels of the non-subject objects corresponding to I, the pixels of the other subject objects, and the pixels of the non-subject objects corresponding to those other subject objects) are subjected to Gaussian filter processing or grayscale processing and then played.
For recorded video data, different importance coefficients must be set for the different subject objects before recording completes, and are stored in the second video data once set; then, during subsequent playing of the second video data, when the user selects the pixels Ix of a subject object I, Gaussian filter processing or grayscale processing is performed on the pixels other than Ix in the second video data before playing.
According to the embodiment of the invention, importance parameters can be introduced for different subject objects, so that different subject objects can be processed to different degrees, which enriches the user's visual experience and makes the video more engaging.
In addition, the mixing processing of the audio data may be performed simultaneously with, or separately from, the Gaussian filter processing or grayscale processing of the image data.
In an embodiment of the present invention, in a case where at least two subject objects of at least one subject object are selected, the video processing method includes:
step I: respectively acquiring first image data and first audio data in first video data;
step II: applying INet to each frame of image I in the first image data to perform subject object segmentation, segmenting the pixels I0, I1, …, In of a plurality of subject objects; and applying ANet to segment each frame of audio in the first audio data, obtaining a plurality of audio data A0, A1, …, An matched with the subject objects;
step III: establishing a mapping relation (Ix <-> Ay) between the audio data of a subject object and the pixels of that subject object;
The mapping can be established by selecting a segment on the separated audio waveform and judging which subject object the audio data belongs to, or by tapping the subject object on the screen of the electronic equipment; in this way the pixels and the audio data of the same subject object are associated.
step IV: storing the mapping relation (Ix <-> Ay) in the recorded video data or in the video data being recorded;
step V: after the playing end parses the mapping relationship, the user can directly tap an object (e.g., a portrait) on the playing interface; the pixels of the non-subject objects other than the portrait Ix will be blurred or decolored, and the audio data other than the corresponding matched audio Ax will be attenuated.
In fig. 2, if I0 is selected, the pixels of I1 and I2 will be blurred or decolored, and the audio data A1 of I1 and the audio data A2 of I2 will be attenuated; if I0 and I1 are selected, the blurring or decoloring of the pixels of I0, I1 and I2 can be controlled according to importance coefficients c1, c2 and c3, and weighted mixing is performed according to c1, c2 and c3, so that the audio data after attenuation is A' = c1×A0 + c2×A1 + c3×A2.
It should be noted that the user may also select multiple subjects, in which case importance coefficients (c0, c1, c2, …, cn) are introduced; for the visual pixels they control the degree of blurring or fading of the different subjects, and for the audio data they can be used for weighted mixing (A = c0×A0 + c1×A1 + c2×A2 + … + cn×An).
step VI: recompressing the second image data I' and the second audio data A' and storing them as new video data.
The embodiment of the invention introduces interaction on the basis of single-image focusing, realizes multi-image focusing, and improves extensibility and the interactive experience.
It should be noted that, for recorded video data, the first weight value corresponding to the pixels of each subject object and the second weight value corresponding to the audio data of each subject object may be preset before recording; during subsequent playing, the blurring of non-subject pixels or the color retention of subject pixels is controlled according to the first weight values, the suppression/attenuation processing of the audio data is controlled according to the second weight values, and the processed video is stored; subsequent playback then plays the stored processed video data. For video data being recorded, the first weight values and second weight values may be set during recording, and the video data is processed in real time so that it can be played immediately.
In an embodiment of the present invention, the subject object separation network includes a Mask R-CNN, and/or the voice separation network includes a Long Short-Term Memory (LSTM) network.
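As an illustration of the image side, a pretrained Mask R-CNN from torchvision (assuming torchvision >= 0.13) can stand in for the patent's object separation network INet; the person-only filtering and the 0.5 thresholds are choices made for this sketch, not requirements of the patent:

```python
import torch
import torchvision

# Off-the-shelf Mask R-CNN used here as a stand-in for INet.
model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

def subject_masks(frame, score_thresh=0.5, person_label=1):
    """Boolean masks of detected persons in one frame.

    frame: 3xHxW float tensor with values in [0, 1]; in the COCO label set
    used by this model, label 1 is "person".
    """
    with torch.no_grad():
        output = model([frame])[0]
    keep = (output["scores"] > score_thresh) & (output["labels"] == person_label)
    return output["masks"][keep, 0] > 0.5  # N x H x W boolean masks
```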
In an embodiment of the present invention, the first video data is video data in a recording process or recorded video data.
The following describes a video processing method in detail with respect to video data during recording or recorded video data.
In an embodiment of the present invention, in a case that the first video data is video data in a recording process (as shown in fig. 3), the video processing method includes:
the first step is as follows: respectively acquiring first image data and first audio data of video data in a recording process by using a camera and a microphone;
the second step is as follows: performing image segmentation on each frame of image (i.e., ith frame) I In the first image data by applying INet, wherein the pixel of a segmented subject object is In, and the background pixel is Ib (i.e., the pixel of a non-subject object);
the third step: and performing blurring or color-reserving processing on the separated pixels of the non-main object to obtain second image data.
Blurring is to apply gaussian filtering to pixels of the non-subject object for blurring: ib ═ gaussblu (Ib, alpha), color retention means that the original color is retained for the pixels of the subject object, and the pixels of the non-subject object are grayed: ib ═ Gray (Ib, alpha). Wherein alpha is an adjusting parameter, and for video data in the recording process, blurring or color retention parameters can be adjusted in real time to change the final effect. The second image data is denoted by I'.
The fourth step: applying ANet to each frame of audio A (i.e., the ith frame) in the first audio data to separate the audio data matched with the corresponding subject object; let An be the audio data matched with the subject object and Ab the background audio data (i.e., the audio data of the non-subject objects), the original overall audio data being the superposition of the two, i.e., An + Ab;
The fifth step: performing suppression/attenuation on the audio data of the non-subject objects to obtain second audio data;
Here Ab' = beta × Ab, where beta is an attenuation coefficient between 0 and 1, and beta = 0 corresponds to complete suppression. For video data being recorded, the beta parameter can also be adjusted in real time. The overall audio data after the suppression processing is A' = An + Ab' = An + beta × Ab.
The sixth step: respectively encoding and compressing the second image data I' and the second audio data A';
The seventh step: transmitting the focused compressed video and compressed audio as a real-time stream over the network, the real-time stream being video data in the recording process, such as live video data.
The invention effectively combines image segmentation and audio data separation, achieving focusing in both the image dimension and the audio dimension of video data during recording.
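Putting the recording-time steps together, a per-frame skeleton (a sketch reusing focus_image and focus_audio from the earlier examples; inet and anet stand for the two separation networks, and encoding/streaming is left to the caller):

```python
def process_live_frame(frame, audio_chunk, inet, anet, alpha, beta):
    """One iteration of the recording-time pipeline (second to fifth steps).

    inet(frame) returns a boolean subject mask; anet(audio_chunk) returns
    the pair (An, Ab). alpha and beta may be changed between iterations,
    which is what makes the live variant adjustable in real time.
    """
    subject_mask = inet(frame)                               # image segmentation
    focused_image = focus_image(frame, subject_mask, alpha)  # blur the background
    an, ab = anet(audio_chunk)                               # audio separation
    focused_audio = focus_audio(an, ab, beta)                # Ab' = beta * Ab
    return focused_image, focused_audio                      # then encode & stream
```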
In an embodiment of the present invention, in a case that the first video data is recorded video data (as shown in fig. 4), the video processing method includes:
step 1: decoding and separating the recorded video data to respectively obtain first image data and first audio data;
step 2: performing image segmentation on each frame of image I (i.e., the ith frame) in the first image data by applying INet, wherein the pixels of a segmented subject object are In and the background pixels are Ib;
step 3: performing blurring or color-retention processing on the separated pixels of the non-subject objects to obtain second image data;
Blurring applies Gaussian filtering to the pixels of the non-subject objects: Ib' = GaussBlur(Ib, alpha); color retention means that the original color is retained for the pixels of the subject object while the pixels of the non-subject objects are grayed: Ib' = Gray(Ib, alpha). Unlike video data being recorded, alpha can only be preset once; it cannot be modified after the recorded video data has been generated. The second image data is denoted by I';
step 4: applying ANet to each frame of audio A (i.e., the ith frame) in the first audio data for separation; let An be the audio data matched with the subject object and Ab the background audio data (i.e., the audio data of the non-subject objects), the original overall audio data being the superposition of the two, i.e., An + Ab;
step 5: performing suppression/attenuation on the audio data of the non-subject objects to obtain second audio data;
The suppression/attenuation is Ab' = beta × Ab, where beta is an attenuation coefficient between 0 and 1, and beta = 0 corresponds to complete suppression. The overall audio data after the attenuation processing is A' = An + Ab' = An + beta × Ab;
step 6: recompressing and encoding the second image data I' and the second audio data A' and storing them as second video data, which can be shared on the network as a short video.
The invention effectively combines AI image segmentation and AI audio data separation, achieving focusing in both the image dimension and the audio dimension of recorded video data.
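The offline variant differs mainly in that alpha and beta are fixed before processing and the whole result is re-encoded at the end; a sketch under the same assumptions as the live example above:

```python
def process_recorded_video(frames, audio_chunks, inet, anet, alpha, beta):
    """Steps 1-6 for recorded video: decode, focus every frame, then re-encode.

    frames / audio_chunks are the decoded first image and audio data; the
    returned lists form the second image and audio data to be compressed
    into the second video data.
    """
    out_frames, out_audio = [], []
    for frame, chunk in zip(frames, audio_chunks):
        mask = inet(frame)
        out_frames.append(focus_image(frame, mask, alpha, mode="gray"))
        an, ab = anet(chunk)
        out_audio.append(focus_audio(an, ab, beta))
    return out_frames, out_audio
```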
In the embodiment of the present invention, the subject object may be, without limitation, a person, an animal, a cartoon character, a cartoon animal, or the like.
Fig. 5 is a schematic diagram of an electronic device according to an embodiment of the present invention. As shown in fig. 5, the electronic device 50 includes:
an obtaining module 501, configured to obtain first image data and first audio data in first video data;
a first focusing module 502, configured to perform focusing processing on the pixels of at least one subject object in the first image data through a preset object separation network to obtain at least one second image data;
a second focusing module 503, configured to perform focusing processing on the audio data matched with the at least one subject object in the first audio data through a preset voice separation network to obtain at least one second audio data;
the encoding module 504 is configured to perform encoding and compression processing on the second image data and the second audio data to obtain second video data.
In the embodiment of the invention, through a preset object separation network, focusing processing is performed on the pixels of at least one subject object in the first image data of the electronic equipment to obtain at least one second image data; and through a preset voice separation network, focusing processing is performed on the audio data matched with the at least one subject object in the first audio data of the electronic equipment to obtain at least one second audio data, thereby achieving focusing processing of the image data and the audio data of each subject object.
Optionally, the first focusing module 502 is further configured to:
identifying pixels of a non-subject object from the first image data based on pixels of the at least one subject object;
and performing Gaussian filtering processing on the pixels of the non-main object based on a preset Gaussian filtering processing coefficient, or performing gray processing on the pixels of the non-main object based on a preset gray processing coefficient.
According to the embodiment of the invention, the pixels of the non-main object are subjected to Gaussian filtering processing or graying processing, so that the focusing processing of the pixels of the main object is realized.
Optionally, the electronic device further includes:
the acquisition module is also used for acquiring the image brightness of the non-subject object;
and the adjusting module is used for adjusting a preset Gaussian filter processing coefficient or a preset gray processing coefficient according to the image brightness.
According to the embodiment of the invention, the predetermined Gaussian filter processing coefficient or the predetermined grayscale processing coefficient can be adjusted dynamically and more flexibly based on the image brightness of the pixels of the non-subject objects, thereby dynamically adjusting the focusing effect.
Optionally, the second focusing module 503 is further configured to:
identifying audio data of a non-subject object from the first audio data based on the audio data of the at least one subject object;
and carrying out attenuation processing on the audio data of the non-subject object based on a preset attenuation coefficient.
According to the embodiment of the invention, the audio data of the non-main object is subjected to attenuation processing, so that the focusing processing of the audio data of the main object is realized.
Optionally, the second focusing module 503 is further configured to:
and replacing the audio data of the subject object with preset audio data.
In the embodiment of the present invention, the manner of focusing on the audio data of the subject object is not limited to suppression/attenuation or replacement with preset audio data (for example, a virtual sound); any processing that highlights the audio data of the subject object relative to the audio data of the non-subject objects falls within the protection scope of the embodiment of the present invention, and details are not repeated here.
Optionally, the electronic device further includes:
a determination module for determining a target pixel of each subject object based on a selection input of at least one subject object by a user;
the determining module is further used for determining target audio data matched with each subject object;
an establishing module for establishing a mapping relation between the target pixel of each subject object and the target audio data matched with each subject object;
and the storage module is used for storing the mapping relation into the second video data.
According to the embodiment of the invention, by establishing the mapping relation between the pixels and the audio data of each subject object, the pixels and the audio data of each subject object can be focused.
Optionally, the determining module is further configured to:
screening target audio data matched with the audio characteristics of each subject object from at least one pre-stored audio data;
alternatively, the audio data selected by the user is determined as the target audio data matched with each subject object.
According to the embodiment of the invention, the audio data of the main object is matched, so that the pixel of each main object can be associated with the audio data, the pixel and the audio data of the main object can be conveniently processed simultaneously in the follow-up process, the visual effect of a user can be further enriched in the follow-up playing process, and the user experience degree is improved.
Optionally, the electronic device further includes:
a processing module configured to perform gaussian filtering processing or grayscale processing on target pixels of at least two subject objects based on a first weight value in a case where at least two subject objects of the at least one subject object are selected;
the playing module is used for playing the second image data which is subjected to Gaussian filtering processing or gray level processing;
the preset Gaussian filter processing coefficient or the preset gray processing coefficient of each main body object in the at least one main body object corresponds to different first weight values.
According to the embodiment of the invention, the importance parameters can be introduced for different main objects, so that the Gaussian filtering processing or the gray level processing of different degrees can be conveniently carried out on the pixels of different main objects, the visual experience of a user is enriched, and the video interestingness is enhanced.
Optionally, the electronic device further includes: the processing module is used for performing sound mixing processing on target audio data of at least two main body objects based on a second weight value under the condition that at least two main body objects in the at least one main body object are selected; the playing module is used for playing the second audio data subjected to the audio mixing processing; and the preset attenuation coefficient of each main body object in the at least one main body object corresponds to a different second weight value.
According to the embodiment of the invention, the importance parameters can be introduced for different subject objects, so that different degrees of attenuation processing can be conveniently carried out on the audio data matched with different subject objects, the visual experience of a user is enriched, and the video interestingness is enhanced.
Optionally, the first video data is video data in a recording process or recorded video data.
Optionally, the subject object separation network includes: mask R-CNN, and/or, a voice separation network comprising: long and short term memory networks LSTM.
The electronic device provided in the embodiment of the present invention can implement each process implemented by the electronic device in the method embodiment of fig. 1, and is not described herein again to avoid repetition.
In the embodiment of the invention, through a preset object separation network, focusing processing is carried out on pixels of at least one main object in first image data of electronic equipment to obtain at least one second image data; and performing focusing processing on the audio data matched with at least one main body object in the first audio data of the electronic equipment through a preset voice separation network to obtain at least one second audio data, so that focusing processing on the image and the audio in the first video data can be realized.
Fig. 6 is a schematic diagram of a hardware structure of an electronic device 100 for implementing various embodiments of the present invention. The electronic device 100 includes, but is not limited to: a radio frequency unit 101, a network module 102, an audio output unit 103, an input unit 104, a sensor 105, a display unit 106, a user input unit 107, an interface unit 108, a memory 109, a processor 110, and a power supply 111. Those skilled in the art will appreciate that the electronic device configuration shown in fig. 6 does not constitute a limitation of the electronic device; the electronic device may include more or fewer components than shown, some components may be combined, or the components may be arranged differently. In the embodiment of the present invention, the electronic device includes, but is not limited to, a mobile phone, a tablet computer, a notebook computer, a palmtop computer, a vehicle-mounted terminal, a wearable device, a pedometer, and the like.
A processor 110, configured to perform focusing processing on the pixels of at least one subject object in the first image data through a preset object separation network to obtain at least one second image data;
to perform focusing processing on the audio data matched with the at least one subject object in the first audio data through a preset voice separation network to obtain at least one second audio data;
and to encode and compress the second image data and the second audio data to obtain second video data.
In the embodiment of the invention, through a preset object separation network, focusing processing is carried out on pixels of at least one main object in first image data of electronic equipment to obtain at least one second image data; and performing focusing processing on the audio data matched with at least one main body object in the first audio data of the electronic equipment through a preset voice separation network to obtain at least one second audio data, thereby realizing the focusing processing on the image data and the audio data of each main body object.
It should be understood that, in the embodiment of the present invention, the radio frequency unit 101 may be used for receiving and sending signals during a message transmission or call process, and specifically, after receiving downlink data from a base station, the downlink data is processed by the processor 110; in addition, the uplink data is transmitted to the base station. Typically, radio frequency unit 101 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier, a duplexer, and the like. In addition, the radio frequency unit 101 can also communicate with a network and other devices through a wireless communication system.
The electronic device provides wireless broadband internet access to the user via the network module 102, such as assisting the user in sending and receiving e-mails, browsing web pages, and accessing streaming media.
The audio output unit 103 may convert audio data received by the radio frequency unit 101 or the network module 102 or stored in the memory 109 into an audio signal and output as sound. Also, the audio output unit 103 may also provide audio output related to a specific function performed by the electronic apparatus 100 (e.g., a call signal reception sound, a message reception sound, etc.). The audio output unit 103 includes a speaker, a buzzer, a receiver, and the like.
The input unit 104 is used to receive an audio or video signal. The input unit 104 may include a Graphics Processing Unit (GPU) 1041 and a microphone 1042; the graphics processor 1041 processes image data of still pictures or video obtained by an image capturing device (e.g., a camera) in a video capturing mode or an image capturing mode. The processed image frames may be displayed on the display unit 106. The image frames processed by the graphics processor 1041 may be stored in the memory 109 (or other storage medium) or transmitted via the radio frequency unit 101 or the network module 102. The microphone 1042 can receive sound and process it into audio data. In a phone call mode, the processed audio data may be converted into a format that can be transmitted to a mobile communication base station via the radio frequency unit 101.
The electronic device 100 also includes at least one sensor 105, such as a light sensor, motion sensor, and other sensors. Specifically, the light sensor includes an ambient light sensor that can adjust the brightness of the display panel 1061 according to the brightness of ambient light, and a proximity sensor that can turn off the display panel 1061 and/or the backlight when the electronic device 100 is moved to the ear. As one type of motion sensor, an accelerometer sensor can detect the magnitude of acceleration in each direction (generally three axes), detect the magnitude and direction of gravity when stationary, and can be used to identify the posture of an electronic device (such as horizontal and vertical screen switching, related games, magnetometer posture calibration), and vibration identification related functions (such as pedometer, tapping); the sensors 105 may also include fingerprint sensors, pressure sensors, iris sensors, molecular sensors, gyroscopes, barometers, hygrometers, thermometers, infrared sensors, etc., which are not described in detail herein.
The display unit 106 is used to display information input by a user or information provided to the user. The Display unit 106 may include a Display panel 1061, and the Display panel 1061 may be configured in the form of a Liquid Crystal Display (LCD), an Organic Light-Emitting Diode (OLED), or the like.
The user input unit 107 may be used to receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device. Specifically, the user input unit 107 includes a touch panel 1071 and other input devices 1072. Touch panel 1071, also referred to as a touch screen, may collect touch operations by a user on or near the touch panel 1071 (e.g., operations by a user on or near touch panel 1071 using a finger, stylus, or any suitable object or attachment). The touch panel 1071 may include two parts of a touch detection device and a touch controller. The touch detection device detects the touch direction of a user, detects a signal brought by touch operation and transmits the signal to the touch controller; the touch controller receives touch information from the touch sensing device, converts the touch information into touch point coordinates, sends the touch point coordinates to the processor 110, and receives and executes commands sent by the processor 110. In addition, the touch panel 1071 may be implemented in various types, such as a resistive type, a capacitive type, an infrared ray, and a surface acoustic wave. In addition to the touch panel 1071, the user input unit 107 may include other input devices 1072. Specifically, other input devices 1072 may include, but are not limited to, a physical keyboard, function keys (e.g., volume control keys, switch keys, etc.), a trackball, a mouse, and a joystick, which are not described in detail herein.
Further, the touch panel 1071 may be overlaid on the display panel 1061, and when the touch panel 1071 detects a touch operation thereon or nearby, the touch panel 1071 transmits the touch operation to the processor 110 to determine the type of the touch event, and then the processor 110 provides a corresponding visual output on the display panel 1061 according to the type of the touch event. Although in fig. 6, the touch panel 1071 and the display panel 1061 are two independent components to implement the input and output functions of the electronic device, in some embodiments, the touch panel 1071 and the display panel 1061 may be integrated to implement the input and output functions of the electronic device, and is not limited herein.
The interface unit 108 is an interface for connecting an external device to the electronic apparatus 100. For example, the external device may include a wired or wireless headset port, an external power supply (or battery charger) port, a wired or wireless data port, a memory card port, a port for connecting a device having an identification module, an audio input/output (I/O) port, a video I/O port, an earphone port, and the like. The interface unit 108 may be used to receive input (e.g., data information, power, etc.) from an external device and transmit the received input to one or more elements within the electronic apparatus 100 or may be used to transmit data between the electronic apparatus 100 and the external device.
The memory 109 may be used to store software programs as well as various data. The memory 109 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data (such as audio data, a phonebook, etc.) created according to the use of the electronic device, and the like. Further, the memory 109 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device.
The processor 110 is a control center of the electronic device, connects various parts of the entire electronic device using various interfaces and lines, performs various functions of the electronic device and processes data by operating or executing software programs and/or modules stored in the memory 109 and calling data stored in the memory 109, thereby performing overall monitoring of the electronic device. Processor 110 may include one or more processing units; preferably, the processor 110 may integrate an application processor, which mainly handles operating systems, user interfaces, application programs, etc., and a modem processor, which mainly handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 110.
The electronic device 100 may further include a power source 111 (such as a battery) for supplying power to each component. Preferably, the power source 111 may be logically connected to the processor 110 through a power management system, so that functions such as managing charging, discharging, and power consumption are implemented through the power management system.
In addition, the electronic device 100 includes some functional modules that are not shown, which are not described in detail here.
Preferably, an embodiment of the present invention further provides an electronic device, including a processor 110, a memory 109, and a computer program stored in the memory 109 and executable on the processor 110. When executed by the processor 110, the computer program implements each process of the above video processing method embodiment and can achieve the same technical effect; to avoid repetition, the details are not described here again.
An embodiment of the present invention further provides an electronic device, including:
a touch screen, wherein the touch screen comprises a touch sensitive surface and a display screen;
one or more processors 110;
one or more memories 109;
one or more sensors;
and one or more computer programs, where the one or more computer programs are stored in the one or more memories and include instructions which, when executed by the electronic device, cause the electronic device to execute each process of the video processing method embodiment and achieve the same technical effect; to avoid repetition, the details are not repeated here.
An embodiment of the present invention further provides a computer-readable storage medium storing a computer program. When executed by a processor, the computer program implements each process of the video processing method embodiment and can achieve the same technical effect; to avoid repetition, the details are not repeated here. The computer-readable storage medium may be, for example, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
An embodiment of the present invention further provides a computer non-transitory storage medium storing a computer program. When executed by a computing device, the computer program implements each process of the video processing method embodiment and can achieve the same technical effect; to avoid repetition, the details are not repeated here.
An embodiment of the present invention further provides a computer program product. When the computer program product runs on a computer, the computer executes the computer program product to implement each process of the video processing method embodiment and can achieve the same technical effect; to avoid repetition, the details are not repeated here.
It should be noted that, in this document, the terms "comprises," "comprising," and any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
Through the above description of the embodiments, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and certainly also by hardware alone, but in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention may be embodied in the form of a software product that is stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disk) and includes instructions for enabling a terminal (such as an electronic device, a computer, a server, an air conditioner, or a network device) to execute the methods according to the embodiments of the present invention.
While the present invention has been described with reference to the embodiments shown in the drawings, the invention is not limited to those embodiments, which are illustrative rather than restrictive. It will be apparent to those skilled in the art that various changes and modifications can be made without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (14)

1. A video processing method, comprising:
acquiring first image data and first audio data of first video data;
performing focusing processing on pixels of at least one subject object in the first image data through a preset object separation network to obtain at least one second image data;
performing focusing processing on audio data matched with the at least one subject object in the first audio data through a preset voice separation network to obtain at least one second audio data;
and performing encoding and compression processing on the second image data and the second audio data to obtain second video data.
2. The method of claim 1, wherein the performing focusing processing on the pixels of the at least one subject object in the first image data comprises:
identifying pixels of a non-subject object from the first image data based on the pixels of the at least one subject object;
and performing Gaussian filtering processing on the pixels of the non-subject object based on a preset Gaussian filtering coefficient, or performing grayscale processing on the pixels of the non-subject object based on a preset grayscale processing coefficient.
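For illustration only, the following Python sketch shows one way claim 2's non-subject processing could be realized with OpenCV and NumPy (neither of which the patent mandates); the subject mask is assumed to come from the object separation network of claim 1, and the coefficient-to-parameter mapping is an assumption of this example.

    import cv2
    import numpy as np

    def focus_frame(frame, subject_mask, ksize=21, gray_coeff=0.5, mode="blur"):
        """Blur or gray out non-subject pixels while keeping subject pixels intact.

        frame:        H x W x 3 BGR image (uint8).
        subject_mask: H x W boolean array, True where a subject object lies.
        """
        if mode == "blur":
            # Preset Gaussian filtering coefficient, interpreted here as kernel size.
            background = cv2.GaussianBlur(frame, (ksize, ksize), 0)
        else:
            # Preset grayscale coefficient, interpreted here as a blend factor.
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            gray = cv2.cvtColor(gray, cv2.COLOR_GRAY2BGR)
            background = cv2.addWeighted(frame, 1.0 - gray_coeff, gray, gray_coeff, 0)

        out = background.copy()
        out[subject_mask] = frame[subject_mask]  # subject pixels stay in focus
        return out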
3. The method of claim 2, wherein after the identifying pixels of a non-subject object from the first image data, the method further comprises:
acquiring the image brightness of the non-subject object;
and adjusting the preset Gaussian filtering coefficient or the preset grayscale processing coefficient according to the image brightness.
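Claim 3 leaves the brightness-to-coefficient mapping open; a minimal sketch, assuming a linear scaling by the mean luma of the non-subject region, might look like this (the 0.5 offset and the bounds are illustrative choices, not from the patent).

    import cv2

    def adjust_coefficients(frame, subject_mask, base_ksize=21, base_gray=0.5):
        """Adjust the preset coefficients according to non-subject image brightness."""
        luma = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        brightness = float(luma[~subject_mask].mean()) / 255.0  # 0 (dark) .. 1 (bright)

        # Assumption: brighter backgrounds call for stronger blurring / graying.
        ksize = max(3, int(base_ksize * (0.5 + brightness)) | 1)  # "| 1" keeps it odd
        gray_coeff = min(1.0, base_gray * (0.5 + brightness))
        return ksize, gray_coeff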
4. The method of claim 1, wherein the performing focusing processing on the audio data matched with the at least one subject object in the first audio data comprises:
identifying audio data of a non-subject object from the first audio data based on the audio data of the at least one subject object;
and performing attenuation processing on the audio data of the non-subject object based on a preset attenuation coefficient.
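Assuming the voice separation network has already split the first audio data into a subject track and a non-subject track (the split itself is outside claim 4), the attenuation step reduces to a scalar multiply before recombining, as this sketch shows.

    import numpy as np

    def attenuate_non_subject(subject_track, non_subject_track, attenuation=0.2):
        """Attenuate non-subject audio by a preset coefficient, then recombine.

        Both tracks are float32 sample arrays in [-1, 1] of equal length.
        """
        mixed = subject_track + attenuation * non_subject_track
        return np.clip(mixed, -1.0, 1.0)  # guard against clipping after the sum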
5. The method of claim 1, wherein the performing focusing processing on the audio data matched with the at least one subject object comprises:
replacing the audio data of the subject object with preset audio data.
6. The method of claim 1, wherein after the acquiring first image data and first audio data of the first video data, the method further comprises:
determining target pixels of each subject object based on a selection input by a user on the at least one subject object;
determining target audio data matched with each subject object;
and establishing a mapping relation between the target pixels of each subject object and the target audio data matched with that subject object, and storing the mapping relation in the second video data.
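The patent does not fix a storage format for claim 6's mapping relation; one possible in-memory representation, with all field names invented for this example, is sketched below.

    from dataclasses import dataclass, field

    @dataclass
    class SubjectMapping:
        """Hypothetical record linking a subject object's target pixels to its audio."""
        subject_id: int
        pixel_region: tuple      # e.g. a bounding box (x, y, w, h) of the target pixels
        audio_track_id: str      # identifier of the matched target audio data

    @dataclass
    class SecondVideoData:
        video_path: str
        mappings: list = field(default_factory=list)

        def add_mapping(self, subject_id: int, pixel_region: tuple,
                        audio_track_id: str) -> None:
            # Establish the mapping relation and store it with the second video data.
            self.mappings.append(SubjectMapping(subject_id, pixel_region, audio_track_id))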
7. The method of claim 6, wherein determining the target audio data that matches each subject object comprises:
screening, from at least one piece of pre-stored audio data, target audio data matched with the audio features of each subject object;
alternatively, determining the audio data selected by the user as the target audio data matched with each subject object.
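Claim 7 does not specify how audio features are compared; a minimal sketch, assuming each pre-stored audio item already carries an embedding vector (e.g. from a speaker-verification model) and using cosine similarity, follows.

    import numpy as np

    def screen_target_audio(subject_embedding, stored_audio):
        """Return the id of the pre-stored audio whose features best match the subject.

        stored_audio: dict mapping an audio id to its feature embedding vector.
        """
        def cosine(a, b):
            return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

        return max(stored_audio, key=lambda k: cosine(subject_embedding, stored_audio[k]))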
8. The method according to claim 2, wherein after the performing focusing processing on the pixels of at least one subject object in the first image data through a preset object separation network to obtain at least one second image data, the method further comprises:
in a case where at least two subject objects of the at least one subject object are selected, performing Gaussian filtering processing or grayscale processing on the target pixels of the at least two subject objects based on respective first weight values;
playing the second image data on which the Gaussian filtering processing or grayscale processing has been performed;
wherein the preset Gaussian filtering coefficient or the preset grayscale processing coefficient of each of the at least one subject object corresponds to a different first weight value.
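Reusing the OpenCV approach from the claim 2 sketch, claim 8's per-subject weighting could be realized as follows; the assumption that a larger first weight value means a larger blur kernel is this example's, not the claim's.

    import cv2

    def weighted_focus(frame, subject_masks, first_weights, base_ksize=21):
        """Apply per-subject Gaussian filtering when at least two subjects are selected.

        subject_masks: list of H x W boolean masks, one per selected subject object.
        first_weights: one first weight value per subject, scaling the blur strength.
        """
        out = frame.copy()
        for mask, weight in zip(subject_masks, first_weights):
            ksize = max(3, int(base_ksize * weight) | 1)  # keep the kernel size odd
            blurred = cv2.GaussianBlur(frame, (ksize, ksize), 0)
            out[mask] = blurred[mask]
        return out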
9. The method according to claim 4, wherein after the performing focusing processing on the audio data matched with the at least one subject object in the first audio data through a preset voice separation network to obtain at least one second audio data, the method further comprises:
in a case where at least two subject objects of the at least one subject object are selected, performing mixing processing on the target audio data of the at least two subject objects based on respective second weight values;
playing the second audio data on which the mixing processing has been performed;
wherein the preset attenuation coefficient of each of the at least one subject object corresponds to a different second weight value.
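Claim 9's mixing step admits a similarly small sketch; normalizing the second weight values so the mix stays in range is an illustrative choice, not a requirement of the claim.

    import numpy as np

    def weighted_mix(subject_tracks, second_weights):
        """Mix the target audio of the selected subjects by their second weight values."""
        weights = np.asarray(second_weights, dtype=np.float32)
        weights = weights / weights.sum()  # normalize so the mix stays within range
        mix = sum(w * t for w, t in zip(weights, subject_tracks))
        return np.clip(mix, -1.0, 1.0)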
10. The method of claim 1, wherein the first video data is video data in the process of being recorded or video data that has been recorded.
11. The method of claim 1, wherein the object separation network comprises a Mask R-CNN network.
12. The method of claim 1, wherein the voice separation network comprises a long short-term memory (LSTM) network.
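Claims 11 and 12 name network families without fixing an implementation. As one possible realization of claim 11 (not the patent's own), torchvision's pretrained Mask R-CNN yields per-instance masks that could serve as the subject masks used in the sketches above:

    import torch
    import torchvision

    # Pretrained on COCO; one candidate instantiation of the object separation network.
    model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
    model.eval()

    def subject_masks(frame_tensor, score_thresh=0.7):
        """frame_tensor: 3 x H x W float tensor in [0, 1]. Returns K x H x W bool masks."""
        with torch.no_grad():
            pred = model([frame_tensor])[0]
        keep = pred["scores"] > score_thresh
        # Soft masks arrive as N x 1 x H x W; binarize the kept ones at 0.5.
        return pred["masks"][keep, 0] > 0.5

The LSTM-based voice separation of claim 12 would play the analogous role on the audio side, typically by estimating a time-frequency mask that isolates the subject's voice; the details are left open by the patent.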
13. An electronic device, comprising:
an acquisition module, configured to acquire first image data and first audio data of first video data;
a first focusing module, configured to perform focusing processing on pixels of at least one subject object in the first image data through a preset object separation network to obtain at least one second image data;
a second focusing module, configured to perform focusing processing on audio data matched with the at least one subject object in the first audio data through a preset voice separation network to obtain at least one second audio data;
and an encoding module, configured to perform encoding and compression processing on the second image data and the second audio data to obtain second video data.
14. An electronic device, comprising a processor, a memory, and a computer program stored on the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the video processing method according to any one of claims 1 to 12.
CN201910803481.2A 2019-08-28 2019-08-28 Video processing method and electronic equipment Pending CN110602424A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910803481.2A CN110602424A (en) 2019-08-28 2019-08-28 Video processing method and electronic equipment

Publications (1)

Publication Number Publication Date
CN110602424A (en) 2019-12-20

Family

ID=68856197

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910803481.2A Pending CN110602424A (en) 2019-08-28 2019-08-28 Video processing method and electronic equipment

Country Status (1)

Country Link
CN (1) CN110602424A (en)

Patent Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101132839A (en) * 2005-05-05 2008-02-27 索尼计算机娱乐公司 Selective sound source listening in conjunction with computer interactive processing
CN101563698A (en) * 2005-09-16 2009-10-21 富利克索尔股份有限公司 Personalizing a video
JP2007266967A (en) * 2006-03-28 2007-10-11 Yamaha Corp Sound image localizer and multichannel audio reproduction device
US20160224545A1 (en) * 2007-12-20 2016-08-04 Porto Technology, Llc System And Method For Generating Dynamically Filtered Content Results, Including For Audio And/Or Video Channels
WO2012142323A1 (en) * 2011-04-12 2012-10-18 Captimo, Inc. Method and system for gesture based searching
CN103516894A (en) * 2012-06-25 2014-01-15 Lg电子株式会社 Mobile terminal and audio zooming method thereof
CN105075237A (en) * 2013-02-28 2015-11-18 索尼公司 Image processing apparatus, image processing method, and program
US20170289495A1 (en) * 2014-09-12 2017-10-05 International Business Machines Corporation Sound source selection for aural interest
WO2016082199A1 (en) * 2014-11-28 2016-06-02 华为技术有限公司 Method for recording sound of image-recorded object and mobile terminal
CN105989845A (en) * 2015-02-25 2016-10-05 杜比实验室特许公司 Video content assisted audio object extraction
CN108369816A (en) * 2015-11-11 2018-08-03 微软技术许可有限责任公司 For the device and method from omnidirectional's video creation video clipping
CN107230187A (en) * 2016-03-25 2017-10-03 北京三星通信技术研究有限公司 The method and apparatus of multimedia signal processing
CN109313904A (en) * 2016-05-30 2019-02-05 索尼公司 Video/audio processing equipment, video/audio processing method and program
CN109983786A (en) * 2016-11-25 2019-07-05 索尼公司 Transcriber, reproducting method, information processing unit, information processing method and program
CN108305636A (en) * 2017-11-06 2018-07-20 腾讯科技(深圳)有限公司 A kind of audio file processing method and processing device
CN110648612A (en) * 2018-06-26 2020-01-03 乐金显示有限公司 Display device

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021180046A1 (en) * 2020-03-13 2021-09-16 华为技术有限公司 Image color retention method and device
EP4109879A4 (en) * 2020-03-13 2023-10-04 Huawei Technologies Co., Ltd. Image color retention method and device
CN112165591A (en) * 2020-09-30 2021-01-01 联想(北京)有限公司 Audio data processing method and device and electronic equipment
CN112165591B (en) * 2020-09-30 2022-05-31 联想(北京)有限公司 Audio data processing method and device and electronic equipment
CN112235715A (en) * 2020-10-15 2021-01-15 中国电子科技集团公司第五十四研究所 Real-time route planning multifunctional terminal in unknown environment
CN112423081A (en) * 2020-11-09 2021-02-26 腾讯科技(深圳)有限公司 Video data processing method, device and equipment and readable storage medium
CN112584225A (en) * 2020-12-03 2021-03-30 维沃移动通信有限公司 Video recording processing method, video playing control method and electronic equipment

Similar Documents

Publication Publication Date Title
CN107817939B (en) Image processing method and mobile terminal
CN110602424A (en) Video processing method and electronic equipment
CN107566739B (en) photographing method and mobile terminal
CN109688322B (en) Method and device for generating high dynamic range image and mobile terminal
CN110365907B (en) Photographing method and device and electronic equipment
CN110781899B (en) Image processing method and electronic device
CN111405199B (en) Image shooting method and electronic equipment
CN107730460B (en) Image processing method and mobile terminal
CN110012143B (en) Telephone receiver control method and terminal
CN108881782B (en) Video call method and terminal equipment
CN109727212B (en) Image processing method and mobile terminal
CN109474784B (en) Preview image processing method and terminal equipment
CN109005314B (en) Image processing method and terminal
CN110769186A (en) Video call method, first electronic device and second electronic device
CN111182118B (en) Volume adjusting method and electronic equipment
CN111182211B (en) Shooting method, image processing method and electronic equipment
CN109639981B (en) Image shooting method and mobile terminal
CN107798662B (en) Image processing method and mobile terminal
CN111246053B (en) Image processing method and electronic device
CN110930372B (en) Image processing method, electronic equipment and computer readable storage medium
CN110443752B (en) Image processing method and mobile terminal
CN109819331B (en) Video call method, device and mobile terminal
CN108259808B (en) Video frame compression method and mobile terminal
CN111475238A (en) Page processing method and device, electronic equipment and storage medium
CN111402157A (en) Image processing method and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20191220