WO2022161146A1 - Video recording method and electronic device - Google Patents

Video recording method and electronic device

Info

Publication number
WO2022161146A1
WO2022161146A1 (PCT application PCT/CN2022/071129)
Authority
WO
WIPO (PCT)
Prior art keywords
audio
electronic device
volume
shooting
shooting picture
Prior art date
Application number
PCT/CN2022/071129
Other languages
English (en)
French (fr)
Inventor
孙玥
熊奇
李军
张宇倩
Original Assignee
华为技术有限公司
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Priority to US18/263,376 (published as US20240111478A1)
Priority to EP22745024.4A (published as EP4270937A4)
Publication of WO2022161146A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/16 Sound input; Sound output
    • G06F 3/165 Management of the audio stream, e.g. setting of volume, audio stream path
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 5/00 Details of television systems
    • H04N 5/76 Television signal recording
    • H04N 5/765 Interface circuits between an apparatus for recording and another apparatus
    • H04N 5/77 Interface circuits between a recording apparatus and a television camera
    • H04N 5/772 Interface circuits between a recording apparatus and a television camera, the recording apparatus and the television camera being placed in the same enclosure
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N 23/60 Control of cameras or camera modules
    • H04N 23/62 Control of parameters via user interfaces
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/168 Feature extraction; Face representation
    • G06V 40/171 Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B 27/00 Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B 27/02 Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
    • G11B 27/031 Electronic editing of digitised analogue information signals, e.g. audio or video signals
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N 23/60 Control of cameras or camera modules
    • H04N 23/61 Control of cameras or camera modules based on recognised objects
    • H04N 23/611 Control of cameras or camera modules based on recognised objects where the recognised objects include parts of the human body
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N 23/60 Control of cameras or camera modules
    • H04N 23/63 Control of cameras or camera modules by using electronic viewfinders
    • H04N 23/631 Graphical user interfaces [GUI] specially adapted for controlling image capture or setting capture parameters
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N 23/60 Control of cameras or camera modules
    • H04N 23/63 Control of cameras or camera modules by using electronic viewfinders
    • H04N 23/631 Graphical user interfaces [GUI] specially adapted for controlling image capture or setting capture parameters
    • H04N 23/632 Graphical user interfaces [GUI] specially adapted for controlling image capture or setting capture parameters for displaying or modifying preview images prior to image capturing, e.g. variety of image resolutions or capturing parameters
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N 23/60 Control of cameras or camera modules
    • H04N 23/63 Control of cameras or camera modules by using electronic viewfinders
    • H04N 23/633 Control of cameras or camera modules by using electronic viewfinders for displaying additional information relating to control or operation of the camera
    • H04N 23/635 Region indicators; Field of view indicators
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N 23/60 Control of cameras or camera modules
    • H04N 23/69 Control of means for changing angle of the field of view, e.g. optical zoom objectives or electronic zooming
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 9/00 Details of colour television systems
    • H04N 9/79 Processing of colour television signals in connection with recording
    • H04N 9/80 Transformation of the television signal for recording, e.g. modulation, frequency changing; Inverse transformation for playback
    • H04N 9/802 Transformation of the television signal for recording, e.g. modulation, frequency changing; Inverse transformation for playback involving processing of the sound signal

Definitions

  • the present application relates to the field of terminals, and in particular, to a video recording method and electronic device.
  • the volume of the sound information recorded by the terminal device is usually positively correlated with the volume of the sound information collected by the microphone of the terminal device.
  • Consequently, the user cannot flexibly adjust the recorded sound during recording, and the volume adjustment of the terminal device when recording video information is inflexible, resulting in a poor audio-visual effect.
  • Embodiments of the present application provide a video recording method and an electronic device, which can solve the problem of poor flexibility of the volume of the video information recorded by the electronic device, thereby improving the audiovisual effect of the played video information.
  • an embodiment of the present application provides a video recording method, which is applied to a first electronic device.
  • the method provided by the first aspect includes: the first electronic device records a first shooting picture in response to a user's first operation on a preview interface of an application, and records audio corresponding to the first shooting picture at a first volume.
  • In response to the user's zoom-in operation on the first shooting picture, the first electronic device collects a second shooting picture and the audio corresponding to the second shooting picture.
  • the first shooting picture and the second shooting picture are continuous.
  • the first electronic device records the second shooting picture, and records the audio corresponding to the second shooting picture at the second volume.
  • The second volume is greater than the first volume, or the sound amplification ratio corresponding to the second volume is greater than the sound amplification ratio corresponding to the first volume, where the sound amplification ratio refers to the ratio of the volume output by the first electronic device to the volume it collects.
  • The second shooting picture and its corresponding audio are obtained after the user's zoom-in operation on the first shooting picture, and while recording the second shooting picture, the corresponding audio is recorded at the second volume. In this way, when the recorded video information is played, the user visually perceives that the second shooting picture is closer, and at the same time hears its audio at a matching, louder volume, which improves the audio-visual effect of the played video information.
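The sound amplification ratio defined above is the ratio of output volume to collected volume, i.e. a gain applied to what the microphone captures. A minimal sketch of that relationship follows; the per-sample linear model and the function name are assumptions for illustration, not taken from the patent:

```python
def apply_amplification(collected_samples, amplification_ratio):
    """Scale collected microphone samples by the output/collected volume ratio."""
    return [s * amplification_ratio for s in collected_samples]

# After a zoom-in, the ratio for the second volume exceeds the ratio for the
# first volume, so identical collected audio is recorded louder.
first = apply_amplification([0.5, -1.0, 0.25], 1.0)   # first volume
second = apply_amplification([0.5, -1.0, 0.25], 2.0)  # second volume, larger ratio
```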
  • the audio corresponding to the first shooting image includes the audio of the first shooting object
  • Before the first electronic device, in response to the user's zoom-in operation on the first shooting picture, collects the second shooting picture and the corresponding audio, the method provided by the first aspect further includes: the first electronic device establishes an image-audio association relationship for the first shooting object.
  • In response to the user's second operation on the first shooting picture, the first electronic device takes the audio of the first shooting object as the audio to be adjusted.
  • The first electronic device recording the audio corresponding to the second shooting picture at the second volume includes: the first electronic device records the audio corresponding to the first shooting object in the second shooting picture at the second volume. In this way, the first electronic device can record the audio corresponding to the selected first shooting object at the second volume, which is more flexible.
  • the audio corresponding to the first shooting picture includes the audio of the second shooting object
  • the method provided by the first aspect further includes: the first electronic device establishes an image-audio association relationship of the second shooting object.
  • The first electronic device recording the audio corresponding to the second shooting picture at the second volume further includes: the first electronic device records the audio corresponding to the second shooting object in the second shooting picture at the first volume or at the sound amplification rate corresponding to the first volume.
  • In this way, the first electronic device can record only the audio corresponding to the selected first shooting object at the second volume, which is more flexible.
  • the audio corresponding to the first shooting picture further includes the audio of the third shooting object.
  • The method provided by the first aspect further includes: the first electronic device establishes an image-audio association relationship for the third shooting object.
  • In response to the user's third operation on the first shooting picture, the first electronic device takes the audio of the third shooting object as the audio to be adjusted.
  • The first electronic device recording the audio corresponding to the second shooting picture at the second volume further includes: the first electronic device records the audio corresponding to the third shooting object in the second shooting picture at the second volume or at the sound amplification rate corresponding to the second volume. In this way, the first electronic device can also record the audio corresponding to the selected third shooting object at the second volume, which is even more flexible.
  • the audio corresponding to the first shooting picture includes the audio of the second shooting object.
  • the method provided by the first aspect further includes: the first electronic device establishes an image-audio association relationship of the second photographed object.
  • The first electronic device recording the audio corresponding to the second shooting picture at the second volume or at the sound amplification rate corresponding to the second volume further includes: the first electronic device shields the audio associated with the image of the second shooting object in the second shooting picture.
  • In this way, the first electronic device plays the audio of the first shooting object at a third volume higher than the first volume, and does not play the audio of the second shooting object, thereby reducing interference with the audio of the first shooting object.
  • establishing an image-audio association relationship of the first shooting object by the first electronic device includes: the first electronic device extracting a first facial feature of the first shooting object and the first sound feature vector of the audio.
  • the first electronic device determines the first pronunciation feature corresponding to the lip shape according to the lip shape of the first facial feature of the first photographed object.
  • the first electronic device extracts the second pronunciation feature of the first sound feature vector. If the similarity between the first pronunciation feature and the second pronunciation feature is greater than the similarity threshold, the first electronic device establishes an association relationship between the first facial feature and the first voice feature vector.
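The lip-audio association step above can be sketched as follows. This is a minimal illustration, assuming pronunciation features are plain vectors compared by cosine similarity against an arbitrary 0.8 threshold; none of these specifics (names, threshold, similarity measure) come from the patent:

```python
import math

def cosine_similarity(a, b):
    """Similarity between two pronunciation feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def try_associate(face_id, lip_pronunciation, sound_pronunciation,
                  associations, threshold=0.8):
    """Bind a facial feature to a sound feature vector when the pronunciation
    derived from the lip shape agrees with the one extracted from the audio."""
    if cosine_similarity(lip_pronunciation, sound_pronunciation) > threshold:
        associations[face_id] = sound_pronunciation  # image-audio association
    return associations
```

For example, a lip feature of `[1.0, 0.0]` and an audio feature of `[0.9, 0.1]` are similar enough to be associated, while `[0.0, 1.0]` is not.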
  • The method provided by the first aspect further includes: when the first electronic device responds to the user's first operation on the preview interface, it communicates with the second electronic device and the third electronic device, respectively.
  • The first electronic device establishing an image-audio association relationship for the first shooting object and establishing an image-audio association relationship for the second shooting object includes: the first electronic device extracts the first facial feature of the first shooting object and the first sound feature vector of the audio, and extracts the second facial feature of the second shooting object and the second sound feature vector of the audio.
  • the first electronic device sends the first face feature, the first voice feature vector, the second face feature, and the second voice feature vector to the second electronic device and the third electronic device.
  • the first electronic device receives the association relationship between the first face feature and the first voice feature vector from the second electronic device, and receives the association relationship between the second face feature and the second voice feature vector from the third electronic device.
  • The first electronic device collecting the second shooting picture and the audio corresponding to the second shooting picture includes: the first electronic device detects the first propagation direction of the audio of the first shooting object in the second shooting picture and the second propagation direction of the audio of the second shooting object.
  • the first electronic device directionally enhances the audio of the first object in the second shot in the first propagation direction, and directionally suppresses the audio of the second object in the second shot in the second propagation direction.
  • In this way, the played audio of the first shooting object is clearer; furthermore, because the array microphone of the first electronic device collects the audio of the second shooting object with directional suppression, the interference from the audio of the second shooting object is small, which further improves the audio-visual effect when the user enjoys the video information.
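Directional enhancement and suppression with an array microphone is commonly implemented by beamforming, e.g. delay-and-sum: each channel is shifted by the steering delay for the target direction before averaging, so sound arriving from that direction adds coherently while sound from other directions partially cancels. The sketch below (integer-sample lags, discrete signals as lists) is an illustrative stand-in under those assumptions, not the patent's actual algorithm:

```python
def delay_and_sum(mic_signals, lags):
    """Compensate each channel's lag (in samples) for the target direction,
    then average; audio from that direction adds coherently and is enhanced,
    while audio from other directions stays misaligned and partially cancels."""
    n = len(mic_signals[0])
    out = []
    for i in range(n):
        acc, cnt = 0.0, 0
        for sig, lag in zip(mic_signals, lags):
            j = i + lag  # advance the lagging channel to line it up
            if 0 <= j < n:
                acc += sig[j]
                cnt += 1
        out.append(acc / cnt if cnt else 0.0)
    return out
```

With a two-microphone array where the second channel hears the source one sample late, steering lags of `[0, 1]` realign the channels and recover the source; steering toward a different direction leaves the channels misaligned, attenuating that source.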
  • the first electronic device determines the second volume according to the first shot, the second shot, and the first volume.
  • determining the second volume by the first electronic device according to the first shooting picture, the second shooting picture and the first volume includes: the first electronic device determining the second volume according to the first volume and the zoom factor.
  • Here, F1 is the first focal length corresponding to the first shooting picture, F2 is the second focal length corresponding to the second shooting picture, the zoom factor is F2/F1, V is the first volume, and V′ is the second volume.
  • determining the second volume by the first electronic device according to the first shooting picture, the second shooting picture and the first volume includes: the first electronic device determining the second volume according to the first volume and the size enlargement ratio.
  • Here, x1 × y1 is the first display size of the first shooting object in the first shooting picture, x1′ × y1′ is the third display size of the first shooting object in the second shooting picture, and the size enlargement ratio is (x1′ × y1′)/(x1 × y1).
  • The first electronic device determining the second volume according to the first shooting picture, the second shooting picture and the first volume includes: the first electronic device determines the second volume according to the first volume, the size enlargement ratio and the zoom factor, where F1 is the first focal length corresponding to the first shooting picture, F2 is the second focal length corresponding to the second shooting picture, F2/F1 is the zoom factor, x1 × y1 is the first display size of the first shooting object in the first shooting picture, x1′ × y1′ is the third display size of the first shooting object in the second shooting picture, (x1′ × y1′)/(x1 × y1) is the size enlargement ratio, V is the first volume, and V′ is the second volume.
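The original formulas in this section are published as images and do not survive in the text, so the sketch below is a plausible reconstruction from the variable definitions alone: it treats the zoom factor F2/F1 and the size enlargement ratio (x1′ · y1′)/(x1 · y1) as linear gains applied to the first volume V. The actual claimed equations may differ.

```python
def second_volume(v, f1=None, f2=None, size1=None, size2=None):
    """Derive the second volume V' from the first volume V.

    f1, f2:       first/second focal lengths -> zoom factor F2/F1
    size1, size2: (width, height) display sizes of the shooting object
                  -> size enlargement ratio
    Any factor left as None is simply not applied.
    """
    gain = 1.0
    if f1 is not None and f2 is not None:
        gain *= f2 / f1                                   # zoom factor
    if size1 is not None and size2 is not None:
        gain *= (size2[0] * size2[1]) / (size1[0] * size1[1])
    return v * gain

# Zooming in (F2 > F1, larger display size) yields V' > V;
# zooming out yields V' < V, matching both the first and second aspects.
```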
  • The first electronic device is in a headphone mode. After the first electronic device collects the second shooting picture and the corresponding audio, the method provided by the first aspect further includes: the first electronic device displays the second shooting picture on the preview interface of the application, and outputs the audio corresponding to the second shooting picture to the earphone for playback at the recorded volume.
  • In this way, the first electronic device can play the recorded video information while recording it; when the user records and plays back the video information, the second shooting picture in the video information matches the audio played at the recorded volume, and the user's audio-visual effect is better.
  • the first electronic device is not in the headphone mode.
  • the method provided by the first aspect further includes:
  • In response to the user's stop operation on the preview interface, the first electronic device generates a video file based on the recorded second shooting picture and the corresponding audio. In response to the user's operation of opening the video file, the first electronic device displays the second shooting picture on the preview interface of the application and plays the audio corresponding to the second shooting picture on its speaker at the recorded volume. In this way, the second shooting picture in the video information matches the audio played at the recorded volume, and the user's audio-visual effect is better.
  • an embodiment of the present application provides a video recording method, which is applied to a first electronic device.
  • The method provided by the second aspect includes: the first electronic device records a first shooting picture in response to a user's first operation on a preview interface of an application, and records audio corresponding to the first shooting picture at a first volume.
  • In response to the user's zoom-out operation on the first shooting picture, the first electronic device collects a second shooting picture and the audio corresponding to the second shooting picture, wherein the first shooting picture and the second shooting picture are continuous.
  • the first electronic device records the second shooting picture, and records the audio corresponding to the second shooting picture at the second volume.
  • The second volume is smaller than the first volume, or the sound amplification rate corresponding to the second volume is smaller than the sound amplification rate corresponding to the first volume, where the sound amplification rate refers to the ratio of the volume output by the first electronic device to the volume it collects.
  • the audio corresponding to the first shooting picture includes the audio of the first shooting object
  • Before the first electronic device, in response to the user's zoom-out operation on the first shooting picture, collects the second shooting picture and the corresponding audio, the method provided by the second aspect further includes: the first electronic device establishes an image-audio association relationship for the first shooting object.
  • In response to the user's second operation on the first shooting picture, the first electronic device takes the audio of the first shooting object as the audio to be adjusted.
  • the first electronic device recording the audio corresponding to the second shooting picture with the second volume includes: the first electronic device recording the audio corresponding to the first shooting object in the second shooting picture with the second volume or a sound amplification rate corresponding to the second volume.
  • the audio corresponding to the first shooting picture includes the audio of the second shooting object.
  • the method provided by the second aspect further includes: establishing, by the first electronic device, an image-audio association relationship of the second photographed object.
  • The first electronic device recording the audio corresponding to the second shooting picture at the second volume further includes: the first electronic device records the audio corresponding to the second shooting object in the second shooting picture at the first volume or at the sound amplification rate corresponding to the first volume.
  • The method provided by the second aspect further includes: when the first electronic device responds to the user's first operation on the preview interface, it communicates with the second electronic device and the third electronic device, respectively.
  • The first electronic device establishing an image-audio association relationship for the first shooting object and establishing an image-audio association relationship for the second shooting object includes: the first electronic device extracts the first facial feature of the first shooting object and the first sound feature vector of the audio, and extracts the second facial feature of the second shooting object and the second sound feature vector of the audio.
  • the first electronic device sends the first face feature, the first voice feature vector, the second face feature, and the second voice feature vector to the second electronic device and the third electronic device.
  • the first electronic device receives the association relationship between the first face feature and the first voice feature vector from the second electronic device, and receives the association relationship between the second face feature and the second voice feature vector from the third electronic device.
  • determining the second volume by the first electronic device according to the first shooting picture, the second shooting picture and the first volume includes: the first electronic device determining the second volume according to the first volume and the zoom factor. Or, the first electronic device determines the second volume according to the first volume and the size enlargement ratio. Or, the first electronic device determines the second volume according to the first volume, the size enlargement ratio and the zoom factor.
  • Here, F1 is the first focal length corresponding to the first shooting picture, F2 is the second focal length corresponding to the second shooting picture, x1 × y1 is the first display size of the first shooting object in the first shooting picture, x1′ × y1′ is the third display size of the first shooting object in the second shooting picture, V is the first volume, V′ is the second volume, and the size reduction ratio is (x1′ × y1′)/(x1 × y1).
  • the present application further provides an electronic device, comprising: a memory; one or more processors; and one or more computer programs.
  • One or more computer programs are stored in the memory, and when executed by the one or more processors, they cause the electronic device to perform the video recording method performed by the first electronic device in the first aspect or the second aspect of the present application.
  • the present application also provides a computer-readable storage medium.
  • the computer-readable storage medium includes a computer program or an instruction.
  • When the computer program or instruction is run on a computer, the computer is caused to execute the video recording method provided in the first aspect or the second aspect of the present application.
  • The present application also provides a computer program product, the computer program product comprising a computer program or instruction; when the computer program or instruction is run on a computer, the computer executes the video recording method provided in the first aspect or the second aspect of the present application.
  • the electronic device provided in the third aspect, the computer-readable storage medium provided in the fourth aspect, and the computer program product provided in the fifth aspect are all used to execute the corresponding methods provided above.
  • For the beneficial effects that can be achieved, reference may be made to the beneficial effects of the corresponding methods provided above, which will not be repeated here.
  • FIG. 1(a) is a schematic diagram of a concert scene provided by an embodiment of the present application;
  • FIG. 1(b) is a schematic diagram 1 of a first shooting picture provided by an embodiment of the present application.
  • FIG. 2 is a schematic structural diagram of a mobile phone provided by an embodiment of the present application.
  • FIG. 3 is a flowchart of a video recording method provided by an embodiment of the present application.
  • FIG. 4(a) is a schematic diagram 1 of inputting a zoom-in operation in the first shooting picture provided by an embodiment of the present application;
  • FIG. 4(b) is a schematic diagram 2 of the first shooting picture provided by an embodiment of the present application.
  • FIG. 5 is a schematic diagram of the principle of sound propagation provided by an embodiment of the present application.
  • FIG. 6(a) is a schematic diagram 1 of inputting a zoom-out operation in the first shooting picture provided by an embodiment of the present application;
  • FIG. 6(b) is a schematic diagram 3 of the first shooting picture provided by an embodiment of the present application.
  • FIG. 7 is a schematic diagram of the principle of identifying the facial features in the first shooting picture 203A according to the YOLO model provided by the embodiment of the present application;
  • FIG. 8 is a schematic structural diagram of an audiovisual recognition model provided by an embodiment of the present application.
  • FIG. 9(a) is a schematic diagram of inputting a selection operation on the image b1 of the performer B in the first shooting picture provided by an embodiment of the present application;
  • FIG. 9(b) is a schematic diagram of clicking the button beside the image b1 of the performer B in FIG. 9(a).
  • FIG. 10(a) is a schematic diagram of inputting a selection operation on the image c1 of the performer C in the first shooting picture provided by an embodiment of the present application;
  • FIG. 10(b) is a schematic diagram of clicking the button beside the image c1 of the performer C in FIG. 10(a).
  • FIG. 11 is a schematic diagram 1 of interaction of a distributed system provided by an embodiment of the present application.
  • FIG. 12 is a schematic diagram of a conference scene provided by an embodiment of the present application.
  • FIG. 13 is a schematic diagram 3 of a first shooting screen provided by an embodiment of the present application.
  • FIG. 14 is a schematic diagram of clicking a prompt button in the first shooting screen provided by an embodiment of the present application.
  • FIG. 15 is a second schematic diagram of interaction of a distributed system provided by an embodiment of the present application.
  • FIG. 16 is a schematic diagram three of interaction of a distributed system provided by an embodiment of the present application.
  • FIG. 17(a) is a schematic diagram of inputting a selection operation on the image b1 in the first shooting picture provided by an embodiment of the present application;
  • FIG. 17(b) is a schematic diagram of clicking the button beside the image b1 of the performer B in FIG. 17(a).
  • FIG. 18(a) is a schematic diagram 2 of inputting a zoom-in operation in the first shooting picture provided by an embodiment of the present application;
  • FIG. 18(b) is a schematic diagram 4 of the first shooting picture provided by an embodiment of the present application.
  • FIG. 19(a) is a schematic diagram of inputting a zoom-out operation in the first shooting picture provided by an embodiment of the present application;
  • FIG. 19(b) is a schematic diagram of a third shooting picture provided by an embodiment of the present application.
  • A/B generally indicates that the related objects before and after are an “or” relationship.
  • A/B can be understood as A or B.
  • first and second are only used for descriptive purposes, and should not be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as “first” or “second” may expressly or implicitly include one or more of that feature. In the description of this embodiment, unless otherwise specified, "plurality" means two or more.
  • references to the terms “comprising” and “having” in the description of this application, and any variations thereof, are intended to cover non-exclusive inclusion.
  • A process, method, system, product or device comprising a series of steps or modules is not limited to the listed steps or modules, but may optionally include other steps or modules that are not listed, or other steps or modules inherent to these processes, methods, products or devices.
  • Words such as “exemplary” or “for example” are used to represent examples or illustrations. Any embodiment or design described in this application as “exemplary” or “for example” should not be construed as preferred or advantageous over other embodiments or designs. Rather, the use of such words is intended to present concepts in a concrete manner.
  • Taking a concert scene as an example, as shown in (a) of FIG. 1, when user A goes to the studio to listen to the concert, performers B and C can be seen singing in the studio. If user A is interested in the performance of performers B and C, user A can open the camera APP of the mobile phone 300. Then, as shown in (b) of FIG. 1, the mobile phone 300 can display the preview interface 200 of the camera APP, and user A can click the record button 201 in the preview interface 200 to collect and record video information in the studio.
  • the video information recorded by the mobile phone 300 may include the first shooting picture 203A collected by the camera 353 and the audio collected in real time by the microphone.
  • the user A can input the zoom in or zoom out operation into the preview interface 200 .
  • the mobile phone 300 can zoom in on the first shooting screen 203A in the preview interface 200 by adjusting the focal length of the camera 353, thereby presenting a closer shooting effect to the user A.
  • when user A clicks the record button 201 in the preview interface 200 to trigger the mobile phone 300 to collect and record the video information in the studio, if the mobile phone 300 detects a zoom-in operation input by user A in the preview interface 200 of the camera APP, the mobile phone 300 can amplify the volume of the recorded audio while enlarging the size of the image recorded in the first shooting picture 203A. If the mobile phone 300 detects a zoom-out operation input by user A on the preview interface 200 of the camera APP, it can reduce the volume of the recorded audio while reducing the size of the image in the recorded first shooting picture 203A.
  • when the mobile phone 300 subsequently plays the recorded video information, if the shooting picture in the video information is enlarged, the volume of the audio corresponding to the shooting picture is also increased; correspondingly, if the shooting picture in the video information is zoomed out, the volume of the audio corresponding to the shooting picture is also reduced, so that the size of the shooting picture in the video information matches the volume of the audio, improving the user's audiovisual experience of the recorded video information.
  • a video recording method provided by the embodiments of the present application can be applied to electronic devices, and the electronic devices can be mobile phones, tablet computers, notebook computers, ultra-mobile personal computers (UMPC), handheld computers, netbooks, personal computers, personal digital assistants (PDA), wearable electronic devices, virtual reality devices, etc., which are not limited in this embodiment of the present application.
  • the electronic device in this embodiment of the present application may be a mobile phone 300 .
  • the embodiment will be specifically described below by taking the mobile phone 300 as an example. It should be understood that the illustrated mobile phone 300 is only one example of the electronic device described above; the mobile phone 300 may have more or fewer components than those shown, may combine two or more components, or may have a different component configuration.
  • the mobile phone 300 includes a processor 301, an internal memory 321, an external memory interface 322, an antenna A, a mobile communication module 331, an antenna B, a wireless communication module 332, an audio module 340, a speaker 340A, a receiver 340B, a microphone 340C, a headphone jack 340D, a display screen 351, a subscriber identification module (SIM) card interface 352, a camera 353, buttons 354, a sensor module 360, a universal serial bus (USB) interface 370, a charging management module 380, a power management module 381 and a battery 382.
  • the cell phone 300 may also include a motor, an indicator, and the like.
  • the processor 301 may include one or more processing units.
  • the processor 301 may include an application processor (AP), a modem, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc.
  • the modem can be a processing unit independent of the processor 301, can be integrated with other processing units (such as the AP, ISP, GPU, etc.) in the same device, or can have some or all of its functions integrated with the mobile communication module 331 in the same device.
  • Internal memory 321 may be used to store data and/or at least one computer program including instructions.
  • the internal memory 321 may include a program storage area and a data storage area.
  • the program storage area can store at least one computer program.
  • the computer program may include application programs (such as gallery, contacts, etc.), operating systems (such as the Android operating system or the iOS operating system), or other programs, and the like.
  • the storage data area can store at least one of data created during use of the mobile phone 300, data received from other devices (such as other mobile phones, network devices, servers, etc.), or data pre-stored before leaving the factory.
  • the data stored in the internal memory 321 may be at least one of information such as images, files, or logos.
  • internal memory 321 may include high-speed random access memory and/or non-volatile memory.
  • the internal memory 321 includes one or more magnetic disk storage devices, flash memory (flash), or universal flash storage (UFS), or the like.
  • the processor 301 can make the mobile phone 300 realize one or more functions by calling one or more computer programs and/or data stored in the internal memory 321 to meet the needs of the user.
  • the processor 301 may cause the electronic device to execute the video recording method provided in the embodiments of the present application by invoking the instructions and data stored in the internal memory 321.
  • the external memory interface 322 can be used to connect an external memory card (eg, a micro SD card) to expand the storage capacity of the mobile phone 300 .
  • the external memory card communicates with the processor 301 through the external memory interface 322 to realize the data storage function. For example, save files such as images, music, videos, etc. in an external memory card.
  • a cache area may also be set in the processor 301 for saving instructions and/or data that the processor 301 needs to use repeatedly, so that they can be called directly. This helps to avoid repeated accesses and reduces the waiting time of the processor 301, thus helping to improve the efficiency of the system.
  • the cache area may be implemented by a cache memory.
  • Antenna A and Antenna B are used to transmit and receive electromagnetic wave signals.
  • each antenna in the mobile phone 300 may be used to cover a single communication frequency band or multiple communication frequency bands. Different antennas can also be multiplexed to improve antenna utilization.
  • the antenna A can be multiplexed into the diversity antenna of the wireless local area network.
  • the antenna may be used in conjunction with a tuning switch.
  • the mobile communication module 331 may be used to implement the communication between the mobile phone 300 and the network device according to the mobile communication technology (eg, 2G, 3G, 4G or 5G, etc.) supported by the mobile phone 300 .
  • the mobile communication technology supported by the mobile phone 300 may include at least one of GSM, GPRS, CDMA, WCDMA, TD-SCDMA, LTE, or NR.
  • the mobile phone 300 supports GSM.
  • the mobile communication module 331 can amplify the signal modulated by the modem and send it to the network device via the antenna A; the mobile communication module 331 can also receive a signal sent by the network device through the antenna A, amplify it, and then send it to the modem, where the modem demodulates the received signal into a low-frequency baseband signal and then performs other corresponding processing.
  • the mobile communication module 331 may include a filter, a switch, a power amplifier, a low noise amplifier (LNA), and the like.
  • the wireless communication module 332 can provide solutions for wireless communication applied on the mobile phone 300, including wireless local area networks (WLAN) (such as wireless-fidelity (Wi-Fi) networks), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), and infrared technology (IR).
  • the GNSS may include at least one of a global positioning system (GPS), a global navigation satellite system (GLONASS), a Beidou navigation satellite system (BDS), a quasi-zenith satellite system (QZSS), a satellite based augmentation system (SBAS), etc.
  • the wireless communication module 332 may be one or more devices integrating at least one communication processing module.
  • the wireless communication module 332 can communicate with the corresponding device through the antenna B according to the wireless communication technology (eg, Wi-Fi, Bluetooth, FM or NFC, etc.) supported by itself.
  • the mobile phone 300 can implement audio functions, such as music playback and recording, through the audio module 340, the speaker 340A, the receiver 340B, the microphone 340C, the earphone interface 340D, the AP, and the like.
  • the microphone 340C may be a microphone array.
  • the microphone array may include a plurality of microphones for receiving audio signals from different directions respectively.
  • the microphone array can realize the directional enhancement function and the directional suppression function.
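The directional enhancement mentioned above can be illustrated with a minimal delay-and-sum beamforming sketch. The patent does not disclose the actual algorithm used by the microphone array; this is a generic textbook technique with illustrative names, shown only to make the idea concrete:

```python
# Minimal delay-and-sum beamforming: align each microphone channel by the integer
# sample delay it observes for the direction of interest, then average. Signals
# from that direction add coherently (enhanced); others add incoherently (suppressed).
import numpy as np

def delay_and_sum(channels, delays):
    """channels: list of 1-D arrays (one per microphone);
    delays: non-negative integer sample delays for the steered direction."""
    n = min(len(ch) - d for ch, d in zip(channels, delays))
    aligned = [ch[d:d + n] for ch, d in zip(channels, delays)]
    return np.mean(aligned, axis=0)

# Example: a source whose wavefront reaches mic 1 two samples after mic 0.
rng = np.random.default_rng(0)
source = rng.standard_normal(256)
mic0 = source
mic1 = np.concatenate([np.zeros(2), source])[:256]

steered = delay_and_sum([mic0, mic1], delays=[0, 2])
```

With the delays matched to the source direction, the averaged output reproduces the source exactly in this noiseless example; steering with wrong delays would smear it instead.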
  • the mobile phone 300 may implement a display function through the GPU, the display screen 351, and the AP.
  • the display screen 351 may be used to display images, videos, and the like.
  • the display screen 351 may include a display panel.
  • the display panel can be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a MiniLED, a MicroLED, a Micro-OLED, a quantum dot light-emitting diode (QLED), and so on.
  • the mobile phone 300 may include one or N display screens 351 , where N is a positive integer greater than one.
  • the keys 354 may include a power key, a volume key, and the like.
  • the keys 354 may be mechanical keys, virtual buttons, virtual options, or the like.
  • the cell phone 300 can receive key input and generate key signal input related to user settings and function control of the cell phone 300 .
  • Sensor module 360 may include one or more sensors.
  • the sensor module 360 includes an acceleration sensor 360A, a touch sensor 360B, a fingerprint sensor 360C, and the like.
  • the sensor module 360 may also include a pressure sensor, a gyroscope sensor, an environmental sensor, a distance sensor, a proximity light sensor, a bone conduction sensor, and the like.
  • the acceleration sensor 360A can collect the magnitude of the acceleration of the mobile phone 300 in various directions (generally three axes). When the mobile phone 300 is stationary, the magnitude and direction of gravity can be detected. In addition, the acceleration sensor 360A can also be used for recognizing the posture of the mobile phone 300, and is applied in applications such as switching between horizontal and vertical screens and pedometers. In some embodiments, the acceleration sensor 360A may be connected to the processor 301 through a microcontroller unit (MCU), thereby helping to reduce the power consumption of the mobile phone 300. For example, the acceleration sensor 360A can be connected to the AP and the modem through the MCU. In some embodiments, the MCU may be a general-purpose smart sensor hub (Sensor hub).
  • the touch sensor 360B may also be referred to as a "touch panel”.
  • the touch sensor 360B may be disposed on the display screen 351 , and the touch sensor 360B and the display screen 351 form a touch screen, also referred to as a “touch screen”.
  • the touch sensor 360B is used to detect a touch operation on or near it.
  • the touch sensor 360B may communicate the detected touch operation to the AP to determine the touch event type.
  • the mobile phone 300 provides visual output related to the touch operation through the display screen 351 according to the determined touch event type.
  • the touch sensor 360B may also be disposed on the surface of the mobile phone 300 , which is different from the position where the display screen 351 is located.
  • the fingerprint sensor 360C is used to collect fingerprints.
  • the mobile phone 300 can use the collected fingerprint characteristics to unlock the fingerprint, access the application lock, take photos with the fingerprint, answer incoming calls with the fingerprint, and the like.
  • the SIM card interface 352 is used to connect a SIM card.
  • the SIM card can be connected to and separated from the mobile phone 300 by inserting into the SIM card interface 352 or pulling out from the SIM card interface 352 .
  • the mobile phone 300 may support 1 or N SIM card interfaces, where N is a positive integer greater than 1.
  • the SIM card interface 352 can support Nano SIM cards, Micro SIM cards, SIM cards, and the like. Multiple cards can be inserted into the same SIM card interface 352 at the same time. The types of the plurality of cards may be the same or different.
  • the SIM card interface 352 may also be compatible with different types of SIM cards.
  • the SIM card interface 352 may also be compatible with external memory cards.
  • the mobile phone 300 implements functions such as calling and data communication through the SIM card.
  • the mobile phone 300 may also adopt an eSIM, that is, an embedded SIM card.
  • the eSIM card can be embedded in the mobile phone 300 and cannot be separated from the mobile phone 300 .
  • the camera 353 can input the collected image signals to the processor 301, and the processor 301 can process the image signals into image frames.
  • the camera 353 may be a time of flight (TOF) camera.
  • the TOF camera can collect the spatial coordinates of the object to be photographed, thereby determining the direction of the object to be photographed.
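As a hypothetical illustration of turning the spatial coordinates a TOF camera reports into a direction, the following sketch computes azimuth and elevation; the coordinate convention (and everything else here) is our assumption, not the patent's:

```python
# Hypothetical sketch: direction of a photographed object from its TOF coordinates,
# assuming +z points out of the camera, +x to the right, and +y up.
import math

def direction_from_coords(x: float, y: float, z: float) -> tuple:
    """Return (azimuth, elevation) in degrees for a point at (x, y, z)."""
    azimuth = math.degrees(math.atan2(x, z))
    elevation = math.degrees(math.atan2(y, math.hypot(x, z)))
    return azimuth, elevation
```

A point straight ahead maps to (0, 0); a point as far to the right as it is distant maps to a 45-degree azimuth.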
  • the USB interface 370 is an interface that conforms to the USB standard specification, and can specifically be a Mini USB interface, a Micro USB interface, a USB Type C interface, and the like.
  • the USB interface 370 can be used to connect a charger to charge the mobile phone 300, and can also be used to transmit data between the mobile phone 300 and peripheral devices. It can also be used to connect headphones and play audio on the headphones.
  • the interface can also be used to connect other electronic devices, such as AR devices.
  • the interface connection relationship between the modules illustrated in the embodiments of the present application is only a schematic illustration, and does not constitute a structural limitation of the mobile phone 300 .
  • the mobile phone 300 may also adopt different interface connection manners in the foregoing embodiments, or a combination of multiple interface connection manners.
  • the charging management module 380 is used to receive charging input from the charger.
  • the charger may be a wireless charger or a wired charger.
  • the power management module 381 is used for connecting the battery 382 , the charging management module 380 and the processor 301 .
  • the power management module 381 receives input from the battery 382 and/or the charging management module 380, and supplies power to modules such as the processor 301.
  • the power management module 381 can also be used to monitor parameters such as battery capacity, battery cycle times, battery health status (leakage, impedance).
  • the structure of the mobile phone 300 shown in FIG. 2 is only an example.
  • the mobile phone 300 of the embodiment of the present application may have more or less components than those shown in the figures, may combine two or more components, or may have different component configurations.
  • the various components shown in the figures may be implemented in hardware, software, or a combination of hardware and software, including one or more signal processing and/or application specific integrated circuits.
  • a video recording method provided by an embodiment of the present application applied to a first electronic device, specifically includes:
  • S1004: In response to a user's zoom-in operation on the first shooting picture, collect a second shooting picture and audio corresponding to the second shooting picture, wherein the first shooting picture and the second shooting picture are continuous.
  • S1006: Record the second shooting picture, and record the audio corresponding to the second shooting picture at a second volume.
  • the second volume is greater than the first volume.
  • the sound amplification ratio corresponding to the second volume is greater than the sound amplification ratio corresponding to the first volume.
  • the sound amplification ratio refers to the ratio of the volume output by the first electronic device to the volume it collects.
  • the volume output by the mobile phone 300 may be the volume output by an earphone or a speaker, and the volume collected by the mobile phone 300 may be the volume collected by a microphone.
  • the sound amplification rate may correspond to the amplification rate of the power amplifier of the mobile phone 300.
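The sound amplification ratio defined above can be sketched as follows; this is a minimal illustration (all function names are ours, not the patent's), treating the ratio simply as output volume divided by collected volume:

```python
# Sketch of the "sound amplification ratio": the ratio of the volume the device
# outputs (e.g. via speaker or earphone) to the volume it collects (via microphone).
def amplification_ratio(output_volume: float, collected_volume: float) -> float:
    return output_volume / collected_volume

def apply_ratio(collected_volume: float, ratio: float) -> float:
    """Volume the device would output for a given collected volume and ratio."""
    return collected_volume * ratio
```

Under this reading, raising the amplification ratio while the collected volume stays fixed raises the output volume proportionally.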
  • the video recording method 100 according to the embodiment of the present application will be specifically described below with reference to different examples.
  • the first implementation manner of the video recording method is as follows:
  • the mobile phone 300 may be installed with an application program having a video capture function, such as a camera APP, a WeChat APP, and a Douyin APP.
  • taking the camera APP as an example, and in conjunction with FIG. 1 to FIG. 8, the first implementation manner describes in detail how the user triggers the mobile phone 300 to collect and record video information through the camera APP.
  • the mobile phone 300 may, in response to the user A's operation of opening the camera APP, call the camera 353 to start capturing the first captured image, and then display the captured first captured image in the preview interface of the camera APP.
  • the preview interface 200 of the camera APP includes a first shooting screen 203A
  • the first shooting screen 203A includes an image of the first shooting object and an image of the second shooting object.
  • the first shooting screen 203A includes the image b1 of the performer B and the image c1 of the performer C.
  • the size of the image b1 of the performer B may be the first display size
  • the size of the image c1 of the performer C may be the second display size.
  • the first display size may be the size of the area occupied by the image b1 of the performer B
  • the second display size may be the size of the area occupied by the image c1 of the performer C.
  • the first display size may also be the size x1 × y1 of the rectangular frame delineating the image b1 in (b) of FIG. 1, and the second display size may also be the size x2 × y2 of the rectangular frame delineating the image c1 in (b) of FIG. 1. For example, the first display size x1 × y1 may be 8 mm × 6 mm, 8 mm × 12 mm, etc., and the second display size x2 × y2 may be 8 mm × 10 mm, 10 mm × 16 mm, etc., which are not limited herein.
  • the preview interface 200 of the camera APP is further provided with a recording button 201 , and the user A can input a touch operation on the recording button 201 .
  • the mobile phone 300 can start recording video information in response to the touch operation of the recording button 201 by the user A.
  • the video information may include the first shooting picture 203A collected by the camera 353 and the audio corresponding to the first shooting picture 203A collected by the microphone 340C.
  • the audio collected by the microphone 340C may include the audio of the performer B and the audio of the performer C.
  • the first volume of the audio of the performer B recorded by the mobile phone 300 is V1
  • the second volume of the audio of the recorded performer C is V2
  • the sound amplification rate is R0.
  • the first volume V1 may be 30 dB, 40 dB, etc., and the second volume V2 may be 35 dB, 45 dB, etc., which are not limited herein.
  • the sound amplification ratio R0 may be the ratio/multiplier of the first volume V1 and the volume of the audio collected by the microphone of the mobile phone 300 .
  • the sound amplification ratio R0 may also be the corresponding amplification ratio of the current mobile phone 300 power amplifier.
  • the first volume V1 of the audio of the performer B recorded by the mobile phone 300 is positively related to the volume of the audio of the performer B.
  • the louder the audio emitted by the performer B, the greater the first volume V1 of the recorded audio of the performer B; conversely, the lower the volume of the audio emitted by the performer B, the smaller the first volume V1 of the recorded audio of the performer B.
  • the second volume V2 of the audio of the performer C recorded by the mobile phone 300 is also positively related to the volume of the audio of the performer C, which will not be repeated here.
  • user A can input a zoom-in operation at any position on the preview interface 200 .
  • the above zoom-in operation may be a long-press operation.
  • the above-mentioned long-press operation can also be replaced by an operation such as an expansion gesture, a double-click operation, or an upward dragging of a scroll bar (not shown in FIG. 4 ), which is not limited herein.
  • the mobile phone 300 captures and records the second shooting picture 203B according to the second focal length F2 . Then, the second shooting screen 203B is displayed on the preview interface 200 .
  • the second photographing screen 203B may include an image b2 of the performer B of the third display size x1' ⁇ y1', and an image c2 of the performer C of the fourth display size x2' ⁇ y2'.
  • the third display size x1′ × y1′ is larger than the first display size x1 × y1, and the fourth display size x2′ × y2′ is larger than the second display size x2 × y2.
  • the third display size x1′ × y1′ may be 12 mm × 9 mm, 12 mm × 16 mm, etc., and the fourth display size x2′ × y2′ may be 12 mm × 15 mm, 15 mm × 24 mm, etc., which are not limited herein.
  • the mobile phone 300 collects the audio corresponding to the second shooting picture 203B.
  • the audio corresponding to the second shooting picture 203B refers to the audio collected when the second shooting picture 203B is shot.
  • the audio collected when shooting the second shooting picture 203B may include the audio of the performer B in the second shooting picture 203B, and may also include the audio of a sound source located outside the second shooting picture 203B, which is not limited herein.
  • when the mobile phone 300 records the audio corresponding to the second shooting picture 203B, the volume of the recorded audio can be increased.
  • the first volume V1 of the audio of the performer B is raised to the third volume V1′, and the second volume V2 of the recorded audio of the performer C is raised to the fourth volume V2′.
  • the third volume V1' is greater than the first volume V1
  • the fourth volume V2' is greater than the second volume V2.
  • the third volume V1′ may be 50 dB, 60 dB, etc., and the fourth volume V2′ may be 55 dB, 70 dB, etc., which are not limited herein.
  • when the mobile phone 300 plays the recorded video information, the second shooting picture 203B is displayed. Understandably, the second shooting picture 203B includes the image b2 of the performer B displayed in the third display size x1′ × y1′ larger than the first display size x1 × y1, and the image c2 of the performer C displayed in the fourth display size x2′ × y2′ larger than the second display size x2 × y2.
  • when playing the audio corresponding to the second shooting picture 203B in the video information, the mobile phone 300 increases the sound amplification rate to R1, plays the audio of the performer B at the third volume V1′ higher than the first volume V1, and plays the audio of the performer C at the fourth volume V2′ higher than the second volume V2.
  • user A can not only visually feel that performer B and performer C are closer to him, but can also hear that performer B and performer C are closer to him, which improves the audio-visual effect of the recorded video information.
  • the mobile phone 300 may capture the first shooting picture 203A according to the first focal length F1. Subsequently, when the mobile phone 300 receives the zoom-in operation input by user A on the preview interface 200, the mobile phone 300 can capture the second shooting picture 203B according to the second focal length F2. After the mobile phone 300 captures the second shooting picture 203B, the size of the image of an object in the second shooting picture 203B can be determined according to the second focal length F2, for example D2 = D1 × F2/F1, where D1 is the object's display size at the first focal length F1 and D2 is its display size at the second focal length F2. In this way, the third display size x1′ × y1′ of the image b2 of the performer B and the fourth display size x2′ × y2′ of the image c2 of the performer C can be determined.
  • when the mobile phone 300 detects the zoom-in operation input by user A on the preview interface 200, the mobile phone 300 can not only enlarge the first shooting picture 203A, but can also increase the volume of the audio corresponding to the recorded second shooting picture 203B according to the first focal length F1 and the second focal length F2.
  • the mobile phone 300 may determine the second volume according to the first volume and the zoom factor. For example, the mobile phone 300 can increase the volume of the recorded audio according to the formula V′ = V × F2/F1, where V is the volume of the recorded audio, V′ is the volume after the volume V is raised, and F2/F1 is the zoom factor. Further, the first volume V1 of the recorded audio of the performer B is raised to the third volume V1′, and the second volume V2 of the recorded audio of the performer C is raised to the fourth volume V2′. Among them, the third volume V1′ satisfies V1′ = V1 × F2/F1, and the fourth volume V2′ satisfies V2′ = V2 × F2/F1.
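The zoom-factor approach can be sketched as follows. This is a minimal illustration under the assumption that the relationship is a simple linear scaling by F2/F1 (the patent's exact formula appears in its drawings); the function and parameter names are ours:

```python
# Sketch: raising the recorded volume by the zoom factor F2/F1 when zooming in
# (assumed linear relationship; names are illustrative, not from the patent).
def raise_volume_with_zoom(volume: float, f1: float, f2: float) -> float:
    """Scale a recorded volume by the zoom factor F2/F1 (F2 > F1 when zooming in)."""
    return volume * (f2 / f1)

# Doubling the focal length doubles the recorded volume under this model.
v1_prime = raise_volume_with_zoom(30.0, f1=27.0, f2=54.0)   # 60.0
```

The same function covers both performers: applying it to V1 yields V1′ and to V2 yields V2′, since the zoom factor is the same for the whole picture.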
  • the mobile phone 300 may also increase the volume of the audio corresponding to the recorded second shooting picture 203B according to the image magnification ratio of any object to be shot in the first shooting picture 203A.
  • the mobile phone 300 can detect the size of any photographed object in the first shooting picture 203A. Taking performer B as an example, the mobile phone 300 can detect the first display size x1 × y1 of the image b1 of performer B in the first shooting picture 203A according to a YOLO (you only look once) model. Similarly, after collecting the second shooting picture 203B, the mobile phone 300 can also detect the third display size x1′ × y1′ of the performer B in the second shooting picture 203B according to the above method.
  • the mobile phone 300 can also obtain the image enlargement ratio of the image c1 of the performer C according to the same method described above, which will not be repeated here. It can be understood that the image magnification ratio of the image c1 of the performer C and the image b1 of the performer B is the same.
  • the mobile phone 300 may determine the increased volume V′ according to the image enlargement ratio B. For example, the mobile phone 300 can increase the volume of the audio corresponding to the recorded second shooting picture 203B according to the formula V′ = V × B, where V is the volume of the audio corresponding to the recorded first shooting picture 203A, V′ is the volume after the volume V is increased, and B is the image enlargement ratio.
  • the first volume V1 of the audio of the performer B is raised to the third volume V1′, and the second volume V2 of the audio of the performer C is raised to the fourth volume V2′. The third volume V1′ satisfies V1′ = V1 × B, and the fourth volume V2′ satisfies V2′ = V2 × B.
  • the mobile phone 300 may further increase the volume of the audio corresponding to the recorded second shooting picture 203B in combination with the zoom factor and the image magnification ratio. For example, the mobile phone 300 can increase the volume of the recorded audio according to the formula V′ = V × (F2/F1) × B, where V is the volume of the audio corresponding to the recorded first shooting picture 203A and V′ is the volume after the volume V is increased. Further, the first volume V1 of the audio of the performer B is raised to the third volume V1′, and the second volume V2 of the audio of the performer C is raised to the fourth volume V2′. The third volume V1′ satisfies V1′ = V1 × (F2/F1) × B, and the fourth volume V2′ satisfies V2′ = V2 × (F2/F1) × B.
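The image-magnification approach described above can be sketched as follows. This is a hedged illustration: the patent does not state here whether the enlargement ratio B is a linear or an area ratio (the area ratio is shown), and the multiplicative combination with the zoom factor is an assumption; all names are ours:

```python
# Sketch: deriving the image enlargement ratio B from the subject's detected
# bounding-box sizes in the two shooting pictures, then raising the volume.
def enlargement_ratio(w1: float, h1: float, w2: float, h2: float) -> float:
    """Area ratio of the subject's bounding box in picture 2 vs. picture 1."""
    return (w2 * h2) / (w1 * h1)

def raise_volume(volume: float, zoom_factor: float = 1.0, image_ratio: float = 1.0) -> float:
    """Raise a recorded volume by the zoom factor and/or image enlargement ratio
    (assumed multiplicative combination)."""
    return volume * zoom_factor * image_ratio

# Using the example sizes from the text: 8 mm x 6 mm enlarged to 12 mm x 9 mm.
B = enlargement_ratio(8, 6, 12, 9)   # 2.25
```

Passing only `image_ratio` reproduces the magnification-only formula, while passing both arguments reproduces the combined formula.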
  • FIG. 5 includes a sound source M, a position P1, and a position P2.
  • the distance between the position P2 and the sound source M is d1
  • the distance between the position P1 and the sound source M is d2.
  • the sound emitted by the sound source M propagates to the position P1 and then to the position P2 in sequence. If the volume of the audio received at the position P2 is Y, the volume Y′ of the sound from the sound source M at the position P1 can be obtained according to the formula Y′ = Y × d1/d2.
  • the third display size x1′ × y1′ of the image b2 of the performer B collected with the second focal length F2 is larger than the first display size x1 × y1 of the image b1 of the performer B collected with the first focal length F1. If the image b2 of the performer B is displayed on the preview interface 200 of the camera APP at the third display size x1′ × y1′, user A can visually feel that the performer B is closer to him (although the actual distance has not changed). In this way, the relationship between the size of the recorded image of the performer B and the volume of the recorded audio of the performer B can be simulated based on the relationship between the propagation distance of sound and its volume.
  • based on this simulated relationship, the mobile phone 300 can increase the volume of the audio corresponding to the recorded second shooting picture 203B in proportion to the enlargement of the display size.
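The distance/volume relationship of FIG. 5 can be modeled as follows. This is an illustrative sketch only: it assumes a point source whose received volume falls off inversely with distance, which is one common simplification; the patent's exact formula appears in its drawings:

```python
# Illustrative model of FIG. 5: the volume heard from a point source scales
# inversely with distance, so moving the apparent listening position from a
# distance of d1 to a distance of d2 rescales the volume by d1/d2.
def volume_at_distance(volume_at_d1: float, d1: float, d2: float) -> float:
    """Volume heard at distance d2, given the volume heard at distance d1."""
    return volume_at_d1 * (d1 / d2)
```

Halving the apparent distance doubles the volume under this model, which is the effect zooming in is meant to simulate; doubling it halves the volume, matching the zoom-out case.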
  • when the mobile phone 300 detects the zoom-in operation input by user A on the preview interface 200, the mobile phone 300 can increase the sound amplification rate in addition to enlarging the first shooting picture 203A. Specifically, the mobile phone 300 can adjust the sound amplification rate according to the change of the focal length and/or the change of the display size, and then adjust the volume of the audio corresponding to the zoom-in operation using the adjusted sound amplification ratio R1.
  • V′ in the above formula can be replaced by R1, and V by R0 to obtain the adjusted sound amplification ratio R1.
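The substitution described above relies on a formula that appears only as an image in this text. Assuming the amplification ratio scales linearly with the zoom factor F2/F1 (a hypothetical reading, not the patent's confirmed formula), the adjustment might look like:

```python
def adjusted_amplification_ratio(r0: float, f1: float, f2: float) -> float:
    """Scale the original sound amplification ratio R0 by the zoom factor F2/F1.

    Hypothetical reconstruction: the patent obtains R1 by substituting R1 for V'
    and R0 for V in a formula not reproduced in this text.
    """
    if f1 <= 0 or f2 <= 0:
        raise ValueError("focal lengths must be positive")
    return r0 * (f2 / f1)
```

With this reading, doubling the focal length doubles the amplification ratio, so the louder playback tracks the enlarged picture.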
  • the mobile phone 300 may further reduce, based on the above method, the volume of the audio corresponding to the recorded third shooting picture 203C when a zoom-out operation is detected.
  • the third shooting picture 203C is captured according to the third focal length F3, and the third shooting picture 203C is displayed on the preview interface 200 .
  • the third photographing screen 203C includes the image b3 of the performer B displayed in the fifth display size x1" ⁇ y1", and the image c3 of the performer C displayed in the sixth display size x2" ⁇ y2".
  • the fifth display size x1″×y1″ is smaller than the first display size x1×y1
  • the sixth display size x2″×y2″ is smaller than the second display size x2×y2.
  • the mobile phone 300 reduces the first volume V1 of the recorded audio of the performer B to the fifth volume V1", and reduces the second volume V2 of the recorded audio of the performer C to the sixth volume V2".
  • the mobile phone 300 displays the third shooting screen 203C.
  • the third shooting picture 203C includes the image b3 of the performer B displayed in the fifth display size x1″×y1″ smaller than the first display size x1×y1, and the image c3 of the performer C displayed in the sixth display size x2″×y2″ smaller than the second display size x2×y2.
  • the mobile phone 300 plays the audio of the performer B at the fifth volume V1″ lower than the first volume V1, and plays the audio of the performer C at the sixth volume V2″ lower than the second volume V2.
  • the user A can not only visually feel that the performer B and the performer C are farther away from him, but also auditorily feel that the performer B and the performer C are farther away from him, which improves the audio-visual effect of the recorded video information.
  • the mobile phone may capture the first shooting picture 203A according to the first focal length F1.
  • the mobile phone 300 receives the zoom-out operation input by the user A on the preview interface 200, the mobile phone 300 can capture the third photographing image 203C according to the third focal length F3.
  • the mobile phone 300 can determine the size of the third shooting picture 203C based on the third focal length F3 according to the same method as above.
  • when the mobile phone 300 detects the zoom-out operation input by the user A on the preview interface 200, the mobile phone 300 can not only display the third shooting picture 203C, but also reduce the volume of the recorded audio.
  • the mobile phone 300 can decrease the volume of the audio corresponding to the recorded third shooting picture 203C according to the zoom factor. Wherein, V is the volume of the audio corresponding to the recorded first shooting picture 203A, and the volume V is reduced to the volume V″ according to the zoom factor.
  • the mobile phone 300 may also reduce the volume of the audio corresponding to the recorded third shooting picture 203C according to the image reduction ratio. Specifically, the mobile phone 300 can obtain the image reduction ratio based on the same method as described above, which will not be repeated here.
  • the mobile phone 300 can determine the reduced volume V″ according to the image reduction ratio. For example, the mobile phone 300 can decrease the volume of the recorded audio accordingly. Wherein, V is the volume of the audio corresponding to the recorded first shooting picture 203A, and the volume V is reduced to the volume V″ according to the image reduction ratio.
  • the mobile phone 300 may further reduce the volume of the audio corresponding to the recorded third shooting picture 203C in combination with the first focal length F1, the third focal length F3, and the image reduction ratio.
  • the mobile phone 300 can decrease the volume of the audio corresponding to the recorded third shooting picture 203C. Wherein, V is the volume of the audio corresponding to the recorded first shooting picture 203A, and the volume V is reduced to the volume V″, which is calculated according to a formula combining the zoom factor and the reciprocal of the image reduction ratio.
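As an illustrative reconstruction (the patent's actual formula is not reproduced in this text), the combination of the zoom factor and the reciprocal of the image reduction ratio might be multiplicative:

```python
def reduced_volume(v: float, zoom_factor: float, image_reduction_ratio: float) -> float:
    """Reduce volume V to V'' using the zoom factor (F3/F1 < 1 for zoom-out)
    and the reciprocal of the image reduction ratio.

    Assumption: the source shows this formula only as an image; a simple
    multiplicative combination is one plausible reading, not the confirmed one.
    """
    return v * zoom_factor * (1.0 / image_reduction_ratio)
```

For a zoom-out (zoom factor below 1 and image reduction ratio above 1), the result V″ is strictly smaller than V, consistent with the surrounding text.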
  • the principle of reducing the volume of the audio corresponding to the recorded third shooting picture 203C is the same as the above-mentioned principle, and will not be repeated here.
  • when the mobile phone 300 detects the zoom-out operation input by the user A on the preview interface 200, the mobile phone 300 can obtain the sound amplification ratio R2 in addition to zooming out the first shooting picture 203A. Furthermore, the mobile phone 300 can obtain the volume V″ of the recorded audio according to the original volume V0″ of the collected audio corresponding to the third shooting picture 203C and the sound amplification ratio R2. Wherein, V″ is less than the volume V.
  • the adjustment method of the sound amplification ratio corresponding to the zoom-out operation can be the same as the adjustment method of the sound amplification ratio corresponding to the zoom-in operation.
  • the mobile phone 300 can adjust the sound amplification ratio through the above-mentioned change of the focal length and/or change of the display size.
  • specifically, V″ in the above formula can be replaced by R2, and V by R0, to obtain the adjusted sound amplification ratio R2; the specific details can be found in the description of the above formula, which will not be repeated here.
  • the manner in which the mobile phone 300 plays the video information is still described by taking the user A inputting the zoom-in operation in the preview interface 200 of the mobile phone 300 as an example.
  • the mobile phone 300 can play the recorded video information while recording the video information.
  • the mobile phone 300 displays the second shooting screen 203B according to the zoom-in operation
  • the mobile phone 300 can increase the volume of the audio corresponding to the recorded second shooting screen 203B according to the above method.
  • if the mobile phone 300 is in the headset mode, the mobile phone 300 can play the recorded audio in the headset at the volume V′ in real time. In this way, when the user A records the video information, the second shooting picture 203B in the video information that the user A enjoys can match the audio played at the volume V′, and the audio-visual effect for the user A is better.
  • if the mobile phone 300 is not in the headset mode, after responding to the stop operation triggered by the user A on the record button 201, the mobile phone 300 generates and saves a video file according to the second shooting picture 203B displayed based on the zoom-in operation and the increased volume V′. Furthermore, when the mobile phone 300 subsequently responds to the user A's opening operation on the video file, the mobile phone 300 plays the recorded video information. For example, the second shooting picture 203B is displayed on the mobile phone 300, and the audio corresponding to the recorded second shooting picture 203B is played on the speaker at the volume V′. As a result, when the user A enjoys the video information, the second shooting picture 203B in the video information can match the audio played at the volume V′, and the user A's viewing effect is better.
  • the second implementation manner of the embodiment of the present application further provides another video recording method, and the video recording method can also be applied to the mobile phone 300 .
  • in the video recording method provided by the second implementation manner of the embodiment of the present application, after the mobile phone 300 detects the zoom-in operation, it only increases the volume of the audio of the selected object in the first shooting picture 203A.
  • the following describes in detail how user A triggers the mobile phone 300 to collect and record video information through the camera APP in the second implementation manner.
  • the mobile phone 300 may, in response to the user A's operation of opening the camera APP, call the camera 353 to start capturing the first captured image, and then display the captured first captured image in the preview interface of the camera APP.
  • the preview interface 200 of the camera APP includes a first shooting screen 203A
  • the first shooting screen 203A includes an image of the first shooting subject and an image of the second shooting subject.
  • the first shooting picture 203A includes the image b1 of the performer B and the image c1 of the performer C.
  • the size of the image b1 of the performer B is the first display size x1 ⁇ y1
  • the size of the image c1 of the performer C is the second display size x2 ⁇ y2 .
  • the preview interface 200 of the camera APP is further provided with a recording button 201 , and the user A can input a touch operation to the recording button 201 .
  • the mobile phone 300 can start recording video information in response to the touch operation of the recording button 201 by the user A.
  • the video information may include the first shot 203A collected by the camera 353 and the audio corresponding to the first shot 203A collected by the microphone 304C.
  • the audio collected by the microphone 304C may include the audio of the performer B and the audio of the performer C.
  • the first volume of the audio of the performer B recorded by the mobile phone 300 is V1
  • the second volume of the audio of the performer C recorded by the mobile phone 300 is V2.
  • subsequently, the volume of all the audio collected by the mobile phone 300 is not increased/decreased; rather, the volume of the audio of the object selected by the user A in the first shooting picture 203A is increased/decreased.
  • for example, when the object selected by the user A is the performer B or the performer C, the mobile phone 300 can increase/decrease the volume of the audio of the performer B or the volume of the audio of the performer C. Before increasing/decreasing the volume, the mobile phone 300 can recognize the facial features in the first shooting picture 203A, and the mobile phone 300 can also recognize the sound features from the recorded audio. Further, the mobile phone 300 may match the first facial feature of the performer B with the voice feature of the performer B, and the second facial feature of the performer C with the voice feature of the performer C.
  • the following describes how the mobile phone 300 recognizes the facial features in the first shot 203A, and how to recognize the sound features from the recorded audio.
  • the recognition result may include the value of the sliding boundary window 208 (bounding box) and the value of the confidence. Specifically, the confidence is used to indicate whether there is a human face.
  • if there is a human face, the confidence value can be the binary value 1; on the contrary, if there is no human face, the confidence value can be the binary value 0.
  • the value of the sliding boundary window 208 is [x, y, w, h], where (x, y) are the coordinates of the center point of the face, and (w, h) are the width and height of the sliding boundary window 208 .
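The detection result described above — a confidence value plus the sliding boundary window [x, y, w, h] — might be represented as follows (the class name and field layout are illustrative, not from the patent):

```python
from dataclasses import dataclass

@dataclass
class FaceDetection:
    """One face-detection result: confidence is 1 if a face is present, else 0;
    (x, y) is the face center; (w, h) the sliding boundary window's width and height."""
    confidence: int
    x: float
    y: float
    w: float
    h: float

    @property
    def box(self):
        # The [x, y, w, h] value of the sliding boundary window 208.
        return [self.x, self.y, self.w, self.h]
```

A detector would emit one such record per candidate face in the first shooting picture.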
  • the mobile phone 300 can identify the first facial feature of the performer B and the second facial feature of the performer C in the first shot 203A of the camera APP based on the above method.
  • the mobile phone 300 can also identify the first facial feature of the performer B and the second facial feature of the performer C in the first shooting picture 203A according to a deep learning-based video analysis algorithm (Amazon Rekognition Video, ARV), which is not limited here.
  • the mobile phone 300 may process the recorded audio into multiple audio frames. Furthermore, the mobile phone 300 can convert the audio signal in the time domain of each audio frame into an audio signal in the frequency domain by using the fast Fourier transform according to the mel frequency cepstrum coefficient (MFCC). Then, the mobile phone 300 may filter the audio signal in the frequency domain, and extract a sound feature vector corresponding to each audio frame from the filtered audio signal in the frequency domain. Next, the mobile phone 300 may determine the similarity between the sound feature vectors corresponding to each audio frame according to the cosine similarity or the Euclidean distance algorithm. Finally, the mobile phone 300 may group sound feature vectors whose similarity is greater than the similarity threshold into a group.
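The similarity-grouping step described above can be sketched as follows (a greedy illustrative clustering; the cosine-similarity threshold and the grouping strategy are assumptions, since the text does not fix them):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two sound feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def group_vectors(vectors, threshold=0.8):
    """Group feature vectors whose cosine similarity to a group's first
    member exceeds the threshold (greedy sketch of the final grouping step)."""
    groups = []
    for v in vectors:
        for g in groups:
            if cosine_similarity(v, g[0]) > threshold:
                g.append(v)
                break
        else:
            groups.append([v])
    return groups
```

Vectors from the same speaker cluster together, while a dissimilar vector starts a new group, yielding one group per voice.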
  • the mobile phone 300 can obtain, according to the above method, a set of first sound feature vectors corresponding to the audio of the performer B and a set of second sound feature vectors corresponding to the audio of the performer C, so as to identify the sound features of different users in the recorded audio.
  • the mobile phone 300 can establish an association relationship between the first facial feature of the performer B and the first sound feature vector according to the audio-visual recognition model (audio-visual recognition, AVR), and establish an association relationship between the second facial feature of the performer C and the second sound feature vector.
  • the audiovisual recognition model includes a visual recognition network 1001 , an audio recognition network 1002 and a matching network 1003 .
  • the mobile phone 300 can input the first facial feature of the performer B into the visual recognition network 1001 , and input the first sound feature vector into the audio recognition network 1002 .
  • the visual recognition network 1001 determines the first pronunciation feature corresponding to the lip shape according to the lip shape of the first facial feature of the performer B, and inputs the first pronunciation feature into the matching network 1003; the audio recognition network 1002 extracts the second pronunciation feature from the first sound feature vector and inputs it into the matching network 1003.
  • the matching network 1003 determines the similarity between the first pronunciation feature and the second pronunciation feature.
  • if the similarity between the first pronunciation feature and the second pronunciation feature is sufficiently high, the mobile phone 300 establishes an association relationship between the first facial feature of the performer B and the first sound feature vector. Similarly, the mobile phone 300 can also establish an association relationship between the second facial feature of the performer C and the second sound feature vector according to the audio-visual recognition model, which will not be repeated here.
  • the user A can input a selection operation on the image b1 of the performer B in the first shooting picture 203A (for example, a click operation on the image b1); of course, the selection operation can also be a double-click, long-press, or other operation, which is not limited here.
  • the mobile phone 300 may display an adjustment button 205 and a mask button 206 on the side of the image b1 of the performer B in response to the user A's selection operation on the image b1 of the performer B. Wherein, as shown in (b) of FIG.
  • the mobile phone 300 can respond to the user A's click operation on the adjustment button 205 and, according to the association relationship between the first facial feature of the performer B and the first sound feature vector, find the audio of the performer B corresponding to the first sound feature vector and record the audio of the performer B as the audio to be adjusted.
  • the mobile phone 300 may add a first identifier (eg, a field "1") to the audio of the performer B, thereby recording the audio of the performer B as the audio to be adjusted.
  • the mobile phone 300 can display the second shooting picture 203B in the preview interface 200 and record the second shooting picture 203B, similar to the above-mentioned embodiment.
  • the second photographing screen 203B may include an image b2 of the performer B of the third display size x1' ⁇ y1', and an image c2 of the performer C of the fourth display size x2' ⁇ y2'.
  • the third display size x1′ ⁇ y1′ is larger than the first display size x1 ⁇ y1;
  • the fourth display size x2′×y2′ is larger than the second display size x2×y2.
  • since the audio of the performer B is added with the first identifier, the first volume V1 of the audio of the performer B can be increased to the third volume V1′; since the audio of the performer C is not added with the first identifier, the second volume V2 of the audio of the performer C may remain unchanged. Understandably, in the second implementation manner of the present application, the principle by which the first volume V1 of the audio of the performer B is increased to the third volume V1′ is the same as in the first implementation manner of the present application, and will not be repeated here. The difference is that, in the first implementation manner, the audio of the performer B is amplified along with the overall amplification of the recorded audio, while in the second implementation manner, the audio of the performer B is individually amplified. Understandably, the third volume V1′ is greater than the first volume V1, thereby making the user A auditorily feel that the performer B is closer to him.
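The selective amplification described above — only the audio tagged with the first identifier is boosted, while other speakers are untouched — can be sketched as follows (the track layout and tag value are illustrative assumptions):

```python
def apply_selective_gain(tracks, gain):
    """Amplify only the tracks tagged with the first identifier ("1"),
    leaving untagged speakers' volumes unchanged (illustrative sketch).

    `tracks` maps a speaker name to (volume, tag); tag "1" marks the
    audio to be adjusted, as in the field "1" example in the text.
    """
    return {
        name: volume * gain if tag == "1" else volume
        for name, (volume, tag) in tracks.items()
    }
```

Here performer B's track carries the tag and is raised to V1′, while performer C's volume V2 stays as recorded.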
  • the mobile phone 300 plays the audio of the performer B at the third volume V1 ′ higher than the first volume V1 , and keeps playing the audio of the performer C at the second volume V2 .
  • the above takes as an example the case where the mobile phone 300 increases the volume of the audio of the performer B when the zoom-in operation is detected.
  • the mobile phone 300 may further reduce, based on the above method, the first volume V1 of the recorded audio of the performer B when a zoom-out operation is detected.
  • the mobile phone 300 may display the third shooting screen 203C in the preview interface 200, similar to the above-mentioned embodiment.
  • the third photographing screen 203C includes the image b3 of the performer B displayed in the fifth display size x1" ⁇ y1", and the image c3 of the performer C displayed in the sixth display size x2" ⁇ y2".
  • the fifth display size x1″×y1″ is smaller than the first display size x1×y1
  • the sixth display size x2″×y2″ is smaller than the second display size x2×y2.
  • the mobile phone 300 reduces the first volume V1 of the recorded audio of the performer B to the fifth volume V1".
  • the mobile phone 300 may display the above-mentioned third shooting screen 203C.
  • the third shooting picture 203C includes the image b3 of the performer B displayed in the fifth display size x1″×y1″ smaller than the first display size x1×y1, and the image c3 of the performer C displayed in the sixth display size x2″×y2″ smaller than the second display size x2×y2.
  • the mobile phone 300 plays the audio of the performer B at the fifth volume V1″ lower than the first volume V1; since the mobile phone 300 detects that the audio of the performer C is not added with the first identifier, the audio of the performer C is still played at the second volume V2.
  • the user A can not only visually feel that the performer B is farther away from him, but also auditorily feel that the performer B is farther away from him, thereby improving the audio-visual effect of the recorded video information.
  • the camera 353 of the mobile phone 300 may be a TOF camera, and the mobile phone 300 may use the TOF camera to detect the first propagation direction of the audio of the performer B and the second propagation direction of the audio of the performer C.
  • the first propagation direction of the audio of the performer B includes the pitch angle ⁇ 1 and the azimuth angle of the performer B relative to the mobile phone 300
  • the second propagation direction of the audio of the performer C includes the pitch angle ⁇ 2 and the azimuth angle of the performer C relative to the handset 300
  • the detection process can be as follows: in the space coordinate system with the TOF camera as the coordinate origin, the TOF camera can detect the first coordinates (x1, y1, z1) of the performer B and the second coordinates (x2, y2, z2) of the performer C.
  • the mobile phone 300 can determine, from the first coordinates, the pitch angle θ1 and the azimuth angle of the performer B relative to the mobile phone 300; similarly, the mobile phone 300 can determine, from the second coordinates, the pitch angle θ2 and the azimuth angle of the performer C relative to the mobile phone 300. In this way, the mobile phone 300 can determine the first propagation direction of the audio of the performer B through the calculated pitch angle θ1 and azimuth angle, and determine the second propagation direction of the audio of the performer C through the calculated pitch angle θ2 and azimuth angle.
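Under a conventional spherical-coordinate convention (an assumption, since the patent's angle formulas are shown only as images in the source), the pitch and azimuth can be computed from the TOF coordinates as:

```python
import math

def direction_angles(x: float, y: float, z: float):
    """Pitch and azimuth (radians) of a point relative to the TOF camera origin.

    Assumes z is the vertical axis and the x-y plane is horizontal; this is
    an illustrative convention, not the patent's specified one.
    """
    pitch = math.atan2(z, math.hypot(x, y))   # elevation above the x-y plane
    azimuth = math.atan2(y, x)                # bearing within the x-y plane
    return pitch, azimuth
```

Feeding (x1, y1, z1) and (x2, y2, z2) into this function yields the (θ1, azimuth) and (θ2, azimuth) pairs that define the two propagation directions.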
  • the microphone 304C of the mobile phone 300 may be an array microphone.
  • the array microphones of the mobile phone 300 can collect audio from different propagation directions. Furthermore, the array microphone of the mobile phone 300 directs its beam toward the direction of the pitch angle θ1 and the azimuth angle of the performer B, and spatially filters the audio of the performer B according to the spatial spectrum characteristics of the audio of the performer B, so as to realize accurate directional enhancement collection of the audio of the performer B.
  • the array microphone of the mobile phone 300 can place a null toward the direction of the pitch angle θ2 and the azimuth angle of the performer C when collecting audio, so as to realize directional suppression collection of the audio of the performer C.
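Directional enhancement of this kind is commonly implemented with delay-and-sum beamforming; as an illustrative sketch (not the patent's specified spatial-filtering method):

```python
def delay_and_sum(channels, delays):
    """Steer a microphone array toward one direction by delaying each channel
    by its per-channel sample offset and averaging (simplest beamformer sketch).

    Signals arriving from the steered direction add coherently and are
    enhanced; signals from other directions partially cancel.
    """
    n = min(len(ch) - d for ch, d in zip(channels, delays))
    return [
        sum(ch[d + i] for ch, d in zip(channels, delays)) / len(channels)
        for i in range(n)
    ]
```

Steering toward performer B's (θ1, azimuth) direction enhances his audio; a null-steering variant of the same idea attenuates performer C's direction instead.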
  • when the audio corresponding to the recorded second shooting picture 203B is played, since the audio of the performer B is collected with directional enhancement by the array microphone of the mobile phone 300, the played audio of the performer B is clearer; furthermore, since the audio of the performer C is collected with directional suppression by the array microphone of the mobile phone 300, its interference with the audio of the performer B is small, which further improves the audio-visual effect when the user A enjoys the video information.
  • the mobile phone 300 can perform noise-reduction processing on the audio of the performer C in the second shooting picture 203B to remove noise from the audio of the performer C.
  • when the audio corresponding to the recorded second shooting picture 203B is played, since the audio of the performer C in the second shooting picture 203B has been noise-reduced, its interference with the audio of the performer B in the second shooting picture 203B can be reduced.
  • the mobile phone 300 may also, in response to the selection operation input by the user A on the image c1 of the performer C, display the adjustment button 205 and the mask button 206 on the side of the image c1 of the performer C.
  • the mobile phone 300 can respond to the user A's click operation on the mask button 206 displayed on the side of the image c1 of the performer C, and detect the second facial feature of the performer C in the image c1.
  • the mobile phone 300 searches out the audio of the performer C according to the correlation between the second facial feature of the performer C and the voice feature. Furthermore, the mobile phone 300 records the audio of the performer C in the first shooting screen 203A as the audio to be shielded. For example, the mobile phone 300 may add a second identifier (eg, field "0") to the audio of the performer C in the first shooting screen 203A, thereby recording the audio of the performer C in the first shooting screen 203A as the audio to be shielded.
  • if the mobile phone 300 detects that the audio of the performer B in the first shooting picture 203A is added with the first identifier, the mobile phone 300 records the audio of the performer B corresponding to the second shooting picture 203B at the third volume V1′ higher than the first volume V1; if the mobile phone 300 detects that the audio of the performer C in the first shooting picture 203A is added with the second identifier, then when recording the audio corresponding to the second shooting picture 203B, the audio of the performer C is not recorded, thereby shielding the audio of the performer C corresponding to the second shooting picture 203B.
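The combined effect of the first identifier (adjust) and the second identifier (mask) at recording time can be sketched as follows (the track layout and boost value are illustrative assumptions):

```python
def record_tracks(tracks, boost):
    """Apply the per-speaker identifiers when recording: tag "1" boosts the
    track, tag "0" masks it (not recorded), no tag leaves it unchanged.

    Sketch of the adjust/mask identifiers described in the text; `tracks`
    maps a speaker name to (volume, tag).
    """
    out = {}
    for name, (volume, tag) in tracks.items():
        if tag == "0":
            # masked audio is shielded: it is not recorded at all
            continue
        out[name] = volume * boost if tag == "1" else volume
    return out
```

Performer B (tag "1") is recorded louder, performer C (tag "0") is dropped from the recording, and any untagged speaker passes through unchanged.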
  • the mobile phone 300 plays the audio of the performer B at the third volume V1′ higher than the first volume V1; the audio of the performer C is not played, thereby reducing the interference with the audio of the performer B.
  • the third implementation manner of the embodiment of the present application further provides another video recording method, and the video recording method can be applied to the mobile phone A.
  • the structure of the mobile phone A is the same as that of the mobile phone 300 in the above-mentioned embodiment, which is not repeated here.
  • the following is still taking the camera APP as an example, combined with the scene of a meeting in a conference room, to describe in detail how in the third implementation manner, user A triggers mobile phone A to collect and record video information through the camera APP.
  • a distributed system can be set up in the conference room.
  • the distributed system includes the mobile phone B held by the presenter B and the mobile phone C held by the host C, and the mobile phone B and the mobile phone C are connected through a local area network (e.g., Wi-Fi, Bluetooth, etc.) communication connection.
  • the preview interface 200 of the camera APP includes a first shooting screen 203A
  • the first shooting screen 203A includes an image of the first shooting subject and an image of the second shooting subject.
  • the first shooting picture 203A includes the image b1 of the presenter B and the image c1 of the host C.
  • the preview interface 200 of the camera APP is further provided with a recording button 201 , and the user A can input a touch operation on the recording button 201 to trigger the mobile phone A to record video information.
  • the first shooting picture 203A may also display a prompt button 203 for instructing to join the distributed system.
  • the prompt button 203 may be displayed when the mobile phone A detects that the user A uses the camera APP of the mobile phone A for the first time.
  • when the trigger condition for joining the distributed system is met, the mobile phone A does not need to display the prompt button 203 on the first shooting picture 203A, but automatically joins the distributed system, so as to avoid visual disturbance to the user A.
  • the prompt button 203 may also be displayed on the mobile phone 300 , which is not limited herein.
  • the trigger condition for joining the distributed system may be, but not limited to, that mobile phone A, mobile phone B, and mobile phone C are connected to the same WIFI address. Still as shown in FIG. 14 , user A can click the prompt button 203 , and mobile phone A can respond to user A’s click operation on prompt button 203 . As shown in FIG. 15 , mobile phone A is connected to mobile phone B and mobile phone C respectively. As a result, mobile phone A completes the operation of joining the distributed system, and can interact with mobile phone B and mobile phone C respectively.
  • the mobile phone A starts to record video information in response to the user A's touch operation on the record button 201 .
  • the video information may include the first shot 203A collected by the camera 353 and the audio collected by the microphone 304C.
  • the audio collected by the microphone 304C may include the audio of the presenter B and the audio of the host C.
  • the first volume of the audio of the presenter B recorded by the mobile phone A is V1
  • the second volume of the audio of the host C recorded by the mobile phone A is V2.
  • when the mobile phone A detects the zoom-in operation or the zoom-out operation input by the user in the first shooting picture 203A, the volume of all the audio collected by the mobile phone A is not increased/decreased; rather, the volume of the audio of the selected subject in the first shooting picture 203A is increased/decreased.
  • the mobile phone A can increase/decrease the volume of the audio of the presenter B or the volume of the audio of the host C. Therefore, before increasing/decreasing the volume, the mobile phone A can identify the first facial feature and the second facial feature in the first shooting picture 203A, and identify the first sound feature and the second sound feature in the recorded audio, which will not be repeated here.
  • the mobile phone A can send, to the mobile phone B, the first facial feature of the face image of the presenter B and the second facial feature of the face image of the host C in the first shooting picture 203A, as well as the first sound feature of the audio of the presenter B and the second sound feature of the audio of the host C in the recorded audio; meanwhile, the mobile phone A can send, to the mobile phone C, the first facial feature and the second facial feature in the first shooting picture 203A and the first sound feature and the second sound feature in the recorded audio.
  • the mobile phone B can receive the first facial feature and the second facial feature from the mobile phone A, and the first sound feature and the second sound feature in the recorded audio.
  • the mobile phone C can also receive the first facial feature and the second facial feature in the first photographing picture 203A from the mobile phone A, and the first sound feature and the second sound feature in the recorded audio.
  • different facial images and audios may be stored in the mobile phone B, and there is a one-to-one correspondence between the stored facial images and audios.
  • the mobile phone B can compare the first facial feature and the second facial feature with the stored face images respectively to obtain the corresponding face similarity. If it is recognized that the face similarity between the stored face image A and the first facial feature is greater than the set similarity threshold, the mobile phone B can determine that the first facial feature matches the stored face image A. Similarly, the mobile phone B can also compare the first sound feature and the second sound feature with the stored audios, respectively, to obtain the corresponding audio similarity.
  • the mobile phone B may determine that the first sound feature matches the stored audio A.
  • the audio A stored in the mobile phone B corresponds to the face image A
  • the mobile phone B can establish an association relationship between the first face feature and the first voice feature, and send the association relationship between the first face feature and the first voice feature to the mobile phone A.
  • the mobile phone A completes the matching between the face image of the presenter B and the audio of the presenter B according to the association relationship between the first facial feature and the first sound feature.
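The threshold-matching step performed on phone B can be sketched as follows (the profile names and the pre-computed similarity scores are illustrative; the similarity metric itself is not specified by this text):

```python
def match_against_profiles(similarities, threshold):
    """Return the stored profile name whose similarity exceeds the threshold,
    or None if matching fails (in which case phone B sends the failure
    prompt information to phone A).

    `similarities` maps stored profile names to similarity scores computed
    against the received feature.
    """
    best = max(similarities, key=similarities.get, default=None)
    if best is not None and similarities[best] > threshold:
        return best
    return None
```

The same helper applies to both comparisons: face features against stored face images, and sound features against stored audios.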
  • Mobile phone C can complete the matching between host C's face image and host C's audio in the same way, which is not repeated here.
  • Similarly, if mobile phone B cannot establish an association between the first facial feature and the first sound feature, mobile phone B may send first prompt information to mobile phone A, indicating that the matching failed. After receiving the first prompt information from mobile phone B, mobile phone A can establish the association between the first facial feature and the first sound feature using an audio-visual recognition (AVR) model, in the same manner as the second implementation of the embodiments of this application. Mobile phone A then completes the matching between speaker B's face image and speaker B's audio according to that association.
  • Similarly, if mobile phone C cannot establish an association between the second facial feature and the second sound feature,
  • mobile phone C may also send the first prompt information to mobile phone A, indicating that the matching failed.
  • Mobile phone A can then establish the association between the second facial feature and the second sound feature using the audio-visual recognition (AVR) model, in the same manner as the above embodiment.
  • Mobile phone A completes the matching between host C's face image and host C's audio according to the association between the second facial feature and the second sound feature.
  • Mobile phone A may display an adjustment button 205 and a mask button 206 beside speaker B's image b1 in response to user A's selection operation on image b1.
  • The selection operation may be a click, double-click, long-press, or similar operation, which is not limited here.
  • In response to user A's click on the adjustment button 205, mobile phone A finds, according to the association between speaker B's facial feature and the first sound feature vector, the audio of speaker B corresponding to the first sound feature vector, and records speaker B's audio as the audio to be adjusted.
  • For example, the mobile phone 300 may add a first identifier (e.g. the field "1") to speaker B's audio, thereby recording it as the audio to be adjusted.
  • User A can input a zoom-in operation at any position on the preview interface 200.
  • The zoom-in operation may be a long-press operation.
  • The mobile phone 300 can then display the second shooting picture 203B in the preview interface 200 and record the second shooting picture, similarly to the above embodiment.
  • The second shooting picture 203B may include speaker B's image b2 at a third display size x1′×y1′ and host C's image c2 at a fourth display size x2′×y2′.
  • The third display size x1′×y1′ is larger than the first display size x1×y1;
  • the fourth display size x2′×y2′ is larger than the second display size x2×y2.
  • On detecting the zoom-in operation input by user A, mobile phone A can raise the first volume V1 of speaker B's audio in the second shooting picture 203B to a third volume V1′, while the second volume V2 of host C's audio can remain unchanged.
  • The manner of raising the first volume V1 of speaker B's audio to the third volume V1′ is the same as in the second implementation of this application and is not repeated here. Understandably, the third volume V1′ is greater than the first volume V1, which makes user A feel that speaker B is closer.
  • When playing the recorded audio corresponding to the second shooting picture 203B, mobile phone A plays speaker B's audio at the third volume V1′, higher than the first volume V1, and keeps playing host C's audio at the second volume V2.
  • In this way, user A can not only feel visually that speaker B is closer but also feel aurally that speaker B is closer, which improves the audio-visual effect of the recorded video information.
  • Based on the above method, the mobile phone 300 may also, when a zoom-out operation is detected, lower the volume of speaker B's audio in the recorded third shooting picture 203C.
  • The mobile phone 300 may display the third shooting picture 203C in the preview interface 200, similarly to the above embodiment.
  • The third shooting picture 203C includes speaker B's image b3 displayed at a fifth display size x1″×y1″ and host C's image c3 displayed at a sixth display size x2″×y2″.
  • The fifth display size x1″×y1″ is smaller than the first display size x1×y1,
  • and the sixth display size x2″×y2″ is smaller than the second display size x2×y2.
  • The mobile phone 300 lowers the first volume V1 of speaker B's audio in the recorded third shooting picture 203C to a fifth volume V1″.
  • The audio-visual effect when the recorded video information is played is the same as the audio-visual effect when it is played after the mobile phone 300 detects the zoom-out operation in the second implementation of the embodiments of this application, and the details are not repeated here.
  • The microphone 340C of the mobile phone 300 may be an array microphone.
  • Based on the same method as in the second implementation of the embodiments of this application, the mobile phone 300 can detect the propagation direction of speaker B's audio and the propagation direction of host C's audio, so that it can directionally enhance the capture of speaker B's audio and directionally suppress the capture of host C's audio.
  • Based on the same method as in the second implementation of the embodiments of this application, mobile phone A may also record host C's audio as the audio to be masked.
  • For example, the mobile phone 300 may add a second identifier (e.g. the field "0") to host C's audio, thereby recording it as the audio to be masked.
  • When the mobile phone 300 detects that speaker B's audio in the first shooting picture 203A carries the first identifier, it records speaker B's audio in the second shooting picture 203B at the third volume V1′, higher than the first volume V1;
  • when it detects that host C's audio in the first shooting picture 203A carries the second identifier, it does not record host C's audio in the second shooting picture 203B.
  • Accordingly, the mobile phone 300 plays speaker B's audio at the third volume V1′, higher than the first volume V1, and does not play host C's audio, thereby reducing interference with speaker B's audio.
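The identifier scheme above ("1" marks audio to be adjusted, "0" marks audio to be masked) can be sketched as a small post-processing step. This is a hypothetical illustration of the flag semantics; the track structure, field names, and gain arithmetic are assumptions, not the patent's implementation.

```python
# Sketch of the flagging scheme: tracks tagged "1" get the raised volume,
# tracks tagged "0" are masked (not recorded/played). Values are illustrative.

def process_tracks(tracks, gain):
    """tracks: dicts with 'flag' ('1' = adjust, '0' = mask, None = keep) and 'volume'."""
    out = []
    for t in tracks:
        if t["flag"] == "0":
            continue  # masked audio (e.g. host C) is neither recorded nor played
        volume = t["volume"] * gain if t["flag"] == "1" else t["volume"]
        out.append({"name": t["name"], "volume": volume})
    return out

tracks = [
    {"name": "speaker_B", "flag": "1", "volume": 30.0},  # first volume V1, to be adjusted
    {"name": "host_C", "flag": "0", "volume": 35.0},     # second volume V2, to be masked
]
result = process_tracks(tracks, gain=1.5)  # gain could come from e.g. F2/F1
```

Only the track tagged "1" survives, at the raised volume; the track tagged "0" is dropped entirely, matching the playback behavior described above.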
  • The above description takes raising the first volume V1 of speaker B's audio to the third volume V1′ when the zoom-in operation is detected as an example.
  • Based on the same method, mobile phone A can also raise the second volume V2 of host C's audio to a fourth volume V2′ when the zoom-in operation is detected, which is not repeated here.
  • Embodiments of this application further provide a computer-readable storage medium storing computer program code; when a processor executes the computer program code, the electronic device performs the methods in the foregoing embodiments.
  • Embodiments of this application also provide a computer program product which, when run on the electronic device, causes the electronic device to perform the methods in the foregoing embodiments.
  • The functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist physically alone, or two or more units may be integrated into one unit.
  • The integrated units may be implemented in the form of hardware or in the form of software functional units.
  • If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium.
  • The computer-readable storage medium includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the methods described in the embodiments of this application.
  • The aforementioned storage medium includes media that can store program code, such as flash memory, a removable hard disk, read-only memory, random access memory, a magnetic disk, or an optical disc.


Abstract

This application provides a video recording method and an electronic device, which can solve the problem of poor flexibility in the volume at which an electronic device records video information, thereby improving the audio-visual effect of the played video information. The video recording method includes: a first electronic device records a first shooting picture in response to a first operation by the user on the preview interface of an application, and records the audio corresponding to the first shooting picture at a first volume; in response to a zoom-in operation by the user on the first shooting picture, the first electronic device captures a second shooting picture and the audio corresponding to the second shooting picture, where the first shooting picture and the second shooting picture are continuous; the first electronic device records the second shooting picture and records the audio corresponding to the second shooting picture at a second volume.

Description

Video recording method and electronic device
This application claims priority to Chinese patent application No. 202110130811.3, entitled "Video recording method and electronic device", filed with the China National Intellectual Property Administration on January 29, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of terminals, and in particular to a video recording method and an electronic device.
Background
At present, when a terminal device records a video, the volume at which the terminal device records sound information is usually positively correlated with the volume of the sound information picked up by its microphone. The user cannot flexibly adjust the recorded sound during recording, so the volume adjustment when the terminal device records video information is inflexible and the audio-visual effect is poor.
Summary
Embodiments of this application provide a video recording method and an electronic device, which can solve the problem of poor flexibility in the volume at which an electronic device records video information, thereby improving the audio-visual effect of the played video information.
To achieve the above objective, this application adopts the following technical solutions:
In a first aspect, an embodiment of this application provides a video recording method applied to a first electronic device. Specifically, the method of the first aspect includes: the first electronic device records a first shooting picture in response to a first operation by the user on the preview interface of an application, and records the audio corresponding to the first shooting picture at a first volume; in response to a zoom-in operation by the user on the first shooting picture, the first electronic device captures a second shooting picture and the audio corresponding to the second shooting picture, where the first shooting picture and the second shooting picture are continuous; the first electronic device records the second shooting picture and records the audio corresponding to the second shooting picture at a second volume. The second volume is greater than the first volume, or the sound amplification rate corresponding to the second volume is greater than the sound amplification rate of the first volume, the sound amplification rate being the ratio of the volume output by the first electronic device to the volume it captures.
In the video recording method of the first aspect, the second shooting picture and its corresponding audio are obtained after the user's zoom-in operation on the first shooting picture. While the second shooting picture is recorded, its corresponding audio is recorded at the second volume. Thus, when the recorded video information is played, the user can feel visually that the second shooting picture is closer and, at the same time, feel aurally that it is closer, improving the audio-visual effect of the recorded video information.
In one possible design, the audio corresponding to the first shooting picture includes the audio of a first shooting object. Before the first electronic device captures the second shooting picture and its corresponding audio in response to the user's zoom-in operation on the first shooting picture, the method of the first aspect further includes: the first electronic device establishes an association between the image of the first shooting object and its audio; in response to a second operation by the user on the first shooting picture, the first electronic device records the audio of the first shooting object as audio to be adjusted. Recording the audio corresponding to the second shooting picture at the second volume then includes: recording, at the second volume, the audio corresponding to the first shooting object in the second shooting picture. In this way, the first electronic device can record the audio of the selected first shooting object at the second volume, which is more flexible.
Further, the audio corresponding to the first shooting picture includes the audio of a second shooting object, and the method of the first aspect further includes: the first electronic device establishes an association between the image of the second shooting object and its audio. Recording the audio corresponding to the second shooting picture at the second volume further includes: recording the audio corresponding to the second shooting object in the second shooting picture at the first volume or at the sound amplification rate corresponding to the first volume. In this way, the first electronic device can record only the audio of the selected first shooting object at the second volume, which is more flexible.
Alternatively, further, the audio corresponding to the first shooting picture also includes the audio of a third shooting object. Before the first electronic device captures the second shooting picture and its corresponding audio in response to the user's zoom-in operation on the first shooting picture, the method of the first aspect further includes: the first electronic device establishes an association between the image of the third shooting object and its audio; in response to a third operation by the user on the first shooting picture, the first electronic device records the audio of the third shooting object as audio to be adjusted. Recording the audio corresponding to the second shooting picture at the second volume then further includes: recording, at the second volume or at the sound amplification rate corresponding to the second volume, the audio corresponding to the third shooting object in the second shooting picture. In this way, the first electronic device can also record the audio of the selected third shooting object at the second volume, which is even more flexible.
Alternatively, further, the audio corresponding to the first shooting picture includes the audio of a second shooting object. The method of the first aspect further includes: the first electronic device establishes an association between the image of the second shooting object and its audio. Recording the audio corresponding to the second shooting picture at the second volume or at the corresponding sound amplification rate further includes: masking the audio associated with the image of the second shooting object in the second shooting picture. Accordingly, when playing the recorded audio corresponding to the second shooting picture, the first electronic device plays the audio of the first shooting object at a third volume higher than the first volume and does not play the audio of the second shooting object, thereby reducing interference with the audio of the first shooting object.
Further, the first electronic device establishing the association between the image of the first shooting object and its audio, and between the image of the second shooting object and its audio, includes: the first electronic device extracts the first facial feature of the first shooting object and the first sound feature vector of the audio; determines, from the lip shape of the first facial feature of the first shooting object, the first pronunciation feature corresponding to the lip shape; and extracts the second pronunciation feature of the first sound feature vector. If the similarity between the first pronunciation feature and the second pronunciation feature is greater than a similarity threshold, the first electronic device establishes the association between the first facial feature and the first sound feature vector.
Alternatively, further, the method of the first aspect further includes: when the first electronic device responds to the user's first operation on the preview interface, it establishes communication connections with a second electronic device and a third electronic device respectively. The first electronic device establishing the associations includes: the first electronic device extracts the first facial feature of the first shooting object and the first sound feature vector of the audio, and the second facial feature of the second shooting object and the second sound feature vector of the audio; the first electronic device sends the first facial feature, the first sound feature vector, the second facial feature, and the second sound feature vector to the second and third electronic devices; the first electronic device receives the association between the first facial feature and the first sound feature vector from the second electronic device, and the association between the second facial feature and the second sound feature vector from the third electronic device.
In one possible design, the first electronic device capturing the second shooting picture and its corresponding audio includes: the first electronic device detects the first propagation direction of the first shooting object's audio and the second propagation direction of the second shooting object's audio in the second shooting picture; the first electronic device directionally enhances the capture of the first shooting object's audio in the first propagation direction and directionally suppresses the capture of the second shooting object's audio in the second propagation direction. When the recorded audio corresponding to the second shooting picture is played, since the first shooting object's audio was captured with directional enhancement by the first electronic device's array microphone, the played audio of the first shooting object is clearer; moreover, since the second shooting object's audio was captured with directional suppression by the array microphone, it interferes little with the first shooting object's audio, further improving the user's audio-visual experience of the video information.
In one possible design, the first electronic device determines the second volume according to the first shooting picture, the second shooting picture, and the first volume.
Further, the first electronic device determining the second volume according to the first shooting picture, the second shooting picture, and the first volume includes: the first electronic device determines the second volume according to the first volume and the zoom factor, where F1 is the first focal length corresponding to the first shooting picture, F2 is the second focal length corresponding to the second shooting picture, F2/F1 is the zoom factor, V is the first volume, and V′ is the second volume.
Alternatively, further, the first electronic device determining the second volume according to the first shooting picture, the second shooting picture, and the first volume includes: the first electronic device determines the second volume according to the first volume and the size magnification ratio, where x1×y1 is the first display size of the first shooting object in the first shooting picture, x1′×y1′ is the third display size of the first shooting object in the second shooting picture, and (x1′×y1′)/(x1×y1) is the size magnification ratio.
Alternatively, further, the first electronic device determining the second volume according to the first shooting picture, the second shooting picture, and the first volume includes: the first electronic device determines the second volume according to the first volume, the size magnification ratio, and the zoom factor, where F1 is the first focal length corresponding to the first shooting picture, F2 is the second focal length corresponding to the second shooting picture, F2/F1 is the zoom factor, x1×y1 is the first display size of the first shooting object in the first shooting picture, x1′×y1′ is the third display size of the first shooting object in the second shooting picture, V is the first volume, V′ is the second volume, and (x1′×y1′)/(x1×y1) is the size magnification ratio.
In one possible design, the first electronic device is in earphone mode. After the first electronic device captures the second shooting picture and its corresponding audio, the method of the first aspect further includes: the first electronic device displays the second shooting picture on the application's preview interface and outputs the audio corresponding to the second shooting picture to the earphones for playback at the recorded volume. The first electronic device can thus play the recorded video information while recording it; when the user watches the video information as it is being recorded and played, the second shooting picture matches the audio played at the recorded volume, giving the user a better audio-visual experience.
Alternatively, in one possible design, the first electronic device is not in earphone mode. After the first electronic device records the second shooting picture and records its corresponding audio at the second volume, the method of the first aspect further includes:
in response to a stop operation by the user on the preview interface, the first electronic device generates a video file based on the recorded second shooting picture and its corresponding audio; in response to the user's operation of opening the video file, the first electronic device displays the second shooting picture on the application's preview interface and plays the corresponding audio through the first electronic device's speaker at the recorded volume. The second shooting picture in the video information matches the audio played at the recorded volume, giving the user a better audio-visual experience.
In a second aspect, an embodiment of this application provides a video recording method applied to a first electronic device. The method of the second aspect includes: the first electronic device records a first shooting picture in response to a first operation by the user on the preview interface of an application, and records the audio corresponding to the first shooting picture at a first volume; in response to a zoom-out operation by the user on the first shooting picture, the first electronic device captures a second shooting picture and its corresponding audio, where the first shooting picture and the second shooting picture are continuous; the first electronic device records the second shooting picture and records its corresponding audio at a second volume. The second volume is smaller than the first volume, or the sound amplification rate corresponding to the second volume is smaller than the sound amplification rate of the first volume, the sound amplification rate being the ratio of the volume output by the first electronic device to the volume it captures.
In one possible design, the audio corresponding to the first shooting picture includes the audio of a first shooting object. Before the first electronic device captures the second shooting picture and its corresponding audio in response to the user's zoom-out operation on the first shooting picture, the method of the second aspect further includes: the first electronic device establishes an association between the image of the first shooting object and its audio; in response to a second operation by the user on the first shooting picture, the first electronic device records the audio of the first shooting object as audio to be adjusted. Recording the audio corresponding to the second shooting picture at the second volume includes: recording, at the second volume or at the sound amplification rate corresponding to the second volume, the audio corresponding to the first shooting object in the second shooting picture.
Further, the audio corresponding to the first shooting picture includes the audio of a second shooting object. The method of the second aspect further includes: the first electronic device establishes an association between the image of the second shooting object and its audio. Recording the audio corresponding to the second shooting picture at the second volume further includes: recording the audio corresponding to the second shooting object in the second shooting picture at the first volume or at the sound amplification rate corresponding to the first volume.
Further, the method of the second aspect further includes: when the first electronic device responds to the user's first operation on the preview interface, it establishes communication connections with a second electronic device and a third electronic device respectively. The first electronic device establishing the association between the image of the first shooting object and its audio, and between the image of the second shooting object and its audio, includes: the first electronic device extracts the first facial feature of the first shooting object and the first sound feature vector of the audio, and the second facial feature of the second shooting object and the second sound feature vector of the audio; the first electronic device sends the first facial feature, the first sound feature vector, the second facial feature, and the second sound feature vector to the second and third electronic devices; the first electronic device receives the association between the first facial feature and the first sound feature vector from the second electronic device, and the association between the second facial feature and the second sound feature vector from the third electronic device.
Further, the first electronic device determining the second volume according to the first shooting picture, the second shooting picture, and the first volume includes: the first electronic device determines the second volume according to the first volume and the zoom factor; or the first electronic device determines the second volume according to the first volume and the size scaling ratio; or the first electronic device determines the second volume according to the first volume, the size scaling ratio, and the zoom factor. Here F1 is the first focal length corresponding to the first shooting picture, F2 is the second focal length corresponding to the second shooting picture, F2/F1 is the focal reduction factor, x1×y1 is the first display size of the first shooting object in the first shooting picture, x1′×y1′ is the third display size of the first shooting object in the second shooting picture, V is the first volume, V′ is the second volume, and (x1′×y1′)/(x1×y1) is the size reduction factor.
In a third aspect, this application further provides an electronic device, including: a memory; one or more processors; and one or more computer programs. The one or more computer programs are stored in the memory and, when executed by the one or more processors, cause the electronic device to perform the video recording method performed by the first electronic device in the first or second aspect of this application.
In a fourth aspect, this application further provides a computer-readable storage medium, including a computer program or instructions which, when run on a computer, cause the computer to perform the video recording method provided in the first or second aspect of this application.
In a fifth aspect, this application further provides a computer program product, including a computer program or instructions which, when run on a computer, cause the computer to perform the video recording method provided in the first or second aspect of this application.
Understandably, the electronic device of the third aspect, the computer-readable storage medium of the fourth aspect, and the computer program product of the fifth aspect are all used to perform the corresponding methods provided above; therefore, for their beneficial effects, refer to the beneficial effects of the corresponding methods, which are not repeated here.
Brief Description of the Drawings
(a) of Fig. 1 is a schematic diagram of a concert scene provided by an embodiment of this application;
(b) of Fig. 1 is schematic diagram 1 of the first shooting picture provided by an embodiment of this application;
Fig. 2 is a schematic structural diagram of the mobile phone provided by an embodiment of this application;
Fig. 3 is a flowchart of the video recording method provided by an embodiment of this application;
(a) of Fig. 4 is schematic diagram 1 of inputting a zoom-in operation in the first shooting picture provided by an embodiment of this application;
(b) of Fig. 4 is schematic diagram 2 of the first shooting picture provided by an embodiment of this application;
Fig. 5 is a schematic diagram of the principle of sound propagation provided by an embodiment of this application;
(a) of Fig. 6 is schematic diagram 1 of inputting a zoom-out operation in the first shooting picture provided by an embodiment of this application;
(b) of Fig. 6 is schematic diagram 3 of the first shooting picture provided by an embodiment of this application;
Fig. 7 is a schematic diagram of the principle of recognizing facial features in the first shooting picture 203A with the YOLO model, provided by an embodiment of this application;
Fig. 8 is a schematic structural diagram of the audio-visual recognition model provided by an embodiment of this application;
(a) of Fig. 9 is a schematic diagram of inputting a selection operation on performer B's image b1 in the first shooting picture, provided by an embodiment of this application;
(b) of Fig. 9 is a schematic diagram of clicking the button beside performer B's image b1 in Fig. 9;
(a) of Fig. 10 is a schematic diagram of inputting a selection operation on performer C's image c1 in the first shooting picture, provided by an embodiment of this application;
(b) of Fig. 10 is a schematic diagram of clicking the button beside performer C's image c1 in Fig. 10;
Fig. 11 is interaction schematic diagram 1 of the distributed system provided by an embodiment of this application;
Fig. 12 is a schematic diagram of a conference scene provided by an embodiment of this application;
Fig. 13 is schematic diagram 3 of the first shooting picture provided by an embodiment of this application;
Fig. 14 is a schematic diagram of clicking the prompt button in the first shooting picture, provided by an embodiment of this application;
Fig. 15 is interaction schematic diagram 2 of the distributed system provided by an embodiment of this application;
Fig. 16 is interaction schematic diagram 3 of the distributed system provided by an embodiment of this application;
(a) of Fig. 17 is a schematic diagram of inputting a selection operation on image b1 in the first shooting picture, provided by an embodiment of this application;
(b) of Fig. 17 is a schematic diagram of clicking the button beside performer B's image b1 in (a) of Fig. 17;
(a) of Fig. 18 is schematic diagram 2 of inputting a zoom-in operation in the first shooting picture, provided by an embodiment of this application;
(b) of Fig. 18 is schematic diagram 4 of the first shooting picture provided by an embodiment of this application;
(a) of Fig. 19 is schematic diagram 1 of inputting a zoom-out operation in the first shooting picture, provided by an embodiment of this application;
(b) of Fig. 19 is a schematic diagram of the third shooting picture provided by an embodiment of this application.
Detailed Description
The technical solutions in the embodiments of this application are described below clearly and completely with reference to the accompanying drawings. Evidently, the described embodiments are only some rather than all of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of this application without creative effort fall within the protection scope of this application.
In this application, the character "/" generally indicates an "or" relationship between the associated objects; for example, A/B can be understood as A or B.
The terms "first" and "second" are used only for description and cannot be understood as indicating or implying relative importance or implicitly indicating the number of the indicated technical features. Thus, a feature defined with "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the embodiments, unless otherwise stated, "multiple" means two or more.
In addition, the terms "include" and "have" and any variants thereof mentioned in the description of this application are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device including a series of steps or modules is not limited to the listed steps or modules, but optionally also includes other unlisted steps or modules, or optionally also includes other steps or modules inherent to the process, method, product, or device.
In the embodiments of this application, words such as "exemplary" or "for example" are used to give examples, illustrations, or explanations. Any embodiment or design described as "exemplary" or "for example" should not be interpreted as preferable to or more advantageous than other embodiments or designs; rather, these words are intended to present a concept in a concrete manner.
In daily life, in order to record valuable or interesting scenes, people use electronic devices with video shooting capability to record such scenes for later playback.
Taking a concert scene as an example, as shown in (a) of Fig. 1, when user A goes to a studio hall to listen to a concert, user A can see performer B and performer C singing. If user A is interested in the performance, user A can open the Camera app of the mobile phone 300. Then, as shown in (b) of Fig. 1, the phone 300 can display the Camera app's preview interface 200, and user A can tap the record button 201 in the preview interface 200 to capture and record the video information in the studio hall. The video information recorded by the phone 300 can include the first shooting picture 203A captured by the camera 353 and the audio captured in real time by the microphone. If the user needs to enlarge or shrink the first shooting picture 203A in the preview interface 200 during recording, user A can input a zoom-in or zoom-out operation into the preview interface 200. Taking a zoom-in operation as an example, in response to this operation the phone 300 can enlarge the first shooting picture 203A in the preview interface 200 by adjusting the focal length of the camera 353, presenting user A with a closer shooting effect.
In the embodiments of this application, when user A taps the record button 201 in the preview interface 200 and triggers the phone 300 to capture and record the video information in the studio hall: if the phone 300 detects a zoom-in operation input by user A on the Camera app's preview interface 200, then while enlarging the size of the images recorded in the first shooting picture 203A, it can also raise the volume of the recorded audio; if the phone 300 detects a zoom-out operation, then while shrinking the size of the images recorded in the first shooting picture 203A, it can lower the volume of the recorded audio.
In this way, when the phone 300 subsequently plays the recorded video information, if the shooting picture in the video information was enlarged, the volume of the corresponding audio is also raised; correspondingly, if the shooting picture was shrunk, the volume of the corresponding audio is also lowered, so that the size of the shooting picture in the video information matches the volume of the audio, improving the user's audio-visual experience of the recorded video information.
The video recording method provided by the embodiments of this application can be applied to an electronic device, which may be a mobile phone, a tablet, a laptop, an ultra-mobile personal computer (UMPC), a handheld computer, a netbook, a personal digital assistant (PDA), a wearable electronic device, a virtual reality device, or the like; the embodiments of this application impose no limitation on this.
Exemplarily, as shown in Fig. 2, the electronic device in the embodiments of this application may be the mobile phone 300. The embodiments are described in detail below taking the phone 300 as an example. It should be understood that the illustrated phone 300 is only one example of the above electronic device, and the phone 300 may have more or fewer components than shown in the figure, may combine two or more components, or may have a different component configuration.
As shown in Fig. 2, the mobile phone 300 includes a processor 301, an internal memory 321, an external memory interface 322, antenna A, a mobile communication module 331, antenna B, a wireless communication module 332, an audio module 340, a speaker 340A, a receiver 340B, a microphone 340C, an earphone jack 340D, a display screen 351, a subscriber identification module (SIM) card interface 352, a camera 353, keys 354, a sensor module 360, a universal serial bus (USB) interface 370, a charging management module 380, a power management module 381, and a battery 382. In other embodiments, the phone 300 may also include a motor, an indicator, and the like.
The processor 301 may include one or more processing units. For example, the processor 301 may include an application processor (AP), a modem, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU). It should be noted that different processing units may be independent devices or may be integrated in one or more independent processors, and may be integrated in the same device as other modules in the phone 300. Taking the modem as an example, the modem may be a processing unit independent of the processor 301, may be integrated in the same device as other processing units (e.g. the AP, ISP, or GPU), or may have some or all of its functions integrated in the same device as the mobile communication module 331.
The internal memory 321 can store data and/or at least one computer program, the at least one computer program including instructions. Specifically, the internal memory 321 may include a program storage area and a data storage area. The program storage area can store at least one computer program, which may include applications (such as Gallery and Contacts), an operating system (such as Android or iOS), or other programs. The data storage area can store at least one of data created while the phone 300 is used, data received from other devices (such as other phones, network devices, or servers), or data pre-stored before delivery. For example, the data stored in the internal memory 321 may be at least one of information such as images, files, or identifiers.
In some embodiments, the internal memory 321 may include high-speed random access memory and/or non-volatile memory, for example one or more magnetic disk storage devices, flash memory, or universal flash storage (UFS).
The processor 301 can cause the phone 300 to implement one or more functions and meet the user's needs by calling the one or more computer programs and/or data stored in the internal memory 321. For example, the processor 301 can cause the electronic device to perform the video recording method provided in the embodiments of this application by calling the instructions and data stored in the internal memory 321.
The external memory interface 322 can be used to connect an external memory card (e.g. a micro SD card) to expand the storage capability of the phone 300. The external memory card communicates with the processor 301 through the external memory interface 322 to implement data storage, for example saving files such as images, music, and videos on the external memory card.
In some embodiments, a cache can also be provided in the processor 301 for storing instructions and/or data that the processor 301 needs to reuse; if the processor 301 needs to use the instruction or data again, it can be called directly from the cache. This helps avoid repeated accesses and reduces the waiting time of the processor 301, thereby helping improve system efficiency. For example, the cache can be implemented with high-speed cache memory.
Antenna A and antenna B are used to transmit and receive electromagnetic wave signals. Each antenna in the phone 300 can cover one or more communication frequency bands. Different antennas can also be multiplexed to improve antenna utilization; for example, antenna A can be multiplexed as a diversity antenna of the wireless local area network. In other embodiments, an antenna can be used in combination with a tuning switch.
The mobile communication module 331 can implement communication between the phone 300 and network devices according to the mobile communication technologies supported by the phone 300 (e.g. 2G, 3G, 4G, or 5G). Exemplarily, the mobile communication technologies supported by the phone 300 may include at least one of GSM, GPRS, CDMA, WCDMA, TD-SCDMA, LTE, or NR. For example, if the phone 300 supports GSM, then after accessing the network through a cell provided by a BTS in the GSM communication system, and when the network signal strength of the accessed cell is not lower than the decision threshold (that is, when the phone 300 is camped on the network), the phone 300 communicates with the BTS through the mobile communication module 331. Exemplarily, the mobile communication module 331 can amplify the signal modulated by the modem and send it to the network device via antenna A; it can also receive a signal sent by the network device via antenna A, amplify it, and send it to the modem, which demodulates the received signal into a low-frequency baseband signal and then performs other corresponding processing. In some embodiments, the mobile communication module 331 may include filters, switches, power amplifiers, a low noise amplifier (LNA), and the like.
The wireless communication module 332 can provide wireless communication solutions applied to the phone 300, including wireless local area networks (WLAN) (such as a wireless-fidelity (Wi-Fi) network), Bluetooth (BT), the global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), infrared (IR), and the like. The GNSS may include at least one of the global positioning system (GPS), the global navigation satellite system (GLONASS), the BeiDou navigation satellite system (BDS), the quasi-zenith satellite system (QZSS), and/or satellite based augmentation systems (SBAS). Exemplarily, the wireless communication module 332 may be one or more devices integrating at least one communication processing module, and can communicate with corresponding devices through antenna B according to the wireless communication technologies it supports (e.g. Wi-Fi, Bluetooth, FM, or NFC).
The phone 300 can implement audio functions, such as music playback and recording, through the audio module 340, the speaker 340A, the receiver 340B, the microphone 340C, the earphone jack 340D, the AP, and so on. The microphone 340C may be a microphone array containing multiple microphones, each receiving audio signals from different directions. The microphone array can implement directional sound-pickup enhancement and directional sound-pickup suppression.
The phone 300 can implement display functions through the GPU, the display screen 351, the AP, and so on. The display screen 351 can be used to display images, videos, etc., and may include a display panel. The display panel may use a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), Mini-LED, Micro-LED, Micro-OLED, quantum dot light emitting diodes (QLED), and so on. In some embodiments, the phone 300 may include 1 or N display screens 351, N being a positive integer greater than 1.
The keys 354 may include a power key, volume keys, and so on. The keys 354 may be mechanical keys, or virtual buttons or virtual options. The phone 300 can receive key input and generate key signal input related to user settings and function control of the phone 300.
The sensor module 360 may include one or more sensors, for example an acceleration sensor 360A, a touch sensor 360B, and a fingerprint sensor 360C. In some embodiments, the sensor module 360 may also include a pressure sensor, a gyroscope sensor, an environment sensor, a distance sensor, a proximity light sensor, a bone conduction sensor, and the like.
The acceleration sensor (ACC sensor) 360A can measure the magnitude of the phone 300's acceleration in all directions (generally three axes) and can detect the magnitude and direction of gravity when the phone 300 is stationary. The acceleration sensor 360A can also be used to recognize the phone 300's posture, for applications such as landscape/portrait switching and pedometers. In some embodiments, the acceleration sensor 360A can be connected to the processor 301 through a micro controller unit (MCU), which helps reduce the power consumption of the phone 300; for example, the acceleration sensor 360A can be connected to the AP and the modem through the MCU. In some embodiments, the MCU may be a general-purpose sensor hub.
The touch sensor 360B is also called a "touch panel". The touch sensor 360B can be arranged on the display screen 351, with the touch sensor 360B and the display screen 351 forming a touchscreen, also called a "touch-control screen". The touch sensor 360B detects touch operations acting on or near it and can pass the detected touch operation to the AP to determine the touch event type. The phone 300 then provides, according to the determined touch event type, visual output related to the touch operation through the display screen 351. In other embodiments, the touch sensor 360B may also be arranged on the surface of the phone 300 at a position different from the display screen 351.
The fingerprint sensor 360C is used to collect fingerprints. The phone 300 can use the collected fingerprint characteristics to implement fingerprint unlocking, accessing app locks, fingerprint photographing, answering calls with a fingerprint, and so on.
The SIM card interface 352 is used to connect SIM cards. A SIM card can be inserted into or pulled out of the SIM card interface 352 to make contact with or separate from the phone 300. The phone 300 can support 1 or N SIM card interfaces, N being a positive integer greater than 1. The SIM card interface 352 can support Nano SIM cards, Micro SIM cards, SIM cards, etc. Multiple cards, of the same or different types, can be inserted into the same SIM card interface 352 simultaneously. The SIM card interface 352 can also be compatible with different types of SIM cards and, in some embodiments, with external memory cards. The phone 300 implements functions such as calls and data communication through the SIM card. In some embodiments, the phone 300 may also use an eSIM, i.e. an embedded SIM card; the eSIM card can be embedded in the phone 300 and cannot be separated from it.
The camera 353 can input captured image signals to the processor 301, which can process the image signals into image frames. The camera 353 may be a time-of-flight (TOF) camera. In a spatial coordinate system with the TOF camera as the origin, the TOF camera can capture the spatial coordinates of the shot subject and thereby determine the subject's direction.
The USB interface 370 conforms to the USB standard specification and may specifically be a Mini USB, Micro USB, or USB Type-C interface. The USB interface 370 can be used to connect a charger to charge the phone 300, to transfer data between the phone 300 and peripheral devices, or to connect earphones and play audio through them. The interface can also be used to connect other electronic devices, such as AR devices.
It can be understood that the interface connection relationships between the modules illustrated in the embodiments of this application are only schematic and do not constitute a structural limitation of the phone 300. In other embodiments of this application, the phone 300 may also use interface connection manners different from those in the above embodiments, or a combination of multiple interface connection manners.
The charging management module 380 is used to receive charging input from a charger, which may be a wireless or wired charger. The power management module 381 is used to connect the battery 382, the charging management module 380, and the processor 301; it receives input from the battery 382 and/or the charging management module 380 and supplies power to the processor 301 and other modules. In some embodiments, the power management module 381 can also be used to monitor parameters such as battery capacity, battery cycle count, and battery health state (leakage, impedance).
It should be understood that the structure of the phone 300 shown in Fig. 2 is only an example. The phone 300 of the embodiments of this application may have more or fewer components than shown in the figure, may combine two or more components, or may have a different component configuration. The various components shown in the figure can be implemented in hardware including one or more signal processing and/or application-specific integrated circuits, in software, or in a combination of hardware and software.
It should be pointed out that the solutions in the embodiments of this application can also be applied to other electronic devices, and the corresponding names can be replaced with the names of the corresponding functions in those other electronic devices.
The video recording method provided by the embodiments of this application is described in detail below with reference to Fig. 3. As shown in Fig. 3, a video recording method provided by an embodiment of this application is applied to a first electronic device and specifically includes:
S1002: In response to a first operation by the user on the preview interface of an application, record a first shooting picture, and record the audio corresponding to the first shooting picture at a first volume.
S1004: In response to a zoom-in operation by the user on the first shooting picture, capture a second shooting picture and the audio corresponding to the second shooting picture, where the first shooting picture and the second shooting picture are continuous.
S1006: Record the second shooting picture, and record the audio corresponding to the second shooting picture at a second volume.
The second volume is greater than the first volume. Alternatively, the sound amplification rate corresponding to the second volume is greater than the sound amplification rate of the first volume, the sound amplification rate being the ratio of the volume output by the first electronic device to the volume it captures.
Taking the first electronic device being the phone 300 as an example, the volume output by the phone 300 may be the volume output by the earphones or the speaker, and the volume captured by the phone 300 may be the volume picked up by the microphone. In one design, the sound amplification rate may correspond to the gain of the phone 300's power amplifier.
The video recording method 100 of the embodiments of the present invention is described in detail below with reference to different examples.
Specifically, the first implementation of the video recording method is as follows:
Generally, applications with video capture capability, such as the Camera app, WeChat, and Douyin, can be installed on the phone 300. The following takes the Camera app as an example and, with reference to Figs. 1-8, details how the user triggers the phone 300 to capture and record video information through the Camera app in the first implementation.
Still taking the concert scene as an example, when user A goes to the studio hall to listen to a concert, user A can open the phone's Camera app to record video information. For example, in response to user A's operation of opening the Camera app, the phone 300 can call the camera 353 to start capturing the first shooting picture and display it in the Camera app's preview interface. As shown in (b) of Fig. 1, the Camera app's preview interface 200 includes the first shooting picture 203A, which includes the image of a first shooting object and the image of a second shooting object: performer B's image b1 and performer C's image c1.
Exemplarily, in the Camera app's first shooting picture 203A, the size of performer B's image b1 may be a first display size and the size of performer C's image c1 may be a second display size. The first display size may be the size of the area occupied by performer B's image b1, and the second display size may be the size of the area occupied by performer C's image c1. Alternatively, the first display size may be the size x1×y1 of the rectangular box enclosing image b1 in (b) of Fig. 1, and the size of performer C's image c1 may be the size x2×y2 of the rectangular box enclosing image c1 in (b) of Fig. 1.
In the following, the first display size is taken as the size x1×y1 of the rectangular box enclosing image b1 in (b) of Fig. 1, and the second display size as the size x2×y2 of the rectangular box enclosing image c1 in (b) of Fig. 1. Exemplarily, the first display size x1×y1 may be 8 mm×6 mm, 8 mm×12 mm, etc., and the second display size x2×y2 may be 8 mm×10 mm, 10 mm×16 mm, etc.; this is not limited here.
Continuing with (b) of Fig. 1, the Camera app's preview interface 200 is also provided with a record button 201, on which user A can input a touch operation. In response to user A's touch operation on the record button 201, the phone 300 can start recording video information. Exemplarily, the video information can include the first shooting picture 203A captured by the camera 353 and the audio corresponding to the first shooting picture 203A captured by the microphone 340C. Further, the audio captured by the microphone 340C can include performer B's audio and performer C's audio. The first volume of performer B's recorded audio is V1, the second volume of performer C's recorded audio is V2, and the sound amplification rate is R0. Exemplarily, the first volume V1 may be 30 dB, 40 dB, etc., and the second volume V2 may be 35 dB, 45 dB, etc.; this is not limited here. The sound amplification rate R0 may be the ratio of the first volume V1 to the volume of the audio captured by the phone 300's microphone, or may correspond to the current gain of the phone 300's power amplifier.
Understandably, when the volume adjustment button 204 of the phone 300 is not adjusted, the first volume V1 of performer B's recorded audio is positively correlated with the volume of the audio performer B emits: the louder performer B's audio, the greater the recorded first volume V1; conversely, the quieter performer B's audio, the smaller the recorded first volume V1. Similarly, the second volume V2 of performer C's recorded audio is also positively correlated with the volume of the audio performer C emits, which is not repeated here.
Subsequently, while watching the first shooting picture 203A, user A may feel visually that performers B and C are far away, and/or feel aurally that they are far away, giving a poor audio-visual effect. As shown in (a) of Fig. 4, user A can input a zoom-in operation at any position on the preview interface 200. The zoom-in operation may be a long-press operation; the long-press operation may also be replaced with operations such as a spread gesture, a double-click, or dragging a scroll bar upward (not shown in Fig. 4), which is not limited here.
As shown in (b) of Fig. 4, after detecting the zoom-in operation input by user A, the phone 300 captures the second shooting picture 203B at the second focal length F2 and records it, and then displays the second shooting picture 203B in the preview interface 200. The second shooting picture 203B can also include performer B's image b2 at a third display size x1′×y1′ and performer C's image c2 at a fourth display size x2′×y2′, where the third display size x1′×y1′ is larger than the first display size x1×y1 and the fourth display size x2′×y2′ is larger than the second display size x2×y2. Exemplarily, x1′×y1′ may be 12 mm×9 mm, 12 mm×16 mm, etc., and x2′×y2′ may be 12 mm×15 mm, 15 mm×24 mm, etc.; this is not limited here.
At the same time, after detecting the zoom-in operation input by user A, the phone 300 captures the audio corresponding to the second shooting picture 203B. In one possible implementation, the audio corresponding to the second shooting picture 203B refers to the audio captured while shooting 203B; specifically, it can include the audio of performer B in 203B and may also include audio from sound sources outside 203B, which is not limited here. When recording the audio corresponding to 203B, the phone 300 can raise the volume of the recorded audio, for example raising the first volume V1 of performer B's audio to a third volume V1′ and raising the second volume V2 of performer C's recorded audio to a fourth volume V2′. Understandably, the third volume V1′ is greater than the first volume V1, and the fourth volume V2′ is greater than the second volume V2. Exemplarily, V1′ may be 50 dB, 60 dB, etc., and V2′ may be 55 dB, 70 dB, etc.; this is not limited here.
When playing the recorded video information, the phone 300 displays the second shooting picture 203B. Understandably, the second shooting picture 203B includes performer B's image b2 displayed at the third display size x1′×y1′ larger than x1×y1 and performer C's image c2 displayed at the fourth display size x2′×y2′ larger than x2×y2. Thus, when user A watches the second shooting picture 203B, user A feels visually that performers B and C are closer.
Correspondingly, when playing the audio corresponding to the second shooting picture 203B in the video information, the phone 300 increases the sound amplification rate to R1, playing performer B's audio at the third volume V1′ higher than V1 and performer C's audio at the fourth volume V2′ higher than V2. In this way, user A can not only feel visually that performers B and C are closer but also feel aurally that they are closer, improving the audio-visual effect of the recorded video information.
The following describes in detail how the phone 300 raises the volume of the audio corresponding to the recorded second shooting picture 203B after detecting the zoom-in operation.
Exemplarily, when capturing the first shooting picture 203A, the phone 300 can capture it at the first focal length F1. Subsequently, on receiving the zoom-in operation input by user A on the preview interface 200, it can capture the second shooting picture 203B at the second focal length F2. After capturing 203B, the phone can determine the size of the shot subject's image in 203B according to the second focal length F2.
For example, when capturing the second shooting picture 203B, the phone 300 obtains the actual distance between the camera 353 and the shot subject, and the subject's actual size. The phone 300 can then determine the display size of the subject's image in 203B according to D2 = D1 × F2 / R, where R is the actual distance between the camera 353 and the shot subject, D1 is the subject's actual size, F2 is the second focal length, and D2 is the display size of the shot subject in 203B. When the subjects' images in 203B include performer B's image b2 and performer C's image c2, the phone 300 can use D2 = D1 × F2 / R to determine the third display size x1′×y1′ of image b2 and the fourth display size x2′×y2′ of image c2.
When the phone 300 detects the zoom-in operation input by user A on the preview interface 200, besides enlarging the first shooting picture 203A, it can also raise the volume of the recorded audio corresponding to the second shooting picture 203B according to the first focal length F1 and the second focal length F2.
Specifically, the phone 300 can determine the second volume from the first volume and the zoom factor. For example, the phone 300 raises the volume of the recorded audio according to V′ = V × F2/F1, where V is the volume of the recorded audio, which is raised to V′, and F2/F1 is the zoom factor. Accordingly, the first volume V1 of performer B's recorded audio is raised to the third volume V1′ = V1 × F2/F1, and the second volume V2 of performer C's recorded audio is raised to the fourth volume V2′ = V2 × F2/F1.
In other embodiments, the phone 300 can also raise the volume of the audio corresponding to the recorded second shooting picture 203B according to the image magnification ratio of any shot subject in the first shooting picture 203A.
For example, after capturing the first shooting picture 203A, the phone 300 can detect the size of any shot subject in 203A. Taking performer B as the shot subject, the phone 300 can use a YOLO (you only look once) model to detect the first display size x1×y1 of performer B's image b1 in 203A. Similarly, after capturing the second shooting picture 203B, the phone can use the same method to detect performer B's third display size x1′×y1′ in 203B. The image magnification ratio of performer B's image b2 in 203B is then B = (x1′×y1′)/(x1×y1).
In addition, the phone 300 can obtain the image magnification ratio of performer C's image c1 by the same method, which is not repeated here. Understandably, the image magnification ratio of performer C's image c1 is the same as that of performer B's image b1.
The phone 300 can then determine the raised volume V′ from the image magnification ratio B. For example, the phone 300 raises the volume of the audio corresponding to the recorded second shooting picture 203B according to V′ = V × B, where V is the volume of the audio corresponding to the recorded first shooting picture 203A, which is raised to V′, and B = (x1′×y1′)/(x1×y1) is the image magnification ratio. Accordingly, the first volume V1 of performer B's audio is raised to the third volume V1′ = V1 × B, and the second volume V2 of performer C's audio is raised to the fourth volume V2′ = V2 × B.
In other embodiments, the phone 300 can also combine the zoom factor with the image magnification ratio to raise the volume of the audio corresponding to the recorded second shooting picture 203B. For example, the phone 300 raises the volume of the recorded audio according to a formula in which V, the volume of the audio corresponding to the recorded first shooting picture 203A, is raised to V′ using the zoom factor F2/F1 and the reciprocal of the image magnification ratio (x1′×y1′)/(x1×y1). Accordingly, the first volume V1 of performer B's audio is raised to the third volume V1′, and the second volume V2 of performer C's audio is raised to the fourth volume V2′, by the same formula.
The following uses a concrete example to explain the principle by which the phone 300 raises the volume of performer B's recorded audio according to the above formulas.
In general, the greater the distance sound travels, the lower the volume of the audio; conversely, the smaller the distance, the higher the volume. As shown in Fig. 5, the figure includes a sound source M, a position P1, and a position P2. The distance between position P2 and the source M is d1, and the distance between position P1 and the source M is d2. Sound emitted by M propagates to position P1 and then to position P2. If the volume of the audio received at position P2 is Y, then, taking volume as inversely proportional to propagation distance, the volume Y′ of M's sound at position P1 can be obtained as Y′ = Y × d1/d2.
In the first implementation of the embodiments of this application, the third display size x1′×y1′ of performer B's image b2 captured at the second focal length F2 is larger than the first display size x1×y1 of performer B's image b1 captured at the first focal length F1. If image b2 is displayed at x1′×y1′ on the Camera app's preview interface 200, user A feels visually that performer B's image is closer (in essence, the distance has not changed). Therefore, the relation between sound propagation distance and volume described above can be used to model the relation between the recorded size of performer B's image and the volume of performer B's recorded audio.
Specifically, by analogy with Y′ = Y × d1/d2, the volume formulas above can be modeled, treating the change in display size as a change in apparent distance. The phone 300 can then raise the volume of the audio corresponding to the recorded second shooting picture 203B accordingly.
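The propagation relation above can be sketched directly. This is a minimal illustration assuming, as the passage does, that perceived volume is inversely proportional to distance from the source M (so Y × d1 = Y′ × d2); the function name and numbers are illustrative assumptions.

```python
# Sketch of the inverse-distance volume relation: given volume y measured at
# distance d_measured from the source, estimate the volume at distance d_target.
# Y' = Y * d_measured / d_target, so halving the distance doubles the volume.

def volume_at(y_measured, d_measured, d_target):
    """Volume at distance d_target, given volume y_measured at distance d_measured."""
    return y_measured * d_measured / d_target

y_at_p2 = 40.0                          # volume Y received at position P2, distance d1 = 10
y_at_p1 = volume_at(y_at_p2, 10.0, 5.0)  # position P1 at distance d2 = 5
```

This is the same modeling idea the patent uses: enlarging the displayed image is treated as halving the apparent distance, so the recorded volume is raised in proportion.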
In other embodiments, when the phone 300 detects the zoom-in operation input by user A on the preview interface 200, besides enlarging the first shooting picture 203A, it can also increase the sound amplification rate. Specifically, the phone 300 can adjust the amplification rate through the change in focal length and/or the change in display size described above, and then adjust the volume of the audio corresponding to the zoom-in operation through the adjusted amplification rate R1. The specific method of adjusting the amplification rate can be obtained by replacing V′ with R1 and V with R0 in the above formulas; for details, refer to the above formulas, which are not repeated here.
In addition, the first implementation of the embodiments of this application is illustrated with raising the volume of the audio corresponding to the recorded second shooting picture 203B when the zoom-in operation is detected. In another embodiment, as shown in (a) and (b) of Fig. 6, the phone 300 can also, based on the above method, lower the volume of the audio corresponding to the recorded third shooting picture 203C when a zoom-out operation is detected. After detecting the zoom-out operation, the phone captures the third shooting picture 203C at the third focal length F3 and displays 203C in the preview interface 200. Exemplarily, 203C includes performer B's image b3 displayed at a fifth display size x1″×y1″ and performer C's image c3 displayed at a sixth display size x2″×y2″, where the fifth display size x1″×y1″ is smaller than the first display size x1×y1 and the sixth display size x2″×y2″ is smaller than the second display size x2×y2. At the same time, the phone 300 lowers the first volume V1 of performer B's recorded audio to a fifth volume V1″ and lowers the second volume V2 of performer C's recorded audio to a sixth volume V2″.
When the recorded video information is subsequently played, the phone 300 displays the third shooting picture 203C, which includes performer B's image b3 displayed at the fifth display size x1″×y1″ smaller than x1×y1 and performer C's image c3 displayed at the sixth display size x2″×y2″ smaller than x2×y2. Thus, user A feels visually that performers B and C are farther away.
Correspondingly, when playing the audio corresponding to the third shooting picture 203C in the video information, the phone 300 plays performer B's audio at the fifth volume V1″ lower than the first volume V1 and performer C's audio at the sixth volume V2″ lower than the second volume V2. In this way, user A can not only feel visually that performers B and C are farther away but also feel aurally that they are farther away, improving the audio-visual effect of the recorded video information.
The following describes in detail how the phone 300 lowers the volume of the audio corresponding to the recorded third shooting picture 203C after detecting the zoom-out operation.
Exemplarily, when capturing the first shooting picture 203A, the phone can capture it at the first focal length F1. Subsequently, on receiving the zoom-out operation input by user A on the preview interface 200, it can capture the third shooting picture 203C at the third focal length F3. After capturing 203C, the phone 300 can determine the size of 203C based on the third focal length F3 in the same way as above.
In some embodiments, when the phone 300 detects the zoom-out operation input by user A on the preview interface 200, besides displaying the third shooting picture 203C, it can also lower the volume of the recorded audio according to the first focal length F1 and the third focal length F3. For example, the phone 300 lowers the volume of the audio corresponding to the recorded 203C according to V″ = V × F3/F1, where V is the volume of the audio corresponding to the recorded first shooting picture 203A, which is lowered to V″, and F3/F1 is the zoom factor.
In other embodiments, the phone 300 can also lower the volume of the audio corresponding to the recorded 203C according to the image reduction ratio. Specifically, the phone can obtain the image reduction ratio based on the same method as above, which is not repeated here. The phone 300 can then determine the lowered volume V″ from the image reduction ratio. For example, the phone lowers the volume of the recorded audio according to V″ = V × (x1″×y1″)/(x1×y1), where V is the volume of the audio corresponding to the recorded first shooting picture 203A, which is lowered to V″, and (x1″×y1″)/(x1×y1) is the image reduction ratio.
In other embodiments, the phone 300 can also combine the first focal length F1, the third focal length F3, and the image reduction ratio to lower the volume of the audio corresponding to the recorded 203C. For example, the phone lowers the volume of the recorded audio according to a formula in which V, the volume of the audio corresponding to the recorded first shooting picture 203A, is lowered to V″ using the zoom factor F3/F1 and the reciprocal of the image reduction ratio. The principle by which the phone 300 lowers the volume of the audio corresponding to the recorded 203C according to this formula is the same as described above and is not repeated here.
In other embodiments, when the phone 300 detects the zoom-out operation input by user A on the preview interface 200, besides shrinking the first shooting picture 203A, it can also obtain a sound amplification rate R2. The phone 300 can then obtain the volume V″ of the recorded audio from the original volume V0″ of the captured audio corresponding to 203C and the amplification rate R2, where V″ is smaller than V. The method of adjusting the amplification rate for the zoom-out operation can be the same as for the zoom-in operation: the phone 300 can adjust the amplification rate through the change in focal length and/or the change in display size described above, and then adjust the volume of the audio corresponding to the zoom-out operation through the adjusted amplification rate R2. The specific method of adjusting the amplification rate can be obtained by replacing V″ with R2 and V with R0 in the above formulas; the details are not repeated here.
In addition, still taking user A inputting a zoom-in operation on the preview interface 200 of the phone 300 as an example, the way the phone 300 plays the video information is described. If the phone 300 is in earphone mode while recording the video information, it can play the recorded video information while recording it. For example, while the phone 300 displays the second shooting picture 203B in response to the zoom-in operation, it can raise the volume of the recorded audio corresponding to 203B by the above method; at this time, if the phone 300 is in earphone mode, it can play the recorded audio in the earphones at the volume V′ in real time. In this way, while recording the video information, the second shooting picture 203B that user A watches matches the audio played at volume V′, and user A's audio-visual experience is better.
If the phone 300 is not in earphone mode, then after responding to the stop operation triggered by user A on the record button 201, it generates and saves a video file from the second shooting picture 203B displayed based on the zoom-in operation and the raised volume V′. Later, when the phone 300 responds to user A's operation of opening the video file, it plays the recorded video information: for example, the phone 300 displays the second shooting picture 203B and plays the corresponding recorded audio through the speaker at volume V′. Thus, when user A watches the video information, the second shooting picture 203B matches the audio played at volume V′, and user A's audio-visual experience is better.
本申请实施例的第二种实现方式还提供另一种视频录制方法,该视频录制方法也可以应用于手机300中。与本申请实施例第一种实现方式不同的是,本申请实施例第二种实现方式提供的视频录制方法,手机300可以在检测到放大操作后,仅提高第一拍摄画面203A中选中的被拍摄对象的音频,或者,手机300在检测到缩小操作后,仅降低第一拍摄画面203A中选中的被拍摄对象的音频。
下面仍以相机APP为例,并结合音乐会场景,详细说明第二种实现方式中,用户A如何通过相机APP触发手机300采集并录制视频信息。
当用户A到演播厅聆听音乐会时,用户A可以打开手机300的相机APP录制视频信息。例如,手机300可以响应于用户A打开相机APP的操作,调用摄像头353开始采集第一拍摄画面,进而将采集到的第一拍摄画面显示在相机APP的预览界面中。仍如图1中的(b)所示,相机APP的预览界面200中包括第一拍摄画面203A,第一拍摄画面203A中包括第一拍摄对象的图像和第二拍摄对象的图像,如第一拍摄画面203A中包括表演者B的图像b1、表演者C的图像c1。
基于与本申请实施例第一种实现方式同样的方法,可以得到表演者B的图像b1的大小为第一显示尺寸x1×y1,表演者C的图像c1的大小为第二显示尺寸x2×y2。
仍如图1中的(b)所示,相机APP的预览界面200还设置有录制按钮201,用户A可以对录制按钮201输入触控操作。手机300响应于用户A对录制按钮201的触控操作,可以开始录制视频信息。示例性地,视频信息可以包括摄像头353采集到的第一拍摄画面203A以及麦克风304C采集到的第一拍摄画面203A对应的音频。进一步地,麦克风304C采集到的音频可以包括表演者B的音频、表演者C的音频。其中,手机300录制的表演者B的音频的第一音量为V1、录制的表演者C的音频的第二音量为V2。
此外,在本申请实施例的第二种实现方式中,后续并不是提高/降低手机300采集到的所有音频的音量,而是提高/降低第一拍摄画面203A中用户A选中的被拍摄对象的音频的音量。如,当用户A选中的被拍摄对象为表演者B或表演者C时,手机300可提高/降低表演者B的音频的音量或表演者C的音频的音量。在提高/降低音量之前,手机300可以识别第一拍摄画面203A中的人脸特征,并且,手机300还可以从录制的音频中识别声音特征。进而,手机300可以将表演者B的第一人脸特征与表演者B的声音特征匹配、以及将表演者C的第二人脸特征与表演者C的声音特征匹配。
下面介绍手机300如何识别第一拍摄画面203A中的人脸特征,以及如何从录制的音频中识别声音特征。
对于识别人脸特征而言,如图7所示,一种实施方式中,手机300可以根据YOLO(you only look once,YOLO)模型,将第一拍摄画面203A划分为S1×S2个网格207(图7中S1=4,S2=7)。然后,手机300可以依次在每个网格207中通过设置的滑动边界窗208进行滑动卷积,以提取图像特征进行识别。识别结果可以包括滑动边界窗208(bounding box)的取值,置信度(confidence)的取值。具体地,置信度用于指示是否存在人脸。例如,如果存在人脸,则置信度的取值可以为二进制数1;反之,如果不存在人脸,则置信度的取值可以为二进制数0。此外,滑动边界窗208的取值为[x,y,w,h],其中,(x,y)为人脸中心点的坐标,(w,h)为滑动边界窗208的宽度和高度。由此,手机300可以基于上述的方式识别出相机APP的第一拍摄画面203A中的表演者B的第一人脸特征以及表演者C的第二人脸特征。
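上述网格划分与识别结果的数据结构,可以用如下 Python 片段示意(网格大小、坐标与函数名均为示例性假设):

```python
def grid_cell(x, y, img_w, img_h, s1=4, s2=7):
    """将人脸中心点 (x, y) 映射到 S1×S2 网格中的单元格索引(行, 列)。"""
    row = min(int(y / img_h * s1), s1 - 1)
    col = min(int(x / img_w * s2), s2 - 1)
    return row, col

# 识别结果示意:置信度 1 表示存在人脸,[x, y, w, h] 为滑动边界窗的取值
detection = {"confidence": 1, "bbox": [320.0, 180.0, 64.0, 80.0]}
row, col = grid_cell(detection["bbox"][0], detection["bbox"][1], 1280, 720)
```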
或者,手机300还可以根据基于深度学习的视频分析服务(Amazon Rekognition Video,ARV),识别出第一拍摄画面203A中的表演者B的第一人脸特征以及表演者C的第二人脸特征,在此不做限定。
对于从录制的音频中识别声音特征而言,首先,手机300可以将录制的音频处理为多个音频帧。进而,手机300可根据梅尔频率倒谱系数(mel frequency cepstrum coefficient,MFCC),利用快速傅里叶变换,将每个音频帧在时域上的音频信号转化为频域上的音频信号。然后,手机300可对频域上的音频信号滤波,并从滤波后的频域上的音频信号提取每个音频帧对应的声音特征向量。接着,手机300可以根据余弦相似度或者欧式距离算法,确定各个音频帧对应的声音特征向量之间的相似度。最后,手机300可以将相似度大于相似度阈值的声音特征向量归为一组。当录制的音频中包括表演者B的音频、表演者C的音频时,手机300按照上述方法可以得到与表演者B的音频对应的一组第一声音特征向量、与表演者C的音频对应的一组第二声音特征向量,从而在录制的音频中识别出不同用户的声音特征。
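其中,按余弦相似度将声音特征向量归组的过程,可以用如下 Python 片段示意(阈值与向量取值均为示例性假设,并非本申请的具体实现):

```python
import math

def cosine_similarity(a, b):
    """计算两个声音特征向量的余弦相似度。"""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def group_by_similarity(vectors, threshold=0.9):
    """将相似度大于阈值的声音特征向量归为一组(贪心归组示意)。"""
    groups = []
    for v in vectors:
        for g in groups:
            if cosine_similarity(v, g[0]) > threshold:
                g.append(v)
                break
        else:
            groups.append([v])
    return groups

# 两个相近的向量归为一组,与第三个正交向量分为两组
groups = group_by_similarity([[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]])
assert len(groups) == 2
```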
然后,手机300可以根据视听识别模型(audio-visual recognition,AVR)建立表演者B的第一人脸特征与第一声音特征向量之间的关联关系,以及建立表演者C的第二人脸特征与第二声音特征向量之间的关联关系。
具体地,如图8所示,视听识别模型包括视觉识别网络1001、音频识别网络1002以及匹配网络1003。手机300可以将表演者B的第一人脸特征输入视觉识别网络1001,并将第一声音特征向量输入音频识别网络1002。进而,视觉识别网络1001根据表演者B的第一人脸特征的唇形,确定唇形对应的第一发音特征,将第一发音特征输入到匹配网络1003;音频识别网络1002提取第一声音特征向量的第二发音特征,将第二发音特征输入到匹配网络1003。匹配网络1003确定第一发音特征与第二发音特征的相似度。若相似度大于相似度阈值,则手机300建立表演者B的第一人脸特征与第一声音特征向量的关联关系。类似地,手机300还可以根据视听识别模型,建立表演者C的第二人脸特征与第二声音特征向量的关联关系,在此不再赘述。
若用户A仅对表演者B的表演感兴趣,如图9中的(a)所示,用户A可以在第一拍摄画面203A中输入对表演者B的图像b1的选中操作(例如,单击图像b1的操作),当然,该选中操作也可以为双击、长按等操作,在此不作限定。手机300可以响应于用户A对表演者B的图像b1的选中操作,在表演者B的图像b1的一侧显示调节按钮205和屏蔽按钮206。其中,如图9中的(b)所示,若用户A点击显示于表演者B的图像b1的一侧的调节按钮205,则手机300可响应于用户A对调节按钮205的点击操作,根据表演者B的第一人脸特征与第一声音特征向量的关联关系,查找出与第一声音特征向量对应的表演者B的音频,并记录表演者B的音频为待调节音频。例如,手机300可以在表演者B的音频添加第一标识(如字段“1”),从而记录表演者B的音频为待调节音频。
后续,如图4中的(a)所示,若用户A在观看第一拍摄画面203A时,在视觉上感觉表演者B距离自己较远,或者,在听觉上感觉表演者B距离自己较远,视听效果差,用户A可以在预览界面200上的任一位置输入放大操作。图4中的(a)中,上述放大操作可以为长按操作。
如图4中的(b)所示,手机300检测到用户A输入的放大操作后,与上述实施例类似的,手机300可在预览界面200中显示第二拍摄画面203B,以及录制第二拍摄画面203B。在第二拍摄画面203B中,也可以包括第三显示尺寸x1′×y1′的表演者B的图像b2,第四显示尺寸x2′×y2′的表演者C的图像c2。其中,第三显示尺寸x1′×y1′大于第一显示尺寸x1×y1;第四显示尺寸x2′×y2′大于第二显示尺寸x2×y2。
同时,手机300检测到用户A输入的放大操作后,检测到表演者B的音频被添加有第一标识,可将表演者B的音频的第一音量V1提高为第三音量V1′,而表演者C的音频未被添加有第一标识,则表演者C的音频的第二音量V2可以保持不变。可以理解地,本申请第二种实现方式中,提高表演者B的音频的第一音量V1为第三音量V1′的原理,与本申请第一种实现方式中,提高表演者B的音频的第一音量V1为第三音量V1′的原理相同,在此不再赘述。不同的是,在第一种实现方式中,表演者B的音频是随着录制的音频的整体放大而被放大,而在第二种实现方式中,表演者B的音频是被单独放大的。可以理解地,第三音量V1′大于第一音量V1,由此,使得用户A在听觉上感觉表演者B距离自己更近。
相应地,在播放录制的与第二拍摄画面203B对应的音频时,手机300以高于第一音量V1的第三音量V1′播放表演者B的音频,并保持以第二音量V2播放表演者C的音频。这样,不仅使得用户A在视觉上能够感觉表演者B距离自己更近,同时,还可以使得用户A在听觉上也能够感觉表演者B距离自己更近,提高录制的视频信息的视听效果。
另外,上述实施例中是以检测到放大操作时,手机300提高表演者B的音频的音量举例说明的。另一种实施例中,如图6中的(a)、图6中的(b)所示,手机300还可以基于上述的方法,在检测到缩小操作时,降低录制的表演者B的音频的第一音量V1。其中,在检测到缩小操作后,与上述实施例类似的,手机300可在预览界面200中显示第三拍摄画面203C。第三拍摄画面203C包括:以第五显示尺寸x1″×y1″显示的表演者B的图像b3,以第六显示尺寸x2″×y2″显示的表演者C的图像c3。其中,第五显示尺寸x1″×y1″小于第一显示尺寸x1×y1,第六显示尺寸x2″×y2″小于第二显示尺寸x2×y2。同时,手机300降低录制的表演者B的音频的第一音量V1为第五音量V1″。
后续,手机300在播放录制的视频信息时,可以显示上述第三拍摄画面203C。第三拍摄画面203C包括:以小于第一显示尺寸x1×y1的第五显示尺寸x1″×y1″显示的表演者B的图像b3,以小于第二显示尺寸x2×y2的第六显示尺寸x2″×y2″显示的表演者C的图像c3。由此,使得用户A在视觉上感觉表演者B距离自己更远。
相应地,在手机300播放视频信息中与第三拍摄画面203C对应的音频时,手机300以低于第一音量V1的第五音量V1″播放表演者B的音频;手机300检测到表演者C的音频未被添加第一标识,保持以第二音量V2播放表演者C的音频。这样,不仅使得用户A在视觉上能够感觉表演者B距离自己更远,同时,还可以使得用户A在听觉上也能够感觉表演者B距离自己更远,提高录制的视频信息的视听效果。
示例性的,手机300的摄像头353可以为TOF摄像头,手机300可以使用TOF摄像头检测表演者B的音频的第一传播方向、表演者C的音频的第二传播方向。其中,表演者B的音频的第一传播方向包括表演者B相对于手机300的俯仰角θ1和方位角φ1;表演者C的音频的第二传播方向包括表演者C相对于手机300的俯仰角θ2和方位角φ2。
具体地,检测过程可以为:在以TOF摄像头为坐标原点的空间坐标系中,TOF摄像头可以检测表演者B的第一坐标(x1,y1,z1)、表演者C的第二坐标(x2,y2,z2)。进而,手机300可以根据算式 θ1=arctan(z1/√(x1²+y1²))、φ1=arctan(y1/x1),确定表演者B相对于手机300的俯仰角θ1和方位角φ1。
同样地,手机300可以根据算式 θ2=arctan(z2/√(x2²+y2²))、φ2=arctan(y2/x2),确定表演者C相对于手机300的俯仰角θ2和方位角φ2。
这样,手机300通过计算出的俯仰角θ1和方位角φ1,可以确定表演者B的音频的第一传播方向;手机300通过计算出的俯仰角θ2和方位角φ2,确定表演者C的音频的第二传播方向。
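由坐标求俯仰角与方位角的计算,可以用如下 Python 片段示意(角度定义与函数名均为示例性假设,并非本申请的具体实现):

```python
import math

def propagation_direction(x, y, z):
    """由 TOF 摄像头坐标系中的被拍摄对象坐标 (x, y, z)
    估计其相对于手机的俯仰角 θ 与方位角 φ(弧度)。"""
    azimuth = math.atan2(y, x)               # 方位角:水平面内的方向
    pitch = math.atan2(z, math.hypot(x, y))  # 俯仰角:相对水平面的仰角
    return pitch, azimuth

pitch, azimuth = propagation_direction(1.0, 1.0, 0.0)  # 水平面内 45° 方向
```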
示例性的,手机300的麦克风304C可以为阵列式麦克风。手机300的阵列式麦克风可以采集来自不同传播方向的音频。进而,手机300的阵列式麦克风可以获取来自俯仰角为θ1和方位角为φ1的方向的表演者B的音频,并根据表演者B的音频的空间谱特性对表演者B的音频空域滤波,以实现对表演者B的音频进行精准的定向增强采集。并且,为了减少后续表演者C的音频对表演者B的音频造成的干扰,手机300的阵列式麦克风可以将零陷位置对准俯仰角为θ2、方位角为φ2的方向,抑制采集来自该方向的表演者C的音频,以实现对表演者C的音频定向抑制采集。
相应地,在播放录制的第二拍摄画面203B对应的音频时,由于表演者B的音频是手机300的阵列式麦克风定向增强采集的,播放的表演者B的音频也进一步更清晰;再者,表演者C的音频是手机300的阵列式麦克风定向抑制采集的,对表演者B的音频的干扰小,进一步提高了用户A欣赏视频信息的视听效果。
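阵列式麦克风定向增强采集的一种常见思路是延时求和(delay-and-sum)波束形成,可以用如下 Python 片段示意(线阵假设、采样率与整数时延量化均为说明性简化,并非本申请的具体波束形成实现):

```python
import math

def steering_delays(mic_positions, azimuth, c=343.0):
    """对一维线阵,按目标方位角计算各麦克风的对齐时延(秒)。
    mic_positions 为各麦克风沿阵列轴的位置(米),c 为声速。"""
    return [p * math.cos(azimuth) / c for p in mic_positions]

def delay_and_sum(channels, delays, fs):
    """将各通道按整数采样时延对齐后求和平均,增强目标方向的音频。"""
    out = [0.0] * len(channels[0])
    for ch, d in zip(channels, delays):
        shift = int(round(d * fs))
        for i in range(len(out)):
            j = i - shift
            if 0 <= j < len(ch):
                out[i] += ch[j]
    return [s / len(channels) for s in out]

# 时延为零时,两路相同信号对齐求和平均仍为原信号
assert delay_and_sum([[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]], [0.0, 0.0], 48000) == [1.0, 2.0, 3.0]
```

对来自零陷方向的信号,各通道相加后相互抵消,即对应文中的定向抑制采集。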
可选地,在上述的方法中,手机300在采集第二拍摄画面203B以及第二拍摄画面203B对应的音频后,可以对第二拍摄画面203B中的表演者C的音频降噪处理,以消除表演者C的音频中的噪声。相应地,在播放录制的第二拍摄画面203B对应的音频时,由于第二拍摄画面203B中的表演者C的音频被降噪处理过,可以减少对第二拍摄画面203B中的表演者B的音频的干扰。
在另一些实施例中,如图10中的(a)所示,手机300在检测到用户A输入的放大操作之前,还可以响应于用户A对表演者C的图像c1输入的选中操作,在表演者C的图像c1的一侧显示调节按钮205和屏蔽按钮206。如图10中的(b)所示,手机300可响应用户A对显示于表演者C的图像c1的一侧的屏蔽按钮206的点击操作,根据上述的方法从表演者C的图像c1中检测出表演者C的第二人脸特征。进而,手机300根据表演者C的第二人脸特征与声音特征的关联关系,查找出表演者C的音频。进而,手机300记录第一拍摄画面203A中的表演者C的音频为待屏蔽音频。例如,手机300可以在第一拍摄画面203A中的表演者C的音频添加第二标识(如字段“0”),从而记录第一拍摄画面203A中的表演者C的音频为待屏蔽音频。进而,在录制第二拍摄画面203B对应的音频时,手机300检测到第一拍摄画面203A中的表演者B的音频被添加有第一标识,手机300以高于第一音量V1的第三音量V1′录制第二拍摄画面203B对应的表演者B的音频;手机300检测到第一拍摄画面203A中的表演者C的音频添加有第二标识,则对第二拍摄画面203B对应的表演者C的音频不予录制,进而屏蔽第二拍摄画面203B对应的表演者C的音频。
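上述按第一标识/第二标识决定录制音量的逻辑,可以用如下 Python 片段示意(标识字段与增益取值均为示例性假设):

```python
RAISE_TAG = "1"  # 第一标识:待调节音频
MUTE_TAG = "0"   # 第二标识:待屏蔽音频

def recorded_volume(tag, base_volume, gain):
    """根据被拍摄对象音频上的标识决定其录制音量:
    带第一标识的按放大后的音量录制,带第二标识的不予录制,
    未加标识的保持原音量。"""
    if tag == MUTE_TAG:
        return 0.0
    if tag == RAISE_TAG:
        return base_volume * gain
    return base_volume

assert recorded_volume(RAISE_TAG, 50.0, 1.5) == 75.0  # 表演者B:提高音量
assert recorded_volume(MUTE_TAG, 50.0, 1.5) == 0.0    # 表演者C:屏蔽
```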
相应地,在播放录制的第二拍摄画面203B对应的音频时,手机300以高于第一音量V1的第三音量V1′播放表演者B的音频;对表演者C的音频不予播放,从而减少了对表演者B的音频的干扰。
本申请实施例的第三种实现方式还提供另一种视频录制方法,该视频录制方法可以应用于手机A中。其中,手机A的结构与上述实施例中的手机300相同,在此不再赘述。下面仍以相机APP为例,并结合会议室开会的场景,详细说明在第三种实现方式中,用户A如何通过相机APP触发手机A采集并录制视频信息。需要说明的是,如图11所示,会议室中可设置分布式系统,分布式系统包括主讲人B持有的手机B、主持人C持有的手机C,且手机B、手机C通过局域网(如,WIFI、蓝牙等)通信连接。
如图12所示,当用户A在会议室参加会议的过程中,观看到前方的主讲人B在做会议报告,主持人C在主讲人B的一侧旁白,用户A可以打开手机A的相机APP录制视频信息。例如,手机A可以响应于用户A打开相机APP的操作,调用摄像头353开始采集第一拍摄画面,进而,将采集到的第一拍摄画面显示在相机APP的预览界面中。如图13所示,相机APP的预览界面200中包括第一拍摄画面203A,第一拍摄画面203A中包括第一拍摄对象的图像和第二拍摄对象的图像,如第一拍摄画面203A中包括主讲人B的图像b1、主持人C的图像c1。
请继续参考图13,相机APP的预览界面200还设置有录制按钮201,用户A可以对录制按钮201输入触控操作,以触发手机A录制视频信息。
与本申请实施例第二种实现方式不同的是,若手机A满足加入分布式系统的触发条件,则在手机A检测用户A对录制按钮201的点击操作后,如图14所示,相机APP的第一拍摄画面203A还可以显示用于指示加入分布式系统的提示按钮203。
需要说明的是,手机A显示提示按钮203的条件,可以是手机A检测到用户A首次使用手机A的相机APP。这样,在手机A加入过一次分布式系统后,若在后续再次满足加入分布式系统的触发条件,则手机A可以不必在第一拍摄画面203A显示提示按钮203,而是自动加入分布式系统,以避免对用户A造成视觉干扰。另外,在手机A检测到用户A第二次、第三次使用手机A的相机APP时,手机A也可以显示提示按钮203,在此不作限定。
其中,加入分布式系统的触发条件可以为但不限于手机A与手机B、手机C连接上同一WIFI地址。仍如图14所示,用户A可以点击提示按钮203,手机A可响应于用户A对提示按钮203的点击操作,如图15所示,手机A分别与手机B、手机C通信连接。由此,手机A完成加入分布式系统的操作,可以分别与手机B、手机C数据交互。
同时,手机A响应于用户A对录制按钮201的触控操作,开始录制视频信息。示例性地,视频信息可以包括摄像头353采集到的第一拍摄画面203A以及麦克风304C采集到的音频。进一步地,麦克风304C采集到的音频可以包括主讲人B的音频、主持人C的音频。其中,手机A录制的主讲人B的音频的第一音量为V1、录制的主持人C的音频的第二音量为V2。
此外,在本申请实施例的第三种实现方式中,与第二种实现方式类似的,手机A检测到用户在第一拍摄画面203A中输入的放大操作或缩小操作后,不是提高/降低手机A采集到的所有音频的音量,而是仅提高/降低第一拍摄画面203A中选中的被拍摄对象的音频的音量。如,当用户A选中的被拍摄对象为主讲人B或主持人C时,手机A可提高/降低主讲人B的音频的音量或主持人C的音频的音量。因此,在提高/降低音量之前,手机A可以根据与本申请实施例第二种实现方式同样的方法,识别第一拍摄画面203A中的第一人脸特征、第二人脸特征以及识别录制的音频中的第一声音特征、第二声音特征,在此不再赘述。
进而,如图16所示,手机A可以向手机B发送第一拍摄画面203A中主讲人B的人脸图像的第一人脸特征、主持人C的人脸图像的第二人脸特征以及录制的音频中主讲人B的音频的第一声音特征、主持人C的音频的第二声音特征,同时,手机A可以向手机C发送第一拍摄画面203A中的第一人脸特征、第二人脸特征以及录制的音频中的第一声音特征、第二声音特征。
如图16所示,手机B可以接收来自手机A的第一人脸特征、第二人脸特征以及录制的音频中的第一声音特征、第二声音特征。同样地,手机C也可以接收来自手机A的第一拍摄画面203A中的第一人脸特征、第二人脸特征以及录制的音频中的第一声音特征、第二声音特征。
示例地,手机B中可以存储不同的人脸图像和音频,且存储的人脸图像和音频之间是一一对应的。手机B接收到其他设备发送的人脸特征(例如上述第一人脸特征和第二人脸特征)后,可分别将第一人脸特征、第二人脸特征分别与存储的各个人脸图像进行对比,得到对应的人脸相似度。若识别到存储的人脸图像A与第一人脸特征的人脸相似度大于设定的相似度阈值,则手机B可以确定第一人脸特征与存储的人脸图像A匹配。类似地,手机B还可以分别将第一声音特征、第二声音特征分别与存储的各个音频进行对比,得到对应的音频相似度。若识别到存储的音频A与第一声音特征的音频相似度大于设定的相似度阈值,则手机B可确定第一声音特征与存储的音频A匹配。当手机B中存储的音频A与人脸图像A对应时,说明手机B接收到的第一人脸特征与第一声音特征也是对应的。如此,手机B可以建立第一人脸特征与第一声音特征之间的关联关系,并向手机A发送第一人脸特征与第一声音特征之间的关联关系。进而,手机A根据第一人脸特征与第一声音特征之间的关联关系,完成主讲人B的人脸图像与主讲人B的音频的匹配。
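手机B在已存储的人脸图像/音频中按相似度阈值查找匹配项的过程,可以用如下 Python 片段示意(相似度函数与阈值均为示例性假设,并非本申请的具体实现):

```python
def find_match(feature, stored, similarity_fn, threshold=0.9):
    """在已存储的特征列表中查找相似度大于阈值的条目,
    返回其索引;若都不超过阈值则返回 None,对应匹配失败。"""
    for i, s in enumerate(stored):
        if similarity_fn(feature, s) > threshold:
            return i
    return None

# 用一个简化的标量相似度函数演示:差值越小相似度越高
sim = lambda a, b: 1.0 / (1.0 + abs(a - b))
assert find_match(5.0, [1.0, 5.05, 9.0], sim) == 1
assert find_match(100.0, [1.0, 5.05, 9.0], sim) is None  # 匹配失败,对应第一提示信息
```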
上述的匹配的过程中,由于手机A获取到的主讲人B的人脸图像与主讲人B的音频之间的关联关系,是手机B根据已经建立关联关系的人脸图像和音频确定的,因此,主讲人B的人脸图像与主讲人B的音频匹配的精确度高。
类似地,若手机C中存储有一个或多个人脸图像和音频,则手机C可以根据上述同样的方式,完成主持人C的人脸图像与主持人C的音频的匹配,此处不再赘述。
示例地,若在手机B中检测到未存储有人脸图像和/或音频,或者存储的人脸图像和音频不存在关联关系,则手机B无法建立第一人脸特征与第一声音特征之间的关联关系。进而,手机B可以向手机A发送第一提示信息,第一提示信息用于指示匹配失败。然后,手机A在接收到来自手机B的第一提示信息后,可以基于与本申请实施例的第二种实现方式同样的方式,通过视听识别模型(audio-visual recognition,AVR)建立第一人脸特征与第一声音特征之间的关联关系。进而,手机A根据第一人脸特征与第一声音特征之间的关联关系,完成主讲人B的人脸图像与主讲人B的音频的匹配。
类似地,若在手机C中检测到未存储有主持人C的人脸图像和/或主持人C的音频,或者存储的人脸图像和音频不存在关联关系,导致手机C无法建立第二人脸特征与第二声音特征之间的关联关系。进而,手机C也可以向手机A发送第一提示信息,第一提示信息用于指示匹配失败。进而,手机A接收来自手机C的第一提示信息后,可以基于上述实施例同样的方式,通过视听识别模型(audio-visual recognition,AVR)建立第二人脸特征与第二声音特征之间的关联关系。进而,手机A根据第二人脸特征与第二声音特征之间的关联关系,完成主持人C的人脸图像与主持人C的音频的匹配。
若用户A仅对主讲人B的报告感兴趣,用户A可以在第一拍摄画面203A中输入对主讲人B的图像b1的选中操作(例如,单击图像b1的操作),当然,该选中操作也可以为双击、长按等操作,在此不作限定。如图17中的(a)所示,手机A可以响应于用户A对主讲人B的图像b1的选中操作,在主讲人B的图像b1的一侧显示调节按钮205和屏蔽按钮206。如图17中的(b)所示,若用户A点击显示于主讲人B的图像b1的一侧的调节按钮205,则手机A响应于用户A对调节按钮205的点击操作,根据主讲人B的人脸特征与第一声音特征向量的关联关系,查找出第一声音特征向量对应的主讲人B的音频,并记录主讲人B的音频为待调节音频。例如,手机A可以在主讲人B的音频添加第一标识(如字段“1”),从而记录主讲人B的音频为待调节音频。
若用户A在观看第一拍摄画面203A时,在视觉上感觉主讲人B距离自己较远,或者,在听觉上感觉主讲人B距离自己较远,视听效果差,如图18中的(a)所示,用户A可以在预览界面200上的任一位置输入放大操作。仍如图18中的(a)所示,上述的放大操作可以为长按操作。
如图18中的(b)所示,手机A检测到用户A输入的放大操作后,与上述实施例类似的,手机A可在预览界面200中显示第二拍摄画面203B以及录制第二拍摄画面203B。在第二拍摄画面203B中,也可以包括第三显示尺寸x1′×y1′的主讲人B的图像b2,第四显示尺寸x2′×y2′的主持人C的图像c2。其中,第三显示尺寸x1′×y1′大于第一显示尺寸x1×y1;第四显示尺寸x2′×y2′大于第二显示尺寸x2×y2。
同时,与上述实施例类似的,手机A可以在检测到用户A输入的放大操作时,提高第二拍摄画面203B中主讲人B的音频的第一音量V1为第三音量V1′,而主持人C的音频的第二音量V2可以保持不变。可以理解地,本申请第三种实现方式中,提高主讲人B的音频的第一音量V1为第三音量V1′的方式,与本申请第二种实现方式相同,在此不再赘述。可以理解地,第三音量V1′大于第一音量V1,由此,使得用户A在听觉上感觉主讲人B距离自己更近。
相应地,与上述实施例类似的,在播放录制的与第二拍摄画面203B对应的音频时,手机A以高于第一音量V1的第三音量V1′播放主讲人B的音频,并保持以第二音量V2播放主持人C的音频。这样,不仅使得用户A在视觉上能够感觉主讲人B距离自己更近,同时,还可以使得用户A在听觉上也能够感觉主讲人B距离自己更近,提高录制的视频信息的视听效果。
另外,本申请实施例的第三种实现方式中,是以检测到放大操作时,提高主讲人B的音频的第一音量V1为第三音量V1′举例说明的。另一种实施例中,如图19中的(a)-图19中的(b)所示,手机A还可以基于上述的方法,在检测到缩小操作时,降低录制的第三拍摄画面203C中的主讲人B的音频的第一音量V1。其中,在检测到缩小操作后,与上述实施例类似的,手机A可在预览界面200中显示第三拍摄画面203C。第三拍摄画面203C包括:以第五显示尺寸x1″×y1″显示的主讲人B的图像b3,以第六显示尺寸x2″×y2″显示的主持人C的图像c3。其中,第五显示尺寸x1″×y1″小于第一显示尺寸x1×y1,第六显示尺寸x2″×y2″小于第二显示尺寸x2×y2。同时,手机A降低录制的第三拍摄画面203C中的主讲人B的音频的第一音量V1为第五音量V1″。可以理解地,手机A在检测到缩小操作后,录制的视频信息在被播放时的视听效果,与本申请实施例第二种实现方式中手机300检测到缩小操作后,录制的视频信息在被播放时的视听效果相同,在此不再赘述。
类似地,手机A的麦克风304C可以为阵列式麦克风。手机A可以在响应到用户A输入的选中操作时,基于与本申请实施例第二种实现方式中同样的方法,检测主讲人B的音频的传播方向、主持人C的音频的传播方向,以便可以定向增强采集主讲人B的音频,并定向抑制采集主持人C的音频。
类似地,手机A还可以基于与本申请实施例第二种实现方式中同样的方法,记录主持人C的音频为待屏蔽音频。例如,手机A可以在主持人C的音频添加第二标识(如字段“0”),从而记录主持人C的音频为待屏蔽音频。进而,在录制音频时,手机A检测到第一拍摄画面203A中主讲人B的音频被添加有第一标识,手机A以高于第一音量V1的第三音量V1′录制第二拍摄画面203B中主讲人B的音频;手机A检测到第一拍摄画面203A中主持人C的音频添加有第二标识,则对第二拍摄画面203B中主持人C的音频不予录制。
相应地,在播放录制的第二拍摄画面203B对应的音频时,手机A以高于第一音量V1的第三音量V1′播放主讲人B的音频;对主持人C的音频不予播放,从而减少了对主讲人B的音频的干扰。
类似地,上述是以检测到放大操作时,提高主讲人B的音频的第一音量V1为第三音量V1′举例说明的。手机A还可以基于与上述同样的方法,在检测到放大操作时,提高主持人C的音频的第二音量V2为第四音量V2′,在此不再赘述。
本申请实施例还提供一种计算机可读存储介质,该计算机可读存储介质中存储有计算机程序代码,当处理器执行该计算机程序代码时,电子设备执行上述实施例中的方法。
本申请实施例还提供了一种计算机程序产品,当该计算机程序产品在电子设备上 运行时,使得电子设备执行上述实施例中的方法。
通过以上的实施方式的描述,所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,仅以上述各功能模块的划分进行举例说明,实际应用中,可以根据需要而将上述功能分配由不同的功能模块完成,即将装置的内部结构划分成不同的功能模块,以完成以上描述的全部或者部分功能。上述描述的系统,装置和单元的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。
在本申请实施例各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能单元的形式实现。
所述集成的单元如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请实施例的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的全部或部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)或处理器执行本申请各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:快闪存储器、移动硬盘、只读存储器、随机存取存储器、磁碟或者光盘等各种可以存储程序代码的介质。
以上所述,仅为本申请实施例的具体实施方式,但本申请实施例的保护范围并不局限于此,任何在本申请实施例揭露的技术范围内的变化或替换,都应涵盖在本申请实施例的保护范围之内。因此,本申请实施例的保护范围应以所述权利要求的保护范围为准。

Claims (22)

  1. 一种视频录制方法,其特征在于,应用于第一电子设备,所述方法包括:
    所述第一电子设备响应于用户在应用程序的预览界面的第一操作录制第一拍摄画面,且以第一音量录制所述第一拍摄画面对应的音频;
    所述第一电子设备响应于用户在所述第一拍摄画面的放大操作,采集第二拍摄画面以及第二拍摄画面对应的音频,其中,所述第一拍摄画面与所述第二拍摄画面连续;
    所述第一电子设备录制所述第二拍摄画面,且以第二音量录制所述第二拍摄画面对应的音频,其中,所述第二音量大于所述第一音量,或所述第二音量对应的声音放大率大于所述第一音量的声音放大率,所述声音放大率是指所述第一电子设备输出的音量与采集到的音量的倍率。
  2. 根据权利要求1所述的方法,其特征在于,所述第一拍摄画面对应的音频包括第一拍摄对象的音频,在所述第一电子设备响应于用户在所述第一拍摄画面的放大操作,采集第二拍摄画面以及第二拍摄画面对应的音频之前,所述方法还包括:
    所述第一电子设备建立所述第一拍摄对象的图像与音频关联关系;
    所述第一电子设备响应于用户在所述第一拍摄画面上的第二操作,记录所述第一拍摄对象的音频为待调节音频;
    所述第一电子设备以第二音量录制所述第二拍摄画面对应的音频,包括:所述第一电子设备以第二音量录制所述第二拍摄画面中所述第一拍摄对象对应的音频。
  3. 根据权利要求2所述的方法,其特征在于,所述第一拍摄画面对应的音频包括第二拍摄对象的音频,所述方法还包括:
    所述第一电子设备建立所述第二拍摄对象的图像与音频关联关系;
    所述第一电子设备以第二音量录制所述第二拍摄画面对应的音频,还包括:所述第一电子设备以所述第一音量或所述第一音量对应的声音放大率录制所述第二拍摄画面中所述第二拍摄对象对应的音频。
  4. 根据权利要求2所述的方法,其特征在于,所述第一拍摄画面对应的音频包括第三拍摄对象的音频,在所述第一电子设备响应于用户在所述第一拍摄画面的放大操作,采集第二拍摄画面以及第二拍摄画面对应的音频之前,所述方法还包括:
    所述第一电子设备建立所述第三拍摄对象的图像与音频关联关系;
    所述第一电子设备响应于用户在所述第一拍摄画面上的第三操作,记录所述第三拍摄对象的音频为待调节音频;
    所述第一电子设备以第二音量录制所述第二拍摄画面对应的音频,还包括:所述第一电子设备以所述第二音量或所述第二音量对应的声音放大率录制所述第二拍摄画面中所述第三拍摄对象对应的音频。
  5. 根据权利要求2所述的方法,其特征在于,所述第一拍摄画面对应的音频包括第二拍摄对象的音频,所述方法还包括:
    所述第一电子设备建立所述第二拍摄对象的图像与音频关联关系;
    所述第一电子设备以第二音量或所述第二音量对应的声音放大率录制所述第二拍摄画面对应的音频,还包括:
    屏蔽所述第二拍摄画面中所述第二拍摄对象的图像关联的音频。
  6. 根据权利要求2所述的方法,其特征在于,所述第一电子设备建立所述第一拍摄对象的图像与音频关联关系,包括:
    所述第一电子设备提取所述第一拍摄对象的第一人脸特征和音频的第一声音特征向量;
    所述第一电子设备根据第一拍摄对象的第一人脸特征的唇形,确定唇形对应的第一发音特征;
    所述第一电子设备提取第一声音特征向量的第二发音特征;
    若所述第一发音特征和所述第二发音特征的相似度大于相似度阈值,则所述第一电子设备建立所述第一人脸特征与所述第一声音特征向量的关联关系。
  7. 根据权利要求3所述的方法,其特征在于,所述方法还包括:
    在所述第一电子设备响应于用户在所述预览界面的第一操作时,分别与第二电子设备、第三电子设备通信连接;
    所述第一电子设备建立所述第一拍摄对象的图像与音频关联关系,以及建立所述第二拍摄对象的图像与音频的关联关系,包括:
    所述第一电子设备提取所述第一拍摄对象的第一人脸特征和音频的第一声音特征向量,以及提取所述第二拍摄对象的第二人脸特征和音频的第二声音特征向量;
    所述第一电子设备向所述第二电子设备、所述第三电子设备发送所述第一人脸特征、所述第一声音特征向量、所述第二人脸特征以及所述第二声音特征向量;
    所述第一电子设备接收来自所述第二电子设备的所述第一人脸特征和所述第一声音特征向量的关联关系,以及接收来自所述第三电子设备的所述第二人脸特征和所述第二声音特征向量的关联关系。
  8. 根据权利要求3-6任一所述的方法,其特征在于,所述第一电子设备采集第二拍摄画面以及第二拍摄画面对应的音频,包括:
    所述第一电子设备检测所述第二拍摄画面中的第一拍摄对象的音频的第一传播方向和第二拍摄对象的音频的第二传播方向;
    所述第一电子设备在所述第一传播方向定向增强采集所述第二拍摄画面中的第一拍摄对象的音频,以及在所述第二传播方向定向抑制采集所述第二拍摄画面中的第二拍摄对象的音频。
  9. 根据权利要求1所述的方法,其特征在于,在所述第一电子设备录制所述第二拍摄画面,且以所述第二音量录制所述第二拍摄画面对应的音频之前,所述方法还包括:
    所述第一电子设备根据所述第一拍摄画面、所述第二拍摄画面以及所述第一音量,确定第二音量。
  10. 根据权利要求9所述的方法,其特征在于,所述第一电子设备根据所述第一拍摄画面、所述第二拍摄画面以及所述第一音量,确定第二音量,包括:
    所述第一电子设备根据所述第一音量、调焦倍数,确定第二音量,其中,F1为第一拍摄画面对应的第一焦距、F2为第二拍摄画面对应的第二焦距,F2/F1为调焦倍数,V为第一音量,V′为第二音量。
  11. 根据权利要求9所述的方法,其特征在于,所述第一电子设备根据所述第一拍摄画面、所述第二拍摄画面以及所述第一音量,确定第二音量,包括:
    所述第一电子设备根据第一音量、尺寸放大比例,确定第二音量,其中,x1×y1为所述第一拍摄画面中的第一拍摄对象的第一显示尺寸,x1′×y1′为所述第二拍摄画面中所述第一拍摄对象的第三显示尺寸,(x1′×y1′)/(x1×y1)为尺寸放大比例。
  12. 根据权利要求9所述的方法,其特征在于,所述第一电子设备根据所述第一拍摄画面、所述第二拍摄画面以及所述第一音量,确定第二音量,包括:
    所述第一电子设备根据第一音量、尺寸放大比例以及调焦倍数,确定第二音量,其中,F1为第一拍摄画面对应的第一焦距、F2为第二拍摄画面对应的第二焦距,F2/F1为调焦倍数,x1×y1为所述第一拍摄画面中的第一拍摄对象的第一显示尺寸,x1′×y1′为所述第二拍摄画面中所述第一拍摄对象的第三显示尺寸,V为第一音量,V′为第二音量,(x1′×y1′)/(x1×y1)为尺寸放大比例。
  13. 根据权利要求1-12任一项所述的方法,其特征在于,所述第一电子设备处于耳机模式,在所述第一电子设备采集第二拍摄画面以及第二拍摄画面对应的音频之后,所述方法还包括:
    所述第一电子设备在应用程序的预览界面上显示第二拍摄画面,以及以录制的音量输出所述第二拍摄画面对应的音频至耳机播放。
  14. 根据权利要求1-12任一项所述的方法,其特征在于,所述第一电子设备未处于耳机模式,在所述第一电子设备录制所述第二拍摄画面,且以第二音量录制所述第二拍摄画面对应的音频之后,所述方法还包括:
    所述第一电子设备响应于用户在所述预览界面上的停止操作,基于录制的所述第二拍摄画面以及所述第二拍摄画面对应的音频生成视频文件;
    所述第一电子设备响应于用户对所述视频文件的打开操作,在应用程序的预览界面上显示第二拍摄画面,以及以录制的音量在第一电子设备的扬声器播放所述第二拍摄画面对应的音频。
  15. 一种视频录制方法,其特征在于,应用于第一电子设备,所述方法包括:
    所述第一电子设备响应于用户在应用程序的预览界面的第一操作录制第一拍摄画面,且以第一音量录制所述第一拍摄画面对应的音频;
    所述第一电子设备响应于用户在所述第一拍摄画面的缩小操作,采集第二拍摄画面以及第二拍摄画面对应的音频,其中,所述第一拍摄画面与所述第二拍摄画面连续;
    所述第一电子设备录制所述第二拍摄画面,且以第二音量录制所述第二拍摄画面对应的音频,其中,所述第二音量小于所述第一音量,或所述第二音量对应的声音放大率小于所述第一音量的声音放大率,所述声音放大率是指所述第一电子设备输出的音量与采集到的音量的倍率。
  16. 根据权利要求15所述的方法,其特征在于,所述第一拍摄画面对应的音频包括第一拍摄对象的音频,在所述第一电子设备响应于用户在所述第一拍摄画面的缩小操作,采集第二拍摄画面以及第二拍摄画面对应的音频之前,所述方法还包括:
    所述第一电子设备建立所述第一拍摄对象的图像与音频关联关系;
    所述第一电子设备响应于用户在所述第一拍摄画面上的第二操作,记录所述第一拍摄对象的音频为待调节音频;
    所述第一电子设备以第二音量录制所述第二拍摄画面对应的音频,包括:所述第一电子设备以所述第二音量或所述第二音量对应的声音放大率录制所述第二拍摄画面中所述第一拍摄对象对应的音频。
  17. 根据权利要求16所述的方法,其特征在于,所述第一拍摄画面对应的音频包括第二拍摄对象的音频,所述方法还包括:
    所述第一电子设备建立所述第二拍摄对象的图像与音频关联关系;
    所述第一电子设备以第二音量录制所述第二拍摄画面对应的音频,还包括:所述第一电子设备以所述第一音量或所述第一音量对应的声音放大率录制所述第二拍摄画面中所述第二拍摄对象对应的音频。
  18. 根据权利要求16所述的方法,其特征在于,所述第一拍摄画面对应的音频还包括第二拍摄对象的音频,所述方法还包括:
    在所述第一电子设备响应于用户在所述预览界面的第一操作时,分别与第二电子设备、第三电子设备通信连接;
    所述第一电子设备建立所述第一拍摄对象的图像与音频关联关系,以及建立所述第二拍摄对象的图像与音频的关联关系,包括:
    所述第一电子设备提取所述第一拍摄对象的第一人脸特征和音频的第一声音特征向量,以及提取所述第二拍摄对象的第二人脸特征和音频的第二声音特征向量;
    所述第一电子设备向所述第二电子设备、所述第三电子设备发送所述第一人脸特征、所述第一声音特征向量、所述第二人脸特征以及所述第二声音特征向量;
    所述第一电子设备接收来自所述第二电子设备的所述第一人脸特征和所述第一声音特征向量的关联关系,以及接收来自所述第三电子设备的所述第二人脸特征和所述第二声音特征向量的关联关系。
  19. 根据权利要求15-18任一所述的方法,其特征在于,所述第一电子设备根据所述第一拍摄画面、所述第二拍摄画面以及所述第一音量,确定第二音量,包括:
    所述第一电子设备根据所述第一音量、调焦倍数,确定第二音量;
    或者,所述第一电子设备根据第一音量、尺寸放大比例,确定第二音量;
    或者,所述第一电子设备根据第一音量、尺寸放大比例以及调焦倍数,确定第二音量;
    其中,F1为第一拍摄画面对应的第一焦距、F2为第二拍摄画面对应的第二焦距,F2/F1为焦距缩小倍数,x1×y1为所述第一拍摄画面中的第一拍摄对象的第一显示尺寸,x1′×y1′为所述第二拍摄画面中所述第一拍摄对象的第三显示尺寸,V为第一音量,V′为第二音量,(x1′×y1′)/(x1×y1)为尺寸缩小倍数。
  20. 一种电子设备,其特征在于,包括:
    存储器;
    一个或多个处理器;
    以及一个或多个计算机程序,其中所述一个或多个计算机程序存储在所述存储器上,当所述计算机程序被所述一个或多个处理器执行时,使得所述电子设备执行如权利要求1-19中任一项中所述第一电子设备执行的视频录制方法。
  21. 一种计算机可读存储介质,其特征在于,所述计算机可读存储介质包括计算机程序或指令,当所述计算机程序或指令在计算机上运行时,使得所述计算机执行如权利要求1-19中任一项所述的视频录制方法。
  22. 一种计算机程序产品,其特征在于,所述计算机程序产品包括:计算机程序或指令,当所述计算机程序或指令在计算机上运行时,使得所述计算机执行如权利要求1-19中任一项所述的视频录制方法。
PCT/CN2022/071129 2021-01-29 2022-01-10 视频录制方法及电子设备 WO2022161146A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US18/263,376 US20240111478A1 (en) 2021-01-29 2022-01-10 Video Recording Method and Electronic Device
EP22745024.4A EP4270937A4 (en) 2021-01-29 2022-01-10 VIDEO RECORDING METHOD AND ELECTRONIC DEVICE

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110130811.3 2021-01-29
CN202110130811.3A CN114827448A (zh) 2021-01-29 2021-01-29 视频录制方法及电子设备

Publications (1)

Publication Number Publication Date
WO2022161146A1 true WO2022161146A1 (zh) 2022-08-04

Family

ID=82526092

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/071129 WO2022161146A1 (zh) 2021-01-29 2022-01-10 视频录制方法及电子设备

Country Status (4)

Country Link
US (1) US20240111478A1 (zh)
EP (1) EP4270937A4 (zh)
CN (1) CN114827448A (zh)
WO (1) WO2022161146A1 (zh)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107105183A (zh) * 2017-04-28 2017-08-29 宇龙计算机通信科技(深圳)有限公司 录音音量调节方法及装置
CN108933911A (zh) * 2018-07-27 2018-12-04 深圳市广和通无线股份有限公司 音量调节方法、装置、设备及存储介质
WO2020059447A1 (ja) * 2018-09-18 2020-03-26 富士フイルム株式会社 音声信号処理装置、音声信号処理方法、音声信号処理プログラム、音声信号処理システム及び撮影装置

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004040491A (ja) * 2002-07-03 2004-02-05 Sony Ericsson Mobilecommunications Japan Inc 映像通信装置
JP2011215222A (ja) * 2010-03-31 2011-10-27 Daiichikosho Co Ltd カラオケ歌唱者映像及び歌唱音声記録システム
JP2011215221A (ja) * 2010-03-31 2011-10-27 Daiichikosho Co Ltd カラオケ歌唱者映像及び歌唱音声記録システム
KR101997449B1 (ko) * 2013-01-29 2019-07-09 엘지전자 주식회사 이동 단말기 및 이의 제어 방법
CN103888703B (zh) * 2014-03-28 2015-11-25 努比亚技术有限公司 增强录音的拍摄方法和摄像装置
EP3073747A1 (en) * 2015-03-26 2016-09-28 Thomson Licensing Method and device for adapting an audio level of a video
CN111724823B (zh) * 2016-03-29 2021-11-16 联想(北京)有限公司 一种信息处理方法及装置
CN106162206A (zh) * 2016-08-03 2016-11-23 北京疯景科技有限公司 全景录制、播放方法及装置
CN110740259B (zh) * 2019-10-21 2021-06-25 维沃移动通信有限公司 视频处理方法及电子设备


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP4270937A4

Also Published As

Publication number Publication date
CN114827448A (zh) 2022-07-29
EP4270937A1 (en) 2023-11-01
US20240111478A1 (en) 2024-04-04
EP4270937A4 (en) 2024-06-12

Similar Documents

Publication Publication Date Title
WO2021078116A1 (zh) 视频处理方法及电子设备
WO2021104197A1 (zh) 对象跟踪方法及电子设备
JP7244666B2 (ja) 画面制御方法、電子デバイス、および記憶媒体
WO2020078237A1 (zh) 音频处理方法和电子设备
US11889180B2 (en) Photographing method and electronic device
CN108401124B (zh) 视频录制的方法和装置
KR101874895B1 (ko) 증강 현실 제공 방법 및 이를 지원하는 단말기
CN110865754B (zh) 信息展示方法、装置及终端
WO2018058899A1 (zh) 一种智能终端的音量调节方法及其装置
WO2021057673A1 (zh) 一种图像显示方法及电子设备
JP2016531362A (ja) 肌色調整方法、肌色調整装置、プログラム及び記録媒体
WO2021036623A1 (zh) 显示方法及电子设备
CN108418916A (zh) 基于双面屏的图像拍摄方法、移动终端及可读存储介质
CN112788359B (zh) 直播处理方法、装置、电子设备及存储介质
CN108156374A (zh) 一种图像处理方法、终端及可读存储介质
CN114724055A (zh) 视频切换方法、装置、存储介质及设备
CN111613213B (zh) 音频分类的方法、装置、设备以及存储介质
CN112269559A (zh) 音量调整方法、装置、电子设备及存储介质
CN114466283A (zh) 音频采集方法、装置、电子设备及外设组件方法
WO2022161146A1 (zh) 视频录制方法及电子设备
US20240144948A1 (en) Sound signal processing method and electronic device
CN113301444B (zh) 视频处理方法、装置、电子设备及存储介质
CN113645510B (zh) 一种视频播放方法、装置、电子设备及存储介质
WO2023202431A1 (zh) 一种定向拾音方法及设备
WO2024027374A1 (zh) 隐藏信息显示方法、设备、芯片系统、介质及程序产品

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22745024

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 18263376

Country of ref document: US

ENP Entry into the national phase

Ref document number: 2022745024

Country of ref document: EP

Effective date: 20230725

NENP Non-entry into the national phase

Ref country code: DE