WO2021078116A1

WO2021078116A1 - Video processing method and electronic device

Info

Publication number: WO2021078116A1
Application number: PCT/CN2020/122176
Authority: WO
Inventors: Sun Huawei (孙华伟)
Original assignee: Vivo Mobile Communication Co., Ltd. (维沃移动通信有限公司)
Priority date: 2019-10-21
Filing date: 2020-10-20
Publication date: 2021-04-29
Also published as: CN110740259B; CN110740259A

Abstract

Disclosed are a video processing method and an electronic device. The method comprises: when a video recording operation is received, starting a camera for image acquisition and a microphone for sound acquisition; extracting feature information of the photographed objects contained in the acquired image; extracting feature information of the sound source objects contained in the acquired sound; matching the photographed objects with the sound source objects on the basis of their feature information to obtain a matching relationship between them; receiving a selection operation for the photographed objects and selecting a first photographed object from the photographed objects contained in the acquired image; determining, according to the matching relationship, a first sound source object that matches the first photographed object from among the sound source objects contained in the acquired sound; and performing preset first anti-interference processing on the sound track corresponding to a second sound source object contained in the acquired sound, the second sound source object being any sound source object other than the first sound source object, and synthesizing the sound obtained by the preset first anti-interference processing with the acquired image to obtain a target video.

Description

Video processing method and electronic equipment

This application claims priority to Chinese patent application No. 201911002660.2, entitled "Video Processing Method and Electronic Equipment" and filed with the State Intellectual Property Office on October 21, 2019, the entire content of which is incorporated into this application by reference.

Technical field

This application relates to the field of multimedia technology, and in particular to a video processing method and an electronic device.

Background

In recent years, with the rapid development of Internet technology and the upgrading of device hardware, electronic devices have become increasingly capable, and more and more users use them for entertainment, for example live video streaming, vlog (video weblog, video podcast) shooting, and other video recording activities. At present, recorded video often picks up noise. In related technologies, the recorded video has to be edited with professional equipment to filter out the noise, which is costly and cumbersome.

Summary of the invention

The embodiments of the present application provide a video processing method and an electronic device to solve the technical problems of high cost and cumbersome operation of video processing in related technologies.

In order to solve the above technical problems, the embodiments of the present application are implemented as follows:

In the first aspect, an embodiment of the present application provides a video processing method applied to an electronic device, and the method includes:

When a video recording operation is received, turning on the camera of the electronic device for image collection and turning on the microphone of the electronic device for sound collection;

Determining the photographed objects contained in the image collected by the camera and extracting feature information of the photographed objects; and determining the sound source objects contained in the sound collected by the microphone and extracting feature information of the sound source objects, where different sound source objects correspond to different sound tracks;

Matching the photographed objects and the sound source objects based on their feature information to obtain a matching relationship between the photographed objects and the sound source objects;

Receiving a selection operation for the photographed objects;

In response to the selection operation, selecting a first photographed object from the photographed objects contained in the image collected by the camera;

Determining, according to the matching relationship, a first sound source object that matches the first photographed object from among the sound source objects contained in the sound collected by the microphone;

Performing preset first anti-interference processing on the sound track corresponding to a second sound source object contained in the sound collected by the microphone, and synthesizing the sound obtained by the preset first anti-interference processing with the image collected by the camera to obtain a target video, where the second sound source object is any sound source object, among those contained in the sound collected by the microphone, other than the first sound source object.

In a second aspect, an embodiment of the present application further provides an electronic device, including:

An opening unit, configured to turn on the camera of the electronic device for image collection and the microphone of the electronic device for sound collection when a video recording operation is received;

A first extraction unit, configured to determine the photographed objects contained in the image collected by the camera and extract feature information of the photographed objects;

A second extraction unit, configured to determine the sound source objects contained in the sound collected by the microphone and extract feature information of the sound source objects, where different sound source objects correspond to different sound tracks;

A matching unit, configured to match the photographed objects and the sound source objects based on their feature information to obtain a matching relationship between the photographed objects and the sound source objects;

A receiving unit, configured to receive a selection operation for the photographed objects;

A selection unit, configured to select, in response to the selection operation, a first photographed object from the photographed objects contained in the image collected by the camera;

A determining unit, configured to determine, according to the matching relationship, a first sound source object that matches the first photographed object from among the sound source objects contained in the sound collected by the microphone;

A first processing unit, configured to perform preset first anti-interference processing on the sound track corresponding to a second sound source object contained in the sound collected by the microphone;

A second processing unit, configured to synthesize the sound obtained by the preset first anti-interference processing with the image collected by the camera to obtain a target video, where the second sound source object is any sound source object, among those contained in the sound collected by the microphone, other than the first sound source object.

In a third aspect, an embodiment of the present application further provides an electronic device, including a processor, a memory, and a computer program stored in the memory and executable on the processor, where the computer program, when executed by the processor, implements the steps of the above video processing method.

In a fourth aspect, an embodiment of the present application further provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps of the above video processing method.

In the embodiments of the present application, a matching relationship is established during video recording between the photographed objects in the recorded video picture and the sound source objects in the recorded video sound. When the user selects a specific photographed object in the video picture, the sound source object matching it is determined from the matching relationship, anti-interference processing is applied to the sound tracks of all other sound source objects in the recorded sound, and the target video is generated from the processed sound and the recorded video picture. The user thus obtains the cleaner video they want without post-editing on professional equipment, which reduces video processing cost and simplifies the operation.

Description of the drawings

Fig. 1 is a flowchart of a video processing method according to an embodiment of the present application;

Fig. 2 is an example diagram of the polar coordinates of a video recording object according to an embodiment of the present application;

Fig. 3 is an example diagram of the polar coordinates of a video recording object according to another embodiment of the present application;

Fig. 4 is an application scene diagram of a video processing method of an embodiment of the present application;

Fig. 5 is a structural block diagram of an electronic device according to an embodiment of the present application;

Fig. 6 is a schematic diagram of the hardware structure of an electronic device that implements each embodiment of the present application.

Detailed description

The technical solutions in the embodiments of the present application are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some rather than all of the embodiments of the present application. All other embodiments obtained by those of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of this application.

With the rapid development of Internet technology and the explosive growth of social networks and short videos, people spend a lot of time recording video on electronic devices, for example shooting videos or streaming live. However, if the recorded video contains noise or the voices of multiple users, it has to be edited afterwards on professional equipment to filter out the noise or the other users' voices, which is costly and cumbersome.

In order to solve the foregoing technical problems, embodiments of the present application provide a video processing method and electronic equipment.

The following first introduces a video processing method provided by an embodiment of the present application.

It should be noted that the video processing method provided in the embodiments of this application is applicable to electronic devices. In practical applications, the electronic devices may include mobile terminals such as smartphones, tablet computers, and personal digital assistants, and may also include computer devices such as laptop computers and desktop computers; the embodiments of the present application do not limit this.

Fig. 1 is a flowchart of a video processing method according to an embodiment of the present application. As shown in Fig. 1, the method may include the following steps 101 to 106.

In step 101, when a video recording operation is received, the camera of the electronic device is turned on for image collection, and the microphone of the electronic device is turned on for sound collection.

In the embodiment of the present application, the video recording operation may be an operation used to trigger video shooting, or may be an operation used to trigger a live video broadcast.

In the embodiments of this application, the user can input the video recording operation manually on the electronic device, for example by tapping the camera icon on the operation interface of the electronic device, or by opening video recording software and tapping the video recording icon or button on its interface. The user can also input the video recording operation by voice command, by a gesture, or by shaking the electronic device. The embodiments of the present application do not limit this.

In the embodiments of the present application, while the electronic device is recording video, its camera is used for image capture and its microphone for sound collection; that is, the camera and the microphone work simultaneously.

In step 102, the photographed objects contained in the image collected by the camera are determined and their feature information is extracted; and the sound source objects contained in the sound collected by the microphone are determined and their feature information is extracted, where different sound source objects correspond to different sound tracks.

In the embodiments of the present application, a photographed object and a sound source object are essentially different manifestations of a recorded object (i.e., an objectively existing object) in the video recording scene: the photographed object is how the recorded object appears in the video picture, and the sound source object is how it appears in the video sound.

For example, user D uses a mobile phone to stream live video. In this case, user D is the recorded object, user D in the live video picture is the photographed object, and user D in the live video sound is the sound source object.

In the embodiments of the present application, the feature information of the photographed objects and of the sound source objects is used to determine the matching relationship between them, that is, to determine which photographed object and which sound source object belong to the same recorded object.

In an example, the video recording scene includes three recorded objects: user A, user B, and user C. During video recording, suppose the image captured by the camera contains three photographed objects (photographed objects 1, 2, and 3) and the sound collected by the microphone contains four sound source objects (sound source objects 1, 2, 3, and 4). The feature information of photographed objects 1 to 3 and of sound source objects 1 to 4 is extracted in order to determine, for each of user A, user B, and user C, which of photographed objects 1 to 3 and which of sound source objects 1 to 4 belong to that user.

In the embodiments of the present application, the feature information can take either of two forms. In the first, the feature information of a photographed object includes its spatial position information relative to the electronic device, and correspondingly the feature information of a sound source object includes its spatial position information relative to the electronic device. In the second, the feature information of a photographed object includes its external appearance, and correspondingly the feature information of a sound source object includes its sound track attributes, which include at least one of timbre, tempo, and volume.

Specifically, when the feature information of a photographed object includes its spatial position information relative to the electronic device, object recognition technology can be used to identify each photographed object contained in the image collected by the camera, and the spatial position information of each photographed object is then determined from its image depth information. When the feature information of a sound source object includes its spatial position information relative to the electronic device, each sound source object contained in the sound collected by the microphone can be identified from the timbre and rhythm of that sound, and the spatial position information of each sound source object is then determined from its sound wave information.

When the feature information of a photographed object includes its external appearance, object recognition technology can be used to identify each photographed object contained in the image collected by the camera, and face recognition technology is then used to extract the external appearance of each photographed object, for example age and gender. When the feature information of a sound source object includes its sound track attributes, each sound source object contained in the sound collected by the microphone can be identified from the timbre and rhythm of that sound, and the sound track attributes of each sound source object are extracted.
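As one illustration of extracting a sound track attribute, the volume of a separated track could be summarized as an RMS level in dBFS. This is a minimal sketch under assumptions the application does not state: the track has already been separated, its samples are floats in [-1.0, 1.0], and the helper name `track_volume_dbfs` is hypothetical. Timbre and tempo extraction are not shown.

```python
import math
from typing import Sequence


def track_volume_dbfs(samples: Sequence[float]) -> float:
    """Hypothetical helper: summarize a track's volume as its RMS level in dBFS.

    Assumes samples are floats in [-1.0, 1.0]; returns -inf for empty or silent input.
    """
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0.0:
        return float("-inf")
    return 20.0 * math.log10(rms)


# A full-scale square wave sits at 0 dBFS; halving the amplitude lowers it by about 6 dB.
print(track_volume_dbfs([1.0, -1.0, 1.0, -1.0]))  # 0.0
print(round(track_volume_dbfs([0.5, -0.5, 0.5, -0.5]), 2))  # -6.02
```

A level like this lets two sound source objects be compared by loudness; it says nothing about who is speaking, which is why the application pairs volume with timbre and tempo.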

In the embodiments of this application, the spatial position information of a photographed object relative to the electronic device is obtained from the image collected by the camera, whereas the spatial position information of a sound source object is obtained from the sound collected by the microphone. Specifically, therefore, the spatial position information of the photographed object may include its polar coordinates (x1, α1) in a spatial coordinate system with the camera as the coordinate origin, and correspondingly the spatial position information of the sound source object may include its polar coordinates (y1, β1) in a spatial coordinate system with the microphone as the coordinate origin.

For ease of understanding, the coordinate system with the camera as the origin and the coordinate system with the microphone as the origin are drawn together in one diagram.

In an example, the recorded object lies between the camera and the microphone, as shown in Fig. 2, where O1 represents the camera and O2 represents the microphone. The polar coordinates of the recorded object in the coordinate system with O1 as the origin are (x1, α1), which are the polar coordinates of the photographed object in the coordinate system with the camera as the origin; the polar coordinates of the recorded object in the coordinate system with O2 as the origin are (y1, β1), which are the polar coordinates of the sound source object in the coordinate system with the microphone as the origin. Here x1 is the distance from the recorded object to the camera, y1 is the distance from the recorded object to the microphone, L is the distance from the microphone to the camera, and α1 and β1 both take values in (-90°, 90°).

In another example, the recorded object lies to the outer side of the camera or the microphone, that is, not between them, as shown in Fig. 3, where O1 represents the camera and O2 represents the microphone. The polar coordinates of the recorded object in the coordinate system with O1 as the origin are (x1, α1), which are the polar coordinates of the photographed object in the coordinate system with the camera as the origin; the polar coordinates of the recorded object in the coordinate system with O2 as the origin are (y1, β1), which are the polar coordinates of the sound source object in the coordinate system with the microphone as the origin. Here x1 is the distance from the recorded object to the camera, y1 is the distance from the recorded object to the microphone, L is the distance from the microphone to the camera, and α1 and β1 both take values in (-90°, 90°).

In step 103, the photographed objects and the sound source objects are matched based on their feature information to obtain the matching relationship between the photographed objects and the sound source objects.

In the embodiments of this application, a photographed object matching a sound source object means that the two belong to the same recorded object; if they do not match, they do not belong to the same recorded object. The matching relationship therefore records which photographed object and which sound source object belong to the same recorded object.

In the embodiments of the present application, when the feature information of a photographed object includes its spatial position information relative to the electronic device and the feature information of a sound source object includes its spatial position information relative to the electronic device, step 103 may specifically include: if the spatial position information of the photographed object and that of the sound source object coincide or differ only slightly, determining that the photographed object matches the sound source object.

More specifically, the feature information of a photographed object is its polar coordinates (x1, α1) in the coordinate system with the camera as the origin, and the feature information of a sound source object is its polar coordinates (y1, β1) in the coordinate system with the microphone as the origin. Because (x1, α1) and (y1, β1) are obtained in different coordinate systems, and the camera and the microphone are located at different positions on the electronic device, the deviation caused by the different coordinate origins must be eliminated to ensure the accuracy of the subsequent matching results; that is, the photographed object and the sound source object must be converted into the same coordinate system.

To eliminate this deviation, the camera can be used as the unified origin and both objects converted into the coordinate system with the camera as the origin; or the microphone can be used as the unified origin and both objects converted into the coordinate system with the microphone as the origin; or a third position other than the camera and the microphone can be used as the unified origin and both objects converted into the coordinate system with that position as the origin. The embodiments of the present application do not limit this.

When the camera is used as the unified origin and the photographed object and the sound source object are converted into the coordinate system with the camera as the origin, step 103 may specifically include the following steps 1031, 1032, and 1033 (not shown in the figure):

In step 1031, when the recorded object described by (x1, α1) and (y1, β1) lies between the two coordinate origins, that is, between the camera origin and the microphone origin, the polar coordinates (x2, α2) of the sound source object in the coordinate system with the camera as the origin are calculated from (y1, β1), the distance L from the microphone to the camera, and a preset first coordinate conversion formula;

In this step, the unknown quantity (x2, α2) is solved from the known quantities (y1, β1) and L using the first coordinate conversion formula.

In step 1032, when the recorded object described by (x1, α1) and (y1, β1) lies on the same side of both coordinate origins, the polar coordinates (x2, α2) of the sound source object in the coordinate system with the camera as the origin are calculated from (y1, β1), L, and a preset second coordinate conversion formula;

In this step, the unknown quantity (x2, α2) is solved from the known quantities (y1, β1) and L using the second coordinate conversion formula.
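The first and second coordinate conversion formulas are not reproduced in this text. As a hedged sketch only, the conversion can be written through Cartesian intermediates under an assumed axis convention: the camera O1 at (0, 0), the microphone O2 at (L, 0), and every angle measured from the direction the device faces, positive toward the microphone, within (-90°, 90°). Under that assumption a single expression covers both the between case of step 1031 and the same-side case of step 1032, because the sign of sin β1 carries the side information. The function names `mic_to_camera` and `cartesian_to_polar` are hypothetical.

```python
import math
from typing import Tuple

Polar = Tuple[float, float]  # (distance, angle in degrees)


def mic_to_camera(y1: float, beta1_deg: float, L: float) -> Polar:
    """Re-express a sound source's microphone-frame polar coordinates (y1, beta1)
    in the camera frame as (x2, alpha2).

    Assumed geometry (hypothetical, not taken from Fig. 2 or Fig. 3): camera O1 at
    (0, 0), microphone O2 at (L, 0), angles measured from the device's facing
    direction (+y), positive toward +x, within (-90, 90) degrees.
    """
    beta1 = math.radians(beta1_deg)
    # Cartesian position of the source in the camera frame.
    px = L + y1 * math.sin(beta1)
    py = y1 * math.cos(beta1)
    x2 = math.hypot(px, py)
    # atan2(px, py) gives the angle from +y toward +x, matching the assumed convention.
    alpha2 = math.degrees(math.atan2(px, py))
    return x2, alpha2


def cartesian_to_polar(px: float, py: float) -> Polar:
    """Polar coordinates of a point relative to the origin, same angle convention."""
    return math.hypot(px, py), math.degrees(math.atan2(px, py))


L = 0.10  # assumed camera-to-microphone distance in metres
# A source 1 m ahead of the midpoint of the baseline, i.e. between O1 and O2.
y1, beta1 = cartesian_to_polar(0.05 - L, 1.0)
print(mic_to_camera(y1, beta1, L))  # should agree with the next line
print(cartesian_to_polar(0.05, 1.0))
```

Going through Cartesian coordinates avoids separate case analysis; a dedicated closed form per case, as the application describes, would give the same result under the same convention.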

In step 1033, the matching degree between each photographed object and each sound source object is calculated from (x1, α1) and (x2, α2); for each photographed object, the sound source object with the highest matching degree is determined as its matched sound source object, yielding the matching relationship.

In an embodiment, the above step 1033 may specifically include the following steps:

Calculate the distance value between (x1, α1) and (x2, α2), and determine the matching degree between the shooting object and the sound source object according to the distance value, where the distance value is inversely proportional to the matching degree.
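Step 1033 with this distance-based embodiment can be sketched as follows. This is an illustration, not the application's implementation: the matching degree is taken as 1/d (one choice of a quantity inversely proportional to the distance), and the optional `max_distance` cut-off, a hypothetical parameter, models the "coincide or differ only slightly" condition of step 103. All positions are assumed to be already expressed in the camera-origin frame, and the object names and coordinates are made up.

```python
import math
from typing import Dict, Optional, Tuple

Polar = Tuple[float, float]  # (distance, angle in degrees), camera-origin frame


def polar_distance(p: Polar, q: Polar) -> float:
    """Distance between two points given in polar form (law of cosines)."""
    (r1, t1), (r2, t2) = p, q
    d2 = r1 * r1 + r2 * r2 - 2.0 * r1 * r2 * math.cos(math.radians(t1 - t2))
    return math.sqrt(max(d2, 0.0))  # clamp tiny negative rounding errors


def matching_degree(p: Polar, q: Polar) -> float:
    """One choice of a degree inversely proportional to distance: 1 / d."""
    d = polar_distance(p, q)
    return math.inf if d == 0.0 else 1.0 / d


def match_objects(
    subjects: Dict[str, Polar],
    sources: Dict[str, Polar],
    max_distance: Optional[float] = None,
) -> Dict[str, str]:
    """Map each photographed object to the sound source with the highest matching degree."""
    relation: Dict[str, str] = {}
    for name, pos in subjects.items():
        if not sources:
            break
        best = max(sources, key=lambda k: matching_degree(pos, sources[k]))
        # Optional hypothetical cut-off: skip pairs too far apart to be one recorded object.
        if max_distance is not None and polar_distance(pos, sources[best]) > max_distance:
            continue
        relation[name] = best
    return relation


subjects = {"subject 1": (1.0, -20.0), "subject 2": (1.5, 0.0), "subject 3": (2.0, 25.0)}
sources = {
    "source 1": (1.52, 1.0),
    "source 2": (0.98, -19.0),
    "source 3": (2.1, 24.0),
    "source 4": (4.0, 60.0),  # e.g. a noise source with no visible counterpart
}
print(match_objects(subjects, sources))
# -> {'subject 1': 'source 2', 'subject 2': 'source 1', 'subject 3': 'source 3'}
```

As in the three-user, four-source example earlier, a sound source object with no visible counterpart simply ends up unmatched.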

In another embodiment, the image measurement center and the sound measurement center of the same recorded object differ. Taking a human recorded object as an example, when the photographed object is determined from the image collected by the camera, the measurement center is the person's eyes, whereas when the sound source object is determined from the sound collected by the microphone, the measurement center is the person's mouth. To ensure the accuracy of the subsequent matching results, the error caused by this difference must be eliminated, which can be done by introducing an error correction parameter. In this case, step 1033 may specifically include the following steps 10331, 10332, and 10333 (not shown in the figure):

In step 10331, (x2, α2) is multiplied by a preset error correction parameter δ to obtain the corrected polar coordinates (δ*x2, δ*α2);

In the embodiments of this application, error correction parameters can be set separately for photographed objects and sound source objects. In practical applications, they can be set by technicians from experience or obtained by training on a large number of samples; the embodiments of the present application do not limit this.

In step 10332, the distance d between (x1, α1) and (δ*x2, δ*α2) is calculated using the formula for the distance between two points in a polar coordinate system:

$$d = \sqrt{x_1^2 + (\delta x_2)^2 - 2\,x_1\,(\delta x_2)\cos(\alpha_1 - \delta\alpha_2)}$$

In step 10333, the degree of matching between the shooting object and the sound source object is determined according to the distance value, where the distance value is inversely proportional to the degree of matching.
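Steps 10331 to 10333 can be sketched as below, assuming, as step 10331 states, that a single parameter δ (`delta`) scales both the distance and the angle of the sound source's camera-frame coordinates. The function names and the value δ = 0.8 are made up for illustration; the application says δ may be set from experience or learned from samples. The example shows how the correction can change which source is chosen.

```python
import math
from typing import Dict, Tuple

Polar = Tuple[float, float]  # (distance, angle in degrees)


def corrected_distance(subject: Polar, source: Polar, delta: float) -> float:
    """Steps 10331-10332: scale the source's camera-frame (x2, alpha2) by delta,
    then apply the polar distance formula."""
    x1, a1 = subject
    x2, a2 = source
    cx2, ca2 = delta * x2, delta * a2  # corrected polar coordinates (delta*x2, delta*alpha2)
    d2 = x1 * x1 + cx2 * cx2 - 2.0 * x1 * cx2 * math.cos(math.radians(a1 - ca2))
    return math.sqrt(max(d2, 0.0))  # clamp tiny negative rounding errors


def best_source(subject: Polar, sources: Dict[str, Polar], delta: float) -> str:
    """Step 10333: the smallest corrected distance is the highest matching degree."""
    return min(sources, key=lambda k: corrected_distance(subject, sources[k], delta))


subject = (2.0, 10.0)
sources = {"a": (2.5, 12.5), "b": (2.0, 10.0)}
print(best_source(subject, sources, delta=1.0))  # no correction -> b
print(best_source(subject, sources, delta=0.8))  # illustrative delta -> a
```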

Because the spatial position information of a photographed object and that of a sound source object largely reflect the relative positional relationship between them, matching photographed objects and sound source objects by their spatial position information in the embodiments of the present application helps ensure the accuracy of the matching result.

In the embodiments of the present application, when the feature information of a photographed object includes its external appearance and the feature information of a sound source object includes its sound track attributes, step 103 may specifically include: if the external appearance of the photographed object matches the sound track attributes of the sound source object, determining that the photographed object matches the sound source object.

In an example, there are two photographed objects whose external appearances are a boy and a girl, and two sound source objects whose sound track attributes are a female voice and a male voice. In this case, the photographed object whose external appearance is "boy" is determined to match the sound source object whose sound track attribute is "male voice", and the photographed object whose external appearance is "girl" is determined to match the sound source object whose sound track attribute is "female voice".
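The appearance-to-voice matching in this example can be sketched as a lookup of compatible labels. The label strings and the `COMPATIBLE` table are hypothetical; the application does not specify how appearances and sound track attributes are compared. Each sound source is assigned at most once, greedily in iteration order.

```python
from typing import Dict, Set

# Hypothetical compatibility table between appearance labels and voice labels.
COMPATIBLE: Dict[str, Set[str]] = {
    "boy": {"male voice"},
    "man": {"male voice"},
    "girl": {"female voice"},
    "woman": {"female voice"},
}


def match_by_attributes(appearances: Dict[str, str], voices: Dict[str, str]) -> Dict[str, str]:
    """Greedily pair each photographed object with the first unused compatible sound source."""
    relation: Dict[str, str] = {}
    used: Set[str] = set()
    for subject, appearance in appearances.items():
        allowed = COMPATIBLE.get(appearance, set())
        for source, voice in voices.items():
            if source not in used and voice in allowed:
                relation[subject] = source
                used.add(source)
                break
    return relation


appearances = {"subject 1": "boy", "subject 2": "girl"}
voices = {"source 1": "female voice", "source 2": "male voice"}
print(match_by_attributes(appearances, voices))
# -> {'subject 1': 'source 2', 'subject 2': 'source 1'}
```

When two photographed objects share a label (for example two boys), this greedy rule cannot tell them apart; spatial position information would be needed to break the tie.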

In step 104, a selection operation for the photographed objects is received, and in response to the selection operation, a first photographed object is selected from the photographed objects contained in the image collected by the camera.

In the embodiments of the present application, when the user wants the final recorded video to contain only the sound of one or a few photographed objects, the user can input a focus object selection operation on the electronic device, either by voice or manually.

In an example, as shown in Fig. 4, user 42 is recording video with electronic device 40. The video recording screen 41 of electronic device 40 contains three photographed objects, and user 42 can long-press one of them on the video recording screen 41 to select it as the target photographed object.

In step 105, the first sound source object that matches the first photographed object is determined, according to the matching relationship, from among the sound source objects contained in the sound collected by the microphone.

In step 106, perform the preset first anti-interference processing on the sound track corresponding to the second sound source object contained in the sound collected by the microphone, and perform the preset first anti-interference processing on the sound obtained by the preset first anti-interference processing and the image collected by the camera Synthesizing processing to obtain the target video, where the second sound source object is a sound source object other than the first sound source object among the sound source objects included in the sound collected by the microphone.

In the embodiment of the present application, the preset first anti-interference processing can be applied to every sound track in the sound collected by the microphone other than the one corresponding to the first sound source object (that is, to the sound tracks corresponding to the second sound source objects), to obtain audio that contains only, or mainly, the sound track corresponding to the first sound source object. The preset first anti-interference processing may be noise cancellation processing.
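The effect of this step on the final audio can be sketched as follows. The sketch assumes the collected sound has already been separated into one equal-length list of samples per sound source object; the text says only that different sound source objects correspond to different sound tracks. Simple gain attenuation stands in for the unspecified noise cancellation processing, and the function name `suppress_other_tracks` is hypothetical.

```python
def suppress_other_tracks(tracks, keep_id, gain=0.0):
    """Mix separated per-source tracks into one signal, scaling every track
    except keep_id by `gain` (0.0 removes it, 0 < gain < 1 only attenuates).

    tracks: dict of source id -> list of float samples (equal lengths assumed)
    """
    if not tracks:
        return []
    length = len(next(iter(tracks.values())))
    mixed = [0.0] * length
    for source_id, samples in tracks.items():
        # The first sound source keeps full level; all others are scaled.
        factor = 1.0 if source_id == keep_id else gain
        for i, sample in enumerate(samples):
            mixed[i] += factor * sample
    return mixed
```

A `gain` of 0.0 corresponds to audio that contains "only" the first sound source's track. A small nonzero gain corresponds to audio that "mainly" contains it.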

In the embodiment of the present application, when the target video is synthesized, all the shooting objects in the image collected by the camera may be retained, or only the first shooting object may be retained.

When only the first shooting object is retained, the above step 106 may specifically include the following steps:

Perform preset second anti-interference processing on the image area containing the second shooting object in the image collected by the camera, and synthesize the image obtained by the preset second anti-interference processing with the sound obtained by the preset first anti-interference processing to obtain the target video, where the second shooting object is any shooting object, among those contained in the image collected by the camera, other than the first shooting object.

In practical applications, the preset second anti-interference processing may include mosaic processing or blurring processing.
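Mosaic processing of an image area can be sketched as replacing each tile of the area with its mean value. The sketch assumes a grayscale frame stored as a list of rows of integers and a rectangular area given by hypothetical `left`, `top`, `right`, `bottom` bounds (exclusive on the right and bottom). It is one common way to pixelate a region, not the specific processing used by the described device.

```python
def mosaic_region(image, left, top, right, bottom, block=8):
    """Pixelate image rows top..bottom-1 and columns left..right-1 in place by
    replacing each block x block tile with its integer (floor) mean value.

    image: list of rows, each a list of int pixel values (grayscale).
    """
    for y0 in range(top, bottom, block):
        for x0 in range(left, right, block):
            # Clip the tile at the region boundary.
            y1 = min(y0 + block, bottom)
            x1 = min(x0 + block, right)
            tile = [image[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            mean = sum(tile) // len(tile)
            for y in range(y0, y1):
                for x in range(x0, x1):
                    image[y][x] = mean
    return image
```

Blurring processing would follow the same pattern, with each pixel replaced by a local average instead of a tile-wide one.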

It can be seen from the above embodiment that, during video recording, a matching relationship is established between the shooting objects in the recorded video picture and the sound source objects in the recorded video sound. When the user selects a specific shooting object, the specific sound source object matching it is determined from the matching relationship, anti-interference processing is performed on the sound tracks of all other sound source objects in the recorded sound, and the target video is generated from the processed sound and the recorded video picture. The user thus obtains the cleaner video they want without post-editing on professional equipment, which reduces video processing cost and simplifies video processing operations.

FIG. 5 is a structural block diagram of an electronic device according to an embodiment of the present application. As shown in FIG. 5, the electronic device 500 may include: an opening unit 501, a first extraction unit 502, a second extraction unit 503, a matching unit 504, a receiving unit 505, a selection unit 506, a determining unit 507, a first processing unit 508, and a second processing unit 509, wherein:

The opening unit 501 is configured to, when a video recording operation is received, turn on the camera of the electronic device for image collection and turn on the microphone of the electronic device for sound collection;

The first extraction unit 502 is configured to determine a photographic subject contained in the image collected by the camera, and extract characteristic information of the photographic subject;

The second extraction unit 503 is configured to determine the sound source object contained in the sound collected by the microphone, and extract characteristic information of the sound source object, where different sound source objects correspond to different sound tracks;

The matching unit 504 is configured to match the photographed object and the sound source object based on the characteristic information of the photographed object and the characteristic information of the sound source object, to obtain the matching relationship between the photographed object and the sound source object;

The receiving unit 505 is configured to receive a selection operation for the photographed object;

The selection unit 506 is configured to respond to the selection operation and select a first photographic subject from the photographic subjects contained in the image collected by the camera;

The determining unit 507 is configured to determine, according to the matching relationship, a first sound source object that matches the first shooting object among the sound source objects included in the sound collected by the microphone;

The first processing unit 508 is configured to perform preset first anti-interference processing on the sound track corresponding to the second sound source object included in the sound collected by the microphone;

The second processing unit 509 is configured to synthesize the sound obtained by the preset first anti-interference processing with the image collected by the camera to obtain a target video, wherein the second sound source object is any sound source object, among those contained in the sound collected by the microphone, other than the first sound source object.

Optionally, as an embodiment, the feature information of the shooting object includes spatial position information of the shooting object relative to the electronic device, and the feature information of the sound source object includes spatial position information of the sound source object relative to the electronic device.

Optionally, as an embodiment, the spatial position information of the photographic object relative to the electronic device is: the polar coordinates (x1, α1) of the photographic object in a spatial coordinate system with the camera as the origin of the coordinates;

The spatial position information of the sound source object relative to the electronic device is: the polar coordinates (y1, β1) of the sound source object in a spatial coordinate system with the microphone as the origin of the coordinates.

Optionally, as an embodiment, the matching unit 504 may include:

The first calculation subunit is configured to, when (x1, α1) and (y1, β1) are located between the two coordinate origins, calculate the polar coordinates (x2, α2) of the sound source object in a spatial coordinate system with the camera as the origin of the coordinates, according to (y1, β1) and a preset first coordinate conversion formula;

The second calculation subunit is configured to, when (x1, α1) and (y1, β1) are located on the same side of the two coordinate origins, calculate the polar coordinates (x2, α2) of the sound source object in a spatial coordinate system with the camera as the origin of the coordinates, according to (y1, β1) and a preset second coordinate conversion formula, where the two coordinate origins are the camera and the microphone, and L is the distance from the microphone to the camera;
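The two preset coordinate conversion formulas themselves are not given in this text. As a purely illustrative stand-in, the sketch below assumes a planar model. In this model both polar frames share the same polar-axis direction, the microphone sits at distance L from the camera along that axis, and angles are in radians. Under these assumptions, a single vector-addition conversion covers both the "between" and "same side" cases. The function name `mic_to_camera_polar` is hypothetical.

```python
import math


def mic_to_camera_polar(y1, beta1, L):
    """Convert a sound source's polar coordinates (y1, beta1) in the
    microphone-centred frame into (x2, alpha2) in the camera-centred frame.

    Assumed geometry (illustrative only): both frames are planar and share the
    same polar-axis direction, and the microphone lies at distance L from the
    camera along that axis. Angles are in radians.
    """
    # Position of the source in camera-centred Cartesian coordinates.
    px = L + y1 * math.cos(beta1)
    py = y1 * math.sin(beta1)
    x2 = math.hypot(px, py)     # radial distance from the camera
    alpha2 = math.atan2(py, px)  # polar angle seen from the camera
    return x2, alpha2
```

For example, a source 2 units from the microphone in the direction of the camera (β1 = π) with L = 5 lies between the two origins and maps to roughly (3, 0) in the camera frame.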

The third calculation subunit is configured to calculate the matching degree between each shooting object and each sound source object according to (x1, α1) and (x2, α2), and, for each shooting object, determine the sound source object with the highest matching degree as its matched sound source object, thereby obtaining the corresponding matching relationship.
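Once matching degrees are available, the selection rule "highest matching degree per shooting object" is a per-row maximum. The sketch below assumes the degrees are held in a hypothetical nested dictionary keyed by shooting-object id and then sound-source id. As the text describes, each shooting object picks independently, so two shooting objects could end up matched to the same sound source object.

```python
def best_matches(degrees):
    """For each shooting object, pick the sound source with the highest
    matching degree.

    degrees: dict of shooting-object id -> dict of source id -> degree
    Returns a dict of shooting-object id -> best source id; shooting objects
    with no candidate sources are omitted.
    """
    return {
        subject_id: max(by_source, key=by_source.get)
        for subject_id, by_source in degrees.items()
        if by_source
    }
```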

Optionally, as an embodiment, the third calculation subunit may include:

The coordinate correction module is configured to multiply (x2, α2) by the preset error correction parameter δ to obtain the corrected polar coordinates (δ*x2, δ*α2);

The distance calculation module is configured to calculate the distance between (x1, α1) and (δ*x2, δ*α2) according to these coordinates and the formula for the distance between two points in the polar coordinate system;
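The distance formula itself is not reproduced in this text. The standard formula for the distance between two points (ρ1, θ1) and (ρ2, θ2) in a polar coordinate system is d = sqrt(ρ1² + ρ2² − 2ρ1ρ2·cos(θ1 − θ2)). Presumably this is the formula meant here. Applied to (x1, α1) and the corrected coordinates (δ*x2, δ*α2), it gives:

```latex
d = \sqrt{x_1^{2} + (\delta x_2)^{2} - 2\,x_1\,(\delta x_2)\cos\left(\alpha_1 - \delta\alpha_2\right)}
```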

The matching degree determining module is configured to determine the matching degree between the shooting object and the sound source object according to the distance value, wherein the distance value is in inverse proportion to the matching degree.
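Putting the correction, distance, and matching-degree steps together gives the following minimal sketch. It uses the standard polar distance formula and takes "inversely proportional" literally as degree = k / distance. The function name `matching_degree`, the constant `k`, and the treatment of zero distance as an infinite degree are illustrative assumptions.

```python
import math


def matching_degree(subject, source, delta=1.0, k=1.0):
    """Matching degree between a shooting object at polar (x1, alpha1) and a
    sound source at camera-frame polar (x2, alpha2).

    (x2, alpha2) is first scaled by the error-correction parameter delta, then
    the polar distance d is computed and the degree is k / d, so a larger
    distance gives a smaller degree. k and the infinite degree at d == 0 are
    illustrative choices.
    """
    x1, alpha1 = subject
    x2, alpha2 = source
    r2, theta2 = delta * x2, delta * alpha2  # corrected coordinates
    d_squared = x1 ** 2 + r2 ** 2 - 2 * x1 * r2 * math.cos(alpha1 - theta2)
    distance = math.sqrt(max(d_squared, 0.0))  # clamp tiny negative rounding error
    return math.inf if distance == 0 else k / distance
```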

Optionally, as an embodiment, the second processing unit 509 may include:

The video synthesis subunit is configured to perform preset second anti-interference processing on the image area containing the second photographic object in the image collected by the camera, and to synthesize the image obtained by the preset second anti-interference processing with the sound obtained by the preset first anti-interference processing to obtain the target video, wherein the second shooting object is any shooting object, among those contained in the image collected by the camera, other than the first shooting object.

FIG. 6 is a schematic diagram of the hardware structure of an electronic device implementing each embodiment of the present application. As shown in FIG. 6, the electronic device 600 includes, but is not limited to, components such as a radio frequency unit 601, a network module 602, an audio output unit 603, an input unit 604, a sensor 605, a display unit 606, a user input unit 607, an interface unit 608, a memory 609, a processor 610, and a power supply 611. Those skilled in the art can understand that the structure shown in FIG. 6 does not constitute a limitation on the electronic device; the electronic device may include more or fewer components than shown, combine certain components, or use a different component layout. In the embodiments of the present application, electronic devices include, but are not limited to, mobile phones, tablet computers, notebook computers, palmtop computers, vehicle-mounted terminals, wearable devices, and pedometers.

The processor 610 is configured to: when a video recording operation is received, turn on the camera of the electronic device for image collection and turn on the microphone of the electronic device for sound collection; determine the shooting objects contained in the image collected by the camera and extract characteristic information of the shooting objects; determine the sound source objects contained in the sound collected by the microphone and extract characteristic information of the sound source objects, wherein different sound source objects correspond to different sound tracks; match the shooting objects and the sound source objects based on their characteristic information to obtain the matching relationship between them; receive a selection operation for the shooting objects; in response to the selection operation, select a first shooting object from the shooting objects contained in the image collected by the camera; determine, according to the matching relationship, a first sound source object that matches the first shooting object among the sound source objects contained in the sound collected by the microphone; and perform preset first anti-interference processing on the sound track corresponding to the second sound source object contained in the sound collected by the microphone, and synthesize the sound obtained by the preset first anti-interference processing with the image collected by the camera to obtain a target video, wherein the second sound source object is any sound source object, among those contained in the sound collected by the microphone, other than the first sound source object.

Optionally, as an embodiment, matching the photographing object and the sound source object based on the characteristic information of the photographing object and the characteristic information of the sound source object, to obtain the matching relationship between the photographing object and the sound source object, includes:

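When (x1, α1) and (y1, β1) are located between the two coordinate origins, calculate the polar coordinates (x2, α2) of the sound source object in a spatial coordinate system with the camera as the origin of the coordinates, according to (y1, β1) and a preset first coordinate conversion formula;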

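When (x1, α1) and (y1, β1) are located on the same side of the two coordinate origins, calculate the polar coordinates (x2, α2) of the sound source object in a spatial coordinate system with the camera as the origin of the coordinates, according to (y1, β1) and a preset second coordinate conversion formula;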

Calculate the matching degree between each shooting object and each sound source object according to (x1, α1) and (x2, α2), and, for each shooting object, determine the sound source object with the highest matching degree as its matched sound source object, to obtain the corresponding matching relationship.

Optionally, as an embodiment, calculating the matching degree between the shooting object and the sound source object according to (x1, α1) and (x2, α2), and, for each shooting object, determining the sound source object with the highest matching degree as the matched sound source object to obtain the corresponding matching relationship, includes:

Multiply (x2, α2) by the preset error correction parameter δ to obtain the corrected polar coordinates (δ*x2, δ*α2);

Calculate the distance between (x1, α1) and (δ*x2, δ*α2) according to these coordinates and the formula for the distance between two points in the polar coordinate system;

According to the distance value, the degree of matching between the shooting object and the sound source object is determined, wherein the distance value is inversely proportional to the degree of matching.

Optionally, as an embodiment, the synthesizing the sound obtained by the preset first anti-interference processing and the image collected by the camera to obtain the target video includes:

Perform preset second anti-interference processing on the image area containing the second photographic object in the image collected by the camera, and synthesize the image obtained by the preset second anti-interference processing with the sound obtained by the preset first anti-interference processing to obtain a target video, wherein the second shooting object is any shooting object, among those contained in the image collected by the camera, other than the first shooting object.

It should be understood that, in the embodiment of the present application, the radio frequency unit 601 can be used to receive and send signals while information is being sent or received or during a call. Specifically, it receives downlink data from a base station and passes it to the processor 610 for processing, and it sends uplink data to the base station. Generally, the radio frequency unit 601 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier, a duplexer, and the like. In addition, the radio frequency unit 601 can also communicate with the network and other devices through a wireless communication system.

The electronic device provides users with wireless broadband Internet access through the network module 602, for example helping users send and receive emails, browse web pages, and access streaming media.

The audio output unit 603 can convert the audio data received by the radio frequency unit 601 or the network module 602 or stored in the memory 609 into audio signals and output them as sounds. Moreover, the audio output unit 603 may also provide audio output related to a specific function performed by the electronic device 600 (for example, call signal reception sound, message reception sound, etc.). The audio output unit 603 includes a speaker, a buzzer, a receiver, and the like.

The input unit 604 is used to receive audio or video signals. The input unit 604 may include a graphics processing unit (GPU) 6041 and a microphone 6042. The graphics processor 6041 processes image data of still pictures or videos obtained by an image capture device (such as a camera) in a video capture mode or an image capture mode. The processed image frames may be displayed on the display unit 606, stored in the memory 609 (or another storage medium), or sent via the radio frequency unit 601 or the network module 602. The microphone 6042 can receive sound and process it into audio data. In a telephone call mode, the processed audio data can be converted into a format that can be sent to a mobile communication base station via the radio frequency unit 601 for output.

The electronic device 600 further includes at least one sensor 605, such as a light sensor, a motion sensor, and other sensors. Specifically, the light sensor includes an ambient light sensor and a proximity sensor. The ambient light sensor can adjust the brightness of the display panel 6061 according to the brightness of the ambient light, and the proximity sensor can turn off the display panel 6061 and/or its backlight when the electronic device 600 is moved to the ear. As one kind of motion sensor, an accelerometer can detect the magnitude of acceleration in various directions (usually three axes) and, when stationary, the magnitude and direction of gravity. It can be used to recognize the posture of the electronic device (for example, landscape/portrait switching, related games, and magnetometer attitude calibration) and for vibration-recognition functions (such as a pedometer or tap detection). The sensor 605 may also include a fingerprint sensor, a pressure sensor, an iris sensor, a molecular sensor, a gyroscope, a barometer, a hygrometer, a thermometer, an infrared sensor, and the like, which are not described here.

The display unit 606 is used to display information input by the user or information provided to the user. The display unit 606 may include a display panel 6061, and the display panel 6061 may be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED), etc.

The user input unit 607 may be used to receive input numeric or character information and to generate key signal input related to user settings and function control of the electronic device. Specifically, the user input unit 607 includes a touch panel 6071 and other input devices 6072. The touch panel 6071, also called a touch screen, can collect the user's touch operations on or near it (for example, operations performed on or near the touch panel 6071 with a finger, a stylus, or any other suitable object or accessory). The touch panel 6071 may include two parts: a touch detection device and a touch controller. The touch detection device detects the user's touch position, detects the signal produced by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into contact coordinates, sends them to the processor 610, and receives and executes commands sent by the processor 610. In addition, the touch panel 6071 can be implemented in multiple types, such as resistive, capacitive, infrared, and surface acoustic wave. Besides the touch panel 6071, the user input unit 607 may also include other input devices 6072, which may specifically include, but are not limited to, a physical keyboard, function keys (such as volume control keys and on/off keys), a trackball, a mouse, and a joystick, which are not described here.

Further, the touch panel 6071 can cover the display panel 6061. When the touch panel 6071 detects a touch operation on or near it, it transmits the operation to the processor 610 to determine the type of the touch event, and the processor 610 then provides corresponding visual output on the display panel 6061 according to the type of the touch event. Although in FIG. 6 the touch panel 6071 and the display panel 6061 are two independent components that implement the input and output functions of the electronic device, in some embodiments they can be integrated to implement these input and output functions, which is not specifically limited here.

The interface unit 608 is an interface for connecting an external device to the electronic device 600. For example, the external device may include a wired or wireless headset port, an external power source (or battery charger) port, a wired or wireless data port, a memory card port, a port for connecting a device with an identification module, an audio input/output (I/O) port, a video I/O port, a headphone port, and the like. The interface unit 608 can be used to receive input (for example, data information or power) from an external device and transmit the received input to one or more elements in the electronic device 600, or to transfer data between the electronic device 600 and an external device.

The memory 609 can be used to store software programs and various data. The memory 609 may mainly include a program storage area and a data storage area. The program storage area may store an operating system and an application program required by at least one function (such as a sound playback function or an image playback function), and the data storage area may store data created through use of the mobile phone (such as audio data and a phone book). In addition, the memory 609 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.

The processor 610 is the control center of the electronic device. It connects all parts of the electronic device through various interfaces and lines, and performs the various functions of the electronic device and processes data by running or executing the software programs and/or modules stored in the memory 609 and calling the data stored in the memory 609, thereby monitoring the electronic device as a whole. The processor 610 may include one or more processing units. Preferably, the processor 610 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interface, application programs, and the like, and the modem processor mainly handles wireless communication. It can be understood that the modem processor may alternatively not be integrated into the processor 610.

The electronic device 600 may also include a power supply 611 (such as a battery) that supplies power to the components. Preferably, the power supply 611 may be logically connected to the processor 610 through a power management system, so that functions such as charging, discharging, and power consumption management are handled by the power management system.

In addition, the electronic device 600 includes some functional modules not shown, which will not be repeated here.

Preferably, an embodiment of the present application also provides an electronic device, including a processor 610, a memory 609, and a computer program stored in the memory 609 and executable on the processor 610. When executed by the processor 610, the computer program implements each process of the foregoing video processing method embodiment and achieves the same technical effect. To avoid repetition, details are not repeated here.

The embodiment of the present application also provides a computer-readable storage medium on which a computer program is stored. When executed by a processor, the computer program implements each process of the above video processing method embodiment and achieves the same technical effect. To avoid repetition, details are not repeated here. The computer-readable storage medium is, for example, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.

It should be noted that in this specification, the terms "comprise", "include", or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further restrictions, an element defined by the phrase "including a..." does not exclude the existence of other identical elements in the process, method, article, or device that includes the element.

Through the description of the above embodiments, those skilled in the art can clearly understand that the method of the above embodiments can be implemented by software plus the necessary general hardware platform. It can of course also be implemented by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of this application, in essence or in the part that contributes to the related technology, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disk) and includes several instructions that cause a terminal (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to execute the method described in each embodiment of the present application.

The embodiments of the present application are described above with reference to the accompanying drawings, but the present application is not limited to the above specific embodiments, which are only illustrative rather than restrictive. Inspired by this application, those of ordinary skill in the art can make many other forms without departing from the purpose of this application and the scope of protection of the claims, all of which fall within the protection of this application.

Claims

A video processing method applied to an electronic device, the method including:

When a video recording operation is received, turning on the camera of the electronic device for image collection, and turning on the microphone of the electronic device for sound collection;

Determining the subject included in the image collected by the camera, and extracting characteristic information of the subject; and determining the sound source object included in the sound collected by the microphone, and extracting characteristic information of the sound source object, wherein different sound source objects correspond to different audio tracks;

Matching the photographed object and the sound source object based on the characteristic information of the photographed object and the characteristic information of the sound source object to obtain a matching relationship between the photographed object and the sound source object;

Receiving a selection operation for the shooting object;

In response to the selection operation, select a first photographic subject from the photographic subjects contained in the image collected by the camera;

Determine, according to the matching relationship, a first sound source object that matches the first shooting object among sound source objects included in the sound collected by the microphone;

Performing preset first anti-interference processing on the sound track corresponding to the second sound source object contained in the sound collected by the microphone, and synthesizing the sound obtained by the preset first anti-interference processing with the image collected by the camera to obtain the target video, wherein the second sound source object is any sound source object, among those contained in the sound collected by the microphone, other than the first sound source object.
The method according to claim 1, wherein the characteristic information of the photographed object comprises spatial position information of the photographed object relative to the electronic device, and the characteristic information of the sound source object comprises spatial position information of the sound source object relative to the electronic device.
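The method according to claim 2, wherein the spatial position information of the photographic object relative to the electronic device is: the polar coordinates (x1, α1) of the photographic object in a spatial coordinate system with the camera as the origin of the coordinates;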

The spatial position information of the sound source object relative to the electronic device is: the polar coordinates (y1, β1) of the sound source object in a spatial coordinate system with the microphone as the origin of the coordinates.
The method according to claim 3, wherein matching the shooting object and the sound source object based on the feature information of the shooting object and the feature information of the sound source object, to obtain the matching relationship between the shooting object and the sound source object, includes:

When (x1, α1) and (y1, β1) are located between the two coordinate origins, calculating the polar coordinates (x2, α2) of the sound source object in a spatial coordinate system with the camera as the origin of the coordinates, according to (y1, β1) and a preset first coordinate conversion formula;

When (x1, α1) and (y1, β1) are located on the same side of the two coordinate origins, calculating the polar coordinates (x2, α2) of the sound source object in the spatial coordinate system with the camera as the origin of the coordinates, according to (y1, β1) and a preset second coordinate conversion formula, where the two coordinate origins are the camera and the microphone, and L is the distance from the microphone to the camera;

Calculating the matching degree between each shooting object and each sound source object according to (x1, α1) and (x2, α2), and, for each shooting object, determining the sound source object with the highest matching degree as its matched sound source object, to obtain the corresponding matching relationship.
The method according to claim 4, wherein calculating the matching degree between the shooting object and the sound source object according to (x1, α1) and (x2, α2), and, for each shooting object, determining the sound source object with the highest matching degree as the matched sound source object to obtain the corresponding matching relationship, includes:

Multiplying (x2, α2) by the preset error correction parameter δ to obtain the corrected polar coordinates (δ*x2, δ*α2);

Calculating the distance between (x1, α1) and (δ*x2, δ*α2) according to these coordinates and the formula for the distance between two points in the polar coordinate system;

According to the distance value, the degree of matching between the shooting object and the sound source object is determined, wherein the distance value is inversely proportional to the degree of matching.
The method according to claim 1, wherein the synthesizing the sound obtained by the preset first anti-interference processing and the image collected by the camera to obtain the target video comprises:

Performing preset second anti-interference processing on the image area containing the second photographic object in the image collected by the camera, and synthesizing the image obtained by the preset second anti-interference processing with the sound obtained by the preset first anti-interference processing to obtain a target video, wherein the second shooting object is any shooting object, among those contained in the image collected by the camera, other than the first shooting object.
An electronic device, the electronic device comprising:

The opening unit is configured to, when a video recording operation is received, turn on the camera of the electronic device for image collection, and turn on the microphone of the electronic device for sound collection;

The first extraction unit is configured to determine the photographed objects contained in the image collected by the camera, and extract characteristic information of the photographed objects;

The second extraction unit is configured to determine the sound source object contained in the sound collected by the microphone, and extract characteristic information of the sound source object, where different sound source objects correspond to different sound tracks;

The matching unit is configured to match the photographed object and the sound source object based on the characteristic information of the photographed object and the characteristic information of the sound source object, to obtain a matching relationship between the photographed object and the sound source object;

A receiving unit, configured to receive a selection operation for the photographed object;

The selection unit is configured to, in response to the selection operation, select a first photographed object from the photographed objects contained in the image collected by the camera;

A determining unit, configured to determine, according to the matching relationship, a first sound source object that matches the first photographed object from among the sound source objects contained in the sound collected by the microphone;

The first processing unit is configured to perform preset first anti-interference processing on the sound track corresponding to the second sound source object contained in the sound collected by the microphone;

The second processing unit is configured to synthesize the sound obtained by the preset first anti-interference processing with the image collected by the camera to obtain a target video, wherein the second sound source object is a sound source object, among the sound source objects contained in the sound collected by the microphone, other than the first sound source object.
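The audio path of the units above can be sketched as follows. The first sound source object is looked up from the matching relationship, the tracks of all other (second) sound source objects are processed, and every track is mixed down into one. The sketch assumes that each sound source object's track is a list of float samples. It also models the "first anti-interference processing" as multiplying by a gain, where 0.0 mutes the track. The function name, the `gain` parameter and the dictionary layouts are all hypothetical, and muxing the mixed track with the video frames is not shown.

```python
from typing import List, Mapping, Sequence


def suppress_and_mix(
    tracks: Mapping[int, Sequence[float]],
    matching: Mapping[int, int],
    first_object: int,
    gain: float = 0.0,
) -> List[float]:
    """Keep the track matched to first_object, attenuate all others, and mix.

    tracks maps a sound source id to its samples; matching maps a
    photographed object id to its matched sound source id.
    """
    first_source = matching[first_object]  # the first sound source object
    length = max((len(t) for t in tracks.values()), default=0)
    mixed = [0.0] * length
    for source_id, samples in tracks.items():
        # second sound source objects get the hypothetical attenuation gain
        g = 1.0 if source_id == first_source else gain
        for i, s in enumerate(samples):
            mixed[i] += g * s
    return mixed
```

When the user selects photographed object 10 and `matching = {10: 1, 11: 0}`, only track 1 survives at the default `gain=0.0`. A nonzero gain keeps the other sources audible in the background instead of removing them.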
8. The electronic device according to claim 7, wherein the characteristic information of the photographed object comprises spatial position information of the photographed object relative to the electronic device, and the characteristic information of the sound source object comprises spatial position information of the sound source object relative to the electronic device.
9. The electronic device according to claim 8, wherein the spatial position information of the photographed object relative to the electronic device is the polar coordinates (x1, α1) of the photographed object in a spatial coordinate system with the camera as the coordinate origin;

The spatial position information of the sound source object relative to the electronic device is the polar coordinates (y1, β1) of the sound source object in a spatial coordinate system with the microphone as the coordinate origin.
The electronic device according to claim 9, wherein the matching unit comprises:

The first calculation subunit is configured to, when (x1, α1) and (y1, β1) are located between the two coordinate origins, calculate the polar coordinates (x2, α2) of the sound source object in a spatial coordinate system with the camera as the coordinate origin according to (y1, β1) and a preset first coordinate conversion formula;

The second calculation subunit is configured to, when (x1, α1) and (y1, β1) are located on the same side of the two coordinate origins, calculate the polar coordinates (x2, α2) of the sound source object in a spatial coordinate system with the camera as the coordinate origin according to (y1, β1) and a preset second coordinate conversion formula, wherein the two coordinate origins are the camera, taken as a coordinate origin, and the microphone, taken as a coordinate origin, and L is the distance from the microphone to the camera;

The third calculation subunit is configured to calculate the matching degree between the photographed object and the sound source object according to (x1, α1) and (x2, α2), and, for each photographed object, determine the sound source object with the highest matching degree with that photographed object as its matched sound source object, to obtain the corresponding matching relationship.
The electronic device according to claim 10, wherein the third calculation subunit comprises:

The coordinate correction module is configured to multiply (x2, α2) by the preset error correction parameter δ to obtain the corrected polar coordinates (δ*x2, δ*α2);

The distance calculation module is configured to calculate the distance between (x1, α1) and (δ*x2, δ*α2) according to (x1, α1), (δ*x2, δ*α2) and the formula for the distance between two points in a polar coordinate system;

The matching degree determining module is configured to determine the matching degree between the photographed object and the sound source object according to the distance value, wherein the distance value is inversely proportional to the matching degree.
The electronic device according to claim 11, wherein the second processing unit comprises:

The video synthesis subunit is configured to perform preset second anti-interference processing on the image area in which a second photographed object contained in the image collected by the camera is located, and synthesize the image obtained by the preset second anti-interference processing with the sound obtained by the preset first anti-interference processing to obtain the target video, wherein the second photographed object is a photographed object, among the photographed objects contained in the image collected by the camera, other than the first photographed object.
An electronic device, comprising a processor, a memory, and a computer program stored on the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the video processing method according to any one of claims 1 to 6.
A computer-readable storage medium, storing a computer program which, when executed by a processor, implements the steps of the video processing method according to any one of claims 1 to 6.