CN112995541A - Method for eliminating video echo and computer storage medium

Info

Publication number: CN112995541A
Application number: CN202110451342.5A
Authority: CN
Inventors: 谢炜航
Original assignee: Beijing Yizhen Xuesi Education Technology Co Ltd
Current assignee: Beijing Yizhen Xuesi Education Technology Co Ltd
Priority date: 2021-04-26
Filing date: 2021-04-26
Publication date: 2021-06-18
Anticipated expiration: 2041-04-26
Also published as: CN112995541B

Abstract

The embodiments of the invention provide a method for eliminating video echo and a computer storage medium, applied to a system player used for video playback on a terminal device. The method comprises the following steps: acquiring an audio track object of a video file to be played in the system player; performing a mixing operation on the audio track associated with the audio track object of the video file to obtain audio playing data corresponding to the audio track of the video file; determining, according to the audio playing data corresponding to the audio track of the video file, audio reference data for video echo cancellation processing of the video file; and performing, using the audio reference data, video echo cancellation processing on audio acquisition data obtained by an audio acquisition device in the terminal device, so as to eliminate the audio playing data from the audio acquisition data. With the embodiments of the invention, video echo can be eliminated simply and conveniently without affecting the download package size of the video application.

Description

Method for eliminating video echo and computer storage medium

Technical Field

The embodiment of the invention relates to the technical field of audio processing, in particular to a method for eliminating video echo and a computer storage medium.

Background

In online education scenarios, a teacher often uses a video as a teaching demonstration in the courseware and explains the related content while the video plays. If the teacher plays the video directly with the system player of the terminal device and does not wear earphones, the microphone of the terminal device picks up both the teacher's explanation and the video sound played by the loudspeaker of the terminal device, and the collected sound signal is encoded and transmitted to the terminal devices held by the students. The students therefore hear two kinds of sound: one is the sound of the video, and the other is the teacher's explanation accompanied by the sound of the played video. Because the sound signal collected by the microphone mixes the sound of the played video with the teacher's explanation, the terminal devices held by the students suffer from video echo.

In the prior art, a video track and an audio track of a video are separated, and video echo cancellation processing is performed on the bare audio file obtained by the separation. However, this approach requires manual alignment of the audio and video tracks and also requires a video interaction control interface, which greatly increases the complexity of video echo cancellation. Alternatively, video echo can be eliminated by using a custom player instead of the system player. However, this approach can greatly increase the download package size of the video application on the terminal device. Taking IJKPlayer, currently the most commonly used third-party player in the industry, as an example, the download package of the video application grows by about 4 MB, and in the mobile era the download package size of a video application is one of the key factors affecting user growth.

Therefore, how to eliminate video echo simply, without affecting the download package size of the video application, is a technical problem that currently needs to be solved.

Disclosure of Invention

In view of the above, an objective of the present invention is to provide a method for canceling video echo and a computer storage medium, so as to solve at least one of the above problems.

The embodiment of the invention provides a method for eliminating video echo, which is applied to a system player for video playing of terminal equipment. The method comprises the following steps: acquiring an audio track object of a video file to be played in the system player; performing sound mixing operation on the audio track associated with the audio track object of the video file to obtain audio playing data corresponding to the audio track of the video file; determining audio reference data for video echo elimination processing of the video file according to audio playing data corresponding to the audio track of the video file; and carrying out video echo elimination processing on audio acquisition data obtained by an audio acquisition device in the terminal equipment by using the audio reference data so as to eliminate the audio playing data in the audio acquisition data.

An embodiment of the present invention further provides a computer storage medium in which a readable program is stored. The readable program is applied to a system player used for video playback on a terminal device, and includes: instructions for acquiring an audio track object of a video file to be played in the system player; instructions for performing a mixing operation on the audio track associated with the audio track object of the video file to obtain audio playing data corresponding to the audio track of the video file; instructions for determining audio reference data for video echo cancellation processing of the video file according to the audio playing data corresponding to the audio track of the video file; and instructions for performing, using the audio reference data, video echo cancellation processing on audio acquisition data obtained by an audio acquisition device in the terminal device, so as to eliminate the audio playing data from the audio acquisition data.

The video echo elimination scheme provided by the embodiment of the invention is applied to the system player used for video playback on the terminal device. The video application of the terminal device therefore does not need a custom player or a third-party player to eliminate video echo; it uses the system player directly, so the download package size of the video application is not affected.

In addition, in the solution for eliminating video echo provided in the embodiment of the present invention, the audio mixing operation is performed on the audio track associated with the audio track object of the obtained video file to be played, the audio reference data used for the video echo elimination processing is determined according to the audio playing data corresponding to the audio track of the video file obtained by the audio mixing operation, and then the audio reference data is used to perform the video echo elimination processing on the audio collecting data obtained by the audio collecting device in the terminal device, so that the audio playing data in the audio collecting data can be simply eliminated, that is, the video echo can be simply eliminated.

Drawings

In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings in the following description show only some of the embodiments of the present invention, and a person skilled in the art may obtain other drawings based on these drawings.

FIG. 1 shows a schematic diagram of teacher end and student end interaction in the prior art;

FIG. 2 is a flowchart illustrating steps of a method for eliminating video echo according to a first embodiment of the present invention;

FIG. 3 is a schematic diagram illustrating a method for eliminating video echo according to a first embodiment of the present invention.

Detailed Description

In order to make those skilled in the art better understand the technical solutions in the embodiments of the present invention, the technical solutions in the embodiments of the present invention will be described clearly and completely with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments obtained by a person skilled in the art based on the embodiments of the present invention shall fall within the scope of the protection of the embodiments of the present invention.

The following further describes specific implementation of the embodiments of the present invention with reference to the drawings.

Fig. 1 shows a schematic diagram of teacher end and student end interaction in the prior art. As shown in fig. 1, in an online education scene, a teacher often uses a video as a teaching demonstration on a teacher side (a terminal device held by the teacher). When the teacher executes the video playing operation at the teacher end, the teacher end sends the video playing signaling to the student end (the terminal device held by the student) so that the student end and the teacher end synchronously play the video. When the video is played, the teacher explains the content related to the video. If the teacher uses the system player at the teacher end to directly play the video, and the teacher does not wear the earphones, the microphone at the teacher end collects the explaining audio of the teacher and the audio of the video played by the loudspeaker at the teacher end, and pushes the collected audio to the student end in the form of audio stream. Thus, the speaker of the student end plays two kinds of audio, one audio is the audio of the local video of the student end, and the other audio is the far-end audio pushed by the teacher end. The far-end audio is the explanation audio accompanied with the audio of the video, which causes the speaker of the student end to repeatedly play the audio of the video, that is, the student end has the problem of video echo.

In the prior art, a video track and an audio track of a video can be separated, and video echo cancellation processing is performed on the bare audio file obtained by the separation. However, this approach requires manual alignment of the audio and video tracks and also requires a video interaction control interface, which greatly increases the complexity of video echo cancellation. Alternatively, video echo can be eliminated by using a custom player instead of the system player. However, this approach can greatly increase the download package size of the teacher-side video application. Taking IJKPlayer, currently the most commonly used third-party player in the industry, as an example, the download package of the video application grows by about 4 MB, and in the mobile era the download package size of a video application is one of the key factors affecting user growth. Therefore, how to eliminate video echo simply, without affecting the download package size of the video application, is a technical problem that currently needs to be solved.

To address the above technical problems in the prior art, embodiments of the present invention provide a method for eliminating video echo that can eliminate video echo simply without affecting the download package size of the video application. The method for eliminating video echo according to the embodiment of the present invention is described in detail below.

Embodiment One

Referring to fig. 2, a flowchart of steps of a method for eliminating video echo according to a first embodiment of the present invention is shown.

Specifically, the method for eliminating video echo provided by the embodiment of the present invention is applied to a system player for video playing of a terminal device, and includes the following steps:

In step S101, an audio track object of a video file to be played in the system player is acquired.

In the present embodiment, the system player may be understood as the native video player of the operating system of the terminal device. For example, when the operating system of the terminal device is iOS, the system player may be AVPlayer. The audio track may be understood as a track that defines audio properties, such as the timbre, the timbre library, the number of channels, the input/output ports, and the volume of the audio. The audio track object may be understood as a code object formed after a program code object is assigned to the audio track.

In some optional embodiments, acquiring the audio track object of the video file to be played in the system player comprises: initializing a resource management object of the video file to be played according to resource data of the video file to be played; calling the resource management object to play the video file and monitoring the playing state of the video file; and, when the monitored playing state of the video file is the ready-to-play state, traversing the media track objects of the video file to obtain the audio track object of the video file. Because the video file is played by calling its resource management object, and the media track objects are traversed only once the playing state is monitored to be the ready-to-play state, the audio track object of the video file can be obtained accurately.

In a specific example, the resource management object of the video file to be played is initialized from the resource data of the video file through an initialization interface in the system player. The resource data of the video file to be played includes, but is not limited to, a link address of a local video file, a link address of a remote video, and the like. The resource management object is called through a video playing interface in the system player to play the video file, and the playing state of the video file is monitored through a playing state monitoring interface in the system player. When the playing state monitoring interface detects that the playing state of the video file is the ready-to-play state, the media track objects of the video file are traversed to obtain the audio track object of the video file. A media track object may be understood as a code object formed after a program code object is assigned to a media track. The media tracks include video tracks and audio tracks; accordingly, the media track objects include video track objects and audio track objects.
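The track lookup described above can be sketched as follows. This is a minimal Python model rather than the system player's actual API: `PlayState`, `MediaTrack`, and `find_audio_tracks` are hypothetical names introduced only for illustration, and a real player would report its playing state through its own monitoring interface.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class PlayState(Enum):
    """Hypothetical playing states reported by a player."""
    UNKNOWN = 0
    READY_TO_PLAY = 1
    FAILED = 2


@dataclass
class MediaTrack:
    """Hypothetical media track object: either a video or an audio track."""
    track_id: int
    media_type: str  # "video" or "audio"


def find_audio_tracks(state: PlayState, tracks: List[MediaTrack]) -> List[MediaTrack]:
    """Return the audio track objects once the file is ready to play."""
    if state is not PlayState.READY_TO_PLAY:
        # The media track objects are only traversed in the ready-to-play state.
        return []
    return [t for t in tracks if t.media_type == "audio"]
```

Returning an empty list before the ready-to-play state mirrors the text: the traversal is deferred until the monitored playing state allows it.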

In step S102, a mixing operation is performed on the audio track associated with the audio track object of the video file to obtain audio playing data corresponding to the audio track of the video file.

In this embodiment, the mixing operation may be an operation of changing or modifying audio playing data stored in a memory corresponding to the system player. The audio playing data may be audio sampling points of an audio data frame to be played.

In some optional embodiments, when mixing an audio track associated with an audio track object of the video file, the audio track associated with the audio track object of the video file is mixed through an audio track mixing interface of the system player, so as to obtain audio playing data corresponding to the audio track of the video file. Therefore, through the audio track mixing interface of the system player, the audio track associated with the audio track object of the video file is mixed, and the audio playing data corresponding to the audio track of the video file can be obtained quickly and effectively.

In some optional embodiments, performing the mixing operation on the audio track associated with the audio track object of the video file through the audio track mixing interface of the system player comprises: reading, through a rendering callback function in the audio track mixing interface, the currently rendered audio playing data of the audio track according to the storage location of that data in the memory of the system player; storing, through the rendering callback function, the read audio playing data into an audio data queue so as to obtain the audio playing data corresponding to the audio track; and setting the currently rendered audio playing data of the audio track stored at the storage location to zero. Reading the currently rendered audio playing data through the rendering callback function and storing it in the audio data queue effectively preserves the audio playing data corresponding to the audio track, in preparation for subsequently obtaining the audio reference data for video echo cancellation processing. In addition, setting the currently rendered audio playing data stored at the storage location to zero through the rendering callback function mutes the audio playing data stored in the memory corresponding to the system player, which effectively improves the result of the video echo cancellation processing.

In a specific example, if the currently rendered audio playback data read by the rendering callback function each time includes 4096 audio sampling points, the rendering callback function stores the read 4096 audio sampling points in an audio data queue according to a first-in first-out sequence, and sets 4096 audio sampling points stored in the storage location to zero.
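The read, enqueue, and zero behaviour of the rendering callback can be sketched as follows. This is an illustrative Python model under stated assumptions: `render_callback` is a hypothetical function name, the rendered buffer is modelled as a plain list of float samples, and the audio data queue is modelled with `collections.deque`.

```python
from collections import deque
from typing import Deque, List


def render_callback(render_buffer: List[float], audio_queue: Deque[float]) -> int:
    """Copy the rendered samples into the FIFO queue, then mute the buffer."""
    audio_queue.extend(render_buffer)   # append in first-in first-out order
    count = len(render_buffer)
    render_buffer[:] = [0.0] * count    # zero in place so the player itself outputs silence
    return count
```

With a 4096-sample buffer, one call leaves 4096 samples in the queue and 4096 zeros in the buffer, matching the example above.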

In some optional embodiments, before the currently rendered audio playing data of the audio track is read, through the rendering callback function in the audio track mixing interface, according to the storage location of that data in the memory of the system player, the method further includes: configuring the rendering callback function, associating the configured rendering callback function with an audio processing object in the audio track mixing interface, and associating the audio processing object with the resource management object of the video file. By associating the configured rendering callback function with the audio processing object in the audio track mixing interface and associating the audio processing object with the resource management object of the video file, the mixing operation on the audio track associated with the audio track object of the video file can be carried out automatically.

In step S103, audio reference data used for video echo cancellation processing of the video file is determined according to audio playing data corresponding to the audio track of the video file.

In this embodiment, the audio reference data may be audio sampling points of an audio data frame used for an echo cancellation process.

In some optional embodiments, when determining the audio reference data for the video echo cancellation processing of the video file according to the audio playing data corresponding to the audio track of the video file, the audio reference determination interface of the system player determines the audio reference data for the video echo cancellation processing of the video file according to the audio playing data corresponding to the audio track of the video file. When the operating system of the terminal device is an iOS system, the audio reference determination interface of the system player may be an AudioUnit. Therefore, the audio reference data used for the video echo elimination processing of the video file can be determined quickly and effectively through the audio reference determination interface of the system player.

In some optional embodiments, when audio reference data used for video echo cancellation processing of the video file is determined according to audio playing data corresponding to an audio track of the video file through an audio reference determination interface of the system player, the audio playing data in the audio data queue is stored in a memory corresponding to the audio reference determination interface through an input callback function of the audio reference determination interface and according to a storage position of the memory corresponding to the audio reference determination interface; and reading audio playing data corresponding to the audio track according to the storage position of the memory corresponding to the audio reference determination interface through the audio reference determination interface, and determining the read audio playing data corresponding to the audio track as the audio reference data. Therefore, the audio playing data in the audio data queue can be quickly and effectively stored in the memory corresponding to the audio reference determination interface through the input callback function of the audio reference determination interface, so that preparation is made for the subsequent audio reference determination interface to determine the audio reference data. In addition, through the audio reference determination interface, the read audio playing data corresponding to the audio track can be quickly and effectively determined to be the audio reference data.

In a specific example, if the audio playing data stored in the audio data queue by the rendering callback function each time includes 4096 audio sampling points and each audio data frame includes 1024 audio sampling points (the sampling rate of the audio playing data is 48 kHz, and the audio buffer of the system player is set to 0.02 s), the input callback function stores the audio playing data in the audio data queue into the memory corresponding to the audio reference determination interface in units of the 1024 audio sampling points included in each audio data frame. Specifically, since the 4096 audio sampling points stored into the audio data queue by the rendering callback function each time are an integer multiple (four times) of the 1024 audio sampling points included in an audio data frame, the input callback function can store the audio playing data in the audio data queue into the memory corresponding to the audio reference determination interface with no samples left over.

If the audio playing data stored in the audio data queue by the rendering callback function each time includes 4096 audio sampling points and each audio data frame includes 940 audio sampling points (the sampling rate of the audio playing data is 44.1 kHz, and the audio buffer of the system player is set to 0.02 s), the input callback function stores the audio playing data in the audio data queue into the memory corresponding to the audio reference determination interface in units of the 940 audio sampling points included in each audio data frame. Because 4096 is not an integer multiple of 940 (4096 = 4 × 940 + 336), the input callback function needs to form one audio data frame from the last 336 consecutive audio sampling points of the current batch stored in the audio data queue by the rendering callback function and the first 604 consecutive audio sampling points of the next batch stored by the rendering callback function, and store that audio data frame into the memory corresponding to the audio reference determination interface. If the rendering callback function has stored audio playing data into the audio data queue for the last time, the input callback function needs to form an audio data frame from the remaining 336 consecutive audio sampling points followed by 604 zero-valued padding sampling points, and store that frame into the memory corresponding to the audio reference determination interface.
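The two cases above, an exact multiple (4096 / 1024) and a remainder (4096 = 4 × 940 + 336), can be sketched with one re-framing helper. This is a minimal Python illustration: `pop_frames` and its `final` flag are hypothetical names, and the audio data queue is modelled as a `collections.deque` of float samples.

```python
from collections import deque
from typing import Deque, List


def pop_frames(queue: Deque[float], frame_size: int, final: bool = False) -> List[List[float]]:
    """Take as many whole frames as the queue holds, in FIFO order.

    Leftover samples stay in the queue so the next batch can complete the
    frame. When `final` is True, the leftover is zero-padded to a full frame.
    """
    frames = []
    while len(queue) >= frame_size:
        frames.append([queue.popleft() for _ in range(frame_size)])
    if final and queue:
        tail = list(queue)
        queue.clear()
        frames.append(tail + [0.0] * (frame_size - len(tail)))
    return frames
```

For a 4096-sample batch and a frame size of 940, the first call yields four frames and leaves 336 samples queued. A later batch supplies the next 604 samples, or, if no further batch arrives, a call with `final=True` pads the frame with 604 zeros.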

In some optional embodiments, before the audio reference data for the video echo cancellation processing of the video file is determined, through the audio reference determination interface of the system player, according to the audio playing data corresponding to the audio track of the video file, the method further includes: setting the working mode of the audio reference determination interface to an audio reference determination mode, and setting the audio stream format of the audio reference determination interface to be the same as the audio stream format of the video file. The audio stream format includes the audio sampling rate, the number of audio data frames contained in each audio data packet, the number of audio channels contained in each audio data frame, the number of bits contained in each audio channel, the number of bytes contained in each audio data frame, the number of bytes contained in each audio data packet, whether the audio data is stored in an interleaved manner, and the like. Setting the working mode of the audio reference determination interface to the audio reference determination mode prepares the interface for subsequently determining the audio reference data. In addition, setting the audio stream format of the audio reference determination interface to be the same as the audio stream format of the video file enables the audio reference determination interface to transmit the audio playing data stored in its corresponding memory to the audio playing device of the terminal device for audio playing.
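The format fields listed above can be modelled as a small record, which also shows how the byte counts follow from the other fields. This is a hedged Python sketch, not the platform's format structure: `AudioStreamFormat` and `formats_match` are hypothetical names, and the rule that a non-interleaved frame covers a single channel is an assumption made for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioStreamFormat:
    """Hypothetical record of the audio stream format fields named in the text."""
    sample_rate: float
    frames_per_packet: int
    channels_per_frame: int
    bits_per_channel: int
    interleaved: bool

    @property
    def bytes_per_frame(self) -> int:
        bytes_per_sample = self.bits_per_channel // 8
        # Assumption: interleaved frames hold one sample per channel, while
        # non-interleaved buffers hold one channel each.
        channels = self.channels_per_frame if self.interleaved else 1
        return bytes_per_sample * channels

    @property
    def bytes_per_packet(self) -> int:
        return self.bytes_per_frame * self.frames_per_packet


def formats_match(a: AudioStreamFormat, b: AudioStreamFormat) -> bool:
    """True when both formats agree on every field."""
    return a == b
```

Comparing every field before wiring the two sides together reflects the requirement that the interface use the same format as the video file's audio.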

In some optional embodiments, the method further comprises: transmitting, through the audio reference determination interface, the audio playing data stored in the memory corresponding to the audio reference determination interface to an audio playing device of the terminal device for audio playing. The audio playing device may be a loudspeaker. In this way, the audio playing data stored in the memory corresponding to the audio reference determination interface can be effectively transmitted to the audio playing device of the terminal device for audio playing.

In step S104, using the audio reference data, a video echo cancellation process is performed on the audio collecting data obtained by the audio collecting device in the terminal device, so as to cancel the audio playing data in the audio collecting data.

In this embodiment, the terminal device may be a mobile phone, a tablet computer, a notebook computer, or the like. The audio acquisition device may be a single microphone, a unidirectional microphone array, an omnidirectional microphone array, or the like. The video echo cancellation processing may be Acoustic Echo Cancellation (AEC), Line Echo Cancellation (LEC), or the like. Taking AEC as an example, the audio reference data is used as the reference signal and the audio acquisition data is used as the input signal, and AEC processing is then performed to eliminate the audio playing data from the audio acquisition data.
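The roles of the reference signal and the input signal in AEC can be illustrated with a small adaptive filter. The text does not specify the AEC algorithm, so the sketch below substitutes a normalized least-mean-squares (NLMS) filter, a common textbook choice. The function name `nlms_echo_cancel` and the parameters `taps`, `mu`, and `eps` are assumptions made for illustration only.

```python
from typing import List, Sequence


def nlms_echo_cancel(mic: Sequence[float], ref: Sequence[float],
                     taps: int = 16, mu: float = 0.5, eps: float = 1e-8) -> List[float]:
    """Subtract an adaptive estimate of the echo of `ref` from `mic`.

    mic: audio acquisition data (near-end speech plus echo of the playback).
    ref: audio reference data (the samples that were sent to the loudspeaker).
    Returns the residual signal after the estimated echo has been removed.
    """
    weights = [0.0] * taps      # adaptive estimate of the echo path
    history = [0.0] * taps      # most recent reference samples, newest first
    residual = []
    for d, x in zip(mic, ref):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        e = d - echo_estimate   # what remains after removing the estimated echo
        power = sum(h * h for h in history) + eps
        step = mu * e / power   # normalized step size
        weights = [w + step * h for w, h in zip(weights, history)]
        residual.append(e)
    return residual
```

The filter only has something to adapt to because the exact played samples are available as `ref`, which is why the method goes to the trouble of capturing the audio playing data from the system player.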

In a specific example, as shown in fig. 3, a specific implementation process of the method for eliminating video echo provided by this embodiment is as follows:

initializing a resource management object of the video file to be played according to the resource data of the video file to be played through an initialization interface in the system player;

calling the resource management object through a video playing interface in the system player to play the video file, and monitoring the playing state of the video file through a playing state monitoring interface in the system player;

when the playing state monitoring interface monitors that the playing state of the video file is a ready-to-play state, traversing the media track objects of the video file to obtain the audio track object of the video file;

configuring a rendering callback function, associating the configured rendering callback function with an audio processing object in the audio track mixing interface, and associating the audio processing object with the resource management object of the video file;

reading, through the rendering callback function, the currently rendered audio playing data of the audio track according to the storage location of that data in the memory of the system player;

storing, through the rendering callback function, the read currently rendered audio playing data of the audio track into an audio data queue so as to obtain the audio playing data corresponding to the audio track, and setting the currently rendered audio playing data of the audio track stored at the storage location to zero;

setting the working mode of an audio reference determination interface to an audio reference determination mode, and setting the audio stream format of the audio reference determination interface to be the same as the audio stream format of the video file;

storing, through the input callback function of the audio reference determination interface, the audio playing data in the audio data queue into the memory corresponding to the audio reference determination interface according to the storage location of that memory; and

reading, through the audio reference determination interface, the audio playing data corresponding to the audio track according to the storage location of the memory corresponding to the audio reference determination interface, and determining the read audio playing data corresponding to the audio track as the audio reference data.

In addition, the audio playing data stored in the memory corresponding to the audio reference determination interface can be transmitted, through the audio reference determination interface, to an audio playing device of the terminal device for audio playing.

The method for eliminating video echo provided by the embodiment of the invention is applied to the system player used for video playback on the terminal device. The video application of the terminal device therefore does not need a custom player or a third-party player to eliminate video echo; it uses the system player directly, so the download package size of the video application is not affected.

In addition, the method for eliminating video echo provided by the embodiment of the present invention performs a mixing operation on the audio track associated with the obtained audio track object of the video file to be played, determines the audio reference data used for video echo cancellation processing according to the audio playing data corresponding to the audio track of the video file obtained by the mixing operation, and uses the audio reference data to perform video echo cancellation processing on the audio acquisition data obtained by the audio acquisition device in the terminal device. The audio playing data in the audio acquisition data can thus be eliminated simply, that is, video echo can be eliminated simply.

The method for eliminating video echo provided by the present embodiment may be executed by any suitable device with data processing capability, including but not limited to: cameras, terminals, mobile terminals, PCs, servers, in-vehicle devices, entertainment devices, advertising devices, Personal Digital Assistants (PDAs), tablet computers, notebook computers, handheld game consoles, smart glasses, smart watches, wearable devices, virtual display devices or display enhancement devices (such as Google Glass, Oculus Rift, Hololens, Gear VR), and the like.

Example two

Optionally, the instructions for acquiring an audio track object of a video file to be played in the system player include: instructions for initializing a resource management object of the video file to be played according to the resource data of the video file to be played; instructions for calling the resource management object to play the video file and monitoring the playing state of the video file; and instructions for traversing the media track objects of the video file to obtain the audio track object of the video file when the monitored playing state of the video file is a ready-to-play state.
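The three sub-steps above can be sketched as a small state-observer model. This is only an illustration: `ResourceManager`, `MediaTrack`, `PlayState` and the callback names are hypothetical stand-ins, not the system player's real classes.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class PlayState(Enum):
    UNKNOWN = auto()
    READY_TO_PLAY = auto()
    FAILED = auto()


@dataclass
class MediaTrack:
    track_id: int
    media_type: str  # "audio" or "video"


@dataclass
class ResourceManager:
    """Hypothetical resource management object for one video file."""
    tracks: list
    state: PlayState = PlayState.UNKNOWN
    observers: list = field(default_factory=list)

    def observe_state(self, callback):
        self.observers.append(callback)

    def set_state(self, state):
        # Notify every observer that the playing state changed.
        self.state = state
        for callback in self.observers:
            callback(self, state)


found_audio_tracks = []


def on_state_change(manager, state):
    # Traverse the media tracks only once the file is ready to play.
    if state is PlayState.READY_TO_PLAY:
        found_audio_tracks.extend(
            t for t in manager.tracks if t.media_type == "audio"
        )


manager = ResourceManager(tracks=[MediaTrack(1, "video"), MediaTrack(2, "audio")])
manager.observe_state(on_state_change)
manager.set_state(PlayState.READY_TO_PLAY)
```

Waiting for the ready-to-play state before traversing matters in this model because the track list is only assumed to be complete at that point.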

Optionally, the instructions for performing a mixing operation on the audio track associated with the audio track object of the video file to obtain audio playing data corresponding to the audio track of the video file include: instructions for performing the mixing operation on the audio track associated with the audio track object of the video file through an audio track mixing interface of the system player to obtain the audio playing data corresponding to the audio track of the video file.

Optionally, the instructions for performing, through an audio track mixing interface of the system player, a mixing operation on the audio track associated with the audio track object of the video file to obtain audio playing data corresponding to the audio track of the video file include: instructions for reading, through a rendering callback function in the audio track mixing interface, the currently rendered audio playing data of the audio track according to the storage location of that data in a memory of the system player; and instructions for storing, through the rendering callback function, the read currently rendered audio playing data of the audio track into an audio data queue to obtain the audio playing data corresponding to the audio track, and for setting the currently rendered audio playing data of the audio track stored at the storage location to zero.
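A minimal sketch of such a rendering callback follows, assuming the rendered audio is exposed as a mutable buffer of float samples. The names `render_callback`, `render_buffer` and `audio_data_queue` are hypothetical, and a plain Python list stands in for the player's memory.

```python
from collections import deque

# Queue that hands rendered samples over to the reference-determination stage.
audio_data_queue = deque()


def render_callback(render_buffer):
    """Copy the currently rendered samples into the queue, then silence them."""
    audio_data_queue.extend(render_buffer)   # keep a copy for later use
    for i in range(len(render_buffer)):
        render_buffer[i] = 0.0               # set the stored data to zero


render_buffer = [0.25, -0.5, 0.75]
render_callback(render_buffer)
```

Zeroing the buffer presumably keeps the player's own output path silent, so that the audio is heard only once, when it is later played through the audio reference determination interface.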

Optionally, the readable program further comprises: instructions for, before the currently rendered audio playing data of the audio track is read through the rendering callback function in the audio track mixing interface according to the storage location of that data in the memory of the system player, configuring the rendering callback function, associating the configured rendering callback function with an audio processing object in the audio track mixing interface, and associating the audio processing object with the resource management object of the video file.

Optionally, the instructions for determining, according to audio playing data corresponding to the audio track of the video file, audio reference data used for video echo cancellation processing of the video file include: instructions for determining, through an audio reference determination interface of the system player, the audio reference data for video echo cancellation processing of the video file according to the audio playing data corresponding to the audio track of the video file.

Optionally, the instructions for determining, through an audio reference determination interface of the system player, audio reference data used for video echo cancellation processing of the video file according to audio playing data corresponding to the audio track of the video file include: instructions for storing, through an input callback function of the audio reference determination interface, the audio playing data in the audio data queue into the memory corresponding to the audio reference determination interface according to the storage location of that memory; and instructions for reading, through the audio reference determination interface, the audio playing data corresponding to the audio track according to the storage location of the memory corresponding to the audio reference determination interface, and determining the read audio playing data corresponding to the audio track as the audio reference data.
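The hand-off from the audio data queue to the audio reference determination interface might look like the sketch below. The function names, the fixed-size `interface_memory` list and the zero-padding on underrun are assumptions made for illustration; the embodiment does not describe underrun handling.

```python
from collections import deque


def input_callback(audio_data_queue, interface_memory):
    """Fill the interface's memory from the queue; return the samples copied."""
    copied = 0
    while copied < len(interface_memory) and audio_data_queue:
        interface_memory[copied] = audio_data_queue.popleft()
        copied += 1
    # Assumption: pad with silence if the queue ran dry.
    for i in range(copied, len(interface_memory)):
        interface_memory[i] = 0.0
    return copied


def read_reference(interface_memory):
    """Read back what was stored and treat it as the audio reference data."""
    return list(interface_memory)


audio_data_queue = deque([0.1, 0.2, 0.3, 0.4, 0.5])
interface_memory = [0.0] * 4
copied = input_callback(audio_data_queue, interface_memory)
audio_reference = read_reference(interface_memory)
```

Because the same memory holds both what is played and what is used as the reference, the reference in this model matches the audio that actually reaches the audio playing device.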

Optionally, the readable program further comprises: instructions for, before the audio reference data for video echo cancellation processing of the video file is determined through the audio reference determination interface of the system player according to the audio playing data corresponding to the audio track of the video file, setting the working mode of the audio reference determination interface to an audio reference determination mode and setting the audio streaming media format of the audio reference determination interface to be the same as the audio streaming media format of the video file.

Optionally, the readable program further comprises: instructions for transmitting, through the audio reference determination interface, the audio playing data stored in the memory corresponding to the audio reference determination interface to an audio playing device of the terminal equipment for audio playing.

With the computer storage medium provided by the embodiment of the application, the stored readable program is applied to the system player used by the terminal device for video playing, so that the video application program of the terminal device does not need to use a custom player or a third-party player to perform video echo cancellation, but directly uses the system player to perform video echo cancellation, and therefore the size of the download packet of the video application program is not affected.

In addition, through the computer readable medium provided by the embodiment of the application, the audio mixing operation is performed on the audio track associated with the audio track object of the obtained video file to be played, the audio reference data used for the video echo cancellation processing is determined according to the audio playing data corresponding to the audio track of the video file obtained through the audio mixing operation, and then the audio reference data is used for performing the video echo cancellation processing on the audio collecting data obtained by the audio collecting device in the terminal equipment, so that the audio playing data in the audio collecting data can be simply and conveniently eliminated, that is, the video echo can be simply and conveniently eliminated.

It should be noted that, according to the implementation requirement, each component/step described in the embodiment of the present invention may be divided into more components/steps, and two or more components/steps or partial operations of the components/steps may also be combined into a new component/step to achieve the purpose of the embodiment of the present invention.

The above-described method according to an embodiment of the present invention may be implemented in hardware or firmware, or as software or computer code storable in a recording medium such as a CD-ROM, a RAM, a floppy disk, a hard disk, or a magneto-optical disk, or as computer code originally stored in a remote recording medium or a non-transitory machine-readable medium, downloaded through a network, and to be stored in a local recording medium, so that the method described herein may be processed by such software stored on a recording medium using a general-purpose computer, a dedicated processor, or programmable or dedicated hardware such as an ASIC or FPGA. It will be appreciated that the computer, processor, microprocessor controller, or programmable hardware includes memory components (e.g., RAM, ROM, flash memory, etc.) that can store or receive software or computer code that, when accessed and executed by the computer, processor, or hardware, implements the method for eliminating video echo described herein. Further, when a general-purpose computer accesses code for implementing the method for eliminating video echo shown herein, execution of the code transforms the general-purpose computer into a special-purpose computer for performing the method for eliminating video echo shown herein.

Those of ordinary skill in the art will appreciate that the various illustrative elements and method steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present embodiments.

The above embodiments are only for illustrating the embodiments of the present invention and not for limiting the embodiments of the present invention, and those skilled in the art can make various changes and modifications without departing from the spirit and scope of the embodiments of the present invention, so that all equivalent technical solutions also belong to the scope of the embodiments of the present invention, and the scope of patent protection of the embodiments of the present invention should be defined by the claims.

Claims

1. A method for eliminating video echo, applied to a system player for video playing of a terminal device, the method comprising the following steps:

acquiring an audio track object of a video file to be played in the system player;

performing sound mixing operation on the audio track associated with the audio track object of the video file to obtain audio playing data corresponding to the audio track of the video file;

determining audio reference data for video echo elimination processing of the video file according to audio playing data corresponding to the audio track of the video file;

and carrying out video echo elimination processing on audio acquisition data obtained by an audio acquisition device in the terminal equipment by using the audio reference data so as to eliminate the audio playing data in the audio acquisition data.

2. The method for eliminating video echo according to claim 1, wherein the acquiring an audio track object of a video file to be played in the system player comprises:

initializing a resource management object of the video file to be played according to the resource data of the video file to be played;

calling the resource management object to play the video file and monitoring the playing state of the video file;

and traversing the media track object of the video file to obtain the audio track object of the video file when the monitored playing state of the video file is the ready-to-play state.

3. The method for removing video echo according to claim 1, wherein the mixing the audio track associated with the audio track object of the video file to obtain audio playing data corresponding to the audio track of the video file comprises:

and performing sound mixing operation on the audio track associated with the audio track object of the video file through an audio track sound mixing interface of the system player to obtain audio playing data corresponding to the audio track of the video file.

4. The method for removing video echo according to claim 3, wherein the mixing, through a soundtrack mixing interface of the system player, a soundtrack associated with a soundtrack object of the video file to obtain audio playing data corresponding to the soundtrack of the video file comprises:

reading the audio playing data currently rendered by the audio track according to the storage position of the audio playing data currently rendered by the audio track in the memory of the system player through a rendering callback function in the audio track mixing interface;

and storing the read audio playing data currently rendered of the audio track into an audio data queue through the rendering callback function so as to obtain the audio playing data corresponding to the audio track, and setting the audio playing data currently rendered of the audio track stored in the storage position to zero.

5. The method of claim 4, wherein before reading the currently rendered audio playback data of the audio track according to the storage location of the currently rendered audio playback data of the audio track in the memory of the system player through a rendering callback function in the soundtrack mixing interface, the method further comprises:

configuring the rendering callback function, associating the configured rendering callback function with an audio processing object in the audio track mixing interface, and associating the audio processing object with a resource management object of the video file.

6. The method of claim 4, wherein the determining audio reference data for video echo cancellation processing of the video file according to audio playing data corresponding to the audio track of the video file comprises:

and determining audio reference data for video echo elimination processing of the video file according to audio playing data corresponding to the audio track of the video file through an audio reference determination interface of the system player.

7. The method of claim 6, wherein the determining the audio reference data for the video echo cancellation process of the video file according to the audio playing data corresponding to the audio track of the video file through the audio reference determination interface of the system player comprises:

through the input callback function of the audio reference determination interface, according to the storage position of the memory corresponding to the audio reference determination interface, storing the audio playing data in the audio data queue into the memory corresponding to the audio reference determination interface;

and reading audio playing data corresponding to the audio track according to the storage position of the memory corresponding to the audio reference determination interface through the audio reference determination interface, and determining the read audio playing data corresponding to the audio track as the audio reference data.

8. The method of claim 6, wherein before determining the audio reference data for the video echo cancellation process of the video file according to the audio playing data corresponding to the audio track of the video file through the audio reference determination interface of the system player, the method further comprises:

setting the working mode of the audio reference determination interface to be an audio reference determination mode, and setting the audio streaming media format of the audio reference determination interface to be the same as the audio streaming media format of the video file.

9. The method of canceling video echo according to claim 8, wherein said method further comprises:

and transmitting the audio playing data stored in the memory corresponding to the audio reference determination interface to an audio playing device of the terminal equipment for audio playing through the audio reference determination interface.

10. A computer storage medium, wherein the computer storage medium stores a readable program, the readable program being applied to a system player for video playing of a terminal device, the readable program comprising:

instructions for obtaining an audio track object of a video file to be played in the system player;

instructions for performing a mixing operation on a sound track associated with a sound track object of the video file to obtain audio playing data corresponding to the sound track of the video file;

instructions for determining audio reference data for video echo cancellation processing of the video file according to audio play data corresponding to a sound track of the video file;

and instructions for performing video echo elimination processing on the audio acquisition data obtained by the audio acquisition device in the terminal equipment by using the audio reference data so as to eliminate the audio playing data in the audio acquisition data.