CN106303357A - Video call method and system with far-field speech enhancement - Google Patents
- Publication number
- CN106303357A CN201610770495.5A
- Authority
- CN
- China
- Prior art keywords
- video
- module
- sound
- filtering
- call
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/14—Systems for two-way working
- H04N7/141—Systems for two-way working between two video terminals, e.g. videophone
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/47—End-user applications
- H04N21/478—Supplemental services, e.g. displaying phone caller identification, shopping application
- H04N21/4788—Supplemental services, e.g. displaying phone caller identification, shopping application communicating with other users, e.g. chatting
Abstract
The present invention provides a video call system with far-field speech enhancement. The system comprises at least two video call terminals, a multi-noise filtering processing engine, and a multi-noise filtering API management server. The multi-noise filtering processing engine and the multi-noise filtering API management server are connected to the two video call terminals through a communication network. When the video call terminal at one end conducts a far-field video call, the caller's far-field voice and multiple noise sources are received and recorded by the video call terminal at the same time. The multi-noise filtering processing engine filters the far-field sound and the multiple noise sources. The multi-noise filtering API management server suppresses the multiple noise sources to enhance the voice of the main speaker, and the video call terminal at the other end then receives the processed voice of the main speaker. The invention improves the voice quality of long-distance video calls.
Description
Technical field
The present invention relates to the field of set-top box technology, and in particular to a video call method and system with far-field speech enhancement.
Background art
Far-field voice calling means calling from a distance, typically with the speaker 3 to 5 meters from the microphone. Because of interference such as noise and/or reverberation, voice quality during such a video call is very poor. In practice, far-field voice communication involves the following noise sources:
(1) Reverberation noise: when a sound wave propagates indoors, it is reflected and absorbed by obstacles before finally dying away. After the source stops emitting, listeners still perceive a mixture of several sound waves for a short period; this period is the reverberation time. The length of the reverberation time is an important acoustic property of buildings such as concert halls, theaters, and auditoriums.
(2) Background noise: the general term for all noise other than the object of study.
(3) Voice interference: ambient human voices, that is, voices that do not belong to the object of study.
(4) Echo noise: when a sound wave meets a large reflecting surface during propagation (such as the wall of a building or a mountainside), it is reflected at the interface; a reflected wave that can be distinguished from the original sound is called an echo.
In summary, during a far-field video call, multiple kinds of noise must be filtered out of the far-field voice to obtain a clean, clear voice signal of the call participant.
The prior art includes Chinese patent application No. 201310066421.X, titled "Speech enhancement processing method and device". Its embodiments provide a speech enhancement processing method and device. The method comprises: decoding a bit stream to obtain the coding parameters of the speech subframe currently to be processed, the coding parameters including a first algebraic codebook gain and a first adaptive codebook gain; adjusting the first algebraic codebook gain to obtain a second algebraic codebook gain; determining a second adaptive codebook gain from the first adaptive codebook gain and the second algebraic codebook gain; and replacing the bits in the bit stream that correspond to the first algebraic codebook gain and the first adaptive codebook gain with the quantization indices of the second algebraic codebook gain and the second adaptive codebook gain. That technical solution can effectively improve noise removal and voice call quality. However, its technical approach is entirely different from that of the present application.
The prior art also discloses "A call system and method based on wireless-positioning microphone array speech enhancement"; see Chinese patent application No. 201310513373.4. That system comprises a wireless positioning transmitter module, a wireless positioning receiver module, a microphone array speech receiving module, a speech enhancement module, a far-end speech playback module, and a communication module. The wireless positioning transmitter module and the wireless positioning receiver module are connected wirelessly; the wireless positioning receiver module and the microphone array speech receiving module are each connected to the speech enhancement module; the speech enhancement module is connected to the communication module; and the far-end speech playback module is connected to the communication module. The call method first locates the target sound source using wireless positioning, then applies microphone array speech enhancement to the target speaker's voice and transmits it. That invention offers fast and accurate positioning, good enhancement, and high robustness, and can effectively improve the voice quality of existing call systems. The cited patent focuses mainly on sound source localization with a microphone array and on directional speech enhancement, whereas the present application focuses on enhancing the main speaker's voice and suppressing multiple noise sources during a far-field video call.
Summary of the invention
The first technical problem to be solved by the present invention is to provide a video call system with far-field speech enhancement that uses a multi-noise filtering engine to suppress multiple noise sources and enhance the main speaker's voice, thereby improving the quality of far-field video calls.
The first problem is solved as follows: a video call system with far-field speech enhancement, the system comprising at least two video call terminals, a multi-noise filtering processing engine, and a multi-noise filtering API management server; the multi-noise filtering processing engine and the multi-noise filtering API management server are connected to the two video call terminals through a communication network;
When the video call terminal at one end conducts a far-field video call, the caller's far-field voice and multiple noise sources are received and recorded by the video call terminal at the same time;
The multi-noise filtering processing engine filters the far-field sound and the multiple noise sources;
The multi-noise filtering API management server suppresses the multiple noise sources to enhance the main speaker's voice;
The video call terminal at the other end then receives the processed voice of the main speaker.
Further, the video call terminal is provided with a hardware driver, an operating system module, a video call middleware module, a microphone array recording module, an original sound enhancement module, a main voice and noise source separation module, a multi-noise filtering engine API, a main voice and noise source merging module, a video call audio/video packetization module, and a video call transmission module;
The hardware driver: the device includes internal or external hardware modules; a hardware driver is the driver software of a hardware module and is generally initialized during operating system initialization;
The operating system: a unified abstraction of the device hardware and hardware interfaces, and the base environment in which software runs;
The video call middleware module: a software package that provides the basic functions of video calling;
The microphone array recording module: a module that calls the operating system's microphone array interface to record sound;
The original sound enhancement module: calls an audio algorithm to enhance the recorded original sound, that is, to amplify the sound signal;
The main voice and noise source separation module: calls the multi-noise filtering engine API with the enhanced original sound as input, and outputs the main voice and the noise sources;
The multi-noise filtering engine API: takes the enhanced original sound as input and outputs the main voice and the noise sources; the multi-noise filtering engine API can be deployed on the local device or on a server;
The main voice and noise source merging module: after the main voice has been enhanced and the noise sources suppressed, combines the enhanced main voice and the attenuated noise sources into a single sound;
The video call audio/video packetization module: the video stream is encoded with H.264/H.265 and packed into a PES stream; the audio is encoded with AAC or AC3 and packed into a PES stream; the audio and video PES streams are then packed into a TS stream suitable for network transmission;
The video call transmission module: transmits the TS stream over the communication network according to the video call service logic.
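The last packetization step above (PES stream into TS stream) can be illustrated with a minimal Python sketch. It is not taken from the patent: the function name `packetize_pes` and its parameters are hypothetical, and the sketch only covers splitting one already-built PES packet into 188-byte MPEG-TS packets with adaptation-field stuffing. A real multiplexer would also emit PAT/PMT tables and PCR timestamps, which are omitted here.

```python
from typing import List

TS_PACKET_SIZE = 188   # fixed MPEG-TS packet length in bytes
TS_HEADER_SIZE = 4     # sync byte + 3 header bytes
SYNC_BYTE = 0x47


def packetize_pes(pes: bytes, pid: int, start_counter: int = 0) -> List[bytes]:
    """Split one PES packet into 188-byte TS packets (illustrative only)."""
    if not 0 <= pid <= 0x1FFF:
        raise ValueError("PID must fit in 13 bits")
    packets = []
    counter = start_counter & 0x0F
    max_payload = TS_PACKET_SIZE - TS_HEADER_SIZE  # 184 bytes
    offset = 0
    first = True
    while offset < len(pes):
        chunk = pes[offset:offset + max_payload]
        offset += len(chunk)
        # payload_unit_start_indicator is set only on the first TS packet.
        pusi = 0x40 if first else 0x00
        header = bytearray([SYNC_BYTE, pusi | (pid >> 8), pid & 0xFF])
        if len(chunk) == max_payload:
            # adaptation_field_control = 01: payload only.
            header.append(0x10 | counter)
            packet = bytes(header) + chunk
        else:
            # adaptation_field_control = 11: adaptation field, then payload.
            header.append(0x30 | counter)
            af_len = max_payload - len(chunk) - 1
            adaptation = bytes([af_len])
            if af_len > 0:
                # One flags byte (all zero) followed by 0xFF stuffing.
                adaptation += b"\x00" + b"\xff" * (af_len - 1)
            packet = bytes(header) + adaptation + chunk
        packets.append(packet)
        counter = (counter + 1) & 0x0F  # 4-bit continuity counter
        first = False
    return packets
```

For example, `packetize_pes(pes, pid=0x100)` on a 512-byte PES packet yields three TS packets, the last of which is padded through its adaptation field so that every packet is exactly 188 bytes.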
Further, the video call middleware module comprises an input device management module, an audio/video preprocessing module, an audio/video encoding module, an audio/video packetization module, and a network transmission module.
Further, in a video call with far-field speech enhancement, attention must be paid to the input and output of data at each module;
The far-field sound input includes the call voice, ambient noise, echo noise, reverberation noise, and noise from multiple other voices;
The microphone array recording module receives and records the far-field sound and outputs it as a digital signal;
The digitized far-field sound is input to the multi-noise filtering processing engine;
The multi-noise filtering processing engine accesses the multi-noise filtering API management server to obtain a multi-noise filtering engine API;
The multi-noise filtering API management server manages external multi-noise filtering engine APIs;
The multi-noise filtering processing engine calls the multi-noise filtering engine API to process the digitized far-field sound, yielding audio data in which the far-field voice is enhanced and the multiple noise sources are suppressed.
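The data flow above can be sketched as follows. This is only an illustration under stated assumptions: the class names `ApiManagementServer` and `FilterProcessingEngine`, their methods, and the placeholder filter `remove_dc_offset` are all hypothetical, and the placeholder merely subtracts the mean of the samples as a stand-in for a real multi-noise filter, which the patent does not specify.

```python
from typing import Callable, Dict, List

Samples = List[float]
EngineApi = Callable[[Samples], Samples]


def remove_dc_offset(samples: Samples) -> Samples:
    """Placeholder filter (assumption): subtract the mean of the samples."""
    if not samples:
        return []
    mean = sum(samples) / len(samples)
    return [s - mean for s in samples]


class ApiManagementServer:
    """Hypothetical registry standing in for the API management server."""

    def __init__(self) -> None:
        self._apis: Dict[str, EngineApi] = {}

    def register(self, name: str, api: EngineApi) -> None:
        self._apis[name] = api

    def get_engine_api(self, name: str) -> EngineApi:
        return self._apis[name]


class FilterProcessingEngine:
    """Obtains an engine API from the server, then applies it to the audio."""

    def __init__(self, server: ApiManagementServer, api_name: str) -> None:
        self._server = server
        self._api_name = api_name

    def process(self, digitized: Samples) -> Samples:
        api = self._server.get_engine_api(self._api_name)
        return api(digitized)


server = ApiManagementServer()
server.register("demo", remove_dc_offset)
engine = FilterProcessingEngine(server, "demo")
filtered = engine.process([1.0, 2.0, 3.0])
```

Looking the API up on every call, rather than caching it in the engine, is one way to let the server swap implementations between calls; whether the patent intends this is not stated.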
Further, the multi-noise filtering API management server has the following main functions: maintaining the multi-noise filtering engine APIs; managing external multi-noise filtering engine APIs; maintaining the adaptation of external multi-noise filtering engine APIs to the internal multi-noise filtering engine API; maintaining the update policy of the multi-noise filtering engine APIs; maintaining the management policy for external multi-noise filtering engine APIs; and auditing the service quality of the multi-noise filtering engine APIs.
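Two of these functions, adapting an external API to the internal interface and auditing service quality, might look like the sketch below. Everything in it is an assumption made for illustration: the request and response shape of `external_vendor_api`, the internal signature (samples in, a pair of main voice and noise out), and the use of a failure-rate threshold as the audit criterion. The patent does not define any of these.

```python
from typing import Callable, Dict, List, Tuple

Samples = List[float]
InternalApi = Callable[[Samples], Tuple[Samples, Samples]]


def external_vendor_api(request: dict) -> dict:
    """Hypothetical third-party API with its own request/response shape."""
    pcm = request["pcm"]
    return {"voice": list(pcm), "noise": [0.0 for _ in pcm]}


def adapt_external(api: Callable[[dict], dict]) -> InternalApi:
    """Wrap an external API so it matches the internal (voice, noise) form."""
    def internal(samples: Samples) -> Tuple[Samples, Samples]:
        response = api({"pcm": samples})
        return response["voice"], response["noise"]
    return internal


class ApiManagementServer:
    """Registers adapted APIs and audits them by failure rate (assumption)."""

    def __init__(self, max_failure_rate: float = 0.2) -> None:
        self._apis: Dict[str, InternalApi] = {}
        self._calls: Dict[str, int] = {}
        self._failures: Dict[str, int] = {}
        self._max_failure_rate = max_failure_rate

    def register(self, name: str, api: InternalApi) -> None:
        self._apis[name] = api
        self._calls[name] = 0
        self._failures[name] = 0

    def call(self, name: str, samples: Samples) -> Tuple[Samples, Samples]:
        self._calls[name] += 1
        try:
            return self._apis[name](samples)
        except Exception:
            self._failures[name] += 1
            raise

    def audit(self) -> List[str]:
        """Return the names of APIs whose failure rate exceeds the limit."""
        failing = []
        for name, calls in self._calls.items():
            if calls and self._failures[name] / calls > self._max_failure_rate:
                failing.append(name)
        return failing
```

A management policy could then, for instance, stop handing out any API that `audit()` reports; that follow-up step is left out of the sketch.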
Further, a video call with far-field speech enhancement proceeds as follows: the microphone array recording module of the video call terminal at one end receives the voice of the far-field video call participant together with the associated noise sources; the video call terminal amplifies the raw audio data through the original sound enhancement module and then hands it to a local or online multi-noise filtering processing engine; the local or online multi-noise filtering processing engine first separates the main speaker's voice from the multiple noise sources through the main voice and noise source separation module; it then enhances the main speaker's voice and suppresses the multiple noise sources through the multi-noise filtering engine API; next, the main voice and noise source merging module merges the enhanced main voice with the suppressed noise sources and returns the result to the video call terminal; the video call terminal packs the video data and the processed audio data into network packets suitable for network transmission through the video call audio/video packetization module, and the video call transmission module of the video call middleware module transmits the audio/video data to the video call terminal at the other end.
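The audio part of this operation (amplify, separate, enhance and suppress, merge) can be sketched end to end in Python. The patent does not disclose the separation algorithm, so the sketch substitutes a deliberately crude amplitude gate: samples at or above a threshold are treated as main voice and the rest as noise. That gate, the function names, and all default gain values are assumptions chosen only to make the four stages concrete.

```python
from typing import List, Tuple

Samples = List[float]


def amplify(samples: Samples, gain: float) -> Samples:
    """Original sound enhancement: plain amplification."""
    return [s * gain for s in samples]


def separate(samples: Samples, threshold: float) -> Tuple[Samples, Samples]:
    """Toy separation (assumption): loud samples are voice, quiet are noise."""
    voice = [s if abs(s) >= threshold else 0.0 for s in samples]
    noise = [s if abs(s) < threshold else 0.0 for s in samples]
    return voice, noise


def enhance_and_suppress(voice: Samples, noise: Samples,
                         voice_gain: float, noise_gain: float
                         ) -> Tuple[Samples, Samples]:
    """Boost the main voice and attenuate the noise sources."""
    return ([v * voice_gain for v in voice],
            [n * noise_gain for n in noise])


def merge(voice: Samples, noise: Samples) -> Samples:
    """Combine the enhanced voice and attenuated noise into one signal."""
    return [v + n for v, n in zip(voice, noise)]


def process_far_field(samples: Samples, pre_gain: float = 2.0,
                      threshold: float = 0.5, voice_gain: float = 1.5,
                      noise_gain: float = 0.25) -> Samples:
    """Run the four stages in the order described in the text."""
    amplified = amplify(samples, pre_gain)
    voice, noise = separate(amplified, threshold)
    voice, noise = enhance_and_suppress(voice, noise, voice_gain, noise_gain)
    return merge(voice, noise)
```

Because the attenuated noise is mixed back in rather than discarded, the output keeps a low level of ambience, which matches the merging step as the text describes it.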
Further, the video call terminal comprises one or more processors, a memory, one or more storage devices, a power supply, one or more connectors, a network interface, and a microphone array; the video call terminal also comprises an operating system, which contains modules or applications that can run on the one or more processors; the video call terminal may comprise a standby wake-up module; the processor, memory, storage devices, power supply, connectors, network interface, and microphone array are interconnected by internal component communication;
The one or more processors are configured to perform functions or process instructions in the video call terminal; they can process instructions stored in the memory or the storage devices; these instructions can be used to operate hardware modules to complete specific functions or processes;
The memory is internal storage that exchanges data directly with the CPU; the contents of its storage cells can be read or written at random on demand, and the access speed is independent of the position of the storage cell.
The second technical problem to be solved by the present invention is to provide a video call method with far-field speech enhancement that uses a multi-noise filtering engine to suppress multiple noise sources and enhance the main speaker's voice, thereby improving the quality of far-field video calls.
The second problem is solved as follows: a video call method with far-field speech enhancement, the method requiring at least two video call terminals, a multi-noise filtering processing engine, and a multi-noise filtering API management server;
The method is specifically as follows: when the video call terminal at one end conducts a far-field video call, the caller's far-field voice and multiple noise sources are received and recorded by the video call terminal at the same time; the multi-noise filtering processing engine filters the far-field sound and the multiple noise sources; the multi-noise filtering API management server then suppresses the multiple noise sources to enhance the main speaker's voice, and the processed voice of the main speaker is sent to the video call terminal at the other end.
Further, the video call terminal is provided with a hardware driver, an operating system module, a video call middleware module, a microphone array recording module, an original sound enhancement module, a main voice and noise source separation module, a multi-noise filtering engine API, a main voice and noise source merging module, a video call audio/video packetization module, and a video call transmission module;
The hardware driver: the device includes internal or external hardware modules; a hardware driver is the driver software of a hardware module and is generally initialized during operating system initialization;
The operating system: a unified abstraction of the device hardware and hardware interfaces, and the base environment in which software runs;
The video call middleware module: a software package that provides the basic functions of video calling;
The microphone array recording module: a module that calls the operating system's microphone array interface to record sound;
The original sound enhancement module: calls an audio algorithm to enhance the recorded original sound, that is, to amplify the sound signal;
The main voice and noise source separation module: calls the multi-noise filtering engine API with the enhanced original sound as input, and outputs the main voice and the noise sources;
The multi-noise filtering engine API: takes the enhanced original sound as input and outputs the main voice and the noise sources; the multi-noise filtering engine API can be deployed on the local device or on a server;
The main voice and noise source merging module: after the main voice has been enhanced and the noise sources suppressed, combines the enhanced main voice and the attenuated noise sources into a single sound;
The video call audio/video packetization module: the video stream is encoded with H.264/H.265 and packed into a PES stream; the audio is encoded with AAC or AC3 and packed into a PES stream; the audio and video PES streams are then packed into a TS stream suitable for network transmission;
The video call transmission module: transmits the TS stream over the communication network according to the video call service logic.
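The amplification performed by the original sound enhancement module can be made concrete with a small sketch. It assumes 16-bit signed PCM samples and a gain given in decibels; neither the sample format nor the name `amplify_pcm16` comes from the patent. The point of the example is the clipping step: amplified samples must be limited to the representable range, otherwise they would wrap around and distort.

```python
from typing import List

INT16_MIN, INT16_MAX = -32768, 32767


def amplify_pcm16(samples: List[int], gain_db: float) -> List[int]:
    """Amplify 16-bit PCM samples by gain_db decibels, clipping to int16."""
    gain = 10.0 ** (gain_db / 20.0)  # convert decibels to a linear factor
    out = []
    for s in samples:
        scaled = int(round(s * gain))
        # Hard-limit to the int16 range so loud samples saturate.
        out.append(max(INT16_MIN, min(INT16_MAX, scaled)))
    return out
```

With a gain of 20 dB (a linear factor of 10), a sample of 100 becomes 1000, while a sample of 4000 saturates at 32767.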
Further, the video call middleware module comprises an input device management module, an audio/video preprocessing module, an audio/video encoding module, an audio/video packetization module, and a network transmission module.
Further, in a video call with far-field speech enhancement, attention must be paid to the input and output of data at each module;
The far-field sound input includes the call voice, ambient noise, echo noise, reverberation noise, and noise from multiple other voices;
The microphone array recording module receives and records the far-field sound and outputs it as a digital signal;
The digitized far-field sound is input to the multi-noise filtering processing engine;
The multi-noise filtering processing engine accesses the multi-noise filtering API management server to obtain a multi-noise filtering engine API;
The multi-noise filtering API management server manages external multi-noise filtering engine APIs;
The multi-noise filtering processing engine calls the multi-noise filtering engine API to process the digitized far-field sound, yielding audio data in which the far-field voice is enhanced and the multiple noise sources are suppressed.
Further, the multi-noise filtering API management server has the following main functions: maintaining the multi-noise filtering engine APIs; managing external multi-noise filtering engine APIs; maintaining the adaptation of external multi-noise filtering engine APIs to the internal multi-noise filtering engine API; maintaining the update policy of the multi-noise filtering engine APIs; maintaining the management policy for external multi-noise filtering engine APIs; and auditing the service quality of the multi-noise filtering engine APIs.
Further, a video call with far-field speech enhancement proceeds as follows: the microphone array recording module of the video call terminal at one end receives the voice of the far-field video call participant together with the associated noise sources; the video call terminal amplifies the raw audio data through the original sound enhancement module and then hands it to a local or online multi-noise filtering processing engine; the local or online multi-noise filtering processing engine first separates the main speaker's voice from the multiple noise sources through the main voice and noise source separation module; it then enhances the main speaker's voice and suppresses the multiple noise sources through the multi-noise filtering engine API; next, the main voice and noise source merging module merges the enhanced main voice with the suppressed noise sources and returns the result to the video call terminal; the video call terminal packs the video data and the processed audio data into network packets suitable for network transmission through the video call audio/video packetization module, and the video call transmission module of the video call middleware module transmits the audio/video data to the video call terminal at the other end.
Further, the video call terminal comprises one or more processors, a memory, one or more storage devices, a power supply, one or more connectors, a network interface, and a microphone array; the video call terminal also comprises an operating system, which contains modules or applications that can run on the one or more processors; the video call terminal may comprise a standby wake-up module; the processor, memory, storage devices, power supply, connectors, network interface, and microphone array are interconnected by internal component communication;
The one or more processors are configured to perform functions or process instructions in the video call terminal; they can process instructions stored in the memory or the storage devices; these instructions can be used to operate hardware modules to complete specific functions or processes;
The memory is internal storage that exchanges data directly with the CPU; the contents of its storage cells can be read or written at random on demand, and the access speed is independent of the position of the storage cell.
The present invention has the following advantages: the video call terminals are interconnected through a basic communication network (such as the Internet); the video call system includes a multi-noise filtering engine; and the video call system includes a multi-noise filtering API management server. During a far-field video call, the caller's far-field voice and multiple noise sources are received and recorded by the microphone array at the same time, and the caller's main voice is often drowned out by the noise sources, which seriously degrades call quality. The invention uses the multi-noise filtering engine to suppress the multiple noise sources and enhance the main speaker's voice, thereby improving the quality of far-field video calls.
Brief description of the drawings
The present invention is further described below with reference to the drawings and embodiments.
Fig. 1 is an overall framework diagram of the system of the present invention.
Fig. 2 is a structural diagram of the modules in the video call terminal of the present invention.
Fig. 3 is a flow diagram of the multi-noise filtering process of the far-field speech enhancement system of the present invention.
Fig. 4 is a hardware structure diagram of the video call terminal of the present invention.
Fig. 5 is a flow diagram of the operation of the method of the present invention.
Detailed description of the invention
Referring to Fig. 1 to Fig. 4, a video call system with far-field speech enhancement comprises at least two video call terminals, a multi-noise filtering processing engine, and a multi-noise filtering API management server; the multi-noise filtering processing engine and the multi-noise filtering API management server are connected to the two video call terminals through a communication network;
When the video call terminal at one end conducts a far-field video call, the caller's far-field voice and multiple noise sources are received and recorded by the video call terminal at the same time;
The multi-noise filtering processing engine filters the far-field sound and the multiple noise sources;
The multi-noise filtering API management server suppresses the multiple noise sources to enhance the main speaker's voice;
The video call terminal at the other end then receives the processed voice of the main speaker.
The video call terminal is provided with a hardware driver, an operating system module, a video call middleware module, a microphone array recording module, an original sound enhancement module, a main voice and noise source separation module, a multi-noise filtering engine API, a main voice and noise source merging module, a video call audio/video packetization module, and a video call transmission module;
The hardware driver: the device includes internal or external hardware modules; a hardware driver is the driver software of a hardware module (network driver, microphone array driver) and is generally initialized during operating system initialization;
The operating system: a unified abstraction of the device hardware and hardware interfaces, and the base environment in which software runs;
The video call middleware module: a software package that provides the basic functions of video calling. It generally includes modules such as input device management (microphone, etc.), audio/video preprocessing, audio/video encoding, audio/video packetization, and network transmission. The video call middleware module runs on the operating system.
The microphone array recording module: a module that calls the operating system's microphone array interface to record sound;
The original sound enhancement module: calls an audio algorithm to enhance the recorded original sound, that is, to amplify the sound signal;
The main voice and noise source separation module: calls the multi-noise filtering engine API with the enhanced original sound as input, and outputs the main voice and the noise sources;
The multi-noise filtering engine API: takes the enhanced original sound as input and outputs the main voice and the noise sources; the multi-noise filtering engine API can be deployed on the local device or on a server;
The main voice and noise source merging module: after the main voice has been enhanced and the noise sources suppressed, combines the enhanced main voice and the attenuated noise sources into a single sound;
The video call audio/video packetization module: the video stream is encoded with H.264/H.265 and packed into a PES stream; the audio is encoded with AAC or AC3 and packed into a PES stream; the audio and video PES streams are then packed into a TS stream suitable for network transmission;
The video call transmission module: transmits the TS stream over the communication network according to the video call service logic.
As shown in Fig. 3, in a video call with far-field speech enhancement according to the present invention, attention must be paid to the input and output of data at each module;
The far-field sound input includes the call voice (Cn), ambient noise, echo noise, reverberation noise, and noise from multiple other voices;
The microphone array recording module receives and records the far-field sound and outputs it as a digital signal;
The digitized far-field sound is input to the multi-noise filtering processing engine;
The multi-noise filtering processing engine accesses the multi-noise filtering API management server to obtain a multi-noise filtering engine API;
The multi-noise filtering API management server manages external multi-noise filtering engine APIs;
The multi-noise filtering processing engine calls the multi-noise filtering engine API to process the digitized far-field sound, yielding audio data in which the far-field voice is enhanced and the multiple noise sources are suppressed.
The multi-noise filtering API management server has the following main functions: maintaining the multi-noise filtering engine APIs; managing external multi-noise filtering engine APIs; maintaining the adaptation of external multi-noise filtering engine APIs to the internal multi-noise filtering engine API; maintaining the update policy of the multi-noise filtering engine APIs; maintaining the management policy for external multi-noise filtering engine APIs; and auditing the service quality of the multi-noise filtering engine APIs.
As shown in Fig. 5, a video call with far-field speech enhancement according to the present invention proceeds as follows: the microphone array recording module of the video call terminal at one end receives the voice of the far-field video call participant together with the associated noise sources; the video call terminal amplifies the raw audio data through the original sound enhancement module and then hands it to a local or online multi-noise filtering processing engine; the local or online multi-noise filtering processing engine first separates the main speaker's voice from the multiple noise sources through the main voice and noise source separation module; it then enhances the main speaker's voice and suppresses the multiple noise sources through the multi-noise filtering engine API; next, the main voice and noise source merging module merges the enhanced main voice with the suppressed noise sources and returns the result to the video call terminal; the video call terminal packs the video data and the processed audio data into network packets suitable for network transmission through the video call audio/video packetization module, and the video call transmission module of the video call middleware module transmits the audio/video data to the video call terminal at the other end.
It addition, the described video call terminal of the present invention comprises one or more processor, an internal memory, one or more
Memorizer, a power supply, one or more adapters, a network interface (WIFI/3G/4G) and a microphone array;
Described video call terminal also comprises an operating system, and operating system comprises some can transport on the one or more processors
The module of row or application;Video call terminal can comprise standby wakeup module, described processor, internal memory, memorizer, power supply, company
Connect device, network interface, microphone array use intraware communication mode interconnect (physical connection, two-way communication, two-way behaviour
Make) get up;
One or more processors, can be configured in the video call device of far field perform function or process instruction.
One or more processors can process and be stored in internal memory or memorizer instruction.These instructions may be used for operation hardware
Module, completes specific function or process.
The internal memory is the internal storage that exchanges data directly with the CPU; the contents of its storage cells can be read or written at random on demand, and the access speed is independent of the location of the storage cell. The internal memory usually serves as the temporary data storage medium for the operating system or other running programs. It is a temporary storage medium in which software or programs store temporary data or instructions during execution. The internal memory typically uses RAM or SRAM.
The one or more storage devices comprise one or more computer-readable storage media. The storage devices are used for the persistent storage of data or information. They include non-volatile storage media, for example hard disks, SSDs, Flash, and EEPROM.
The far-field video call device may comprise a network interface, which is used for local area network or wide area network communication. WIFI is used for local area network communication. The 3G/4G module is used for wide area network communication. Through the network interface, the far-field video call device can communicate with external devices (mobile phone, tablet, TV, set-top box, video call server, etc.).
The far-field video call device may comprise connectors (WIFI network connection, Bluetooth, GLONASS, FM radio reception).
The far-field video call device may comprise a power supply. The power supply may be a rechargeable battery, and the battery may be a lithium battery or be made of graphene or other suitable materials. The power supply may comprise a transformer that converts external power into power suitable for charging.
The far-field video call device may comprise a microphone array, which couples the signals of two microphones into one signal. With this technique, the difference between the phases of the sound waves received by the two microphones can be used to filter the sound waves, removing ambient background sound as far as possible and leaving only the desired sound waves. When a device with this configuration is used in a noisy environment, the voice that reaches the listener can be made very clear, without the background noise.
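One common way to exploit the arrival-time (phase) difference between two microphones is delay-and-sum beamforming. The sketch below is an illustration under stated assumptions, not the method of the patent, which does not name an algorithm: the function names, the integer-sample delay, and the cross-correlation delay estimate are all assumptions.

```python
def estimate_delay(mic_a, mic_b, max_delay):
    """Return the delay (in samples, 0..max_delay) by which mic_b lags mic_a.

    The delay is chosen to maximize the cross-correlation of the two signals.
    """
    best_delay, best_score = 0, float("-inf")
    for d in range(max_delay + 1):
        n = min(len(mic_a), len(mic_b) - d)
        score = sum(mic_a[i] * mic_b[i + d] for i in range(n))
        if score > best_score:
            best_delay, best_score = d, score
    return best_delay


def delay_and_sum(mic_a, mic_b, delay_samples):
    """Align mic_b to mic_a by delay_samples and average the two signals.

    Sound from the aligned direction adds coherently, while sound arriving
    with a different delay (background noise) is partly cancelled.
    """
    n = min(len(mic_a), len(mic_b) - delay_samples)
    return [(mic_a[i] + mic_b[i + delay_samples]) / 2.0 for i in range(n)]
```

In this sketch the talker's signal is preserved at full amplitude after alignment, whereas uncorrelated noise in the two channels is reduced by the averaging.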
In the far-field video call device, the processors, internal memory, storage devices, power supply, and connectors form the minimal system required for the system to run. The network interface (WIFI/3G/4G) and the microphone array are the hardware basis for realizing the far-field video call function.
The operating system (Linux and Android) controls the operation of the hardware modules in the far-field video call device. The operating system can encapsulate the complex and varied control operations of the hardware in the hardware driver layer, which keeps hardware interface calls uniform at the operating system layer. The operating system is the interface between the user and the computer, and also the interface between the computer hardware and other software. The functions of the operating system include managing the hardware, software, and data resources of the computer system; controlling program execution; improving the human-machine interface; supporting other application software so that all resources of the computer system are used to the fullest; providing various forms of user interface so that the user has a good working environment; and providing the necessary services and corresponding interfaces for the development of other software.
Referring to Fig. 4 and Fig. 5, the present invention provides a video call method with far-field speech enhancement. The method requires at least two video call terminals, a multi-noise filtering processing engine, and a multi-noise filtering API management server;
The method is specifically as follows: when the video call terminal at one end conducts a far-field video call, the caller's far-field sound and multiple noise sources are received and recorded by the video call terminal at the same time; the multi-noise filtering processing engine filters the far-field sound and the multiple noise sources; the multi-noise filtering API management server then suppresses the multiple noise sources to enhance the sound of the call subject, and the processed sound of the call subject is sent to the video call terminal at the other end.
The video call terminal is provided with a hardware driver, an operating system module, a video call middleware module, a microphone array recording module, an original sound enhancement module, a main call voice and noise source separation module, a multi-noise filtering engine API, a main call voice and noise source merging module, a video call audio/video packetization module, and a video call transmission module;
The hardware driver: the device includes internal or external hardware modules, and the hardware driver is the driver software of a hardware module (network driver, microphone array driver), generally initialized during the operating system initialization stage;
The operating system is a unified abstraction of the device hardware and hardware interfaces, and is the basic environment in which software runs;
The video call middleware module: a software package providing the basic functions of video calling; it generally comprises modules for input device management (microphone, etc.), audio/video preprocessing, audio/video encoding, audio/video packetization, and network transmission. The video call middleware module relies on the operating system to run.
The microphone array recording module: a module that calls the microphone array interface of the operating system and records sound;
The original sound enhancement module: calls an audio algorithm to enhance the recorded original sound, that is, amplifies the sound signal;
The main call voice and noise source separation module: calls the multi-noise filtering engine API, takes the enhanced original sound as input, and outputs the main voice and the noise sources;
The multi-noise filtering engine API: takes the enhanced original sound as input and outputs the main voice and the noise sources; the multi-noise filtering engine API can be deployed on the local device or on a server;
The main call voice and noise source merging module: enhances the main voice and suppresses the noise sources, then synthesizes the enhanced main voice and the weakened noise sources into one sound;
The video call audio/video packetization module: the video stream is encoded with H.264/H.265 and packaged into a PES stream; the audio is encoded with AAC or AC3 and packaged into a PES stream; the audio and video PES streams are then packaged into a TS stream suitable for network transmission;
The video call transmission module: transmits the TS stream over the communication network according to the video call service logic.
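The last packaging step, turning a PES stream into a TS stream, can be sketched as follows. This is a simplified illustration assuming the standard MPEG-2 transport stream layout (188-byte packets, sync byte 0x47, 13-bit PID, 4-bit continuity counter, adaptation-field stuffing for the final short packet). The function name `packetize_pes` and the PID value in the usage note are hypothetical, and building the PES header itself, PCR insertion, and PAT/PMT tables are omitted.

```python
TS_PACKET_SIZE = 188  # fixed MPEG-2 transport stream packet length
TS_SYNC_BYTE = 0x47   # every TS packet starts with this byte
TS_PAYLOAD_MAX = 184  # 188 bytes minus the 4-byte header


def packetize_pes(pes: bytes, pid: int) -> list:
    """Split one PES packet into 188-byte TS packets for the given PID."""
    packets = []
    counter = 0
    offset = 0
    while offset < len(pes):
        chunk = pes[offset:offset + TS_PAYLOAD_MAX]
        # payload_unit_start_indicator is set only on the first TS packet.
        start_flag = 0x40 if offset == 0 else 0x00
        if len(chunk) == TS_PAYLOAD_MAX:
            control = 0x10  # payload only
            body = chunk
        else:
            # Short final chunk: pad with an adaptation field of stuffing bytes.
            af_len = 183 - len(chunk)
            stuffing = b"" if af_len == 0 else b"\x00" + b"\xff" * (af_len - 1)
            control = 0x30  # adaptation field followed by payload
            body = bytes([af_len]) + stuffing + chunk
        header = bytes([
            TS_SYNC_BYTE,
            start_flag | ((pid >> 8) & 0x1F),
            pid & 0xFF,
            control | (counter & 0x0F),
        ])
        packets.append(header + body)
        counter = (counter + 1) & 0x0F
        offset += len(chunk)
    return packets
```

For example, `packetize_pes(pes_bytes, pid=0x100)` on a 200-byte PES packet yields two TS packets, the second carrying 16 payload bytes behind its stuffing.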
In the present invention, the far-field speech-enhanced video call focuses on the input and output of data at each module;
The far-field sound input includes: call voice (Cn), environmental noise, echo noise, reverberation noise, and multi-talker noise;
The microphone array recording module receives and records the above far-field sound and outputs the sound as a digital signal;
The digitized far-field sound is input to the multi-noise filtering processing engine;
The multi-noise filtering processing engine accesses the multi-noise filtering API management server to obtain the multi-noise filtering engine API;
The multi-noise filtering API management server manages the external multi-noise filtering engine APIs;
The multi-noise filtering processing engine calls the multi-noise filtering engine API to process the digitized far-field sound; after processing, voice data is obtained in which the far-field voice is enhanced and the multiple noise sources are suppressed.
The multi-noise filtering API management server mainly has the following functions: maintaining the multi-noise filtering engine API; managing the external multi-noise filtering engine APIs; maintaining the adaptation of the external multi-noise filtering engine APIs to the internal multi-noise filtering engine API; maintaining the update policy of the multi-noise filtering engine API; maintaining the management policy of the external multi-noise filtering engine APIs; and auditing the service quality of the multi-noise filtering engine API.
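The management functions listed above can be sketched as a small registry. Everything in this sketch is hypothetical: the class name `ApiManagementServer`, the convention that an internal engine is a callable returning a `(voice, noise)` pair, the dictionary format assumed for an external engine, and the use of mean latency as the audited quality measure are illustrative assumptions, since the patent does not define any of these interfaces.

```python
class ApiManagementServer:
    """Toy registry of noise filtering engine APIs with a latency audit."""

    def __init__(self, max_mean_latency_ms):
        self.max_mean_latency_ms = max_mean_latency_ms
        self._engines = {}    # name -> callable(samples) -> (voice, noise)
        self._latencies = {}  # name -> list of reported latencies in ms

    def register_internal(self, name, engine):
        """Register an engine that already follows the internal interface."""
        self._engines[name] = engine
        self._latencies[name] = []

    def register_external(self, name, external_engine):
        """Adapt an external engine (assumed to return a dict) to the internal interface."""
        def adapted(samples):
            result = external_engine(samples)
            return result["speech"], result["noise"]
        self.register_internal(name, adapted)

    def get_engine(self, name):
        """Hand an engine API to the processing engine on request."""
        return self._engines[name]

    def report_latency(self, name, latency_ms):
        self._latencies[name].append(latency_ms)

    def audit(self):
        """Return the names of engines whose mean latency exceeds the limit."""
        failing = []
        for name, values in self._latencies.items():
            if values and sum(values) / len(values) > self.max_mean_latency_ms:
                failing.append(name)
        return sorted(failing)
```

The adapter in `register_external` is the point of the design: the processing engine only ever sees the internal interface, so external engines can be swapped according to the update and management policies without changing the terminal.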
As shown in Fig. 5, the far-field speech-enhanced video call operation of the present invention is specifically as follows: the microphone array recording module of the video call terminal at one end receives the sound of the far-field video call participant together with the associated multiple noise sources; the video call terminal passes the original voice data through the original sound enhancement module for signal amplification, then hands it to the local or online multi-noise filtering processing engine for processing; the local or online multi-noise filtering processing engine first uses the main call voice and noise source separation module to separate the sound of the call subject from the multiple noise sources; the multi-noise filtering engine API then enhances the sound of the call subject and suppresses the multiple noise sources; the main call voice and noise source merging module then merges the enhanced sound of the call subject with the suppressed noise sources and returns the result to the video call terminal; the video call terminal uses the video call audio/video packetization module to package the video data and the processed voice data into network packets suitable for network transmission, and the video call transmission module of the video call middleware module transmits the audio and video data to the video call terminal at the other end.
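The order of operations in this audio path (amplify, separate, enhance and suppress, merge) can be sketched as a single function. The separation step stands in for the multi-noise filtering engine API, whose algorithm the patent does not disclose, so it is passed in as a callable; the function name `process_call_audio`, the gain parameters, and the threshold-based toy separator in the usage note are all hypothetical.

```python
def process_call_audio(raw, separate, pre_gain, voice_gain, noise_gain):
    """Run one block of samples through the audio path described above.

    raw        -- recorded samples (voice plus noise)
    separate   -- callable(samples) -> (voice, noise); placeholder for the
                  multi-noise filtering engine API
    pre_gain   -- amplification applied by the original sound enhancement step
    voice_gain -- gain greater than 1 to enhance the call subject's voice
    noise_gain -- gain less than 1 to suppress the noise sources
    """
    # 1. Original sound enhancement: amplify the recorded signal.
    boosted = [pre_gain * s for s in raw]
    # 2. Separation: split the boosted signal into main voice and noise.
    voice, noise = separate(boosted)
    # 3 and 4. Enhance the voice, suppress the noise, and merge into one signal.
    return [voice_gain * v + noise_gain * n for v, n in zip(voice, noise)]
```

A toy separator that treats samples with magnitude of at least 4 as voice, for example `lambda x: ([s if abs(s) >= 4 else 0 for s in x], [0 if abs(s) >= 4 else s for s in x])`, is enough to exercise the function; a real engine would separate sources by their spectral or spatial properties rather than by amplitude.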
In addition, as shown in Fig. 4, the video call terminal of the present invention comprises one or more processors, an internal memory, one or more storage devices, a power supply, one or more connectors, a network interface (WIFI/3G/4G), and a microphone array; the video call terminal also comprises an operating system, which contains modules or applications that can run on the one or more processors; the video call terminal may comprise a standby wake-up module; the processors, internal memory, storage devices, power supply, connectors, network interface, and microphone array are interconnected by internal component communication (physical connection, two-way communication, two-way operation).
Although specific embodiments of the present invention have been described above, those skilled in the art should understand that the described embodiments are merely illustrative and do not limit the scope of the present invention. Equivalent modifications and changes made by those skilled in the art in accordance with the spirit of the present invention shall fall within the scope of protection of the claims of the present invention.
Claims (12)
1. A video call system with far-field speech enhancement, characterized in that: the system comprises at least two video call terminals, a multi-noise filtering processing engine, and a multi-noise filtering API management server; the multi-noise filtering processing engine and the multi-noise filtering API management server are connected to the two video call terminals through a communication network;
when the video call terminal at one end conducts a far-field video call, the caller's far-field sound and multiple noise sources are received and recorded by the video call terminal at the same time;
the multi-noise filtering processing engine filters the far-field sound and the multiple noise sources;
the multi-noise filtering API management server suppresses the multiple noise sources to enhance the sound of the call subject;
the video call terminal at the other end then receives the processed sound of the call subject.
2. The video call system with far-field speech enhancement according to claim 1, characterized in that: the video call terminal is provided with a hardware driver, an operating system module, a video call middleware module, a microphone array recording module, an original sound enhancement module, a main call voice and noise source separation module, a multi-noise filtering engine API, a main call voice and noise source merging module, a video call audio/video packetization module, and a video call transmission module;
the hardware driver: the device includes internal or external hardware modules, and the hardware driver is the driver software of a hardware module, generally initialized during the operating system initialization stage;
the operating system is a unified abstraction of the device hardware and hardware interfaces, and is the basic environment in which software runs;
the video call middleware module: a software package providing the basic functions of video calling;
the microphone array recording module: a module that calls the microphone array interface of the operating system and records sound;
the original sound enhancement module: calls an audio algorithm to enhance the recorded original sound, that is, amplifies the sound signal;
the main call voice and noise source separation module: calls the multi-noise filtering engine API, takes the enhanced original sound as input, and outputs the main voice and the noise sources;
the multi-noise filtering engine API: takes the enhanced original sound as input and outputs the main voice and the noise sources; the multi-noise filtering engine API can be deployed on the local device or on a server;
the main call voice and noise source merging module: enhances the main voice and suppresses the noise sources, then synthesizes the enhanced main voice and the weakened noise sources into one sound;
the video call audio/video packetization module: the video stream is encoded with H.264/H.265 and packaged into a PES stream; the audio is encoded with AAC or AC3 and packaged into a PES stream; the audio and video PES streams are then packaged into a TS stream suitable for network transmission;
the video call transmission module: transmits the TS stream over the communication network according to the video call service logic.
3. The video call system with far-field speech enhancement according to claim 2, characterized in that: the video call middleware module comprises an input device management module, an audio/video preprocessing module, an audio/video encoding module, an audio/video packetization module, and a network transmission module.
4. The video call system with far-field speech enhancement according to claim 2, characterized in that: the far-field speech-enhanced video call focuses on the input and output of data at each module;
the far-field sound input includes: call voice, environmental noise, echo noise, reverberation noise, and multi-talker noise;
the microphone array recording module receives and records the above far-field sound and outputs the sound as a digital signal;
the digitized far-field sound is input to the multi-noise filtering processing engine;
the multi-noise filtering processing engine accesses the multi-noise filtering API management server to obtain the multi-noise filtering engine API;
the multi-noise filtering API management server manages the external multi-noise filtering engine APIs;
the multi-noise filtering processing engine calls the multi-noise filtering engine API to process the digitized far-field sound; after processing, voice data is obtained in which the far-field voice is enhanced and the multiple noise sources are suppressed.
5. The video call system with far-field speech enhancement according to claim 2, characterized in that: the multi-noise filtering API management server mainly has the following functions: maintaining the multi-noise filtering engine API; managing the external multi-noise filtering engine APIs; maintaining the adaptation of the external multi-noise filtering engine APIs to the internal multi-noise filtering engine API; maintaining the update policy of the multi-noise filtering engine API; maintaining the management policy of the external multi-noise filtering engine APIs; and auditing the service quality of the multi-noise filtering engine API.
6. The video call system with far-field speech enhancement according to claim 1, characterized in that: the far-field speech-enhanced video call operation is specifically as follows: the microphone array recording module of the video call terminal at one end receives the sound of the far-field video call participant together with the associated multiple noise sources; the video call terminal passes the original voice data through the original sound enhancement module for signal amplification, then hands it to the local or online multi-noise filtering processing engine for processing; the local or online multi-noise filtering processing engine first uses the main call voice and noise source separation module to separate the sound of the call subject from the multiple noise sources; the multi-noise filtering engine API then enhances the sound of the call subject and suppresses the multiple noise sources; the main call voice and noise source merging module then merges the enhanced sound of the call subject with the suppressed noise sources and returns the result to the video call terminal; the video call terminal uses the video call audio/video packetization module to package the video data and the processed voice data into network packets suitable for network transmission, and the video call transmission module of the video call middleware module transmits the audio and video data to the video call terminal at the other end.
7. A video call method with far-field speech enhancement, characterized in that: the method requires at least two video call terminals, a multi-noise filtering processing engine, and a multi-noise filtering API management server;
the method is specifically as follows: when the video call terminal at one end conducts a far-field video call, the caller's far-field sound and multiple noise sources are received and recorded by the video call terminal at the same time; the multi-noise filtering processing engine filters the far-field sound and the multiple noise sources; the multi-noise filtering API management server then suppresses the multiple noise sources to enhance the sound of the call subject, and the processed sound of the call subject is sent to the video call terminal at the other end.
8. The video call method with far-field speech enhancement according to claim 7, characterized in that: the video call terminal is provided with a hardware driver, an operating system module, a video call middleware module, a microphone array recording module, an original sound enhancement module, a main call voice and noise source separation module, a multi-noise filtering engine API, a main call voice and noise source merging module, a video call audio/video packetization module, and a video call transmission module;
the hardware driver: the device includes internal or external hardware modules, and the hardware driver is the driver software of a hardware module, generally initialized during the operating system initialization stage;
the operating system is a unified abstraction of the device hardware and hardware interfaces, and is the basic environment in which software runs;
the video call middleware module: a software package providing the basic functions of video calling;
the microphone array recording module: a module that calls the microphone array interface of the operating system and records sound;
the original sound enhancement module: calls an audio algorithm to enhance the recorded original sound, that is, amplifies the sound signal;
the main call voice and noise source separation module: calls the multi-noise filtering engine API, takes the enhanced original sound as input, and outputs the main voice and the noise sources;
the multi-noise filtering engine API: takes the enhanced original sound as input and outputs the main voice and the noise sources; the multi-noise filtering engine API can be deployed on the local device or on a server;
the main call voice and noise source merging module: enhances the main voice and suppresses the noise sources, then synthesizes the enhanced main voice and the weakened noise sources into one sound;
the video call audio/video packetization module: the video stream is encoded with H.264/H.265 and packaged into a PES stream; the audio is encoded with AAC or AC3 and packaged into a PES stream; the audio and video PES streams are then packaged into a TS stream suitable for network transmission;
the video call transmission module: transmits the TS stream over the communication network according to the video call service logic.
9. The video call method with far-field speech enhancement according to claim 8, characterized in that: the video call middleware module comprises an input device management module, an audio/video preprocessing module, an audio/video encoding module, an audio/video packetization module, and a network transmission module.
10. The video call method with far-field speech enhancement according to claim 8, characterized in that: the far-field speech-enhanced video call focuses on the input and output of data at each module;
the far-field sound input includes: call voice, environmental noise, echo noise, reverberation noise, and multi-talker noise;
the microphone array recording module receives and records the above far-field sound and outputs the sound as a digital signal;
the digitized far-field sound is input to the multi-noise filtering processing engine;
the multi-noise filtering processing engine accesses the multi-noise filtering API management server to obtain the multi-noise filtering engine API;
the multi-noise filtering API management server manages the external multi-noise filtering engine APIs;
the multi-noise filtering processing engine calls the multi-noise filtering engine API to process the digitized far-field sound; after processing, voice data is obtained in which the far-field voice is enhanced and the multiple noise sources are suppressed.
11. The video call method with far-field speech enhancement according to claim 8, characterized in that: the multi-noise filtering API management server mainly has the following functions: maintaining the multi-noise filtering engine API; managing the external multi-noise filtering engine APIs; maintaining the adaptation of the external multi-noise filtering engine APIs to the internal multi-noise filtering engine API; maintaining the update policy of the multi-noise filtering engine API; maintaining the management policy of the external multi-noise filtering engine APIs; and auditing the service quality of the multi-noise filtering engine API.
12. The video call method with far-field speech enhancement according to claim 8, characterized in that: the far-field speech-enhanced video call operation is specifically as follows: the microphone array recording module of the video call terminal at one end receives the sound of the far-field video call participant together with the associated multiple noise sources; the video call terminal passes the original voice data through the original sound enhancement module for signal amplification, then hands it to the local or online multi-noise filtering processing engine for processing; the local or online multi-noise filtering processing engine first uses the main call voice and noise source separation module to separate the sound of the call subject from the multiple noise sources; the multi-noise filtering engine API then enhances the sound of the call subject and suppresses the multiple noise sources; the main call voice and noise source merging module then merges the enhanced sound of the call subject with the suppressed noise sources and returns the result to the video call terminal; the video call terminal uses the video call audio/video packetization module to package the video data and the processed voice data into network packets suitable for network transmission, and the video call transmission module of the video call middleware module transmits the audio and video data to the video call terminal at the other end.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610770495.5A CN106303357B (en) | 2016-08-30 | 2016-08-30 | A video call method and system with far-field speech enhancement |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610770495.5A CN106303357B (en) | 2016-08-30 | 2016-08-30 | A video call method and system with far-field speech enhancement |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106303357A true CN106303357A (en) | 2017-01-04 |
CN106303357B CN106303357B (en) | 2019-11-08 |
Family
ID=57674409
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610770495.5A Active CN106303357B (en) | 2016-08-30 | 2016-08-30 | A video call method and system with far-field speech enhancement |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106303357B (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107481729A (en) * | 2017-09-13 | 2017-12-15 | 百度在线网络技术(北京)有限公司 | A kind of method and system that intelligent terminal is upgraded to far field speech-sound intelligent equipment |
CN111556279A (en) * | 2020-05-22 | 2020-08-18 | 腾讯科技(深圳)有限公司 | Monitoring method and communication method of instant session |
CN111988704A (en) * | 2019-05-21 | 2020-11-24 | 北京小米移动软件有限公司 | Sound signal processing method, device and storage medium |
CN113053411A (en) * | 2020-03-30 | 2021-06-29 | 深圳市优克联新技术有限公司 | Voice data processing device, method, system and storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140093059A1 (en) * | 2012-09-28 | 2014-04-03 | International Business Machines Corporation | Elimination of typing noise from conference calls |
CN104012074A (en) * | 2011-12-12 | 2014-08-27 | 华为技术有限公司 | Smart audio and video capture systems for data processing systems |
CN203799645U (en) * | 2014-05-05 | 2014-08-27 | 辽宁工业大学 | Microphone-array-based multichannel voice processing apparatus |
CN104269178A (en) * | 2014-08-08 | 2015-01-07 | 华迪计算机集团有限公司 | Method and device for conducting self-adaption spectrum reduction and wavelet packet noise elimination processing on voice signals |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104012074A (en) * | 2011-12-12 | 2014-08-27 | 华为技术有限公司 | Smart audio and video capture systems for data processing systems |
US20140093059A1 (en) * | 2012-09-28 | 2014-04-03 | International Business Machines Corporation | Elimination of typing noise from conference calls |
CN203799645U (en) * | 2014-05-05 | 2014-08-27 | 辽宁工业大学 | Microphone-array-based multichannel voice processing apparatus |
CN104269178A (en) * | 2014-08-08 | 2015-01-07 | 华迪计算机集团有限公司 | Method and device for conducting self-adaption spectrum reduction and wavelet packet noise elimination processing on voice signals |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107481729A (en) * | 2017-09-13 | 2017-12-15 | 百度在线网络技术(北京)有限公司 | Method and system for upgrading an intelligent terminal to a far-field voice intelligent device |
CN111988704A (en) * | 2019-05-21 | 2020-11-24 | 北京小米移动软件有限公司 | Sound signal processing method, device and storage medium |
CN111988704B (en) * | 2019-05-21 | 2021-10-22 | 北京小米移动软件有限公司 | Sound signal processing method, device and storage medium |
CN113053411A (en) * | 2020-03-30 | 2021-06-29 | 深圳市优克联新技术有限公司 | Voice data processing device, method, system and storage medium |
CN113053411B (en) * | 2020-03-30 | 2024-01-16 | 深圳市优克联新技术有限公司 | Voice data processing device, method, system and storage medium |
CN111556279A (en) * | 2020-05-22 | 2020-08-18 | 腾讯科技(深圳)有限公司 | Monitoring method and communication method of instant session |
Also Published As
Publication number | Publication date |
---|---|
CN106303357B (en) | 2019-11-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108962240B (en) | Voice control method and system based on earphone | |
CN106303357A (en) | Video call method and system with far-field speech enhancement | |
CN103337242B (en) | Sound control method and operating device | |
CN101277331B (en) | Sound reproducing device and sound reproduction method | |
US8824666B2 (en) | Noise cancellation for phone conversation | |
CN108140399A (en) | Adaptive noise suppression for ultra-wideband music | |
CN109147784A (en) | Voice interaction method, device and storage medium | |
CN107331402A (en) | Recording method and recording device based on dual microphones | |
CN107005800A (en) | Audio file transmission and reception method, apparatus, device and system | |
CN105438900A (en) | Mobile phone elevator taking system | |
CN103491488A (en) | Echo cancellation method and device for microphone | |
US9812149B2 (en) | Methods and systems for providing consistency in noise reduction during speech and non-speech periods | |
CN103458137A (en) | Systems and methods for voice enhancement in audio conference | |
CN102624961A (en) | Method and terminal for preventing sound crosstalk between external loudspeaker and microphone | |
CN109165004A (en) | Dual-screen terminal audio output method, terminal and computer-readable storage medium | |
US10354673B2 (en) | Noise reduction method and electronic device | |
CN103298143A (en) | Method and system for achieving multi-party call and mobile terminal | |
CN107240396A (en) | Speaker adaptation method, device, equipment and storage medium | |
CN101834923A (en) | Method for controlling sound playing of mobile terminal and mobile terminal | |
CN201294570Y (en) | Echo elimination device for teleconference system | |
CN103402038B (en) | Method and device for eliminating echo at the other party's receiver when a mobile phone is in hands-free mode | |
CN201717913U (en) | Mobile terminal | |
CN101924581A (en) | Communication device | |
CN103327173A (en) | Voice controlling method and device of mobile terminal | |
CN101360155A (en) | Active noise silencing control system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CP01 | Change in the name or title of a patent holder | Address after: 350000 building, No. 89, software Avenue, Gulou District, Fujian, Fuzhou 18, China; Patentee after: Ruixin Microelectronics Co., Ltd; Address before: 350000 building, No. 89, software Avenue, Gulou District, Fujian, Fuzhou 18, China; Patentee before: Fuzhou Rockchips Electronics Co.,Ltd. |
CP01 | Change in the name or title of a patent holder |