CN106303357A - Video call method and system with far-field speech enhancement - Google Patents

Video call method and system with far-field speech enhancement

Info

Publication number
CN106303357A
Authority
CN
China
Prior art keywords
video
module
sound
filterings
call
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610770495.5A
Other languages
Chinese (zh)
Other versions
CN106303357B (en
Inventor
洪涛
孙铭俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Rockchip Electronics Co Ltd
Original Assignee
Fuzhou Rockchip Electronics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fuzhou Rockchip Electronics Co Ltd
Priority to CN201610770495.5A
Publication of CN106303357A
Application granted
Publication of CN106303357B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/14 Systems for two-way working
    • H04N7/141 Systems for two-way working between two video terminals, e.g. videophone
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 End-user applications
    • H04N21/478 Supplemental services, e.g. displaying phone caller identification, shopping application
    • H04N21/4788 Supplemental services, e.g. displaying phone caller identification, shopping application communicating with other users, e.g. chatting

Abstract

The present invention provides a video call system with far-field speech enhancement. The system includes at least two video call terminals, a multi-noise filtering processing engine, and a multi-noise filtering API management server. The multi-noise filtering processing engine and the multi-noise filtering API management server are connected to the two video call terminals through a communication network. When the video call terminal at one end conducts a far-field video call, it receives and records the caller's far-field voice together with multiple noise sources. The multi-noise filtering processing engine filters the far-field sound and the multiple noise sources; the multi-noise filtering API management server suppresses the multiple noise sources to enhance the primary call voice; and the video call terminal at the other end then receives the processed voice of the caller. The invention improves the voice quality of long-distance video calls.

Description

A video call method and system with far-field speech enhancement
Technical field
The present invention relates to the technical field of set-top boxes, and in particular to a video call method and system with far-field speech enhancement.
Background
Far-field voice calling means voice calling at a distance, in particular when the caller is 3 to 5 meters from the microphone. Because of interference such as noise and/or reverberation, voice quality during such a video call is very poor. Real far-field voice communication involves the following noise sources:
(1) Reverberation noise: when a sound wave propagates indoors, it is reflected and absorbed by obstacles until it finally dies out, so after the source stops sounding, a mixture of several sound waves is still perceived for a period of time, known as the reverberation time. The length of the reverberation time is an important acoustic characteristic of buildings such as concert halls, theaters, and auditoriums.
(2) Background noise: the general term for all noise other than the object of study.
(3) Interfering voices: human voices in the environment that do not belong to the object of study.
(4) Echo noise: when a propagating sound wave meets a large reflecting surface (a building wall, a mountainside, etc.), it is reflected at the interface; a reflected sound wave that can be distinguished from the original sound is called an echo.
In summary, in a far-field video call, multiple kinds of noise must be filtered out of the far-field speech to obtain a clean, clear voice signal of the call participant.
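As an illustrative sketch only (the source gives no formulas), the four noise sources listed above can be written as a simple additive model of the microphone signal. All symbols are assumptions introduced here for illustration: s(t) is the caller's voice, h(t) is a room impulse response that produces reverberation, a_k and \tau_k are the gain and delay of the k-th distinguishable echo, v_j(t) are interfering voices, and b(t) is background noise.

```latex
y(t) = (h * s)(t) + \sum_{k} a_k \, s(t - \tau_k) + \sum_{j} v_j(t) + b(t)
```

Under this assumed model, far-field speech enhancement amounts to estimating s(t) from the recorded y(t) while attenuating the remaining terms.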
The prior art includes the Chinese patent with application No. 201310066421.X, entitled "Speech enhancement processing method and device". Embodiments of that invention provide a speech enhancement processing method and device. The method includes: decoding a bit stream to obtain the coding parameters of the speech subframe currently to be processed, the coding parameters including a first algebraic codebook gain and a first adaptive codebook gain; adjusting the first algebraic codebook gain to obtain a second algebraic codebook gain; determining a second adaptive codebook gain from the first adaptive codebook gain and the second algebraic codebook gain; and replacing the bits in the bit stream that correspond to the first algebraic codebook gain and the first adaptive codebook gain with the quantization indices of the second algebraic codebook gain and the second adaptive codebook gain. That technical solution can effectively improve noise cancellation and voice call quality. However, the technical approach of that patent is entirely different from the one taken in the present application.
The prior art also discloses "A call system and method based on wireless-positioning microphone array speech enhancement" (Chinese patent application No. 201310513373.4). The system includes a wireless positioning transmitter module, a wireless positioning receiver module, a microphone array speech receiving module, a speech enhancement module, a far-end speech playback module, and a communication module. The wireless positioning transmitter module and the wireless positioning receiver module are connected wirelessly; the wireless positioning receiver module and the microphone array speech receiving module are each connected to the speech enhancement module; the speech enhancement module is connected to the communication module; and the far-end speech playback module is connected to the communication module. The call method first locates the target sound source using wireless positioning technology, then applies microphone array speech enhancement to the target speaker's voice and transmits it. That invention offers fast and accurate positioning, good enhancement, and high robustness, and can effectively improve the voice quality of existing call systems. That patent focuses mainly on sound source localization with a microphone array and directional speech enhancement, whereas the present application focuses on enhancing the primary call voice and suppressing multiple noise sources during far-field video calls.
Summary of the invention
The first technical problem to be solved by the present invention is to provide a video call system with far-field speech enhancement that uses a multi-noise filtering engine to suppress multiple noise sources and enhance the primary call voice, thereby improving the call quality of far-field video calls.
The first problem is solved as follows: a video call system with far-field speech enhancement, the system including at least two video call terminals, a multi-noise filtering processing engine, and a multi-noise filtering API management server; the multi-noise filtering processing engine and the multi-noise filtering API management server are connected to the two video call terminals through a communication network;
When the video call terminal at one end conducts a far-field video call, the caller's far-field voice and multiple noise sources are received and recorded by the video call terminal at the same time;
The multi-noise filtering processing engine filters the far-field sound and the multiple noise sources;
The multi-noise filtering API management server suppresses the multiple noise sources to enhance the primary call voice;
The video call terminal at the other end then receives the processed voice of the caller.
Further, the video call terminal is provided with a hardware driver, an operating system module, a video call middleware module, a microphone array recording module, an original sound enhancement module, a primary-voice and noise-source separation module, a multi-noise filtering engine API, a primary-voice and noise-source merging module, a video call audio/video packaging module, and a video call transmission module;
The hardware driver: the device includes internal or external hardware modules; the hardware driver is the driver software of a hardware module and is generally initialized during operating system initialization;
The operating system: a unified abstraction of the device hardware and the hardware interfaces; the operating system is the base environment in which software runs;
The video call middleware module: a software package that provides the basic video call functions;
The microphone array recording module: a module that calls the operating system's microphone array interface to record sound;
The original sound enhancement module: calls an audio algorithm to enhance the recorded original sound, i.e. amplifies the sound signal;
The primary-voice and noise-source separation module: calls the multi-noise filtering engine API with the enhanced original sound as input, and outputs the primary voice and the noise sources;
The multi-noise filtering engine API: takes the enhanced original sound as input and outputs the primary voice and the noise sources; the multi-noise filtering engine API can be deployed on the local device or on a server;
The primary-voice and noise-source merging module: after the primary voice has been enhanced and the noise sources suppressed, combines the enhanced primary voice and the attenuated noise sources into a single sound;
The video call audio/video packaging module: the video stream is encoded with H.264/H.265 and packaged into a PES stream; the audio is encoded with AAC or AC-3 and packaged into a PES stream; the audio and video PES streams are then packaged into a TS stream suitable for network transmission;
The video call transmission module: transmits the TS stream over the communication network according to the video call service logic.
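The last packaging step above (PES stream into TS stream) can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the standard MPEG-2 transport stream layout of 188-byte packets with sync byte 0x47 and a 13-bit PID, and the function name `packetize_pes` is hypothetical. It takes an already-built PES packet as opaque bytes and omits the PAT/PMT tables and PCR timing that a real multiplexer also needs.

```python
from typing import List

TS_PACKET_SIZE = 188                  # fixed MPEG-2 TS packet length
TS_SYNC_BYTE = 0x47                   # first byte of every TS packet
TS_MAX_PAYLOAD = TS_PACKET_SIZE - 4   # 184 bytes left after the 4-byte header


def packetize_pes(pes: bytes, pid: int, first_cc: int = 0) -> List[bytes]:
    """Split one PES packet into 188-byte TS packets carrying the given PID."""
    if not 0 <= pid <= 0x1FFF:
        raise ValueError("PID must fit in 13 bits")
    packets: List[bytes] = []
    cc = first_cc & 0x0F              # 4-bit continuity counter
    offset = 0
    while offset < len(pes):
        chunk = pes[offset:offset + TS_MAX_PAYLOAD]
        # payload_unit_start_indicator is set only on the first packet
        pusi = 1 if offset == 0 else 0
        if len(chunk) == TS_MAX_PAYLOAD:
            afc = 0b01                # payload only
            adaptation = b""
        else:
            afc = 0b11                # adaptation field followed by payload
            af_len = TS_MAX_PAYLOAD - 1 - len(chunk)
            if af_len == 0:
                adaptation = bytes([0])
            else:
                # length byte, empty flags byte, then 0xFF stuffing bytes
                adaptation = bytes([af_len, 0x00]) + b"\xff" * (af_len - 1)
        header = bytes([
            TS_SYNC_BYTE,
            (pusi << 6) | (pid >> 8),  # TEI=0, PUSI, priority=0, PID high bits
            pid & 0xFF,                # PID low bits
            (afc << 4) | cc,           # scrambling=00, adaptation control, counter
        ])
        packets.append(header + adaptation + chunk)
        cc = (cc + 1) & 0x0F
        offset += len(chunk)
    return packets
```

Only the final packet of a PES packet can be short, so stuffing in the adaptation field is applied there to keep every TS packet at exactly 188 bytes.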
Further, the video call middleware module includes: an input device management module, an audio/video preprocessing module, an audio/video encoding module, an audio/video packaging module, and a network transmission module.
Further, the far-field speech-enhanced video call is chiefly concerned with the input and output of data at each module;
Far-field sound input, including: the call voice, ambient noise, echo noise, reverberation noise, and noise from multiple voices;
The microphone array recording module receives and records the above far-field sound and outputs it as a digital signal;
The digitized far-field sound is input to the multi-noise filtering processing engine;
The multi-noise filtering processing engine accesses the multi-noise filtering API management server to obtain a multi-noise filtering engine API;
The multi-noise filtering API management server manages external multi-noise filtering engine APIs;
The multi-noise filtering processing engine calls the multi-noise filtering engine API to process the digitized far-field sound, yielding audio data in which the far-field voice is enhanced and the multiple noise sources are suppressed.
Further, the multi-noise filtering API management server mainly has the following functions: maintaining the multi-noise filtering engine APIs; managing external multi-noise filtering engine APIs; maintaining the adaptation of external multi-noise filtering engine APIs to the internal multi-noise filtering engine API; maintaining the update policy of the multi-noise filtering engine APIs; maintaining the management policy of external multi-noise filtering engine APIs; and auditing the service quality of the multi-noise filtering engine APIs.
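The management functions listed above might be organized roughly as in the sketch below. Everything in it is an assumption made for illustration, since the source does not define any interfaces: the class `ApiManagementServer`, its method names, the engine call shape (samples in, a pair of primary voice and noise out), and the quality threshold are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Samples = List[float]
# Assumed internal engine API shape: enhanced sound in, (primary voice, noise) out.
EngineAPI = Callable[[Samples], Tuple[Samples, Samples]]


@dataclass
class EngineEntry:
    name: str
    api: EngineAPI
    external: bool
    quality_scores: List[float] = field(default_factory=list)

    def mean_quality(self) -> float:
        if not self.quality_scores:
            return 0.0
        return sum(self.quality_scores) / len(self.quality_scores)


class ApiManagementServer:
    """Hypothetical registry of multi-noise filtering engine APIs."""

    def __init__(self, min_quality: float = 0.5) -> None:
        self.min_quality = min_quality
        self._engines: Dict[str, EngineEntry] = {}

    def register_internal(self, name: str, api: EngineAPI) -> None:
        self._engines[name] = EngineEntry(name, api, external=False)

    def register_external(self, name: str, external_api: Callable,
                          adapter: Callable[[Callable], EngineAPI]) -> None:
        # The adapter wraps an external API so it matches the internal call shape.
        self._engines[name] = EngineEntry(name, adapter(external_api), external=True)

    def report_quality(self, name: str, score: float) -> None:
        self._engines[name].quality_scores.append(score)

    def audit(self) -> List[str]:
        """Drop external engines whose mean quality is below the threshold."""
        dropped = [e.name for e in self._engines.values()
                   if e.external and e.quality_scores
                   and e.mean_quality() < self.min_quality]
        for name in dropped:
            del self._engines[name]
        return dropped

    def get_engine(self) -> Optional[EngineAPI]:
        """Return the engine API with the highest mean quality, if any."""
        if not self._engines:
            return None
        return max(self._engines.values(), key=lambda e: e.mean_quality()).api
```

In this sketch the processing engine would call `get_engine()` to obtain an engine API and then invoke it on the digitized far-field sound; the adapter argument stands in for the adaptation of external APIs to the internal one.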
Further, the far-field speech-enhanced video call operates as follows: the microphone array recording module of the video call terminal at one end receives the voice of the far-field video call participant and the associated noise sources; the video call terminal amplifies the original audio data through the original sound enhancement module and then hands it to a local or online multi-noise filtering processing engine; the local or online multi-noise filtering processing engine first separates the primary call voice from the multiple noise sources through the primary-voice and noise-source separation module; it then enhances the primary call voice and suppresses the multiple noise sources through the multi-noise filtering engine API; it then merges the enhanced primary call voice and the suppressed noise sources through the primary-voice and noise-source merging module and returns the result to the video call terminal; the video call terminal packages the video data and the processed audio data into network packets suitable for network transmission through the video call audio/video packaging module, and the video call transmission module of the video call middleware module transmits the audio/video data to the video call terminal at the other end.
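The audio part of the operation above (amplify, separate, enhance and suppress, merge) can be sketched as a small pipeline. This is a minimal illustration under stated assumptions: the separation step is passed in as a callable standing in for the multi-noise filtering engine API, whose real interface the source does not specify, and the function names, the gain values, and the clamping of samples to [-1, 1] are all hypothetical choices.

```python
from typing import Callable, List, Tuple

Samples = List[float]
# Stand-in for the engine API: enhanced sound in, (primary voice, noise) out.
SeparateFn = Callable[[Samples], Tuple[Samples, Samples]]


def clamp(x: float) -> float:
    """Keep a sample inside the normalized range [-1.0, 1.0]."""
    return max(-1.0, min(1.0, x))


def amplify(samples: Samples, gain: float) -> Samples:
    """Original sound enhancement: plain amplification of the recorded signal."""
    return [clamp(x * gain) for x in samples]


def process_far_field(recorded: Samples, separate: SeparateFn,
                      pre_gain: float = 2.0, voice_gain: float = 1.5,
                      noise_gain: float = 0.1) -> Samples:
    """Amplify, separate, then merge boosted voice with attenuated noise."""
    enhanced = amplify(recorded, pre_gain)
    voice, noise = separate(enhanced)
    if len(voice) != len(noise):
        raise ValueError("voice and noise must have the same length")
    # Merging: boosted primary voice plus weakened noise becomes one signal.
    return [clamp(voice_gain * v + noise_gain * n) for v, n in zip(voice, noise)]
```

The merged samples are what the terminal would then pass to the audio/video packaging module; a `noise_gain` below 1 weakens the noise instead of removing it, matching the description of merging the enhanced voice with the suppressed noise sources.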
Further, the video call terminal comprises one or more processors, a memory, one or more storage devices, a power supply, one or more connectors, a network interface, and a microphone array; the video call terminal also comprises an operating system, which contains modules or applications that can run on the one or more processors; the video call terminal may comprise a standby wake-up module; the processors, memory, storage devices, power supply, connectors, network interface, and microphone array are interconnected by means of internal component communication;
The one or more processors are configured to perform functions or process instructions in the video call terminal; they can process instructions stored in the memory or the storage devices; these instructions can be used to operate the hardware modules to complete specific functions or processes;
The memory is internal storage that exchanges data directly with the CPU; the contents of its storage cells can be read or written at random on demand, and the access speed is independent of the location of the storage cell.
The second technical problem to be solved by the present invention is to provide a video call method with far-field speech enhancement that uses a multi-noise filtering engine to suppress multiple noise sources and enhance the primary call voice, thereby improving the call quality of far-field video calls.
The second problem is solved as follows: a video call method with far-field speech enhancement, the method requiring at least two video call terminals, a multi-noise filtering processing engine, and a multi-noise filtering API management server;
The method is as follows: when the video call terminal at one end conducts a far-field video call, the caller's far-field voice and multiple noise sources are received and recorded by the video call terminal at the same time; the multi-noise filtering processing engine filters the far-field sound and the multiple noise sources; the multi-noise filtering API management server then suppresses the multiple noise sources to enhance the primary call voice; and the processed voice of the caller is sent to the video call terminal at the other end.
Further, in the method, the video call terminal and its modules, the video call middleware module, the data flow between modules, the functions of the multi-noise filtering API management server, the call operation, and the terminal hardware are the same as described above for the system.
The present invention has the following advantages: the video call terminals are interconnected through a basic communication network (the Internet, etc.); the video call includes a multi-noise filtering engine; the video call includes a multi-noise filtering API management server. During a far-field video call, the caller's far-field voice and multiple noise sources are received and recorded by the microphone array at the same time, and the caller's primary voice is often drowned out by the noise sources, degrading call quality. The invention uses a multi-noise filtering engine to suppress the multiple noise sources and enhance the primary call voice, thereby improving the call quality of far-field video calls.
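One common way to quantify a primary voice being drowned out by noise is the signal-to-noise ratio (SNR) in decibels. The source does not use this metric, so the sketch below is an illustrative assumption, and the function names and gain values are hypothetical. It shows that scaling the voice by a gain g_v and the noise by a gain g_n changes the SNR by 20·log10(g_v / g_n) dB.

```python
import math
from typing import Sequence


def power(samples: Sequence[float]) -> float:
    """Mean squared amplitude of a signal."""
    return sum(x * x for x in samples) / len(samples)


def snr_db(voice: Sequence[float], noise: Sequence[float]) -> float:
    """Signal-to-noise ratio in decibels."""
    return 10.0 * math.log10(power(voice) / power(noise))


def snr_change_db(voice_gain: float, noise_gain: float) -> float:
    """SNR change when voice and noise amplitudes are scaled by these gains."""
    return 20.0 * math.log10(voice_gain / noise_gain)


# Hypothetical example: boost the voice 1.5x and attenuate the noise to 0.1x.
voice = [1.0, -1.0, 1.0, -1.0]
noise = [0.5, -0.5, 0.5, -0.5]
before = snr_db(voice, noise)
after = snr_db([1.5 * v for v in voice], [0.1 * n for n in noise])
improvement = after - before  # matches snr_change_db(1.5, 0.1)
```

The improvement depends only on the ratio of the two gains, which is why suppressing the noise sources and enhancing the primary voice both contribute to call quality in this measure.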
Brief description of the drawings
The present invention is further described below with reference to the embodiments and the accompanying drawings.
Fig. 1 is the overall framework diagram of the system of the present invention.
Fig. 2 is a schematic structural diagram of the modules in the video call terminal of the present invention.
Fig. 3 is a schematic flow diagram of the noise filtering process of the far-field speech enhancement system of the present invention.
Fig. 4 is a schematic hardware structure diagram of the video call terminal of the present invention.
Fig. 5 is a schematic diagram of the operating flow of the method of the present invention.
Detailed description
Referring to Figs. 1 to 4, a video call system with far-field speech enhancement includes: at least two video call terminals, a multi-noise filtering processing engine, and a multi-noise filtering API management server; the multi-noise filtering processing engine and the multi-noise filtering API management server are connected to the two video call terminals through a communication network;
When the video call terminal at one end conducts a far-field video call, the caller's far-field voice and multiple noise sources are received and recorded by the video call terminal at the same time;
The multi-noise filtering processing engine filters the far-field sound and the multiple noise sources;
The multi-noise filtering API management server suppresses the multiple noise sources to enhance the primary call voice;
The video call terminal at the other end then receives the processed voice of the caller.
The video call terminal is provided with a hardware driver, an operating system module, a video call middleware module, a microphone array recording module, an original sound enhancement module, a primary-voice and noise-source separation module, a multi-noise filtering engine API, a primary-voice and noise-source merging module, a video call audio/video packaging module, and a video call transmission module;
The hardware driver: the device includes internal or external hardware modules; the hardware driver is the driver software of a hardware module (network driver, microphone array driver) and is generally initialized during operating system initialization;
The operating system: a unified abstraction of the device hardware and the hardware interfaces; the operating system is the base environment in which software runs;
The video call middleware module: a software package that provides the basic video call functions; it generally comprises modules such as input device management (microphone, etc.), audio/video preprocessing, audio/video encoding, audio/video packaging, and network transmission. The video call middleware module depends on the operating system to run.
The microphone array recording module: a module that calls the operating system's microphone array interface to record sound;
The original sound enhancement module: calls an audio algorithm to enhance the recorded original sound, i.e. amplifies the sound signal;
The primary-voice and noise-source separation module: calls the multi-noise filtering engine API with the enhanced original sound as input, and outputs the primary voice and the noise sources;
The multi-noise filtering engine API: takes the enhanced original sound as input and outputs the primary voice and the noise sources; the multi-noise filtering engine API can be deployed on the local device or on a server;
The primary-voice and noise-source merging module: after the primary voice has been enhanced and the noise sources suppressed, combines the enhanced primary voice and the attenuated noise sources into a single sound;
The video call audio/video packaging module: the video stream is encoded with H.264/H.265 and packaged into a PES stream; the audio is encoded with AAC or AC-3 and packaged into a PES stream; the audio and video PES streams are then packaged into a TS stream suitable for network transmission;
The video call transmission module: transmits the TS stream over the communication network according to the video call service logic.
As shown in Fig. 3, in the present invention the far-field speech-enhanced video call is chiefly concerned with the input and output of data at each module;
Far-field sound input, including: the call voice (Cn), ambient noise, echo noise, reverberation noise, and noise from multiple people talking;
The microphone array recording module receives and records the above far-field sound and outputs it as a digital signal;
The digitized far-field sound is input to the multi-noise filtering processing engine;
The multi-noise filtering processing engine accesses the multi-noise filtering API management server to obtain a multi-noise filtering engine API;
The multi-noise filtering API management server manages external multi-noise filtering engine APIs;
The multi-noise filtering processing engine calls the multi-noise filtering engine API to process the digitized far-field sound, yielding audio data in which the far-field voice is enhanced and the multiple noise sources are suppressed.
The multi-noise filtering API management server mainly has the following functions: maintaining the multi-noise filtering engine APIs; managing external multi-noise filtering engine APIs; maintaining the adaptation of external multi-noise filtering engine APIs to the internal multi-noise filtering engine API; maintaining the update policy of the multi-noise filtering engine APIs; maintaining the management policy of external multi-noise filtering engine APIs; and auditing the service quality of the multi-noise filtering engine APIs.
As shown in Fig. 5, the far-field speech-enhanced video call of the present invention operates as follows: the microphone array recording module of the video call terminal at one end receives the voice of the far-field video call participant and the associated noise sources; the video call terminal amplifies the original audio data through the original sound enhancement module and then hands it to a local or online multi-noise filtering processing engine; the local or online multi-noise filtering processing engine first separates the primary call voice from the multiple noise sources through the primary-voice and noise-source separation module; it then enhances the primary call voice and suppresses the multiple noise sources through the multi-noise filtering engine API; it then merges the enhanced primary call voice and the suppressed noise sources through the primary-voice and noise-source merging module and returns the result to the video call terminal; the video call terminal packages the video data and the processed audio data into network packets suitable for network transmission through the video call audio/video packaging module, and the video call transmission module of the video call middleware module transmits the audio/video data to the video call terminal at the other end.
It addition, the described video call terminal of the present invention comprises one or more processor, an internal memory, one or more Memorizer, a power supply, one or more adapters, a network interface (WIFI/3G/4G) and a microphone array; Described video call terminal also comprises an operating system, and operating system comprises some can transport on the one or more processors The module of row or application;Video call terminal can comprise standby wakeup module, described processor, internal memory, memorizer, power supply, company Connect device, network interface, microphone array use intraware communication mode interconnect (physical connection, two-way communication, two-way behaviour Make) get up;
One or more processors, can be configured in the video call device of far field perform function or process instruction. One or more processors can process and be stored in internal memory or memorizer instruction.These instructions may be used for operation hardware Module, completes specific function or process.
Internal memory is the internal storage directly exchanging data with CPU, and the content of memory element can on-demand random taking-up or deposit Enter, and the speed memorizer unrelated with the position of memory element of access.Internal memory usually used as operating system or other transport The ephemeral data storage medium of the program in row.Internal memory is a temporary storage medium, for software or program in the process of execution In, store interim data or instruction.Internal memory typically uses RAM or SRAM.
One or more memorizeies comprise one or more computer-readable storage medium.One or more memorizeies are used In perdurable data or the storage of information.One or more memorizeies include non-volatile memory medium, such as: hard disk, SSD, Flash, EEPROM etc.).
Far field video call device can comprise network interface.Network interface is used for LAN or wan communication.WIFI For local area network communication.3G/4G module is used for wan communication.Far field video call device by network interface can be outside Far field video call device equipment communication (mobile phone/flat board/TV/Set Top Box/video calling server etc.)
Far field video call device can comprise adapter (connection of WIFI network, bluetooth, GLONASS, FM Radio reception)
Far field video call device can comprise power supply, and power supply is probably rechargeable battery, and battery is probably lithium battery, stone Ink alkene or other suitable materials are made.Power supply may comprise a transformator, and external power source can change into the electricity of suitable charging Source.
Far field video call device can comprise microphone array, and microphone array is to be coupled by the signal of two mikes It it is a signal.Using this technology, sound wave was carried out by the difference that two mikes can be utilized to receive between the phase place of sound wave Filter, can filter environmental background sound to greatest extent, the most remaining sound wave needed.Have employed for using in a noisy environment The equipment of this configuration, can make hearer sound in a noisy environment and be apparent from, not have noise.
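The two-microphone phase-difference idea described above can be illustrated with a delay-and-sum sketch. This is only an illustration under stated assumptions, not the patent's algorithm: the function name `delay_and_sum`, the integer sample delays, and the toy sine signals are all hypothetical. The wanted voice is assumed to reach the second microphone two samples later than the first, while the noise comes from another direction and reaches it two samples earlier.

```python
import math
from typing import List


def delay_and_sum(mic_a: List[float], mic_b: List[float], delay_samples: int) -> List[float]:
    """Align mic_b to mic_a by an integer sample delay, then average the two channels."""
    out = []
    for n in range(len(mic_a)):
        m = n + delay_samples  # index of the matching sample in mic_b
        b = mic_b[m] if 0 <= m < len(mic_b) else 0.0
        out.append(0.5 * (mic_a[n] + b))
    return out


# Toy scene (assumed): the voice has a 16-sample period, the noise an 8-sample period.
def target(n: int) -> float:
    return math.sin(2 * math.pi * n / 16)


def noise(n: int) -> float:
    return math.sin(2 * math.pi * n / 8)


N = 64
mic_a = [target(n) + noise(n) for n in range(N)]
# The voice arrives 2 samples later at mic_b; the noise arrives 2 samples earlier.
mic_b = [target(n - 2) + noise(n + 2) for n in range(N)]

# Steering toward the voice aligns its two copies. The two noise copies end up
# 4 samples apart, which is half the noise period, so they cancel when averaged.
steered = delay_and_sum(mic_a, mic_b, delay_samples=2)
```

In this constructed case the noise cancels almost exactly because its arrival offset equals half its period; real rooms give only partial suppression, which is why the patent adds further filtering stages.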
In the far-field video call device, the processors, memory, storage devices, power supply, and connectors form the minimal system required for the device to run. The network interface (Wi-Fi/3G/4G) and the microphone array are the hardware basis for the far-field video call function.
The operating system (Linux or Android) controls the operation of the hardware modules in the far-field video call device. The operating system can encapsulate complex and varied hardware control operations in the hardware driver layer, keeping hardware interface calls uniform at the operating system layer. The operating system is the interface between the user and the computer, and also the interface between the computer hardware and other software. Its functions include managing the hardware, software, and data resources of the computer system; controlling program execution; improving the human-machine interface; supporting other application software; making full use of all system resources; providing various forms of user interface so that the user has a good working environment; and providing the necessary services and corresponding interfaces for the development of other software.
Referring to Fig. 4 and Fig. 5, the far-field speech enhancement video call method of the present invention requires at least two video call terminals, a multi-noise filtering processing engine, and a multi-noise filtering API management server.
The method is as follows: when the video call terminal at one end conducts a far-field video call, the caller's far-field voice and multiple noise sources are received and recorded by the video call terminal at the same time; the multi-noise filtering processing engine filters the far-field sound and the multiple noise sources; the multi-noise filtering API management server then suppresses the multiple noise sources and enhances the call subject's voice; and the processed voice of the call subject is sent to the video call terminal at the other end.
The video call terminal is provided with a hardware driver, an operating system module, a video call middleware module, a microphone array recording module, an original sound enhancement module, a call main voice and noise source separation module, a multi-noise filtering engine API, a call main voice and noise source merging module, a video call audio/video packaging module, and a video call transmission module:
The hardware driver: the device includes internal or external hardware modules, and the hardware driver is the driver software for those modules (network driver, microphone array driver); it is generally initialized during the operating system initialization phase;
The operating system: a unified abstraction of the device hardware and hardware interfaces; the operating system is the base environment in which software runs;
The video call middleware module: a software package providing the basic video call functions, generally comprising modules for input device management (microphone, etc.), audio/video preprocessing, audio/video encoding, audio/video packaging, and network transmission; the video call middleware module runs on the operating system;
The microphone array recording module: the module that calls the operating system's microphone array interface and records sound;
The original sound enhancement module: calls an audio algorithm to enhance the recorded original sound, that is, to amplify the sound signal;
The call main voice and noise source separation module: calls the multi-noise filtering engine API, taking the enhanced original sound as input and outputting the main voice and the noise sources;
The multi-noise filtering engine API: takes the enhanced original sound as input and outputs the main voice and the noise sources; the multi-noise filtering engine API can be deployed on the local device or on a server;
The call main voice and noise source merging module: enhances the main voice and suppresses the noise sources, then synthesizes the enhanced main voice and the weakened noise sources into a single sound;
The video call audio/video packaging module: the video stream is encoded with H.264/H.265 and packaged into a PES stream; the audio is encoded with AAC or AC3 and packaged into a PES stream; the audio and video PES streams are then packaged into a TS stream suitable for network transmission;
The video call transmission module: transmits the TS stream over the communication network according to the video call service logic.
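The PES-to-TS packaging step named in the module list can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `packetize_pes` and the PID value are hypothetical, and the PES bytes are treated as opaque. It follows the general MPEG transport stream layout, in which each packet is 188 bytes, starts with the sync byte 0x47, and uses adaptation-field stuffing to fill a short final packet.

```python
from typing import List

TS_PACKET_SIZE = 188
TS_HEADER_SIZE = 4
SYNC_BYTE = 0x47


def packetize_pes(pes: bytes, pid: int) -> List[bytes]:
    """Split one PES packet into 188-byte TS packets carried on the given PID."""
    if not 0 <= pid <= 0x1FFF:
        raise ValueError("PID must fit in 13 bits")
    max_payload = TS_PACKET_SIZE - TS_HEADER_SIZE  # 184 bytes
    packets = []
    counter = 0  # 4-bit continuity counter
    offset = 0
    while offset < len(pes):
        chunk = pes[offset:offset + max_payload]
        # payload_unit_start_indicator is set only on the first packet of the PES.
        pusi = 0x40 if offset == 0 else 0x00
        header = bytearray([SYNC_BYTE, pusi | (pid >> 8), pid & 0xFF])
        if len(chunk) == max_payload:
            header.append(0x10 | counter)  # payload only
            body = chunk
        else:
            header.append(0x30 | counter)  # adaptation field followed by payload
            af_len = max_payload - len(chunk) - 1  # excludes the length byte itself
            if af_len == 0:
                adaptation = bytes([0])
            else:
                # One flags byte (all clear), then 0xFF stuffing bytes.
                adaptation = bytes([af_len, 0x00]) + b"\xff" * (af_len - 1)
            body = adaptation + chunk
        packets.append(bytes(header) + body)
        counter = (counter + 1) & 0x0F
        offset += len(chunk)
    return packets


# Usage with placeholder data standing in for an encoded audio or video PES packet.
pes = bytes(range(256)) * 2
packets = packetize_pes(pes, pid=0x100)
```

A real multiplexer would also emit PAT/PMT tables and timing (PCR) fields; those are left out here to keep the sketch focused on the packet framing.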
In the present invention, the far-field speech enhancement video call focuses on the input and output of data at each module:
The far-field sound input includes: the call voice (Cn), ambient noise, echo noise, reverberation noise, and multi-speaker noise;
The microphone array recording module receives and records the above far-field sound and outputs it as a digital signal;
The digitized far-field sound is input to the multi-noise filtering processing engine;
The multi-noise filtering processing engine accesses the multi-noise filtering API management server to obtain the multi-noise filtering engine API;
The multi-noise filtering API management server manages the external multi-noise filtering engine APIs;
The multi-noise filtering processing engine calls the multi-noise filtering engine API to process the digitized far-field sound; the result is audio data in which the far-field voice is enhanced and the multiple noise sources are suppressed.
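The data flow above, in which the processing engine obtains an engine API from the management server and then calls it on the digitized sound, might be sketched as follows. All names here (`ApiManagementServer`, `ProcessingEngine`, `noise_gate`) are hypothetical, and the noise gate is only a stand-in for a real filtering engine.

```python
from typing import Callable, Dict, List

# An engine API is modeled as a function from audio samples to filtered samples.
EngineApi = Callable[[List[float]], List[float]]


class ApiManagementServer:
    """Keeps a registry of filtering engine APIs that processing engines can look up."""

    def __init__(self) -> None:
        self._apis: Dict[str, EngineApi] = {}

    def register(self, name: str, api: EngineApi) -> None:
        self._apis[name] = api

    def get(self, name: str) -> EngineApi:
        if name not in self._apis:
            raise KeyError(f"no engine API registered under {name!r}")
        return self._apis[name]


class ProcessingEngine:
    """Fetches an engine API from the management server and applies it to audio."""

    def __init__(self, server: ApiManagementServer, api_name: str) -> None:
        self._server = server
        self._api_name = api_name

    def process(self, samples: List[float]) -> List[float]:
        api = self._server.get(self._api_name)
        return api(samples)


def noise_gate(samples: List[float], threshold: float = 0.05) -> List[float]:
    """Stand-in filter: zero out samples whose magnitude is below the threshold."""
    return [s if abs(s) >= threshold else 0.0 for s in samples]


server = ApiManagementServer()
server.register("gate", noise_gate)
engine = ProcessingEngine(server, "gate")
cleaned = engine.process([0.5, 0.01, -0.3, -0.02])
```

Looking the API up on every call, rather than caching it, is one simple way to let the server swap in an updated engine without restarting the processing engine.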
The multi-noise filtering API management server mainly performs the following functions: maintaining the multi-noise filtering engine APIs; managing external multi-noise filtering engine APIs; maintaining the adaptation of external multi-noise filtering engine APIs to the internal multi-noise filtering engine API; maintaining the update policy of the multi-noise filtering engine APIs; maintaining the management policy for external multi-noise filtering engine APIs; and auditing the service quality of the multi-noise filtering engine APIs.
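Two of the server functions listed above, adapting an external engine API to the internal one and auditing service quality, can be illustrated with a small sketch. The patent does not specify either interface, so everything here is an assumption: `ExternalEngine` is a hypothetical third-party engine that works on 16-bit PCM bytes, the internal API is assumed to take a list of integer samples, and "service quality" is reduced to call latency.

```python
import struct
import time
from typing import Callable, List

InternalApi = Callable[[List[int]], List[int]]


class ExternalEngine:
    """Hypothetical third-party engine operating on little-endian 16-bit PCM bytes."""

    def denoise(self, pcm: bytes) -> bytes:
        samples = struct.unpack("<%dh" % (len(pcm) // 2), pcm)
        gated = [s if abs(s) >= 500 else 0 for s in samples]
        return struct.pack("<%dh" % len(gated), *gated)


def adapt_external(engine: ExternalEngine) -> InternalApi:
    """Wrap the external byte-based interface in the assumed internal sample-list API."""

    def internal_api(samples: List[int]) -> List[int]:
        pcm = struct.pack("<%dh" % len(samples), *samples)
        out = engine.denoise(pcm)
        return list(struct.unpack("<%dh" % (len(out) // 2), out))

    return internal_api


class ManagedApi:
    """Records the latency of every call so the server can audit service quality."""

    def __init__(self, api: InternalApi) -> None:
        self._api = api
        self.latencies: List[float] = []

    def __call__(self, samples: List[int]) -> List[int]:
        start = time.perf_counter()
        result = self._api(samples)
        self.latencies.append(time.perf_counter() - start)
        return result

    def audit(self, max_seconds: float) -> bool:
        """Pass only if at least one call was made and none exceeded the limit."""
        return bool(self.latencies) and max(self.latencies) <= max_seconds


api = ManagedApi(adapt_external(ExternalEngine()))
filtered = api([1000, 100, -2000, -30])
```

An update or management policy could then be expressed as a rule over `audit` results, for example replacing an external engine that repeatedly fails the latency check.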
As shown in Fig. 5, the far-field speech enhancement video call of the present invention operates as follows. The microphone array recording module of the video call terminal at one end receives the voices of the far-field video call participants together with the associated noise sources. The video call terminal passes the raw audio data through the original sound enhancement module for signal amplification and then hands it to the local or online multi-noise filtering processing engine. The local or online multi-noise filtering processing engine first separates the call subject's voice from the multiple noise sources by means of the call main voice and noise source separation module; it then enhances the call subject's voice and suppresses the multiple noise sources through the multi-noise filtering engine API; next, the call main voice and noise source merging module merges the enhanced voice of the call subject with the suppressed noise sources, and the result is returned to the video call terminal. The video call terminal packages the video data and the processed audio data, by means of the video call audio/video packaging module, into network packets suitable for network transmission, and the video call transmission module of the video call middleware module transmits the audio/video data to the video call terminal at the other end.
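The audio part of this flow (amplify, separate, enhance and suppress, merge) can be sketched end to end. This is a toy under stated assumptions rather than the patent's algorithm: the function names and gain values are hypothetical, and the separation step is replaced by a moving average that treats the slowly varying part of the signal as "voice" and the residual as "noise".

```python
from typing import List, Tuple


def clip(x: float) -> float:
    """Keep a sample inside the normalized range [-1.0, 1.0]."""
    return max(-1.0, min(1.0, x))


def amplify(samples: List[float], gain: float) -> List[float]:
    """Original sound enhancement: plain signal amplification with clipping."""
    return [clip(s * gain) for s in samples]


def separate(samples: List[float], window: int = 5) -> Tuple[List[float], List[float]]:
    """Stand-in separation: moving average as 'voice', residual as 'noise'."""
    half = window // 2
    voice = []
    for n in range(len(samples)):
        lo, hi = max(0, n - half), min(len(samples), n + half + 1)
        voice.append(sum(samples[lo:hi]) / (hi - lo))
    noise = [s - v for s, v in zip(samples, voice)]
    return voice, noise


def enhance_and_suppress(voice: List[float], noise: List[float],
                         voice_gain: float = 1.5,
                         noise_gain: float = 0.1) -> Tuple[List[float], List[float]]:
    """Boost the voice estimate and attenuate the noise estimate."""
    return [v * voice_gain for v in voice], [x * noise_gain for x in noise]


def merge(voice: List[float], noise: List[float]) -> List[float]:
    """Recombine the enhanced voice and the weakened noise into one signal."""
    return [clip(v + x) for v, x in zip(voice, noise)]


def process_call_audio(raw: List[float], gain: float = 2.0) -> List[float]:
    boosted = amplify(raw, gain)
    voice, noise = separate(boosted)
    voice, noise = enhance_and_suppress(voice, noise)
    return merge(voice, noise)


steady = process_call_audio([0.2] * 16)        # slowly varying input is boosted
jitter = process_call_audio([0.1, -0.1] * 8)   # rapidly alternating input is reduced
```

Keeping a small amount of the noise in the merged output, instead of discarding it, matches the text's description of merging the enhanced voice with the suppressed noise sources.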
In addition, as shown in Fig. 4, the video call terminal of the present invention comprises one or more processors, a memory, one or more storage devices, a power supply, one or more connectors, a network interface (Wi-Fi/3G/4G), and a microphone array. The video call terminal further comprises an operating system, which includes a number of modules or applications that can run on the one or more processors. The video call terminal may include a standby wake-up module. The processors, memory, storage devices, power supply, connectors, network interface, and microphone array are interconnected by internal component communication (physical connection, two-way communication, two-way operation).
The one or more processors may be configured to execute functions or process instructions in the far-field video call device. They can process instructions stored in the memory or the storage devices. These instructions can be used to operate hardware modules and complete specific functions or processes.
The memory is the internal storage that exchanges data directly with the CPU; the contents of its storage cells can be read or written at random on demand, and access speed is independent of a cell's location. The memory usually serves as the temporary data storage medium for the operating system and other running programs, holding temporary data or instructions while software executes. The memory typically uses RAM or SRAM.
The one or more storage devices comprise one or more computer-readable storage media and are used for persistent storage of data or information. They include non-volatile storage media such as hard disks, SSDs, flash memory, and EEPROM.
The far-field video call device may include a network interface, which is used for local area network or wide area network communication: Wi-Fi for LAN communication and a 3G/4G module for WAN communication. Through the network interface, the far-field video call device can communicate with devices external to it (mobile phones, tablets, TVs, set-top boxes, video call servers, etc.).
The far-field video call device may include connectors (Wi-Fi network connection, Bluetooth, GLONASS, FM radio reception).
The far-field video call device may include a power supply. The power supply may be a rechargeable battery, which may be a lithium battery, a graphene battery, or one made of other suitable materials. The power supply may include a transformer that converts external power into power suitable for charging.
The far-field video call device may include a microphone array, which couples the signals of two microphones into a single signal. With this technique, the phase difference between the sound waves received by the two microphones is used to filter the sound, removing as much ambient background sound as possible and leaving mainly the wanted sound. In a device with this configuration used in a noisy environment, the listener hears the speaker much more clearly, with far less noise.
In the far-field video call device, the processors, memory, storage devices, power supply, and connectors form the minimal system required for the device to run. The network interface (Wi-Fi/3G/4G) and the microphone array are the hardware basis for the far-field video call function.
The operating system (Linux or Android) controls the operation of the hardware modules in the far-field video call device. The operating system can encapsulate complex and varied hardware control operations in the hardware driver layer, keeping hardware interface calls uniform at the operating system layer. The operating system is the interface between the user and the computer, and also the interface between the computer hardware and other software. Its functions include managing the hardware, software, and data resources of the computer system; controlling program execution; improving the human-machine interface; supporting other application software; making full use of all system resources; providing various forms of user interface so that the user has a good working environment; and providing the necessary services and corresponding interfaces for the development of other software.
Although specific embodiments of the present invention have been described above, those skilled in the art should understand that the specific embodiments described are merely illustrative and do not limit the scope of the present invention. Equivalent modifications and changes made by those skilled in the art in accordance with the spirit of the present invention shall fall within the scope protected by the claims of the present invention.

Claims (12)

1. A far-field speech enhancement video call system, characterized in that the system comprises: at least two video call terminals, a multi-noise filtering processing engine, and a multi-noise filtering API management server; the multi-noise filtering processing engine and the multi-noise filtering API management server are connected to the two video call terminals through a communication network;
when the video call terminal at one end conducts a far-field video call, the caller's far-field voice and multiple noise sources are received and recorded by the video call terminal at the same time;
the multi-noise filtering processing engine filters the far-field sound and the multiple noise sources;
the multi-noise filtering API management server suppresses the multiple noise sources and enhances the voice of the call subject;
the video call terminal at the other end then receives the processed voice of the call subject.
2. The far-field speech enhancement video call system according to claim 1, characterized in that the video call terminal is provided with a hardware driver, an operating system module, a video call middleware module, a microphone array recording module, an original sound enhancement module, a call main voice and noise source separation module, a multi-noise filtering engine API, a call main voice and noise source merging module, a video call audio/video packaging module, and a video call transmission module;
the hardware driver: the device includes internal or external hardware modules, and the hardware driver is the driver software for those modules; it is generally initialized during the operating system initialization phase;
the operating system: a unified abstraction of the device hardware and hardware interfaces; the operating system is the base environment in which software runs;
the video call middleware module: a software package providing the basic video call functions;
the microphone array recording module: the module that calls the operating system's microphone array interface and records sound;
the original sound enhancement module: calls an audio algorithm to enhance the recorded original sound, that is, to amplify the sound signal;
the call main voice and noise source separation module: calls the multi-noise filtering engine API, taking the enhanced original sound as input and outputting the main voice and the noise sources;
the multi-noise filtering engine API: takes the enhanced original sound as input and outputs the main voice and the noise sources; the multi-noise filtering engine API can be deployed on the local device or on a server;
the call main voice and noise source merging module: enhances the main voice and suppresses the noise sources, then synthesizes the enhanced main voice and the weakened noise sources into a single sound;
the video call audio/video packaging module: the video stream is encoded with H.264/H.265 and packaged into a PES stream; the audio is encoded with AAC or AC3 and packaged into a PES stream; the audio and video PES streams are then packaged into a TS stream suitable for network transmission;
the video call transmission module: transmits the TS stream over the communication network according to the video call service logic.
3. The far-field speech enhancement video call system according to claim 2, characterized in that the video call middleware module comprises: an input device management module, an audio/video preprocessing module, an audio/video encoding module, an audio/video packaging module, and a network transmission module.
4. The far-field speech enhancement video call system according to claim 2, characterized in that the far-field speech enhancement video call focuses on the input and output of data at each module;
the far-field sound input includes: the call voice, ambient noise, echo noise, reverberation noise, and multi-speaker noise;
the microphone array recording module receives and records the above far-field sound and outputs it as a digital signal;
the digitized far-field sound is input to the multi-noise filtering processing engine;
the multi-noise filtering processing engine accesses the multi-noise filtering API management server to obtain the multi-noise filtering engine API;
the multi-noise filtering API management server manages the external multi-noise filtering engine APIs;
the multi-noise filtering processing engine calls the multi-noise filtering engine API to process the digitized far-field sound, yielding audio data in which the far-field voice is enhanced and the multiple noise sources are suppressed.
5. The far-field speech enhancement video call system according to claim 2, characterized in that the multi-noise filtering API management server mainly performs the following functions: maintaining the multi-noise filtering engine APIs; managing external multi-noise filtering engine APIs; maintaining the adaptation of external multi-noise filtering engine APIs to the internal multi-noise filtering engine API; maintaining the update policy of the multi-noise filtering engine APIs; maintaining the management policy for external multi-noise filtering engine APIs; and auditing the service quality of the multi-noise filtering engine APIs.
6. The far-field speech enhancement video call system according to claim 1, characterized in that the far-field speech enhancement video call operates as follows: the microphone array recording module of the video call terminal at one end receives the voices of the far-field video call participants together with the associated noise sources; the video call terminal passes the raw audio data through the original sound enhancement module for signal amplification and then hands it to the local or online multi-noise filtering processing engine; the local or online multi-noise filtering processing engine first separates the call subject's voice from the multiple noise sources by means of the call main voice and noise source separation module; it then enhances the call subject's voice and suppresses the multiple noise sources through the multi-noise filtering engine API; next, the call main voice and noise source merging module merges the enhanced voice of the call subject with the suppressed noise sources, and the result is returned to the video call terminal; the video call terminal packages the video data and the processed audio data, by means of the video call audio/video packaging module, into network packets suitable for network transmission, and the video call transmission module of the video call middleware module transmits the audio/video data to the video call terminal at the other end.
7. A far-field speech enhancement video call method, characterized in that the method requires at least two video call terminals, a multi-noise filtering processing engine, and a multi-noise filtering API management server;
the method is as follows: when the video call terminal at one end conducts a far-field video call, the caller's far-field voice and multiple noise sources are received and recorded by the video call terminal at the same time; the multi-noise filtering processing engine filters the far-field sound and the multiple noise sources; the multi-noise filtering API management server then suppresses the multiple noise sources and enhances the call subject's voice; and the processed voice of the call subject is sent to the video call terminal at the other end.
8. The far-field speech enhancement video call method according to claim 7, characterized in that the video call terminal is provided with a hardware driver, an operating system module, a video call middleware module, a microphone array recording module, an original sound enhancement module, a call main voice and noise source separation module, a multi-noise filtering engine API, a call main voice and noise source merging module, a video call audio/video packaging module, and a video call transmission module;
the hardware driver: the device includes internal or external hardware modules, and the hardware driver is the driver software for those modules; it is generally initialized during the operating system initialization phase;
the operating system: a unified abstraction of the device hardware and hardware interfaces; the operating system is the base environment in which software runs;
the video call middleware module: a software package providing the basic video call functions;
the microphone array recording module: the module that calls the operating system's microphone array interface and records sound;
the original sound enhancement module: calls an audio algorithm to enhance the recorded original sound, that is, to amplify the sound signal;
the call main voice and noise source separation module: calls the multi-noise filtering engine API, taking the enhanced original sound as input and outputting the main voice and the noise sources;
the multi-noise filtering engine API: takes the enhanced original sound as input and outputs the main voice and the noise sources; the multi-noise filtering engine API can be deployed on the local device or on a server;
the call main voice and noise source merging module: enhances the main voice and suppresses the noise sources, then synthesizes the enhanced main voice and the weakened noise sources into a single sound;
the video call audio/video packaging module: the video stream is encoded with H.264/H.265 and packaged into a PES stream; the audio is encoded with AAC or AC3 and packaged into a PES stream; the audio and video PES streams are then packaged into a TS stream suitable for network transmission;
the video call transmission module: transmits the TS stream over the communication network according to the video call service logic.
9. The far-field speech enhancement video call method according to claim 8, characterized in that the video call middleware module comprises: an input device management module, an audio/video preprocessing module, an audio/video encoding module, an audio/video packaging module, and a network transmission module.
10. The far-field speech enhancement video call method according to claim 8, characterized in that the far-field speech enhancement video call focuses on the input and output of data at each module;
the far-field sound input includes: the call voice, ambient noise, echo noise, reverberation noise, and multi-speaker noise;
the microphone array recording module receives and records the above far-field sound and outputs it as a digital signal;
the digitized far-field sound is input to the multi-noise filtering processing engine;
the multi-noise filtering processing engine accesses the multi-noise filtering API management server to obtain the multi-noise filtering engine API;
the multi-noise filtering API management server manages the external multi-noise filtering engine APIs;
the multi-noise filtering processing engine calls the multi-noise filtering engine API to process the digitized far-field sound, yielding audio data in which the far-field voice is enhanced and the multiple noise sources are suppressed.
11. The far-field speech enhancement video call method according to claim 8, characterized in that the multi-noise filtering API management server mainly performs the following functions: maintaining the multi-noise filtering engine APIs; managing external multi-noise filtering engine APIs; maintaining the adaptation of external multi-noise filtering engine APIs to the internal multi-noise filtering engine API; maintaining the update policy of the multi-noise filtering engine APIs; maintaining the management policy for external multi-noise filtering engine APIs; and auditing the service quality of the multi-noise filtering engine APIs.
12. The far-field speech enhancement video call method according to claim 8, characterized in that the far-field speech enhancement video call operates as follows: the microphone array recording module of the video call terminal at one end receives the voices of the far-field video call participants together with the associated noise sources; the video call terminal passes the raw audio data through the original sound enhancement module for signal amplification and then hands it to the local or online multi-noise filtering processing engine; the local or online multi-noise filtering processing engine first separates the call subject's voice from the multiple noise sources by means of the call main voice and noise source separation module; it then enhances the call subject's voice and suppresses the multiple noise sources through the multi-noise filtering engine API; next, the call main voice and noise source merging module merges the enhanced voice of the call subject with the suppressed noise sources, and the result is returned to the video call terminal; the video call terminal packages the video data and the processed audio data, by means of the video call audio/video packaging module, into network packets suitable for network transmission, and the video call transmission module of the video call middleware module transmits the audio/video data to the video call terminal at the other end.
CN201610770495.5A 2016-08-30 2016-08-30 Far-field speech enhancement video call method and system Active CN106303357B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610770495.5A CN106303357B (en) 2016-08-30 2016-08-30 A kind of video call method and system of far field speech enhan-cement

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610770495.5A CN106303357B (en) 2016-08-30 2016-08-30 A kind of video call method and system of far field speech enhan-cement

Publications (2)

Publication Number Publication Date
CN106303357A (en) 2017-01-04
CN106303357B (en) 2019-11-08

Family

ID=57674409

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610770495.5A Active CN106303357B (en) 2016-08-30 2016-08-30 Far-field speech enhancement video call method and system

Country Status (1)

Country Link
CN (1) CN106303357B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107481729A (en) * 2017-09-13 2017-12-15 百度在线网络技术(北京)有限公司 A kind of method and system that intelligent terminal is upgraded to far field speech-sound intelligent equipment
CN111556279A (en) * 2020-05-22 2020-08-18 腾讯科技(深圳)有限公司 Monitoring method and communication method of instant session
CN111988704A (en) * 2019-05-21 2020-11-24 北京小米移动软件有限公司 Sound signal processing method, device and storage medium
CN113053411A (en) * 2020-03-30 2021-06-29 深圳市优克联新技术有限公司 Voice data processing device, method, system and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140093059A1 (en) * 2012-09-28 2014-04-03 International Business Machines Corporation Elimination of typing noise from conference calls
CN104012074A (en) * 2011-12-12 2014-08-27 华为技术有限公司 Smart audio and video capture systems for data processing systems
CN203799645U (en) * 2014-05-05 2014-08-27 辽宁工业大学 Microphone-array-based multichannel voice processing apparatus
CN104269178A (en) * 2014-08-08 2015-01-07 华迪计算机集团有限公司 Method and device for conducting self-adaption spectrum reduction and wavelet packet noise elimination processing on voice signals

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104012074A (en) * 2011-12-12 2014-08-27 华为技术有限公司 Smart audio and video capture systems for data processing systems
US20140093059A1 (en) * 2012-09-28 2014-04-03 International Business Machines Corporation Elimination of typing noise from conference calls
CN203799645U (en) * 2014-05-05 2014-08-27 辽宁工业大学 Microphone-array-based multichannel voice processing apparatus
CN104269178A (en) * 2014-08-08 2015-01-07 华迪计算机集团有限公司 Method and device for conducting self-adaption spectrum reduction and wavelet packet noise elimination processing on voice signals

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107481729A (en) * 2017-09-13 2017-12-15 百度在线网络技术(北京)有限公司 Method and system for upgrading an intelligent terminal into a far-field voice intelligent device
CN111988704A (en) * 2019-05-21 2020-11-24 北京小米移动软件有限公司 Sound signal processing method, device and storage medium
CN111988704B (en) * 2019-05-21 2021-10-22 北京小米移动软件有限公司 Sound signal processing method, device and storage medium
CN113053411A (en) * 2020-03-30 2021-06-29 深圳市优克联新技术有限公司 Voice data processing device, method, system and storage medium
CN113053411B (en) * 2020-03-30 2024-01-16 深圳市优克联新技术有限公司 Voice data processing device, method, system and storage medium
CN111556279A (en) * 2020-05-22 2020-08-18 腾讯科技(深圳)有限公司 Monitoring method and communication method of instant session

Also Published As

Publication number Publication date
CN106303357B (en) 2019-11-08

Similar Documents

Publication Publication Date Title
CN108962240B (en) Voice control method and system based on earphone
CN106303357A (en) Video call method and system with far-field speech enhancement
CN103337242B (en) Sound control method and operating device
CN101277331B (en) Sound reproducing device and sound reproduction method
US8824666B2 (en) Noise cancellation for phone conversation
CN108140399A (en) Inhibit for the adaptive noise of ultra wide band music
CN109147784A (en) Voice interactive method, equipment and storage medium
CN107331402A (en) Recording method and recording device based on dual microphones
CN107005800A (en) Method, device, equipment, and system for transmitting and receiving audio files
CN105438900A (en) Mobile phone elevator taking system
CN103491488A (en) Echo cancellation method and device for microphone
US9812149B2 (en) Methods and systems for providing consistency in noise reduction during speech and non-speech periods
CN103458137A (en) Systems and methods for voice enhancement in audio conference
CN102624961A (en) Method and terminal for preventing sound crosstalk between external loudspeaker and microphone
CN109165004A (en) Dual-screen terminal audio output method, terminal and computer readable storage medium
US10354673B2 (en) Noise reduction method and electronic device
CN103298143A (en) Method and system for achieving multi-party call and mobile terminal
CN107240396A (en) Speaker adaptation method, device, equipment and storage medium
CN101834923A (en) Method for controlling sound playing of mobile terminal and mobile terminal
CN201294570Y (en) Echo elimination device for teleconference system
CN103402038B (en) Method and device for eliminating echo at the other party's receiver when a mobile phone is in hands-free state
CN201717913U (en) Mobile terminal
CN101924581A (en) Communication device
CN103327173A (en) Voice controlling method and device of mobile terminal
CN101360155A (en) Active noise silencing control system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: Building 18, No. 89 Software Avenue, Gulou District, Fuzhou, Fujian 350000, China

Patentee after: Ruixin Microelectronics Co., Ltd

Address before: Building 18, No. 89 Software Avenue, Gulou District, Fuzhou, Fujian 350000, China

Patentee before: Fuzhou Rockchip Electronics Co., Ltd.
