EP3120534A2 - Interpretation system and method - Google Patents
Interpretation system and method
- Publication number
- EP3120534A2 (application EP15765582.0A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- sound interface
- sound
- participant
- interface
- interpretation system
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Links
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M3/00—Automatic or semi-automatic exchanges
- H04M3/42—Systems providing special services or facilities to subscribers
- H04M3/56—Arrangements for connecting several subscribers to a common circuit, i.e. affording conference facilities
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
- G06F40/58—Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04B—TRANSMISSION
- H04B1/00—Details of transmission systems, not covered by a single one of groups H04B3/00 - H04B13/00; Details of transmission systems not characterised by the medium used for transmission
- H04B1/06—Receivers
- H04B1/16—Circuits
- H04B1/30—Circuits for homodyne or synchrodyne receivers
- H04B2001/305—Circuits for homodyne or synchrodyne receivers using dc offset compensation techniques
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M2203/00—Aspects of automatic or semi-automatic exchanges
- H04M2203/20—Aspects of automatic or semi-automatic exchanges related to features of supplementary services
- H04M2203/2061—Language aspects
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M2242/00—Special services or facilities
- H04M2242/12—Language recognition, selection or translation arrangements
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M3/00—Automatic or semi-automatic exchanges
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M9/00—Arrangements for interconnection not involving centralised switching
- H04M9/08—Two-way loud-speaking telephone systems with means for conditioning the signal, e.g. for suppressing echoes for one or both directions of traffic
- H04M9/10—Two-way loud-speaking telephone systems with means for conditioning the signal, e.g. for suppressing echoes for one or both directions of traffic with switching of direction of transmission by voice frequency
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/14—Systems for two-way working
- H04N7/15—Conference systems
Definitions
- the present disclosure relates to an interpretation system with a first sound interface, for a first participant such as an interviewer, a second sound interface, for a second participant such as an interviewee, and a third sound interface for a third participant such as an interpreter.
- the system has a switching subsystem that can be switched between at least a first setting, where a voice signal generated at the first sound interface is connected primarily to the third sound interface and a voice signal generated at the third interface is connected primarily to the second sound interface, and a second setting, where a voice signal generated at the second sound interface is connected primarily to the third sound interface and a voice signal generated at the third sound interface is connected primarily to the first sound interface.
- the disclosure further relates to a corresponding method.
- Such a system is described in EP-1545111-A, which provides for bi-directional simultaneous interpretation services in connection with an interpretation assistance device.
- An interpreter may be in a remote location, and the users may generate switch signals by pressing buttons. The switch signals are detected by the system which directs sound to and from different users and the interpreter in such a way that unwanted sound is cancelled or attenuated.
- An object of the present invention is therefore to provide an improved system that is reliable and easy to use.
- an interpretation system of the initially mentioned kind is provided with a processing unit capable of detecting speech originating from the first and second sound interfaces and of controlling the switching subsystem depending on this detection, such that the system switches between the first and second settings.
- the detection of beginning and termination of speech may be carried out by comparing a parameter corresponding to the first order derivative of the RMS of the AC component in a voice signal to a positive and a negative threshold, respectively. This has been shown to provide a reliable detection also in cases where there is background noise. Such detection may be carried out by detecting and removing a DC component from a voice signal resulting in an AC signal, rectifying and low-pass filtering the AC signal to obtain a detection signal, and comparing a first order derivative of the detection signal to a positive and a negative threshold.
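The RMS-derivative comparison described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the frame length, threshold values, and all function and variable names are assumptions chosen for the example.

```python
import numpy as np

def detect_speech_edges(signal, frame_len, pos_thresh, neg_thresh):
    """Detect beginning/end of speech by thresholding the first-order
    derivative of the frame-wise RMS of the AC component.

    Returns a list of (frame_index, event) tuples, where event is
    'start' or 'stop'.
    """
    signal = np.asarray(signal, dtype=float)
    ac = signal - signal.mean()                # remove the DC component
    n_frames = len(ac) // frame_len
    frames = ac[:n_frames * frame_len].reshape(n_frames, frame_len)
    rms = np.sqrt((frames ** 2).mean(axis=1))  # frame-wise RMS
    d_rms = np.diff(rms)                       # first-order derivative
    events = []
    talking = False
    for i, d in enumerate(d_rms):
        if not talking and d > pos_thresh:     # positive threshold: speech begins
            events.append((i + 1, "start"))
            talking = True
        elif talking and d < neg_thresh:       # negative threshold: speech ends
            events.append((i + 1, "stop"))
            talking = False
    return events
```

Because the thresholds act on the derivative of the RMS rather than on the RMS itself, a constant background-noise floor shifts the envelope but not its slope, which is why this kind of detection can remain reliable in noise.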
- the interpretation system may be adapted to switch between an idle state and at least a first active state corresponding to the first setting, in which the first participant is active, and a second active state corresponding to the second setting, in which the second participant is active.
- the system may be adapted to remain in the first active state for a predetermined time after it is detected that the first participant stops talking, and may further be adapted to remain in the first active state for a predetermined time after it is detected that the third participant, interpreting the first participant, stops talking. This ensures that the system does not switch in an undesired way when the first participant e.g. pauses to allow the interpreter to catch up in the interview process.
- the system may be adapted to provide a visual feedback signal in response to a switching of the switching subsystem, such as for instance changing the backlight colour of a display in the system.
- the present disclosure further relates to a corresponding method. That method generally involves steps corresponding to the measures carried out by the different features of the system, and the method may be varied in correspondence with the system.
- Fig 1 illustrates a system overview, where a switching device is in an interviewer-to-interviewee setting.
- Fig 2 shows the switching device in fig 1 in an interviewee-to-interviewer setting.
- Fig 3 shows a flow chart of a process for detection of speech.
- Figs 4a-4d schematically show waveforms corresponding to the first four steps of fig 3, and fig 4e illustrates an envelope, with a time frame larger than the waveforms in figs 4a-4d, where detection of speech takes place.
- Fig 5 shows a flow chart for a switching procedure.
- Fig 1 illustrates schematically an overview of a simultaneous interpretation system 1 according to the present disclosure.
- the system is intended for use in a situation where a first person, hereinafter called interviewer, talks to a second person, hereinafter called interviewee.
- This naming of the first and second persons is done to simplify the following disclosure and does not limit the scope of the present disclosure.
- the interviewer and the interviewee may have totally symmetrical roles as simply persons talking to each other.
- the system may be used in situations such as police, customs and immigration investigations as well as healthcare procedures, and other procedures.
- the interviewer and the interviewee do not share a common language, or may at least not be capable of communicating in a common language with a sufficient quality to ensure, depending on the situation, for instance legal certainty or medical safety.
- the interviewer and the interviewee may be present in the same room, although this is not necessary.
- the interpreter may also be present, or may be available via a telephone line, a mobile telephone connection, a video conference system, or the like.
- the interpreter may be present but placed e.g. in a neighboring room, simply to maintain the interpreter's anonymity.
- the system may be capable of dealing with all such configurations by applying different settings, as will be discussed later. It should be noted that the interviewer or interviewee may be remote with regard to the system as well.
- the system may comprise a first 3, a second 5 and a third 7 sound interface, each providing a sound input 9, for feeding sound to a user loudspeaker or more likely headphones, and a sound output 11 providing an output from a user microphone.
- the system may further comprise a switching subsystem 13 that directs the flow of sound in a path that is appropriate in the current situation.
- For instance, if the interviewer speaks, his or her microphone signal is transferred to the interpreter's headphones, and the signal from the latter's microphone is transferred to the interviewee's headphones. This path is achieved with the connection pattern indicated with black filled dots in the switching subsystem of fig 1. When the interviewer stops speaking and the interviewee begins to speak, this path is altered by the switching subsystem by changing the connection pattern as indicated with dashed rings, as will be discussed later.
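The two connection patterns described above can be sketched as simple routing tables that map each microphone to the headphones that should receive its signal. The names below are illustrative, not identifiers from the patent, and a real switching subsystem could equally be relays or a software routine, as discussed later in the disclosure.

```python
# Setting 1 (fig 1, black filled dots): interviewer -> interpreter -> interviewee
SETTING_1 = {
    "interviewer_mic": "interpreter_phones",
    "interpreter_mic": "interviewee_phones",
}

# Setting 2 (fig 2, dashed rings): interviewee -> interpreter -> interviewer
SETTING_2 = {
    "interviewee_mic": "interpreter_phones",
    "interpreter_mic": "interviewer_phones",
}

def route(setting, source):
    """Return the destination for a microphone signal in the given
    setting, or None if that source is muted (not connected)."""
    return setting.get(source)
```

Note how in each setting only two sources are connected; a source absent from the table, such as the interviewee's microphone in setting 1, is simply not routed anywhere.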
- the system may comprise a processor unit 15, which may be a central processing unit (CPU), a digital signal processor (DSP), a dedicated application-specific integrated circuit (ASIC), or a collection of circuits, optionally comprising both analog and digital signal processing means, as will be discussed further later.
- the system may include I/O processing means 17, a user interface 19, and additional storage means 21, as will be discussed in more detail later.
- each sound interface may be provided with an amplifier 23 that the processor unit can adjust.
- the sound interfaces may be adaptable to the configuration currently used.
- the system may allow, in one configuration, the interviewer and the interviewee to be connected directly to the system by means of a headset with earphones and a microphone, and to connect the interpreter via a video conference system or a fixed telephone line.
- all three parties may be connected directly to the system via a headset.
- Other configurations e.g. using cellphones may be considered, and it has also been considered to use more than three sound interfaces. The latter may be useful e.g. to allow having two interpreters interpreting via an intermediate language, or only interpreting in one direction, from a first to a second language.
- while unbalanced microphones, e.g. using TRS (tip/ring/sleeve) connectors, can be used, it may be preferred to use balanced microphones, e.g. using XLR connectors, to provide improved sound quality and lower susceptibility to interference.
- phantom powering may be used to provide a power source if a condenser microphone is used.
- Balanced headphones may be used as well.
- Each sound interface may also be connected to an internal mobile telephone system to connect one of the interfaces to e.g. a GSM compliant cell phone, at least as an emergency solution.
- Other options are available for wireless connection of a sound interface to a headset or the like, such as a wireless LAN, Bluetooth, etc.
- the switching subsystem may be accomplished with different means.
- the switching subsystem may, as the skilled person understands, be realized with anything from a set of mechanical relays to a software routine executed in a processor, as long as it is capable of switching between different connection patterns that connect the microphone of one speaker to the headphones of another, as necessary in the circumstances and as decided adaptively by the system.
- the system may be integrated in an IP (Internet Protocol) telephony system using session initiation protocols (SIP) and real-time transport (RTP) protocols.
- the configuration indicated with black filled dots in the switching subsystem of fig 1 is used when the interviewer speaks.
- the microphone signal from the interviewer's sound interface 3 is connected by the switching subsystem to the input/headphone line of the interpreter's sound interface 7, such that the interpreter hears the interviewer's voice.
- the signal from the interpreter's microphone is similarly transferred to the interviewee's headphones by the switching system.
- the system may be set in a conference mode, where each participant hears the others and can speak to the others.
- the connections need not switch between on and off.
- the interviewer may, in the configuration indicated in fig 2, hear the voice of the interviewee, at a low volume, together with the voice of the interpreter, at a higher volume. This may, even though the interviewer and interviewee may not share a common language, improve the mutual understanding, as the original speech, together with eye contact, body language, etc., can contribute nuances and the like.
- the processor unit may, as mentioned earlier, be a CPU, a DSP or an application specific circuit. It should further be noted that the switching subunit, the amplifiers, and at least parts of the sound interfaces, etc. may be integrated with the processing unit. Although the illustrated schematic configuration may be realised, it is primarily an example useful for understanding the overall disclosure of the system.
- One way of triggering the switching from one configuration to another is to detect when one party, typically the interviewer or the interviewee, begins to speak.
- An example of a method for accomplishing this speech detection is described with reference to the flow chart of fig 3 and the corresponding waveforms shown in figs 4a-4d.
- An analog voice signal is shown, very schematically, in fig 4a. This signal has an AC component and a DC component 27. In a first step, the DC component is detected 25, and in a second step the DC component is removed 29, leaving only the AC component in the signal, as illustrated in fig 4b. In a DSP this could be carried out with suitable subroutines, and in an analog system an operational amplifier or even a simple capacitor circuit may be used to remove the DC component directly.
- the signal is rectified 31 resulting in the waveform shown in fig 4c.
- This signal is in turn low-pass (LP) filtered 33 in a fourth step, resulting in the waveform of fig 4d.
- This resulting signal shows the instantaneous changes in voice signal amplitude, and in a fifth step there is carried out a detection, which determines 35 whether the first order derivative of the amplitude, dA/dt, exceeds a predetermined positive or negative threshold, cf. fig 4e. If a positive threshold is exceeded, it is determined that speech has begun, and if a negative threshold is exceeded it is determined that speech has ended. This to a great extent corresponds to comparing a parameter corresponding to the first order derivative of the RMS of the AC component in a voice signal to a positive and a negative threshold, respectively. The system may react on this as will now be described.
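The five steps of fig 3 can be sketched end to end as follows. This is an illustrative digital approximation: the one-pole filter coefficient `alpha`, the threshold values, and all names are assumptions, and the patent's filtering could equally be analog, as noted above.

```python
import numpy as np

def detection_signal(x, alpha=0.05):
    """Steps 1-4 of fig 3: detect and remove the DC component (fig 4b),
    rectify (fig 4c), and low-pass filter (fig 4d) to obtain the
    detection signal. alpha is an illustrative one-pole LP coefficient."""
    x = np.asarray(x, dtype=float)
    ac = x - x.mean()              # steps 1-2: remove DC component
    rect = np.abs(ac)              # step 3: rectification
    env = np.empty_like(rect)      # step 4: one-pole low-pass filter
    acc = 0.0
    for i, v in enumerate(rect):
        acc += alpha * (v - acc)
        env[i] = acc
    return env

def speech_events(env, pos_thresh, neg_thresh):
    """Step 5: compare the first order derivative dA/dt of the detection
    signal to a positive and a negative threshold (fig 4e). Returns two
    boolean arrays marking where speech begins and where it ends."""
    d = np.diff(env)
    started = d > pos_thresh       # positive threshold exceeded: speech begun
    ended = d < neg_thresh         # negative threshold exceeded: speech ended
    return started, ended
```

A practical system would add smoothing or hysteresis so that momentary dips within running speech do not trigger spurious "ended" events; the sketch only demonstrates the pipeline's structure.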
- the disclosed features allow for automatic switching between interviewer and interviewee, and vice versa. This implies an improvement, as a conversation can flow much more freely than if manual control, e.g. by the interviewer, were used. Needless to say, it is possible to override this automatic switching and carry out such manual control if needed in a specific interview situation.
- the speech quality will be much improved, as one party (interviewer/interviewee) at a time talks. This is particularly useful if the conversation is recorded e.g. as evidence. In that case it may also be possible to analyse at a later stage how the interpretation affects e.g. questions raised and answers produced in order to achieve higher legal certainty.
- the system remains in one connection pattern, e.g. interviewee-to-interviewer, as long as the interviewee speaks.
- the system may wait for a short waiting time and then switch to the reversed connection pattern in order to allow the interviewer to talk.
- the system may then produce optical and/or acoustic feedback to the users to indicate that switching has taken place and that the previously silent part can begin to talk. Different feedback features are discussed later.
- the system may remain in the first connection pattern until the interviewee is ready.
- This procedure can be summarized in an example flow chart as shown in fig 5.
- the system continuously, or at regular intervals, tests whether interviewer speech is detected 39 or whether interviewee speech is detected 41. If for instance interviewer speech is detected, the system switches 43 to an interviewer-translator-interviewee pattern as described before, and provides feedback via the user interface, as will be discussed, such that the interviewer and interviewee become aware of the switching.
- the system is thus in an interviewer-active state 45, where preferably any voice signals from the interviewee are shut down or at least substantially attenuated. If the interviewee attempts to talk, a feedback signal, e.g. optical or acoustic, may further be provided to the interviewee to inform him or her to wait. In the interviewer-active state, the interviewer may thus talk for as long as needed without being interrupted. In the interviewer-active state 45, it is regularly tested 47 whether the interviewer becomes inactive as discussed before. If the interviewer is inactive for a predetermined time period T, where T is typically in the range 0.5-5 s and preferably about 1 s, it is assumed that the interviewer has stopped talking.
- it may be the case that the interpreter lags a few seconds behind. It is therefore optionally also tested 49 whether the interpreter becomes inactive for a time period, which may also be T seconds, even if this is not necessary. If this does not happen, it is assumed that the interviewer has begun talking again, and the system remains in the interviewer-active state 45. If the interpreter however is silent long enough, the system returns to the idle state 37, and this is indicated by the user interface as feedback to the participants.
- the system may operate in the same way if, in the idle state 37, it is determined that the interviewee begins to talk, and the system enters an interviewee-active state 51. In this way, an interview situation can be handled very smoothly, and can be readily dealt with by the interpreter.
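The switching procedure of fig 5, including the hold time T and the allowance for interpreter lag, can be sketched as a small state machine. This is an interpretation of the flow chart, not the patent's implementation; the class, method, and participant names are assumptions, and the caller is assumed to feed in speech-detection events and periodic ticks with timestamps.

```python
IDLE, INTERVIEWER_ACTIVE, INTERVIEWEE_ACTIVE = "idle", "interviewer", "interviewee"

class InterpretationSwitch:
    """State machine for the fig 5 procedure: idle state 37,
    interviewer-active state 45, interviewee-active state 51."""

    def __init__(self, hold_time=1.0):
        self.hold_time = hold_time     # T: typically 0.5-5 s, about 1 s
        self.state = IDLE
        self.last_speech = None

    def on_speech(self, speaker, now):
        """Called when speech is detected from a participant."""
        if self.state == IDLE:
            if speaker == "interviewer":
                self.state = INTERVIEWER_ACTIVE   # switch 43
            elif speaker == "interviewee":
                self.state = INTERVIEWEE_ACTIVE   # enter state 51
        # Speech from the active party or the interpreter (who may lag a
        # few seconds) resets the hold timer; the other party is ignored.
        if speaker in (self._active_party(), "interpreter"):
            self.last_speech = now

    def on_tick(self, now):
        """Called periodically; return to idle (state 37) once both the
        active party and the interpreter have been silent for hold_time."""
        if self.state != IDLE and self.last_speech is not None:
            if now - self.last_speech >= self.hold_time:
                self.state = IDLE

    def _active_party(self):
        return {INTERVIEWER_ACTIVE: "interviewer",
                INTERVIEWEE_ACTIVE: "interviewee"}.get(self.state)
```

Keying the timer on both the active party and the interpreter mirrors the optional test 49 above: the system does not fall back to idle while the interpreter is still catching up.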
- the user interface 19 may typically include a keyboard 53, a screen 55, such as an LCD screen, and some indicator lamps 57.
- the keyboard 53 may be used to select different settings such as the above-described automatic switching or the previously mentioned conference mode. It can also be used to manually control switching if needed.
- Feedback to the users regarding which state the system is in may be provided in different ways, e.g. using the screen 55 or the indicator lamps 57.
- One efficient way of giving feedback is to use the screen's backlight colour. For instance, in the interviewer-active mode, the backlight may be red, while it is green in the idle mode. Other variations of course exist.
- a user interface may also be useful to choose the language e.g. the interviewee wishes to speak. For instance, a pressure sensitive screen may initially show a number of nations' flags, each representing a specific language. The interviewee may then tap a desired flag/language, and a suitable interpreter is connected to the system accordingly.
- the I/O subsystem 17 may connect the system to other functions. For instance, it is possible to provide additional feedback lights on each user's headset or the like to enhance the feedback function. Further connections to storage solutions such as a hard drive, etc. may be provided to store interview sound data produced during an interview. It is possible to store voice data in a number of separate channels.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Circuit For Audible Band Transducer (AREA)
- Telephonic Communication Services (AREA)
Abstract
Description
Claims
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
SE1450295A SE1450295A1 (en) | 2014-03-17 | 2014-03-17 | System and method of simultaneous interpretation |
PCT/SE2015/050284 WO2015142249A2 (en) | 2014-03-17 | 2015-03-13 | Interpretation system and method |
Publications (2)
Publication Number | Publication Date |
---|---|
EP3120534A2 true EP3120534A2 (en) | 2017-01-25 |
EP3120534A4 EP3120534A4 (en) | 2017-10-25 |
Family
ID=54145455
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP15765582.0A Withdrawn EP3120534A4 (en) | 2014-03-17 | 2015-03-13 | Interpretation system and method |
Country Status (3)
Country | Link |
---|---|
EP (1) | EP3120534A4 (en) |
SE (1) | SE1450295A1 (en) |
WO (1) | WO2015142249A2 (en) |
Family Cites Families (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5867574A (en) * | 1997-05-19 | 1999-02-02 | Lucent Technologies Inc. | Voice activity detection system and method |
US20030088622A1 (en) * | 2001-11-04 | 2003-05-08 | Jenq-Neng Hwang | Efficient and robust adaptive algorithm for silence detection in real-time conferencing |
CA2501002A1 (en) * | 2002-09-27 | 2004-04-08 | Ginganet Corporation | Telephone interpretation system |
AU2003266594B2 (en) * | 2002-09-27 | 2007-10-04 | Ginganet Corporation | Telephone interpretation aid device and telephone interpretation system using the same |
WO2004030328A1 (en) * | 2002-09-27 | 2004-04-08 | Ginganet Corporation | Video telephone interpretation system and video telephone interpretation method |
US7826805B2 (en) * | 2003-11-11 | 2010-11-02 | Matech, Inc. | Automatic-switching wireless communication device |
CN1937664B (en) * | 2006-09-30 | 2010-11-10 | 华为技术有限公司 | System and method for realizing multi-language conference |
US8041018B2 (en) * | 2007-12-03 | 2011-10-18 | Samuel Joseph Wald | System and method for establishing a conference in two or more different languages |
GB2469329A (en) * | 2009-04-09 | 2010-10-13 | Webinterpret Sas | Combining an interpreted voice signal with the original voice signal at a sound level lower than the original sound level before sending to the other user |
- 2014
- 2014-03-17 SE SE1450295A patent/SE1450295A1/en not_active Application Discontinuation
- 2015
- 2015-03-13 EP EP15765582.0A patent/EP3120534A4/en not_active Withdrawn
- 2015-03-13 WO PCT/SE2015/050284 patent/WO2015142249A2/en active Application Filing
Also Published As
Publication number | Publication date |
---|---|
SE1450295A1 (en) | 2015-09-18 |
WO2015142249A2 (en) | 2015-09-24 |
EP3120534A4 (en) | 2017-10-25 |
WO2015142249A3 (en) | 2015-11-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10553235B2 (en) | Transparent near-end user control over far-end speech enhancement processing | |
US9253303B2 (en) | Signal processing apparatus and storage medium | |
US10499136B2 (en) | Providing isolation from distractions | |
US10574804B2 (en) | Automatic volume control of a voice signal provided to a captioning communication service | |
US20190066710A1 (en) | Transparent near-end user control over far-end speech enhancement processing | |
EP2466885B1 (en) | Video muting | |
US20100184488A1 (en) | Sound signal adjuster adjusting the sound volume of a distal end voice signal responsively to proximal background noise | |
US9826085B2 (en) | Audio signal processing in a communication system | |
EP3430819A1 (en) | Earphones having separate microphones for binaural recordings and for telephoning | |
US9967813B1 (en) | Managing communication sessions with respect to multiple transport media | |
US20120140918A1 (en) | System and method for echo reduction in audio and video telecommunications over a network | |
US20180269842A1 (en) | Volume-dependent automatic gain control | |
WO2012175964A2 (en) | Multi-party teleconference methods and systems | |
US10483933B2 (en) | Amplification adjustment in communication devices | |
WO2015142249A2 (en) | Interpretation system and method | |
CN115348411A (en) | Method and system for processing remotely active voice during a call | |
US20090310520A1 (en) | Wideband telephone conference system interface | |
US20150201057A1 (en) | Method of processing telephone voice output and earphone | |
TWI639344B (en) | Sound collection equipment having function of answering incoming calls and control method of sound collection | |
DE3426815A1 (en) | Level adjustment for a telephone station with a hands-free facility | |
WO2019056300A1 (en) | Adjustment system and method for automatically switching audio mode during call | |
JP2010034815A (en) | Sound output device and communication system | |
US10264116B2 (en) | Virtual duplex operation | |
JPH11275243A (en) | Loud speaker type interphone system | |
KR20220111521A (en) | Ambient noise reduction system and method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase |
Free format text: ORIGINAL CODE: 0009012 |
|
17P | Request for examination filed |
Effective date: 20161011 |
|
AK | Designated contracting states |
Kind code of ref document: A2 Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
|
AX | Request for extension of the european patent |
Extension state: BA ME |
|
DAV | Request for validation of the european patent (deleted) | ||
DAX | Request for extension of the european patent (deleted) | ||
A4 | Supplementary search report drawn up and despatched |
Effective date: 20170922 |
|
RIC1 | Information provided on ipc code assigned before grant |
Ipc: H04N 7/15 20060101ALI20170918BHEP Ipc: H04M 3/00 20060101ALI20170918BHEP Ipc: H04M 3/56 20060101AFI20170918BHEP Ipc: G06F 17/28 20060101ALI20170918BHEP Ipc: H04M 9/10 20060101ALI20170918BHEP |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN |
|
18D | Application deemed to be withdrawn |
Effective date: 20180421 |