EP3120534A2 - Interpretation system and method - Google Patents

Interpretation system and method

Info

Publication number
EP3120534A2
Authority
EP
European Patent Office
Prior art keywords
sound interface
sound
participant
interface
interpretation system
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP15765582.0A
Other languages
German (de)
French (fr)
Other versions
EP3120534A4 (en)
Inventor
Pär STIHL
Martin HAMMARSTRÖM
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Simultanex AB
Original Assignee
Simultanex AB
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Simultanex AB
Publication of EP3120534A2
Publication of EP3120534A4

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M3/00 Automatic or semi-automatic exchanges
    • H04M3/42 Systems providing special services or facilities to subscribers
    • H04M3/56 Arrangements for connecting several subscribers to a common circuit, i.e. affording conference facilities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04B TRANSMISSION
    • H04B1/00 Details of transmission systems, not covered by a single one of groups H04B3/00 - H04B13/00; Details of transmission systems not characterised by the medium used for transmission
    • H04B1/06 Receivers
    • H04B1/16 Circuits
    • H04B1/30 Circuits for homodyne or synchrodyne receivers
    • H04B2001/305 Circuits for homodyne or synchrodyne receivers using dc offset compensation techniques
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M2203/00 Aspects of automatic or semi-automatic exchanges
    • H04M2203/20 Aspects of automatic or semi-automatic exchanges related to features of supplementary services
    • H04M2203/2061 Language aspects
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M2242/00 Special services or facilities
    • H04M2242/12 Language recognition, selection or translation arrangements
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M3/00 Automatic or semi-automatic exchanges
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M9/00 Arrangements for interconnection not involving centralised switching
    • H04M9/08 Two-way loud-speaking telephone systems with means for conditioning the signal, e.g. for suppressing echoes for one or both directions of traffic
    • H04M9/10 Two-way loud-speaking telephone systems with means for conditioning the signal, e.g. for suppressing echoes for one or both directions of traffic with switching of direction of transmission by voice frequency
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/14 Systems for two-way working
    • H04N7/15 Conference systems

Definitions

  • the present disclosure relates to an interpretation system with a first sound interface, for a first participant such as an interviewer, a second sound interface, for a second participant such as an interviewee, and a third sound interface for a third participant such as an interpreter.
  • the system has a switching subsystem that can be switched between at least a first setting, where a voice signal generated at the first sound interface is connected primarily to the third sound interface and a voice signal generated at the third interface is connected primarily to the second sound interface, and a second setting, where a voice signal generated at the second sound interface is connected primarily to the third sound interface and a voice signal generated at the third sound interface is connected primarily to the first sound interface.
  • the disclosure further relates to a corresponding method.
  • Such a system is described in EP-1545111-A, which provides for bi-directional simultaneous interpretation services in connection with an interpretation assistance device.
  • An interpreter may be in a remote location, and the users may generate switch signals by pressing buttons. The switch signals are detected by the system which directs sound to and from different users and the interpreter in such a way that unwanted sound is cancelled or attenuated.
  • An object of the present invention is therefore to provide an improved system that is reliable and easy to use.
  • This object is achieved in an interpretation system of the initially mentioned kind, which is provided with a processing unit capable of detecting speech originating from the first and second sound interfaces and of controlling the switching subsystem depending on this detection, such that the system switches between the first and second settings.
  • the detection of beginning and termination of speech may be carried out by comparing a parameter corresponding to the first order derivative of the RMS of the AC component in a voice signal to a positive and a negative threshold, respectively. This has been shown to provide a reliable detection also in cases where there is background noise. Such detection may be carried out by detecting and removing a DC component from a voice signal resulting in an AC signal, rectifying and low-pass filtering the AC signal to obtain a detection signal, and comparing a first order derivative of the detection signal to a positive and a negative threshold.
  • the interpretation system may be adapted to switch between an idle state and at least a first active state corresponding to the first setting, in which the first participant is active, and a second active state corresponding to the second setting, wherein the second participant is active.
  • the system may be adapted to remain in the first active state for a predetermined time after it is detected that the first participant stops talking, and may further be adapted to remain in the first active state for a predetermined time after it is detected that the third participant, interpreting the first participant, stops talking. This makes sure that the system does not switch in an undesired way when the first participant merely pauses, e.g. to allow the interpreter to catch up in the interview process.
  • the system may be adapted to provide a visual feedback signal in response to a switching of the switching subsystem, such as for instance changing the backlight colour of a display in the system.
  • the present disclosure further relates to a corresponding method. That method generally involves steps corresponding to the measures carried out by the different features of the system, and the method may be varied in correspondence with the system.
  • Fig 1 illustrates a system overview, in which a switching device is in an interviewer-to-interviewee setting.
  • Fig 2 shows the switching device in fig 1 in an interviewee-to-interviewer setting.
  • Fig 3 shows a flow chart of a process for detection of speech.
  • Figs 4a-4d show schematically waveforms corresponding to the first four steps of fig 3, and fig 4e illustrates an envelope, with a time frame larger than the waveforms in figs 4a-4d, where detection of speech takes place.
  • Fig 5 shows a flow chart for a switching procedure.
  • Fig 1 illustrates schematically an overview of a simultaneous interpretation system 1 according to the present disclosure.
  • the system is intended for use in a situation where a first person, hereinafter called interviewer, talks to a second person, hereinafter called interviewee.
  • This naming of the first and second persons is done to simplify the following disclosure and does not limit the scope of the present disclosure.
  • the interviewer and the interviewee may have totally symmetrical roles as simply persons talking to each other.
  • the system may be used in situations such as police, customs and immigration investigations as well as healthcare procedures, and other procedures.
  • the interviewer and the interviewee do not share a common language, or may at least not be capable of communicating in a common language with a sufficient quality to ensure, depending on the situation, for instance legal certainty or medical safety.
  • the interviewer and the interviewee may be present in the same room, although this is not necessary.
  • the interpreter may also be present, or may be available via a telephone line, a mobile telephone connection, a video conference system, or the like.
  • the interpreter is present but placed e.g. in a neighboring room, simply to maintain the interpreter's anonymity.
  • the system may be capable of dealing with all such configurations by applying different settings, as will be discussed later. It should be noted that the interviewer or interviewee may be remote with regard to the system as well.
  • the system may comprise a first 3, a second 5 and a third 7 sound interface, each providing a sound input 9, for feeding sound to a user loudspeaker or more likely headphones, and a sound output 11 providing an output from a user microphone.
  • the system may further comprise a switching subsystem 13 that directs the flow of sound in a path that is appropriate in the current situation. For instance, if the interviewer speaks, his or her microphone signal is transferred to the interpreter's headphones, and the signal from the latter's microphone is transferred to the interviewee's headphones. This path is achieved with the connection pattern indicated with black filled dots in the switching subsystem of fig 1. When the interviewer stops speaking and the interviewee begins to speak, this path is altered by the switching subsystem by changing the connection pattern as indicated with dashed rings, as will be discussed later.
  • the operation of the system is controlled by a processor unit 15, which may be a central processing unit, CPU, a digital signal processor, DSP, a dedicated application specific integrated circuit, ASIC, or a collection of circuits, optionally comprising both analog and digital signal processing means, as will be discussed further later.
  • the system may include I/O processing means 17, a user interface 19, and additional storage means 21, as will be discussed in more detail later.
  • each sound interface may be provided with an amplifier 23 that the processor unit can adjust.
  • the sound interfaces may be adaptable to the configuration currently used.
  • the system may allow, in one configuration, the interviewer and the interviewee to be connected directly to the system by means of a headset with earphones and a microphone, and to connect the interpreter via a video conference system or a fixed telephone line.
  • all three parties may be connected directly to the system via a headset.
  • Other configurations, e.g. using cellphones, may be considered, and it has also been considered to use more than three sound interfaces. The latter may be useful e.g. to allow having two interpreters interpreting via an intermediate language, or only interpreting in one direction, from a first to a second language.
  • while unbalanced microphones can be used, it may be preferred to use balanced microphones, e.g. using XLR connectors, to provide improved sound quality and lesser susceptibility to interference.
  • phantom powering may be used which provides a power source if a condenser microphone is used.
  • Balanced headphones may be used as well.
  • Each sound interface may also be connected to an internal mobile telephone system to connect one of the interfaces to e.g. a GSM compliant cell phone, at least as an emergency solution.
  • Other options are available for wireless connection of a sound interface to a headset or the like, such as a wireless LAN, Bluetooth, etc.
  • the switching subsystem may be accomplished with different means.
  • the switching subsystem may, as the skilled person understands, be realized with anything from a set of mechanical relays to a software routine executed in a processor, as long as it is capable of switching between different connection patterns that connect the microphone of one speaker to the headphones of another, as necessary in the circumstances and as decided adaptively by the system.
  • the system may be integrated in an IP (Internet Protocol) telephony system using session initiation protocols (SIP) and real-time transport (RTP) protocols.
  • the configuration indicated with black filled dots in the switching subsystem of fig 1 is used when the interviewer speaks.
  • the microphone signal from the interviewer's sound interface 3 is connected by the switching subsystem to the input/headphone line of the interpreter's sound interface 7, such that the interpreter hears the interviewer's voice.
  • the signal from the interpreter's microphone is similarly transferred to the interviewee's headphones by the switching system.
  • the system may be set in a conference mode, where each participant hears the others and can speak to the others.
  • the connections need not switch between on and off.
  • the interviewer may, in the configuration indicated in fig 2, hear the voice of the interviewee, at a low volume, together with the voice of the interpreter, at a higher volume. This may, even though the interviewer and interviewee may not share a common language, improve the mutual understanding, as the original speech, together with eye contact, body language, etc. can contribute with nuances and the like.
  • the processor unit may, as mentioned earlier, be a CPU, a DSP or an application specific circuit. It should further be noted that the switching subunit, the amplifiers, and at least parts of the sound interfaces, etc. may be integrated with the processing unit. Although the illustrated schematic configuration may be realised, it is primarily an example useful for understanding the overall disclosure of the system.
  • One way of triggering the switching from one configuration to another is to detect when one party, typically the interviewer or the interviewee, begins to speak.
  • An example of a method for accomplishing this speech detection is described with reference to the flow chart of fig 3 and the corresponding waveforms shown in figs 4a-4d.
  • An analog voice signal is, very schematically, shown in fig 4a. This signal has an AC component and a DC component 27. In a first step, the DC component is detected 25, and in a second step the DC component is removed 29, leaving only the AC component in the signal as illustrated in fig 4b. In a DSP this could be carried out with suitable subroutines, and in an analog system an operational amplifier or even a simple capacitor circuit may be used to remove the DC component directly.
  • the signal is rectified 31 resulting in the waveform shown in fig 4c.
  • This signal is in turn low-pass, LP, filtered 33 in a fourth step, resulting in the waveform of fig 4d.
  • This resulting signal shows the instantaneous changes in voice signal amplitude, and in a fifth step there is carried out a detection, which determines 35 whether the first order derivative of the amplitude, dA/dt, exceeds a predetermined positive or negative threshold, cf. fig 4e. If a positive threshold is exceeded, it is determined that speech has begun, and if a negative threshold is exceeded it is determined that speech has ended. This to a great extent corresponds to comparing a parameter corresponding to the first order derivative of the RMS of the AC component in a voice signal to a positive and a negative threshold, respectively. The system may react to this as will now be described.
  • the disclosed features allow for automatically switching between interviewer and interviewee, and vice versa. This implies an improvement, as a conversation can flow much more freely compared to if a manual control, e.g. by the interviewer, were used. Needless to say, it is possible to override this automatic switching and carry out such manual control if needed in a specific interview situation.
  • the speech quality will be much improved, as one party (interviewer/interviewee) at a time talks. This is particularly useful if the conversation is recorded e.g. as evidence. In that case it may also be possible to analyse at a later stage how the interpretation affects e.g. questions raised and answers produced in order to achieve higher legal certainty.
  • the system remains in one connection pattern, e.g. interviewee to interviewer as long as the interviewee speaks.
  • the system may wait for a short waiting time and then switch to the reversed connection pattern in order to allow the interviewer to talk.
  • the system may then produce optical and/or acoustic feedback to the users to indicate that switching has taken place and that the previously silent part can begin to talk. Different feedback features are discussed later.
  • the system may remain in the first connection pattern until the interviewee is ready.
  • This procedure can be summarized in an example flow chart as shown in fig 5.
  • the system continuously, or at regular intervals, tests whether interviewer speech is detected 39 or whether interviewee speech is detected 41. If for instance interviewer speech is detected, the system switches 43 to an interviewer-translator-interviewee pattern as described before, and provides feedback via the user interface as will be discussed, such that the interviewer and interviewee become aware of the switching.
  • the system is thus in an interviewer-active state 45, where preferably any voice signals from the interviewee are shut down or at least substantially attenuated. If the interviewee attempts to talk, a feedback signal, e.g. optical or acoustic, may further be provided to the interviewee to inform the interviewee that he should wait. In the interviewer-active state, the interviewer may thus talk for as long as needed without being interrupted. In the interviewer-active state 45, it is regularly tested 47 whether the interviewer becomes inactive as discussed before. If the interviewer is inactive for a predetermined time period T, where T is typically in the range 0.5-5 s and preferably about 1 s, it is assumed that the interviewer has stopped talking.
  • it may be the case that the interpreter lags a few seconds behind. It is therefore optionally also tested 49 whether the interpreter becomes inactive for a time period that may also be T s, even if this is not necessary. If this does not happen, it is assumed that the interviewer has begun talking again, and the system remains in the interviewer-active state 45. If the interpreter however is silent long enough, the system returns to the idle state 37, and this is indicated by the user interface, as a feedback to the participants.
  • the system may operate in the same way if in the idle state 37, it is determined that the interviewee begins to talk, and the system enters an interviewee-active state 51 . In this way, an interview situation can be handled very smoothly, and can be readily dealt with by the interpreter.
  • the user interface 19 may typically include a keyboard 53, a screen 55, such as an LCD screen, and some indicator lamps 57.
  • the keyboard 53 may be used to select different settings such as the above-described automatic switching or the previously mentioned conference mode. It can also be used to manually control switching if needed.
  • Feedback to the users regarding which state the system is in may be provided in different ways, e.g. using the screen 55 or the indicator lamps 57.
  • One efficient way of giving feedback is to use the screen's backlight colour. For instance, in the interviewer-active mode, the backlight may be red, while it is green in the idle mode. Other variations of course exist.
  • a user interface may also be useful to choose the language e.g. the interviewee wishes to speak. For instance, a pressure sensitive screen may initially show a number of nations' flags, each representing a specific language. The interviewee may then tap a desired flag/language, and a suitable interpreter is connected to the system accordingly.
  • the I/O subsystem 17 may connect the system to other functions. For instance, it is possible to provide additional feedback lights on each user's headset or the like to enhance the feedback function. Further connections to storage solutions such as a hard drive, etc. may be provided to store interview sound data produced during an interview. It is possible to store voice data in a number of separate channels.
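The switching procedure of fig 5 — idle state 37, the active states 45 and 51, and the inactivity tests 47 and 49 with hold time T — can be sketched as a small state machine. The class, method and parameter names below are illustrative assumptions, not terms from the patent; only the states, the tests and the hold time T (preferably about 1 s) come from the description above.

```python
from enum import Enum, auto

class State(Enum):
    IDLE = auto()
    INTERVIEWER_ACTIVE = auto()
    INTERVIEWEE_ACTIVE = auto()

class SwitchController:
    """Sketch of the fig 5 procedure: from the idle state 37, whoever
    is detected speaking first selects the connection pattern; the
    system then returns to idle only after both the active party and
    the (possibly lagging) interpreter have been silent for T."""

    def __init__(self, hold_time=1.0):  # T, preferably about 1 s
        self.state = State.IDLE
        self.hold_time = hold_time
        self.last_party_speech = 0.0
        self.last_interpreter_speech = 0.0

    def update(self, now, interviewer_speaking, interviewee_speaking,
               interpreter_speaking):
        if interpreter_speaking:
            self.last_interpreter_speech = now
        if self.state is State.IDLE:
            # Tests 39/41: who begins to speak decides the pattern.
            if interviewer_speaking:
                self.state = State.INTERVIEWER_ACTIVE
                self.last_party_speech = now
            elif interviewee_speaking:
                self.state = State.INTERVIEWEE_ACTIVE
                self.last_party_speech = now
        else:
            active_speaking = (interviewer_speaking
                               if self.state is State.INTERVIEWER_ACTIVE
                               else interviewee_speaking)
            if active_speaking:
                self.last_party_speech = now
            # Tests 47/49: fall back to idle only once both the active
            # party and the interpreter are silent for longer than T.
            elif (now - self.last_party_speech > self.hold_time and
                  now - self.last_interpreter_speech > self.hold_time):
                self.state = State.IDLE
        return self.state
```

On each transition, a real implementation would also trigger the switching subsystem 13 and the user-interface feedback discussed above.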

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Circuit For Audible Band Transducer (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The present disclosure relates to an interpretation system comprising a first sound interface (3), for instance for an interviewer, a second sound interface (5), for instance for an interviewee, and a third sound interface (7) for an interpreter. The interpretation system comprises a switching subsystem (13) which can be switched between a first setting, interviewer-interpreter-interviewee, and a second setting, interviewee-interpreter-interviewer. The system comprises a processing unit (15) which is devised to detect speech originating from the first and second sound interfaces, and to automatically control the switching subsystem depending on this detection.

Description

INTERPRETATION SYSTEM AND METHOD
Technical field
The present disclosure relates to an interpretation system with a first sound interface, for a first participant such as an interviewer, a second sound interface, for a second participant such as an interviewee, and a third sound interface for a third participant such as an interpreter. The system has a switching subsystem that can be switched between at least a first setting, where a voice signal generated at the first sound interface is connected primarily to the third sound interface and a voice signal generated at the third interface is connected primarily to the second sound interface, and a second setting, where a voice signal generated at the second sound interface is connected primarily to the third sound interface and a voice signal generated at the third sound interface is connected primarily to the first sound interface.
The disclosure further relates to a corresponding method.
Background
Such a system is described in EP-1545111-A, which provides for bi-directional simultaneous interpretation services in connection with an interpretation assistance device. An interpreter may be in a remote location, and the users may generate switch signals by pressing buttons. The switch signals are detected by the system which directs sound to and from different users and the interpreter in such a way that unwanted sound is cancelled or attenuated.
One problem associated with such systems is how to make them user friendly to provide smooth and clear interview sessions.
Summary
An object of the present invention is therefore to provide an improved system that is reliable and easy to use.
This object is achieved in an interpretation system of the initially mentioned kind, which is provided with a processing unit capable of detecting speech originating from the first and second sound interfaces and of controlling the switching subsystem depending on this detection, such that the system switches between the first and second settings. This means that the system can operate automatically and adapt to the interview situation. The participants, particularly the interviewee who may be inexperienced, need not control the system manually, and a clear and undisturbed interview may nevertheless be produced and optionally recorded. The interpreter also obtains a better working situation as he or she may concentrate on translating in one direction at a time in an orderly manner. The interpreter need not be concerned with controlling the system.
The detection of beginning and termination of speech may be carried out by comparing a parameter corresponding to the first order derivative of the RMS of the AC component in a voice signal to a positive and a negative threshold, respectively. This has been shown to provide a reliable detection also in cases where there is background noise. Such detection may be carried out by detecting and removing a DC component from a voice signal resulting in an AC signal, rectifying and low-pass filtering the AC signal to obtain a detection signal, and comparing a first order derivative of the detection signal to a positive and a negative threshold.
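The detection chain just described can be sketched as follows. The sample rate, filter time constant and threshold values are illustrative assumptions, not values from the patent, and a plain list-based implementation is used for clarity.

```python
def detect_speech_events(samples, fs=8000, tau=0.05,
                         pos_threshold=0.5, neg_threshold=-0.5):
    """Sketch of the detection chain: remove the DC component, rectify,
    low-pass filter into an envelope, and compare the envelope's first
    order derivative against a positive threshold (speech begins) and
    a negative threshold (speech ends)."""
    n = len(samples)
    # Steps 1-2: detect and remove the DC component, leaving the AC signal.
    dc = sum(samples) / n
    ac = [s - dc for s in samples]
    # Step 3: rectify.
    rect = [abs(s) for s in ac]
    # Step 4: first order low-pass filter, yielding the detection signal.
    alpha = 1.0 / (1.0 + tau * fs)
    env, y = [], 0.0
    for r in rect:
        y += alpha * (r - y)
        env.append(y)
    # Step 5: threshold the derivative of the detection signal. A real
    # system would latch the first crossing rather than report every
    # sample above threshold.
    events = []
    for i in range(1, n):
        d = (env[i] - env[i - 1]) * fs  # derivative per second
        if d > pos_threshold:
            events.append(("begin", i))
        elif d < neg_threshold:
            events.append(("end", i))
    return events
```

The low-pass-filtered rectified signal approximates the RMS envelope of the AC component, so thresholding its derivative corresponds to the RMS-derivative comparison described above.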
The interpretation system may be adapted to switch between an idle state and at least a first active state corresponding to the first setting, in which the first participant is active, and a second active state corresponding to the second setting, wherein the second participant is active. This means that the system adapts naturally to an interview situation. The system may be adapted to remain in the first active state for a predetermined time after it is detected that the first participant stops talking, and may further be adapted to remain in the first active state for a predetermined time after it is detected that the third participant, interpreting the first participant, stops talking. This makes sure that the system does not switch in an undesired way when the first participant merely pauses, e.g. to allow the interpreter to catch up in the interview process.
It is possible to gradually adjust the gain of an amplifier of at least one of the sound interfaces in response to a switching of the switching subsystem. This avoids disturbing clicks during switching.
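A click-free switch of this kind can be approximated by stepping a line's gain over a short fade rather than toggling it. The function and parameter names below, and the fade length, are illustrative assumptions; `apply_gain` stands in for whatever call actually sets one of the amplifiers.

```python
import time

def ramp_gain(apply_gain, start, target, fade_ms=20, steps=10):
    """Step an amplifier's gain from `start` to `target` over a short
    fade instead of toggling it, so that switching produces no audible
    click. The fade length and step count are illustrative values."""
    for i in range(1, steps + 1):
        apply_gain(start + (target - start) * i / steps)
        time.sleep(fade_ms / 1000.0 / steps)

# Hypothetical usage: mute one microphone line smoothly on a switch.
# ramp_gain(set_mic_gain, 1.0, 0.0)
```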
The system may be adapted to provide a visual feedback signal in response to a switching of the switching subsystem, such as for instance changing the backlight colour of a display in the system.
The present disclosure further relates to a corresponding method. That method generally involves steps corresponding to the measures carried out by the different features of the system, and the method may be varied in correspondence with the system.
Brief description of the drawings
Fig 1 illustrates a system overview. A switching device is in an interviewer-to-interviewee setting.
Fig 2 shows the switching device in fig 1 in an interviewee-to-interviewer setting.
Fig 3 shows a flow chart of a process for detection of speech.
Figs 4a-4d show schematically waveforms corresponding to the first four steps of fig 3, and fig 4e illustrates an envelope, with a time frame larger than the waveforms in figs 4a-4d, where detection of speech takes place.
Fig 5 shows a flow chart for a switching procedure.
Detailed description
Fig 1 illustrates schematically an overview of a simultaneous interpretation system 1 according to the present disclosure. The system is intended for use in a situation where a first person, hereinafter called interviewer, talks to a second person, hereinafter called interviewee. This naming of the first and second persons is done to simplify the following disclosure and does not limit the scope of the present disclosure. In fact, the interviewer and the interviewee may have totally symmetrical roles as simply persons talking to each other.
Typically, the system may be used in situations such as police, customs and immigration investigations as well as healthcare procedures, and other procedures.
The interviewer and the interviewee do not share a common language, or may at least not be capable of communicating in a common language with a sufficient quality to ensure, depending on the situation, for instance legal certainty or medical safety.
Usually, the interviewer and the interviewee may be present in the same room, although this is not necessary. The interpreter may also be present, or may be available via a telephone line, a mobile telephone connection, a video conference system, or the like. In another example, the interpreter is present but placed e.g. in a neighboring room, simply to maintain the interpreter's anonymity. The system may be capable of dealing with all such configurations by applying different settings, as will be discussed later. It should be noted that the interviewer or interviewee may be remote with regard to the system as well.
In summary, and as an example, the system may comprise a first 3, a second 5 and a third 7 sound interface, each providing a sound input 9, for feeding sound to a user loudspeaker or more likely headphones, and a sound output 11 providing an output from a user microphone.
The system may further comprise a switching subsystem 13 that directs the flow of sound in a path that is appropriate in the current situation. For instance, if the interviewer speaks, his or her microphone signal is transferred to the interpreter's headphones, and the signal from the latter's microphone is transferred to the interviewee's headphones. This path is achieved with the connection pattern indicated with black filled dots in the switching subsystem of fig 1. When the interviewer stops speaking and the interviewee begins to speak, this path is altered by the switching subsystem by changing the connection pattern as indicated with dashed rings, as will be discussed later. The operation of the system is controlled by a processor unit 15, which may be a central processing unit, CPU, a digital signal processor, DSP, a dedicated application specific integrated circuit, ASIC, or a collection of circuits, optionally comprising both analog and digital signal processing means, as will be discussed further later.
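The two connection patterns can be represented as a simple routing table mapping microphone outputs to headphone inputs. The participant names below are illustrative stand-ins for the sound interfaces 3, 5 and 7; the patent itself does not prescribe any particular data structure.

```python
# Each entry maps a participant's microphone to the headphone lines
# that should receive it.
FIRST_SETTING = {              # interviewer speaks (black dots, fig 1)
    "interviewer": ["interpreter"],
    "interpreter": ["interviewee"],
}
SECOND_SETTING = {             # interviewee speaks (dashed rings, fig 2)
    "interviewee": ["interpreter"],
    "interpreter": ["interviewer"],
}

def route(setting, source, sample):
    """Deliver one microphone sample to the headphones selected by the
    active connection pattern; sources without a route are dropped."""
    return {dest: sample for dest in setting.get(source, [])}
```

Switching between the first and second settings then amounts to swapping which table the audio loop consults, which is the software analogue of reconfiguring the relay pattern.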
Additionally, the system may include I/O processing means 17, a user interface 19, and additional storage means 21 , as will be discussed in more detail later.
In order to achieve good sound quality, the input and output of each sound interface may be provided with an amplifier 23 whose gain the processor unit can adjust.
Sound interface
The sound interfaces may be adaptable to the configuration currently used. For instance, the system may allow, in one configuration, the interviewer and the interviewee to be connected directly to the system by means of a headset with earphones and a microphone, and to connect the interpreter via a video conference system or a fixed telephone line. In another configuration, all three parties may be connected directly to the system via a headset. Other configurations, e.g. using cellphones, may be considered, and it has also been considered to use more than three sound interfaces. The latter may be useful e.g. to allow having two interpreters interpreting via an intermediate language, or only interpreting in one direction, from a first to a second language.
While unbalanced microphones can be used, balanced microphones, e.g. using XLR connectors, may be preferred, as they provide improved sound quality and lower susceptibility to interference. TRS (tip/ring/sleeve) connectors may be used as well. Further, phantom powering may be used, which provides a power source if a condenser microphone is used. Balanced headphones may be used as well.
Other standardized line in/out connectors may be used to connect the sound interface to a videoconference system. Each sound interface may also be connected to an internal mobile telephone system to connect one of the interfaces to e.g. a GSM compliant cell phone, at least as an emergency solution. Other options are available for wireless connection of a sound interface to a headset or the like, such as a wireless LAN, Bluetooth, etc.
Regardless of which solution is used to connect the sound interfaces to interviewers, interviewees and interpreters, it may be useful to allow the processor unit to control the amplitude of the incoming and outgoing signals of each interface, which may be done by means of controlling each line's amplifier, as will be discussed later.
Switching subsystem
The switching subsystem may be accomplished with different means. First, it should be noted that conveying both digital and analog sound signals has been considered. While employing electronics well known for decades, analog signal transmission may be considered, as the interpretation system 1 may be used in an environment with low interference and may be rather compact. Further, analog systems can sometimes provide superior sound quality, thanks to the absence of quantization noise, etc.
Needless to say, corresponding entirely digital systems may be employed as well. In fact, the switching subsystem may, as the skilled person understands, be realized with anything from a set of mechanical relays to a software routine executed in a processor as long as it is capable of switching between different connection patterns, that connect the microphone of one speaker to the headphones of another as necessary in the circumstances and as decided adaptively by the system. The system may be integrated in an IP (Internet Protocol) telephony system using session initiation protocols (SIP) and real-time transport (RTP) protocols.
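As a purely illustrative sketch of such a software realization (the class and constant names used here are hypothetical and not taken from the disclosure), a connection pattern may be modelled as a mapping from each participant's microphone to the headphones it currently feeds:

```python
# Hypothetical software realization of the switching subsystem:
# a connection pattern maps each microphone (source) to the
# headphones (sink) it is currently connected to.

INTERVIEWER, INTERVIEWEE, INTERPRETER = "interviewer", "interviewee", "interpreter"

# Pattern used while the interviewer speaks (black filled dots in fig 1).
PATTERN_INTERVIEWER_ACTIVE = {INTERVIEWER: INTERPRETER, INTERPRETER: INTERVIEWEE}
# Reversed pattern used while the interviewee speaks (dashed rings, fig 2).
PATTERN_INTERVIEWEE_ACTIVE = {INTERVIEWEE: INTERPRETER, INTERPRETER: INTERVIEWER}


class SwitchingSubsystem:
    def __init__(self):
        self.pattern = {}

    def apply(self, pattern):
        """Switch to a new connection pattern."""
        self.pattern = dict(pattern)

    def route(self, source, samples):
        """Return (sink, samples) if the source is connected, else None."""
        sink = self.pattern.get(source)
        return (sink, samples) if sink is not None else None
```

Switching between the two settings then amounts to one call to apply(), regardless of whether the underlying transport is analog lines, relays, or RTP streams.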
As mentioned, the configuration indicated with black filled dots in the switching subsystem of fig 1 is used when the interviewer speaks. The microphone signal from the interviewer's sound interface 3 is connected by the switching subsystem to the input/headphone line of the interpreter's sound interface 7, such that the interpreter hears the interviewer's voice. The signal from the interpreter's microphone is similarly transferred to the interviewee's headphones by the switching system.
When the interviewer stops speaking and the interviewee begins to speak, the path reverses: sound flows from the interviewee to the interpreter to the interviewer, by changing the connection pattern as indicated with dashed rings, and as indicated in fig 2.
Other configurations are also possible. For instance, the system may be set in a conference mode, where each participant hears the others and can speak to the others. Also, even if indicated as such in fig 1, the connections need not switch between on and off. For instance, in the configuration indicated in fig 2, the interviewer may hear the voice of the interviewee at a low volume, together with the voice of the interpreter at a higher volume. This may, even though the interviewer and interviewee may not share a language, improve the mutual understanding, as the original speech, together with eye contact, body language, etc., can contribute nuances and the like.
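Mixing the attenuated original voice with the interpreter's voice, as described above, can be sketched as follows (the function name and gain value are illustrative assumptions, not part of the disclosure):

```python
def mix_with_original(interpreter_samples, original_samples, original_gain=0.2):
    """Mix the interpreter's voice with the original speech attenuated
    by original_gain, so the listener hears both, the original quietly."""
    return [i + original_gain * o
            for i, o in zip(interpreter_samples, original_samples)]
```

The same amplifier-controlled gains discussed elsewhere in the disclosure could supply original_gain, so that the balance between interpreter and original speaker remains adjustable.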
Processor unit
The processor unit may, as mentioned earlier, be a CPU, a DSP or an application specific circuit. It should further be noted that the switching subunit, the amplifiers, and at least parts of the sound interfaces, etc. may be integrated with the processing unit. Although the illustrated schematic configuration may be realised, it is primarily an example useful for understanding the overall disclosure of the system.
Speech detection
One way of triggering the switching from one configuration to another is to detect when one party, typically the interviewer or the interviewee, begins to speak. An example of a method for accomplishing this speech detection is described with reference to the flow chart of fig 3 and the corresponding waveforms shown in figs 4a-4d. An analog voice signal is shown, very schematically, in fig 4a. This signal has an AC component and a DC component 27. In a first step, the DC component is detected 25, and in a second step the DC component is removed 29, leaving only the AC component in the signal, as illustrated in fig 4b. In a DSP this could be carried out with suitable subroutines, and in an analog system an operational amplifier or even a simple capacitor circuit may be used to remove the DC component directly.
In a third step, the signal is rectified 31, resulting in the waveform shown in fig 4c. This signal is in turn low-pass (LP) filtered 33 in a fourth step, resulting in the waveform of fig 4d. This resulting signal shows the instantaneous changes in voice signal amplitude, and in a fifth step a detection is carried out, which determines 35 whether the first order derivative of the amplitude, ΔA/Δt, exceeds a predetermined positive or negative threshold, cf. fig 4e. If the positive threshold is exceeded, it is determined that speech has begun, and if the negative threshold is exceeded, it is determined that speech has ended. This to a great extent corresponds to comparing a parameter corresponding to the first order derivative of the RMS of the AC component in a voice signal to a positive and a negative threshold, respectively. The system may react on this as will now be described.
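The five steps above can be sketched digitally as follows. This is a minimal illustration, not the patented implementation: the DC component is approximated by the signal mean, the LP filter by a first-order exponential smoother, and the parameter values (alpha, thresholds) are arbitrary assumptions:

```python
def detect_speech_transitions(signal, alpha=0.05, pos_thr=0.1, neg_thr=-0.1):
    """Return (index, event) pairs where event is 'speech_begin' or
    'speech_end', following the five steps of fig 3."""
    # Steps 1-2: detect and remove the DC component (approximated by the mean).
    mean = sum(signal) / len(signal)
    ac = [s - mean for s in signal]
    # Step 3: rectify.
    rectified = [abs(s) for s in ac]
    # Step 4: low-pass filter (first-order IIR stands in for the LP filter),
    # yielding an amplitude envelope as in fig 4d.
    envelope = []
    y = 0.0
    for r in rectified:
        y += alpha * (r - y)
        envelope.append(y)
    # Step 5: compare the first order derivative of the envelope
    # to a positive and a negative threshold.
    events = []
    for i in range(1, len(envelope)):
        d = envelope[i] - envelope[i - 1]
        if d > pos_thr:
            events.append((i, "speech_begin"))
        elif d < neg_thr:
            events.append((i, "speech_end"))
    return events
```

Feeding the function a quiet segment, a loud burst, and another quiet segment yields a "speech_begin" event at the start of the burst and a "speech_end" event shortly after it, mirroring the thresholds of fig 4e.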
Switching procedure
The disclosed features allow automatic switching between interviewer and interviewee, and vice versa. This implies an improvement, as a conversation can flow much more freely compared to when manual control, e.g. by the interviewer, is used. Needless to say, it is possible to override this automatic switching and carry out such manual control if needed in a specific interview situation.
Further, as compared to regular conference calls, the speech quality will be much improved, as only one party (interviewer/interviewee) talks at a time. This is particularly useful if the conversation is recorded, e.g. as evidence. In that case it may also be possible to analyse at a later stage how the interpretation affects e.g. questions raised and answers produced, in order to achieve higher legal certainty.
The system remains in one connection pattern, e.g. interviewee to interviewer, as long as the interviewee speaks.
When the system detects, for instance as described above, that the interviewee stops talking, the system may wait for a short waiting time and then switch to the reversed connection pattern in order to allow the interviewer to talk. The system may then produce optical and/or acoustic feedback to the users to indicate that switching has taken place and that the previously silent party can begin to talk. Different feedback features are discussed later.
If, on the other hand, the interviewee resumes talking during the waiting time, the system may remain in the first connection pattern until the interviewee is ready.
This procedure can be summarized in an example flow chart as shown in fig 5. Starting out from a state where the system is idle 37, it is continuously or at regular intervals tested whether interviewer speech is detected 39 or whether interviewee speech is detected 41. If, for instance, interviewer speech is detected, the system switches 43 to an interviewer-translator-interviewee pattern as described before, and provides feedback via the user interface, as will be discussed, such that the interviewer and interviewee become aware of the switching. As the processor unit (cf. 15 in fig 1) may be able to control the amplifiers (cf. 23 in fig 1) for the sound inputs and outputs of each interface, the switching can be made smooth by allowing the amplifier gains to ramp up and down rather than just switching on and off. This means that uncomfortable and disturbing clicking at the switching transitions can be avoided.
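The gain ramping that avoids clicking can be illustrated with a simple stepped ramp; the function name, step size, and per-tick update model are assumptions for the sake of the sketch:

```python
def ramp_gain(current, target, step=0.1):
    """Move the amplifier gain one step toward the target instead of
    jumping, so that switching transitions produce no audible click.
    Called once per control tick."""
    if abs(target - current) <= step:
        return target
    return current + step if target > current else current - step
```

On each switching event the processor unit would set a new target gain (e.g. 1.0 for the newly active microphone, 0.0 for the muted one) and let successive control ticks carry the actual gain there gradually.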
The system is thus in an interviewer-active state 45, where preferably any voice signals from the interviewee are shut off or at least substantially attenuated. If the interviewee attempts to talk, a feedback signal, e.g. optical or acoustic, may further be provided to inform the interviewee that he or she should wait. In the interviewer-active state, the interviewer may thus talk for as long as needed without being interrupted. In the interviewer-active state 45, it is regularly tested 47 whether the interviewer becomes inactive, as discussed before. If the interviewer is inactive for a predetermined time period T, where T is typically in the range 0.5-5 s and preferably about 1 s, it is assumed that the interviewer has stopped talking. However, it may be the case that the interpreter lags a few seconds. It is therefore optionally also tested 49 whether the interpreter becomes inactive for a time period, which may also be T seconds, even if this is not necessary. If this does not happen, it is assumed that the interviewer has begun talking again, and the system remains in the interviewer-active state 45. If the interpreter is silent long enough, however, the system returns to the idle state 37, and this is indicated by the user interface as feedback to the participants.
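The state machine of fig 5 can be sketched as follows. The class, state names, and polling interface are hypothetical; the sketch only assumes, as in the disclosure, that an active state is left after both the active party and (optionally) the interpreter have been silent for the waiting time T:

```python
IDLE, INTERVIEWER_ACTIVE, INTERVIEWEE_ACTIVE = "idle", "interviewer", "interviewee"


class SwitchController:
    """Minimal model of the idle/interviewer-active/interviewee-active
    state machine of fig 5, with waiting time T (typically 0.5-5 s)."""

    def __init__(self, timeout=1.0):
        self.timeout = timeout      # T
        self.state = IDLE
        self.silent_since = None    # time at which silence began, or None

    def update(self, now, interviewer_speaks, interviewee_speaks,
               interpreter_speaks=False):
        if self.state == IDLE:
            if interviewer_speaks:
                self.state = INTERVIEWER_ACTIVE
            elif interviewee_speaks:
                self.state = INTERVIEWEE_ACTIVE
            self.silent_since = None
        else:
            active = (interviewer_speaks if self.state == INTERVIEWER_ACTIVE
                      else interviewee_speaks)
            # The interpreter lagging a few seconds also keeps the state.
            if active or interpreter_speaks:
                self.silent_since = None
            elif self.silent_since is None:
                self.silent_since = now
            elif now - self.silent_since >= self.timeout:
                self.state = IDLE
                self.silent_since = None
        return self.state
```

Each call to update() corresponds to one pass through the tests 39/41 or 47/49 of fig 5; the user-interface feedback would be driven by the returned state.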
As illustrated in fig 5, the system may operate in the same way if, in the idle state 37, it is determined that the interviewee begins to talk, whereupon the system enters an interviewee-active state 51. In this way, an interview situation can be handled very smoothly, and can be readily dealt with by the interpreter.

User interface
Again with reference to fig 1, the user interface 19 may typically include a keyboard 53, a screen 55, such as an LCD screen, and some indicator lamps 57. The keyboard 53 may be used to select different settings, such as the above-described automatic switching or the previously mentioned conference mode. It can also be used to manually control switching if needed.
Feedback to the users regarding which state the system is in (e.g. interviewer-active or interviewee-active, as described above) may be provided in different ways, e.g. using the screen 55 or the indicator lamps 57. One efficient way of giving feedback is to use the screen's backlight colour. For instance, in the interviewer-active mode the backlight may be red, while it is green in the idle mode. Other variations of course exist. A user interface may also be useful for choosing the language e.g. the interviewee wishes to speak. For instance, a pressure-sensitive screen may initially show a number of nations' flags, each representing a specific language. The interviewee may then tap a desired flag/language, and a suitable interpreter is connected to the system accordingly.
I/O subsystem and memory
The I/O subsystem 17 may connect the system to other functions. For instance, it is possible to provide additional feedback lights on each user's headset or the like to enhance the feedback function. Further, connections to storage solutions, such as a hard drive, may be provided to store sound data produced during an interview. It is possible to store voice data in a number of separate channels.
Additionally, it is possible to provide local storage, such as indicated with an SD card.
The present disclosure is not limited by the examples given above, and may be varied in different ways within the scope of the appended claims.

Claims

1. An interpretation system comprising a first sound interface (3), for a first participant such as an interviewer, a second sound interface (5), for a second participant such as an interviewee, and a third sound interface (7) for a third participant such as an interpreter, wherein the interpretation system comprises a switching subsystem (13) that can be switched between at least a first setting (45), where a voice signal generated at the first sound interface (3) is connected primarily to the third sound interface (7) and a voice signal generated at the third interface is connected primarily to the second sound interface (5), and a second setting (51), where a voice signal generated at the second sound interface (5) is connected primarily to the third sound interface (7) and a voice signal generated at the third sound interface (7) is connected primarily to the first sound interface (3), characterised by a processing unit (15) which is devised to detect speech originating from the first and second sound interfaces, and to control the switching subsystem depending on this detection, such that the system switches between the first and second settings.
2. An interpretation system according to claim 1, which is adapted to detect beginning and termination of speech by comparing a parameter corresponding to the first order derivative of the RMS of the AC component in a voice signal to a positive and a negative threshold, respectively.
3. An interpretation system according to claim 2, which is adapted to detect beginning and termination of speech by:
-detecting and removing a DC component from a voice signal resulting in an AC signal,
-rectifying and low-pass filtering the AC signal to obtain a detection signal, and -comparing a first order derivative of the detection signal to a positive and a negative threshold.
4. An interpretation system according to any of the preceding claims, which is adapted to switch between an idle state and at least a first active state corresponding to the first setting, in which the first participant is active, and a second active state corresponding to the second setting, wherein the second participant is active.
5. An interpretation system according to claim 4, which is adapted to remain in the first active state for a predetermined time after it is detected that the first participant stops talking.
6. An interpretation system according to claim 5, which is further adapted to remain in the first active state for a predetermined time after it is detected that the third participant, interpreting the first participant, stops talking.
7. An interpretation system according to any of the preceding claims, which is adapted to gradually adjust the gain of an amplifier of at least one of the sound interfaces in response to a switching of the switching subsystem.
8. An interpretation system according to any of the preceding claims, which is adapted to provide a visual feedback signal in response to a switching of the switching subsystem.
9. An interpretation system according to claim 8, wherein the visual feedback signal includes changing the backlight colour of a display.
10. A method for controlling an interpretation system, the system comprising a first sound interface (3), for a first participant such as an interviewer, a second sound interface (5), for a second participant such as an interviewee, and a third sound interface (7) for a third participant such as an interpreter, wherein the interpretation system comprises a switching subsystem (13) that can be switched between at least a first setting (45), where a voice signal generated at the first sound interface (3) is connected primarily to the third sound interface (7) and a voice signal generated at the third interface is connected primarily to the second sound interface (5), and a second setting (51), where a voice signal generated at the second sound interface (5) is connected primarily to the third sound interface (7) and a voice signal generated at the third sound interface (7) is connected primarily to the first sound interface (3), characterised by detecting speech originating from the first and second sound interfaces, and controlling the switching subsystem depending on this detection, such that the system switches between the first and second settings.
EP15765582.0A 2014-03-17 2015-03-13 Interpretation system and method Withdrawn EP3120534A4 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
SE1450295A SE1450295A1 (en) 2014-03-17 2014-03-17 System and method of simultaneous interpretation
PCT/SE2015/050284 WO2015142249A2 (en) 2014-03-17 2015-03-13 Interpretation system and method

Publications (2)

Publication Number Publication Date
EP3120534A2 true EP3120534A2 (en) 2017-01-25
EP3120534A4 EP3120534A4 (en) 2017-10-25

Family

ID=54145455

Family Applications (1)

Application Number Title Priority Date Filing Date
EP15765582.0A Withdrawn EP3120534A4 (en) 2014-03-17 2015-03-13 Interpretation system and method

Country Status (3)

Country Link
EP (1) EP3120534A4 (en)
SE (1) SE1450295A1 (en)
WO (1) WO2015142249A2 (en)

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5867574A (en) * 1997-05-19 1999-02-02 Lucent Technologies Inc. Voice activity detection system and method
US20030088622A1 (en) * 2001-11-04 2003-05-08 Jenq-Neng Hwang Efficient and robust adaptive algorithm for silence detection in real-time conferencing
CA2501002A1 (en) * 2002-09-27 2004-04-08 Ginganet Corporation Telephone interpretation system
AU2003266594B2 (en) * 2002-09-27 2007-10-04 Ginganet Corporation Telephone interpretation aid device and telephone interpretation system using the same
WO2004030328A1 (en) * 2002-09-27 2004-04-08 Ginganet Corporation Video telephone interpretation system and video telephone interpretation method
US7826805B2 (en) * 2003-11-11 2010-11-02 Matech, Inc. Automatic-switching wireless communication device
CN1937664B (en) * 2006-09-30 2010-11-10 华为技术有限公司 System and method for realizing multi-language conference
US8041018B2 (en) * 2007-12-03 2011-10-18 Samuel Joseph Wald System and method for establishing a conference in two or more different languages
GB2469329A (en) * 2009-04-09 2010-10-13 Webinterpret Sas Combining an interpreted voice signal with the original voice signal at a sound level lower than the original sound level before sending to the other user

Also Published As

Publication number Publication date
SE1450295A1 (en) 2015-09-18
WO2015142249A2 (en) 2015-09-24
EP3120534A4 (en) 2017-10-25
WO2015142249A3 (en) 2015-11-12

Similar Documents

Publication Publication Date Title
US10553235B2 (en) Transparent near-end user control over far-end speech enhancement processing
US9253303B2 (en) Signal processing apparatus and storage medium
US10499136B2 (en) Providing isolation from distractions
US10574804B2 (en) Automatic volume control of a voice signal provided to a captioning communication service
US20190066710A1 (en) Transparent near-end user control over far-end speech enhancement processing
EP2466885B1 (en) Video muting
US20100184488A1 (en) Sound signal adjuster adjusting the sound volume of a distal end voice signal responsively to proximal background noise
US9826085B2 (en) Audio signal processing in a communication system
EP3430819A1 (en) Earphones having separate microphones for binaural recordings and for telephoning
US9967813B1 (en) Managing communication sessions with respect to multiple transport media
US20120140918A1 (en) System and method for echo reduction in audio and video telecommunications over a network
US20180269842A1 (en) Volume-dependent automatic gain control
WO2012175964A2 (en) Multi-party teleconference methods and systems
US10483933B2 (en) Amplification adjustment in communication devices
WO2015142249A2 (en) Interpretation system and method
CN115348411A (en) Method and system for processing remotely active voice during a call
US20090310520A1 (en) Wideband telephone conference system interface
US20150201057A1 (en) Method of processing telephone voice output and earphone
TWI639344B (en) Sound collection equipment having function of answering incoming calls and control method of sound collection
DE3426815A1 (en) Level adjustment for a telephone station with a hands-free facility
WO2019056300A1 (en) Adjustment system and method for automatically switching audio mode during call
JP2010034815A (en) Sound output device and communication system
US10264116B2 (en) Virtual duplex operation
JPH11275243A (en) Loud speaker type interphone system
KR20220111521A (en) Ambient noise reduction system and method

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20161011

AK Designated contracting states

Kind code of ref document: A2

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

AX Request for extension of the european patent

Extension state: BA ME

DAV Request for validation of the european patent (deleted)
DAX Request for extension of the european patent (deleted)
A4 Supplementary search report drawn up and despatched

Effective date: 20170922

RIC1 Information provided on ipc code assigned before grant

Ipc: H04N 7/15 20060101ALI20170918BHEP

Ipc: H04M 3/00 20060101ALI20170918BHEP

Ipc: H04M 3/56 20060101AFI20170918BHEP

Ipc: G06F 17/28 20060101ALI20170918BHEP

Ipc: H04M 9/10 20060101ALI20170918BHEP

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20180421