EP1220202A1 - System and method for coding and decoding speaker-independent and speaker-dependent speech information
- Publication number
- EP1220202A1 (application EP00440335A)
- Authority
- EP
- European Patent Office
- Legal status
- Withdrawn
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/0018—Speech coding using phonetic or linguistical decoding of the source; Reconstruction using text-to-speech synthesis
Definitions
- Coder 1 according to the invention as shown in figure 1 comprises a voice/speech receiver 11 like a microphone for receiving voice/speech signals of which an output is coupled via a coupling 31 to an input of sampler 12, of which an output is coupled via a coupling 32 to an input of first analyser 13 for determining for example speaker-dependent frequency information of said voice/speech signals and to an input of second analyser 16 for determining for example speaker-dependent prosody information in said voice/speech signals and to an input of third analyser 17 for determining for example speaker-dependent volume information of said voice/speech signals and to a phoneme unit 18 for determining for example speaker-independent phoneme information in said voice/speech signals.
- An output of first analyser 13 is coupled via a coupling 33 to an input of a fourth analyser 14 for determining for example vocal tract information, of which an output is coupled via a coupling 34 to an input of a combining unit 15.
- An output of third analyser 17 is coupled via a coupling 36 to a further input of second analyser 16, of which an output is coupled via a coupling 35 to a further input of combining unit 15.
- An output of combining unit 15 is coupled via a coupling 38 to decoder 2
- an output of phoneme unit 18 is coupled via a coupling 37 to decoder 2.
- Decoder 2 according to the invention as shown in figure 1 comprises a first converting unit 23 for converting for example a combination of frequency information, prosody information and volume information into a first vocal signal, of which an input is coupled to a coupling 44 which is coupled to coupling 38, and of which a further input is coupled via a coupling 45 to an output of second converting unit 24 for converting for example speaker-independent phoneme information into a second vocal signal, of which an input is coupled to a coupling 43 which is coupled to coupling 37, and of which a further input is coupled to a coupling 44.
- An output of first converting unit 23 is coupled via a coupling 42 to an input of a third converting unit 22 for performing for example a digital/analog conversion, of which an output is coupled to an input of a voice/speech generator 21, like for example a loudspeaker.
- In coder 1, at least some of blocks 12,13,14,15,16,17,18 represent functions performed by a processor system running one or more software programs and thereby using a memory; in decoder 2, at least some of blocks 22,23,24 represent functions performed by a processor system running one or more software programs and thereby using a memory.
- In decoder 2 there is an option of amending and/or inserting speaker-dependent information, for example by providing converting unit 23 with a yet further input for receiving additional speaker-dependent information and/or amending information for amending said speaker-dependent information, and/or by interrupting coupling 44 via an interruptor not shown (like a switch having a conductive and a non-conductive state) located in coder 1 and/or in decoder 2.
- the signals present at one or more of the couplings 37, 38, 43 and 44 are stored in one or more memories not shown and located outside coder 1 and/or decoder 2, to allow generation of these signals later in time, under control of a user, a terminal and/or a network(-unit).
- coder 1 and/or decoder 2 and/or a terminal and/or a network-unit are provided with for example a phoneme recogniser not shown and/or a memory not shown for verification purposes, to allow verification of phoneme signals sent and/or received earlier, under control of a user, a terminal and/or a network(-unit), for example for checking Wall Street orders and/or (trans)actions, by generating (for example unamendable) phoneme signals stored before. Therefore, a method of doing business (comprising a step of generating phoneme signals stored before, possibly via said recogniser) is not to be excluded.
Abstract
Coders and decoders which do not distinguish between
speaker-independent signals and speaker-dependent signals in
voice/speech signals communicate inefficiently. By separately coding
speaker-independent signals (for example phonemes) and speaker-dependent
signals (for example prosody, amplitude) in the coder into two streams
of coded signals, and transmitting both streams of coded signals
possibly in a multiplex way to the decoder, which separately decodes
both streams of coded signals into speaker-independent signals and
speaker-dependent signals, the entire system is more efficient.
Preferably, in the decoder, after having decoded one of both streams of
coded signals, the result is used for decoding the other stream.
Further, said speaker-dependent signals may be divided into
time-independent signals and time-dependent signals, for further
optimisation. Finally, said system can be a distributed speech
recognition system, with the coder performing the preprocessing
(prerecognition) and a network between coder and decoder performing the
final processing (final recognition).
Description
The invention relates to a system comprising a coder for coding a
voice/speech signal into at least one coded signal and comprising a decoder for
decoding at least one further signal.
Such a system is known from EP 0 718 819, in which a first coder is used
for coding vocally produced audio, like spoken words, singing and other vocal
utterances, and in which a second coder is used for coding non-vocally produced
audio, like music. Said known system further comprises a first decoder and a
second decoder for decoding purposes. The at least one further signal may either
correspond entirely, or partly, or not at all with the at least one coded signal.
Such a system is disadvantageous, inter alia, due to voice/speech like said
spoken words being coded and decoded in an inefficient way.
It is an object of the invention, inter alia, to provide a system described in
the preamble, which is more efficient.
Thereto, the system according to the invention is characterised in that said
system comprises a processor system for processing a speaker-independent
signal of said voice/speech signal and in response generating a first
coded signal and for processing a speaker-dependent signal of said
voice/speech signal and in response generating a second coded signal and for
processing a first further signal and in response generating a speaker-independent
signal and for processing a second further signal and in response
generating a speaker-dependent signal.
Said speaker-independent signal of said voice/speech signal
corresponds for example with phoneme information like letters and parts of
words which are coded into said first coded signal, and said speaker-dependent
signal of said voice/speech signal corresponds for example with
prosody information like a user-volume and user-voice-frequencies which
are coded into said second coded signal. Said first further signal for
example corresponds (either entirely, or partly, or not at all) with said first
coded signal and then is related to said phoneme information like letters
and parts of words which are decoded into said speaker-independent
signal, and said second further signal for example corresponds (either
entirely, or partly, or not at all) with said prosody information like a user-volume
and user-voice-frequencies which are decoded into said speaker-dependent
signal.
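As an illustration of this split, the following is a minimal sketch (the data structures and field names are assumptions for illustration; the patent does not prescribe any concrete coding format): phoneme symbols form the speaker-independent stream, per-phoneme prosody values form the speaker-dependent stream, and a decoder recombines both.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class CodedSpeech:
    phonemes: List[str]                 # first coded signal (speaker-independent)
    prosody: List[Tuple[float, float]]  # second coded signal: (volume, frequency)

def encode(analysed: List[Tuple[str, float, float]]) -> CodedSpeech:
    """Separate an analysed utterance into the two coded streams."""
    return CodedSpeech(
        phonemes=[ph for ph, _, _ in analysed],
        prosody=[(vol, freq) for _, vol, freq in analysed],
    )

def decode(coded: CodedSpeech) -> List[Tuple[str, float, float]]:
    """Recombine both streams into one per-phoneme speech description."""
    return [(ph, vol, freq)
            for ph, (vol, freq) in zip(coded.phonemes, coded.prosody)]
```

A round trip through `encode` and `decode` reproduces the analysed utterance, mirroring the separate coding and decoding described above.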
By introducing the separate processing of speaker-independent
signals and speaker-dependent signals, the efficiency of the system is
increased.
The invention is based on the insight, inter alia, that voice/speech
coding/decoding processes comprise redundancies which can be removed
without reducing the quality.
The invention solves the problem, inter alia, of providing a more
efficient system, by introducing the separate processing.
Where EP 0 718 819 distinguishes between vocally produced audio, like
spoken words, singing and other vocal utterances, and non-vocally produced
audio, like music, the system according to the invention distinguishes between
speaker-independent signals within voice/speech signals and speaker-dependent
signals within said voice/speech signals.
The invention further relates to a coder for coding a voice/speech signal
into at least one coded signal.
The coder according to the invention is characterised in that said coder
comprises a processor system for processing a speaker-independent signal of
said voice/speech signal and in response generating a first coded signal and for
processing a speaker-dependent signal of said voice/speech signal and in
response generating a second coded signal.
A first embodiment of the coder according to the invention is characterised
in that said speaker-dependent signal comprises a time-independent part and a
time-dependent part, with said processor system processing said time-independent
part and in response generating a third coded signal and
processing said time-dependent part and in response generating a fourth coded
signal.
By introducing, with respect to the speaker-dependent signals, the
separate processing of time-independent and time-dependent signals, the
efficiency is further increased. Said third coded signal and said fourth coded
signal together correspond with (at least a part of) said second coded signal.
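One way to realise this split (a sketch under assumed conventions, not the patent's actual format) is to transmit the time-independent part, for example a speaker's base frequency, once as the third coded signal, and only the time-dependent deviations per frame as the fourth coded signal:

```python
from typing import List, Tuple

def split_speaker_dependent(frequencies: List[float]) -> Tuple[float, List[float]]:
    """Coder side: third coded signal is the constant base frequency
    (sent once); fourth coded signal is the per-frame deviation from it."""
    base = sum(frequencies) / len(frequencies)
    deviations = [f - base for f in frequencies]
    return base, deviations

def merge_speaker_dependent(base: float, deviations: List[float]) -> List[float]:
    """Decoder side: reconstruct the full speaker-dependent signal."""
    return [base + d for d in deviations]
```

Sending the base value only once, instead of repeating it in every frame, is the kind of redundancy removal this embodiment aims at.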
A second embodiment of the coder according to the invention is
characterised in that said coder forms part of a distributed speech recognition
system, with said processor system preprocessing said speaker-dependent signal.
By locating this coder in a distributed speech recognition system, said
system becomes more efficient.
The invention yet further relates to a decoder for decoding at least one
coded signal.
The decoder according to the invention is characterised in that said
decoder comprises a processor system for processing a first coded signal and in
response generating a speaker-independent signal and for processing a second
coded signal and in response generating a speaker-dependent signal.
A first embodiment of the decoder according to the invention is
characterised in that at least one of both speaker-independent signal and
speaker-dependent signal is generated in dependence of the other one.
By using a result (the speaker-independent signals or the speaker-dependent
signals) of one of the decoding processes and/or by using an input
signal (one of both coded signals) of one of the decoding processes for the other
decoding process, the decoder has an increased efficiency.
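A minimal sketch of such a dependency (the coding scheme here is assumed purely for illustration): the speaker-dependent stream carries a per-speaker default plus one delta per phoneme, so the decoder first decodes the phoneme stream and then uses that result to delimit and interpret the prosody stream.

```python
from typing import Dict, List, Tuple

def decode_both(first_coded: str,
                second_coded: Dict[str, object]) -> List[Tuple[str, float]]:
    """Decode the speaker-independent stream first, then decode the
    speaker-dependent stream in dependence of that result."""
    phonemes = list(first_coded)                 # result of the first decoding
    # The second decoding reuses that result: the phoneme count delimits
    # the delta list, and one prosody value is produced per phoneme.
    base = second_coded["base_volume"]
    deltas = second_coded["deltas"][:len(phonemes)]
    return [(ph, base + d) for ph, d in zip(phonemes, deltas)]
```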
A second embodiment of the decoder according to the invention is
characterised in that said decoder decodes a third coded signal and a fourth
coded signal, with said processor system processing said third coded signal and
in response generating a time-independent part of said speaker-dependent
signal and processing said fourth coded signal and in response generating a
time-dependent part of said speaker-dependent signal.
A third embodiment of the decoder according to the invention is
characterised in that said decoder forms part of a distributed speech recognition
system, with said processor system performing the final processing of said second coded signal.
The invention also relates to a coding method for coding a voice/speech
signal into at least one coded signal.
The coding method according to the invention is characterised in that said
method comprises the steps of processing a speaker-independent signal of said
voice/speech signal and in response generating a first coded signal and of
processing a speaker-dependent signal of said voice/speech signal and in
response generating a second coded signal.
The invention yet also relates to a decoding method for decoding at least
one coded signal.
The decoding method according to the invention is characterised in that
said method comprises the steps of processing a first coded signal and in
response generating a speaker-independent signal and of processing a second
coded signal and in response generating a speaker-dependent signal.
Embodiments of both methods according to the invention correspond with
embodiments of the coder and/or decoder according to the invention.
Embodiments of the system according to the invention correspond with
embodiments of the coder and/or decoder according to the invention.
EP 0 718 819, in which a first coder is used for coding vocally produced
audio, like spoken words, singing and other vocal utterances, and in which a
second coder is used for coding non-vocally produced audio, like music, does
not disclose the separate processing, within its first coder for coding vocally
produced audio, of speaker-independent signals and speaker-dependent signals
both forming part of voice/speech signals, and does not disclose that said
speaker-independent signals and speaker-dependent signals, within its decoder
for decoding the coded vocally produced audio, are separately processed and
separately decoded, possibly with at least one of the decoding processes being
dependent upon the other one. US 5,012,518 discloses a low bit-rate speech
coder, and US 5,388,181 discloses a digital audio compression system. Neither
one of these documents discloses the system according to the invention, the
coder according to the invention, the decoder according to the invention or the
methods according to the invention.
All references, including references cited with respect to these references,
are considered to be incorporated.
The invention will be further explained with reference to an embodiment
described with respect to a drawing, in which
figure 1 discloses a system according to the invention comprising a coder according to the invention and a decoder according to the invention.
The coder 1 according to the invention and the decoder 2 according to the
invention as shown in figure 1 function as follows.
According to a first embodiment, coder 1 forms part of a first terminal,
and decoder 2 forms part of a second terminal. Voice/speech originating from a
first user at said first terminal is received by receiver 11, and then sampled (and
possibly coded) by sampler 12. The result is supplied to first analyser 13 for
determining (for example calculating) for example (speaker-dependent, time-independent)
frequency information (like one or more basic frequencies) of said
voice/speech signals and to second analyser 16 for determining (for example
calculating) for example (speaker-dependent, time-dependent) prosody
information (like intonations) in said voice/speech signals and to third analyser
17 for determining (calculating) for example (speaker-dependent, time-dependent)
volume information (like amplitudes) of said voice/speech signals
and to phoneme unit 18 for determining (calculating) for example (speaker-independent)
phoneme information (like letters, parts of words) in said
voice/speech signals. Fourth analyser 14 for determining (calculating) for
example vocal tract information receives said frequency information directly
(unamendedly) or indirectly (after being processed), and combining unit 15
combines the results coming from fourth analyser 14 and second analyser 16,
which has used the result coming from third analyser 17. The output signal
generated by combining unit 15 is supplied directly (unamendedly) or indirectly
(after being processed) via (wired or wireless) couplings 38 and 44 (and for
example one or more networks, switches, routers, bridges, mobile switching
centers, basestations, etc.) to decoder 2 and corresponds with a speaker-dependent
part of said voice/speech signals in coded form. The output signal
generated by phoneme unit 18 is supplied directly (unamendedly) or indirectly
(after being processed) via (wired or wireless) couplings 37 and 43 (and for
example one or more networks, switches, routers, bridges, mobile switching
centers, basestations, etc.) to decoder 2 and corresponds with a speaker-independent
part of said voice/speech signals in coded form.
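The analyser stage above can be sketched with toy stand-ins (the actual analyses are not specified by the patent; the volume, frequency and prosody measures below are deliberately crude placeholders): analyser 17 estimates volume, analyser 13 a frequency measure, analyser 16 a prosody contour, and combining unit 15 bundles the speaker-dependent results.

```python
from typing import Dict, List

def analyse(samples: List[float]) -> Dict[str, object]:
    """Toy stand-ins for analysers 17 (volume), 13 (frequency) and
    16 (prosody) operating on one block of sampled voice/speech."""
    volume = max(abs(s) for s in samples)                       # analyser 17
    zero_crossings = sum(                                       # analyser 13:
        1 for a, b in zip(samples, samples[1:]) if a * b < 0)   # frequency proxy
    half = len(samples) // 2                                    # analyser 16:
    prosody = [max(abs(s) for s in samples[:half]),             # coarse volume
               max(abs(s) for s in samples[half:])]             # contour
    return {"volume": volume, "frequency": zero_crossings, "prosody": prosody}

def combine(analysis: Dict[str, object]) -> Dict[str, object]:
    """Combining unit 15: bundle the speaker-dependent results into
    one output (the second coded signal) for coupling 38."""
    return {"speaker_dependent": analysis}
```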
Second converting unit 24 in decoder 2 receives both output signals
directly (unamendedly) or indirectly (after being processed), and generates in
response the second vocal signal which is supplied to first converting unit 23,
which receives said second vocal signal as well as said output signal arrived via
coupling 44, each one of both directly (unamendedly) or indirectly (after being
processed), and generates the first vocal signal, which via said third converting
unit 22 and said voice/speech generator 21 is converted into voice/speech
signals destined for a second user at said second terminal.
As a result of using said coder 1 and decoder 2, the highest
coding/decoding quality is combined with the highest transmission
efficiency (lowest bit rates, i.e. lowest capacity needed).
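The recombination in decoder 2 might be sketched in the same toy form (again with hypothetical field names, mirroring the coder sketch): the phoneme stream and the speaker-dependent parameter stream are merged back into per-frame synthesis parameters.

```python
def decode(phoneme_stream, parameter_stream):
    """Toy decoder: recombine the speaker-independent phoneme stream
    (coupling 43) with the speaker-dependent parameter stream
    (coupling 44) into per-frame synthesis parameters for the
    converting units and the voice/speech generator."""
    frames = []
    for phoneme, (pitch_hz, amplitude, vocal_tract) in zip(phoneme_stream,
                                                           parameter_stream):
        frames.append({"phoneme": phoneme, "pitch_hz": pitch_hz,
                       "amplitude": amplitude, "vocal_tract": vocal_tract})
    return frames
```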
According to a first alternative to said first embodiment, in coder 1
couplings 37 and 38 are combined, for example multiplexed by a multiplexer not
shown, in which case decoder 2 will comprise a demultiplexer for separating the
two coded signals again.
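Such a combination of both couplings could, purely as an illustration (the tags and data layout are assumptions, not taken from the patent), be realised by tagging and interleaving the two coded signals:

```python
def multiplex(phoneme_stream, parameter_stream):
    """Combine couplings 37 and 38 onto one coupling by tagging and
    interleaving items; the tags let decoder 2 demultiplex again."""
    muxed = []
    for phoneme, params in zip(phoneme_stream, parameter_stream):
        muxed.append(("SI", phoneme))   # speaker-independent item
        muxed.append(("SD", params))    # speaker-dependent item
    return muxed

def demultiplex(muxed):
    """Decoder-side counterpart: separate the two coded signals."""
    phonemes = [value for tag, value in muxed if tag == "SI"]
    params = [value for tag, value in muxed if tag == "SD"]
    return phonemes, params
```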
According to a second alternative to said first embodiment, coder 1 and
decoder 2 each comprise a processor and a memory both not shown and
coupled to at least one of said receiver 11, sampler 12, analysers 13,14,16,17,
phoneme unit 18, combining unit 15, converting units 22,23,24 and generator
21. As a result, in said coder and decoder, intelligence is centralised.
Therefore, couplings 37-43 and 38-44 may be realised via different wires
or one wire, different fibers or one fiber, different wireless links or one wireless
link, and via different channels or one channel, different timeslots or same
timeslots, different codes or same codes etc.
According to a second embodiment, (at least a part of) said coder 1 is
located in a terminal, and (at least a part of) said decoder 2 is located in a
network, or vice versa. This will for example be necessary in case old-fashioned
terminals not comprising these coders/decoders and novel high-tech terminals
comprising these coders/decoders are both used, whereby in the network a
coder/decoder needs to be available for each possible communication via an
old-fashioned terminal.
According to a third embodiment, said coder 1 forms part of a distributed
speech recognition (DSR) system, whereby in said coder 1 said speaker-dependent
signal is preprocessed, and final processing is done generally in said
network or exceptionally in said decoder 2. This for example corresponds with at
least a part of at least one function performed by at least one of said four
analysers 13,14,16,17 being shifted generally into the network or exceptionally
into the decoder 2, and/or at least a part of at least one function performed by
at least one of said two converting units 23,24 being shifted into the network.
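The distributed-speech-recognition split might be sketched as follows; the per-frame feature and the classifier below are deliberately simple stand-ins (real DSR front ends compute e.g. cepstral features), chosen only to show cheap preprocessing in the coder and final processing elsewhere:

```python
def preprocess(samples, frame_len=4):
    """Coder-side DSR front end (illustrative): frame the signal and
    compute a compact per-frame feature (mean absolute amplitude,
    standing in for a real feature such as cepstral coefficients)."""
    frames = [samples[i:i + frame_len]
              for i in range(0, len(samples), frame_len)]
    return [sum(abs(s) for s in frame) / len(frame) for frame in frames]

def final_process(features, threshold=0.5):
    """Network-side back end (illustrative): finish the recognition
    work on the compact features instead of on the raw signal."""
    return ["speech" if f >= threshold else "silence" for f in features]
```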
According to a fourth embodiment, in coder 1 at least some of blocks
12,13,14,15, 16,17,18 represent functions performed by a processor system
running one or more software programs thereby using a memory, and in
decoder 2 at least some of blocks 22,23,24 represent functions performed by a
processor system running one or more software programs thereby using a
memory.
According to a fifth embodiment, in decoder 2 there is an option of
amending and/or inserting speaker-dependent information, by for example
providing converting unit 23 with a yet further input for receiving additional
speaker-dependent information and/or for receiving amending information for
amending said speaker-dependent information and/or by for example
interrupting coupling 44 via for example an interrupter not shown (like a switch
having a conductive and a non-conductive state) located in coder 1 and/or
in decoder 2. As a result, a user may select the voice speaking to said user.
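The amending of speaker-dependent information could, as a minimal sketch (the parameter layout of pitch, amplitude and vocal tract is an assumption carried over from the earlier illustrations, not a format given in the patent), look like a further input of converting unit 23 that rescales the intonation and/or substitutes a stored vocal-tract description:

```python
def amend_speaker_dependent(parameter_stream, pitch_scale=1.0,
                            replacement_tract=None):
    """Illustrative 'yet further input' of converting unit 23: scale
    the speaker-dependent intonation and/or swap in a stored
    vocal-tract description before synthesis, so a user selects the
    voice speaking to said user."""
    amended = []
    for pitch_hz, amplitude, vocal_tract in parameter_stream:
        if replacement_tract is not None:
            vocal_tract = replacement_tract  # substitute another voice
        amended.append((pitch_hz * pitch_scale, amplitude, vocal_tract))
    return amended
```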
According to a first alternative to said fifth embodiment, coder 1 and/or
decoder 2 are provided with one or more memories not shown for storing for
example the signals present at one or more of the couplings 37, 38, 43 and 44,
to allow generation of these signals later in time, under control of a user, a
terminal and/or a network(-unit).
According to a second alternative to said fifth embodiment, for example
the signals present at one or more of the couplings 37, 38, 43 and 44, are
stored in one or more memories not shown and located outside coder 1 and/or
decoder 2, to allow generation of these signals later in time, under control of a
user, a terminal and/or a network(-unit).
According to a third alternative to said fifth embodiment, coder 1 and/or
decoder 2 and/or a terminal and/or a network-unit are provided with for
example a phoneme recogniser not shown and/or a memory not shown for
verification purposes to allow verification of phoneme signals sent and/or
received earlier, under control of a user, a terminal and/or a network(-unit), for
example for checking Wall Street orders and/or (trans)actions, by generating (for
example unamendable) phoneme signals stored before. Therefore, a method of
doing business (comprising a step of generating phoneme signals stored before,
possibly via said recogniser) is not to be excluded.
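This third alternative could be realised, purely as an illustration (class and method names are hypothetical; the patent specifies no storage mechanism), with a store that keeps phoneme signals together with a digest, so that a later replay can be checked against amendment:

```python
import hashlib

class PhonemeArchive:
    """Hypothetical store for phoneme signals sent and/or received
    earlier, allowing later verification of orders/(trans)actions by
    regenerating the stored, digest-protected phoneme signals."""

    def __init__(self):
        self._records = {}

    def store(self, order_id, phonemes):
        """Store a phoneme signal; return a digest handed to the
        user/terminal as a tamper check."""
        blob = "|".join(phonemes)
        self._records[order_id] = blob
        return hashlib.sha256(blob.encode()).hexdigest()

    def replay(self, order_id, digest):
        """Regenerate the stored phoneme signal, refusing if it no
        longer matches the digest (i.e. it was amended)."""
        blob = self._records[order_id]
        if hashlib.sha256(blob.encode()).hexdigest() != digest:
            raise ValueError("stored phoneme signal was amended")
        return blob.split("|")
```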
All embodiments are just embodiments and do not exclude other
embodiments not shown and/or described. All alternatives are just alternatives
and do not exclude other alternatives not shown and/or described. Each (part of
an) embodiment and/or each (part of an) alternative can be combined with any
other (part of an) embodiment and/or any other (part of an) alternative. Terms
like "in response to K" and "in dependence of L" and "for doing M" do not exclude
that there could be a further "in response to N" and a further "in dependence of
O" and a further "for doing P" etc.
Claims (10)
- System comprising a coder for coding a voice/speech signal into at least one coded signal and comprising a decoder for decoding at least one further signal, characterised in that said system comprises a system processor system for processing a speaker-independent signal of said voice/speech signal and in response generating a first coded signal and for processing a speaker-dependent signal of said voice/speech signal and in response generating a second coded signal and for processing a first further signal and in response generating a speaker-independent signal and for processing a second further signal and in response generating a speaker-dependent signal.
- Coder for coding a voice/speech signal into at least one coded signal, characterised in that said coder comprises a processor system for processing a speaker-independent signal of said voice/speech signal and in response generating a first coded signal and for processing a speaker-dependent signal of said voice/speech signal and in response generating a second coded signal.
- Coder according to claim 2, characterised in that said speaker-dependent signal comprises a time-independent part and a time-dependent part, with said processor system processing said time-independent part and in response generating a third coded signal and processing said time-dependent part and in response generating a fourth coded signal.
- Coder according to claim 2 or 3, characterised in that said coder forms part of a distributed speech recognition system, with said processor system preprocessing said speaker-dependent signal.
- Decoder for decoding at least one coded signal, characterised in that said decoder comprises a processor system for processing a first coded signal and in response generating a speaker-independent signal and for processing a second coded signal and in response generating a speaker-dependent signal.
- Decoder according to claim 5, characterised in that at least one of both speaker-independent signal and speaker-dependent signal is generated in dependence of the other one.
- Decoder according to claim 5 or 6, characterised in that said decoder decodes a third coded signal and a fourth coded signal, with said processor system processing said third coded signal and in response generating a time-independent part of said speaker-dependent signal and processing said fourth coded signal and in response generating a time-dependent part of said speaker-dependent signal.
- Decoder according to claim 5, 6 or 7, characterised in that said decoder forms part of a distributed speech recognition system, with said processor system final processing said second coded signal.
- Coding method for coding a voice/speech signal into at least one coded signal, characterised in that said method comprises the steps of processing a speaker-independent signal of said voice/speech signal and in response generating a first coded signal and of processing a speaker-dependent signal of said voice/speech signal and in response generating a second coded signal.
- Decoding method for decoding at least one coded signal, characterised in that said method comprises the steps of processing a first coded signal and in response generating a speaker-independent signal and of processing a second coded signal and in response generating a speaker-dependent signal.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP00440335A EP1220202A1 (en) | 2000-12-29 | 2000-12-29 | System and method for coding and decoding speaker-independent and speaker-dependent speech information |
Publications (1)
Publication Number | Publication Date |
---|---|
EP1220202A1 (en) | 2002-07-03 |
Family
ID=8174197
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP00440335A Withdrawn EP1220202A1 (en) | 2000-12-29 | 2000-12-29 | System and method for coding and decoding speaker-independent and speaker-dependent speech information |
Country Status (1)
Country | Link |
---|---|
EP (1) | EP1220202A1 (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0423800A2 (en) * | 1989-10-19 | 1991-04-24 | Matsushita Electric Industrial Co., Ltd. | Speech recognition system |
US6073094A (en) * | 1998-06-02 | 2000-06-06 | Motorola | Voice compression by phoneme recognition and communication of phoneme indexes and voice features |
US6119086A (en) * | 1998-04-28 | 2000-09-12 | International Business Machines Corporation | Speech coding via speech recognition and synthesis based on pre-enrolled phonetic tokens |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PUAI | Public reference made under article 153(3) EPC to a published international application that has entered the European phase | Free format text: ORIGINAL CODE: 0009012
| AK | Designated contracting states | Kind code of ref document: A1; Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE TR
| AX | Request for extension of the european patent | Free format text: AL;LT;LV;MK;RO;SI
| AKX | Designation fees paid | Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE TR
| STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN
| 18D | Application deemed to be withdrawn | Effective date: 20021204