EP2261900A1 - Method and apparatus for modifying the playback rate of audio-video signals - Google Patents

Info

Publication number
EP2261900A1
EP2261900A1 EP10165115A EP10165115A EP2261900A1 EP 2261900 A1 EP2261900 A1 EP 2261900A1 EP 10165115 A EP10165115 A EP 10165115A EP 10165115 A EP10165115 A EP 10165115A EP 2261900 A1 EP2261900 A1 EP 2261900A1
Authority
EP
European Patent Office
Prior art keywords
audio
signal
video
playback rate
modifying
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP10165115A
Other languages
German (de)
French (fr)
Inventor
Andrea Trucco
Luca Racca
Matteo Racca
Michele Ricchetti
Stefania Repetto
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Linear Srl
Original Assignee
Linear Srl
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority to DK10165111.5T (DK2227042T3)
Application filed by Linear Srl
Publication of EP2261900A1
Legal status: Withdrawn

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/04Time compression or expansion

Definitions

  • the present invention relates to a method allowing the playback rate of audio and video signals to be modified and to an apparatus for performing such method according to the preamble of claim 1.
  • a method of this type is known in WO2005/045830 , where the playback rate of the audio-video signal is modified according to a parameter that can be set.
  • the method and the apparatus of the present invention can be a great advantage in studying foreign languages allowing students to have a better comprehension when watching original language movies.
  • a further alternative and advantageous use of the method and apparatus of the present invention is to slow down the time flow of audio or audio-video information in order to facilitate the comprehension of very disturbed audio or audio-video messages: the fact that it is possible to change the rate of the input signal without substantially changing the timbre makes it possible to overcome problems in the comprehension of a disturbed message, which comprehension by the known method is still more difficult due to a change in the frequencies, causing the audio or audio-video signal to be harmfully changed.
  • the present invention can be used also as an instrument useful for facilitating the transcription as written text and possible translations of vocal contents recorded or stated in real-time.
  • the user, whether a human person or an automatic program that writes text from audio files, can set the rate between a maximum and a minimum while keeping the frequency properties of the voice unchanged.
  • such effect influences also the fact that word scanning and the pronunciation are substantially unchanged and well comprehensible, which does not occur with standard methods changing the playback rate of audio signals.
  • samples/second value and “samples/second” will be alternately used to mean the unit of measure of the playback rate of the audio signal, as well as the term “frame rate” will be used to mean the unit of measure of the playback rate of the video signal.
  • Speech as a rule contains pauses made by the speaker, for example pauses between sentences, expressive interruptions, or moments of silence corresponding to breathing.
  • Such pauses can occur with different time percentages, substantially ranging from 10% to 30%.
  • the invention relates to a method as described in the preamble of claim 1, which method provides also that the playback rate of the audio signal is set by defining audio signal components related to pauses, that is without voice emission, and related to speech, that is with voice emission, a different playback rate being defined for said two components of the audio signal, and said different playback rate being applied also in the processing of the different playback rate of video signal components which coincide with the audio signal components related to pauses and to speech.
  • the method of the present invention provides the following steps:
  • the aim of the method of the present invention is also to underline pauses increasing their time length.
  • the "useful" pause can be defined as the temporary drop-off in the amplitude of the voice signal under 6% of the peak, for a time period of about 20-50 milliseconds.
  • a pause lasting more than 200 milliseconds can be defined as a silence probably corresponding to a lack of conversation/signal by the speaker.
  • Structural pauses: short interruptions between syllables
  • Grammatical pauses: interruptions between sentences or pauses used for emphasizing words
  • Steps for defining the modified playback rate of video and audio signals and synchronization will be described below in more detail with reference to possible variants. All such modes provide audio-video signals having a modified playback rate which can be reproduced without distortions or losses of information and transmission of the signal.
  • the method and apparatus of the present invention mainly, but not exclusively, operate with digital signals: in the case of analog sources the method and the apparatus of the present invention provide a variant embodiment using a usually used digital sampler receiving an analog audio-video input signal and generating a digital audio-video signal having a specific samples/second value for the audio portion and a specific frame rate value for the video portion.
  • a check guaranteeing the synchronization between the audio portion and the video portion can be provided after processing means.
  • One example uses time sequences of sets of synchronization bits, each set made of at least a pair of bits, following one another according to a time sequence with predetermined and fixed intervals, each bit of the same set being univocally associated with only one sequence of the signal portion whose playback rate has to be modified, while bits of the same set are further univocally correlated with one another, the time sequence of the synchronization bits being checked to be the same before and after the processing; the synchronization is considered as being maintained when said check shows that the synchronization bits, upstream and downstream of the signal processing, follow the same time sequence.
  • the check of the synchronization of the two audio and video portions can also act for generating a copy of the input signal which is loaded into the working memory such that it can be processed again if the output signal will not show a synchronization between said portions of said input signal.
  • samples/second values related to the audio portion of the input signal are calculated and compared.
  • the method of the present invention allows also the playback rate of audio-video signals to be modified in real-time.
  • the input signal is processed and retransmitted without delays.
  • the input signal is loaded within the working memory in order to manage delays due to the difference in the playback rate between the input signal and the output signal, such not to have losses of information when reconstructing the signal.
  • the provision of a real-time reconstruction and the lack of asynchrony between the audio portion and the video portion of the signal are characteristics that are kept also in the case the input signal contains other information.
  • subtitles help both hearing impaired people and people studying foreign languages in understanding audio video sources, and consequently, the synchronization among audio, video and subtitles is a very important aspect increasing the utility and improving the operation of the method of the present invention.
  • the method of the present invention as a preferred embodiment for modifying the playback rate of the audio portion related to the input signal provides the audio portion of the input signal to be divided into windows having a constant time width representing the minimum signal portion that will be processed for modifying the playback rate and provides said windows to be overlapped on the audio portion of the output signal for a time amount depending on the time width of the selected windows and on the value of the modification of the execution/playback rate of the audio-video signal.
  • the method of the present invention preferably uses algorithms of the algorithm family called SOLA for modifying only the audio portion of audio-video signals, more precise details being available in documents US 5,717,818, US 5,175,769 and "An overlap-add technique based on waveform similarity (WSOLA) for high quality time-scale modification of speech", IEEE Proceedings of ICASSP-93, vol. II, pp. 554-557, 1993.
  • SOLA waveform similarity
  • the method of the present invention in its preferred embodiment uses the parameter of the video frame rate for modifying the playback rate of the video portion of the input signal.
  • the datum related to the modification of the playback rate of the input signal set by the user is extrapolated, the new playback rate value of the audio portion of the signal is obtained by the method described above, and by means of such new value in combination with a table function the new playback rate value of the video portion is calculated, modifying the frame-rate value of the output video signal.
  • the invention relates also to an apparatus used for performing the described method and which is the subject of the claims.
  • the apparatus of the present invention is composed of processing means for executing programs having a working memory within which audio-video signals received by said apparatus from a source by input ports are loaded.
  • the processing means comprise a program memory wherein a program processing the audio-video signal is loaded or loadable, which program modifies the playback rate of the input audio-video signal and by output ports it provides an audio-video signal with a modified rate and with synchronization between the audio portion and the video portion without causing distortions and/or artefacts and/or losses of information on said output signal.
  • the apparatus has a user interface for entering data and inputs which can be preferably composed of cable or wireless remote control means, such as remote controls commonly used in electronic devices.
  • a possible embodiment of the apparatus of the present invention inside said processing means has an additional memory wherein the output signal with modified rate can be stored, then it can be read out by the user interface and transmitted again without the need of having the input signal.
  • This variant gives a further function to the apparatus of the present invention, it acts as a file for films and television programs: the input signal can be modified and saved in the memory in order to be displayed subsequently and at different playback rates selected by the user on the grounds of his/her own needs by means of the user interface.
  • the apparatus of the present invention has all the features of standard decoders on the market, such as for example an internal timer allowing an input audio-video signal to be recorded and modified by means of pre-settings, without the user being required to remotely start the recording.
  • with the apparatus of the present invention it is possible to select only some portions of the signal to be reproduced and to select different playback rates for each portion.
  • the apparatus can have more than one input in order to select the source from where the signal is taken, such as for example a television, a player of several storage media (cd, DVD), satellite receivers and any outer sources and it can process several input signals modifying the playback rate thereof with common or different values for each input signal.
  • Figure 1 shows the flow chart summarizing the steps of the method of the present invention according to a possible operation mode.
  • the method of the present invention is used for modifying the playback rate of audio-video signals, generating an output audio-video signal with a modified rate having a synchronization of the audio portion, the video portion and possible information carried by the signal, without generating distortions or losses of information with respect to the input signal.
  • the aim of the method of the present invention is also to emphasize pauses that are present by nature in the speech increasing their time length.
  • Figure 1 shows a possible operating mode of the method of the present invention providing the following steps, shown by function blocks.
  • the user by a remote control device, turns on a device within which the method occurs, then he/she selects the source from which he/she desires to take the signal to be modified;
  • Signal loading denoted by 102: the signal is loaded within a working memory
  • Modifying the playback rate denoted by 103: the user decides whether or not to modify the playback rate.
  • a step processing the signal is performed, denoted by 104: the input signal is taken from the working memory and the audio portion is divided from the video portion and these are individually loaded within processing means.
  • Detecting components relating to pauses and speech denoted by 106: the audio portion of the signal is analysed by means of an algorithm in order to detect the voice presence.
  • a step setting a first playback rate is carried out, denoted by 107, according to a value that can be set by the user or a predetermined value within the device memory.
  • a step setting a second playback rate, denoted by 108 is carried out, according to a value that can be set by the user or a predetermined value within the device memory.
  • the playback rate used for setting the signal components relating to pauses is slower than the playback rate used for setting signal components relating to speech.
  • signal components of the audio portion relating to pauses are detected by a VAD algorithm (Voice Activity Detector), which algorithm recognizes a temporary drop-off in the amplitude of the audio portion under 6% of the peak, for a time of about 20-50 milliseconds as pauses.
  • VAD algorithm Voice Activity Detector
  • the input signal is windowed by using windows, which are then overlapped for reconstructing the signal in order to obtain the desired output samples/second value so as not to cause frequency distortions.
  • a table function which associates to a general change in the rate of any audio-video signal a change of the frame rate value relating to the video portion of the input signal.
  • the modification of the playback rate provides the frame rate to be increased or decreased by commonly used methods, while for the audio portion modifying the playback rate without frequency distortions, for example keeping the voice tone unchanged, is not trivial: in order to achieve the desired effect the method of the present invention preferably uses a particular type of algorithm called SOLA; such algorithms and/or methods are known and are described in more detail in US 5,717,818, US 5,175,769 and "An overlap-add technique based on waveform similarity (WSOLA) for high quality time-scale modification of speech", IEEE Proceedings of ICASSP-93, vol. II, pp. 554-557, 1993, whose contents are an integral part of the present description.
  • SOLA waveform similarity
  • the numeral 109 denotes the check of the synchronization. It is checked for the audio portion and video portion and the check acts upstream and downstream of the processing.
  • An embodiment of a method for verifying the synchronization uses synchronization bits, and it is shown and described in more detail with reference to figure 2.
  • the check reloads the input signal and iteratively modifies the playback rate of the audio and video portions until the output signal is synchronized;
  • the numeral 110 denotes the step generating and transmitting the signal which is carried out when the audio portion and the video portion of the signal are synchronized, they are joined again together generating the modified output signal which is then transmitted.
  • Figure 2 shows the step checking the synchronization of the method of the present invention according to the above mentioned example which is applied to a general audio signal for simplicity purposes without considering the fact of dividing components of the audio signal relating to pauses and speech respectively.
  • the synchronization check upstream of the processing process divides the two audio and video portions of the input signal, denoted in figure by 1 and 2 respectively, into a time sequence of sub-units: to each sub-unit 11 and 22 the synchronization bits 31 and 32 are univocally associated, in figure 2 this is highlighted by a different drawing effect of bits and sub-units; bits are also univocally correlated each other and belong to a sequence of pairs of synchronization bits which follow one another according to a time base that is a clock, having predetermined and fixed arrangement intervals; each pair of bits is divided and univocally associated to each audio sub-unit and video sub-unit of the input signal.
  • the signal is processed by processing means 42 and the check verifies the correspondence both between the paired synchronization bits 31 and 32, and between each audio and video sub-unit 11 and 22 and its own associated bit, 31 and 32 respectively.
  • Figure 3 shows a block diagram of the structure of the apparatus 4 of the present invention.
  • the apparatus 4 receives from any source an audio-video signal by the input port 41 communicating with a processing unit 42.
  • in the processing unit 42 there are provided a working memory 421 where the input signal is loaded, a program memory 422 where a program processing the audio-video signal is loaded or loadable and a CPU 423 dividing the input signal into the audio portion and video portion and allowing said two portions to be processed: said two portions are then checked by the checking unit 43 acting upstream and downstream of processing means 42 which, by the method described above, checks the synchronization of the modified signal and, if the portions are synchronized, joins the two signal portions again and transmits the signal to the output port 44.
  • the apparatus 4 finally has an interface unit 45 communicating with the processing unit 42 and allowing the user to set the desired rate by means of a remote control and to carry out further several actions such as for example to select the source to be used as the input signal or to select the signal to be reproduced among the signals recorded within the working memory 421.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Television Signal Processing For Recording (AREA)
  • Signal Processing Not Specific To The Method Of Recording And Reproducing (AREA)

Abstract

Method for modifying the playback rate of audio-video signals comprising the following steps:
a) receiving an input signal from any source,
b) storing said signal into a working memory of an apparatus,
c) setting the playback rate of said audio-video signal,
d) reading out said audio-video signal from said working memory,
e) separating the audio portion from the video portion of said signal
f) processing the playback rate both of the audio portion and of the video portion wherein the parameter of the playback rate set at step c) is used in order to define the new frame-rate value of the video portion and the new samples/second value of the audio portion, that have been modified individually and in a balanced way,
g) joining the audio portion with modified playback rate to the video portion with modified playback rate,
h) generating and transmitting an output signal providing synchronization between the audio portion and the video portion without leading to distortions and/or artefacts and/or losses of information on said output signal

and wherein
the playback rate of the audio signal is set by finding the audio signal components related to pauses, that is without voice emission, and related to speech, that is voice emission, a different playback rate being defined for said two components of the audio signal, and said different playback rate being applied also for processing the different playback rate of video signal components coinciding with the audio signal components related to pauses and speech.
The present invention relates also to an apparatus for performing said method.

Description

  • The present invention relates to a method allowing the playback rate of audio and video signals to be modified and to an apparatus for performing such method according to the preamble of claim 1.
  • A method of this type is known in WO2005/045830 , where the playback rate of the audio-video signal is modified according to a parameter that can be set.
  • The fact of being able to modify the rate of an audio-video signal has several advantages. For example such a method allows the comprehension of audio-video signals by hearing impaired individuals to be enhanced: many people suffering from a chronic lowering of the hearing threshold have comprehension difficulties when watching television programs such as for example television news and entertainment shows. The present invention guarantees to such individuals a better comprehension by slowing down the playback rate of audio-video signals characterizing such programs without losing the entertainment aspect due to a synchronization among images, music and words.
  • It is advantageous also to use the method and the apparatus of the present invention as an instrument when performing speech therapy both for evaluating hearing impaired people and as a support for performing the therapy and/or training and exercise therapies in particular for hearing impaired people.
  • Moreover the method and the apparatus of the present invention can be a great advantage in studying foreign languages allowing students to have a better comprehension when watching original language movies.
  • A further alternative and advantageous use of the method and apparatus of the present invention is to slow down the time flow of audio or audio-video information in order to facilitate the comprehension of very disturbed audio or audio-video messages: the fact that it is possible to change the rate of the input signal without substantially changing the timbre makes it possible to overcome problems in the comprehension of a disturbed message, which comprehension by the known method is still more difficult due to a change in the frequencies, causing the audio or audio-video signal to be harmfully changed.
  • The present invention can also be used as an instrument for facilitating the transcription as written text, and possible translation, of vocal contents recorded or stated in real-time. In this case the user, whether a human person or an automatic program that writes text from audio files, can set the rate between a maximum and a minimum while keeping the frequency properties of the voice unchanged. In combination, such effect also means that word scanning and pronunciation remain substantially unchanged and well comprehensible, which does not occur with standard methods changing the playback rate of audio signals.
  • The definition of protocols and the types of communication or networks are described in detail in documents available through the so-called RFC-Editor web pages, through which the RFC document database can be consulted; it contains documents published by "The Internet Society" (ISOC) and by the IETF (Internet Engineering Task Force), available on the website www.ietf.org.
  • Moreover it has to be noted that for drafting and comprehension purposes the terms "samples/second value" and "samples/second" will be used interchangeably to mean the unit of measure of the playback rate of the audio signal, and the term "frame rate" will be used to mean the unit of measure of the playback rate of the video signal.
  • Within such methods, experimental studies have pointed out that several types of verbal communication performed for different purposes and by different individuals differ from one another in expression accuracy, and hence in the accuracy of pronunciation, the speed at which words are pronounced and the length of pauses.
  • Speech as a rule contains pauses made by the speaker, for example pauses between sentences, expressive interruptions, or moments of silence corresponding to breathing.
  • Such pauses can occur with different time percentages, substantially ranging from 10% to 30%.
  • In professional communications, such as for example news reading, pauses are very short, the time percentage approaching 10%.
  • In such cases the comprehension can be very difficult above all for people with hearing problems, or people listening to a foreign language, and surprisingly it has been found that the fact of lengthening pauses and therefore the fact of increasing the time percentage thereof, leads to a considerable improvement in the comprehension.
  • None of the known documents, nor the above document, considers the possibility of improving the comprehension by acting also on the specific length of pauses.
  • Therefore in order to achieve the described advantages, the invention relates to a method as described in the preamble of claim 1, which method provides also that the playback rate of the audio signal is set by defining audio signal components related to pauses, that is without voice emission, and related to speech, that is with voice emission, a different playback rate being defined for said two components of the audio signal, and said different playback rate being applied also in the processing of the different playback rate of video signal components which coincide with the audio signal components related to pauses and to speech.
  • In particular the method of the present invention provides the following steps:
    • processing the audio portion of the audio-video signal for finding the signal components related to pauses, that is without voice emission, and signal components related to speech, that is voice emission;
    • finding signal components of the video portion related to pauses and speech respectively by means of a correlation of the corresponding signal components of the audio portion synchronized with the video portion which are related to pauses and speech respectively;
    • setting the playback rate according to predetermined setting parameters of the audio portion in a different way for signal components related to pauses and speech respectively;
    • applying said parameters for differently setting the playback rate to the video portion, in order to correspondingly set the playback rate of the signal components related to pauses and speech respectively.
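  • As an illustration only, the four steps above can be sketched as follows. The sketch assumes that pause and speech components have already been found and are given as (duration, kind) pairs, and that a rate factor below 1 means slower playback; the function names, the default factors and the proportional scaling of the frame rate are hypothetical choices made for the example, not values taken from the description.

```python
def retime(segments, source_fps, speech_rate=1.0, pause_rate=0.5):
    """Apply a different rate factor to speech and pause components.

    segments: list of (duration_in_seconds, kind), kind is "speech" or "pause".
    The same factor is applied to the audio duration and to the video
    frame rate of the coinciding component (assumption: proportional scaling).
    """
    rates = {"speech": speech_rate, "pause": pause_rate}
    out = []
    for duration, kind in segments:
        rate = rates[kind]
        out.append({
            "kind": kind,
            "duration": duration / rate,   # slower rate -> longer component
            "fps": source_fps * rate,      # same frames shown over a longer time
        })
    return out


def pause_share(retimed):
    """Fraction of total playing time taken by pause components."""
    total = sum(seg["duration"] for seg in retimed)
    pauses = sum(seg["duration"] for seg in retimed if seg["kind"] == "pause")
    return pauses / total if total else 0.0
```

  • With 9 s of speech and 1 s of pause (a 10% pause share), halving the rate of the pause only gives 9 s of speech and 2 s of pause, so the pause share rises to 2/11, roughly 18%, while the number of video frames in each component is unchanged.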
  • The aim of the method of the present invention is also to underline pauses increasing their time length.
  • First of all it is necessary within an audio signal to define what is a "useful" pause, namely a pause affecting the cognitive activity.
  • To this aim it is necessary to make a step preprocessing the signal in order to filter possible noise peaks.
  • After that, the "useful" pause can be defined as the temporary drop-off in the amplitude of the voice signal under 6% of the peak, for a time period of about 20-50 milliseconds.
  • A pause lasting more than 200 milliseconds can be defined as a silence probably corresponding to a lack of conversation/signal by the speaker.
  • Structural pauses (short interruptions between syllables) or grammatical pauses (interruptions between sentences or pauses used for emphasizing words) can therefore be considered "useful" pauses, and can be detected by a VAD (Voice Activity Detector) algorithm able to classify different types of pauses within the spoken audio signal.
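  • A minimal sketch of the pause criterion described above, under stated assumptions: the signal is already filtered and given as a list of amplitudes, the 20 ms lower bound of the stated 20-50 ms range is used as the minimum pause length, and runs longer than 200 ms are labelled as silence. The function name and parameters are hypothetical, and a practical voice activity detector would normally work on a short-time envelope rather than on individual samples.

```python
def find_pauses(samples, sample_rate, threshold_ratio=0.06,
                min_pause_ms=20, silence_ms=200):
    """Return (start, end, label) runs where |amplitude| < 6% of the peak.

    start/end are sample indices (end exclusive); label is "pause" for
    runs of at least min_pause_ms and "silence" for runs over silence_ms.
    """
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0:
        return []
    limit = threshold_ratio * peak
    min_len = int(sample_rate * min_pause_ms / 1000)
    silence_len = int(sample_rate * silence_ms / 1000)

    runs = []
    start = None
    # A loud sentinel sample closes a run that reaches the end of the signal.
    for i, s in enumerate(list(samples) + [peak]):
        if abs(s) < limit:
            if start is None:
                start = i
        elif start is not None:
            length = i - start
            if length >= min_len:
                label = "silence" if length > silence_len else "pause"
                runs.append((start, i, label))
            start = None
    return runs
```

  • The returned index ranges can then be used to mark the corresponding audio components, and the coinciding video components, as pause or speech.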
  • Steps for defining the modified playback rate of video and audio signals and synchronization will be described below in more detail with reference to possible variants. All such modes provide audio-video signals having a modified playback rate which can be reproduced without distortions or losses of information and transmission of the signal.
  • It has to be noted that the method and apparatus of the present invention mainly, but not exclusively, operate with digital signals: in the case of analog sources the method and the apparatus of the present invention provide a variant embodiment using a usually used digital sampler receiving an analog audio-video input signal and generating a digital audio-video signal having a specific samples/second value for the audio portion and a specific frame rate value for the video portion.
  • According to an improvement of the invention, before reproducing the audio-video signal with a modified rate, a check guaranteeing the synchronization between the audio portion and the video portion can be provided after processing means.
  • Several ways for checking the synchronization are possible.
  • One example uses time sequences of sets of synchronization bits, each set made of at least a pair of bits, following one another according to a time sequence with predetermined and fixed intervals, each bit of the same set being univocally associated with only one sequence of the signal portion whose playback rate has to be modified, while bits of the same set are further univocally correlated with one another, the time sequence of the synchronization bits being checked to be the same before and after the processing; the synchronization is considered as being maintained when said check shows that the synchronization bits, upstream and downstream of the signal processing, follow the same time sequence.
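  • The idea of this check can be sketched as follows, with the caveat that the sketch replaces the synchronization bits by integer sequence markers and models sub-units as plain list items; both are simplifying assumptions, and the function names are hypothetical.

```python
def tag(sub_units):
    """Attach a sequence marker to each sub-unit.

    The marker stands in for one bit of a synchronization pair: the audio
    and video sub-units that belong together receive the same marker.
    """
    return [(index, unit) for index, unit in enumerate(sub_units)]


def markers(tagged):
    """Extract the marker sequence from tagged sub-units."""
    return [marker for marker, _ in tagged]


def in_sync(audio_before, video_before, audio_after, video_after):
    """True when the marker sequences are unchanged by the processing
    and audio and video markers still pair up one to one."""
    return (markers(audio_before) == markers(audio_after)
            and markers(video_before) == markers(video_after)
            and markers(audio_after) == markers(video_after))
```

  • If the check fails, for example because a video sub-unit was dropped during processing, the stored copy of the input signal can be processed again.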
  • The check of the synchronization of the two audio and video portions provided by the method of the present invention, can also act for generating a copy of the input signal which is loaded into the working memory such that it can be processed again if the output signal will not show a synchronization between said portions of said input signal.
  • According to a possible variant embodiment, a numerical check can be made on the playback rate values of the individual signal portions before and after processing, which numerical check comprises the following steps:
    • extrapolating the frame-rate value of the video portion of the output signal, from which the nominal frame-rate value of the input signal is obtained by applying the inverse of the function used in the processing, that is, the inverse function of the modification of the playback rate (the frame rate);
    • subsequently comparing the nominal frame-rate value with the real frame-rate value, both values being related to the video portion of the input signal.
  • Similarly, the samples/second values related to the audio portion of the input signal are calculated and compared.
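  • The numerical check described above can be illustrated by a minimal sketch. It assumes that the rate modification is a simple multiplicative factor, so that its inverse is a division; the description only speaks of an "inverse function", and the names `nominal_input_rate`, `rates_consistent`, `speed_factor` and `tolerance` are hypothetical.

```python
def nominal_input_rate(output_rate: float, speed_factor: float) -> float:
    """Recover the input rate implied by the output rate.

    Assumption: the processing multiplied the input rate by speed_factor,
    so the inverse function is a division.
    """
    if speed_factor <= 0:
        raise ValueError("speed_factor must be positive")
    return output_rate / speed_factor


def rates_consistent(real_input_rate: float, output_rate: float,
                     speed_factor: float, tolerance: float = 1e-6) -> bool:
    """Compare the nominal input rate with the real input rate."""
    nominal = nominal_input_rate(output_rate, speed_factor)
    return abs(nominal - real_input_rate) <= tolerance * max(1.0, abs(real_input_rate))


# Video: 25 frames/s slowed to 80 % should give 20 frames/s.
video_ok = rates_consistent(25.0, 20.0, 0.8)
# Audio: the same comparison applied to samples/second values.
audio_ok = rates_consistent(48000.0, 38400.0, 0.8)
```

  • The same function serves both portions, which mirrors the statement that the samples/second values are checked in the same way as the frame-rate values.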
  • The method of the present invention also allows the playback rate of audio-video signals to be modified in real time. The input signal is processed and retransmitted without delays. The input signal is loaded into the working memory in order to manage delays due to the difference in playback rate between the input signal and the output signal, so that no information is lost when reconstructing the signal. Real-time reconstruction and the absence of asynchrony between the audio portion and the video portion of the signal are preserved even when the input signal contains other information.
  • This aspect is important because, for example, subtitles in a television program can be essential to some of the characteristic aims of the method of the present invention: subtitles help both hearing-impaired people and people studying foreign languages to understand audio-video sources, and consequently the synchronization among audio, video and subtitles is a very important aspect that increases the utility and improves the operation of the method of the present invention.
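  • The role of the working memory in the real-time case can be illustrated by a short arithmetic sketch. It assumes a live input arriving at normal speed and an output played at a constant fraction `speed` of that speed; the function name `backlog_seconds` is hypothetical and does not come from the description.

```python
def backlog_seconds(elapsed_s: float, speed: float) -> float:
    """Seconds of input content held in working memory after elapsed_s.

    Assumption: input arrives at 1 second of content per second, while the
    output consumes `speed` seconds of content per second.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    # A live source cannot be played faster than it arrives, so the
    # backlog never goes below zero.
    return max(0.0, elapsed_s * (1.0 - speed))


# Ten minutes of a live program slowed to 80 % leaves about two minutes
# of content waiting in the working memory.
waiting = backlog_seconds(600.0, 0.8)
```

  • Under these assumptions the memory requirement grows linearly with the duration of the program, which is why the input signal is buffered rather than discarded.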
  • In a preferred embodiment, the method of the present invention modifies the playback rate of the audio portion of the input signal by dividing said audio portion into windows of constant time width, each window being the minimum signal portion that will be processed for modifying the playback rate, and by overlapping said windows on the audio portion of the output signal by a time amount that depends on the time width of the selected windows and on the value of the modification of the playback rate of the audio-video signal.
  • Starting from the steps described above, the method of the present invention preferably uses algorithms of the family called SOLA for modifying only the audio portion of audio-video signals, more precise details being available in documents US 5,717,818, US 5,175,769 and "An overlap-add technique based on waveform similarity (WSOLA) for high quality time-scale modification of speech", IEEE Proceedings of ICASSP-93, vol. II, pp. 554-557, 1993.
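  • The windowing and overlapping steps can be sketched as a plain overlap-add time stretch. This is a deliberately simplified illustration and not SOLA or WSOLA: it omits the similarity search those algorithms use to align windows, so it can produce audible artefacts on real speech. The function name, the Hann window and the choice of an input hop of one quarter of the window are assumptions.

```python
import math


def ola_time_stretch(samples, speed, window=512):
    """Naive overlap-add time-scale modification (no similarity search).

    Windows are taken from the input every window // 4 samples and laid
    down on the output every round(hop / speed) samples, so the overlap on
    the output depends on the window width and on the rate modification.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    analysis_hop = window // 4
    synth_hop = round(analysis_hop / speed)
    if not 0 < synth_hop < window:
        raise ValueError("speed is outside the range this window can handle")
    hann = [0.5 - 0.5 * math.cos(2.0 * math.pi * n / window) for n in range(window)]
    n_frames = max(1, (len(samples) - window) // analysis_hop + 1)
    out_len = (n_frames - 1) * synth_hop + window
    out = [0.0] * out_len
    weight = [0.0] * out_len
    for k in range(n_frames):
        src = k * analysis_hop   # window position in the input
        dst = k * synth_hop      # window position in the output
        for n in range(window):
            if src + n >= len(samples):
                break
            out[dst + n] += samples[src + n] * hann[n]
            weight[dst + n] += hann[n]
    # Divide by the summed window weights so the amplitude is preserved.
    return [o / w if w > 1e-9 else 0.0 for o, w in zip(out, weight)]


# A constant test signal slowed to half speed becomes about twice as long.
slowed = ola_time_stretch([1.0] * 16000, speed=0.5)
```

  • Because the number of samples changes while each window keeps its original content, the duration is modified without resampling, which is the property that keeps the voice tone unchanged in the SOLA family.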
  • Moreover, in its preferred embodiment the method of the present invention uses the video frame rate parameter for modifying the playback rate of the video portion of the input signal. In this case the value of the playback rate modification set by the user is extracted, the new playback rate value of the audio portion of the signal is obtained by the method described above, and from this new value, in combination with a table function, the new playback rate value of the video portion is calculated by modifying the frame-rate value of the output video signal.
  • The invention also relates to an apparatus for performing the described method, which is the subject of the claims.
  • The apparatus of the present invention is composed of processing means for executing programs, having a working memory into which audio-video signals received by said apparatus from a source through input ports are loaded. In addition to the working memory, the processing means comprise a program memory in which a program for processing the audio-video signal is loaded or loadable. This program modifies the playback rate of the input audio-video signal and, through output ports, provides an audio-video signal with a modified rate and with synchronization between the audio portion and the video portion, without causing distortions and/or artefacts and/or losses of information on said output signal.
  • The size of the modification to the playback rate is set by the user: the apparatus has a user interface for entering data and inputs, preferably composed of cabled or wireless remote control means, such as the remote controls commonly used in electronic devices.
  • A possible embodiment of the apparatus of the present invention has, inside said processing means, an additional memory in which the output signal with modified rate can be stored; it can then be read out through the user interface and transmitted again without needing the input signal. This variant gives the apparatus a further function: it acts as an archive for films and television programs. The input signal can be modified and saved in the memory in order to be displayed subsequently, at different playback rates selected by the user through the user interface according to his/her own needs.
  • According to a further characteristic, the apparatus of the present invention has all the features of standard decoders on the market, such as for example an internal timer allowing an input audio-video signal to be recorded and modified by means of pre-settings, without the user being required to start the recording remotely.
  • Moreover, once an audio-video signal is stored, the apparatus of the present invention makes it possible to select only some portions of the signal to be reproduced and to select a different playback rate for each portion.
  • Moreover, the apparatus can have more than one input in order to select the source from which the signal is taken, such as for example a television, a player of various storage media (CD, DVD), satellite receivers and any external source, and it can process several input signals, modifying their playback rates with common or different values for each input signal.
  • Potential uses of the present invention are as follows.
  • Use of the method and/or apparatus for slowing down audio-video programs directly recorded on a storage medium or directly transmitted to said apparatus through an input interface.
  • Use of the method and/or apparatus for facilitating the comprehension of audio video signals.
  • Use of the method and/or apparatus as a supporting instrument during speech therapies for evaluating individuals, in particular hearing impaired individuals.
  • Use of the method and/or apparatus as a supporting instrument when performing speech and/or training and/or exercise therapies for individuals, in particular hearing-impaired individuals.
  • Use of the method and/or apparatus for learning languages.
  • Use of the method and/or apparatus as an instrument useful for transcriptions into written text, and possible translations, of vocal contents recorded or spoken in real time.
  • Advantages of the present invention are clear from what is described above.
  • Further improvements of the method and of the apparatus of the present invention are the subject of the subclaims.
  • Characteristics of the invention and advantages deriving therefrom will be clearer from the following description of some embodiments shown in the annexed drawings, wherein:
    • Fig. 1 is the flow chart schematically summarizing the steps of the method of the present invention;
    • Fig. 2 schematically shows the step for checking the synchronization of the method of the present invention;
    • Fig. 3 is a block diagram of the structure of the apparatus of the present invention.
  • The figures schematically show the structure and the operation of an embodiment of the method and of the apparatus of the present invention.
  • Figure 1 shows the flow chart summarizing the steps of the method of the present invention according to a possible operation mode.
  • The method of the present invention is used for modifying the playback rate of audio-video signals, generating an output audio-video signal with a modified rate in which the audio portion, the video portion and any further information carried by the signal are synchronized, without generating distortions or losses of information with respect to the input signal.
  • The aim of the method of the present invention is also to emphasize the pauses that are naturally present in speech by increasing their duration.
  • Figure 1 shows a possible operating mode of the method of the present invention providing the following steps, shown by function blocks.
  • Turning on and selecting the source, denoted by 101: the user, by a remote control device, turns on a device within which the method occurs, then he/she selects the source from which he/she desires to take the signal to be modified;
  • Signal loading, denoted by 102: the signal is loaded within a working memory;
  • Modifying the playback rate, denoted by 103: the user decides whether or not to modify the playback rate.
  • If the user does not wish to modify the rate, an output signal identical to the input signal is transmitted.
  • If the user wishes to modify the playback rate, a signal processing step is performed, denoted by 104: the input signal is taken from the working memory, the audio portion is separated from the video portion, and the two portions are individually loaded into the processing means.
  • Filtering the signal, denoted by 105: possible noise peaks are eliminated;
  • Detecting components relating to pauses and speech, denoted by 106: the audio portion of the signal is analysed by means of an algorithm in order to detect the voice presence.
  • If the voice is not detected and therefore a pause is detected, a step setting a first playback rate is carried out, denoted by 107, according to a value that can be set by the user or a predetermined value within the device memory.
  • Otherwise, that is when voice is detected and therefore speech is detected, a step setting a second playback rate, denoted by 108 is carried out, according to a value that can be set by the user or a predetermined value within the device memory.
  • By correlating the video portion with the audio portion synchronized therewith, it is also possible to find, within the video portion, the components relating to pauses and speech respectively, so as to set such components at the same rates as the corresponding audio signal components; thus the synchronization between the two parts of the signal is not lost.
  • In a particularly advantageous embodiment the playback rate used for setting the signal components relating to pauses is slower than the playback rate used for setting signal components relating to speech.
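  • The effect of applying two different rates to pause and speech components can be sketched as a retiming of labelled segments. The segment representation, the function name `retime_segments` and the example rate values are assumptions made for illustration; the description does not prescribe a data format.

```python
def retime_segments(segments, speech_rate, pause_rate):
    """Map input segments to output time using one rate per component type.

    segments: list of (start_s, end_s, is_speech) in input time.
    Returns a list of (out_start_s, out_end_s, is_speech) in output time.
    """
    if speech_rate <= 0 or pause_rate <= 0:
        raise ValueError("rates must be positive")
    retimed = []
    cursor = 0.0
    for start, end, is_speech in segments:
        rate = speech_rate if is_speech else pause_rate
        duration = (end - start) / rate  # a rate below 1 lengthens the segment
        retimed.append((cursor, cursor + duration, is_speech))
        cursor += duration
    return retimed


# Speech slowed to 80 %, pauses slowed further to 50 %.
timeline = retime_segments(
    [(0.0, 2.0, True), (2.0, 2.5, False), (2.5, 4.5, True)],
    speech_rate=0.8,
    pause_rate=0.5,
)
```

  • Applying the same output timeline to the video frames that coincide with each audio segment is one way to keep the two portions aligned, since both then share identical segment boundaries.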
  • According to a preferred embodiment, signal components of the audio portion relating to pauses are detected by a VAD (Voice Activity Detector) algorithm, which recognizes as a pause a temporary drop-off in the amplitude of the audio portion below 6% of the peak lasting about 20-50 milliseconds.
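  • The pause criterion stated above (amplitude below 6% of the peak for at least about 20 milliseconds) can be sketched as a simple threshold detector. This is only an illustration under stated assumptions: it works on raw sample amplitudes, takes the peak over the whole buffer, and uses the lower bound of 20 ms as the minimum duration; a practical voice activity detector would normally work on a smoothed envelope. The name `find_pauses` and its parameters are hypothetical.

```python
def find_pauses(samples, sample_rate, threshold_ratio=0.06, min_pause_s=0.02):
    """Return (start_index, end_index) pairs of runs quieter than the threshold.

    A run counts as a pause when every sample stays below
    threshold_ratio * peak for at least min_pause_s seconds.
    """
    peak = max((abs(s) for s in samples), default=0.0)
    threshold = threshold_ratio * peak
    min_len = max(1, round(min_pause_s * sample_rate))
    pauses = []
    run_start = None
    for i, s in enumerate(samples):
        if abs(s) < threshold:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_len:
                pauses.append((run_start, i))
            run_start = None
    # Close a run that reaches the end of the buffer.
    if run_start is not None and len(samples) - run_start >= min_len:
        pauses.append((run_start, len(samples)))
    return pauses


# 100 ms of loud signal, 50 ms of near silence, 100 ms of loud signal at 8 kHz.
loud = [1.0, -1.0] * 400
signal = loud + [0.01] * 400 + loud
detected = find_pauses(signal, sample_rate=8000)
```

  • Short quiet runs, such as single samples near a zero crossing, are rejected by the minimum duration, which is the purpose of the time condition in the criterion.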
  • As regards the audio portion, the input signal is divided into windows, which are then overlapped to reconstruct the signal so as to obtain the desired output samples/second value without causing frequency distortions.
  • As regards the video portion, a table function is used which associates a general change in the rate of any audio-video signal with a change of the frame rate value relating to the video portion of the input signal.
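  • A table function of this kind can be sketched as a lookup from the rate change set by the user to an output frame rate. The description does not give the table contents, so the entries below (rate expressed as a percentage, output frame rate proportional to the input frame rate) and the names `build_frame_rate_table` and `output_frame_rate` are purely hypothetical.

```python
def build_frame_rate_table(input_fps, speed_percents=(50, 60, 70, 80, 90, 100)):
    """Build a lookup table: rate change in percent -> output frame rate.

    Assumption: the output frame rate is proportional to the rate change.
    """
    return {percent: input_fps * percent / 100 for percent in speed_percents}


def output_frame_rate(table, speed_percent):
    """Look up the output frame rate for a rate change set by the user."""
    try:
        return table[speed_percent]
    except KeyError:
        raise ValueError(f"no table entry for {speed_percent}%") from None


table = build_frame_rate_table(25.0)
fps_at_80 = output_frame_rate(table, 80)
```

  • A precomputed table restricts the user to a fixed set of rate changes, which may suit a remote-control interface with discrete steps.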
  • It should be noted that, as regards the video portion, the playback rate is modified by increasing or decreasing the frame rate with commonly used methods, while as regards the audio portion, modifying the playback rate without frequency distortions, for example keeping the voice tone unchanged, is not trivial: in order to achieve the desired effect the method of the present invention preferably uses a particular family of algorithms called SOLA. These algorithms and/or methods are known and are described in more detail in US 5,717,818, US 5,175,769 and "An overlap-add technique based on waveform similarity (WSOLA) for high quality time-scale modification of speech", IEEE Proceedings of ICASSP-93, vol. II, pp. 554-557, 1993, whose contents are an integral part of the present description.
  • The numeral 109 denotes the check of the synchronization. It is checked for the audio portion and video portion and the check acts upstream and downstream of the processing.
  • One embodiment of a method for verifying the synchronization uses synchronization bits; it is shown and described in more detail with reference to Figure 2.
  • Obviously it is possible to use also other methods for verifying the synchronization between the audio component and the video component of the signal after having modified the playback rate of the two components in a way corresponding to the changing parameters set by the user.
  • If there is asynchrony between the two signal portions, namely the audio portion and the video portion, the check reloads the input signal and iteratively modifies the playback rate of the audio and video portions until the output signal is synchronized.
  • The numeral 110 denotes the step of generating and transmitting the signal, which is carried out when the audio portion and the video portion of the signal are synchronized: they are joined together again, generating the modified output signal, which is then transmitted.
  • Figure 2 shows the step checking the synchronization of the method of the present invention according to the above mentioned example which is applied to a general audio signal for simplicity purposes without considering the fact of dividing components of the audio signal relating to pauses and speech respectively.
  • Obviously the same method can be applied both to components relating to pauses and speech respectively, thus guaranteeing a synchronization between the video signal and the audio signal for both components.
  • Upstream of the processing, the synchronization check divides the audio and video portions of the input signal, denoted in the figure by 1 and 2 respectively, into a time sequence of sub-units: the synchronization bits 31 and 32 are uniquely associated with each sub-unit 11 and 22 respectively, which Figure 2 highlights by a different drawing effect for bits and sub-units. The bits are also uniquely correlated with each other and belong to a sequence of pairs of synchronization bits that follow one another according to a time base, that is a clock, at predetermined and fixed intervals; each pair of bits is split and uniquely associated with one audio sub-unit and one video sub-unit of the input signal. The signal is then processed by the processing means 42, and the check verifies the correspondence both between the paired synchronization bits 31 and 32 and between each audio and video sub-unit 11 and 22 and its own associated bit, 31 and 32 respectively.
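  • The check of Figure 2 can be sketched as follows. For simplicity the pair of synchronization bits is represented by a shared integer sequence number attached to each audio sub-unit and to the corresponding video sub-unit; this representation and the names `tag_subunits` and `sync_preserved` are assumptions, not the encoding used by the invention.

```python
def tag_subunits(audio_units, video_units):
    """Attach the same marker to each audio sub-unit and its video sub-unit."""
    if len(audio_units) != len(video_units):
        raise ValueError("audio and video must have the same number of sub-units")
    tagged_audio = [(i, unit) for i, unit in enumerate(audio_units)]
    tagged_video = [(i, unit) for i, unit in enumerate(video_units)]
    return tagged_audio, tagged_video


def sync_preserved(markers_before, audio_after, video_after):
    """Check that both processed streams carry the original marker sequence."""
    audio_markers = [marker for marker, _ in audio_after]
    video_markers = [marker for marker, _ in video_after]
    return audio_markers == markers_before and video_markers == markers_before


audio, video = tag_subunits(["a0", "a1", "a2"], ["v0", "v1", "v2"])
markers = [marker for marker, _ in audio]

# Stand-in for the processing step: the payload changes, the marker stays.
audio_out = [(m, u + "'") for m, u in audio]
video_out = [(m, u + "'") for m, u in video]
still_in_sync = sync_preserved(markers, audio_out, video_out)
```

  • If a sub-unit is lost or reordered during processing, the marker sequences no longer match and the check fails, which corresponds to the case in which the input signal is loaded again and reprocessed.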
  • Figure 3 shows a block diagram of the structure of the apparatus 4 of the present invention.
  • The apparatus 4 receives an audio-video signal from any source through the input port 41, which communicates with a processing unit 42. The processing unit 42 comprises a working memory 421 where the input signal is loaded, a program memory 422 where a program for processing the audio-video signal is loaded or loadable, and a CPU 423 that divides the input signal into the audio portion and the video portion and allows said two portions to be processed. Said two portions are then checked by the checking unit 43, acting upstream and downstream of the processing means 42, which, by the method described above, checks the synchronization of the modified signal and, if the signal is synchronized, joins the two signal portions again and transmits the signal to the output port 44.
  • Finally, the apparatus 4 has an interface unit 45 communicating with the processing unit 42 and allowing the user to set the desired rate by means of a remote control and to carry out several further actions, such as for example selecting the source to be used as the input signal or selecting the signal to be reproduced among the signals recorded within the working memory 421.

Claims (16)

  1. Method for modifying the playback rate of audio-video signals comprising the following steps:
    a) receiving an input signal from any source,
    b) storing said signal into a working memory of an apparatus,
    c) setting the playback rate of said audio-video signal,
    d) reading out said audio-video signal into said working memory,
    e) separating the audio portion from the video portion of said signal,
    f) processing the playback rate both of the audio portion and of the video portion wherein the parameter of the playback rate set at step c) is used in order to define the new frame-rate value of the video portion and the new samples/second value of the audio portion, that have been modified individually and in a balanced way,
    g) joining the audio portion with modified playback rate to the video portion with modified playback rate,
    h) generating and transmitting an output signal providing synchronization between the audio portion and the video portion without leading to distortions and/or artefacts and/or losses of information on said output signal
    characterized in that
    the playback rate of the audio signal is set by finding the audio signal components related to pauses, that is without voice emission, and related to speech, that is voice emission, a different playback rate being defined for said two components of the audio signal, and said different playback rate being applied also for processing the different playback rate of video signal components coinciding with the audio signal components related to pauses and speech.
  2. Method for modifying the playback rate of audio-video signals according to claim 1, characterized in that it provides the following steps:
    processing the audio portion of the audio-video signal for finding the signal components related to pauses and signal components related to speech;
    finding signal components of the video portion related to pauses and speech respectively by means of a correlation of the corresponding signal components of the audio portion synchronized with the video portion which are related to pauses and speech respectively;
    setting the playback rate according to predetermined setting parameters of the audio portion in a different way for signal components related to pauses and speech respectively;
    applying said parameters for differently setting the playback rate to the video portion, so as to correspondingly set the playback rate of the signal components related to pauses and speech respectively.
  3. Method for modifying the playback rate of audio-video signals according to one or more of the preceding claims, characterized in that the playback rate of the audio-video signal components relating to pauses is reduced at a value lower than the playback rate of the audio-video signal components relating to speech.
  4. Method for modifying the playback rate of audio-video signals according to one or more of the preceding claims, characterized in that signal components of the audio portion relating to pauses are detected by a VAD algorithm (Voice Activity Detector), which algorithm recognizes a temporary drop-off in the amplitude of the audio portion under 6% of the peak, for a time of about 20-50 milliseconds as pauses.
  5. Method for modifying the playback rate of audio-video signals according to one or more of the preceding claims, characterized in that said input signal is processed and said output signal is generated and transmitted under real-time or almost real-time conditions.
  6. Method for modifying the playback rate of audio-video signals according to one or more of the preceding claims, characterized in that said input audio-video signal contains information in addition to the audio portion and the video portion.
  7. Method for modifying the playback rate of audio-video signals according to one or more of the preceding claims, characterized in that the fact of modifying the playback rate only of the audio portion provides the audio portion of said input signal to be divided into windows and said windows to be overlapped on the audio portion of the output signal by an amount depending on the width of the selected windows and on the value of the modification of the playback rate of the audio-video signal.
  8. Method for modifying the playback rate of audio-video signals according to one or more of the preceding claims, characterized in that the playback rate only of the audio portion is modified by using a known algorithm belonging to the algorithm class called SOLA.
  9. Method for modifying the playback rate of audio-video signals according to one or more of the preceding claims, characterized in that it provides a further step for checking the synchronization between the audio portion and the video portion, or the synchronization of at least one of said portions with additional information contained into said input signal.
  10. Method for modifying the playback rate of audio-video signals according to one or more of the preceding claims, characterized in that said check is made by using time sequences of sets of synchronization bits, made of at least a pair of bits, one after the other according to a time sequence provided with predetermined and fixed arrangement intervals, bits of each set being univocally correlated to one another and univocally associated to each portion of the video sequence, audio sequence or other possible information contained into said signal, such that if synchronization bits are in the same sequence upstream and downstream of the processing of said signal the synchronization between the audio portion and the video portion is preserved into the output signal.
  11. Method for modifying the playback rate of audio-video signals according to one or more of the preceding claims, characterized in that said check is made upstream and downstream of the processing of said signal, the value regarding the frame rate of the output signal is extrapolated and the nominal value of the input frame rate is achieved by an inverse function of the application regarding said processing, then the nominal and real frame-rate input values are compared; analogously the samples/second value concerning the input audio signal is checked.
  12. Apparatus for modifying the playback rate of audio-video signals characterized in that it is composed of processing means for executing programs with at least an input for an audio-video signal, an output for said audio-video signal, a user interface for inputting data and/or commands and which processing means comprise a working memory, within said working memory clips and/or a sequence of portions of audio-video files being loadable, and at least a program memory wherein a program for processing the audio-video signal is loaded or loadable, which modifies the playback rate of the input audio-video signal depending on a parameter of the modification of the playback rate set by the user and which outputs an audio-video signal with said new modified rate provided with synchronization between the audio portion and the video portion without leading to distortions and/or artefacts and/or losses of information on said output signal.
  13. Apparatus for modifying the playback rate of audio-video signals according to claim 12, characterized in that said output modified-rate audio-video signal is provided with synchronization between the audio portion, the video portion and possible information in addition to the audio portion and video portion, without leading to distortions and/or artefacts and/or losses of information on said output signal.
  14. Apparatus for modifying the playback rate of audio-video signals according to one or more of the preceding claims, characterized in that it has user interface means composed of cable or wireless remote control means.
  15. Apparatus for modifying the playback rate of audio-video signals according to one or more of the preceding claims, characterized in that within the memory of said processing unit it is possible to store the output signal from the processing unit itself.
  16. Apparatus for modifying the playback rate of audio-video signals according to one or more of the preceding claims, characterized in that it has at least more than one input and more than one output and said processing means contemporaneously allow the playback rate of individual audio video signals to be modified.
EP10165115A 2005-05-03 2010-06-07 Method and apparatus for modifying the playback rate of audio-video signals Withdrawn EP2261900A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
DK10165111.5T DK2227042T3 (en) 2005-05-03 2005-05-03 System and method for sharing network resources between hearing aids

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
IT000037A ITGE20090037A1 (en) 2009-06-08 2009-06-08 METHOD AND DEVICE TO MODIFY THE REPRODUCTION SPEED OF AUDIO-VIDEO SIGNALS

Publications (1)

Publication Number Publication Date
EP2261900A1 true EP2261900A1 (en) 2010-12-15

Family

ID=41600772

Family Applications (1)

Application Number Title Priority Date Filing Date
EP10165115A Withdrawn EP2261900A1 (en) 2005-05-03 2010-06-07 Method and apparatus for modifying the playback rate of audio-video signals

Country Status (2)

Country Link
EP (1) EP2261900A1 (en)
IT (1) ITGE20090037A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5175769A (en) 1991-07-23 1992-12-29 Rolm Systems Method for time-scale modification of signals
US5717818A (en) 1992-08-18 1998-02-10 Hitachi, Ltd. Audio signal storing apparatus having a function for converting speech speed
WO2004077381A1 (en) * 2003-02-28 2004-09-10 Dublin Institute Of Technology A voice playback system
WO2005045830A1 (en) 2003-11-11 2005-05-19 Cosmotan Inc. Time-scale modification method for digital audio signal and digital audio/video signal, and variable speed reproducing method of digital television signal by using the same method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"An overlap-add technique based on waveform similarity (WSOLA) for high quality time-scale modification of speech", IEEE Proceedings of ICASSP-93, vol. II, 1993, pages 554-557

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090172456A1 (en) * 2008-01-02 2009-07-02 Samsung Electronics Co., Ltd. Method and apparatus for controlling data processing module
US8245071B2 (en) * 2008-01-02 2012-08-14 Samsung Electronics Co., Ltd. Method and apparatus of processing data independently and synchronizing and outputting processed data
CN107112029A (en) * 2014-12-31 2017-08-29 诺瓦交谈有限责任公司 Method and apparatus for detecting speech pattern and mistake
CN112750436A (en) * 2020-12-29 2021-05-04 上海掌门科技有限公司 Method and equipment for determining target playing speed of voice message

Also Published As

Publication number Publication date
ITGE20090037A1 (en) 2010-12-09

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO SE SI SK SM TR

AX Request for extension of the european patent

Extension state: BA ME RS

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20110630