WO2009133324A1

WO2009133324A1 - Device and method for vocal reproduction with controlled multi-sensorial perception

Info

Publication number: WO2009133324A1
Application number: PCT/FR2009/000488
Authority: WO
Inventors: Jacques Feldmar; Maryvonne Zimmermann
Original assignee: Jacques Feldmar; Maryvonne Zimmermann
Priority date: 2008-04-28
Filing date: 2009-04-24
Publication date: 2009-11-05
Also published as: FR2930671B1; EP2269183A1; FR2930671A1

Abstract

The present invention relates to a method of vocal reproduction of a reference sound (6) by at least one user (1), said method comprising a step (I) of acquiring input signals, a step (II) of processing said input signals acquired with a view to providing (III) output signals comprising at least one cue relating to a comparison between an input sound signal acquired and the signal of said reference sound (6), and a step of multi-sensorial perception (IV) by the user (1) of at least one of said output signals with a view to allowing said user (1) to achieve said reference sound (6), characterized in that the level of multi-sensorial perception of at least one output signal comprising at least one cue relating to a comparison between an input sound signal acquired and the signal of said reference sound (6) is controlled by adjusting at least one parameter of a signal from among said output signal, said input signal acquired and said signal of the reference sound (6), as a function of the state of performance of the user (1). The present invention also relates to a device for vocal reproduction implementing such a method.

Description

VOICE REPRODUCTION DEVICE AND METHOD WITH CONTROLLED MULTI-SENSORY PERCEPTION

The present invention relates to a device and a method of voice reproduction with controlled multi-sensory perception.

TECHNICAL FIELD

The present invention relates to the field of voice training for the reproduction of a reference sound. This reference sound can be a note, a rhythm, a melody, a scale or a sound sequence to be reproduced.

It relates more particularly to a method of voice reproduction of a reference sound by at least one user, said method comprising an input signal acquisition step, a step of processing said acquired input signals for the purpose of providing output signals comprising at least one piece of comparison information between an acquired input sound signal and the signal of said reference sound, and a step of multi-sensory perception by the user of at least one of said output signals so as to enable said user to reach said reference sound.

STATE OF THE PRIOR ART

Such a method is intended to be used in particular for singing instruction, imitation and musical play, as well as for speech therapy purposes. In this context, it can be used equally well by an amateur or a professional singer, a speaker or an actor, without this being limiting. It allows the user to train to coordinate his vocal functions in order to reproduce a reference sound. The vocal functions involved include, for example, the control of the diaphragm and the muscles, the vibration of the vocal cords and the control of the articulators. The articulators can be, for example, the lips, the jaw, the tongue, the soft palate and the uvula.

The state of the art in this field includes methods and devices that draw on the physical and mental capabilities of the user. For instance, some voice reproduction methods and devices use display means to provide the user with visual feedback on the difference between the sound he has produced and the reference sound he wishes to produce. In the field of karaoke, software exists with which the user can sing into a microphone to reproduce notes whose pitch and duration are displayed on a screen. At the end of each reproduction session, a similarity score is calculated and displayed on the screen. Such a method is described in US Patent 5,889,224, which relates to the real-time evaluation of the vocal performance of a singer against a karaoke melody. For this, the singer's voice and the melody are detected separately. The signal of the singer's voice thus detected is sampled. The sampled data thus obtained are compared with the data of the reference sound to be produced in order to obtain differential data. These data are used to calculate a similarity score representing the degree of deviation of the singer's voice.

The disadvantage of such a method lies in its inability to engage the user's different senses so as to lead him to the reference sound gradually and intuitively. In particular, the different senses are not exploited at the same time, whereas the brain is perfectly capable of integrating information coming from several senses simultaneously.

A solution for exploiting several of the user's senses is described in US patent document 2004/0194610. In this document, a training method consists in having the user generate a vocal sound and adjust it in order to reach a target note, by implementing sensory feedback means, or effectors. These sensory feedback means are selected from visual, auditory and tactile feedback means, or a combination thereof. These feedback means indicate the difference between the user's vocal production and the target note, which allows the user to reduce this difference in real time by adjusting his sound output. The multiplicity of sensory feedback means, as well as their difference in nature, makes it possible to exploit several of the user's senses at the same time. Because the brain integrates these multiple pieces of information simultaneously, the user benefits from more information and can thus improve the intuitive adjustment of his vocal production.

However, this solution has the disadvantage of not making the best use of the user's different senses, since the senses solicited are not stimulated optimally by the effectors. In particular, it is not possible to take into account the perception capabilities of the user's various senses so as to provide him with feedback on his production that depends on his state of performance. To a lesser extent, the auditory and visual feedback does not indicate the required corrections to the voice and facial movements clearly enough for this information to be integrated by the brain and for the adjustment to become more intuitive.

Thus, no solution in the state of the art provides a method or a device for the vocal reproduction of a reference sound by a user that optimally exploits the physical and mental capacities of the user, so as to allow him to adjust intuitively the sound he produces with respect to the reference sound to be produced.

OBJECT OF THE INVENTION

The aim of the present invention is to remedy this technical problem, by making it possible to optimally exploit the user's perception capacity as well as his state of performance. The solution of the invention lies in the implementation of a system for controlling the level of perception by the user of the information provided by the output signals calculated from acquired input signals.

The approach taken has been to look for ways of implementing multi-sensory perception means that provide the brain with more relevant information that can be better integrated. It then became apparent that the use of a control system makes it possible to regulate the level of perception of the feedback on the sound produced, more particularly by adjusting the comparison information between an acquired input signal related to the user's sound and the reference sound.

For this purpose, the subject of the invention is a voice reproduction method of the type mentioned above in which, in addition to the characteristics already mentioned, the level of multi-sensory perception of at least one output signal comprising at least one piece of comparison information between an acquired input sound signal and the reference sound signal is controlled by adjusting at least one parameter of a signal among said output signal, said acquired input signal and said signal of the reference sound, depending on the performance state of the user.

This allows the user to control the perception of the voice reproduction himself, in particular the level of difficulty of the voice reproduction, the intensity of the output signals he perceives via the perception means and the processing options for the acquired input signals.

Thus this method, consisting of the combination of a conventional method of vocal training and multi-sensory perception whose level is controlled, allows the user to optimize the adjustment of his production according to his perception.

Advantageously, it is provided that the adjustment of at least one parameter of a signal among said output signal, said acquired input signal and said reference sound signal is performed automatically and dynamically. The user can thus himself adjust the level of perception of the comparison between the sound that he has produced and the reference sound as a function, on the one hand, of his state of performance and, on the other hand, of his sensitivity of perception.

Advantageously, it is provided that the adjustment of at least one parameter of a signal among said output signal, said acquired input signal and said reference sound signal is carried out by the user. The user can thus have a real-time adjustment of the level of perception and the comparison between the sound he has produced and the reference sound according to his state of performance as well as, for example, the evolution of this state over time.

In one embodiment that makes use of the non-symmetrical role of the two ears and of the brain's integration of information from the different senses, it is provided that, during the multi-sensory perception step, at least some of said output signals provided are perceived audibly, said output signals supplied and audibly perceived being constituted so as to provide the user with two different output signals related to the signal of said reference sound and an acquired input sound signal.

The use of this asymmetrical, or binaural, auditory perception is known in the field of speech therapy. It consists of exploiting the non-symmetrical role of the ears in order to spatially separate or locate sounds. This type of listening is used, for example, to convey the position of a cursor on a screen to blind people. The left-right axis can be encoded by the relative loudness of the sound delivered to each ear by a pair of earphones. The high-low axis can be encoded by the pitch (note, frequency) of this sound.
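By way of illustration only, the cursor encoding described above can be sketched as follows. The function name `encode_cursor`, the normalized coordinates in [0, 1] and the frequency bounds `f_low` and `f_high` are assumptions made for this sketch and do not appear in the description.

```python
def encode_cursor(x, y, f_low=220.0, f_high=880.0):
    """Map a cursor position to (left_gain, right_gain, frequency_hz).

    x and y are hypothetical normalized screen coordinates in [0, 1]:
    x = 0 is the left edge, y = 0 is the bottom edge.
    """
    if not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0):
        raise ValueError("x and y must lie in [0, 1]")
    # Left-right axis: relative loudness between the two earphones.
    left_gain = 1.0 - x
    right_gain = x
    # High-low axis: pitch, interpolated geometrically between the bounds
    # (assumed here so that equal steps on screen give equal intervals).
    frequency_hz = f_low * (f_high / f_low) ** y
    return left_gain, right_gain, frequency_hz
```

With these assumed bounds, a cursor in the bottom-left corner is rendered entirely in the left earphone at the lowest pitch, and a cursor in the top-right corner entirely in the right earphone at the highest pitch.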

Such a method, combining the control of multi-sensory perception and asymmetrical auditory perception, provides the brain with information enabling the user to intuitively correct his production according to his binaural sound perception.

According to a first mode of implementation of this auditory perception, it is provided that the output signals provided and audibly perceived come from the same sound source.

According to a second embodiment of this auditory perception, it is provided that the output signals supplied and audibly perceived come from two sound sources spatially separated and arranged to provide each of the user's ear with one of the two different output signals in connection with the signal of said reference sound and an acquired input sound signal.

Preferably, it is provided that the output sound signals supplied to each ear of the user and perceived in an auditory manner consist of a combination of signals among at least one acquired input sound signal, the signal of the reference sound and an indicator of the difference between at least one acquired input sound signal and the signal of said reference sound, said indicator relating to at least one characteristic of said signals. It is thus possible to inject any kind of combination or information between at least one acquired input sound signal and the signal of the reference sound in order to provide the brain with relevant information that it can integrate in order to improve its production according to its perception.

According to a particular embodiment, it is provided that the output sound signals supplied to each ear of the user and audibly perceived comprise respectively at least partly the signal of the reference sound and an acquired input sound signal. The distribution of these signals between the two ears provided by the binaural sound perception then ensures that the user decreases this gap intuitively.

According to a particular embodiment, it is provided that the output sound signals supplied to each ear of the user and audibly perceived are respectively the reference signal and an acquired input sound signal.

According to a particular embodiment, it is provided that the output sound signals supplied to each ear of the user and audibly perceived are respectively the reference signal and the difference between an acquired input sound signal and said reference signal.

In a first embodiment of the invention aimed at having the brain integrate the algebraic (signed) difference between the acquired input sound signal and the reference sound signal, it is provided that the output sound signals supplied to each ear of the user and audibly perceived are assigned as a function of the sign of the difference between an acquired input sound signal and said reference signal.

In a second embodiment of the invention aimed at having the brain integrate the algebraic (signed) difference between the acquired input sound signal and the reference sound signal, it is provided that the amplitude of the output sound signals supplied to each ear of the user and audibly perceived is adjusted as a function of the sign of the difference between an acquired input sound signal and said reference signal.

In one embodiment for improving the multi-sensory perception of the user's sound production, it is provided that, during the multi-sensory perception step, at least some of said output signals provided are perceived visually, said output signals supplied and perceived visually being related to at least one acquired input sound signal and the signal of said reference sound. This visual perception acts in addition to the binaural auditory perception, so that it is integrated by the brain in a complementary way to the sound signals provided. The brain integrates the visual and auditory information simultaneously. This allows the user to adjust his production more effectively according to his perception.

In a preferred embodiment implementing visual perception, it is provided that the output signals supplied and perceived visually consist of a combination of signals among at least one acquired input sound signal, the signal of the reference sound and an indicator of the difference between at least one acquired input sound signal and the signal of said reference sound, said indicator relating to at least one characteristic of said signals. It is thus possible to show the user any kind of combination or information between at least one acquired input sound signal and the signal of the reference sound in order to provide the brain with relevant information.

In another preferred embodiment implementing visual perception, it is provided that the output signals provided and perceived visually are perceived by the display of a three-dimensional virtual correction face indicating the facial movements necessary to reproduce the reference sound. This display provides the user with the algebraic (signed) difference between what has been produced and what should be produced. Since our senses expect coherent signals, the movements of a speaker's mouth should correspond to the sounds made. If a person sees lip movements that are incompatible with what they hear, they are disturbed. This incompatibility is thus used as visual information integrated by the brain so as to adjust the voice reproduction of the user.

In one embodiment for improving the multi-sensory perception of the user's sound production, it is provided that, during the multi-sensory perception step, at least some of said output signals provided are perceived in a tactile manner, said output signals supplied and perceived in a tactile manner being related to at least one acquired input sound signal and the signal of said reference sound. This tactile perception acts in addition to the binaural sound perception, and possibly also to the visual perception, so as to be integrated by the brain in a complementary manner to the sound signals provided.

Preferably, the voice reproduction method operates in a closed loop. This closed loop between production and sound perception dynamically adjusts the link between production and perception to arrive at the vocal reproduction result of the reference sound.

Advantageously, provision is made for a delay to be introduced at the level of the acquired input signals so as to synchronize said acquired input signals with the output signals supplied. This allows combinations to be made between the synchronized input and output signals, so that the user can integrate in real time the gap between voice production and perception and adjust in real time.

The invention also relates to a device for voice reproduction of a reference sound by at least one user, comprising an input signal acquisition system, said input signals comprising at least one input sound signal, a processing system for said acquired input signals adapted to provide output signals comprising at least one piece of comparison information between an acquired input sound signal and the signal of said reference sound, and a system for multi-sensory perception of said output signals provided, arranged to allow the user to reach said reference sound. This device comprises a system for controlling the multi-sensory perception level of at least one output signal comprising at least one piece of comparison information between an acquired input sound signal and the signal of said reference sound, said control system comprising means for adjusting at least one parameter of a signal among said output signal, said acquired input signal and said reference sound signal, depending on the performance state of the user.

This device, consisting of the combination of a conventional device for vocal training and a means of multi-sensory and binaural sound perception, allows the user to optimize the adjustment of his production according to his perception.

According to another preferred embodiment of this device, it is provided that it comprises means for recording and storing the acquired input signals and the output signals provided, with a view to establishing a progress indicator for the voice reproduction of the reference sound by the user. This allows the user to follow the evolution of his vocal reproduction of the reference sound, so as to determine for himself the progress made.

According to another preferred embodiment of this device, provision is made for the input signal acquisition system to comprise means of perception at least among auditory, visual and tactile perception means, said perception means being arranged so as to provide the user with at least one output signal related to at least one acquired input sound signal and the signal of said reference sound.

BRIEF DESCRIPTION OF THE DRAWINGS

The invention will be better understood on reading the detailed description of a nonlimiting example of embodiment, accompanied by figures respectively representing:

FIG. 1, a block diagram of a single-user voice reproduction device and method;
FIG. 2, a diagram of a single-user voice reproduction device according to one embodiment of the invention;
FIG. 3, auditory perception means of a single-user voice reproduction device according to one embodiment of the invention;
FIG. 4, visual perception means of a single-user voice reproduction device according to one embodiment of the invention; and
FIG. 5, a block diagram of a multi-user voice reproduction device according to the present invention.

DETAILED PRESENTATION OF PARTICULAR EMBODIMENTS

In this patent, the performance state of the user is understood to be the level of reproduction of a reference sound reached by the user, that is to say the difference between the sound that the user has produced and the reference sound. This state can be determined according to one or more parameters of the signals of the sound that the user has produced and of the reference sound, respectively.
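As a purely illustrative sketch, one possible measure of this difference for a single parameter, the fundamental frequency, is the signed interval in cents between the sound produced and the reference sound. The choice of this parameter, the unit and the function name `pitch_gap_cents` are assumptions and are not prescribed by the description.

```python
import math


def pitch_gap_cents(produced_hz, reference_hz):
    """Signed pitch gap in cents (100 cents = one equal-tempered semitone).

    Positive when the sound produced is higher than the reference sound,
    negative when it is lower, zero when both have the same frequency.
    """
    if produced_hz <= 0 or reference_hz <= 0:
        raise ValueError("frequencies must be strictly positive")
    return 1200.0 * math.log2(produced_hz / reference_hz)
```

A gap of +1200 cents then corresponds to a sound produced one octave above the reference, and the sign of the result gives the direction of the correction to be made.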

Fig. 1 shows a block diagram of a voice reproduction device and method according to the present invention. This device comprises an acquisition system 2, a processing system 3, a multi-sensory perception system 4 and a control system 5.

The acquisition system 2 captures a plurality of signals arising from the behavior of the user 1. It performs the acquisition of input signals, said input signals comprising at least one input sound signal. It comprises acquisition means, including sound acquisition means 21, movement acquisition means 22, breathing acquisition means 23, touch keys 24 and breath acquisition means 25. These means consist respectively of a microphone, an accelerometer, an electrocardiograph, a keyboard and a spirometer.

In other embodiments, these acquisition means comprise a plurality of microphones, a joystick, a steering wheel, a camera, a stereo-vision device, a mat, a vibration sensor, a pressure sensor, an electroencephalograph, a propeller, an induction tape or a telephone.

The multi-sensory perception system 4 makes it possible for the user 1 to feel the difference between the sound he has produced and the reference sound 6 that he wishes to reproduce, in order to help him reproduce said reference sound 6. For this purpose, it receives the output signals supplied and transmits them to the user 1. It comprises sound perception means 41 and 42, visual perception means 43, tactile perception means 44 and vibrational perception means 45. These perception means consist respectively of earphones, a screen, a force-feedback steering wheel and a muscle stimulation electrode.

In other embodiments, these perception means comprise a data display, a plurality of loudspeakers, a sound headset, a braille reading device, a robot or a winder.

The processing system 3 comprises processing means 31 of the input signals acquired so as to provide output signals. The processes are operated so that these output signals comprise at least one piece of information for comparing an input sound signal acquired with the signal of the reference sound 6.

These processing means 31 may consist for example of a computer, a PDA, a DVD, a telephone.

The output signals calculated by the processing means 31 may in particular be acoustic indices, such as vocal cord tension, vocal register, loudness, prosody or suprasegmental features, segmental features, vocal tone, supraglottic or glottic coordination, turbulent air movement, stochastic disturbances of vocal fold vibration, unsolicited vibrations of the ventricular folds or aryepiglottic folds, uncontrolled transitions or non-modal vibrations.

The processing system 3 also includes means 32 for recording and storing the acquired input signals and the output signals. These means 32 make it possible to establish a progress indicator for the voice reproduction of the reference sound 6 by the user 1. This indicator can be used, for example, to provide the user with progress information as a function of time in the form of graphs, or to control the level of perception by automatic and dynamic adjustment of at least one parameter of a signal among said output signal, said acquired input signal and said signal of the reference sound 6, depending on the state of performance of the user 1.
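A minimal sketch of one possible progress indicator is given below, assuming that the stored data have been reduced to one mean gap value per training session. The reduction to a single value per session, the least-squares slope and the name `progress_slope` are assumptions for illustration; the description does not specify how the indicator is computed.

```python
def progress_slope(session_gaps):
    """Least-squares slope of the mean gap against the session index.

    session_gaps is a hypothetical list holding, for each stored session
    in chronological order, the mean gap between the sound produced and
    the reference sound. A negative slope means the gap is shrinking,
    i.e. the user is progressing.
    """
    n = len(session_gaps)
    if n < 2:
        raise ValueError("at least two sessions are needed")
    mean_x = (n - 1) / 2.0
    mean_y = sum(session_gaps) / n
    numerator = sum((i - mean_x) * (g - mean_y)
                    for i, g in enumerate(session_gaps))
    denominator = sum((i - mean_x) ** 2 for i in range(n))
    return numerator / denominator
```

Such a value could be plotted over time for the user or supplied to the automatic adjustment of the level of perception.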

The control system 5 controls the multi-sensory perception level of at least one output signal comprising at least one piece of comparison information between an acquired input sound signal and the signal of said reference sound 6. It comprises means for adjusting at least one parameter of a signal among said output signal, said acquired input signal and said reference sound signal 6, depending on the state of performance of the user 1.

According to a first embodiment, these means are means for manually adjusting the level of perception, made up of elements that can be manipulated by the user 1, who can thus adjust the level of multi-sensory perception as a function of his state of performance.

According to a second embodiment, these means are automatic and dynamic adjustment means, consisting of calculation elements able to determine the state of performance of the user and to deduce the corresponding level of perception. They can do this by integrating the gap information between the sound produced and the sound to be reproduced over a wide time interval, which enables the user's performance state to be determined more precisely.
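The integration of the gap over a wide time interval can be sketched, for illustration, as a sliding-window mean of the absolute gap. The class name `PerformanceEstimator`, the window length and the use of an arithmetic mean are assumptions; other integration schemes would fit the description equally well.

```python
from collections import deque


class PerformanceEstimator:
    """Running estimate of the user's performance state.

    Keeps the last `window` gap measurements (hypothetical unit: cents)
    and returns the mean of their absolute values.
    """

    def __init__(self, window=50):
        if window < 1:
            raise ValueError("window must be at least 1")
        # A bounded deque silently drops the oldest value when full.
        self._gaps = deque(maxlen=window)

    def update(self, gap):
        """Record a new signed gap and return the current mean absolute gap."""
        self._gaps.append(abs(gap))
        return sum(self._gaps) / len(self._gaps)
```

A longer window gives a steadier estimate of the performance state at the cost of a slower reaction to the user's progress.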

This control system 5 thus makes it possible to adjust the level of multi-sensory perception as a function of the state of performance of the user. For example, for a novice user, the difference between the sound produced and the sound to be reproduced will be very large, and the control system 5 will then provide a low dynamic range for the level of perception so that the output signals carrying the difference information between the two sounds are not perceived as too disturbing. Conversely, in the case of an expert user, the difference between the sound produced and the sound to be reproduced will be very small, and the control system 5 will then provide a high dynamic range so that the user can reach the sound to be reproduced more precisely. The manual control means may consist for example of a keyboard, a mouse, a mixer, a steering wheel or a joystick. The automatic and dynamic control means may consist for example of a processor.
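The behavior described for the novice and the expert can be sketched as a gain that decreases as the mean gap grows. The inverse-proportional law, the pivot value and the bounds below are assumptions chosen for illustration, as is the name `perception_gain`.

```python
def perception_gain(mean_gap_cents, pivot_cents=50.0,
                    min_gain=0.25, max_gain=4.0):
    """Gain applied to the difference cue before it is presented.

    Large mean gap (novice)  -> gain below 1: the cue is attenuated.
    Small mean gap (expert)  -> gain above 1: the cue is amplified.
    The gain equals 1 when the mean gap equals pivot_cents.
    """
    if mean_gap_cents <= 0.0:
        return max_gain
    gain = pivot_cents / mean_gap_cents
    # Clamp so that the cue never vanishes nor becomes overwhelming.
    return max(min_gain, min(max_gain, gain))
```

In the manual embodiment, the same gain could simply be set by the user instead of being derived from the mean gap.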

The transmission of signals between the acquisition system 2, the processing system 3, the perception system 4 and the control system 5 is provided by wire. According to other embodiments, this transmission is wireless or performed via a local or external network, for example of the Internet type.

The reference sound signal 6 is placed on a data storage medium in order to supply it to the processing system 3. This medium may be, for example, a standard CD, a MIDI format file, or any other type of medium on which the signal can be recorded.

Referring still to Figure 1, the voice reproduction method operates in a closed loop. According to this method, the acquisition (I) of input signals is first performed by the acquisition means 21, 22, 23, 24 and 25 of the acquisition system 2. The input signals comprise at least one sound signal corresponding to the sound produced by the user 1.

The acquired input signals are then processed (II) to provide (III) output signals comprising at least one comparison information between an acquired input sound signal and the signal of said reference sound 6.

The processing performed can consist of calculations or effects. These calculations are, for example and without limitation, the calculation of the fundamental frequency (pitch, note), the volume, the intensity, the rhythm, the dynamics (attack, sustain, release), the timbre, the nasality, the vibrato, the breathiness (veiled effect), the articulation, averaging, the history or progress indication of the user, discrimination of sounds, classification of sounds, measurement of signal similarities, and pose and motion analysis in images. These effects are, for example, pitch change, tempo change, music/speech separation, reverb, snapping to the correct note and octave shift.
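Two of the effects listed above, snapping to a note and octave shift, can be sketched on a frequency value as follows. The sketch assumes twelve-tone equal temperament with A4 = 440 Hz and operates on a frequency rather than on an audio signal; the function names are hypothetical.

```python
import math

A4_HZ = 440.0   # assumed tuning reference
A4_MIDI = 69    # MIDI note number of A4


def snap_to_note(frequency_hz):
    """Return the frequency of the nearest equal-tempered note."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be strictly positive")
    # Distance from A4 in semitones, rounded to the nearest whole note.
    midi = round(A4_MIDI + 12.0 * math.log2(frequency_hz / A4_HZ))
    return A4_HZ * 2.0 ** ((midi - A4_MIDI) / 12.0)


def shift_octave(frequency_hz, octaves):
    """Shift a frequency by a whole number of octaves (negative = down)."""
    return frequency_hz * 2.0 ** octaves
```

An octave shift of this kind could, for example, bring a reference sound into the register of the user's voice before the comparison is made.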

At least one of the output signals is then perceived (IV) by the user 1 in a multi-sensory manner so as to enable said user 1 to reach the reference sound 6.

The signal of the reference sound 6 is supplied (V) to the processing system 3 before the acquisition of the input signals, so that the processing (II) takes into account both the input signals and the signal of the reference sound 6.

This voice reproduction method also comprises steps (VII), (VIII) and (IX) for controlling the multi-sensory perception level of at least one output signal comprising at least one piece of comparison information between an acquired input sound signal and the signal of the reference sound 6. This control is achieved by adjusting at least one parameter of a signal among the output signal (VII), the acquired input signal (VIII) and the signal of the reference sound 6 (IX).

Since the signal perceived by the user 1 is an output signal comprising a comparison information between an acquired input sound signal and the signal of the reference sound 6, it is possible to adjust the output signal, the acquired input signal or the reference sound signal, or a combination of all three, so as to modify the dynamics of the difference between the sound produced and the sound to be reproduced.
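As one hedged illustration of modifying the dynamics of the difference, the pitch of the input sound presented to the user can be moved toward or away from the reference by scaling the gap between them. The scaling in cents and the name `render_with_dynamics` are assumptions; the description leaves the adjusted parameter open.

```python
import math


def render_with_dynamics(input_hz, reference_hz, gain):
    """Frequency at which the user's sound is presented back to him.

    The signed gap between input and reference is scaled by `gain`:
    gain = 1 leaves the input unchanged, gain < 1 compresses the gap,
    gain > 1 exaggerates it, gain = 0 collapses it onto the reference.
    """
    if input_hz <= 0 or reference_hz <= 0:
        raise ValueError("frequencies must be strictly positive")
    gap_cents = 1200.0 * math.log2(input_hz / reference_hz)
    return reference_hz * 2.0 ** (gain * gap_cents / 1200.0)
```

The same scaling could equally be applied to the reference signal or to the output signal, since only the perceived difference matters.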

This control is carried out manually by the user 1 according to the performance state that he determines himself and the level of perception that he wants, or automatically and dynamically depending on the state of the user's performance determined during the processing step (II).

In this preferred embodiment, the multi-sensory perception (IV) of output signals provided is performed using a combination of auditory, visual and tactile perception modes. According to other embodiments, it may be provided to use only two modes of perception among the three above.

Of the output signals provided and perceived, two are perceived audibly. These output signals are related to the signal of the reference sound 6 and an acquired input sound signal, and thus comprise comparison information between the reference sound 6 and the sound emitted by the user 1.

These signals are further constituted so as to provide the user 1 with two different output signals. This makes it possible to provide, by comparison of the two signals, perceptible information related to the difference between the sound produced and the sound to be reproduced.

In the embodiment chosen, these two output signals come from two spatially separated sound sources. The two sources are arranged to provide each ear of the user 1 a different signal among the two signals. This can be achieved for example by using two earphones, each earphone being arranged against an ear and emitting a different signal.

According to another embodiment, these two output signals come from the same sound source. The source then emits a single signal containing the two different signals. In this case, the binaural hearing capability of the user's two ears is used to separate the two signals.

In order to provide the user with the comparison information between the sound produced and the sound to be reproduced, the output sound signals supplied to each ear of the user 1 are respectively the reference signal 6 and an acquired input sound signal. In this case, the user can directly perceive the difference between the sound produced and the sound to be reproduced.

In another embodiment, the output sound signals provided to each ear of the user 1 are respectively the reference signal 6 and the difference between an acquired input sound signal and said reference signal 6.
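A minimal sketch of this embodiment is given below, assuming that the difference is taken sample by sample on two already synchronized sequences and that the reference goes to the left ear. Both assumptions, and the name `difference_channels`, are illustrative; the description does not specify the ear or how the difference is formed.

```python
def difference_channels(input_samples, reference_samples):
    """Build the two ear signals: (reference, input minus reference).

    Both arguments are sequences of samples at the same sampling rate.
    The longer sequence is truncated to the length of the shorter one.
    """
    n = min(len(input_samples), len(reference_samples))
    left = list(reference_samples[:n])
    right = [input_samples[i] - reference_samples[i] for i in range(n)]
    return left, right
```

When the user reproduces the reference exactly, the second channel falls silent, which gives a directly audible success cue.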

In another embodiment, one of the two output signals comprises an indicator relating to at least one characteristic of the input signals and the reference sound.

In order to perceive the sign of the difference between the sound produced and the sound to be reproduced, the auditory perception step comprises a sub-step of assigning the two signals according to the sign of the difference between the acquired input sound signal and the signal of the reference sound 6. Thus, in the case where the input sound is greater than the reference sound, the acquired input sound will be emitted into the left ear and the reference sound into the right ear, this assignment being inverted when the input sound is lower than the reference sound 6.
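The assignment sub-step can be sketched as follows, taking the fundamental frequency as the compared quantity. That choice, the handling of the case where both are equal and the name `assign_ears` are assumptions made for this sketch.

```python
def assign_ears(input_hz, reference_hz, input_signal, reference_signal):
    """Route the two signals to the ears according to the sign of the gap.

    Input higher than reference: input to the left ear, reference to the
    right ear. Otherwise the assignment is inverted. The equal case is not
    covered by the description and is treated here like the lower case.
    """
    if input_hz > reference_hz:
        return {"left": input_signal, "right": reference_signal}
    return {"left": reference_signal, "right": input_signal}
```

The user thus learns the direction of the correction from the side on which he hears his own voice.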

In another embodiment, the perception of the sign of the difference is effected by adjusting the amplitude of the two signals according to the sign of this difference.

Among the output signals provided and perceived during the multi-sensory perception step (IV), at least one is perceived visually and another in a tactile manner. These output signals are also related to the signal of the reference sound 6 and to an acquired input sound signal.

Throughout this voice reproduction method, a delay is introduced into the acquired input signals so as to synchronize them with the output signals provided. This delay thus makes it possible to match the input and output signals exactly at the level of the multi-sensory perception step (IV).
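As a minimal sketch of such a delay, assuming the acquired input signal is handled as a list of samples and the delay is expressed as a whole number of samples, the signal can be shifted by padding it with silence. The function name `delay_signal` and the choice of zero-valued padding are assumptions; the description does not specify how the delay is obtained or how its value is chosen.

```python
from typing import List


def delay_signal(samples: List[float], delay_samples: int) -> List[float]:
    """Delay a signal by delay_samples, keeping its original length.

    The start is padded with silence (0.0) and the tail that no longer
    fits is dropped (assumption: fixed-length processing blocks).
    """
    if delay_samples < 0:
        raise ValueError("delay must be non-negative")
    padded = [0.0] * delay_samples + list(samples)
    return padded[: len(samples)]


delayed = delay_signal([1.0, 2.0, 3.0, 4.0], 2)
```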

Figures 2 to 4 show diagrams of a voice reproduction device according to one embodiment of the invention. With reference to FIG. 2, the device comprises in this embodiment a microphone 50, a central unit 51, a display screen 52, a pair of earphones 53 and a steering wheel 54.

The microphone 50 acquires the sound emitted by the user 1 and converts this emitted sound into an input signal. The microphone is connected to the central unit 51, which processes the input signals to obtain output signals and records the data (input and output signals) locally.

The central unit 51 is connected to the steering wheel 54. This steering wheel allows the user to control manually the level of perception of the difference between the sound produced (the sound acquired by the microphone 50) and the sound to be reproduced (the reference sound). The user 1 turns the steering wheel 54 in one direction or the other so as to decrease or increase the dynamics of said level of multi-sensory perception. He can thus adjust his perception level himself according to his state of performance. In another embodiment, the steering wheel 54 is replaced by a control keyboard adapted to perform the same adjustment operations.
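One possible reading of this manual control is sketched below, under stated assumptions: the adjusted parameter is taken to be a gain applied to the difference between the acquired signal and the reference, and each turn of the wheel changes that gain by a fixed step within bounds. The class `PerceptionControl`, its fields and its default values are hypothetical and are not taken from the description.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PerceptionControl:
    """Hypothetical model of the manual perception-level control."""

    gain: float = 1.0      # current emphasis of the difference
    step: float = 0.25     # gain change per wheel click (assumption)
    min_gain: float = 0.0
    max_gain: float = 4.0

    def turn(self, clicks: int) -> float:
        """Positive clicks increase the perceived difference, negative decrease it."""
        wanted = self.gain + clicks * self.step
        self.gain = min(self.max_gain, max(self.min_gain, wanted))
        return self.gain

    def emphasize(self, acquired: List[float], reference: List[float]) -> List[float]:
        """Return the reference plus the difference scaled by the current gain."""
        return [r + self.gain * (a - r) for a, r in zip(acquired, reference)]


control = PerceptionControl()
control.turn(4)  # gain goes from 1.0 to 2.0
exaggerated = control.emphasize([1.0], [0.5])
```

With a gain above 1 the difference is exaggerated, which may help a beginner hear it; with a gain of 0 the output reduces to the reference alone.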

The central unit 51 is also connected to the multi-sensory perception means, which include the pair of earphones 53 and the display screen 52.

With reference to FIG. 3, different output signals are transmitted to the earphones 53' and 53" of the earphone pair 53. For example, in the case where the amplitude of the input signal is greater than that of the signal of the reference sound, the signal transmitted to the earphone 53' is the input signal picked up by the microphone and the signal transmitted to the earphone 53" is the signal of the reference sound. In the case where the amplitude of the input signal is lower than that of the signal of the reference sound, the signal transmitted to the earphone 53' is the signal of the reference sound and the signal transmitted to the earphone 53" is the input signal picked up by the microphone. It is thus possible for the user 1 to perceive the sign of the difference between the two signals.

With reference to FIG. 4, a three-dimensional virtual face of correction is displayed on the display screen 52. This face indicates the movements of the face of the user 1 necessary for the reproduction of the reference sound. For each display on the screen 52, the three-dimensional virtual face of correction indicates the type of correction to be made (labial, articulatory, etc.) and a curve indicates the difference between the sound produced and the sound to be reproduced, possibly with an indication of progress over time. Among the possible displays on the screen 52, the display 52' relates to a labial correction, the display 52" to an articulatory correction and the display 52''' to a vibratory correction by the breath.

For singing, the position of the body and breathing are indeed essential. The position of the body can be acquired by a camera, or two cameras in the case of stereoscopy, associated with image processing. Software specifically dedicated to facial analysis also provides information on the user's lips (in particular their opening, stretching and protrusion), on the opening of the jaw and on the height of the head relative to the shoulders. For respiration and breath, induction bands may be used around the chest and abdomen, together with a propeller sensor placed in front of the mouth.

In another embodiment, the device also comprises tactile perception means in addition to the pair of headphones 53 and the display screen 52. These tactile perception means may for example consist of a muscle stimulation electrode.

The previously described embodiments of the present invention are given by way of examples and are in no way limiting. It is understood that the skilled person is able to realize different variants of the invention without departing from the scope of the patent.

In particular, the device and the method can be applied in multi-user applications. With reference to FIG. 5, which shows an example embodiment with two users (1, 1'), each user notably has means of acquisition and perception. The processing means are shared via a connection to a local or external network such as the Internet. According to another embodiment, the processing means may be specific to each user. The reference sounds (6, 6') to be reproduced may be identical or different.

Claims

1 - Method for the vocal reproduction of a reference sound (6) by at least one user (1), said method comprising a step of acquisition (I) of input signals, a step of processing (II) said acquired input signals in order to provide (III) output signals comprising at least one comparison information between an acquired input sound signal and the signal of said reference sound (6), and a multi-sensory perception step (IV) by the user (1) of at least one of said output signals to enable said user (1) to reach said reference sound (6), characterized in that the level of multi-sensory perception of at least one output signal comprising at least one comparison information between an acquired input sound signal and the signal of said reference sound (6) is controlled by adjusting at least one parameter of a signal from among said output signal, said acquired input signal and said signal of the reference sound (6), as a function of the state of performance of the user (1).

2 - Voice reproduction method according to claim 1, wherein the adjustment of at least one parameter of a signal from among said output signal, said acquired input signal and said signal of the reference sound (6) is performed automatically and dynamically.

3 - Voice reproduction method according to either of claims 1 or 2, wherein the adjustment of at least one parameter of a signal from among said output signal, said acquired input signal and said signal of the reference sound (6) is performed by the user (1).

4 - Voice reproduction method according to any one of the preceding claims, wherein, during the multi-sensory perception step (IV), at least a portion of said output signals provided is perceived audibly, said output signals provided and perceived audibly being constituted so as to provide the user (1) with two different output signals related to the signal of said reference sound (6) and an acquired input sound signal.

5 - Voice reproduction method according to claim 4, wherein said output signals provided and perceived audibly come from the same sound source.

6 - Voice reproduction method according to claim 4, wherein said output signals provided and perceived audibly come from two spatially separated sound sources arranged to provide each ear of the user (1) with one of the two different output signals related to the signal of said reference sound (6) and an acquired input sound signal.

7 - Voice reproduction method according to any one of claims 4 to 6, wherein the output sound signals provided to each ear of the user (1) and perceived audibly consist of a combination of signals from among at least one acquired input sound signal, the signal of the reference sound (6) and an indicator of the difference between at least one acquired input sound signal and the signal of said reference sound (6), said indicator relating to at least one characteristic of said signals.

8 - Voice reproduction method according to claim 7, wherein the output sound signals provided to each ear of the user (1) and perceived audibly respectively comprise, at least in part, the signal of the reference sound (6) and an acquired input sound signal.

9 - Voice reproduction method according to claim 8, wherein the output sound signals provided to each ear of the user (1) and perceived audibly are respectively the reference signal (6) and an acquired input sound signal.

10 - Voice reproduction method according to claim 8, wherein the output sound signals provided to each ear of the user (1) and perceived audibly are respectively the reference signal (6) and the difference between an acquired input sound signal and said reference signal (6).

11 - Voice reproduction method according to any one of claims 8 to 10, wherein the assignment of the output sound signals provided to each ear of the user (1) and perceived audibly is a function of the sign of the difference between an acquired input sound signal and said reference signal (6).

12 - Voice reproduction method according to any one of claims 8 to 10, wherein the amplitude of the output sound signals provided to each ear of the user (1) and perceived audibly is a function of the sign of the difference between an acquired input sound signal and said reference signal (6).

13 - Voice reproduction method according to any one of the preceding claims, wherein, during the multi-sensory perception step (IV), at least a portion of said output signals provided is perceived visually, said output signals provided and perceived visually being related to at least one acquired input sound signal and the signal of said reference sound (6).

14 - Voice reproduction method according to claim 13, wherein the output signals provided and perceived visually consist of a combination of signals from among at least one acquired input sound signal, the signal of the reference sound (6) and an indicator of the difference between at least one acquired input sound signal and the signal of said reference sound (6), said indicator relating to at least one characteristic of said signals.

15 - Voice reproduction method according to either of claims 13 or 14, wherein the output signals provided and perceived visually are perceived through the display of a three-dimensional virtual face of correction indicating the movements of the face of the user (1) necessary for the reproduction of the reference sound (6).

16 - Voice reproduction method according to any one of the preceding claims, wherein, during the multi-sensory perception step (IV), at least a portion of said output signals provided is perceived in a tactile manner, said output signals provided and perceived in a tactile manner being related to at least one acquired input sound signal and the signal of said reference sound (6).

17 - Voice reproduction method according to any one of the preceding claims, operating in a closed loop.

18 - Voice reproduction method according to claim 1, wherein a delay is introduced into the acquired input signals so as to synchronize said acquired input signals with the output signals provided.

19 - Device for the vocal reproduction of a reference sound (6) by at least one user (1), comprising an acquisition system (2) of input signals, said input signals comprising at least one input sound signal, a processing system (3) of said acquired input signals adapted to provide output signals comprising at least one comparison information between an acquired input sound signal and the signal of said reference sound (6), and a multi-sensory perception system (4) of said output signals provided, arranged to allow the user (1) to reach said reference sound (6), characterized in that it comprises a control system (5) for controlling the level of multi-sensory perception of at least one output signal comprising at least one comparison information between an acquired input sound signal and the signal of said reference sound (6), said control system having means for adjusting at least one parameter of a signal from among said output signal, said acquired input signal and said signal of the reference sound (6), as a function of the state of performance of the user (1).

20 - Voice reproduction device according to claim 19, comprising means for recording and storing (32) the acquired input signals and the output signals provided, for establishing a progress indicator of the vocal reproduction of the reference sound (6) by the user (1).

21 - Voice reproduction device according to either of claims 19 or 20, wherein the acquisition system (2) of the input signals comprises at least one perception means from among auditory (41, 42), visual (43) and tactile (44, 45) perception means, said perception means being arranged to provide the user (1) with at least one output signal related to at least one acquired input sound signal and the signal of said reference sound (6).