EP2401856A1

EP2401856A1 - Method for equalizing audio signals in fixed and mobile telephony

Info

Publication number: EP2401856A1
Application number: EP10711756A
Authority: EP
Inventors: Ugo Nevi
Original assignee: Individual
Current assignee: Individual
Priority date: 2009-02-26
Filing date: 2010-02-24
Publication date: 2012-01-04
Also published as: ITRM20090089A1; IT1396715B1; WO2010097829A1

Abstract

In relation to the spectrum of the original signal transmitted by the calling telephone, the voice signal on a telephone channel, received by the receiving telephone, is subjected to a processing procedure which evaluates the composition of the frequency spectrum of the received signal and subjects it to an equalization process comprising the determination of the selective attenuation of some frequencies and the selective amplification of others, on the basis of the deformations/distortions of the signal by the transmission channel. The embodiment allows adjusting and filtering the dynamically amplified frequencies in real time, through a dedicated menu on the cell phone display. A simultaneous conversion of the speech into text is provided for greater comprehensibility.

Description

METHOD FOR EQUALIZING AUDIO SIGNALS IN FIXED AND MOBILE TELEPHONY

Field of the invention

The present invention refers to equalizing techniques usable in telephony. More particularly, by means of the present technique, the reproduction of the audio frequencies is such that the person using the telephone does not hear the interferences and the disturbance effects usually introduced by conventional microphones/loudspeakers on the user terminal employed in mobile telephony. The invention thus concerns microelectronic devices and circuits which can be used as equalizers as well as the telephone network devices and the related management software set to optimize the transmission of the audio signal.

State of the art

KEY

GSM: Global System for Mobile Communications

HLR: Home Location Register is the data bank used for the permanent preservation of the subscriber data with the relative profiles of the assigned services.

MSC: Mobile Switching Center has the function of switching the data and voice traffic towards the GSM network.

VLR: Visitor Location Register is a data bank which contains the user information that is temporarily assigned to the operator due to the roaming functions.

VoIP: Voice over IP, transmission of the voice signal on the IP/data network.

During a telephone conversation, one of the most important problems deriving from the use of standard reproduction systems, such as those provided by default in cell phones, is the interference produced by the signals and noise coming from different sound sources. This phenomenon is known as audio intermodulation: the microphone and the preamplifier receive all the sounds mixed together, and these flow into the loudspeaker, simultaneously exciting its coil. In particular, the impossibility of filtering, identifying and listening to a single voice from among a group of sound sources, the sound interference effect due to the arrangement of the microphone and the loudspeaker inside the same cell phone, and the different types and groups of sounds which come, for example, from the outside or from nearby audio apparatuses, often leave the hypacusic user with ineffective, unclear hearing.

Also known in the state of the art is the frequent and annoying phenomenon of autophony, which derives from the altered hearing of one's own voice. This disturbance is attributable to the use of the earphone, which even if equalized still alters the ear's perception characteristics. Such problems generally cause difficulties in verbal comprehension, and reduce the comfort in using the devices (the so-called acoustic apparatuses) which correct and/or aid the hearing of a user with limited auditory capacities. Due to the aforesaid limited perception, such a user may have a closed attitude with respect to the outside world.
In addition, it should be underlined that the ear, situated close to the loudspeakers kept in close contact with each other, is exposed to a signal which exceeds 140 dB: indeed, by raising the volume, one actually operates in a "painful" band for the listener. With this type of volume increase of the sound signal, whatever remained healthy of the user's auditory capacity is damaged and thus further degraded. In practice, a positive feedback process is established: the worse the user feels, the more he amplifies the sound level of the input signal, the more he damages the listening ear, and hence the user feels even worse.

As is known in the state of the art, equalizers are electronic devices set to amplify in a selective manner. Some structural parts of the equalizer 37 are similar to the amplifiers used in audio transceiving systems, even if, more particularly in the case of voice signal amplification, the amplification is directed to selectively amplify only some frequencies. Equalizers usually comprise an audio frequency amplifier with several filters having different slopes and frequency responses, so that an adjustable and selectable frequency response is attained. They are usually used for compensating auditory sensitivity by using electronic means that are preconfigured so as to adjust the optimal hearing level that must be attained.

In more general terms, three equalizer types are known for cell phones: i- an actual equalizer for a cell phone (in the sense that it is contained in the same casing as the cell phone), ii- an equalizer integrated in the earphone that is normally used near the ear, iii- an external equalizer, separate from both the cell phone and the earphone, configured as a little box containing the equalizer to be used as an accessory interposed between the earphone and the cell phone.

The person with hypacusia problems avoids operating with two simultaneous amplification systems, in order to prevent generating interferences between the sound reproduction of the cell phone and the prosthesis; such a person would prefer to listen from only one cell phone equipped with a suitable equalizer.

On the other hand, the digital technologies currently applied to conventional prostheses have given rise to digitally programmable apparatuses with conventional sound processing circuits (microphone, amplifier, receiver) adjustable by means of an external computer. The audiometric technician is capable of adapting the sound processing to the needs of the user via computer, by means of specific controls. Indeed, digitally programmable techniques are currently known that are easily adaptable to the hearing needs of the user. In case of variation of the hypacusia level, they can be reprogrammed within certain limits for new hearing needs, a range of diversified applications being available: slight, medium, serious and acute.

Moreover, the digital world has introduced a new philosophy for addressing the needs of the user with limiting hearing capacities. Before, the user was obliged to adapt himself to the acoustic aid; now, the prosthesis can be adapted to the needs of the user. With this aid type, it is possible to modify fifteen, twenty, even thirty different parameters which define the mode and type of sound processing to be sent to the user's ear.

Summary of the invention

The object of the present invention is to make an equalizer device tunable by the user, possibly directly on the display of the cell phone, with which the user himself (in particular the user with reduced auditory capacities) can listen to audio frequency sounds without the disturbances deriving from interferences or undesired noise.

Another object of the present invention is to provide a technique for equalizing the audio signal in fixed and mobile telephony which allows optimizing the listening for users with reduced auditory capacities, through an interactive procedure for adjusting the selective filters of the audio frequencies used and comprised in the sound signal related to the speech (which the two phones taking part in the conversation exchange).

Another object of the invention is to provide a solution which has a cell phone that uses differential microphones having optimized frequency responses and optimized sensitivity characteristics with respect to common microphones.

Finally, another object of the present invention is to provide a technique for equalizing the audio signal in fixed and mobile telephony which employs cell phone hardware components and techniques which are standard in the field of telephony, in cell phone network communications and in fixed telephone network communications, in order to make the system itself low-cost, highly reliable and easy to use.

On the basis of these objectives, a programmable and interactive equalizer is provided comprising a differential microphone, dynamically programmable filtering means, an adjustable power amplifier and a sound transducer, all power supplied by the cell phone battery. The capacitive filtering means are provided with a programmable memory and with the relative interface constituted in the form of an integrated circuit which comprises at least one oscillator associated therewith. The keyboard and the display of the cell phone constitute the I/O means which allow the user to insert and display the adjustment parameters.

The power amplifier instead corresponds with the terminal section which drives the loudspeaker power supplied by the same cell phone battery 33, and it too is currently produced as an integrated circuit. Also provided are a series of solutions on the telephone network architecture, in order to provide an aid to the hypacusic user by either arranging an equalization in real time (which, based on the environmental noise conditions, prearranges an equalization on particular frequencies), or by providing for a visual aid in text form which translates the voice signal into a sequence of words in text format that can be interactively viewed on the display 31 of the cell phone, or a PC if one communicates with a VoIP technique on TCP/IP.

Brief description of the drawings

The invention will be better understood from the reading of the following description, provided merely as an example and made with reference to the drawings, wherein:

FIGURE 1 is a block diagram which shows the network connections between the two telephones which intervene in the conversation for the equalization according to the present invention,

FIGURE 2 is a block diagram of the sections composing a cell phone according to the present finding.

Description of preferred embodiments

Through the selection menu present on the display 31 of the cell phone, the present finding permits selecting the frequencies appropriate for the auditory capacities of the cell phone user. Indeed, as is known, it is never the increase in volume (amplification) that allows hearing-impaired people to hear, but rather only the improved reception of particular "tones" (frequencies) which if suitably amplified positively affect the sections of a non-optimized frequency response curve.

The principle of the present invention is based on the attainment of a technique which allows the real time adjustment and filtering of the frequencies, dynamically amplified, through a dedicated menu. This menu is shown on the display 31 of the cell phone, in graphical or text mode, so that the user can carry out a series of procedures for equalizing the incoming sound signal. More specifically, every component of the frequency spectrum of the sound signal is selected in a suitable manner, and the user can operate on the known parameters by following a typically audiometric protocol, which can be executed either at the time of activation of the cell phone (preset) or in real time, each time a communication is established. In the latter case, the clear advantage is to be able to adapt the equalizer to the specific environmental conditions at that time, which can be fairly noisy and more disturbed over particular frequencies than over others.

In other words, on one hand, an equalization operation of the single cell phone can be carried out, which corresponds to a procedure of adjustment between radio base station 2 and cell phone A, B. Or the user can go to a supplier store (a specialized center managed by the network operator, with audiometric center functions) so that a configuration/initialization of the cell phone is carried out. In both cases, a procedure for testing the auditory capacities of the user is carried out, operating through an interactive menu according to the dichotomous or binary search principle. According to this rule, as when one searches for a word in a dictionary, the "good" half is excluded each time, until the number of pages is reduced to one: the word is either there or it does not exist. In such a manner, the transposition of that known as interpolated search in mathematics is employed for this adjustment type.
In fact, in the present finding, the adjustment of the audio signal spectrum has a typically adaptive feature, since the detection of the noise and disturbances is actuated via the sampling over a reduced spectrum of the audio frequencies, a fine adjustment of dichotomous type being actuated as mentioned above. For example, a frequency of 1000 hertz is sent first, followed by a frequency of 500 hertz, and the user is asked, as is done at the ophthalmologist, whether the first or the second frequency was heard better. If the first (1000 Hz) is heard better, the interval between 500 and 750 hertz is considered next and the same question is asked; if however the second frequency (500 Hz) was heard better, the 1000 hertz and 750 hertz frequencies are sent and the same question is asked. This process repeats until the least-heard frequency is identified.
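Purely by way of illustration, the dichotomous adjustment described above can be sketched as a bisection over a frequency interval. The function names, the stopping resolution of 10 Hz and the simulated listener are assumptions introduced for this sketch and are not part of the described method; in practice the answer would come from the user through the interactive menu.

```python
from typing import Callable


def find_least_heard(low_hz: float, high_hz: float,
                     first_heard_better: Callable[[float, float], bool],
                     resolution_hz: float = 10.0) -> float:
    """Narrow [low_hz, high_hz] towards the frequency heard with least intensity."""
    while high_hz - low_hz > resolution_hz:
        mid_hz = (low_hz + high_hz) / 2.0
        # Send the upper tone first, then the lower one, and ask which was heard better.
        if first_heard_better(high_hz, low_hz):
            # Upper tone heard better: keep the half adjacent to the lower tone.
            high_hz = mid_hz
        else:
            # Lower tone heard better (or no difference): keep the upper half.
            low_hz = mid_hz
    return (low_hz + high_hz) / 2.0


def simulated_listener(deficit_hz: float) -> Callable[[float, float], bool]:
    """Hypothetical listener who hears a tone worse the closer it is to deficit_hz."""
    def first_heard_better(first_hz: float, second_hz: float) -> bool:
        return abs(first_hz - deficit_hz) > abs(second_hz - deficit_hz)
    return first_heard_better


# Starting from the 1000 Hz / 500 Hz pair of the example above.
estimate = find_least_heard(500.0, 1000.0, simulated_listener(600.0))
```

With a listener whose weakest frequency is 600 Hz, the first comparison retains the 500 to 750 Hz interval, as in the example given in the text.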

As stated above, such adjustment activity can be executed with a preset, just once at a specialized center or each time the user requires it according to his needs, interfacing with a radio base station 2 and activating an interactive menu from one's own cell phone.

The adjustment is based on voice machines which operate with automatic responders, as with the calls to the Call Center. Then, at the time of configuration of the cell phone with the equalizer 37, it is provided that a voice machine responds and requests the typing of various keys 1, 2 or 3 ... in order to have the execution of the selection functions specific for the frequency of the band-pass filter to be activated and the relative amplification. It will be possible to interactively see, on the cell phone screen 31, a display of the frequencies which are activated or deactivated, for example by means of bar diagram. Based on the results of such empirical operative mode for detecting the frequencies heard with the least intensity, the test of the dedicated telephone apparatus according to the present invention then provides for the verification of the insertion gain, which consists of the measurement of the amplification delivered by the equalizer to the eardrum. This is an actual vocal audiometry process with and without competition, since the tests are repeated with and without acoustic equalizer. The capacity to understand common words and phrases is evaluated - this test also provides the functional gain of the telephone.

The clear advantage is that there is no longer the need to keep the volume of the cellular apparatuses high, which also prevents others from hearing, against their will, what the two people say in their phone conversation.

The telephone device used, i.e. the actual cell phone, as shown in Fig. 2, essentially comprises a keyboard 32, a display 31, a control section 30, a radiofrequency section 35, a power supply 33 and an audio conversion section 34, according to the usual components in use. With regard to the present invention, it is important to consider the direct interfacing of the audio section with an equalizer section 37, and that the same audio section 34 supports a loudspeaker LS and two microphones M1 and M2, i.e. two acoustic detectors with high sensitivity and minimized transmitted noise. It in fact comprises two microphones, i.e. two transducers for converting the emitted sound signals into electrical output signals in the presence of acoustic noise; in addition, during the conversation, the transducer is maintained at a substantially constant distance from the source, so that the transducer is adapted to respond to a spatial derivative of the second order of the pressure field correlated with the incoming frequencies.

As schematically indicated in Fig. 2, two differential microphones M1 and M2 are used, separated from each other by a certain distance d. In practice, a microphone M1 is arranged as usual on the lower end of the phone where the user speaks, while the other microphone M2, of directional type, is mounted at the top on the opposite end; hence the two microphones, of course not being aimed in the same direction, are capable of the following: i- one, M1, can acquire the voice which speaks on the phone plus the surrounding noise, while ii- the upper microphone, M2, of directional type, only acquires the noise coming from the surrounding environment.

In the equalization procedure, it is provided that the signal to be evaluated, coming from the loudspeaker, is fed as input to the microphone M1, sampled and pre-amplified. It is then filtered by a band-pass filter having a pass band comprised between 375 and 4000 hertz.
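As a minimal sketch of the 375 to 4000 hertz band-pass step, the following example zeroes the spectral components of a sampled signal that fall outside the pass band. The use of a discrete Fourier transform with an ideal ("brick-wall") response, the 8000 Hz sampling rate and the function name are assumptions made for illustration; the text does not specify how the filter is realized.

```python
import cmath
import math


def band_pass(samples, sample_rate_hz, low_hz=375.0, high_hz=4000.0):
    """Ideal band-pass: keep only DFT bins whose frequency lies in [low_hz, high_hz]."""
    n = len(samples)
    # Forward DFT (direct O(n^2) form, adequate for a short illustrative block).
    spectrum = [sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(samples)) for k in range(n)]
    for k in range(n):
        # Bins above n/2 mirror the positive frequencies of a real signal.
        freq_hz = min(k, n - k) * sample_rate_hz / n
        if not (low_hz <= freq_hz <= high_hz):
            spectrum[k] = 0
    # Inverse DFT; the imaginary part is numerical residue for a real input.
    return [(sum(c * cmath.exp(2j * math.pi * k * t / n)
                 for k, c in enumerate(spectrum)) / n).real for t in range(n)]


SAMPLE_RATE_HZ = 8000  # assumed telephony sampling rate
N = 80                 # 10 ms block, giving 100 Hz per bin


def tone(freq_hz):
    return [math.sin(2 * math.pi * freq_hz * t / SAMPLE_RATE_HZ) for t in range(N)]


# A 100 Hz hum (outside the band) superimposed on a 1000 Hz tone (inside the band).
mixed = [a + b for a, b in zip(tone(100), tone(1000))]
filtered = band_pass(mixed, SAMPLE_RATE_HZ)
reference = tone(1000)
```

In this example the 100 Hz component is removed and the 1000 Hz component passes unchanged.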

Associated with this pair of differential microphones M1, M2 are differencing means which acquire an electrical input signal from each of the two microphones and produce, in response, an electrical difference signal which is proportional to the difference between the signals exiting from the corresponding microphones.
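The difference signal can be illustrated with a short sketch. It assumes, for simplicity, that the two microphone signals are already sampled, time-aligned and equally sensitive to the ambient noise; the function name and the proportionality factor `gain` are hypothetical.

```python
def difference_signal(m1_samples, m2_samples, gain=1.0):
    """Return gain * (M1 - M2), sample by sample."""
    if len(m1_samples) != len(m2_samples):
        raise ValueError("both microphone signals must have the same length")
    return [gain * (a - b) for a, b in zip(m1_samples, m2_samples)]


voice = [0, 3, -2, 5]   # what the user says (illustrative values)
noise = [1, -1, 2, 0]   # ambient noise reaching both microphones

m1 = [v + n for v, n in zip(voice, noise)]  # M1: voice plus surrounding noise
m2 = list(noise)                            # M2: surrounding noise only

cleaned = difference_signal(m1, m2)
```

Under these idealized assumptions the noise common to both microphones cancels and `cleaned` reproduces the voice samples.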

The adjustment and control principle used by the user interfacing with the telephone network also provides for a real time detection system which can operate in different modes according to which a- the called user is the person with lesser auditory capacities, b- the calling user is the person with lesser auditory capacities, and c- both the called party and the calling party are people with diminished auditory capacity.

In the three possible situations, an interactive management and control are provided for the audiometric adjustment, which is operated between the calling and called parties. In a preferred embodiment, the information related to the fact that the calling party A and/or the called party B are people with reduced auditory capacities is acquired by the HLR 10, thus detecting that in relation to the call, suitable information must be sent to the calling party A with regard to the type of conversation and the related modes for carrying out the same. The system then provides for the filtering of the incoming speech: a- based on the fact that the called party has a reduced auditory capacity, b- based on the fact that there is a disturbed communication and it is desired to improve its quality, and c- based on the criterion that it is better to suppress, at the source, the noise and the frequencies which must be sent and which would otherwise be amplified, distorted and further deteriorated.

This is based on the principle that if it is already known that several specific frequencies must not be heard, then it is worth filtering and/or amplifying at the central control, i.e. according to the characteristic parameters of the user's audiometry curve, which is stored in the HLR 10. So that each time it is detected that, from the calling terminal, these frequencies are present in input on the calling phone, filtering is provided thereof, preventing such frequencies from being amplified in amplitude and transferred to the called party as a perceptible signal.
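A minimal sketch of this network-side logic follows, assuming a hypothetical subscriber record that holds a hearing-impairment flag and a per-frequency gain table derived from the user's audiometry curve. The record layout, field names and gain values are illustrative only and do not correspond to any actual HLR data format.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class SubscriberProfile:
    """Hypothetical HLR record for one subscriber."""
    hearing_impaired: bool = False
    # Frequency in Hz -> linear gain; 0.0 means "suppress at the source".
    band_gain: Dict[int, float] = field(default_factory=dict)


def call_mode(caller: SubscriberProfile, called: SubscriberProfile) -> Optional[str]:
    """Return the operating mode a, b or c described above, or None."""
    if caller.hearing_impaired and called.hearing_impaired:
        return "c"
    if called.hearing_impaired:
        return "a"
    if caller.hearing_impaired:
        return "b"
    return None


def shape_for_listener(spectrum: Dict[int, float],
                       listener: SubscriberProfile) -> Dict[int, float]:
    """Scale each spectral component by the listener's stored gain (default 1.0)."""
    return {f: level * listener.band_gain.get(f, 1.0) for f, level in spectrum.items()}


party_a = SubscriberProfile()
party_b = SubscriberProfile(hearing_impaired=True, band_gain={500: 2.0, 3000: 0.0})

mode = call_mode(party_a, party_b)
shaped = shape_for_listener({500: 0.5, 1000: 1.0, 3000: 0.25}, party_b)
```

Here the 3000 Hz component, which the listener's stored curve marks as not to be heard, is removed before transmission, while the 500 Hz component is amplified.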

A further embodiment provides for the synchronization of the telephones possessed by both users with limited auditory capacity, for which an equalization on the two phones in real time is useful, in relation to the noise and to the specific environmental disturbances. In other words, the equalization carried out - as described above - just once, may not be the best if one is in a fairly noisy specific environment. In such a setting, an equalization procedure is provided for when the calling party A requests to communicate with a called party B and both users, at the mobile telephone network operator, are marked in the relative HLR 10 as persons with reduced auditory capacities. A dedicated synchronization would moreover be effective for two people with reduced auditory capacities who converse frequently via telephone.

In fact, an adaptive procedure for the voice frequencies, received and transmitted in real time and before the actual conversation, is proposed to both parties to the conversation, calling party A and called party B. The dichotomous equalization process, as defined above, is not established on the basis of sample frequencies pre-stored at each radio base station or at a cell phone supplier store (specialized center managed by the network operator, with audiometric center functions). Rather, it is established on the basis of real sample frequencies sent from the telephone A to the telephone B and vice versa: the signal reproduced by the loudspeaker of the first telephone A is directly detected by the pair of differential microphones M1, M2 of the same telephone A and transferred to the other telephone B. Telephone B evaluates it and compares each component of the frequency spectrum against the pre-stored standardized frequency component that is expected, since a deformed or distorted component, or one carrying a noise component, may arrive instead.

Hence, in reality, it is the loudspeaker of the cell phone A that emits a series of sample sounds, each of which is acquired as a test sound and sent to the other telephone B, which evaluates their quality in real time and considers their possible filtering and/or amplification in relation to: i- the acoustic sensitivity curve of the listener in possession of the receiving cell phone; ii- the pre-stored reference spectral conformation; iii- the predictive and adaptive function that is programmed for considering the possible improvement on the second cell phone deriving from the frequency cuts and the amplifications made on the first cell phone.

These same operations will be carried out on the cell phone B in an analogous manner. By reversing the receiving and transmitting, one reaches a satisfactory level of comprehension of the speech without superimposed disturbances. For example, in order to establish the handshake step, before the actual conversation, the first cell phone reproduces the frequencies 250, 280 ..., 300, ... 800 Hertz etc. according to a standard pre-established sequence, and at the same time the receiving cell phone is synchronized for receiving said signals in the pre-established sequence. If a particular frequency is affected by deformations or noises on the receiving telephone, a selective filtering can be activated thereon through the equalization system. Analogously, if a particular component loses power over the course of the voice signal transmission, a selective amplification level is applied thereto.
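By way of a non-limiting sketch, the evaluation carried out on the receiving telephone during the handshake can be expressed as a per-frequency decision. The dictionaries of expected, received and noise levels, the two thresholds and the function name are assumptions introduced for illustration only.

```python
def plan_corrections(expected, received, noise, noise_limit=0.2, loss_limit=0.8):
    """Decide, for each sample frequency, whether to filter, amplify or pass it.

    expected: Hz -> pre-stored reference level of the sample tone
    received: Hz -> level actually measured on the receiving telephone
    noise:    Hz -> level of deformation/noise measured around that tone
    """
    plan = {}
    for freq_hz, ref in expected.items():
        level = received.get(freq_hz, 0.0)
        if noise.get(freq_hz, 0.0) > noise_limit * ref:
            # Tone affected by deformations or noise: selective filtering.
            plan[freq_hz] = ("filter", 0.0)
        elif 0.0 < level < loss_limit * ref:
            # Tone lost power in transmission: selective amplification.
            plan[freq_hz] = ("amplify", ref / level)
        else:
            # Tone arrived as expected (a completely missing tone is left as is here).
            plan[freq_hz] = ("pass", 1.0)
    return plan


expected = {250: 1.0, 280: 1.0, 300: 1.0}   # pre-established sequence (excerpt)
received = {250: 1.0, 280: 0.5, 300: 1.0}
noise = {250: 0.0, 280: 0.0, 300: 0.5}

plan = plan_corrections(expected, received, noise)
```

In this example the 280 Hz tone, received at half its expected level, is assigned a gain of 2, while the noisy 300 Hz tone is marked for filtering.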

It should be considered that the detection of the sample signal occurs in real listening and reproduction conditions. This is because the signal emitted by the loudspeaker of the transmitting cell phone A is selectively acquired by the differential microphone of the same cell phone A, which arranges its transmission.

The described equalization process is typically adaptive, since an intervention on particular frequencies predetermined by the equalization algorithm is initially considered. If a first attempt at adjusting the frequency components does not succeed, i.e. if a given frequency is still transmitted with distortion or significant noise, then a fairly narrow neighbourhood around that frequency is examined in order to carry out a fine adjustment and to prevent important frequency components from being cut from the transmitted voice signal, which would substantially limit the information content of the voice signal. After a series of tests localized on a particular Δf, the components to be reduced/eliminated, i.e. those which could bring increased disturbance, are empirically selected. Meanwhile, the cleanest frequencies are enabled for transmission within said interval Δf.

A practical situation exemplifying the flexibility of the present finding could be that of two people with reduced auditory capacities who are speaking on the phone, on the street, close to a noisy compressor in continuous action. In such a context, the equalization system according to the present invention eliminates the frequencies closest to the noise component produced by the compressor, thus obtaining a filtering of the typical frequency band of such disturbance. If one supposes that the noise conditions are variable over time (for example, the compressor is tied to a pneumatic hammer which is only occasionally actuated, producing additional disturbance signals with respect to the starting one), then it is clear that the conditions for equalizing the frequencies at play have changed. The technique according to the present invention provides for a real time scanning in order to evaluate whether the equalization selection initially actuated is consistent with the type of noisy environment where the speaker is situated.

It is well known that the voice is digitized, sampled and transmitted on the cell phone with extremely reduced samplings. It would even be possible to sample one value per two thousand (one millisecond for every two seconds) and still maintain the intelligibility of the speech, losing only the voice timbre. Thus, it will suffice to insert a detection interval in such sampling which allows monitoring a particular Δf for each Δt of time (for example every second), actuating a verification with the receiving party (in accordance with the previously mentioned procedures) with regard to the quality of the voice signal that has arrived, and then executing a correction of the equalization.

Otherwise, more effectively, the time intervals in which there is no speech are used. The principle is the same: the relative presence of noise perceived by the microphone is detected on a cell phone, and is sent to the called telephone for the relative equalization (also in this case by actuating a filtering of the background noise). Hence, in this solution, the sampling is actuated at every word termination. For example, the vocabulary normally present in the cell phone for writing SMS messages is used. It is well known that in T9 mode a functionality is usually active on the current word which facilitates the writing of the SMS. If it is presumed that the function of recognition and conversion of the speech into text is prearranged, the following procedure can then be activated: when it is presumed that the word has terminated, the presence of background noise, defined as any input signal not attributable to the speech of the cell phone user, is detected.

On the other hand, for the purpose of the sampling of the noise conditions in relation to the voice signal, it is in any case necessary to actuate a precise evaluation of the duration of the voice signal itself, so as to operate a suitable distinction between the spoken and non-spoken segments. The final aim of such evaluation is the automatic detection and optimization of the output signal. A series of solutions are also known which, for the resolution of the "spoken/non-spoken detection" problem, are based on an estimate of the energy of the considered signal. The starting point is always that the energy (and thus the power) of a spoken segment is greater than that of a non-spoken segment, and the essential step is the calculation of the absolute value of the samples.

A high reliability of the sampling is thus attained, deriving from carrying out the sampled detection of the noise presence at the termination of every voice segment on the basis of two different synchronization instruments: a) the operative steps related to the automatic detection of the termination of the voice signal, based on the principle that the energy and thus the power of the spoken segment is greater than that of the non-spoken segment; b) the utilization of a vocabulary already active on the cell phone, which interworks with an upstream operative functionality executing the speech-to-text conversion.
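The energy-based distinction between spoken and non-spoken segments can be sketched as follows, using the mean absolute value of the samples in each frame as the energy estimate. The frame length, the reference noise energy, the safety margin and the function names are assumptions introduced for this sketch.

```python
def frame_energy(frame):
    """Energy estimate of one frame: mean absolute value of its samples."""
    return sum(abs(s) for s in frame) / len(frame)


def detect_speech(samples, frame_len, noise_energy, margin=2.0):
    """Flag each complete frame as spoken (True) or non-spoken (False)."""
    flags = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        # Spoken if the frame energy clearly exceeds the background noise energy.
        flags.append(frame_energy(frame) > margin * noise_energy)
    return flags


def word_end_frames(flags):
    """Indices of frames where speech has just terminated (noise can be sampled)."""
    return [i for i in range(1, len(flags)) if flags[i - 1] and not flags[i]]


# Quiet background, one short spoken burst, quiet background (illustrative values).
signal = [0.01] * 4 + [0.5, -0.6, 0.4, -0.5] + [0.01] * 4
flags = detect_speech(signal, frame_len=4, noise_energy=0.01)
ends = word_end_frames(flags)
```

The frame indices returned by `word_end_frames` mark the instants at which, in the procedure described above, the background noise would be sampled.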

In other words, a time control is introduced both on the stay time of the voice signal and on the amplitude of the signal itself. With this mode, the comparison is actuated between the acquired signal and the background noise, in the sense that the criterion used states that if the energy of the signal is greater than that detected for the background noise, then the incoming signal can be interpreted as a voice segment. This prevents unusually substantial disturbances from being recognized as voice segments on the basis of the stay time of said signal. The result is that the solution according to the present invention operates a continuous environmental detection, carried out over numerous samples not affected by voice but exclusively by environmental disturbances, from which an extremely reliable average of the spectral components of the disturbances present can be obtained.

The interactive voice signal hearing support for users with acoustic deficiencies is based on an operative mode of radio telephone type with functions such as "press for speaking, release for seeing the text" in which the call is activated by pressing on a dedicated key on the keyboard 32, which operates as a call switch. By pressing the key, the user indicates that he wishes to speak and the terminal is arranged for acquiring the voice input from the microphone. On the other hand, by releasing the key, one will be able to communicate with the person who is activating a forced synchronization pause in the conversation in order to view the relative conversation in a format of text type on the cell phone display 31. One solution for the quick conversion to text consists of the use of a voice recognizer, which are by now highly reliable and low cost. For example, speech recognizers are known, independent of the speaker, which use a pronunciation dictionary: one obtains a sequence of text or phonemes from a speech on the basis of the received words. Based on the sequence of phonemes thus interpreted, a text sequence is generated - corresponding with the words - which must be transmitted as a message of textual type. For this solution, the use of the SMS server is also essential, such server being prearranged for converting the original voice signals into SMS messages.

It is known that the messages are stored and transmitted by a message center, called the SMS-Center 15 (SMSC). The message center is the electronic equivalent of the ordinary mail service, since it stores the messages and forwards them as soon as they can be routed. It is also known that for managing the mobility of the users, the MSC component must continuously exchange information with a database, called the Visitor Location Register (VLR) 14, which temporarily stores the information related to the MS (Mobile Stations) A and B which are found in that area (identity of the IMEI user, MSISDN telephone number, authentication parameters, etc.). The MS in question are simply "visiting" the area served by the VLR 14. They can in fact move at any moment into the area served by another VLR.

Generally, it is known that when one sends an SMS message, the essential operations set in action are:

• Routing Information Request: First, the SMSC 15 must recover the necessary information for the routing from the members' data bank, i.e. from the HLR, so as to be able to determine the cell to which the cell phone is connected. This process is carried out before message transmission.

• Point to Point Short Message Delivery: This is the procedure which sends the data of the SMS message from the Service Center to the MSC, which in that moment is connected to the cell to which the cell phone is connected.

• Short message waiting indication: This operation is activated when the SMSC 15 is unable to send the SMS message for any technical problem. In this manner, the Service Center requests the HLR to be added to the list of the SMSC in order to be informed when the cell phone is once again accessible.

• Service Center Alert: The HLR 10 informs the SMSC 15 that the cell phone towards which the transmission of an SMS message was previously attempted is now once again accessible and can be reached by the MSC.

It is clear that this sequence of procedures is extremely simplified in the currently illustrated application. First of all, no "Routing Information Request" is necessary, since the SMS which reports the speech-to-text conversion will be sent directly to the calling cell phone; the cell in which such phone is currently operating is clearly known. Also the "Short message waiting indication" has no reason to be activated, since the SMS which reports the speech-to-text conversion will be immediately sent to the calling party, who is waiting for the communication to be switched from voice to SMS text. Finally, the "Service center alert" will also not be activated since there is no need to evaluate if the cell phone returns accessible, since it was the cell phone itself which activated a call and for this reason is accessible throughout the entire signaling time.
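The simplification described above can be sketched as a small decision function. This is only an illustrative model of which of the four named operations are needed; the function name `sms_operations` and its parameters are hypothetical, and no real SMSC interface is implied.

```python
def sms_operations(recipient_in_active_call, reachable=True):
    """Return the ordered list of operations needed to deliver one SMS.

    recipient_in_active_call: True when the text is sent to a party of the
        call already underway (the speech-to-text case in the description).
    reachable: for an ordinary SMS only, whether the first delivery succeeds.
    """
    delivery = "Point to Point Short Message Delivery"
    if recipient_in_active_call:
        # The serving cell is already known and the phone is accessible,
        # so only the delivery itself remains.
        return [delivery]
    operations = ["Routing Information Request", delivery]
    if not reachable:
        # Ordinary case with a failed first attempt: wait for the HLR alert,
        # then retry the delivery.
        operations += [
            "Short message waiting indication",
            "Service Center Alert",
            delivery,
        ]
    return operations
```

Calling `sms_operations(True)` yields a single step, which is the shortened sequence the description relies on for the in-call text.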

A process is thus established that defines - in real time - from "where" (from which cell) and "between whom" the call is taking place, so that there is no need to once again query HLR 10 and/or VLR 14 with regard to the identification information of the SIM - mobile phones involved in order to be able to send them the SMS being written, which derives from the sound component of the speech. In addition, the text/speech synchronization can be "manually" managed by the person, who with the set key can suspend such synchronization, forcing a pause.

Another solution in which the voice-sound information is combined with the corresponding text format information is that which provides, for GPRS terminals, the attainment of work sessions such as chat on TCP/IP protocol, so that at the same time that the voice signal is transmitted packetized (for example) in VOIP, a conversion of the speech into text by means of the already named voice recognition is once again carried out and the text is displayed on the screen of the PC being used. For such purpose, all the functionalities can be used that were already implemented in the relative session layer pertaining to the logical resources set for organizing the conversation between the submission entities. This serves to structure and synchronize the data exchange so as to be able to suspend, resume and terminate it in an orderly manner. The role of the session layer is essential since it has the task of masking any interruptions of the transport service as much as possible and maintaining a continuous logic in the evolution of the session connection.

The voice/text service, managed directly by the user, ensures that the telephone user does not notice possible temporary blocks or failures in the connection during communication, the event usually being perceived as a simple delay in the response of the remote system. Also in TCP/IP GPRS context, reference is made to any one voice recognition technique for the conversion of the speech into text.

Based on the described solutions, a series of embodiments are immediately applicable which are derived from apparatuses of audio type, such as: intercommunication and/or video door entry systems, interphones with one or more channels, headphones, intercommunication headphones (for free hands activities), voice machines and guides for instructions or for automatic responders.

On the other hand, there are many conditions in which a person with perfectly fine hearing would need to double check the communication underway, both visual and auditory, and in any case with maximum audio quality. Examples include many military activities and situations where an operator receives environmental noise "in his headphones". This is the case for crane operators, for drivers of special transport vehicles, in conditions where one must identify a messenger/errand/delivery person, or generally for those who require access to a clearly-defined public or private area. The component sections and functionalities carried out for the latter applications are essentially those described above. The effectiveness of the use of a double microphone is clear: it carries out a differentiation of the input audio signals in order to separate the actual voice component from the random input disturbances. Analogously, it is essential to use a voice recognizer which translates the pronounced words in real time into a relative written text, with the possibility for the synchronized display - between the two parties - through a device of "press for display, release for speaking" type.
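The double-microphone differentiation mentioned above can be illustrated with a minimal sketch. It assumes an idealized case in which both microphones pick up exactly the same noise and the samples are already time-aligned; the function name `difference_signal` and the `gain` parameter are hypothetical.

```python
def difference_signal(voice_plus_noise, noise_only, gain=1.0):
    """Sample-by-sample difference of two microphone signals.

    voice_plus_noise: samples from the microphone facing the speaker.
    noise_only: samples from the microphone that acquires ambient noise.
    gain: proportionality factor applied to the difference (assumed).
    """
    if len(voice_plus_noise) != len(noise_only):
        raise ValueError("both microphone signals must have the same length")
    return [gain * (a - b) for a, b in zip(voice_plus_noise, noise_only)]


# Idealized usage: the noise cancels and the voice component remains.
voice = [0.5, -0.5, 0.25]
noise = [0.125, 0.25, -0.125]
first_mic = [v + n for v, n in zip(voice, noise)]
recovered = difference_signal(first_mic, noise)
```

In practice the two microphones never receive identical noise, so the difference only attenuates the disturbance rather than removing it.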

Also necessary is the adjustment and filtering in real time of the frequencies, dynamically amplified through a dedicated menu. At the same time, the use of algorithms is requested which allow effectively distinguishing and predicting the speech phases from the non-speech phases, in order to carry out the real time sampling of the environmental noise conditions - as random as they can be, subjected to environmental disturbances of various type.
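One possible way to combine a channel correction with the user's menu adjustment is sketched below, purely as an assumption: the description does not state how the gains are computed. The sketch takes per-band powers of an expected reference and of the received signal, derives a correction in decibels, adds a user-chosen offset per band and limits the result. The names `band_gains_db`, `user_offset_db` and `max_db` are hypothetical.

```python
import math


def band_gains_db(expected_power, received_power, user_offset_db, max_db=12.0):
    """Per-band gain in dB: channel correction plus the user's menu offset."""
    floor = 1e-12  # avoids division by zero and log of zero for silent bands
    gains = []
    for expected, received, offset in zip(
        expected_power, received_power, user_offset_db
    ):
        # Positive when the band arrived weaker than expected (amplify),
        # negative when it arrived stronger (attenuate).
        correction = 10.0 * math.log10(max(expected, floor) / max(received, floor))
        # Limit the total gain to the range [-max_db, +max_db].
        gains.append(max(-max_db, min(max_db, correction + offset)))
    return gains
```

With an expected power of 1.0 in every band, a band received at 0.25 gets roughly +6 dB and a band received at 4.0 gets roughly -6 dB, before the user's offset is applied.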

The resolution of the "spoken/non-spoken detection" problem is still based on an estimate of the energy of the considered signal. The starting point is that the energy (and thus the power) of a spoken segment is greater than that of a non-spoken segment. The sampled detection of the noise presence at the termination of every voice segment is carried out on the basis of two different synchronization instruments: a- the operative steps related to the automatic detection of the termination of the voice signal based on the principle that the energy and thus the power of the spoken segment is greater than that of the non-spoken segment; b- the presence of a vocabulary already active on the interconnection device, which interworks with an upstream operative functionality executing the speech-to-text conversion.
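The energy criterion described above can be sketched as follows. This is a minimal illustration, not the method of the invention: the frame length, the `margin` factor applied to the background-noise energy and the `min_frames` duration check are assumed parameters, and all function names are hypothetical.

```python
def frame_energies(samples, frame_len):
    """Mean energy of each complete, non-overlapping frame."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def detect_speech(samples, frame_len, noise_energy, margin=2.0, min_frames=2):
    """Flag frames as spoken when their energy exceeds the noise energy.

    A run of loud frames shorter than min_frames is discarded, which acts
    as a simple time control on the duration of the signal.
    """
    loud = [e > margin * noise_energy for e in frame_energies(samples, frame_len)]
    spoken = [False] * len(loud)
    start = None
    # The trailing False closes a run that reaches the end of the signal.
    for i, flag in enumerate(loud + [False]):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_frames:
                for j in range(start, i):
                    spoken[j] = True
            start = None
    return spoken
```

The frames flagged `False` immediately after a spoken run are the natural place to resample the environmental noise level.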

The same use of the GSM/GPRS telecommunications networks, described above, can be easily integrated in the various just-mentioned embodiments; one need only think of an intercommunication system or an interphone which instead of functioning with a single direct or switched cable for the interconnection, interposes a connection in TCP/IP with the establishment of two parallel sessions, such as VoIP and Chat, between the two people. The complexity of the solution therefore allows attaining the advantages of the textual match corresponding with the speech.

Advantages and industrial use of the finding

In summary, the special features of the techniques for equalizing the conversation according to the present invention greatly assist people with limited acoustic capacities, since they have:

- maximum adaptability to the auditory needs of the user;

- maximum sound reproduction quality and complete reprogrammability with a different fixed /mobile phone;

- application range for all hypacusia levels: slight, medium, serious and acute.

The cell phone according to the present invention is an expert system capable of identifying the voice presence and consequently adapting the amplification modes of the apparatus. The techniques for equalizing the audio signal in fixed and mobile telephony according to the present invention can be implemented not only on the basis of the classic audiometric reports (audiometry, vocal, impedancemetric, etc.) but also on the basis of the sound processing capacities of the patient, as can be detected in real time at the time of conversation.

This allows improving the comprehension of the voice in the presence of noise, and reducing the acoustic overstimulation in the absence of speech. The latter possibility is particularly important for reducing the acoustic fatigue of elderly people who frequent very noisy places. It is understood that the above description and the attached drawings are only intended to illustrate the present invention. For those skilled in the art of the field, it is obvious that the invention can be varied and modified also in other modes, without departing from the fundamental object of the invention as defined by the enclosed claims.

Claims

1. A technique for equalizing the voice signal which is transmitted through a telephone channel, characterized by executing, at the receiving telephone (B), a processing procedure which evaluates the composition of the frequency spectrum of the signal received via the transmission channel in relation to the spectrum of the original signal transmitted by the calling telephone (A) and by executing the processing and equalization of the received signal by means of real time filtering of the frequencies composing the spectrum, followed by the selective attenuation of some frequencies and the selective amplification of others as a function of the current environmental noise conditions and the auditory qualities of the receiver.

2. A technique for equalizing the voice signal which is transmitted through a telephone channel according to claim 1, characterized in that it provides for a procedure for analyzing and processing the signal itself in real time, each time that a communication is established, the equalization being able to adapt to the specific environmental conditions at that moment.

3. A technique for equalizing the voice signal which is transmitted through a telephone channel according to claim 1, characterized in that it provides that, at each telephone, the procedure for processing the signal which operates on the composition of the spectrum is based on a parameter configuration established on the basis of the auditory characteristics of the user, executed just once at the time of the initial configuration of the telephone.

4. A technique for equalizing the voice signal transmitted through a telephone channel according to each of the preceding claims, characterized in that both calling persons (A, B) of the conversation are made to cooperate in a procedure for adapting the voice frequencies that are received and transmitted in real time, before the actual conversation.

5. A technique for equalizing the voice signal transmitted through a telephone channel according to each of the preceding claims, characterized in that it provides for the synchronization of the telephones possessed by users who both have limited auditory capacities, by prearranging the equalization on both phones in real time as a function of the noise and specific environmental disturbances of the particular communication underway, filtering and amplifying some speech frequencies, already in input, at the calling telephone (A) in relation to the fact that the communication occurs in noise conditions and that the called party has a reduced auditory capacity on specific frequencies of the voice signal spectrum.

6. A technique for equalizing the voice signal transmitted through a telephone channel according to each of the preceding claims, characterized in that it supplies an aid to the hypacusic user both by arranging an equalization in real time which operates on the basis of the environmental noise conditions on particular frequencies, and by providing for a visual aid in text form which translates the voice signal into sequences of words in text format, which can be interactively viewed on the cell phone display.

7. A technique for equalizing the voice signal transmitted through a telephone channel according to each of the preceding claims, characterized in that it carries out a sample detection in real time, during the same call by conducting the random sample detection of the noise presence at the termination of each voice segment, on the basis of two different synchronization instruments: i- the presence of a vocabulary already active on the telephone, which interworks with an upstream operative functionality executing the speech-to-text conversion; ii- the automatic detection of the termination, word for word, of the voice signal based on the principle that the energy and thus the power of the spoken segment is greater than that of the non-spoken segment; and then introducing a time control, both on the stay time of the voice signal and on the amplitude of the signal itself, based on the principle that only if the energy of the signal is greater than that detected for the background noise can the input signal be interpreted as a voice segment.

8. A technique for equalizing the voice signal transmitted through a telephone channel according to each of the preceding claims, characterized in that the aid based on the interactive display of a text corresponding to the voice signal for the user with acoustic deficiencies is managed with functions such as "press for speaking, release for seeing the related text" in which the call is activated by pressing on a dedicated key on the keyboard (32), which operates as a call switch, the synchronization in the conversation being made possible with the pressing or releasing of the key, such synchronization allowing the visualization on the telephone display (31) of the related conversion of the speech into text format.

9. A technique for equalizing the voice signal transmitted through a telephone channel according to each of the preceding claims, characterized in that it carries out the combination of the voice-sound information with the corresponding data in text format, by means of the use of GPRS terminals with which, together with the speech communication function, an instantaneous message activity is also carried out such as chat or instant messaging on TCP/IP protocol, so that at the same time that the voice signal is transmitted packetized in VOIP, a conversion of the speech into text by means of voice recognition is carried out along with the relative transmission on parallel dedicated session.

10. An apparatus for equalizing the voice signal transmitted through a telephone channel according to each of the preceding claims, characterized in that it uses a telephone device provided with an acoustic detector with high sensitivity comprising two microphones (M1, M2) with relative transducers for converting the emitted sound signals into electrical output signals, separated from each other by a certain distance (d), such that one (M1) acquires the voice which speaks on the phone plus the surrounding noise, while the other (M2) of directional type only acquires the noise coming from the surrounding environment, associating, with the pair of differential microphones (M1, M2), differentiation means which allow acquiring an electrical input signal from each of the two microphones so as to produce, in response to the same, a difference signal which is proportional to the difference between the acoustic signals exiting from the corresponding microphones.

11. A technique for equalizing the voice signal transmitted through a telephone channel according to each of the preceding claims, characterized in that it carries out an equalization of the signal based on the real sample frequencies sent from the first telephone (A) to the second telephone (B) and vice versa, and by acquiring the signal reproduced by the loudspeaker of the first telephone (A) directly detected by the pair of differential microphones of the same first telephone (A) and transferring it to the other (B), which evaluates it and compares the spectral component in relation to the fact that a pre-stored standardized frequency data component is expected, while a distorted component may arrive instead, or it may have a noise component.

12. A technique for equalizing the voice signal transmitted through a telephone channel according to each of the preceding claims, characterized in that it carries out the combination of the voice-sound information with the corresponding information in text format in devices and apparatuses of audio type such as intercommunication and/or video door entry systems, interphones with one or more channels, headphones, intercommunication headphones, voice machines and guides for automatic responders, also in conditions in which a perfectly capable user needs to double check the communication underway, both visual and auditory, and in any case requires maximum audio quality, and in which it is desired to have more information on the person who requests access to a clearly-defined public or private area, providing for the storage of the sequence of scanned texts in the speech-text conversion in order to determine the identification of the person and the authenticity of the supplied information.