MXPA97010468A

MXPA97010468A - System and method for enhanced intelligibility of voice messages

Info

Publication number: MXPA97010468A
Application number: MXPA/A/1997/010468A
Authority: MX
Inventors: Meier Rochkind Mark
Original assignee: At&T Corp
Priority date: 1996-12-31
Filing date: 1997-12-19
Publication date: 1998-10-30

Abstract

A system and method are provided for playing back a recorded voice message and, in particular, for automatically playing back a spoken numeric portion of the message at a rate that is slower than the rate for playing back the remaining portion of the recorded voice message. A voice messaging system receives and analyzes the voice message. Specifically, the messaging system determines whether the voice message includes spoken numeric information and, if so, determines the relative position of the spoken numeric information within the message. The messaging system stores both the voice message and the positional information in a storage device. Upon playback of the message, the messaging system retrieves the stored voice message and positional information from the storage device. As the voice message is played back, the messaging system processes the positional information. When the positional information indicates that a particular portion of a voice message includes spoken numeric information, that particular portion is played back at a decreased speed.

Description

SYSTEM AND METHOD FOR IMPROVED INTELLIGIBILITY OF VOICE MESSAGES

FIELD OF THE INVENTION

The present invention is directed to a system and method for improved intelligibility of a voice message. More particularly, the present invention relates to the playback of spoken numerical information at a rate that is slower than that of the rest of the voice message.

BACKGROUND OF THE INVENTION

Voice messaging systems are common today. Many companies have private voice mail systems built into their local telephone network. Additionally, many households have telephone answering machines. Even telephone companies offer voice messaging services. Today's voice messaging systems offer users a variety of playback options. Users can rewind a message by a few seconds or advance it by a few seconds. Users can also increase the playback speed. At faster speeds, messages can be reviewed more quickly, thereby increasing efficiency. While most of a particular message will normally be intelligible at an increased playback speed, for example 1.75 times the normal speed, certain portions of the message may be unintelligible at that speed. Numerical information can be particularly difficult to understand at an increased speed. Even at normal speed, numerical information that is unfamiliar to the listening user may be difficult to understand. One reason for the unintelligibility of numerical data is that many people tend to pronounce familiar numbers, such as telephone numbers, quickly. For example, when recording a message, many people speak slowly while composing their sentences. However, when these same people pronounce a phone number that is familiar to them, the speed of their speech increases. Therefore, when the message is played back, the listening user may have difficulty understanding the numerical information and may be required to replay the message several times before he or she adequately understands the details of the entire message. Even if a listening user can understand the information the first time it is played, the user may be writing the number down and thus may need the numerical information to be played back at a slower speed.

Existing voice messaging systems provide users with the ability to increase and/or decrease the playback speed of a message. One such system is described in U.S. Patent No. 5,386,493, issued to Degen et al., entitled "Apparatus and Method For Playing Back Audio At Faster Or Slower Rates Without Pitch Distortion", expressly incorporated herein by reference. However, in such systems, the entire message is played at the selected playback speed. Thus, if the listening user wishes to decrease the playback speed of a telephone number, the playback speed of the entire message is decreased. By manual control, a user could speed up and slow down the playback of a message as it is played.

BRIEF DESCRIPTION OF THE INVENTION

The present invention consists of a system and method for playing back a recorded voice message and, in particular, for automatically playing back a spoken numeric portion of the message at a rate that is slower than the rate for playing back the remaining portion of the recorded voice message. A voice messaging system receives and analyzes the voice message. Specifically, the messaging system determines whether the voice message includes spoken numeric information and, if so, determines the relative position of the spoken numeric information within the message. The messaging system stores the voice message and the position information in a storage device. Upon playback of the message, the messaging system retrieves the stored voice message and the position information from the storage device. As the voice message is played back, the messaging system processes the position information. When the position information indicates that a particular portion of a voice message includes spoken numeric information, that particular portion is played back at a decreased rate. The method for determining the position information is included as part of this invention.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 is a system diagram of an exemplary embodiment of the present invention;
Figure 2A is a flow chart of the overall logical flow of the exemplary embodiment, in which a caller leaves a message for a called party;
Figure 2B is a flow chart of the overall logical flow of an exemplary embodiment, in which a user accesses his mailbox;
Figure 3 is a flow chart of the processing and recording step of Figures 2A and 2B;
Figure 4 illustrates an exemplary message and the corresponding series of position binary digits;
Figure 5 is a flow chart of the option adjustment step of Figure 2B; and
Figure 6 is a flow chart of the processing and playback step of Figure 2B.

DETAILED DESCRIPTION

Referring now to the drawings and initially to Figure 1, a system diagram of an exemplary embodiment of the present invention is illustrated. A telephone switching system 110 selectively establishes communication connections between call stations 120 (e.g., telephones, computer workstations, facsimile machines) and between the call stations and a voice messaging system 130. A call station 120 is connected to the voice messaging system 130 as a result of, for example, (i) the call station 120 directly dialing a telephone number of the messaging system 130 or (ii) the call station 120 failing to establish a connection with another call station that is served by the messaging system 130 (for example, the other call station may be busy or may not "pick up", that is, it does not answer within a predetermined amount of time). The switching system 110 may be, for example, a secondary telephone exchange, a telephone switching center (exchange, end station) or a local business telephone system, and is generally well known in the art. In an exemplary embodiment, the messaging system 130 consists of a processor 131 in communication with an audio unit 132 (to generate audio signals to callers under the control of the processor 131), an analog-to-digital (A/D) converter 133, a dynamic memory 134 (for example, RAM) and a storage device 135, such as a disk array. Referring now to the flow chart of Figure 2A, the overall logical flow of an exemplary embodiment is described, in which a caller leaves a message for a called party whose telephone device is either busy or does not answer. The technique by which a messaging system connects a caller, who has been redirected to the messaging system by the busy or no-answer condition, to the voice mailbox of the called party is well known.
After connecting a call station 120 to the messaging system 130 (in particular, to the voice mailbox of the called party), the messaging system 130 plays a recorded prompt (by means of the audio unit 132) to the call station 120, which invites the caller to record a message for a particular user (step 210). The caller responds by speaking a message into the telephone microphone at the call station 120. The caller who leaves a message can include the caller's telephone number in the message, so that the person for whom the message is left can return the call. The messaging system 130 processes the message as it is entered and records it on a storage device 135 (step 212). In particular, the message is stored as "mail" in the mailbox of the called party. The caller can then listen to the message, re-record the message (step 230) or disconnect from the messaging system (step 240). The flow chart of Figure 2B illustrates the overall logic of an exemplary embodiment in which a user accesses his mailbox for the purpose of sending or retrieving messages. In this case, the user calls the messaging system 130 to connect the user's call station 120 to the messaging system 130. First, the messaging system executes a log-in procedure (step 215) by which the user identifies himself to the system. The user is then offered a menu of options (step 225). The user may choose to send a message (steps 229 and 239), listen to recorded messages left or sent by others (steps 227, 237, 247 and 249) or make use of other mailbox options, such as personalization (step 235). Instead of executing any option, or after carrying out the selected options, the user can disconnect (step 255).

If the user chooses to send a message, the messaging system asks the user for the address(es) to which the message will be sent (step 229) and then asks the user to speak the message to be left. The message is processed and recorded (step 239). The user can then make use of any of the menu options offered (step 225). If the user chooses to listen to messages left or sent by others, the user selects from the inventory of stored messages (step 227) and can optionally set playback options, such as speeding up or slowing down playback (steps 237 and 247). After the playback options are adjusted, or instead of adjusting them, the messaging system retrieves the selected message and processes it for playback (step 249). If the user elects to adjust the playback options, such as the speed at which the user's messages are played back (step 237), the messaging system adjusts the options according to the user's preferences (step 247).

Recording: The flow chart of Figure 3 shows the processing and recording step (212 or 239) in more detail. The messaging system 130 receives the caller's voice message in the form of a message signal. If the signal is not already in digital form, the message signal is converted by the A/D converter 133 into a digital signal comprising audio samples (step 320) and is then stored in a temporary buffer (step 330). The buffer may be located in the dynamic memory 134, in the storage device 135 or in a combination of both. Then, the messaging system 130 analyzes the voice message for spoken numerical information. Specifically, the system 130 determines whether spoken numbers (such as, for example, digit strings) are present within the voice message and, if so, determines the relative position of the spoken numbers within the voice message. In the exemplary embodiment, the system 130 searches the message for spoken numbers such as strings of spoken single-digit numbers, multi-digit numbers and combinations of both (step 340). Strings of spoken digits include, for example, "one-two-three". Multi-digit numbers include "thirteen" and "one hundred". Combinations include, for example, "one eight hundred" and "twenty seven". In an alternative embodiment, the system 130 could also search for isolated single spoken digits, at the expense of an increased error rate. For example, the system 130 may have difficulty distinguishing between homophones such as "two", "to" and "too"; "four", "for" and "fore"; or "eight" and "ate". When the processor 131 within the messaging system 130 is sufficiently fast, it may not be necessary to store the digital message signal in the temporary buffer (step 330). In such a case, step 340 would follow directly from step 320.
The analysis of the voice message may be carried out using any of a variety of speech recognition and speech pattern recognition techniques. For example, stored templates consisting of samples of voice signals of spoken numbers could be compared with portions of the voice message, a match indicating the presence of a spoken number within the voice message. Other techniques are described in U.S. Patent No. ,509,104, issued to Lee et al., entitled "Speech Recognition Employing Key Word Modeling and Non-Key Word Modeling", and U.S. Patent No. 4,783,804, issued to Juang et al., entitled "Hidden Markov Model Speech Recognition Arrangement", both expressly incorporated herein by reference. For each audio sample that includes at least a portion of a spoken number, the messaging system associates a "1" with that sample. Since a complete spoken number is generally at least several audio samples long, a run of several consecutive samples will be associated with that spoken number. For each of the other audio samples, the system associates a "0". Therefore, a series of zeros ("0") and ones ("1"), that is, a series of position binary digits (or position signal), is associated with each processed message (step 350). In an alternative embodiment, the convention could be inverted, with a "0" associated with the audio samples that include a spoken number and a "1" associated with the remaining audio samples. In addition, a single bit could be associated with a plurality of audio samples. Once the entire message is processed, the digital message signal and the associated series of position binary digits are compressed and then stored in an appropriate location (mailbox) in the storage device 135 (step 360). In the exemplary embodiment, messages are compressed before they are stored because of the storage savings this provides.
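
The association of position bits with audio samples described above can be illustrated with a minimal Python sketch. It assumes that some recognizer (not shown) has already reported the spoken numbers as half-open sample spans; the function name build_position_bits and the parameter samples_per_bit are hypothetical and do not appear in the specification.

```python
from typing import List, Tuple


def build_position_bits(num_samples: int,
                        number_spans: List[Tuple[int, int]],
                        samples_per_bit: int = 1) -> List[int]:
    """Return one position bit per block of `samples_per_bit` audio samples.

    A bit is 1 when its block overlaps any half-open span [start, end)
    that an (assumed) recognizer flagged as a spoken number, else 0.
    """
    if samples_per_bit < 1:
        raise ValueError("samples_per_bit must be >= 1")
    # Round up so a trailing partial block still gets a bit.
    num_bits = (num_samples + samples_per_bit - 1) // samples_per_bit
    bits = [0] * num_bits
    for start, end in number_spans:
        # Clamp each span to the message boundaries.
        start = max(start, 0)
        end = min(end, num_samples)
        if start >= end:
            continue
        first = start // samples_per_bit
        last = (end - 1) // samples_per_bit
        for i in range(first, last + 1):
            bits[i] = 1
    return bits


# Ten samples, with a spoken number occupying samples 3, 4 and 5.
print(build_position_bits(10, [(3, 6)]))  # [0, 0, 0, 1, 1, 1, 0, 0, 0, 0]
```

With samples_per_bit greater than 1, a single bit covers several samples, which corresponds to the variant in which one bit is associated with a plurality of audio samples.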

When the user is a caller who wishes to leave a voice message for a called party whose telephone device is either busy or does not answer, the mailbox is the mailbox of the called party. When the user is a caller who wishes to send a voice message to another party, the mailbox is the caller's mailbox. Alternatively, the message signal and the series of position binary digits can be stored in a general-purpose database along with the telephone number (mailbox number), so that they can be retrieved by providing the appropriate telephone number (mailbox number) to the database management system. In an alternative embodiment, the position information may simply comprise the relative start and end positions of the numeric information within the message, in terms of audio sample numbers (e.g., start1 = sample 12000, end1 = sample 16000, start2 = sample 30000, end2 = sample 30300) or in terms of relative time (for example, start1 = 32.2 seconds, end1 = 40.5 seconds). Figure 4 shows the text of a sample message 410 and a corresponding series 420 of position binary digits. The series 420 of position binary digits includes a subset of ones (430) corresponding to the audio samples that include at least a portion of a spoken number.
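
The two forms of position information described above (a bit per sample, or start and end positions) carry the same content, as the following Python sketch suggests. The function names and the 8000 Hz sample rate used in the example are assumptions for illustration only; the specification does not state a sample rate.

```python
from typing import List, Tuple


def bits_to_spans(bits: List[int],
                  samples_per_bit: int = 1) -> List[Tuple[int, int]]:
    """Collapse runs of 1 bits into half-open (start, end) sample spans."""
    spans = []
    run_start = None
    for i, bit in enumerate(bits):
        if bit and run_start is None:
            run_start = i  # a run of ones begins here
        elif not bit and run_start is not None:
            spans.append((run_start * samples_per_bit, i * samples_per_bit))
            run_start = None
    if run_start is not None:
        # The message ends while still inside a spoken number.
        spans.append((run_start * samples_per_bit,
                      len(bits) * samples_per_bit))
    return spans


def spans_to_seconds(spans: List[Tuple[int, int]],
                     sample_rate_hz: int) -> List[Tuple[float, float]]:
    """Express sample spans as relative times in seconds."""
    return [(s / sample_rate_hz, e / sample_rate_hz) for s, e in spans]


print(bits_to_spans([0, 1, 1, 0, 0, 1]))          # [(1, 3), (5, 6)]
print(spans_to_seconds([(12000, 16000)], 8000))   # [(1.5, 2.0)]
```

The span form is more compact when a message contains only a few spoken numbers, while the bit form can be consumed sample by sample during playback without any lookup.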

Adjustment of options: The flow chart of Figure 5 provides details of the option adjustment step 247 of Figure 2B. In the exemplary embodiment of the present invention, the user is allowed to adjust two options. Specifically, the user can enable or disable the "enhanced intelligibility mode", and the user can also adjust the message playback speed (e.g., 1.25 x normal, 1.5 x normal, 1.75 x normal, etc.). Once the "enhanced intelligibility mode" is enabled, any spoken number detected in the messages played back to the user will automatically be played back at a slower speed than the rest of the message. By default, the "enhanced intelligibility mode" is enabled and the message playback speed is set to "1", that is, 1 x (times) the normal speed. Even when the playback speed is set to 1 x normal, the "enhanced intelligibility mode" will cause digit strings and numbers to be played back at a slower speed, such as 0.75 x normal. For rotary-dial telephones, in the exemplary embodiment, the default options are always in effect. In an exemplary embodiment of the present invention, the messaging system 130 asks the user whether to change the default options (step 510). If the user chooses to change the default options by indicating "yes" in response to the prompt of step 510, the messaging system 130 asks the user whether to disable the "enhanced intelligibility mode". If the user chooses to do so, a flag is reset accordingly (step 525), and the messaging system then asks whether the playback speed is to be changed (step 530). The user can choose whether or not to modify the playback speed. If not, the user goes to step 540 and leaves option processing via step 550. If the user chooses to modify the playback speed, this is carried out in step 535. Once changed to a rate of, for example, 1.5 x normal or 0.75 x normal, the playback speed remains at the set speed for the rest of the session or until the speed is changed again.
The user then proceeds through step 540 and exits option processing via step 550. In an alternative embodiment of the present invention, the calling user may be given the option of adjusting the playback speed directly. For example, the caller can be allowed to press "075", "150" or "125", indicating playback speeds of 0.75, 1.50 and 1.25 x normal speed respectively. Regardless of how the choice of playback speed is indicated, the playback speed can be set so that it persists across sessions for a particular user's mailbox. In such an embodiment, the user does not need to go through option processing in each session. The user can respond to prompts by pressing a button on the keypad (to generate a DTMF signal), by speaking a response (with the use of automatic speech recognition) or via some other input scheme.
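
As a rough illustration of the two user options and of the direct key entry of a playback speed, the following Python sketch may help. The names PlaybackOptions, parse_speed_keys and apply_speed_keys are hypothetical, and the rule that three keyed digits are read as hundredths of normal speed is an assumption generalized from the examples "075", "150" and "125".

```python
from dataclasses import dataclass


@dataclass
class PlaybackOptions:
    """Per-mailbox options, initialized to the defaults described above."""
    enhanced_intelligibility: bool = True  # mode is enabled by default
    playback_speed: float = 1.0            # 1 x normal speed by default


def parse_speed_keys(digits: str) -> float:
    """Interpret three keyed digits as hundredths of normal speed.

    Assumed convention: "075" -> 0.75, "150" -> 1.5, "125" -> 1.25.
    """
    if len(digits) != 3 or not (digits.isascii() and digits.isdigit()):
        raise ValueError("expected exactly three digits, e.g. '125'")
    speed = int(digits) / 100
    if speed <= 0:
        raise ValueError("playback speed must be positive")
    return speed


def apply_speed_keys(options: PlaybackOptions, digits: str) -> PlaybackOptions:
    """Update the stored playback speed from a key sequence."""
    options.playback_speed = parse_speed_keys(digits)
    return options


opts = apply_speed_keys(PlaybackOptions(), "150")
print(opts.enhanced_intelligibility, opts.playback_speed)  # True 1.5
```

Storing a PlaybackOptions record with the mailbox, rather than with the call, is one way the settings could persist across sessions.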

Playback: The flow chart of Figure 6 provides the details of the processing and playback step 249 of Figure 2B, which is carried out for each message. If a user chooses to play a particular message (step 227), the messaging system 130 first retrieves the stored digital message signal and the associated series of position binary digits that were previously stored for that message (step 610). Additionally, the "enhanced intelligibility" flag and the playback speed stored for the caller are retrieved (step 610). Next, the "enhanced intelligibility" flag is tested (step 620). If the "enhanced intelligibility" flag is set to "disabled", the entire message is decompressed, if compressed, and played back to the caller by means of the audio unit 132 at the retrieved playback speed (step 630). U.S. Patent No. 5,386,493 describes a method for playing back messages at slower or faster rates without distorting the pitch (for example, the "chipmunk" effect is eliminated when messages are played back at high speed). However, if the "enhanced intelligibility" flag is set to "enabled", the series of position binary digits is processed (step 640). Specifically, the processor 131 of the messaging system 130 analyzes the series of position binary digits to determine whether it includes any bit set to "1" (indicating the presence of spoken numbers in the message). If not, the entire message is decompressed, if compressed, and played back to the caller by means of the audio unit 132 at the retrieved playback speed (step 630).
If the series of position binary digits includes bits set to "1", the processor 131 causes the messaging system 130 to play back each of the audio samples sequentially, wherein the audio samples corresponding to the zeros in the series of position binary digits are played back through the audio unit 132 at the retrieved playback speed, while the audio samples corresponding to the ones are played back through the audio unit 132 at a speed slower than the retrieved playback speed (step 650). The slower speed may be, for example, a speed predetermined by the messaging system 130 (fixed, or a function of some other parameter such as the retrieved playback speed) or, alternatively, it may be set by the user. As ones ("1") and zeros ("0") are encountered in the series of position binary digits and the playback speed is correspondingly decreased or increased, the changes in speed can be carried out using some smoothing function, so that the effect is smooth and not jarring. For example, the speed can be increased or decreased gradually.
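
The playback decision of steps 620 through 650 can be sketched in Python as follows. This is only an illustration under stated assumptions: the names plan_playback and ramp are hypothetical, the slower speed is modeled as the retrieved speed multiplied by an assumed slow_factor (one of the options the text allows, not the only one), and a linear ramp stands in for the unspecified smoothing function. Actual time-scaling of the audio, for example without pitch distortion, is outside the sketch.

```python
from itertools import groupby
from typing import List, Tuple


def plan_playback(bits: List[int],
                  base_speed: float = 1.0,
                  slow_factor: float = 0.75,
                  enhanced: bool = True) -> List[Tuple[int, int, float]]:
    """Return (start, end, speed) runs covering the whole message.

    Runs of 0 bits play at `base_speed`; runs of 1 bits (spoken numbers)
    play at `base_speed * slow_factor`. If the mode is disabled or no bit
    is set, the whole message plays at `base_speed`.
    """
    if not enhanced or 1 not in bits:
        return [(0, len(bits), base_speed)]
    plan = []
    pos = 0
    for bit, run in groupby(bits):
        length = sum(1 for _ in run)
        speed = base_speed * slow_factor if bit else base_speed
        plan.append((pos, pos + length, speed))
        pos += length
    return plan


def ramp(from_speed: float, to_speed: float, steps: int) -> List[float]:
    """Linear ramp between two speeds, ending exactly at `to_speed`."""
    if steps < 1:
        raise ValueError("steps must be >= 1")
    return [from_speed + (to_speed - from_speed) * k / steps
            for k in range(1, steps + 1)]


print(plan_playback([0, 0, 1, 1, 1, 0], base_speed=1.5, slow_factor=0.5))
# [(0, 2, 1.5), (2, 5, 0.75), (5, 6, 1.5)]
print(ramp(1.0, 0.5, 2))  # [0.75, 0.5]
```

A player could apply ramp at each boundary between runs, so that the change from one speed to the other is spread over several blocks of samples instead of occurring abruptly.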

Other alternative embodiments: While the present invention has been particularly shown and described with reference to the preferred embodiments thereof, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention. For example, a digital message may be received and stored in the storage device 135 and then processed in real time at playback by the processor 131. This would require specialized circuitry and/or a messaging system 130 having a powerful processor (or multiple processors). When detection of the spoken numbers is first carried out during playback of the message, no position information needs to be stored. In the exemplary embodiment, the detection of digit strings, number strings or strings combining digits and numbers is carried out before the message is compressed. However, the spoken numbers could instead be detected after a message has been compressed, either by processing the message in compressed form or by processing it after decompression. The present invention could also be incorporated into a home answering machine, provided sufficient computing power is available (at least for the speech analysis portion of the system). A less powerful processor could be used if the message is processed in a batch mode. It is noted that, as of this date, the best method known to the applicant for carrying out the said invention is that which is clear from the present description of the invention. Having described the invention as above, the content of the following claims is claimed as property.

Claims

1. A method for recording and playing back a voice message, a first portion of the voice message including at least one spoken number of interest and a second portion of the voice message being devoid of any spoken number of interest, characterized in that it comprises the steps of: (a) receiving the voice message; (b) detecting the first portion within the received voice message; (c) recording the voice message in a storage device; (d) retrieving the voice message from the storage device; (e) playing back the second portion at a first speed; and (f) playing back the detected first portion at a second speed, wherein the second speed is slower than the first speed.
2. The method according to claim 1, characterized in that it further comprises the steps of: (g) detecting a position of the first portion within the received voice message; (h) storing information related to the position detected in the storage device; (i) recover the stored information from the storage device; and (j) carrying out steps (e) and (f) as a function of the recovered information.
3. The method according to claim 2, characterized in that the voice message comprises a plurality of audio samples and wherein the information is stored as a series of binary digits, each bit being associated with at least one of the plurality of audio samples within the voice message.
4. The method according to claim 1, characterized in that the spoken number consists of series of numerical digits.
5. A method for playing back a voice message, a first portion of the voice message including at least one spoken number of interest and a second portion of the voice message being devoid of any spoken number of interest, characterized in that it comprises the steps of: (a) receiving the voice message; (b) detecting the first portion within the received voice message; (c) playing back the second portion at a first speed; and (d) playing back the detected first portion at a second speed, wherein the second speed is slower than the first speed.
6. The method according to claim 5, characterized in that it further comprises the steps of: (e) detecting a position of the first portion within the received voice message; (f) generating a position signal as a function of the detected position; and (h) carrying out steps (c) and (d) as a function of the position signal.
7. The method according to claim 5, characterized in that the received voice message consists of a plurality of audio samples and wherein the position signal consists of a plurality of bits, each bit being associated with at least one of the plurality of audio samples.