AU2003100447A4

AU2003100447A4 - Automated transcription process

Info

Publication number: AU2003100447A4
Application number: AU2003100447A
Authority: AU
Inventors: Alan Sharpe
Original assignee: ACTIVECALL Pty Ltd
Current assignee: ACTIVECALL Pty Ltd
Priority date: 2002-06-26
Filing date: 2003-06-10
Publication date: 2003-08-07
Anticipated expiration: 2011-06-10
Also published as: AU2003100447B4

Description

P/00/011 Regulation 3.2B

AUSTRALIA

PATENTS ACT 1990

COMPLETE SPECIFICATION FOR AN INNOVATION PATENT

Name of Applicant: Activecall Pty Ltd
Address of Applicant: TMA House, 87 Brown Street, EAST PERTH WA 6004
Actual Inventor: Alan SHARPE
Address for Service: Griffith Hack, Patent and Trade Mark Attorneys, 6th Floor, 256 Adelaide Terrace, Perth, Western Australia, 6000.

Complete Specification for the invention entitled: AUTOMATED TRANSCRIPTION PROCESS

Details of Associated Provisional Applications: PS3208 filed 26 June 2002

The following is a full description of this invention, including the best method of performing it known to me:-

AUTOMATED TRANSCRIPTION PROCESS

FIELD OF THE INVENTION

[0001] The present invention relates to a process of automatically transcribing a voice message received by telephone into text and relates particularly, but not exclusively, to an automated transcription process for recording a caller's name and address details in textual format.

BACKGROUND TO THE INVENTION

[0002] Companies and various organisations frequently advertise or promote their products and services on television, radio or in the print media and provide a telephone number that interested people can ring for further information or to register their interest. In order to cope with the sudden large influx of telephone inquiries precipitated by such promotions, the companies or organisations usually employ a call centre to receive the calls and transcribe the caller's details. The names and addresses are then supplied to the company or organisation to be used as a database of potential clients or interested persons to whom additional information is posted. For popular promotions even a large call centre may not have sufficient operators to cope with the number of callers during peak periods. Callers are then put on hold and must wait in a queue until the next available operator becomes free to take the call. This can be very frustrating for callers and many become tired of waiting and simply hang up.

[0003] Some call centres provide a voice message recording system whereby a caller can leave a voice message which is then recorded by the system. However, this system still requires someone to play back all of the voice messages at a later time and transcribe them into text using a typewriter or word processor. This can be time consuming and requires the employment of additional personnel. Furthermore, voice messages can sometimes be difficult to transcribe due to poor transmission quality or difficulties interpreting the caller's pronunciation of certain words. There is no way of checking in such cases whether or not the transcribed message accurately records the caller's intended voice message.

SUMMARY OF THE INVENTION

[0004] The present invention was developed with a view to providing an automated transcription process that is able to automatically transcribe a caller's name and address details in a manner that minimises the need for human intervention. However, it will be apparent that the same process may be used in a variety of other applications such as, for example, registering and/or recording votes in a radio or television competition or as part of an answering service.

[0005] For the purposes of this specification it will be clearly understood that the word "comprising" means "including but not limited to", and that the word "comprises" has a corresponding meaning.

[0006] According to one aspect of the present invention there is provided a process of automatically transcribing a voice message received by telephone into text, the process comprising the steps of: prompting a caller to state a first part of the message in accordance with a predetermined protocol; converting the first part of the message into a digital format using speech recognition software; comparing said first part of the message in digital format with a predefined listing of possible matching message parts stored in a database in digital format; selecting a possible matching message part from the database, converting it into an audio signal format and playing it back to the caller for verification; if the caller confirms that the message part played back is the same as the first part of the message stated by the caller, then repeating the foregoing steps for a second or further parts of the message, if any; and, converting each part of the message into a text file based on said confirmed message parts whereby, in use, said text file can be exported into other applications for further processing on behalf of the caller.
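The sequence of steps in paragraph [0006] can be illustrated with a minimal Python sketch. It is not the applicant's implementation: the "digital format" is represented by plain text strings, and speech recognition, database matching and audio playback are replaced by hypothetical callbacks (`hear`, `match`, `confirm`). The names `MessagePart`, `transcribe` and `to_text` are illustrative only. A part that is not confirmed returns `None`, which is where the spelling fallback of paragraph [0007] would apply.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class MessagePart:
    """One part of the message in the predetermined protocol (hypothetical type)."""
    name: str              # e.g. "surname"
    prompt: str            # what the caller is asked to state
    candidates: List[str]  # predefined listing of possible matching parts


def transcribe(parts: List[MessagePart],
               hear: Callable[[str], str],
               match: Callable[[str, List[str]], Optional[str]],
               confirm: Callable[[str], bool]) -> Optional[Dict[str, str]]:
    """Prompt, recognise, match, play back and confirm each part in turn.

    hear(prompt): prompts the caller and returns the recognised speech as text.
    match(heard, candidates): selects a possible matching part, or None.
    confirm(candidate): plays the candidate back and returns the caller's answer.
    Returns the confirmed parts, or None as soon as a part is not confirmed.
    """
    confirmed: Dict[str, str] = {}
    for part in parts:
        heard = hear(part.prompt)
        candidate = match(heard, part.candidates)
        if candidate is None or not confirm(candidate):
            return None  # the spelling fallback would take over here
        confirmed[part.name] = candidate
    return confirmed


def to_text(confirmed: Dict[str, str]) -> str:
    """Convert the confirmed parts into a plain-text record, one part per line."""
    return "\n".join(f"{key}: {value}" for key, value in confirmed.items())
```

In use, each callback would wrap the telephony platform's own prompt, recognition and playback functions; keeping them as parameters lets the protocol be tested with a scripted caller.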

[0007] Preferably, the verification step of the process comprises the further steps of: (e1) if the caller indicates that the message part played back is not the same as the part of the message stated by the caller, prompting the caller to spell the part of the message letter by letter; (e2) converting each letter spelled by the caller into a digital format using speech recognition software; (e3) comparing each letter in digital format with a listing of letters stored in a database in digital format; (e4) selecting possible matching letters from the database, converting the matching letters into an audio signal format and playing them back to the caller for verification; and (e5) if the caller confirms that the letters played back are the same as the letters spelled by the caller, then repeating the foregoing steps for a second or further part of the message, if any.

[0008] Typically, the verification step of the process comprises the still further step of: (e6) if the caller indicates that the letters played back are not the same as the letters spelled by the caller, then the caller's information is transcribed manually.
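A minimal sketch of the verification ladder of steps (e1) to (e6), assuming the letter-by-letter recognition of steps (e1) to (e3) is wrapped in a hypothetical `ask_spelling` callback that returns the matched letters as a word, and that a hypothetical `escalate` callback hands the call over for manual transcription. The function name `verify_with_spelling` is illustrative. Passing `candidate=None` models the case where no whole-word match was found, so the caller goes straight to spelling.

```python
from typing import Callable, Optional


def verify_with_spelling(candidate: Optional[str],
                         confirm: Callable[[str], bool],
                         ask_spelling: Callable[[], str],
                         escalate: Callable[[], None]) -> Optional[str]:
    """Verify a matched part, falling back to spelling and then to manual transcription."""
    # Whole-word match played back and confirmed: no fallback needed.
    if candidate is not None and confirm(candidate):
        return candidate
    # (e1)-(e3): the caller spells the part; the letters are recognised and matched.
    spelled = ask_spelling()
    # (e4): play the letters back one by one, e.g. "S H A R P E".
    if confirm(" ".join(spelled)):
        return spelled  # (e5): spelling confirmed, continue with the next part
    # (e6): spelling rejected, so the information is transcribed manually.
    escalate()
    return None
```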

[0009] In one embodiment of the process according to the invention, the voice message comprises the name and address of the caller. Typically the first part of the message is the caller's surname and a second part of the message is the caller's first name.

[0010] Preferably further parts of the message include: (i) the postcode of the caller's address; (ii) the street name of the caller's address; and (iii) the street number of the caller's address.

BRIEF DESCRIPTION OF DRAWING

[0011] In order to facilitate a more comprehensive understanding of the nature of the invention, a preferred embodiment of the automated transcription process will now be described in detail, by way of example only, with reference to the accompanying drawings, in which:

Figure 1 is a flow chart of the sequence of steps implemented in a preferred embodiment of the automated transcription process in accordance with the present invention.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENT

[0012] The automated transcription process according to the invention is preferably implemented using a computer system connected to a telephone system that is capable of handling multiple incoming calls. The computer system incorporates state-of-the-art voice recognition software and is able to access one or more databases used for matching of incoming information. The entire process is fully automated and controlled by computer software up until the point where a caller may be switched to an operator, after which point the process may be completed manually.

[0013] The illustrated embodiment relates to an automated transcription process designed to transcribe the caller's name and address details into textual format, which can then be exported into other software applications for further processing on behalf of the caller. For example, word processing software may automatically generate a cover letter and address label using the name and address details in a mail merge, which is then printed and posted to the caller together with a product brochure.

[0014] A preferred embodiment of the automated transcription process will now be described with reference to Figure 1.

[0015] The system answers an incoming call from a caller at 100 and delivers the appropriate thank-you and welcome message. It then requests the caller to state their surname at 102. The system converts the surname into a digital format using speech recognition software and compares this first part of the message in digital format with a predefined listing of possible matching surnames stored in a name database at 104. A core speech recognition engine employed in the system typically divides the surname into separate syllables or phonetic units during the process of converting it into digital format. The database of names likewise incorporates a predefined listing of surnames broken down into identifiable combinations of phonetic units. Generating this database requires considerable expertise, because the speech recognition engine must be able to recognise particular names from their phonetic combinations while taking into account variations in pronunciation and accent. The database can also be periodically expanded and updated as required.
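Paragraph [0015] describes matching a surname by its phonetic units against an expertly built name database. As a much simpler stand-in, and not the technique the specification describes, the sketch below keys a name listing by American Soundex so that names which sound alike (for example SHARP and SHARPE) share a code. It assumes the recogniser has already produced a text hypothesis of the surname; `phonetic_candidates` is a hypothetical helper.

```python
from typing import Dict, List

# Soundex digit for each consonant group; vowels, H, W and Y have no code.
_CODES: Dict[str, str] = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(name: str) -> str:
    """American Soundex: first letter plus three digits, e.g. ROBERT -> R163."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    result = [letters[0]]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = _CODES.get(ch, "")
        if code and code != prev:
            result.append(code)
        if ch not in "HW":  # H and W do not separate letters with the same code
            prev = code
    return ("".join(result) + "000")[:4]


def phonetic_candidates(heard: str, listing: List[str]) -> List[str]:
    """All listed names whose Soundex code equals that of the recognised name."""
    key = soundex(heard)
    return [name for name in listing if soundex(name) == key]
```

Soundex collapses many distinct names onto one code, so in this sketch the returned list would still need ranking (or playback of several candidates) before verification.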

[0016] The system then selects a possible matching surname from the name database at 106 and converts it into an audio signal format ready for playback. The system also requests the caller to state their first name at 108, and then performs a similar process: it converts the first name into a digital format using the speech recognition software, compares the first name in digital format at 111 with a listing of possible matching first names stored in a name database in digital format, and selects a possible matching first name from the database at 112 before converting it into an audio signal format ready for playback. Both the first name and the surname are then played back to the caller at 114 for verification. If the caller confirms at 116 that both the first name and the surname are correct, essentially the same process is repeated for the second part of the message at 118, which in this case is the caller's postcode and suburb, as described further below.

[0017] However, if the caller indicates at 116 that one or both of the names are incorrect, the system asks the caller at 120 which name or names are incorrect and then requests the caller to spell the incorrect name letter by letter at 122. The system converts each letter spelled by the caller into a digital format using speech recognition software and compares each letter in digital format with a listing of letters stored in a database in digital format. It selects possible matching letters from the database, converts the matching letters into an audio signal format and plays them back to the caller for verification at 124. If the caller indicates at 126 that the spelling is correct, the system repeats the same process for the other name if necessary, before requesting the caller to state the second part of the message at 118. However, if the caller indicates at 126 that the spelling of the name is incorrect, the system switches the caller to an operator, who manually transcribes the information to complete the call. Alternatively, the caller's details are stored as an audio file, which can be used at a convenient time to transcribe the information manually. A choice between these two alternatives may be offered.
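The letter matching at 122 to 124 can be sketched as a lookup from recogniser tokens to letters. The token vocabulary in `LETTER_LISTING` (spoken letter names such as "ess" and "aitch") is a hypothetical stand-in for the stored listing of letters, and the recogniser is assumed to return one token per spelled letter; `letters_from_tokens` and `playback_form` are illustrative names.

```python
from typing import Dict, List, Optional

# Hypothetical listing of spoken letter names (as recogniser tokens) and their letters.
LETTER_LISTING: Dict[str, str] = {
    "ay": "A", "bee": "B", "see": "C", "dee": "D", "ee": "E", "ef": "F",
    "gee": "G", "aitch": "H", "eye": "I", "jay": "J", "kay": "K", "el": "L",
    "em": "M", "en": "N", "oh": "O", "pee": "P", "cue": "Q", "ar": "R",
    "ess": "S", "tee": "T", "you": "U", "vee": "V", "double-you": "W",
    "ex": "X", "why": "Y", "zed": "Z", "zee": "Z",
}


def letters_from_tokens(tokens: List[str]) -> Optional[str]:
    """Match each recognised token against the letter listing; None if any fails."""
    letters = []
    for token in tokens:
        t = token.strip().lower()
        if len(t) == 1 and t.isalpha():
            letters.append(t.upper())  # recogniser already produced a letter
        elif t in LETTER_LISTING:
            letters.append(LETTER_LISTING[t])
        else:
            return None  # unrecognised token: ask again or escalate
    return "".join(letters)


def playback_form(word: str) -> str:
    """Space the letters out so that they can be read back one at a time."""
    return " ".join(word)
```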

[0018] The caller is requested at 118 to state the postcode of the suburb in which they live. The system converts the postcode into a digital format using speech recognition software and compares the digitised postcode with a listing of possible matching postcodes stored in a database in digital format. Alternatively, the caller is requested at 118 to key in their postcode using the number keys on their telephone handset. The system then selects the closest possible matching postcode from the database and matches it with the corresponding suburb name. The suburb name is converted into an audio signal format and played back to the caller at 128 for verification. If the caller confirms at 130 that the suburb name is correct, the system prompts the caller to state the next part of the message at 132, which in this case is the street name of the caller's address. However, if the caller indicates at 130 that the suburb name is incorrect, the system requests the caller at 134 to spell the name of the suburb letter by letter, as at 122 above. If after repeating this process the system is still unable to match the correct suburb, the caller is switched to an operator for manual completion of the call.
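A minimal sketch of the postcode step at 118 to 128, assuming the recogniser or the handset keypad yields a digit string. `POSTCODES` is a hypothetical two-entry listing, keypad entry is assumed to end with '#', and "closest" is taken, as an assumption, to mean the listed postcode of the same length that differs in the fewest digit positions. An Australian postcode often covers several suburbs, so a real listing would more likely map each postcode to a list of suburb names.

```python
from typing import Dict, Optional

# Hypothetical postcode listing (postcode -> suburb name).
POSTCODES: Dict[str, str] = {"6000": "PERTH", "6004": "EAST PERTH"}


def digits_from_keypad(keys: str) -> str:
    """Keep the digits keyed before '#', which is assumed to end the entry."""
    return "".join(k for k in keys.split("#", 1)[0] if k.isdigit())


def closest_postcode(heard: str, listing: Dict[str, str]) -> Optional[str]:
    """Exact match if listed, else the same-length code differing in fewest positions."""
    if heard in listing:
        return heard
    same_length = [code for code in listing if len(code) == len(heard)]
    if not same_length:
        return None
    # Ties resolve to the earliest listed code, as min() keeps the first minimum.
    return min(same_length, key=lambda code: sum(a != b for a, b in zip(code, heard)))
```

The suburb name returned by `POSTCODES[closest_postcode(...)]` is what would be converted to audio and played back at 128.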

[0019] The caller is requested at 132 to state the street name of their address. When the caller states the street name at 134, the system converts it into a digital format using speech recognition software and compares the street name in digital format with a predefined listing of possible matching street names stored in a database in digital format. If the system is able to select a possible matching street name from the database at 136, it converts it into an audio signal format and plays it back to the caller at 138 for verification. If the caller confirms at 140 that the street name is correct, the system proceeds to the next stage of the process and prompts the caller at 146 to provide the next part of the message, which in this case is the house number of the caller's address (see Figure 1). However, if the system is unable to find a matching street name in the database at 136, the caller is requested to spell the street name letter by letter at 142. The system then performs a process similar to that described above, matching each letter converted to digital format with corresponding letters stored in the database in digital format. The possible matching letters of the street name are then played back to the caller at 144. If the caller confirms at 148 that the spelling of the street name is correct, the system requests the caller to provide the house number at 146. However, if the caller indicates at 148 that the spelling of the street name is incorrect, the system switches the caller to an operator, who completes the transcription of the required information manually.
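The decision at 136, between a street name that can be matched and one that cannot, can be sketched as below. As an assumption for illustration only, the standard-library `difflib.get_close_matches` is substituted for the speech recognition engine's own matching; the `cutoff` of 0.8 is arbitrary, and a `None` result stands for "no matching street name", which would lead to the spelling step at 142.

```python
import difflib
from typing import List, Optional


def match_street(heard: str, streets: List[str], cutoff: float = 0.8) -> Optional[str]:
    """Best listed street name similar to the recognised one, or None (go to spelling)."""
    listing = [street.upper() for street in streets]
    matches = difflib.get_close_matches(heard.upper(), listing, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

A higher `cutoff` sends more callers to the spelling step; a lower one plays back more doubtful matches for the caller to reject.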

When the caller states the house number at 150, the system converts the number into a digital format using the speech recognition software and compares the digitised house number with a predefined listing of possible matching numbers stored in a database in digital format. A possible matching number is selected from the database and converted into an audio signal format so that it can be played back to the caller at 152 for verification. Alternatively, the caller is requested at 150 to key in the street number using the number keys on their telephone handset. If the caller indicates at 154 that the number is incorrect, the caller is requested to restate the house number at 156, and the process of conversion, comparison, selection and playback at 152 is repeated. After two rejections, the caller is switched to an operator for manual completion of the transcription process.
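The house-number loop at 150 to 156 reduces to a bounded retry. In this sketch `hear_number` and `confirm` are hypothetical callbacks for recognition or keypad entry and for playback with the caller's yes/no answer, and `max_rejections=2` follows the text's "after two rejections".

```python
from typing import Callable, Optional


def collect_house_number(hear_number: Callable[[], str],
                         confirm: Callable[[str], bool],
                         max_rejections: int = 2) -> Optional[str]:
    """Ask for the house number until confirmed, or give up after max_rejections."""
    rejections = 0
    while rejections < max_rejections:
        number = hear_number()  # stated or keyed in (150), or restated (156)
        if confirm(number):     # played back (152) and confirmed (154)
            return number
        rejections += 1
    return None                 # switch the caller to an operator
```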

If the caller confirms at 154 that the house number is correct, the full street address in digital format is checked against an address database at 158. When the full street address has been located in the database it is played back to the caller at 160 for verification. When the caller has confirmed at 162 that the full street address is correct, both the name and address are stored at 164 in digital format as a text file that can be exported into other applications for further processing on behalf of the caller. However, if the caller indicates at 162 that the full street address is incorrect, the caller is switched to an operator at 166 who will complete the transcription process manually.
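A sketch of the final check at 158 to 164 and of the exported text file, using Python's standard `csv` module. The field names, the tuple form of the address key and the set standing in for the address database are assumptions, as the text does not specify an export format. Treating an address that cannot be located as a hand-off to an operator is also an assumption, since only the caller-rejection case at 162 is described.

```python
import csv
import io
from typing import Callable, Dict, Optional, Set, Tuple

# Hypothetical shapes: an address key and the exported record's columns.
Address = Tuple[str, str, str, str]  # (number, street, suburb, postcode)
FIELDS = ["first_name", "surname", "number", "street", "suburb", "postcode"]


def export_record(record: Dict[str, str]) -> str:
    """Serialise one confirmed name-and-address record as CSV text with a header."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerow(record)
    return buf.getvalue()


def finalise(record: Dict[str, str], known: Set[Address],
             confirm: Callable[[Address], bool]) -> Optional[str]:
    """Check the full address (158), confirm it (160-162), then export it (164)."""
    address = (record["number"], record["street"], record["suburb"], record["postcode"])
    if address not in known or not confirm(address):
        return None  # manual completion by an operator (166)
    return export_record(record)
```

CSV is only one choice; any format the downstream mail-merge software accepts would serve the same purpose.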

[0020] Now that a preferred embodiment of the automated transcription process has been described in detail, it will be apparent that it provides some significant advantages, including the following: (i) it can greatly expand the capacity of a call centre without the need for additional human operators; (ii) it can help to eliminate the need to place callers on hold for extended periods of time, which is frustrating for callers and bad for business; and (iii) it can be readily modified to transcribe a wide variety of voice messages, including messages in languages other than English.

[0021] Numerous variations and modifications will suggest themselves to persons skilled in the telecommunications, computer and speech recognition arts, in addition to those already described, without departing from the basic inventive concepts. For example, when the system asks a caller to spell out a word or words letter by letter, the caller may be given the option of keying in the letters using their telephone handset, if it has an alphanumeric keypad, in addition to or instead of spelling out each letter verbally. Similarly, when requested to provide the postcode of their suburb and their street number, the caller may be invited to key in the numbers using their telephone keypad, instead of or as well as stating them verbally. All such variations and modifications are to be considered within the scope of the present invention, the nature of which is to be determined from the foregoing description and the appended claims.
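Paragraph [0021] allows letters to be keyed on the handset instead of spoken. Assuming the ITU-T E.161 letter layout (2 = ABC through 9 = WXYZ) and multi-tap entry, where pressing a key n times selects its n-th letter, a decoder could look like the sketch below. How the system separates one group of presses from the next (for example by a pause) is assumed to be handled upstream, and `decode_multitap` is an illustrative name.

```python
from typing import List

# ITU-T E.161 letter assignment on a telephone keypad.
KEYPAD = {"2": "ABC", "3": "DEF", "4": "GHI", "5": "JKL",
          "6": "MNO", "7": "PQRS", "8": "TUV", "9": "WXYZ"}


def decode_multitap(groups: List[str]) -> str:
    """Decode multi-tap groups such as ["44", "2"] -> "HA"; each group is one letter."""
    letters = []
    for group in groups:
        if not group or group[0] not in KEYPAD or group != group[0] * len(group):
            raise ValueError(f"not a multi-tap group: {group!r}")
        options = KEYPAD[group[0]]
        # Pressing a key n times selects its n-th letter, wrapping round after the last.
        letters.append(options[(len(group) - 1) % len(options)])
    return "".join(letters)
```

The decoded word could then be played back and verified exactly as a spoken spelling would be.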

Claims

1. A process of automatically transcribing a voice message received by telephone into text, the process comprising the steps of: prompting a caller to state a first part of the message in accordance with a predetermined protocol; converting the first part of the message into a digital format using speech recognition software; comparing said first part of the message in digital format with a predefined listing of possible matching message parts stored in a database in digital format; selecting a possible matching message part from the database, converting it into an audio signal format and playing it back to the caller for verification; if the caller confirms that the message part played back is the same as the first part of the message stated by the caller, then repeating the foregoing steps for a second or further parts of the message, if any; and, converting each part of the message into a text file based on said confirmed message parts whereby, in use, said text file can be exported into other applications for further processing on behalf of the caller.

2. A process of automatically transcribing a message as defined in claim 1, wherein the verification step of the process comprises the further steps of: (e1) if the caller indicates that the message part played back is not the same as the part of the message stated by the caller, prompting the caller to spell the part of the message letter by letter; (e2) converting each letter spelled by the caller into a digital format using speech recognition software; (e3) comparing each letter in digital format with a listing of letters stored in a database in digital format; (e4) selecting possible matching letters from the database, converting the matching letters into an audio signal format and playing them back to the caller for verification; and (e5) if the caller confirms that the letters played back are the same as the letters spelled by the caller, then repeating the foregoing steps for a second or further part of the message, if any.

3. A process of automatically transcribing a message as defined in claim 1, wherein the verification step of the process comprises the still further step of: (e6) if the caller indicates that the letters played back are not the same as the letters spelled by the caller, then manual transcription of the caller's information occurs.

4. A process of automatically transcribing a message as defined in claim 1, wherein the voice message comprises the name and address of the caller and said database has the names and addresses of all persons in a defined geographical area stored in digital format.

5. A process of automatically transcribing a message as defined in claim 4, wherein the first part of the message is the caller's surname and a second part of the message is the caller's first name and the further parts of the message include: (iv) the postcode of the caller's address; (v) the street name of the caller's address; and (vi) the street number of the caller's address.

Dated this 10th day of June 2003
ACTIVECALL PTY LTD
By Their Patent Attorneys
GRIFFITH HACK
Fellows Institute of Patent and Trade Mark Attorneys of Australia