GB2523821A - Method of record keeping and a record keeping device - Google Patents

Method of record keeping and a record keeping device

Info

Publication number
GB2523821A
GB2523821A GB1404033.1A GB201404033A
Authority
GB
United Kingdom
Prior art keywords
record
entry
communication device
operable
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
GB1404033.1A
Other versions
GB201404033D0 (en)
Inventor
Stephen Andrew Humphries
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sony Interactive Entertainment Europe Ltd
Original Assignee
Sony Computer Entertainment Europe Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sony Computer Entertainment Europe Ltd filed Critical Sony Computer Entertainment Europe Ltd
Priority to GB1404033.1A
Publication of GB201404033D0
Publication of GB2523821A
Legal status: Withdrawn

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60 Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/63 Querying
    • G06F16/638 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60 Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/68 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/683 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/685 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using automatically derived transcript of audio data, e.g. lyrics
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/74 Browsing; Visualisation therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/10 Office automation; Time management
    • G06Q10/109 Time management, e.g. calendars, reminders, meetings or time accounting

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Human Resources & Organizations (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Library & Information Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Human Computer Interaction (AREA)
  • Strategic Management (AREA)
  • Computational Linguistics (AREA)
  • Acoustics & Sound (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Telephonic Communication Services (AREA)

Abstract

A communication device 100 processes speech in a two-way voice call via speech recognition (SR, 170), which identifies words or phrases relevant to a data-entry template and passes them to a record-keeping application 180, which assembles a record entry 190. Aspects of the invention include triggering an alert if a time conflict is detected from keywords linked to a schedule calendar or diary, and displaying the assembled record on a wearable display (500, fig. 1).

Description

METHOD OF RECORD KEEPING AND A RECORD KEEPING DEVICE
The present invention relates to a method of record keeping and a record keeping device.
Voice commands on mobile devices are becoming popular. For example, recent versions of the Google ® Android ® operating system allow for voice control of certain applications. Hence, for example, it is possible to use a predetermined instruction followed by specific details to create a diary entry in the mobile device. The example given in that documentation is to use the predetermined instruction "Create a calendar event" followed by three details in sequence: an event description, the day/date, and the time.
However, whilst using speech in this way instead of a keyboard can be convenient, there is still scope to improve the utility of speech inputs to mobile devices.
The present invention seeks to improve such utility.
In a first aspect, a communication device is provided in accordance with claim 1.
In another aspect, a communication system is provided in accordance with claim 0.
In another aspect, a method of record keeping is provided in accordance with claim 11.
Further respective aspects and features of the invention are defined in the appended claims.
Embodiments of the present invention will now be described by way of example with reference to the accompanying drawings, in which:
- Figure 1 is a schematic diagram of a communication system in accordance with an embodiment of the present invention, shown in conjunction with other communication elements;
- Figure 2 is a schematic diagram of a communication device in accordance with an embodiment of the present invention; and
- Figure 3 is a flow diagram of a method of record keeping in accordance with an embodiment of the present invention.
A method of record keeping and a record keeping device are disclosed. In the following description, a number of specific details are presented in order to provide a thorough understanding of the embodiments of the present invention. It will be apparent, however, to a person skilled in the art that these specific details need not be employed to practice the present invention. Conversely, specific details known to the person skilled in the art are omitted for the purposes of clarity where appropriate.
With reference to Figure 1, a mobile phone (such as a so-called 'smart phone') 100 is operable to make a call via one or more networks 300 to another phone 200, which may or may not also be a smart phone.
The call process typically comprises the mobile phone 100 uploading audio packets 101 to the receiving network, and downloading audio packets 102 from the receiving network. The downloaded packets are reassembled to generate an audio signal for the user of the mobile phone. Typically, the quality of the audio signal downloaded in this manner is poorer than the quality of audio obtained from the user via the microphone 110 of the mobile phone.
The or each network routes the packets between the phones. Audio packets 101 uploaded from the mobile phone are hence received as downloaded packets 201 by the other phone. Meanwhile, the other phone uploads voice packets 202 for eventual download to the mobile phone.
In this manner, two-way (duplex) communication between the two phones is possible, enabling a conversation between the local user of the mobile phone and a remote correspondent on the other phone.
Referring now to Figure 2, this shows a schematic diagram of features of the mobile phone 100.
The mobile phone receives audio from the microphone 110 (shown in Figure 1) and processes it using an audio-in processor 120, which may for example comprise an analogue to digital converter. The digitised audio data is then passed (possibly indirectly, for example via the central processor 150) to the modulator/demodulator 130, which packetizes the audio data, modulates it on an RF signal and sends it for transmission via antenna 135, as audio packet stream 101.
Meanwhile, the mobile phone receives at the antenna an audio packet stream 102 from the network, which is passed to the modulator/demodulator to demodulate and depacketise. The resulting audio data is passed (similarly possibly indirectly) to an audio-out processor 140, which may for example comprise a digital to analogue converter. The resulting analogue signal is then output via an integral speaker or connected ear piece (not shown) or retransmitted to a wireless earpiece via a second transmission scheme such as Bluetooth ® (not shown).
The modulator/demodulator and communications with the network are controlled by a call control processor 160, which may be part of a central processor 150. It will be appreciated that the audio in and out processors may also be part of the central processor 150, as may some or all of the modulator/demodulator unit, for example as part of a so-called system-on-a-chip.
In an embodiment of the present invention, the central processor is operable to run a record-keeping application 180, which may be stored in the phone's memory (not shown). The record-keeping application may, as non-limiting examples, be a calendar/diary application, or a To Do list application. As such, the application may hold or create one or more record entries 190.
In addition, the central processor (or a co-processor, not shown) is operable to run a voice recognition application 170.
In an embodiment of the present invention, the voice recognition application receives local digital audio data originating from the audio-in processor, originating from the user of the mobile phone, and also receives remote digital audio data originating from the correspondent on the other phone. It will be appreciated that the local and remote audio data may be routed to the voice recognition application via any reasonable data path within the mobile phone.
In an embodiment of the present invention, the voice recognition application operates on the local and remote audio signals separately, to avoid recognition problems caused by speech overlap between the parties. However the relative ordering or timing of detected speech from both parties can be used when determining the context of key words and phrases.
Typically the speech in each of the audio signals is turned into a so-called Mel-cepstrum, in which successive segments of the audio are converted to the frequency domain using a fast Fourier transform (FFT), and then the values at each frequency are collated into a series of overlapping bins of increasing bandwidth as they progress up the frequency scale. The bands follow the so-called Mel scale. This serves two purposes: firstly, it removes substantially all characteristics of vocal pitch from the formant structure of the speech (reducing unhelpful variability in detection), and secondly, it tends to more evenly distribute the discriminatory ability of the data between the generated bins, as compared with the raw FFT values.
The overlapping bins (for example, 11 or 19 bins) are then FFT'd again, to generate the so-called cepstrum (a spectrum of a spectrum). This transforms the Mel-warped formant structure of the speech into its constituent 'frequency' components, and hence is known as the Mel-cepstrum.
Typically the first 11 or 19 Mel-cepstrum coefficients, together with their first derivatives with respect to time, are used as observation values in a trained Hidden Markov model, or neural network, or other recognition system. The recognition system then outputs the most likely phoneme sequence to correspond with the observed values.
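As a concrete illustration of the front-end processing described above, the following Python sketch computes a Mel-cepstrum for a single audio frame. The function name, sample rate, and the use of a DCT as the second transform (the usual real-valued equivalent of the "spectrum of a spectrum") are illustrative choices, not details taken from the patent.

```python
import numpy as np

def mel_cepstrum(frame, sample_rate=8000, n_bins=19):
    """Compute a simple Mel-cepstrum for one audio frame.

    The frame is converted to the frequency domain with an FFT, the
    magnitudes are pooled into overlapping triangular bins spaced on
    the Mel scale, and a second transform of the log bin energies
    yields the cepstral coefficients.
    """
    # Magnitude spectrum of the windowed frame
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)

    # Mel scale: mel = 2595 * log10(1 + f/700)
    def hz_to_mel(f): return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m): return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    # Bin edges equally spaced in Mel, hence widening in Hz
    mel_edges = np.linspace(0.0, hz_to_mel(sample_rate / 2), n_bins + 2)
    hz_edges = mel_to_hz(mel_edges)

    # Pool the spectrum into overlapping triangular Mel bins
    energies = np.empty(n_bins)
    for i in range(n_bins):
        lo, mid, hi = hz_edges[i], hz_edges[i + 1], hz_edges[i + 2]
        rising = np.clip((freqs - lo) / (mid - lo), 0.0, 1.0)
        falling = np.clip((hi - freqs) / (hi - mid), 0.0, 1.0)
        energies[i] = np.sum(spectrum * np.minimum(rising, falling))

    # 'Spectrum of a spectrum': DCT-II of the log bin energies
    log_e = np.log(energies + 1e-10)
    n = np.arange(n_bins)
    return np.array([np.sum(log_e * np.cos(np.pi * k * (2 * n + 1) / (2 * n_bins)))
                     for k in range(n_bins)])
```

In practice these coefficients (and their time derivatives) would be computed per overlapping frame and fed to the recognition system or transmitted to the server, as described in the next paragraph.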
In an embodiment of the present invention, the Mel-cepstrum coefficients and optionally their derivatives are transmitted as data 401 to a server 400 that runs the recognition system and returns either the phoneme sequence or words based upon the phoneme sequence. This allows for the system to work on devices that either have insufficient processing power to perform the complete recognition process in real time themselves, or that prioritise battery life over the convenience of locally running the recognition system. Hence the voice recognition application may perform the complete recognition process, or may act as a front end pre-processor and receiver for a remote server-based recognition system.
In an embodiment of the present invention, the voice recognition application is arranged to extract information from either half of the conversation that is relevant to the currently running organiser application. The voice recognition application may provide an application interface allowing different organiser applications to specify the types of information they can use.
As is described later herein, the application may be already running, or may be a default application activated when the call is made, or may be an application that is activated in response to a detected key word or phrase.
In the case that more than one compatible application is running, then depending on the computing resources available the voice recognition application may serve the one application with the highest priority (where such priorities are either manufacturer and/or user set), or may serve each application that identifies itself via the application interface as processing resources allow.
In embodiments of the present invention, as suggested above a function of the voice recognition application is to provide text data relevant to the or each application, by recognising and transcribing words within the speech of both parties to the phone call.
Notably for a diary entry, for example, relevant information may come from both parties to the call.
Hence in a first sample conversation, a user is calling their doctor's surgery to make an appointment:

Surgery: "Hello, this is the Surgery, Jane speaking. How may I help you?"
Caller: "Hello, I'd like to make an appointment to see the Doctor please"
Surgery: "Of course, let me see when the next slot is available"
Caller: Waits
Surgery: "Hello? I can make an appointment for Wednesday the 4th of December at 3pm for you. Is that ok?"
Caller: "Yes. Today at 3pm is fine thank you."
Surgery: "Ok. I've booked you in to see Doctor Roberts at 3pm. Please try and arrive at least 10 minutes before your appointment. Thank you"
Caller: "Ok, Thank you. Goodbye"

Subsequently, a diary entry may be populated with the following information:

Date: December 4th, 2013
Time: 3pm
Alarm: 2:50pm
Reason: Doctor's Appointment
Location: The Local Surgery
Notes: Spoke to 'Jane', 'Doctor Roberts', Call length 3 Mins

The date information was given explicitly by Jane, and confirmed implicitly as today by the caller. The time information was also given explicitly by Jane, and confirmed explicitly by the caller. Linguistic cues such as 'December' and 'pm' can be used to classify the spoken numbers as dates or times. Hence in this instance, there can be high confidence in these entries.
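The population of a diary entry such as the one above can be sketched as follows. This is a hypothetical illustration: the field names and the idea of classified (type, value) pairs coming from the recogniser are assumptions for the sketch, not structures specified by the patent.

```python
def assemble_record(template_fields, identified):
    """Populate a record entry from (type, value) pairs identified
    during the call.

    template_fields maps a field name to the type of data it expects;
    anything identified that has no matching empty field is collected
    into a free-text Notes entry.
    """
    record = {name: None for name in template_fields}
    notes = []
    for kind, value in identified:
        # Fill the first still-empty field expecting this type of data
        target = next((name for name, expected in template_fields.items()
                       if expected == kind and record[name] is None), None)
        if target is not None:
            record[target] = value
        else:
            notes.append(value)
    record["Notes"] = ", ".join(notes)
    return record
```

With fields for a date and a time, a recognised name such as 'Jane' has no matching field and therefore lands in the Notes entry, mirroring the example entry above.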
When subsequently displaying the record, entries with a high confidence level may be colour-coded (for example, in green) so that when a user reviews the record, they can focus attention on those entries with lower confidence that are more likely to need correction, which may for example be coloured red. Where the recognition system returns probability values as well as phonemes or words, these can also be used to colour code the entries.
In one embodiment the alarm time is just a pre-set 10 minute advance provided by the diary application, which may be user-adjustable. In another embodiment, the system is able to parse the instruction to arrive 10 minutes earlier, and cross-reference it with the appointment time.
The 'reason' may be an example of a so-called template or schema; the call contained the words 'surgery', 'doctor' and 'appointment'. Optionally these words can be used to trigger a template of entries needed for making bookings, and specifically bookings to see a doctor. Hence the reason given is the name of the template, as the specific phrase 'Doctor's Appointment' did not occur in the conversation. Other templates that may require different information, such as 'Ordering a Taxi' or 'Restaurant Booking', will be apparent to the skilled person. Templates are described in more detail later herein.
The location may be mentioned in the conversation or known from previous diary entries, or may be obtained by looking up the address corresponding to the phone number that was called, either in locally held contact details or via the internet.
The notes are free fields for text likely to be relevant to any conversation, in particular recognised names, and optionally times or locations that were not automatically associated with a particular field of the record. It will be appreciated that some text may have no parsed meaning to the example diary application and so may be placed here, whilst other text (such as date and time) will be identified as relevant and can be presented to the diary application in a form that the diary can use when creating the entry.
To assist with such identification, templates or schema as mentioned above can be used to improve the accuracy with which records for a particular application or situation are created.
Firstly, at the recognition stage templates can be used to weight vocabulary selections in the look-up dictionary used to convert phoneme sequences into words; these dictionaries typically have grammatical constraints and may also weight similar sounding words according to relative frequency, so that in ambiguous circumstances the most grammatically correct and statistically likely word is suggested. Clearly in different templates, such as calling a doctor's surgery or booking a taxi, the relative probabilities of some words are likely to increase, and the template, or metadata associated with the template, can be used to change these relative probabilities accordingly so as to improve the accuracy of word selection.
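The template-weighted word selection described above can be sketched minimally as follows; the frequency and boost tables are hypothetical stand-ins for a real dictionary's statistics and a template's metadata.

```python
def pick_word(candidates, base_freq, template_boost):
    """Choose among similar-sounding candidate words by combining a
    general frequency prior with a template-specific weighting.

    base_freq maps words to their general relative frequency;
    template_boost maps words to a multiplier supplied by the active
    template (1.0, i.e. no change, when a word is not listed).
    """
    def score(word):
        return base_freq.get(word, 1e-6) * template_boost.get(word, 1.0)
    return max(candidates, key=score)
```

So a word that is ordinarily the less common of two similar-sounding alternatives can still win once the active template raises its relative probability.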
Secondly, at the record assembly stage, different templates may prompt different fields to be populated; for example a taxi booking template may provide for both 'from' and 'to' locations.
Different templates may also have optional branches associated with them; for example, booking templates may have an 'arrive by' optional field that is populated if an additional earlier absolute time is given, or if a relative time is given, as in the first sample conversation above.
It will be appreciated that whilst different templates may be triggered by detected keywords, more generally there may be a default template for an application. Hence for example the diary may have a default template comprising date and time information, and then a note field for other transcribed terms. Meanwhile a to-do application may have a default template that is a note field and a done/not done field.
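One way such templates and their trigger keywords might be declared is sketched below. The class, the example templates, and the selection rule are all hypothetical illustrations of the scheme described above, not structures defined by the patent.

```python
from dataclasses import dataclass, field

@dataclass
class Template:
    """A data-entry template: named fields, the type of data each
    expects, and optional keywords that trigger the template."""
    name: str
    fields: dict                                  # field name -> expected type
    keywords: list = field(default_factory=list)  # trigger words, if any

# A diary's default template: date, time, and a free note field
DIARY_DEFAULT = Template(
    name="diary",
    fields={"date": "date", "time": "time", "notes": "text"},
)

# A to-do application's default template
TODO_DEFAULT = Template(
    name="to-do",
    fields={"note": "text", "done": "flag"},
)

# A keyword-triggered booking template
DOCTOR_APPOINTMENT = Template(
    name="Doctor's Appointment",
    fields={"date": "date", "time": "time", "alarm": "time",
            "location": "text", "notes": "text"},
    keywords=["surgery", "doctor", "appointment"],
)

def select_template(transcript_words, templates, default):
    """Return the first template whose trigger keywords appear in the
    transcript, falling back to the application's default template."""
    for template in templates:
        if any(keyword in transcript_words for keyword in template.keywords):
            return template
    return default
```

A recogniser front end could call `select_template` as words are transcribed, switching from the default template to a booking template once a trigger word such as 'doctor' is heard.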
As noted above, different applications and/or templates may run concurrently, or, particularly where processing power is limited, may run in sequence. Hence for example a 'Contacts' application may be triggered when the user makes or receives a call with an unknown number (i.e. not a number in the existing contacts list). The associated template may comprise the number field and also a name field that may be quickly populated during the initial greetings of the user and the remote correspondent. The contact application may then close or pause while a new application is opened in response to a detected keyword or phrase, such as 'appointment' or 'meet up'.
Where applications or templates are activated in response to keywords, clearly some relevant information may have already been discussed prior to activation. Consequently, in an embodiment of the present invention, the preceding recognised text is retained in a memory (not shown) so that it can be used as required to populate the newly opened template entries.
Regardless of the current template selected, and notably unlike free transcription of speech, in an embodiment of the present invention the speech recognition application seeks to populate the entries or fields of the template using appropriate information uttered by either participant in the conversation, and that information may occur in any order. An advantage of using a template is that the communication device can then be responsive to a predetermined subset of cues within the free and unstructured dialogue between the two correspondents on the call. The information can therefore be parsed in a number of ways.
Firstly, linguistic cues can be used, such as 'my name is...', '... speaking', or 'hello...' for a name, or 'meet at...', 'opens at...', or 'see you at...' for a time or date, as noted previously.
Some cues may be globally searched for, whilst others may be specific to a particular template.
These cues can be used to classify the type of information a word or phrase represents (date, time, number, name, location or the like).
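A minimal cue-based classifier along these lines is sketched below. The regular expressions are illustrative and cover only the cue phrases used as examples in this description; a real system would need a far richer rule set or a statistical tagger.

```python
import re

# Hypothetical cue rules: each pattern classifies the word or phrase
# that accompanies a linguistic cue in the transcribed utterance.
CUE_RULES = [
    # 'at 3pm', 'by 7:30pm', 'for 8pm' -> a time
    ("time", re.compile(r"\b(?:at|by|for)\s+(\d{1,2}(?::\d{2})?\s*[ap]m)", re.I)),
    # 'Wednesday the 4th of December' -> a date
    ("date", re.compile(r"\b((?:mon|tues|wednes|thurs|fri|satur|sun)day"
                        r"\s+the\s+\d{1,2}(?:st|nd|rd|th)?\s+of\s+\w+)", re.I)),
    # 'my name is Jane', 'this is Jane' -> a name
    ("name", re.compile(r"\b(?:my name is|this is)\s+([A-Z][a-z]+)")),
]

def classify(utterance):
    """Return (type, value) pairs for every cue matched in the utterance."""
    found = []
    for kind, pattern in CUE_RULES:
        for match in pattern.finditer(utterance):
            found.append((kind, match.group(1)))
    return found
```

Running this over the first sample conversation would yield pairs such as ('name', 'Jane'), ('date', 'Wednesday the 4th of December') and ('time', '3pm'), ready to be matched against a template's fields by type.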
Next, a template may specify what types of data it expects in some or all of its fields.
Hence in a taxi or flight booking example, two different times may be expected; a pick-up time and a drop-off time. The times can be identified using linguistic cues and then used to populate the fields.
Simple heuristics may then be used to associate the earlier identified time with the pick-up field and the later time with the drop-off field. In a similar manner, in a restaurant booking example the fields and times may be 7:30 arrival for an 8:00 table.
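That ordering heuristic can be sketched as follows; the parsing helper and field names are illustrative assumptions.

```python
import re

def _minutes(time_str):
    """Parse e.g. '7:30pm' or '8pm' into minutes since midnight."""
    m = re.match(r"(\d{1,2})(?::(\d{2}))?\s*([ap]m)", time_str.strip(), re.I)
    hour = int(m.group(1)) % 12 + (12 if m.group(3).lower() == "pm" else 0)
    return hour * 60 + int(m.group(2) or 0)

def assign_times(times, ordered_fields):
    """Pair chronologically sorted times with fields listed in order,
    e.g. the earlier time with 'pick-up', the later with 'drop-off'."""
    return dict(zip(ordered_fields, sorted(times, key=_minutes)))
```

So for the restaurant example, times mentioned in either order are assigned with the 7:30 arrival preceding the 8:00 table.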
In an embodiment of the present invention, the mobile phone recognises speech whilst the call is being made, and uses recognised words and/or phrases to populate a record entry of one or more fields in an application. This is typically then presented to the user once the call has ended, for user verification and acceptance. At this point the user can modify any field to correct or add to it as applicable before accepting (or discarding) the entry.
In this way, the user advantageously does not have to memorise the details needed for the record entry during the conversation, or subsequently enter it themselves on their phone.
However, another problem with having a record keeping application like a diary on a smart phone is that when arranging a new appointment or booking, it can be difficult to interrupt the call in order to look at existing entries in the diary on the phone's display in order to confirm whether you are free or have a conflict.
Consequently in an embodiment of the present invention, diary entries created in the manner described above are compared with existing entries to determine if there is a clash. Typically the application performs this comparison, but alternatively an OS of the mobile device can interrogate the application for data and perform the comparison instead.
In the simplest form, the clash may be that two entries are for the same time. In a more complex form, the clash may be that the two entries are immediately adjacent. In a still more complex form, the clash may be that the locations associated with the two entries are too far apart for the user to reach the later appointment in the time available between them.
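All three forms of clash described above can be captured by one comparison over the existing entries; the function below is an illustrative sketch (the patent does not specify an algorithm), with the travel-time form reduced to a minimum required gap between entries.

```python
from datetime import datetime, timedelta

def find_clash(new_start, new_end, existing, min_gap=timedelta(0)):
    """Return the first existing entry clashing with a proposed one.

    existing is a list of (start, end) datetime pairs. A clash is
    reported when the entries overlap (including being at the same
    time), are immediately adjacent, or leave less than min_gap
    between them (e.g. insufficient travel time between locations).
    """
    for start, end in existing:
        # Overlapping (or identical) times
        if new_start < end and start < new_end:
            return (start, end)
        # Gap between the two entries, whichever comes first
        gap = (new_start - end) if new_start >= end else (start - new_end)
        if gap <= min_gap:
            return (start, end)
    return None
```

With min_gap zero this flags same-time and back-to-back entries; passing an estimated travel time as min_gap covers the third, location-based form.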
The mobile phone can alert the user to the clash by adding a warning sound to the reproduced audio (and/or a vibration or other detectable alert) to prompt the user to look at the phone, and displaying details of the clash on-screen.
Hence in a second sample conversation, the user is again calling their doctor's surgery to make an appointment:

Surgery: "Hello, this is the Surgery, Jane speaking. How may I help you?"
Caller: "Hello, I'd like to make an appointment to see the Doctor please"
Surgery: "Of course, let me see when the next slot is available"
Caller: Waits
Surgery: "Hello? I can make an appointment for Wednesday the 4th of December at 3pm for you. Is that ok?"
Caller: "Yes. Today at 3pm is fine thank you."
Device: Error sound
Device: Shows a calendar warning message - 'School Meeting Today at 3pm'
Caller: "Oh, no I can't do 3pm today. How about later today?"
Surgery: "Oh Ok. Let me see. Is 4pm OK for you?"
Device: Success sound
Caller: "Yes, that sounds fine"
Surgery: "Ok. I've booked you in to see Doctor Roberts at 4pm. Please try and arrive at least 10 minutes before your appointment. Thank you"
Caller: "Ok, Thank you. Goodbye"

In this case, the resulting suggested diary entry will be:

Date: December 4th, 2013
Time: 4pm
Alarm: 3:50pm
Reason: Doctor's Appointment
Location: The Local Surgery
Notes: Spoke to 'Jane', 'Doctor Roberts', Call length 4 Mins

Optionally the success sound may be used to indicate that any calculated diary entry is clear, or may only be used when a clash has previously been alerted (as in the above second example).
The activation of such sounds may be user selectable.
In the above embodiments, reference has been made to a mobile phone or smart phone.
However, it will be appreciated that any device capable of supporting two-way conversations together with an ancillary application that can be populated with user data, and at least the initial processing of a speech recognition application (for example up to the generation of Mel-cepstrum coefficients or the equivalent for transmission to a back-end server), may use the above described techniques.
Hence for example in addition to mobile phones, the invention may be provided in association with voice over IP (VoIP) applications such as Skype ® and any device capable of running such a VoIP application, such as a PDA, tablet, laptop, PC, networked TV, games console and the like.
Hence for example in-game chat between players of videogame consoles may be monitored in this fashion, for example to set up a diary reminder on a console for players to meet again to play a game online. In this case, details such as the IDs of the other players in the group, the game and the hosting server may be template fields as well as the date and time.
Optionally, the ancillary application may itself communicate some of the obtained data to a further application remote to the device; hence for example if a television programme is recommended by a friend during a conversation, the details may be passed to a programme scheduling application on the user's mobile phone. This in turn may contact servers of the user's TV provider, which then sends settings to the user's set top box to either record the programme or set a reminder. Again this can be subject to approval by the user at the end of the call.
Similarly optionally, devices that act as peripherals to such a communications device may be integrated into the technique; hence a smart watch 500 or other wearable display that communicates with such a communications device (for example via a Bluetooth ® link 501) may be used to display clash warnings, or to display the ongoing population of the diary entry (or any record entry) during the call so that the user can see what relevant information the device has already captured or needs clarifying.
Hence in a summary embodiment of the present invention, a communication device 100 (such as a smart phone or VoIP device as noted previously), operable to hold a two-way voice call with a remote communication device, comprises speech recognition means 170 operable to process speech from each of two correspondents in the two-way voice call. As noted previously, this speech recognition means may generate transcribed text locally, or may pre-process the speech for transmission to a remote server, and receive the transcribed speech back. The communication
device also comprises a record-keeping application means 180 (for example a diary application, to-do list application, or video recording request application etc., as run on the processor 150), having associated with it a data-entry template as described previously (not shown). Hence the data-entry template may specify one or more data entry fields and the type of data expected in each field. The speech recognition means is then operable to identify words or phrases from either of the two correspondents in the two-way voice call that are relevant to the associated data-entry template. The speech recognition means is then operable to pass identified words or phrases to the record-keeping application means, and the record-keeping application means is operable to assemble a record entry 190 for the record-keeping application during the course of the two-way voice call, responsive to the identified words or phrases. Hence the entries or fields in the record entry may be populated responsive to the type of identified word or phrase, for example as indicated by the context surrounding such words and phrases as described herein.
It will be appreciated that the speech recognition means and record-keeping application means may both be applications running on a central processor of the communication device.
In an instance of the summary embodiment, as noted above the speech recognition means comprises communication means operable to transmit data representative of speech from each of the two correspondents in the two-way voice call to a remote server 400, and operable to receive transcribed speech data from the remote server. It will be appreciated that the speech recognition means in this case may effectively comprise those other parts of the communication device already present for such general data communications.
In an instance of the summary embodiment, the record keeping application means is associated with one or more selected from the list consisting of: i. a default data-entry template; and ii. a data-entry template selected in response to one or more identified words or phrases.
As noted previously, such templates may be presented in parallel or concurrently.
In an instance of the summary embodiment, the record keeping application means indicates the format of a data-entry template to the speech recognition means. As noted previously, this may be done via an API or any suitable notification protocol. Hence the fields and data types of a template may be communicated to the speech recognition means by any compatible application in a standardised manner.
In an instance of the summary embodiment, identified words or phrases are matched to entries in the associated data-entry template according to type. Hence as noted previously, a time entry in a template will be populated with words or phrases identified as being times. The identification can be simple; for example the linguistic cues mentioned previously can set the type for subsequent numbers. Hence 'see you at 6pm' includes 'at' and 'pm', indicating that the '6' is a time. Meanwhile, 'see you at number 6, the high street' includes the word 'number' and indicators of a location, suggesting the number is part of a location entry. Dates, names and locations can be similarly parsed.
In an instance of the summary embodiment, the associated data-entry template comprises a time of event entry, and the record keeping application means comprises comparison means operable to compare the record being assembled with existing records, to detect any time conflict.
Consequently in this instance, the record keeping application means is operable to trigger an alert signal if a time conflict is detected. As noted previously, the alert may be an error sound superposed onto the signal already being output to the user.
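The comparison means described in the two paragraphs above amounts to checking a candidate time against existing records. A minimal sketch, assuming a simple fixed-width conflict window (the window size and names are hypothetical):

```python
from datetime import datetime, timedelta

def conflicts(new_start, existing_starts, slot=timedelta(hours=1)):
    """Hypothetical comparison means: flag a conflict when the new entry's
    time falls within `slot` of an existing record's time."""
    return any(abs(new_start - s) < slot for s in existing_starts)

existing = [datetime(2014, 3, 7, 18, 0)]   # existing record at 6pm
proposed = datetime(2014, 3, 7, 18, 30)    # new entry being assembled, 6:30pm

if conflicts(proposed, existing):
    print("ALERT: time conflict")          # would trigger the alert signal
else:
    print("no conflict")
```

On a real device the alert would be the error sound superposed onto the call audio rather than a printed message.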
In an instance of the summary embodiment, the record keeping application means is operable to display the assembled record to the user when the two-way voice call is complete, as this will be when the user is able to look at the device. However, it will also be appreciated that if the communication device has detected the use of a wireless headset such as a BlueTooth © headset, then optionally the record keeping application means can display the record entry being assembled during the course of the two-way voice call, as it is not necessary in this case for the user to hold the device to their ear. Similarly some devices, such as Skype © enabled TVs, can display the record as it is being assembled.

In an instance of the summary embodiment, the communication device comprises a user input means (typically a soft button contextually selected for the purpose of indicating approval, but potentially a dedicated hardware button), and the record keeping application means is operable to retain the assembled record if the user indicates approval of the assembled record with the user input means.

In a summary embodiment of the present invention, a communication system comprises a communication device implementing techniques as described herein, and a wearable display 500 such as a smart watch or pendant display that is operable to wirelessly communicate with the communication device. The communication device is arranged in operation to transmit data representative of the record entry being assembled during the course of the two-way voice call to the wearable display, and the wearable display is operable to then display the record entry being assembled during the course of the two-way voice call. Optionally, the wearable display may similarly comprise a user input means through which the user can indicate approval of the assembled record, and this approval is then transmitted back to the communication device.
Referring now to Figure 3, a method of record keeping for a two-way voice call comprises:
- In a first step s10, associating a data-entry template with a record-keeping application;
- In a second step s20, processing speech for speech recognition from each of two correspondents in the two-way voice call;
- In a third step s30, identifying words or phrases from either of the two correspondents in the two-way voice call that are relevant to the associated data-entry template; and
- In a fourth step s40, assembling a record entry for the record-keeping application during the course of the two-way voice call, responsive to the identified words or phrases.
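The four steps can be sketched end to end as follows. This is a hedged illustration only: the template shape, the word-spotting rules, and all names are assumptions, and step s20 (speech recognition) is stubbed out by passing a ready-made transcript:

```python
# Hypothetical end-to-end sketch of steps s10-s40: associate a template,
# take "recognised" speech as text, pick out typed words, assemble a record.
TEMPLATE = {"time": None, "location": None}        # s10: associated template

def identify(words):
    """s30: crude typed-word spotting using the cues discussed earlier."""
    found = {}
    for i, w in enumerate(words):
        if w.endswith("pm") or w.endswith("am"):
            found["time"] = w
        elif w == "number" and i + 1 < len(words):
            found["location"] = "number " + words[i + 1]
    return found

def assemble(template, transcript):
    """s40: populate a copy of the template from the transcript (s20 is
    assumed to have produced the transcript already)."""
    record = dict(template)
    record.update(identify(transcript.lower().split()))
    return record

print(assemble(TEMPLATE, "See you at 6pm at number 6 the high street"))
```

The record entry is assembled incrementally as words arrive during the call; here the whole transcript is processed in one pass for brevity.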
It will be apparent to a person skilled in the art that variations in the above method corresponding to operation of the various embodiments of the apparatus as described and claimed herein are considered within the scope of the present invention, including but not limited to:
- transmitting data representative of speech from each of the two correspondents in the two-way voice call to a remote server, and receiving transcribed speech data from the remote server;
- associating one of a plurality of data-entry templates with the record-keeping application in response to one or more identified words or phrases; and
- where the associated data-entry template comprises a time-of-event entry, comparing the record being assembled with existing records, to detect any time conflict, and if a time conflict is detected, triggering an alert signal.
It will be appreciated that the above methods may be carried out on conventional hardware suitably adapted as applicable by software instruction or by the inclusion or substitution of dedicated hardware.
Thus the required adaptation to existing parts of a conventional equivalent device may be implemented in the form of a computer program product comprising processor implementable instructions stored on a tangible non-transitory machine-readable medium such as a floppy disk, optical disk, hard disk, PROM, RAM, flash memory or any combination of these or other storage media, or realised in hardware as an ASIC (application specific integrated circuit) or an FPGA (field programmable gate array) or other configurable circuit suitable to use in adapting the conventional equivalent device. Separately, such a computer program may be transmitted via data signals on a network such as an Ethernet, a wireless network, the Internet, or any combination of these or other networks.

Claims (15)

  1. 1. A communication device (100) operable to hold a two-way voice call with a remote communication device, the communication device comprising: speech recognition means (170) operable to process speech from each of two correspondents in the two-way voice call; and record-keeping application means (180) having associated with it a data-entry template; and in which: the speech recognition means is operable to identify words or phrases from either of the two correspondents in the two-way voice call that are relevant to the associated data-entry template; the speech recognition means is operable to pass identified words or phrases to the record-keeping application means; and the record-keeping application means is operable to assemble a record entry (190) for the record-keeping application during the course of the two-way voice call, responsive to the identified words or phrases.
  2. 2. A communication device according to claim 1, in which the speech recognition means comprises: communication means operable to transmit data representative of speech from each of the two correspondents in the two-way voice call to a remote server (400), and operable to receive transcribed speech data from the remote server.
  3. 3. A communication device according to claim 1 or claim 2, in which the record keeping application means is associated with one or more selected from the list consisting of: i. a default data-entry template; and ii. a data-entry template selected in response to one or more identified words or phrases.
  4. 4. A communication device according to any one of claims 1 to 3, in which the record keeping application means indicates the format of a data-entry template to the speech recognition means.
  5. 5. A communication device according to any preceding claim, in which identified words or phrases are matched to entries in the associated data-entry template according to type.
  6. 6. A communication device according to any preceding claim in which: the associated data-entry template comprises a time of event entry; and in which the record keeping application means comprises comparison means operable to compare the record being assembled with existing records, to detect any time conflict.
  7. 7. A communication device according to claim 6, in which the record keeping application means is operable to trigger an alert signal if a time conflict is detected.
  8. 8. A communication device according to any preceding claim in which the record keeping application means is operable to display the assembled record to the user once the two-way voice call is complete.
  9. 9. A communication device according to claim 8, in which the communication device comprises: a user input means; and the record keeping application means is operable to retain the assembled record if the user indicates approval of the assembled record with the user input means.
  10. 10. A communication system, comprising: a communication device (100) according to any preceding claim; and a wearable display (500) operable to wirelessly communicate with the communication device; and in which: the communication device is arranged in operation to transmit data representative of the record entry being assembled during the course of the two-way voice call; and the wearable display is operable to display the record entry being assembled during the course of the two-way voice call.
  11. 11. A method of record keeping for a two-way voice call, comprising the steps of: associating a data-entry template with a record-keeping application; processing speech for speech recognition from each of two correspondents in the two-way voice call; identifying words or phrases from either of the two correspondents in the two-way voice call that are relevant to the associated data-entry template; and assembling a record entry for the record-keeping application during the course of the two-way voice call, responsive to the identified words or phrases.
  12. 12. A method of record keeping according to claim 11, comprising the steps of: transmitting data representative of speech from each of the two correspondents in the two-way voice call to a remote server; and receiving transcribed speech data from the remote server.
  13. 13. A method of record keeping according to claim 11 or claim 12, comprising the step of associating one of a plurality of data-entry templates with the record-keeping application in response to one or more identified words or phrases.
  14. 14. A method of record keeping according to any one of claims 11 to 13 in which the associated data-entry template comprises a time-of-event entry, comprising the steps of: comparing the record being assembled with existing records, to detect any time conflict; and if a time conflict is detected, triggering an alert signal.
  15. 15. A computer program for implementing the steps of any preceding method claim.
GB1404033.1A 2014-03-07 2014-03-07 Method of record keeping and a record keeping device Withdrawn GB2523821A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
GB1404033.1A GB2523821A (en) 2014-03-07 2014-03-07 Method of record keeping and a record keeping device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB1404033.1A GB2523821A (en) 2014-03-07 2014-03-07 Method of record keeping and a record keeping device

Publications (2)

Publication Number Publication Date
GB201404033D0 GB201404033D0 (en) 2014-04-23
GB2523821A true GB2523821A (en) 2015-09-09

Family

ID=50554676

Family Applications (1)

Application Number Title Priority Date Filing Date
GB1404033.1A Withdrawn GB2523821A (en) 2014-03-07 2014-03-07 Method of record keeping and a record keeping device

Country Status (1)

Country Link
GB (1) GB2523821A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2004083981A2 (en) * 2003-03-20 2004-09-30 Creo Inc. System and methods for storing and presenting personal information
EP1793571A1 (en) * 2005-11-30 2007-06-06 Alcatel Lucent Calendar interface for digital communications
WO2008137563A2 (en) * 2007-05-03 2008-11-13 Sonus Networks, Inc. Service integration on a network


Also Published As

Publication number Publication date
GB201404033D0 (en) 2014-04-23

Similar Documents

Publication Publication Date Title
US11810554B2 (en) Audio message extraction
US11944437B2 (en) Determination of content services
JP6945695B2 (en) Utterance classifier
US20210193176A1 (en) Context-based detection of end-point of utterance
US20230267921A1 (en) Systems and methods for determining whether to trigger a voice capable device based on speaking cadence
US10540970B2 (en) Architectures and topologies for vehicle-based, voice-controlled devices
US9443527B1 (en) Speech recognition capability generation and control
US9633674B2 (en) System and method for detecting errors in interactions with a voice-based digital assistant
US20100138224A1 (en) Non-disruptive side conversation information retrieval
US9583107B2 (en) Continuous speech transcription performance indication
US9704478B1 (en) Audio output masking for improved automatic speech recognition
US8909534B1 (en) Speech recognition training
US8510103B2 (en) System and method for voice recognition
US20150348538A1 (en) Speech summary and action item generation
KR102097710B1 (en) Apparatus and method for separating of dialogue
CN107622768B (en) Audio cutting device
JP2014063088A (en) Voice recognition device, voice recognition system, voice recognition method and voice recognition program
US20230206897A1 (en) Electronic apparatus and method for controlling thereof
US10002611B1 (en) Asynchronous audio messaging
US11580954B2 (en) Systems and methods of handling speech audio stream interruptions
JP2015087649A (en) Utterance control device, method, utterance system, program, and utterance device
GB2523821A (en) Method of record keeping and a record keeping device
US20120330666A1 (en) Method, system and processor-readable media for automatically vocalizing user pre-selected sporting event scores
US10965391B1 (en) Content streaming with bi-directional communication
JP2017122930A (en) Speech controller unit, method, speech system, and program

Legal Events

Date Code Title Description
WAP Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1)