GB2508790A - Method and System for Allocating Voice Data to Sections of a Document - Google Patents
- Publication number
- GB2508790A (application GB201212745A)
- Authority
- GB
- United Kingdom
- Prior art keywords
- sections
- voice data
- document
- keywords
- transcribed word
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/10—Office automation; Time management
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
Abstract
A system and method of machine allocation of voice data to one or more sections of a document according to keywords of the sections of the document, the method comprising (i) recording the voice data (10), (ii) using a voice recognition device to transcribe at least part of the voice data into voice data text (12), (iii) extracting a transcribed word from the voice data text (14), (iv) comparing the transcribed word with the keywords of the sections of the document to ascertain one or more matching sections comprising the transcribed word (16), (v) assigning a score to the one or more matching sections comprising the transcribed word using one or more predetermined rules (18), (vi) repeating steps (iii) to (v) for any further transcribed words in the voice data text (20), (vii) using the scores to determine one or more best matching sections (22), and (viii) allocating the voice data text to the one or more best matching sections of the document (24), or (ix) communicating the one or more best matching sections of the document to a user and allowing the user to direct allocation of the voice data text to one or more sections of the document (26).
Description
Method and System for Allocating Voice Data to Sections of a Document

The invention relates to a method and system for allocating voice data to sections of a document, particularly, but not exclusively, a document comprising a template for a report.
There are many different types of documents, for example reports, which are commonly produced by dictation of voice data onto a recording device, subsequent manual transcription of the voice data into one or more files of voice data text, and manual allocation of the voice data text into appropriate sections of the report. For example, a surveyor may visit a building to be surveyed and record voice data (dictations) relating to various parts of the building. The voice data is then transcribed, usually by a secretary, into voice data text comprising a block of text relating to each of the parts of the building. It is labour intensive and time consuming to use a secretary to transcribe the voice data. The surveyor then has to allocate appropriate parts of the voice data text to appropriate sections of the report, which is also time consuming.
Machine transcription of voice data into one or more files of voice data text is becoming increasingly popular. However, this can be unreliable and inaccurate, and this still leaves the issue of allocation of the voice data text to appropriate sections of the report. It would be useful if the allocation of the voice data text into sections of the report could also be performed by a machine.
According to a first aspect of the invention there is provided a method of machine allocation of voice data to one or more sections of a document according to keywords of the sections of the document, the method comprising (i) recording the voice data, (ii) using a voice recognition device to transcribe at least part of the voice data into voice data text, (iii) extracting a transcribed word from the voice data text, (iv) comparing the transcribed word with the keywords of the sections of the document to ascertain one or more matching sections comprising the transcribed word, (v) assigning a score to the one or more matching sections comprising the transcribed word using one or more predetermined rules, (vi) repeating steps (iii) to (v) for any further transcribed words in the voice data text, (vii) using the scores to determine one or more best matching sections, and (viii) allocating the voice data text to the one or more best matching sections of the document, or (ix) communicating the one or more best matching sections of the document to a user and allowing the user to direct allocation of the voice data text to one or more sections of the document.
The voice data may comprise a word. This could be a word which is expected to match at least one keyword of at least one section of the document. The voice data may comprise two or more words. These could be words which are expected to match keywords of sections of the document. The voice data may comprise a portion of free speech. This could form a narrative for the document and comprise one or more sentences comprising one or more words which are expected to match keywords of sections of the document. The keywords of a document may be predetermined by a creator of the document according to a genre of the document. For example, the document may be a template or report for a building survey, and the predetermined keywords for the document would then relate to surveying buildings.
Using the voice recognition device to transcribe the voice data into voice data text may comprise using the device to identify and transcribe one or more words of the voice data in realtime. This could take place, for example, as the voice data is recorded.
Comparing the transcribed word with the keywords of the sections of the document to ascertain one or more matching sections may comprise comparing the transcribed word with one or more keywords in headings of the sections. Additionally or alternatively, comparing the transcribed word with the keywords of the sections of the document to ascertain one or more matching sections may comprise comparing the transcribed word with one or more keywords in bodies of the sections.
Assigning a score to the one or more matching sections comprising the transcribed word may comprise assigning a score according to a time of occurrence of the transcribed word in the voice data. Assigning a score to the one or more matching sections comprising the transcribed word may comprise assigning a score according to a frequency of occurrence of the transcribed word in the voice data. Assigning a score to the one or more matching sections comprising the transcribed word may comprise assigning a score according to hierarchical positioning of the one or more matching sections in the document.
Using the scores to determine one or more best matching sections may comprise accumulating scores for each matching section and determining the matching section or matching sections with a highest score as the best matching section or best matching sections.
Allocating the voice data text to the one or more best matching sections of the document may be done in real time or on termination of recording of the voice data or at a later time.
Communicating the one or more best matching sections of the document to a user and allowing the user to direct allocation of the voice data text to one or more sections of the document may comprise displaying the one or more best matching sections to the user and allowing the user to direct allocation of the voice data text to one or more sections of their choice. Communicating the one or more best matching sections of the document to a user and allowing the user to direct allocation of the voice data text to one or more sections of the document may comprise displaying the one or more best matching sections and one or more further sections within a certain probability of the one or more best matching sections to the user and allowing the user to direct allocation of the voice data text to one or more sections of their choice.
The method may comprise a further step between step (iii) and step (iv) comprising comparing the transcribed word with a keyword dictionary comprising the keywords of the sections of the document, and, if the transcribed word is not in the keyword dictionary, repeating for any further transcribed words in the voice data text, or, if the transcribed word is in the keyword dictionary, continuing to step (iv).
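The dictionary pre-filter step described above can be sketched in Python as follows; the function name and the example keyword dictionary are illustrative assumptions, not taken from the patent:

```python
def filter_keywords(transcribed_words, keyword_dictionary):
    """Yield only those transcribed words that appear in the keyword
    dictionary, so the per-section comparison (step (iv)) is skipped
    for words that cannot match any section."""
    for word in transcribed_words:
        if word.lower() in keyword_dictionary:
            yield word

# Hypothetical dictionary built from all keywords of all sections
# of a building-survey template.
dictionary = {"roof", "gutter", "damp", "window"}
words = ["the", "roof", "shows", "damp", "patches"]
print(list(filter_keywords(words, dictionary)))  # ['roof', 'damp']
```

The point of the pre-filter is efficiency: a non-keyword is rejected with one dictionary lookup rather than one comparison per section.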
According to a second aspect of the invention there is provided a voice data allocation system for allocation of voice data to one or more sections of a document according to keywords of the sections of the document, the system comprising a recorder for recording the voice data, a voice recognition device for transcribing the voice data into voice data text, and an allocation device comprising an extractor for extracting a transcribed word from the voice data text, a comparator for comparing the transcribed word with the keywords of the sections of the document to ascertain one or more matching sections comprising the transcribed word, a scorer for assigning a score to the one or more matching sections using one or more predetermined rules, a best match determinator for using the scores to determine one or more best matching sections, and an allocator for allocating the voice data text to the one or more best matching sections of the document or communicating the one or more best matching sections of the document to a user and allowing the user to direct allocation of the voice data text.
According to a third aspect of the invention there is provided a voice data allocation device comprising an extractor for extracting a transcribed word from voice data text, a comparator for comparing the transcribed word with keywords of sections of a document to ascertain one or more matching sections comprising the transcribed word, a scorer for assigning a score to the one or more matching sections using one or more predetermined rules, a best match determinator for using the scores to determine one or more best matching sections, and an allocator for allocating the voice data text to the best matching section or to the best matching sections of the document or communicating the one or more best matching sections of the document to a user and allowing the user to direct allocation of the voice data text.
An embodiment of the invention will now be described by way of example only with reference to the accompanying drawings, in which:

Figure 1 is a schematic representation of a voice data allocation system according to the second aspect of the invention comprising a voice data allocation device according to the third aspect of the invention,

Figure 2 is a flow chart representing the method of the first aspect of the invention,

Figure 3 is a flow chart representing further details of the method of the first aspect of the invention, and

Figure 4 is a flow chart representing hierarchy score determination details of the method of the first aspect of the invention.
Referring to Figure 1, the voice data allocation system 1 comprises a recorder 3, a voice recognition device 5 and a voice data allocation device 7. The allocation device 7 comprises an extractor, a comparator, a scorer, a best match determinator and an allocator (not shown separately). It will be appreciated that the allocation system 1 may comprise other, conventional, modules, such as an input module and an output module.

The allocation system 1 may be provided in a portable computing device such as a smartphone, a PDA, a tablet, a laptop computer or any other similar device. This allows a user to take the allocation system on site, where the voice data for a document is to be produced; e.g. a user can take the allocation system to a site to be surveyed for recording of voice data and allocation of voice data text for a surveyor's report. Alternatively, or additionally, the allocation system 1 may be provided in a non-portable computing device such as a desktop computer. To provide the allocation system 1 in a computing device, the recorder 3, the voice recognition device 5 and the allocation device 7 are loaded, as software, onto the computing device and make use of the components of the computing device such as a display component, power source, microphone, etc. Alternatively, the allocation device 7 alone is loaded, as software, onto the computing device and makes use of a recorder and a voice recognition device of, or associated with, the computing system, and of other components of the computing device such as a display component, power source, etc.

To use the voice data allocation system 1, the user first chooses a document into which the voice data is to be allocated. The voice data allocation system 1 may comprise or have access to one or more documents which are presented to the user. The documents may comprise templates for various types of report, such as surveyor's reports, fire reports, etc.
The user may have previously created one or more such templates. This may be done by identifying sections of the template, establishing a hierarchical position in the template for each section of the template, and assigning a keyword or keywords to the sections of the template. The keywords for the sections of the template may be keywords in headings of the sections and/or keywords in bodies of the sections.
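A template of this kind might be represented as follows; this is a minimal sketch, and the section names, keywords and hierarchy are invented for illustration:

```python
from dataclasses import dataclass
from typing import Optional, Set

@dataclass
class Section:
    name: str
    keywords: Set[str]            # keywords from the heading and/or body
    parent: Optional[str] = None  # name of the parent section, if any

# Hypothetical fragment of a surveyor's report template.
template = {
    "Exterior": Section("Exterior", {"exterior", "outside"}),
    "Roof": Section("Roof", {"roof", "tile", "slate"}, parent="Exterior"),
    "Gutters": Section("Gutters", {"gutter", "downpipe"}, parent="Roof"),
}
```

Recording each section's parent makes the hierarchical positioning of Figure 4 directly available when hierarchy scores are computed later.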
Once the document is chosen, the user uses the allocation system 1 to allocate voice data text to appropriate sections of the document. Referring to Figure 2, the user records the voice data (step 10). It will be appreciated that this may be done in a variety of ways; for example, the user may press a button or key on the computing device to activate the recorder 3 of the allocation system 1 to start recording of voice data, deliver the voice data, and press a button or key on the computing device to deactivate the recorder 3 to stop recording of voice data. Alternatively, the user may hold down a button or key of the computing device throughout delivery of the voice data to enable recording of the data by the recorder 3. During recording of the voice data or once recording has been completed, the voice recognition device 5 is used to transcribe the voice data into voice data text (step 12). During recording/transcribing of the voice data or once recording/transcribing has been completed, the voice data allocation device 7 operates to allocate the voice data to an appropriate section or sections of the document. A first transcribed word is extracted from the voice data text (step 14). The transcribed word is compared with the keywords of the sections of the document to ascertain one or more matching sections comprising the transcribed word (step 16). A score is assigned to the one or more matching sections comprising the transcribed word using one or more predetermined rules (step 18). Steps 14, 16 and 18 are then repeated for any remaining transcribed words in the voice data text (step 20). Using the scores, the one or more best matching sections are determined (step 22) and the voice data text is allocated to the one or more best matching sections of the document (step 24) by the voice data allocation device 7.
Alternatively, the voice data allocation device 7 communicates the one or more best matching sections of the document to the user and allows the user to direct allocation of the voice data text to one or more sections of the document (step 26). It will be appreciated that multiple voice data, e.g. in the form of multiple samples of voice data, may be recorded, transcribed and allocated to sections of a document as described above. The samples of voice data could comprise one or more words or one or more portions of free speech. The samples of voice data could, for example, be contained in separate recordings made using the recorder 3.
Referring to Figure 3, further details of the method of allocation of voice data text to a document will now be described. Once the voice data has been recorded and transcribed into one or more transcribed words, TW(n), a first transcribed word, TW(1), is extracted from the voice data text. This is first compared with keywords in a dictionary which has been created from all keywords of all sections of the document. If the transcribed word TW(1) is not in the dictionary, then the method moves on to the next transcribed word of the voice data text. If the transcribed word TW(1) is in the dictionary, then the further steps of the method are carried out for this word. In this way, if any transcribed word is not a keyword, then time is not spent separately comparing it with keywords of each of the sections of the document. Assuming that the transcribed word TW(1) is found in the dictionary, a time mark is calculated for TW(1) according to a time of occurrence of TW(1) in the voice data text. The closer the time of occurrence of a transcribed word is to the start of the voice data text, the higher the time mark will be. The transcribed word TW(1) is then compared to the or each keyword of a first section, S(1), of the sections S(m) of the document. If TW(1) matches a keyword in section S(1), i.e. section S(1) is a matching section of the document, then the time mark for TW(1) is assigned to section S(1) as a score. Comparison of the transcribed word TW(1) is repeated for all other sections S(m) of the document. Each time a matching section is found, the time mark for TW(1) is assigned to that section as a score.
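The time-mark scoring just described can be sketched in Python. The linear decay below is only one plausible rule — the patent says earlier occurrences score higher but does not fix a formula — and the sections and transcript are invented for illustration:

```python
def time_mark(position, total_words):
    """Earlier occurrences score higher; this linear decay is an
    assumed weighting, not one specified by the patent."""
    return (total_words - position) / total_words

def score_word(word, position, total_words, sections, scores):
    """Assign the word's time mark to every section whose keyword set
    contains the word (steps 16 and 18)."""
    mark = time_mark(position, total_words)
    for name, keywords in sections.items():
        if word in keywords:
            scores[name] = scores.get(name, 0.0) + mark
    return scores

# Invented sections and transcript for illustration.
sections = {"Roof": {"roof", "tile"}, "Damp": {"damp", "moisture"}}
scores = {}
words = ["roof", "damp", "tile"]
for i, w in enumerate(words):
    score_word(w, i, len(words), sections, scores)
# "roof" (position 0) contributes 1.0 and "tile" (position 2) contributes
# 1/3 to the Roof section; "damp" (position 1) contributes 2/3 to Damp.
```

Because scores accumulate, a word that occurs several times raises its matching sections' totals on each occurrence, which is how the frequency-of-occurrence rule falls out of the same loop.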
A score is then assigned to the one or more matching sections of the transcribed word TW(1) according to a hierarchical positioning of the one or more matching sections in the document. In this embodiment, the sections of the document are arranged within the document in a hierarchical structure including one or more level one sections, which comprise one or more level two sections, which, in turn, comprise one or more level three sections, etc. The level one sections are said to be the parent sections of the level two sections, and the level two sections are said to be the parent sections of the level three sections, etc. This hierarchical structure is by way of example only; it will be appreciated that the document can comprise sections laid out in other hierarchical structures. Referring to Figure 4, for each matching section MS(r) of the transcribed word TW(1), a hierarchy score is determined as follows. For a first matching section MS(1), it is determined if this matching section has a parent section. If matching section MS(1) does not have a parent section, the matching section MS(1) hierarchy score, HS(1), is determined to be zero, and the method is started for the next matching section MS(2). If matching section MS(1) does have a parent section, it is then determined if the parent section has a score. If the parent section does not have a score, the matching section MS(1) hierarchy score, HS(1), is determined to be zero, and the method is started for the next matching section MS(2). If the parent section does have a score, the matching section MS(1) hierarchy score, HS(1), is determined to be a weighted value of the parent section score, and the method is started for the next matching section MS(2). The method is repeated for each of the matching sections, MS(r), of the document for the transcribed word TW(1).
In this way, the matching sections may automatically inherit part of the scores of keywords of their parent sections.
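The Figure 4 logic for this inheritance might be sketched as follows; the 0.5 weight and the section names are illustrative assumptions, since the patent leaves the weighting open:

```python
def hierarchy_score(section, parents, scores, weight=0.5):
    """A matching section inherits a weighted fraction of its parent's
    score, or zero if it has no parent or the parent has no score.
    The 0.5 default weight is an assumption, not from the patent."""
    parent = parents.get(section)
    if parent is None:
        return 0.0
    return weight * scores.get(parent, 0.0)

# Hypothetical hierarchy: Gutters sits under Roof, Roof under Exterior.
parents = {"Gutters": "Roof", "Roof": "Exterior", "Exterior": None}
scores = {"Roof": 2.0}
print(hierarchy_score("Gutters", parents, scores))   # 1.0 (half of Roof's 2.0)
print(hierarchy_score("Exterior", parents, scores))  # 0.0 (no parent)
```

Note that "Roof" itself would also score zero here, because its parent "Exterior" has no score of its own.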
A second transcribed word, TW(2), and third transcribed word, TW(3), etc. are then extracted from the voice data text file and the process described above repeated for each word. If two or more of the transcribed words are the same word and match a keyword of one or more sections of the document, a score will be assigned to the one or more matching sections for each occurrence of the word. In this way, a score is assigned to the one or more matching sections according to a frequency of occurrence of the word in the voice data text, in addition to a score being assigned to the one or more matching sections according to a time of occurrence of the word in the voice data text and a score being assigned to the one or more matching sections according to a hierarchical positioning of the one or more matching sections in the document.
The one or more best matching sections of the document are then determined, by using the scores. This involves accumulating the scores (time scores, frequency scores and hierarchy scores) assigned to each matching section and determining the one or more matching sections with a highest score to be the one or more best matching sections.
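The accumulation and best-match determination of step 22 might look like this; the score values are made up for illustration:

```python
def best_matching_sections(time_scores, freq_scores, hierarchy_scores):
    """Accumulate the three kinds of score per section and return the
    section(s) sharing the highest total."""
    totals = {}
    for part in (time_scores, freq_scores, hierarchy_scores):
        for section, score in part.items():
            totals[section] = totals.get(section, 0.0) + score
    top = max(totals.values())
    return [s for s, v in totals.items() if v == top]

# Hypothetical per-rule scores for three sections.
print(best_matching_sections({"Roof": 1.2, "Damp": 0.4},
                             {"Roof": 2.0, "Damp": 3.0},
                             {"Gutters": 0.6}))  # ['Damp']
```

Returning a list rather than a single section matches the patent's allowance for ties, where more than one best matching section may be communicated to the user.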
The voice data allocation device 7 then allocates the voice data text to the one or more best matching sections. Alternatively, the voice data allocation device 7 displays the one or more best matching sections to the user, who is allowed to direct allocation of the voice data text to one or more sections of the document. In this way, the user can agree with the voice data allocation device's choice of the one or more best matching sections, or override that choice and choose a different section or sections for allocation of the voice data text.
Claims (20)
- 1. A method of machine allocation of voice data to one or more sections of a document according to keywords of the sections of the document, the method comprising (i) recording the voice data, (ii) using a voice recognition device to transcribe at least part of the voice data into voice data text, (iii) extracting a transcribed word from the voice data text, (iv) comparing the transcribed word with the keywords of the sections of the document to ascertain one or more matching sections comprising the transcribed word, (v) assigning a score to the one or more matching sections comprising the transcribed word using one or more predetermined rules, (vi) repeating steps (iii) to (v) for any further transcribed words in the voice data text, (vii) using the scores to determine one or more best matching sections, and (viii) allocating the voice data text to the one or more best matching sections of the document, or (ix) communicating the one or more best matching sections of the document to a user and allowing the user to direct allocation of the voice data text to one or more sections of the document.
- 2. A method according to claim 1 in which the voice data comprises a word which is expected to match at least one keyword of at least one section of the document.
- 3. A method according to claim 1 in which the voice data comprises two or more words which are expected to match keywords of sections of the document.
- 4. A method according to claim 1 in which the voice data comprises a portion of free speech which forms a narrative for the document and comprises one or more sentences comprising one or more words which are expected to match keywords of sections of the document.
- 5. A method according to any preceding claim in which the keywords of a document are predetermined by a creator of the document according to a genre of the document.
- 6. A method according to any preceding claim in which using the voice recognition device to transcribe the voice data into voice data text comprises using the device to identify and transcribe one or more words of the voice data in real time.
- 7. A method according to any preceding claim in which comparing the transcribed word with the keywords of the sections of the document to ascertain one or more matching sections comprises comparing the transcribed word with one or more keywords in headings of the sections.
- 8. A method according to any preceding claim in which comparing the transcribed word with the keywords of the sections of the document to ascertain one or more matching sections comprises comparing the transcribed word with one or more keywords in bodies of the sections.
- 9. A method according to any preceding claim in which assigning a score to the one or more matching sections comprising the transcribed word comprises assigning a score according to a time of occurrence of the transcribed word in the voice data.
- 10. A method according to any preceding claim in which assigning a score to the one or more matching sections comprising the transcribed word comprises assigning a score according to a frequency of occurrence of the transcribed word in the voice data.
- 11. A method according to any preceding claim in which assigning a score to the one or more matching sections comprising the transcribed word comprises assigning a score according to hierarchical positioning of the one or more matching sections in the document.
- 12. A method according to any preceding claim in which using the scores to determine one or more best matching sections comprises accumulating scores for each matching section and determining the matching section or matching sections with a highest score as the best matching section or best matching sections.
- 13. A method according to any preceding claim in which communicating the one or more best matching sections of the document to a user and allowing the user to direct allocation of the voice data text to one or more sections of the document comprises displaying the one or more best matching sections to the user and allowing the user to direct allocation of the voice data text to one or more sections of their choice.
- 14. A method according to any of claims 1 to 12 in which communicating the one or more best matching sections of the document to a user and allowing the user to direct allocation of the voice data text to one or more sections of the document comprises displaying the one or more best matching sections and one or more further sections within a certain probability of the one or more best matching sections to the user and allowing the user to direct allocation of the voice data text to one or more sections of their choice.
- 15. A method according to any preceding claim comprising a further step between step (iii) and step (iv) comprising comparing the transcribed word with a keyword dictionary comprising the keywords of the sections of the document, and, if the transcribed word is not in the keyword dictionary, repeating for any further transcribed words in the voice data text, or, if the transcribed word is in the keyword dictionary, continuing to step (iv).
- 16. A voice data allocation system for allocation of voice data to one or more sections of a document according to keywords of the sections of the document, the system comprising a recorder for recording the voice data, a voice recognition device for transcribing the voice data into voice data text, and an allocation device comprising an extractor for extracting a transcribed word from the voice data text, a comparator for comparing the transcribed word with the keywords of the sections of the document to ascertain one or more matching sections comprising the transcribed word, a scorer for assigning a score to the one or more matching sections using one or more predetermined rules, a best match determinator for using the scores to determine one or more best matching sections, and an allocator for allocating the voice data text to the one or more best matching sections of the document or communicating the one or more best matching sections of the document to a user and allowing the user to direct allocation of the voice data text.
- 17. A voice data allocation device comprising an extractor for extracting a transcribed word from voice data text, a comparator for comparing the transcribed word with keywords of sections of a document to ascertain one or more matching sections comprising the transcribed word, a scorer for assigning a score to the one or more matching sections using one or more predetermined rules, a best match determinator for using the scores to determine one or more best matching sections, and an allocator for allocating the voice data text to the best matching section or to the best matching sections of the document or communicating the one or more best matching sections of the document to a user and allowing the user to direct allocation of the voice data text.
- 18. A method of machine allocation of voice data to one or more sections of a document according to keywords of the sections of the document substantially as described herein with reference to the accompanying drawings.
- 19. A voice data allocation system for allocation of voice data to one or more sections of a document according to keywords of the sections of the document substantially as described herein with reference to the accompanying drawings.
- 20. A voice data allocation device substantially as described herein with reference to the accompanying drawings.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
GB201212745A GB2508790A (en) | 2012-07-18 | 2012-07-18 | Method and System for Allocating Voice Data to Sections of a Document |
Publications (2)
Publication Number | Publication Date |
---|---|
GB201212745D0 GB201212745D0 (en) | 2012-08-29 |
GB2508790A true GB2508790A (en) | 2014-06-18 |
Family
ID=46799774
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
GB201212745A Withdrawn GB2508790A (en) | 2012-07-18 | 2012-07-18 | Method and System for Allocating Voice Data to Sections of a Document |
Country Status (1)
Country | Link |
---|---|
GB (1) | GB2508790A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6023676A (en) * | 1996-12-12 | 2000-02-08 | Dspc Israel, Ltd. | Keyword recognition system and method |
US6408271B1 (en) * | 1999-09-24 | 2002-06-18 | Nortel Networks Limited | Method and apparatus for generating phrasal transcriptions |
US6490561B1 (en) * | 1997-06-25 | 2002-12-03 | Dennis L. Wilson | Continuous speech voice transcription |
US20100036676A1 (en) * | 2008-08-07 | 2010-02-11 | E-Merge Health Solutions, Ltd. | Computer implemented medical treatment management system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
WAP | Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1) |