GB2541792A - Audio file

Audio file

Info

Publication number
GB2541792A
GB2541792A
Authority
GB
United Kingdom
Prior art keywords
recording
audio
marker
playback
entry
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
GB1611637.8A
Other versions
GB2541792B (en)
GB201611637D0 (en)
Inventor
John Lewis Simon
Stuart Harris Marc
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BigHand Ltd
Original Assignee
BigHand Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BigHand Ltd
Publication of GB201611637D0
Publication of GB2541792A
Application granted
Publication of GB2541792B
Status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00 Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/02 Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
    • G11B27/022 Electronic editing of analogue information signals, e.g. audio or video signals
    • G11B27/029 Insert-editing
    • G11B27/031 Electronic editing of digitised analogue information signals, e.g. audio or video signals

Abstract

A digital dictation file comprises a recording comprising a plurality of recording portions. Each portion of at least first and second portions of the recording is captured from a microphone, and a playback marker defining a position in the digital dictation at which each portion starts playing is determined. A recording mode (e.g. overwrite, insert, delete) of each portion is determined and the portions are stored in the file, appending the second recording portion to the first recording portion. Recording markers defining the start of each portion in the file are determined. The recording marker, playback marker and recording mode are stored in an entry of a transaction table for each portion, and the table is stored in or associated with the file. The method allows a user to insert or overwrite audio by appending the audio to the end of the recording in the file. The transaction table is maintained to keep a record of the different portions of audio and where they have been inserted into the file.

Description

Audio File
Background [0001] Dictation software has improved with developments in recording technologies. Previously, dictation had been recorded on magnetic storage media (such as tapes) and sent to be typed. Digital Dictation software has been developed so that dictations can be recorded digitally (for example, this may be done using a computer and a digital microphone). The dictation may then be sent to a typist over a network, so that magnetic tapes are not required. This speeds up the process of providing a dictation to a typist.
[0002] Digital audio files can be more easily manipulated than magnetic tapes. For example, with modern digital dictation software, it is possible to insert sections of audio data into a recording. With magnetic tapes, it is only possible to overwrite the existing audio data.
[0003] The audio that is sent across the network may be in a number of possible formats. A variety of different file types can be used to encode the audio data into a digital file.
These may include mp3, wma, ogg, wave, aac, flac and the like. Each of these file types has a certain compression rate to attempt to reach a compromise between the size of the audio file and the quality of the audio after the file is decoded. The type of audio file that is most appropriate for a particular use may depend on the application. Moreover, different encodings may be more appropriate for different types of audio files (for example, speech may be best stored using a different file type to classical music).
[0004] Prior art devices use a number of different audio file types. However, the way in which the audio data is written to disk is fairly standard. The audio data is stored in memory and then encoded to the required file type when the dictation is complete. If the user wishes to change the audio of the dictation, the file is decoded and the audio is placed back into memory for editing. This approach can lead to a high use of computing resources, especially if the dictation is relatively long.
Summary [0005] Against this background, the present invention provides an improved method of storing a digital dictation according to claim 1. The present invention also provides a method of playing a digital dictation according to claim 12 and a method of deleting audio from a digital dictation according to claim 18.
[0006] It has been noted that it is desirable when dictating to rewind the dictation and insert portions of audio or overwrite previous portions of audio. This is possible with digital recording techniques. However, as mentioned above, this requires the file to be loaded from disk into memory, the portion of audio to be inserted and the remaining audio to be reordered, and then the audio to be written to disk again. This can be slow and wasteful of resources.
[0007] The present invention therefore provides a method of storing a digital dictation that overcomes this problem. The method of the claimed invention allows the user to insert or overwrite audio by appending the audio to the end of the recording in the file. The recording in the file may be an audio stream. This approach is less time consuming and wasteful of resources. A transaction table is maintained to keep a record of the different portions of audio and where they have been inserted into the file. This allows the digital dictation software to run more efficiently, whilst retaining the overwrite and insert functionality that is advantageous to the user.
[0008] The present invention also provides a method of reading a digital dictation stored according to the method of the present invention. The method involves reading the transaction table from the digital dictation and creating an audio map that relates each section of the dictation (as it is intended to be played) with a corresponding section of the audio stored in the file (which are stored in the order they were recorded).
[0009] A method of deleting audio from a digital dictation stored according to the method of the present invention is also provided.
[0010] This invention provides a method of storing a digital dictation in a file comprising a recording. The recording comprises a plurality of recording portions. Each recording portion may be generated when a user records a part of the dictation by pressing the record button on a digital Dictaphone, for example. However, when the dictation is played, these portions are not necessarily played in the same sequence as that in which they are stored in the file. This may be because the user has inserted portions into the dictation or overwritten some of the previous portions.
[0011] The method comprises receiving a first recording portion of digital audio that has been captured from a microphone. The method further comprises generating a first playback marker defining a position in the digital dictation for the first recording portion to start to play. In other words, determining the correct playback location in the dictation (as it currently exists) of the start of the first recording portion and generating a marker to indicate this. This can be done by the user selecting the position in which they want to insert the audio using a piece of software with a GUI, for example. If this recording portion is the very first portion of audio of the dictation then this will be the start of the dictation (in other words, this may be defined as a playback position of 0:00). However, if other portions have already been recorded then the playback position of the first sample of the portion may be at the end of the dictation (if adding to the dictation) or at the beginning or in the middle of the dictation (if inserting or overwriting).
[0012] The method further comprises determining a first recording mode of the first recording portion. This can be determined by a user selecting a recording mode using software or their recording device. Alternatively, a default recording mode may be used.
[0013] The method further comprises storing the first recording portion in the file, and storing a first recording marker defining the start of the first recording portion in the file, the first playback marker and the first recording mode in a first entry of a transaction table. In other words, writing the audio data to disk and noting where in the dictation it should be played (and how - whether by inserting or overwriting, for example) in a lookup table. The lookup table describes the changes that are made to the dictation in chronological order, and the order in which they are placed is relevant because the dictation changes as sections of audio are added, replaced or removed.
[0014] The method further comprises receiving a second recording portion of digital audio that has been captured from a microphone after the first recording portion of audio has been captured. In other words, the second portion is a new section of speech that is recorded at a later time than the first portion.
[0015] The method further comprises generating a second playback marker defining a position in the digital dictation for the second recording portion to start to play. As with the first portion, the second portion may be added to the end of the recording or may be inserted elsewhere.
[0016] The method further comprises determining a second recording mode of the second recording portion. As with the first recording mode, this can be determined by a user selecting an insert mode using software or their recording device or using a default recording mode.
[0017] The method further comprises storing the second recording portion in the file by appending the second recording portion to the first recording portion. In other words, the second portion is appended to the end of the audio stream in the file but may actually be played at a different position in the dictation. Advantageously, this means that it is not necessary to read the whole file into memory to insert portions of audio. Moreover, it is not necessary to re-write all the portions and partial portions of the dictation that will be moved along in the dictation as a result of the insertion.
[0018] The method further comprises storing a second recording marker defining the start of the second recording portion in the file, the second playback marker and the second recording mode in a second entry of the transaction table. As with the first entry, the audio data is written to disk and its location in the dictation and recording mode is noted in a lookup table.
[0019] The method further comprises storing the transaction table associated with the file. In other words, the transaction table is stored to disk in association with the file (which contains a recording that comprises a plurality of portions) so that the file can be correctly assembled when it is read. The recording may be in the form of an audio stream in the file.
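As a rough illustration, the storing flow of paragraphs [0011] to [0019] might be sketched in Python as follows. The names used here (TransactionEntry, DictationFile, store_portion) are hypothetical, and positions and lengths are measured in bytes for simplicity:

```python
from dataclasses import dataclass

@dataclass
class TransactionEntry:
    recording_marker: int  # byte offset of the portion within the file's audio stream
    playback_marker: int   # position in the dictation at which the portion starts to play
    mode: str              # "insert", "overwrite" or "delete"
    length: int            # length of the portion (optional, see paragraph [0036])

class DictationFile:
    def __init__(self):
        self.audio = bytearray()  # recording portions, stored in recording order
        self.transactions = []    # the transaction table

    def store_portion(self, portion: bytes, playback_marker: int, mode: str):
        # New audio is always appended, so the recording marker is simply
        # the current end of the audio stream in the file.
        recording_marker = len(self.audio)
        self.audio += portion
        self.transactions.append(
            TransactionEntry(recording_marker, playback_marker, mode, len(portion)))
```

Because new audio is only ever appended, storing a portion never requires the earlier portions to be read back or rewritten.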
[0020] The method may further comprise creating a consolidated audio map to describe how to play back the digital dictation from the plurality of recording portions within the file.
In other words, to relate the positions of the dictation to the positions of the audio in the file. The audio map provides a view of what the dictation looks like at the current time. This is different to the transaction table. Each entry of the transaction table describes a relative change compared to the state of the dictation after the previous entry. To generate the audio map, each entry from the transaction table is read in turn and processed. Each entry of the transaction table contains a recording mode (such as insert, overwrite or delete), a playback marker, and a recording marker (indicating where the audio portion to insert is stored).
[0021] The method of generating the audio map then further comprises determining for each entry a length of a portion to which the recording marker relates (this can be done in a number of ways, as discussed later) and incorporating each entry into the audio map by performing one of the following methods, depending on the recording mode of the entry.
Insert [0022] Use the recording map to identify the part of the recording that currently corresponds with the playback marker and create a new recording marker at that point in the recording (this recording marker will be used in the last step). Increase the position of all playback markers that are greater than the playback marker in the entry by the length of the audio stream. In other words, move all the audio that will end up after the current stream back by the length of the current stream. In actual fact, no audio is moved; rather, the pointers in the audio map are moved along in time. This creates a space in the map in which pointers to the next stream of audio can be inserted.
[0023] Then, relate the playback marker in the entry to the recording marker in the entry (place the pointer to the start of the audio stream at the start of the gap that has been created).
[0024] Next, create a new playback marker in the dictation at a position defined by the playback marker plus the length of the audio and relate the new playback marker to the new recording marker. This creates a pointer in the audio map to indicate which portion of the audio to play after the inserted portion has been played. In effect, this pointer returns to the place in the recording that would have been playing had the insert not been made.
Overwrite [0025] Use the recording map to identify the part of the recording that currently corresponds with a point defined by the playback marker plus the length of audio and create a new recording marker at that point in the recording (this recording marker will be used in the last step). Remove all playback markers from the audio map that are greater than the playback position in the entry but less than the playback position in the entry plus the length of the audio. In other words, erase the pointers to the audio that is no longer present in the dictation (because it has been overwritten) and leave a gap for the next recording portion to be placed into.
[0026] Then, as with the insert method, relate the playback marker in the entry to the recording marker in the entry, create a new playback marker in the dictation at a position defined by the playback marker plus the length of the audio and relate the new playback marker to the new recording marker. This creates a pointer in the audio map to indicate which portion of the audio to play after the inserted portion has been played. Unlike with the insert mode, this pointer does not return to the place in the recording that would have been playing had the insert not been made. Instead, this pointer returns to a place in the recording later than that, as if the audio that was previously present in the audio map had been overwritten. In actual fact, no data is overwritten; only the pointers to the audio are removed.
Append [0027] Append may be performed implicitly by inserting or overwriting at the end of a dictation. Alternatively, a separate record mode may be provided for append. In this case, relate the playback marker in the entry to the recording marker in the entry. It may also be necessary to indicate the length of the recording portion so that the correct length of the map is maintained.
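Continuing the earlier sketch, the insert and overwrite methods described above might be incorporated into the audio map as follows. This is a simplification that represents the map as (playback position, recording position, length) spans rather than individual markers, with append handled implicitly as an insert at the end of the dictation; the helper names are hypothetical:

```python
def _split_at(spans, pos):
    # Split any span that straddles playback position `pos` into two spans,
    # so that later steps can shift or remove whole spans.
    out = []
    for p, r, n in spans:
        if p < pos < p + n:
            out.append((p, r, pos - p))
            out.append((pos, r + (pos - p), n - (pos - p)))
        else:
            out.append((p, r, n))
    return out

def _remove_range(spans, pos, length):
    # Erase the pointers to the range [pos, pos + length) of the dictation.
    split = _split_at(_split_at(spans, pos), pos + length)
    return [(p, r, n) for p, r, n in split if not pos <= p < pos + length]

def build_audio_map(transactions):
    spans = []  # (playback position, recording position, length)
    for t in transactions:
        if t.mode == "insert":
            # Move all audio at or after the insertion point along by the
            # inserted length; only map pointers move, never audio data.
            spans = [(p + t.length, r, n) if p >= t.playback_marker else (p, r, n)
                     for p, r, n in _split_at(spans, t.playback_marker)]
            spans.append((t.playback_marker, t.recording_marker, t.length))
        elif t.mode == "overwrite":
            # Erase the pointers to the overwritten range and point it at the
            # newly appended audio; nothing after the range moves.
            spans = _remove_range(spans, t.playback_marker, t.length)
            spans.append((t.playback_marker, t.recording_marker, t.length))
        spans.sort()
    return spans
```

For example, inserting a 500-byte portion at playback position 200 of a 1000-byte dictation splits the original span (0, 0, 1000) into (0, 0, 200) and (700, 200, 800), with the new span (200, recording marker, 500) filling the gap.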
[0028] The method can further comprise playing back the recorded audio in the correct order, if the user wishes to review the dictation. The method then further comprises receiving a request for a segment of the digital dictation at a specified playback position. In other words, a request to retrieve the piece of audio requested by the user.
[0029] On receipt of this request, the audio map can be used to identify the corresponding segment of audio in one of the recording portions. Then a segment of audio data from the file that corresponds to the requested segment of the digital dictation at the specified playback position is read and then returned (this may mean that the segment is played through speakers on the user’s computer).
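A playback request can then be served directly from the map; read_segment is a hypothetical helper continuing the sketch, again assuming byte units:

```python
def read_segment(f, spans, playback_pos, want):
    # Find the span covering the requested playback position and return up to
    # `want` bytes, stopping at the span boundary so the caller can re-request.
    for p, r, n in spans:
        if p <= playback_pos < p + n:
            offset = playback_pos - p
            take = min(want, n - offset)
            return bytes(f.audio[r + offset : r + offset + take])
    return b""  # the position is past the end of the dictation
```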
[0030] The second playback marker may be earlier in the digital dictation than the first playback marker plus a length of the first recording portion. This is because the user may have rewound the recording and either inserted or overwritten the audio in a particular place.
[0031] One or more of the segments of the first portion of audio may not have a corresponding playback position in the digital dictation. This can happen if segments of the first portion have been overwritten by the second portion.
[0032] One or more segments of the first portion of audio may have corresponding playback positions in the digital dictation after the second playback position. The corresponding playback positions in the dictation are defined in the audio map. This can happen if the second portion has been inserted into the first portion and the remainder of the first portion is played after the second portion completes.
[0033] A first and second recording mode may be stored in the first and second entries of the transaction table, respectively. This recording mode indicates whether the transaction corresponds to an overwrite, insert, append, or delete operation, for example.
[0034] The last segment of the first recording portion and the first segment of the second recording portion may not be sequential in the digital dictation when it is played. This is because the second portion may start before the first portion has ended during playback. However, the first and second portions will be stored sequentially in the file.
[0035] The method may further comprise capturing a portion of audio from a microphone and converting the portion of audio to a portion of digital audio data. The method is used for storing digital dictations and the most likely source of these is a digital microphone. For example, this may be plugged into the user’s computer.
[0036] The method may further comprise storing the length of the first and second recording portions in the first and second entries of the transaction table, respectively. The length of the recording portion can be useful in creating the audio map. However, it is not essential, as the length can be determined once the recording portion has been read. It may be possible to identify the end of the recording portion from some digital marker. Alternatively, no such marker may be present and the recording portion may be read from the file using the length provided in the transaction table to identify where the portion ends. The length may be a length in time or may be a length of data (for example, in bits or bytes).
[0037] The first and second recording portions of digital audio can be encoded before being stored in the file. The encoding scheme used can be one of any number of known encoding systems, such as those mentioned in the background section above. Encoding the audio can reduce the size of the audio file on disk and so improve utilisation of resources.
[0038] The first and second recording portions of digital audio can be encrypted before being stored in the file. This advantageously allows the dictation to be stored securely so that only users with the relevant key can decrypt the audio and listen to the data.
[0039] A method of playing a digital dictation stored in a file is also provided. This method is similar to the methods described above, in relation to generating an audio map and playing back audio. The comments in relation to those methods therefore apply equally to this method.
[0040] This method differs from the methods described above in that the recording portions of digital audio need to be read from the file and the transaction table in association with the file also needs to be read. The transaction table can be stored in the same file as the recording portions or it can be kept separately.
[0041] A delete mode may also be provided. In this case, generating the audio map may further comprise a method for incorporating an entry whose record mode is delete. This method is performed by using the recording map to identify the part of the recording that currently corresponds with a point defined by the playback marker plus the length of audio. This is the point of audio that will be returned to after the delete. If this falls at the end of the file then the step of creating an additional recording marker may be skipped (as there is no audio to return to after the delete).
[0042] Then, the method comprises removing playback markers that are after the playback marker in the entry but earlier than the playback marker in the entry plus the length of audio specified in the entry. In other words, delete all pointers to the section of audio that is to be deleted.
[0043] Then, decrease the positions of the playback markers that are after the playback marker in the entry plus the length of audio specified in the delete entry by the length of audio specified in the entry. In other words, all the audio after the gap that has been created by the delete is moved to the start of the gap so that the gap is filled. In actual fact, no audio is moved. Instead, the pointers in the audio map are moved back in time.
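In the span-based sketch given earlier, the delete mode would form a third branch of the hypothetical build_audio_map:

```python
        elif t.mode == "delete":
            # Erase the pointers to the deleted range of the dictation...
            spans = _remove_range(spans, t.playback_marker, t.length)
            # ...then move every later playback position back by the deleted
            # length to close the gap; as before, no audio data moves on disk.
            spans = [(p - t.length, r, n)
                     if p >= t.playback_marker + t.length else (p, r, n)
                     for p, r, n in spans]
```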
[0044] Each entry of the transaction table may further comprise a length of audio. Determining a length of a portion of audio to which the recording marker relates may then comprise reading the length of audio from the entry of the transaction table.
[0045] Alternatively, determining a length of a portion of audio to which the recording marker relates may comprise reading a recording marker from a next entry of the transaction table and calculating the difference to determine the length.
[0046] When playing back audio from the digital dictation, a playback buffer to contain the audio from the digital dictation may be created. The method may then further comprise checking whether the playback buffer is full and requesting a next sequential segment of the digital dictation at the next sequential playback position if the buffer is not full. In this way, the quality of playback can be improved by obtaining segments of audio for playback ahead of time.
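A buffered playback loop along these lines could prefetch segments ahead of the play position; the buffer and chunk sizes below are illustrative:

```python
def fill_buffer(f, spans, buffer, next_pos, chunk=4096, capacity=64 * 1024):
    # Request the next sequential segment while the playback buffer has room.
    while len(buffer) < capacity:
        segment = read_segment(f, spans, next_pos, chunk)
        if not segment:
            break  # reached the end of the dictation
        buffer.extend(segment)
        next_pos += len(segment)
    return next_pos  # where the next prefetch should resume
```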
[0047] If the audio has been encoded, then the playback method may include the step of decoding the segment of audio data.
[0048] If the audio has been encrypted then the method may comprise decrypting the segment of audio data.
[0049] A method of deleting audio from a digital dictation stored in a file is also provided. The digital dictation is defined by a transaction table stored in association with the file and comprising a plurality of entries, as discussed above in relation to storing the dictation and playing the dictation back. The method of deleting audio comprises receiving a playback marker indicating the position of the audio in the digital dictation that is to be deleted (where the deletion starts), receiving a length of the audio in the digital dictation that is to be deleted (where the deletion stops), and storing the playback marker and length in a new entry in the transaction table. In other words, deleting a section of audio is made very simple by the present invention. There is no need to erase the audio from the file and shuffle the remaining audio along, which can be wasteful of resources. Instead, it is possible to simply append an instruction to the transaction table that will cause the section to be deleted from the audio map (and so it will not be played).
[0050] In an alternative method, an offset of each segment of the audio may be defined. Each segment of audio in each recording portion is then separated from the start of the recording portion by an offset (which may be zero for the first segment). Relating a playback position of the digital dictation to a segment of audio from one of the recording portions can then be done by storing the playback marker, the recording marker and the offset of the segment in the audio map.
[0051] Reading a segment of audio data from the file that corresponds to the requested segment of the digital dictation at the specified playback position can then be performed using the recording marker of the corresponding recording portion and the offset of the segment within the recording portion to locate the required segment. This can be done as an alternative to creating a new playback marker and recording marker to indicate where in the recording the player should return to after the end of an inserted section is reached.
[0052] The audio map of the present invention is described above as comprising recording and playback markers. Alternatively, the recording portions may be described as comprising segments of audio. These may be individual units that are indivisible or may alternatively be large chunks of the recording portion. If a stream is to have audio inserted into the middle of it (in the latter case), the recording portion may be split into further segments so that the new audio is placed in between segments. A new reference point may be added to the audio map to account for this. This may advantageously prevent any gaps or glitches from forming in the dictation audio. In this way, the audio may be thought of as a strip that is sliced up and reassembled in the correct order.
Brief Description of the Drawings [0053] Figure 1 shows a container for the audio file.
[0054] Figure 2 shows an example Security Stream Layout.
[0055] Figure 3 shows example encryption specific information.
[0056] Figure 4 shows the transaction stream structure.
[0057] Figure 5 shows the Transaction Record Layout.
[0058] Figure 6 shows the Descriptions of the fields and values of the Transaction Record.
[0059] Figure 7 shows example values for the modes of operation for the transaction records.
[0060] Figure 8 shows an example Transaction Stream Table.
[0061] Figure 9 shows how the example audio from Figure 8 could be stored on disk.
[0062] Figure 10 shows the location of the audio from the transaction table of Figure 8 once it has been loaded into memory.
[0063] The Recording Flow is shown in Figure 11.
[0064] The Playback Flow is shown in Figure 12.
[0065] The Audio Deletion Flow is shown in Figure 13.
[0066] Figure 14 shows the Recording Recovery Flow.
[0067] Figure 15 shows the BigHand Hub.
[0068] Figure 16 shows an example of the BigHand Proofing Window.
[0069] Figure 17 shows the speech recognition management process in more detail.
[0070] Figure 18 shows the components that are involved in speech recognition.
[0071] Figure 19 shows an example of the BigHand Capacity Manager.
[0072] Figure 20 shows an example of Global Effort Configuration.
[0073] Figure 21 shows an example Work Type Effort Estimation Configuration.
[0074] Figure 22 is a flowchart showing BigHand Effort Calculation.
[0075] Figure 23 shows an example of the BigHand Now New Task Screen.
[0076] Figure 24 shows an example of the BigHand Now Form.
Detailed Description [0077] The audio file provided by the present invention (BigHand Audio File, BHF) is used to store audio data for digital dictations within systems. It enables audio files to be stored and efficiently edited with operations such as overwrite, insert and delete, without having to rewrite the file on disk. This provides a fast and efficient storage mechanism for audio files that reduces disk I/O. This reduces the delay that may be incurred when performing various operations. It also has the ability to encrypt the audio on disk to ensure that the data is securely stored and can only be accessed by people in possession of the relevant keys.
[0078] The BHF file format utilises the Compound File Binary File Format data structure (as described in the Advanced Authoring Format (AAF) Low-Level Container Specification v1.0.1), to enable multiple streams of data to be stored within a single file on a file system. The different streams within a BHF file are used to describe its version, encryption details, the type of audio that is stored within the file and the actual audio data.
[0079] Figure 1 shows the BHF container. The BHF file that is stored on disk is a container of four streams, each of which can be appended to store more data as edits are made to the audio data. This enables all the information in an audio recording to be transferred securely as a single entity.
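Because the container is a standard Compound File Binary file, it can be opened with any CFB reader. The sketch below uses the third-party Python olefile package; the stream names are assumptions based on this description, as Figure 1 defines the actual layout:

```python
import olefile  # generic reader for Compound File Binary containers

ole = olefile.OleFileIO("dictation.bhf")
print(ole.listdir())  # the streams held within the container
if ole.exists("Security"):  # presence of this stream implies encryption
    security_info = ole.openstream("Security").read()
transaction_stream = ole.openstream("Transaction").read()
ole.close()
```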
[0080] Figure 2 shows an example Security Stream Layout. The presence of a Security stream indicates that the file is encrypted and contains information about the encryption mode that has been used to create the file.
[0081] It contains the version of that encryption that has been used allowing the encryption mechanism to be changed if necessary in the future and for software components to determine whether they are capable of dealing with this type of encryption.
[0082] Figure 3 shows example encryption specific information. The sort of encryption data that may be included here is shown. This provides enough information to decrypt the audio data, once the appropriate key is provided.
[0083] Figure 4 shows the transaction stream structure. The transaction stream is used to store the actions that have been performed on a file. When a file is opened, the transaction stream is used to build an audio map, which maps the logical audio stream to the correct file position on disk.
[0084] The transaction stream consists of multiple transaction records (discussed below). Each of the transaction records in this example is separated by a new line, as shown in Figure 4.
[0085] Each transaction record denotes an action that occurred on the audio stream while it was in the state produced by the previous transaction records. In this example, the data for each transaction record is stored on a single line and each value is separated by commas (see Figure 5).
[0086] Figure 6 shows the Descriptions of the fields and values of the Transaction Record. “File position” indicates the start position of the audio within the file (also referred to as the address of the audio stream). “Position” indicates the position of the audio within the recording (also referred to as the playback position). “Mode” indicates the type of transaction (this is also discussed below, in relation to Figure 7). “Length” indicates the length of the data in this transaction. However, in other examples, this could be the length of the audio, rather than the length of the data.
[0087] The transaction record includes the position and length in the data stream of any audio that was included in that transaction. It also includes the position in the audio map so that it can be applied in the correct location when the in-memory model is built. It also includes the mode of the operation for the transaction, which can be overwrite, insert or delete. Figure 7 shows example values for the modes of operation for the transaction records.
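Given this layout (one record per line, comma-separated values), a parser might look like the sketch below. The field order and the numeric mode codes are assumptions, since Figures 5 to 7 define the actual layout and values:

```python
MODES = {"0": "overwrite", "1": "insert", "2": "delete"}  # hypothetical codes

def parse_transaction_stream(text):
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue
        file_pos, position, mode, length = (v.strip() for v in line.split(","))
        records.append({
            "file_position": int(file_pos),  # start of the audio in the data stream
            "position": int(position),       # playback position within the recording
            "mode": MODES.get(mode, mode),   # overwrite, insert or delete
            "length": int(length),           # length of the data in this transaction
        })
    return records
```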
[0088] The audio map is the in-memory map between the audio stream and how it has been stored on disk. This is created by replaying, in order, each transaction that has been recorded in the transaction stream onto the in-memory model.
[0089] Figure 8 shows an example Transaction Stream Table. This represents the transaction table as it exists on disk. This provides the raw detail of how the audio has been edited and where it is stored on disk. An extra key column is included to identify how this data maps to audio position and file position.
[0090] Figure 9 displays the audio from the transaction table in Figure 8 as it is stored on disk. This figure shows that each section (or stream) of audio that has been recorded is appended to the previous section (or stream). However, its position in the audio recording is not necessarily related to this, as can be seen in Figures 8 and 10. Figure 9 also shows that no data is included for the “delete” operation. This is because removing data from the audio map (as opposed to adding) does not create any new audio data to be stored on disk.
[0091] Figure 10 shows the location of the audio from the transaction table of Figure 8 once it has been loaded into memory. It shows how the audio data on disk maps to the audio segments in memory when compared with Figure 9 and the order in which it will be played back. As can be seen, the third transaction has been inserted in the middle of the first transaction, splitting it in two. Moreover, the last section of audio has been deleted. When the audio data is played back, it will be returned in this order.
[0092] The wav stream contains a WAVE header (as described in the WAVE audio file specifications), which describes the type of audio compression that has been used on the audio recording to enable it to be played back correctly.
[0093] The data stream contains the audio data, which may be encrypted depending on the presence and settings of the Security stream. Audio is always appended to this stream and mapped back to the correct location using the transaction stream (as mentioned above).
[0094] There are several different operations that can occur when using a BHF file.
Figures 11 to 14 describe the behaviour of those operations.
[0095] The Recording Flow is shown in Figure 11. This demonstrates how audio is added to the file, along with the transactions that represent the audio’s position within the recording, and the decisions about when the audio is encrypted.
[0096] The Playback Flow is shown in Figure 12. This illustrates how audio is retrieved from a BHF file for playback.
[0097] The Audio Deletion Flow is shown in Figure 13. This demonstrates the process for deleting sections of audio from the recording.
[0098] If for any reason the application process terminates during recording, a recovery will be performed the next time the file is opened. This is performed by determining if there is any open recording transaction, calculating how much data was recorded to the stream and then closing the stream. Figure 14 shows the Recording Recovery Flow. This allows the audio to be recovered and helps to reduce data loss that can occur if there is an error.
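A sketch of this recovery step, assuming an open transaction is one whose length was never written before the process terminated:

```python
def recover_open_recording(records, data_stream_length):
    # If the final transaction was left open, calculate how much data actually
    # reached the data stream and close the transaction with that length.
    if records and records[-1]["length"] is None:
        last = records[-1]
        last["length"] = data_stream_length - last["file_position"]
    return records
```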
[0099] As mentioned previously, the audio file format provided by the present invention has particular advantages when used with digital dictation software. One example of digital dictation software in which this file format may be used is the Digital Dictation Workflow software. This software will be described in more detail below.
[00100] Figure 15 shows the BigHand Hub, which is the core of the Digital Dictation Workflow software. The Hub provides a central overview of work within the system. It enables users to create tasks and dictations, and gives users a view of the work that they have In Progress and that has been assigned. As well as this, the BigHand Hub provides the ability to view work throughout other departments, depending on the configuration.
[00101] Workflow data is presented in the Hub. The data is presented so that work can be filtered according to its status (such as, In Progress, Overdue and Pending). The grid also allows work to be grouped. This enables users to view their work in a way that is suitable to their priorities.
[00102] Speech Recognition Management may also be supported. BigHand integrates with third party speech recognition engines to facilitate the automated transcription to text of audio files that are input into the BigHand System through the recorder or various other sources such as mobile devices. With the invention of the BigHand Proofing Window (see Figure 16) within the BigHand Hub, the management of users’ speech recognition profiles has been wholly encapsulated within the BigHand software suite. This greatly simplifies the process of managing and correcting users’ speech recognition profiles.
[00103] Figure 16 shows an example of the BigHand Proofing Window.
[00104] The process works by BigHand utilising its workflow engine to move dictations between automated and manual steps to manage the following actions: 1. Creating or loading the correct user’s profile into a third party speech recognition engine; 2. Transcribing the text; 3. Returning the transcribed text to the user; 4. Capturing corrections to text; and 5. Automating submission of the corrections back to the speech recognition engine to train that user’s profile, if the user deems it a good example to train on (a correction may be a good training example if there were no abnormal audio quality problems, such as external noise, which you would not want to train your profile on, for example).
[0100] Figure 17 shows the speech recognition management process in more detail and Figure 18 shows the components that are involved in speech recognition.
[0101] An example of the BigHand Capacity Manager is shown in Figure 19. This dashboard combines powerful analytics and management tools into a single package to present the user with the information needed to manage their document production teams effectively.
[0102] The dashboard pulls information from the BigHand workflow and lets the user clearly see: • the number of tasks that have been submitted to each document production team; • the number of tasks that have been completed; and • how many tasks are still outstanding.
[0103] Additionally, the dashboard provides an estimate of the time it will take the teams to complete the outstanding work, which means that it is possible to plan more accurately.
[0104] BigHand Capacity Manager allows the user to: • See how quickly teams are completing work; • Calculate the secretarial effort required to complete outstanding tasks; • See the percentage of tasks completed within Service Level Agreements (SLAs); and • View the number of tasks being submitted into the BigHand workflow.
[0105] When viewing the dashboard, the information may be viewed as the overview of: • Number of tasks; • Duration (total length of audio if dealing with dictations); and/or • Effort (the total amount of effort).
[0106] Figure 20 shows an example of Global Effort Configuration. Effort calculation within BigHand can be configured in several ways to allow users flexibility over how effort is calculated. At the top level, a global effort multiplier can be defined (see Figure 20), which will be applied to all dictations unless it has been overridden by specifying a ‘work type’ in its metadata that has an overriding effort calculation specified (see Figure 21).
[0107] Figure 21 shows an example Work Type Effort Estimation Configuration. Effort calculation can be configured in a number of different ways. Four examples of this are: 1. A specified amount of time based on the work type; 2. A multiple of the length of a dictation plus a specified amount of time depending on the work type; 3. Use the global multiplier if configured; and/or 4. Use no effort estimation.
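A hypothetical implementation of the four configurations, with the work-type settings overriding the global multiplier as described above:

```python
def estimate_effort(length, work_type=None, global_multiplier=None):
    if work_type is not None:
        if "multiplier" in work_type:
            # 2. a multiple of the dictation length plus a specified time
            return length * work_type["multiplier"] + work_type.get("fixed_time", 0)
        if "fixed_time" in work_type:
            # 1. a specified amount of time based on the work type
            return work_type["fixed_time"]
    if global_multiplier is not None:
        return length * global_multiplier  # 3. the global multiplier, if configured
    return None                            # 4. no effort estimation
```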
[0108] This effort calculation is then used to calculate start times based on the due-by time, which is set based on the priority of a task. The start-by time can be viewed in the client and Capacity Manager to enable operators to determine the order in which work should be carried out.
[0109] Figure 22 is a flowchart showing BigHand Effort Calculation.
[0110] BigHand Now is an application that lets the user turn their tasks into fully auditable, digital workflow entries. Tasks can be created from voice, email, electronic or paper-based requests, from document production requests to reprographics and travel bookings.
[0111] Tasks can be created quickly and easily by completing a pre-configured form and attaching any accompanying files. Once in the workflow, tasks can be assigned to a specific team or team member for processing and monitored through to completion.
[0112] Figure 23 shows an example of the BigHand Now New Task Screen. The New Task screen allows users to create a task. By selecting the work type, the application determines which form (if any) it should show when the user clicks next. The Requested By field can be modified to be another user if the user has the relevant rights.
[0113] Figure 24 shows an example of the BigHand Now Form. Once the user clicks “next”, they are presented with the relevant Form. The form is defined in the database and specifies the type and number of fields, and metadata such as the label, the field the selected value should be saved into and whether a field is mandatory. The values of dropdowns, checkboxes and radio buttons are defined through a dynamic data values mechanism, which enables the software to display a flexible number of options, depending on what is returned.
[0114] Once submitted, a Now Task enters the BigHand workflow system, where it can be opened within the BigHand Hub and dynamically routed to the relevant people based on its metadata, moving through the workflow until it is complete.
[0115] While the audio file of this invention has been described in relation to Digital Dictation software, other uses for this audio file exist.

Claims (18)

CLAIMS:
1. A method of storing a digital dictation in a file comprising a recording, the recording comprising a plurality of recording portions, the method comprising: receiving a first recording portion of digital audio that has been captured from a microphone; generating a first playback marker defining a position in the digital dictation for the first recording portion to start to play; determining a first recording mode of the first recording portion; storing the first recording portion in the file; storing a first recording marker defining the start of the first recording portion in the file, the first playback marker and the first recording mode in a first entry of a transaction table; receiving a second recording portion of digital audio that has been captured from a microphone after the first recording portion of audio has been captured; generating a second playback marker defining a position in the digital dictation for the second recording portion to start to play; determining a second recording mode of the second recording portion; storing the second recording portion in the file by appending the second recording portion to the first recording portion; storing a second recording marker defining the start of the second recording portion in the file, the second playback marker and the second recording mode in a second entry of the transaction table; and storing the transaction table associated with the file.
2. The method according to claim 1, wherein the first and second recording modes are selected from a list comprising insert, overwrite, and delete.
3. The method according to claim 1 or claim 2, further comprising generating an audio map to describe how to play back the digital dictation from the plurality of recording portions within the file by: reading each entry from the transaction table, each entry comprising: a recording mode, a playback marker, and a recording marker; determining for each entry, a length of a portion of audio to which the recording marker relates; incorporating each entry into the audio map by performing one of the following methods, depending on the recording mode of the entry: a) if the mode is insert then: use the recording map to identify the part of the recording that currently corresponds with the playback marker and create a new recording marker at that point in the recording, increase the position of all playback markers that are greater than the playback marker in the entry by the length of audio, relate the playback marker in the entry to the recording marker in the entry, and create a new playback marker in the dictation at a position defined by the playback marker plus the length of audio and relate the new playback marker to the new recording marker; or b) if the mode is overwrite then: use the recording map to identify the part of the recording that currently corresponds with a point defined by the playback marker plus the length of audio and create a new recording marker at that point in the recording, remove all playback markers from the audio map that are greater than the playback position in the entry but less than the playback position in the entry plus the length of the audio, relate the playback marker in the entry to the recording marker in the entry, and create a new playback marker in the dictation at a position defined by the playback marker plus the length of audio and relate the new playback marker to the new recording marker.
4. The method according to claim 3, further comprising: receiving a request for a segment of the digital dictation at a specified playback position; identifying from the audio map the corresponding segment of audio in the recording; reading a segment of audio data from the file that corresponds to the requested segment of the digital dictation at the specified playback position; returning the segment of audio data.
5. The method according to claim 3 or claim 4, wherein at least one segment of the first recording portion does not have a corresponding playback position in the digital dictation, according to the audio map.
6. The method according to claim 3 or claim 4, wherein at least one segment of the first recording portion has a corresponding playback position in the digital dictation according to the audio map that occurs later in the digital dictation than the second playback marker.
7. The method according to any preceding claim, wherein a last segment of the first recording portion and a first segment of the second recording portion are not sequential in the digital dictation when it is played.
8. The method according to any preceding claim, further comprising capturing a portion of audio from a microphone and converting the portion of audio to a portion of digital audio data.
9. The method according to any preceding claim, further comprising storing the length of the first and second recording portions in the first and second entries of the transaction table, respectively.
10. The method according to any preceding claim, further comprising: encoding the first recording portion of digital audio; and encoding the second recording portion of digital audio.
11. The method according to any preceding claim, further comprising: encrypting the first recording portion of digital audio; and encrypting the second recording portion of digital audio.
12. A method of playing a digital dictation stored in a file comprising a recording, the recording comprising a plurality of recording portions, the method comprising: reading a transaction table associated with the file, the transaction table comprising a plurality of entries; generating an audio map to describe how to play back the digital dictation from the plurality of recording portions within the file by: reading each entry from the transaction table, each entry comprising: a recording mode, a playback marker, and a recording marker; determining for each entry, a length of a portion of audio to which the recording marker relates; incorporating each entry into the audio map by performing one of the following methods, depending on the recording mode of the entry: a) if the mode is insert then: use the recording map to identify the part of the recording that currently corresponds with the playback marker and create a new recording marker at that point in the recording, increase the position of all playback markers that are greater than the playback marker in the entry by the length of audio, relate the playback marker in the entry to the recording marker in the entry, and create a new playback marker in the dictation at a position defined by the playback marker plus the length of audio and relate the new playback marker to the new recording marker; or b) if the mode is overwrite then: use the recording map to identify the part of the recording that currently corresponds with a point defined by the playback marker plus the length of audio and create a new recording marker at that point in the recording, remove all playback markers from the audio map that are greater than the playback position in the entry but less than the playback position in the entry plus the length of the audio, relate the playback marker in the entry to the recording marker in the entry, and create a new playback marker in the dictation at a position defined by the playback marker plus the length of audio and relate the new playback marker to the new recording marker; receiving a request for a segment of the digital dictation at a specified playback position; identifying from the audio map the corresponding segment of audio in the recording; reading a segment of audio data from the file that corresponds to the requested segment of the digital dictation at the specified playback position; and returning the segment of audio data.
13. The method according to any of claims 3 to 12, wherein incorporating each entry into the audio map further comprises: c) if the recording mode is delete then: if the playback marker plus the length of audio is not at the end of the dictation, use the recording map to identify the part of the recording that currently corresponds with a point defined by the playback marker plus the length of audio and create a new recording marker at that point in the recording; remove all playback markers that are after the playback marker in the entry but earlier than the playback marker in the entry plus the length of audio in the entry, and decrease the positions of the playback markers that are after the playback marker in the entry plus the length of audio specified in the delete entry by the length of audio specified in the entry; and if a new recording marker has been created, relate the new playback marker in the entry to the new recording marker.
14. The method according to any of claims 3 to 14, wherein: determining a length of a portion of audio to which the recording marker relates comprises reading a recording marker from a next entry of the transaction table and calculating the difference to determine the length; or each entry from the transaction table further comprises a length of audio and determining a length of a portion of audio to which the recording marker relates comprises reading the length of audio from the entry of the transaction table.
15. The method according to any of claims 3 to 14, further comprising: creating a playback buffer to contain the audio from the digital dictation; checking whether the playback buffer is full; and requesting a next sequential segment of the digital dictation at the next sequential playback position.
16. The method according to any of claims 3 to 15, further comprising decoding the segment of audio data.
17. The method according to any of claims 3 to 16, further comprising decrypting the segment of audio data.
18. A method of deleting audio from a digital dictation stored in a file, the digital dictation being defined by a transaction table stored in association with the file, the transaction table comprising a plurality of entries, the method comprising: receiving a playback marker indicating the position of the audio in the digital dictation that is to be deleted; receiving a length of the audio in the digital dictation that is to be deleted; and storing the playback marker and the length in a new entry in the transaction table.
GB1611637.8A 2015-08-28 2016-06-30 Audio file Active GB2541792B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GBGB1515382.8A GB201515382D0 (en) 2015-08-28 2015-08-28 Digital dictation workflow

Publications (3)

Publication Number Publication Date
GB201611637D0 GB201611637D0 (en) 2016-08-17
GB2541792A true GB2541792A (en) 2017-03-01
GB2541792B GB2541792B (en) 2018-12-12

Family

ID=54326537

Family Applications (2)

Application Number Title Priority Date Filing Date
GBGB1515382.8A Ceased GB201515382D0 (en) 2015-08-28 2015-08-28 Digital dictation workflow
GB1611637.8A Active GB2541792B (en) 2015-08-28 2016-06-30 Audio file

Family Applications Before (1)

Application Number Title Priority Date Filing Date
GBGB1515382.8A Ceased GB201515382D0 (en) 2015-08-28 2015-08-28 Digital dictation workflow

Country Status (1)

Country Link
GB (2) GB201515382D0 (en)

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3749849A (en) * 1971-12-22 1973-07-31 Ibm Dictation system featuring paragraph editing, special notations and sentence extension
DE19633648A1 (en) * 1996-08-21 1998-02-26 Grundig Ag Method and circuit arrangement for storing dictations in a digital dictation machine
JP4007177B2 (en) * 2002-12-09 2007-11-14 ソニー株式会社 Data editing method and data editing device


Also Published As

Publication number Publication date
GB201515382D0 (en) 2015-10-14
GB2541792B (en) 2018-12-12
GB201611637D0 (en) 2016-08-17


Legal Events

Date Code Title Description
S30Z Assignments for licence or security reasons

Free format text: APPLICANT: BIGHAND LIMITED SECURITY: GLAS TRUST CORPORATION LIMITED