
GB2290683A - Editing recorded material

Info

Publication number: GB2290683A
Application number: GB9412305A
Authority: GB
Inventors: John Lawrence; Martin Porter; Michael Joseph Kemp
Original assignee: STUDIO AUDIO AND VIDEO Ltd
Current assignee: STUDIO AUDIO AND VIDEO Ltd
Priority date: 1994-06-20
Filing date: 1994-06-20
Publication date: 1996-01-03
Anticipated expiration: 2014-06-20
Also published as: GB9412305D0; GB2290683B

Abstract

A method of, and apparatus for, editing recorded material stored in the form of electrical signals in or on a storage medium, to give a desired duration to a reproduction of the recorded material, by establishing a threshold corresponding to an information content below which the recorded material is dispensable and altering the duration of one or more of the periods during which the recorded material falls below that threshold, to obtain the desired duration for the reproduction. More specifically, a method and apparatus for duration editing of sound recordings is described, in which the threshold represents a given signal amplitude, which may be selected with reference to stored data derived from the originally recorded signal and indicative of its amplitude over time. The recording apparatus includes circuitry for recording the audio signals, and profile data derived from them, on a disc. The profile values read out from the disc are compared with a user-adjustable threshold to modify the duration of the recorded material.

Description

Method and Means for Editing Recorded Material

The present application relates to a method and means for editing recorded material, in particular to editing such material to a required duration.

In accordance with the present invention, duration, or length, editing of a pre-recorded signal is performed by operating only on that part of the recorded material below a predetermined threshold level, thereby enabling the duration or length of the recording to be modified without affecting the quality or pitch of the recorded material lying above that threshold level.

The invention is particularly applicable to the editing of recorded speech but may be used on any recorded material comprising "quiet" periods, during which there is no recorded material whose integrity is to be maintained.

The embodiment of the invention described herein with reference to the seven figures of the accompanying drawings is a recorded-speech editing apparatus. It incorporates a method of storing amplitude information that allows the necessary calculations to be performed fast enough to permit interactive selection of the time compression parameters.

Using the described embodiment, an operator can readily review duration-edited material and revise it if it sounds unacceptable.

Referring to the drawings, an audio recording is made by means of the digital audio system of Figure 1.

The microphone 10 picks up speech, which is amplified in amplifier 11 and fed to analogue to digital converter 12, and thence as digital data signals to the digital signal processor (DSP) system 13. These signals may be buffered in local memory (not shown) and transferred to the disk system 14. The signals may also be passed via digital to analogue converter 15 and amplifier 16 to a loudspeaker 17 for monitoring.

Digital audio may also be entered directly to the DSP system, for example from digital audio recordings, and may be fed out digitally to a digital recorder for further use.

Digital audio signals may also be read from the disk 14 into DSP 13 and fed out to digital to analogue converter 15 as above to listen back to a recording made on the disk.

As the signals pass through the DSP, they may be processed in ways to be described. The whole system is under the control of a computer system 18 with its own program disk 19, graphic display monitor 20, keyboard 21 and mouse 22, through which the operator interacts with the control software that instructs the DSP to perform the desired processes.

The control software is capable of playing material recorded on disk in an edited form, in which a piece from one part of a recording on tape can be played, followed by another piece from the same recording or a different recording, and so on. Two pieces can be played at the same time and the volume of each can be independently controlled. Typically, when playing edited material, one piece of audio is faded out rapidly at its end as the next piece is faded in rapidly. This can result in a "seamless" join to the listener. Further details of the playing of material from disk subject to an edit list may be found, for example, in "The Art of Digital Audio" by John Watkinson, Focal Press revised reprint 1989, pp. 447-450.
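The rapid fade-out/fade-in join described above can be illustrated with a short sketch. It assumes clips held as Python lists of samples and a linear gain ramp; the function name `crossfade`, the fade law and the fade length are illustrative assumptions, since the text does not specify them.

```python
from typing import List

def crossfade(a: List[float], b: List[float], fade_len: int) -> List[float]:
    """Join clip a to clip b, overlapping the last fade_len samples of a with
    the first fade_len samples of b under linear gain ramps (assumed fade law)."""
    if fade_len > min(len(a), len(b)):
        raise ValueError("fade is longer than one of the clips")
    head = a[: len(a) - fade_len]          # part of a played at full volume
    overlap = []
    for k in range(fade_len):
        g = (k + 1) / (fade_len + 1)       # gain of b rises as the gain of a falls
        overlap.append(a[len(a) - fade_len + k] * (1 - g) + b[k] * g)
    return head + overlap + b[fade_len:]   # remainder of b at full volume
```

For example, `crossfade([1.0] * 4, [0.0] * 4, 2)` returns six samples that step down from 1.0 through about 0.67 and 0.33 to 0.0. A longer `fade_len` gives a gentler join at the cost of more overlapping audio.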

Fig 3 shows a typical printout of a simple edit decision list (EDL), also sometimes known as an edit list. Column 24 shows the time (from pressing "play") at which an event occurs. The name of the recording (track) is shown in column 25 (all the examples here are from the same track). Column 26 shows the elapsed time through the track at which the first sample to be played at the EDL time occurs. For example, entry 1 shows that at EDL time 4.2 seconds the track "inbz4820" will be played from the start of the track (time 0). The track will stop playing when the time through the track shown in column 27 is reached.

In the case of event 1 this will be 14.2 seconds after starting, or 18.4 seconds after pressing "play". Event 2 shows that another clip (or part of a track) will be played 13.4 seconds after pressing "play" (i.e. overlapping event 1 by 5 seconds). The audio to be played will be from 9.2 seconds through the original track and will play for 10.505 seconds. In the example, events 1 and 2 are actually playing identical audio over the 5-second overlap. Usually this is not the case, as examination of the other entries would show.
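The timing arithmetic of the Fig 3 example can be checked with a small model of an EDL entry. This is a sketch: the class and field names are hypothetical. The column 27 value for event 2 is not quoted in the text, so here it is derived from the stated 10.505-second playing time.

```python
from dataclasses import dataclass

@dataclass
class EdlEvent:
    """One EDL entry as in Fig 3 (times in seconds; field names are hypothetical)."""
    edl_time: float   # column 24: start time, measured from pressing "play"
    track: str        # column 25: name of the recorded track
    track_in: float   # column 26: position in the track of the first sample played
    track_out: float  # column 27: position in the track at which playback stops

    def edl_end(self) -> float:
        """EDL time at which this event stops playing."""
        return self.edl_time + (self.track_out - self.track_in)

def overlap(first: EdlEvent, second: EdlEvent) -> float:
    """Seconds for which two consecutive events play simultaneously."""
    return max(0.0, first.edl_end() - second.edl_time)

# Events 1 and 2 from the worked example in the text.
e1 = EdlEvent(4.2, "inbz4820", 0.0, 14.2)
e2 = EdlEvent(13.4, "inbz4820", 9.2, 9.2 + 10.505)
print(round(e1.edl_end(), 3), round(overlap(e1, e2), 3))   # 18.4 5.0
# Track position reached by event 1 when event 2 starts:
print(round(e1.track_in + (e2.edl_time - e1.edl_time), 3))  # 9.2
```

The last line shows why the two events play identical audio during the overlap. When event 2 starts, event 1 has reached 9.2 seconds into the track, which is exactly event 2's in point.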

Fig 4 shows a typical graphical representation of four such clips. The axis from left to right represents play time, and the "clips" are shown to look like pieces of cut audio tape on two alternating rows. Clip 1 thus fades in and plays for a while, then fades out while clip 2 is fading in. Clip 2 then plays until it fades out, etc.

By means of known techniques the user is able to position a cursor over various clips and using the mouse he can drag items around, instruct the computer to change the length of crossfades, undo edits etc. These changes result in modifications to the EDL of fig 3. The internal representation of the EDL is similar to fig 3 and any appropriate detailed listing may be stored in a computer providing this functionality.

This invention provides an automatic way of generating such a list, which may then be viewed and auditioned by the user of the system and manipulated using these standard tools to his satisfaction.

The processes required to implement the technique of this invention are as follows.

During the recording process a separate recording of profile data is stored on the disk along with the audio using the method shown in fig 2.

Audio is presented in digital form to the digital input 1 and is fed directly to the recording disk drive on which it is stored digitally.

The audio is also fed, sample by sample, to the modulus extractor 3, which simply removes the sign of the digital audio sample. For example, the digital audio signal typically represents numbers from about -32,000 to +32,000 (strictly -32,768 to +32,767) as a 16-bit signed integer. Positive numbers are passed through the modulus extractor unchanged, whilst negative numbers are negated, e.g. -3 is converted to +3, -10,000 to +10,000, etc.

The output from the modulus extractor 3 is fed both to a comparator 4 and a store 5. The data is not entered into the store unless a "write" signal is applied at input 6 to the store. The output of the store is applied to the comparator 4 and a write signal is generated if the number from modulus extractor 3 exceeds the contents already in the store (from previous steps through this process). Thus the store is updated if a number larger than that already in the store is applied. In this way the store always contains the largest amplitude signal that has appeared at the input since it was last cleared.

A counter 7 counts the incoming samples, and every time a predetermined count is reached a signal is generated at output 8, which is fed to the clear input of store 5, the clear input of counter 7 and the write input of store 9.

This causes the current contents of store 5 to be transferred to store 9, and both counter 7 and store 5 to be zeroed.

In this way the maximum amplitude sample in a period defined by counter 7 is available in store 9, and remains valid until the same period has elapsed again, when store 9 is updated with the maximum amplitude sample in the next period.

Under the control of the computer system already described, the data in store 9 is transferred to disk 2 along with the audio, as a separate file much smaller than the audio file. The data in store 9 is updated about 100 times per second, whereas the digital audio signal presents new samples to the disk typically 48,000 times per second.
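The peak-hold circuit of Fig 2 can be modelled in software. This is a sketch of the behaviour, not of the hardware. The function name `make_profile` is hypothetical, and the default block of 480 samples is simply 48,000 samples per second divided by about 100 profile values per second. As with counter 7, which only transfers on a full count, a trailing partial block is dropped.

```python
from typing import Iterable, List

def make_profile(samples: Iterable[int], block: int = 480) -> List[int]:
    """Software model of Fig 2: the peak absolute sample value in each block."""
    profile: List[int] = []   # successive contents of store 9
    store5 = 0                # running maximum since the last clear (store 5)
    count = 0                 # samples seen in the current block (counter 7)
    for s in samples:
        mag = -s if s < 0 else s          # modulus extractor 3 removes the sign
        if mag > store5:                  # comparator 4 raises "write" on a new peak
            store5 = mag
        count += 1
        if count == block:                # output 8: transfer to store 9, then clear
            profile.append(store5)
            store5 = 0
            count = 0
    return profile                        # a trailing partial block is not transferred
```

For example, `make_profile([1, -5, 3, 2, -2, -7, 0], block=3)` gives `[5, 7]`. At 480 samples per block, a 10-minute recording yields a profile of 60,000 values against 28.8 million audio samples.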

Note that it is also possible to reverse the direction of data flow along line 23 so that a recording already made on the disk 2 may be replayed into the circuitry of fig 2 to generate a profile file from store 9 which is recorded back onto the disk.

The size of the profile can be reduced further by storing only 8 bits per store 9 value on disk. Store 9 typically contains 16 to 24 bits indicating the amplitude reached in a given segment. This may be converted to 256 levels according to a modified logarithmic law, so that each value indicates an approximate level in decibels (dB). This is typically done by a sequence of compares with a table of 256 values. For example, the value 255 can be coded if the amplitude is at maximum (typically 0x7FFF to 0x8000 in a 16-bit system). The value 254 may be used for any store 9 value between this level and 0.2 dB lower (i.e. between 0x7D15 and 0x7FFF), the value 253 for levels between -0.4 dB and -0.2 dB (i.e. store 9 values between 0x7A3C and 0x7D15), and so on. At -30 dB the steps can be widened to 0.25 dB, e.g. value 105 in the profile represents an amplitude between -30 dB and -30.25 dB, or a store 9 value between 0x040C and 0x03EE. At -40 dB steps of 0.5 dB can be used, and at -50 dB steps of 1 dB. Thus levels down to about -90 dB can be coded, with high precision at the higher levels and less precision at the lower levels.
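The 8-bit coding law can be sketched as a table of band edges searched by binary search. Binary search is an equivalent substitute for the sequence of compares described above. The band widths (0.2, 0.25, 0.5 and 1 dB) follow the text, but the names `lower_edges_db`, `THRESHOLDS` and `encode_level` are hypothetical. Because this sketch assigns the 0.2 dB bands consecutively from code 254, the -30 to -30.25 dB band comes out as code 104, one lower than the value 105 quoted in the text.

```python
import bisect
from typing import List

FULL_SCALE = 32767  # 0x7FFF: largest positive 16-bit sample magnitude

def lower_edges_db() -> List[float]:
    """Lower edge, in dB relative to full scale, of codes 254, 253, ..., 1."""
    edges = [-0.2 * i for i in range(1, 151)]          # 0 to -30 dB in 0.2 dB steps
    edges += [-30 - 0.25 * i for i in range(1, 41)]    # -30 to -40 dB in 0.25 dB steps
    edges += [-40 - 0.5 * i for i in range(1, 21)]     # -40 to -50 dB in 0.5 dB steps
    edges += [-50 - 1.0 * i for i in range(1, 255 - len(edges))]  # 1 dB steps to -94 dB
    return edges

# Linear store 9 thresholds in ascending order, so that bisect can search them.
THRESHOLDS = sorted(FULL_SCALE * 10 ** (db / 20) for db in lower_edges_db())

def encode_level(peak: int) -> int:
    """Map a peak magnitude (0..32768) to an 8-bit profile code."""
    if peak >= FULL_SCALE:
        return 255                                  # maximum level
    # The number of band edges at or below the peak is the code (0..254).
    return bisect.bisect_right(THRESHOLDS, peak)
```

With these widths, 254 band edges reach about -94 dB, consistent with the "about -90 dB" floor mentioned above. Anything quieter codes as 0.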

The profile file and the audio file are now stored on disk and may be used to perform the calculations required to generate an appropriate EDL for the automatically edited speech recording.

Fig 5 shows the first stage in the process.

The user selects a threshold level (for example on potentiometer 30 which is converted to digital form in analogue to digital converter 31) below which the speech is assumed to be between words, and above which words may be in progress. This level will depend on the background noise, the variation in speech level etc, and may be refined by the user listening to the results of the operation and adjusting the level. The process is sufficiently fast that an optimum setting can be reached interactively and rapidly.

The output of 31 is fed to the B input of comparator 32 and the profile values are fed successively into the A input.

Any profile values above the threshold presented at B cause counter 33 to increment, and any profile values below the threshold cause counter 34 to increment. The entire profile file is processed in this way, which is very fast: a 10-minute speech requires only 60,000 compare operations, which can be performed in a fraction of a second.
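In software, comparator 32 and counters 33 and 34 reduce to a single pass over the profile. This is a minimal sketch. The function name `count_units` is hypothetical, and counting a value exactly equal to the threshold as "below" is an assumption, since the text does not cover that case.

```python
from typing import List, Tuple

def count_units(profile: List[int], threshold: int) -> Tuple[int, int]:
    """Return (counter 33, counter 34): profile units above and below the threshold."""
    above = sum(1 for v in profile if v > threshold)  # comparator 32: A > B
    below = len(profile) - above                      # assumption: A == B counts as below
    return above, below
```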

Calculations are now performed on the contents of counters 33 and 34; these are best represented by the flow diagram of Fig 6.

The user is required to enter a further number which represents the ratio by which he wants to change the duration of audio. This can be entered by keyboard, by another potentiometer and analogue to digital converter, or by any other convenient means.

At step 35 the total length of the audio (in profile unit periods, 10 ms units in the example above) is calculated by summing the two counters.

At step 36 this length is multiplied by the ratio entered by the user to determine the new length. If the ratio is less than 1 at step 37 then step 38 is executed and the number of units to lose is calculated. If this exceeds the value of counter 34 (at step 39) the process cannot be done and the user is prompted (at step 40) to set a higher threshold or a higher ratio.

If step 39 determines that the number of periods to lose is less than or equal to the contents of counter 34, the proportion P by which the duration of the audio below the threshold (as counted in counter 34) is to be changed is calculated at step 42.

Similarly, if at step 37 the ratio is greater than unity, the proportion P by which the duration of the audio below the threshold is to be changed is calculated at step 44.
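The text does not give the formulas used at steps 42 and 44. If the above-threshold units keep their length and the quiet units are scaled by P, as the marker advance at step 60 implies, both branches reduce to P = (new length - counter 33) / counter 34. The sketch below uses that inferred formula. The function name `proportion_p` and the handling of an empty counter 34 are assumptions.

```python
def proportion_p(above: int, below: int, ratio: float) -> float:
    """Fig 6 sketch: scale factor P for the below-threshold periods.

    above, below: contents of counters 33 and 34 (profile units).
    ratio: the duration ratio entered by the user.
    """
    total = above + below                   # step 35
    new_len = total * ratio                 # step 36
    if ratio < 1:                           # step 37
        to_lose = total - new_len           # step 38
        if to_lose > below:                 # step 39
            raise ValueError("set a higher threshold or a higher ratio")  # step 40
    if below == 0:                          # assumption: nothing can be stretched
        if new_len == total:
            return 1.0
        raise ValueError("no below-threshold periods to change")
    # Steps 42 and 44 (inferred formula): quiet units absorb the whole change.
    return (new_len - above) / below
```

For example, 300 units above and 100 below with a ratio of 0.9 gives P = 0.6, so the 100 quiet units shrink to 60 and the total becomes 360. A ratio of 0.5 would require losing 200 units, more than the 100 available, so the user is prompted to change the settings.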

It is now possible to generate an EDL that reflects the results of these calculations. This is described for the simple case in which no fades are used on the clips; they are simply cut when they fall below the threshold set.

A marker is inserted in the EDL, and the two variables X and Y, which respectively accumulate the duration above the threshold and the duration below the threshold (in the units of time used in the profile), are set to zero.

Fig 7 shows the process whereby the EDL is generated.

A marker is placed at the start of the EDL at which the first clip from the speech track will be placed, the values X (the period in profile units above the threshold) and Y (the period in profile units below the threshold) are zeroed. The first profile value unit is then read from the profile.

If the first unit in the profile file is above the threshold (as tested at step 50) the sample number in the original track is recorded at step 51. This is used as the start of the clip to be played from the original track.

The value X is incremented at step 52 and the next unit is fetched from the file at step 53. At 54 it is determined whether the next profile unit is above the user threshold.

If it is, step 52 is repeated, and X is incremented repeatedly until a profile unit is encountered below the threshold.

At this point step 55 is executed, and the sample number in the source file at which the audio fell below the threshold is recorded. This is the end of the clip from the original track.

The period for which the signal remains below the threshold is accumulated in variable Y over steps 56, 57 and 58, until the next profile unit above the threshold is detected at step 58. At this point step 59 is executed: the clip is placed at the marker point in the EDL using the recorded clip start and stop times. (Thus columns 24, 26 and 27 of Fig 3 can be completed.)

At step 60 we advance the marker (in EDL sample units) by the product of X and N plus the product of Y, N and P, where N is the number of samples corresponding to each profile unit, P is the proportion to vary the length of the period below the threshold and X and Y are the durations above and below the threshold respectively of the last period of audio in units of profile data as described above.
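Summing the step 60 advance over every period shows how P fixes the overall duration. The following symbols are introduced here for illustration only: M for the marker, A and B for the final contents of counters 33 and 34, and R for the user's ratio.

```latex
M \leftarrow M + N X + N Y P = M + N\,(X + P\,Y)

\sum_{\text{periods}} N\,(X + P\,Y) = N\,(A + P\,B)
```

The reproduction therefore lasts R(A + B) profile units exactly when P = (R(A + B) - A)/B. This is the value steps 42 and 44 must produce if the above-threshold material is to keep its original length; the text implies this through step 60 rather than stating it.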

Finally at 61 the next profile unit can be fetched and the process can be repeated.

To prevent rounding errors in the calculation at step 60 from accumulating, any fractional part can be truncated and saved, then added on the next time step 60 is reached. The fractions thus accumulate until a complete sample time is recovered before truncation.
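The Fig 7 loop can be sketched as follows, including the fractional carry just described. Several details are assumptions rather than statements in the text: the name `generate_edl`, the tuple layout of a clip, resetting X and Y after each clip (implied by "the last period of audio"), and scaling any quiet units before the first clip by P like any other gap. No fades are modelled.

```python
from typing import List, Tuple

Clip = Tuple[int, int, int]  # (edl_start, src_start, src_end), all in samples

def generate_edl(profile: List[int], threshold: int, n: int, p: float) -> List[Clip]:
    """Sketch of the Fig 7 loop with fractional-sample carry (no fades)."""
    clips: List[Clip] = []
    marker = 0      # EDL position, in samples, for the next clip
    carry = 0.0     # fractional samples truncated at earlier step 60 updates

    def advance(units: float) -> None:
        # Step 60: marker += N * (X + Y * P), truncating and carrying any fraction.
        nonlocal marker, carry
        exact = units * n + carry
        whole = int(exact)          # truncation; exact is never negative here
        carry = exact - whole
        marker += whole

    i, total = 0, len(profile)
    lead = 0                        # assumption: leading quiet is scaled like any gap
    while i < total and profile[i] <= threshold:
        lead += 1
        i += 1
    advance(p * lead)

    while i < total:
        src_start = i * n           # step 51: clip in point in the source track
        x = 0
        while i < total and profile[i] > threshold:   # steps 52-54
            x += 1
            i += 1
        src_end = i * n             # step 55: clip out point
        y = 0
        while i < total and profile[i] <= threshold:  # steps 56-58
            y += 1
            i += 1
        clips.append((marker, src_start, src_end))     # step 59
        advance(x + y * p)          # step 60; X and Y restart for the next clip
    return clips
```

With N = 1 and P = 0.5, each step 60 advance of 1.5 samples alternately contributes 1 and 2 samples. The carried half sample is recovered instead of being lost at every clip.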

Although the above process is described on the assumption that clips fade in and out instantly, in fact the clips can be set in the EDL to have a finite fade-in time (in which case the start point is advanced slightly in both the EDL and the clip time, so that they reach full volume at the marker time) and a finite fade-out time.

To accommodate the fade-in and fade-out times it is necessary to provide an additional control specifying a minimum period which the signal must spend below the threshold before the signal is counted as having fallen below it. This modification must be made both in the comparator 32 of Fig 5 and in the comparator 54 of Fig 7. A typical minimum period is 10 to 50 ms.

It is also desirable in a practical system to specify a minimum duration which the signal spends above the threshold, as too many very short clips will generate an excessively long EDL. Although the system may be able to play such an EDL it is important for it to remain a reasonable size so the user can interactively modify it while listening to the results.
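The minimum-period rules can be applied to the per-unit above/below decisions before they are used by comparators 32 and 54. The sketch below uses a hypothetical helper, `suppress_short_runs`, which flips any interior run of the chosen state that is shorter than `min_len` profile units. With 10 ms units, the typical 10 to 50 ms minimum corresponds to 1 to 5 units. Calling it with `target=False` implements the minimum below-threshold period. Calling it with `target=True` is only one possible reading of the minimum above-threshold duration, since the text does not say what becomes of a clip that is too short; here such a clip would be treated as quiet.

```python
from typing import List

def suppress_short_runs(above: List[bool], target: bool, min_len: int) -> List[bool]:
    """Flip interior runs of `target` shorter than min_len units to the other state."""
    out = list(above)
    i, n = 0, len(out)
    while i < n:
        j = i
        while j < n and out[j] == out[i]:   # find the end of the current run
            j += 1
        # Only runs bounded on both sides are changed; edge runs are left alone.
        if out[i] == target and j - i < min_len and i > 0 and j < n:
            for k in range(i, j):
                out[k] = not target
        i = j
    return out
```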

A further (optional) refinement is to provide a higher threshold at the end of a period below the threshold than at the end of a period above the threshold. Such a hysteresis prevents the signal experiencing multiple edits when a background noise is hovering just at the threshold level.
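The hysteresis can be sketched as a two-threshold state machine. The helper name `hysteresis_flags`, the strict comparisons and the assumption that the recording starts in the below-threshold state are illustrative choices, not details given in the text.

```python
from typing import List

def hysteresis_flags(profile: List[int], low: int, high: int) -> List[bool]:
    """Above/below decision per profile unit using two thresholds (low < high)."""
    above = False                  # assumption: start in the below-threshold state
    flags: List[bool] = []
    for v in profile:
        if above and v < low:      # a period above ends only below the lower threshold
            above = False
        elif not above and v > high:   # a period below ends only above the higher one
            above = True
        flags.append(above)
    return flags
```

For the profile `[5, 35, 24, 19, 26, 21, 5]` with thresholds 20 and 30, this gives a single above-threshold period (units 1 and 2). A single threshold of 25 would instead switch state at units 1, 2, 4 and 5, producing extra edits around noise hovering near that level.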

These modifications to the comparators processing the profile data can be implemented in a straightforward way in software running on the control computer 18. Profile data may be stored either on the DSP disk 14 and read through the DSP subsystem, or on the control computer disk 19, where it can be read directly.

The system as described replaces the periods between the desired speech with periods of silence, whose duration is varied to achieve the desired total duration. When the duration is being reduced, it can often sound better if the background noise at the end of a clip is retained until the next clip starts. This can be achieved simply by adjusting the clip out point set in Fig 7 so that it coincides with the in point of the next clip in the EDL. This gives a smooth edit at the start of the next clip, which is generally completely inaudible to the listener.

Where the duration of the audio is being extended (ratio greater than unity), this is not possible, as there is insufficient "atmosphere" between the desired pieces of audio to fill the gap. In this case the clips can be extended to the maximum practical length, and a further piece of audio from another part of the recording can be inserted in the gap. This can either be a piece of quiet background designated by the user, or a section below the threshold selected at random by the control software.
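Adjusting the out points can be sketched on the clip list produced by the Fig 7 process. This assumes the clips are in source order, so that each clip's original quiet period ends at the next clip's source in point. The name `extend_out_points` and the tuple layout are hypothetical. Capping each out point at the next clip's source in point covers both cases described above. When the duration is reduced, a clip simply plays its own background until the next clip starts. When the duration is extended, a clip runs to the end of its original quiet period and the remaining gap is left for inserted material.

```python
from typing import List, Tuple

Clip = Tuple[int, int, int]  # (edl_start, src_start, src_end) in samples

def extend_out_points(clips: List[Clip]) -> List[Clip]:
    """Let each clip play its own background noise until the next clip starts."""
    out: List[Clip] = []
    for k, (edl_start, src_start, src_end) in enumerate(clips):
        if k + 1 < len(clips):
            next_edl, next_src, _ = clips[k + 1]
            wanted = src_start + (next_edl - edl_start)   # meets the next in point
            # Never play into the next clip's source material.
            src_end = max(src_end, min(wanted, next_src))
        out.append((edl_start, src_start, src_end))
    return out
```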

Although the process is described for speech editing, it may be used on any material where there are quiet periods during which edits may be made.

Claims

1. A method of editing recorded material stored in the form of electrical signals in or upon a storage medium in order to impart a desired duration to a reproduction of the recorded material, comprising the steps of: establishing a threshold corresponding to an information content below which the recorded material is dispensable; and altering the duration of one or more of the periods during which the recorded material falls below that threshold; thereby to obtain the desired duration for the reproduction of the recorded material.

2. A method of editing recorded material stored in the form of electrical signals in or upon a storage medium in order to impart a desired duration to a reproduction of the recorded material, comprising the steps of: establishing a threshold corresponding to an amplitude below which the recorded material is dispensable; and altering the duration of one or more of the periods during which the recorded material falls below that threshold; thereby to obtain the desired duration for the reproduction of the recorded material.

3. A method in accordance with Claim 2 in which the recorded material is initially processed to provide data indicative of the amplitude of the recorded material with time.

4. A method in accordance with Claim 3 in which the threshold is selected by reference to the amplitude indicative data.

5. A method in accordance with Claim 4 in which the period or periods within which the duration will be modified are selected with reference to the amplitude indicative data.

6. A method in accordance with any preceding claim in which the duration adjustment of the reproduction of the recorded signal is effected by means of a series of iterations including, at each iteration, selection of a threshold, duration modification of one or more of the one or more periods during which the threshold is not exceeded, and subsequent qualitative monitoring of the duration modified signal.

7. A method in accordance with any preceding claim in which the recorded signals are representative of sound.

8. A method in accordance with Claim 7 in which the recorded signals are representative of speech.

9. Apparatus for duration editing a reproduction of recorded material stored in the form of electrical signals in or upon a storage medium, comprising means for selecting a threshold representative of an information content below which the recorded material is dispensable, and means enabling the duration of any one or more of the periods during which the recorded material does not exceed the threshold to be modified, thereby to achieve a desired duration for the reproduction of the recorded material.