WO1994024667A1 - Apparatus for recording and reproducing voice - Google Patents
- Publication number
- WO1994024667A1 (PCT/JP1994/000661)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- audio
- signal
- data
- reproducing
- part display
Classifications
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B20/00—Signal processing not specific to the method of recording or reproducing; Circuits therefor
- G11B20/10—Digital recording or reproducing
-
- G—PHYSICS
- G09—EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
- G09B—EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
- G09B5/00—Electrically-operated educational appliances
- G09B5/04—Electrically-operated educational appliances with audible presentation of the material to be studied
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/04—Time compression or expansion
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B20/00—Signal processing not specific to the method of recording or reproducing; Circuits therefor
- G11B20/10—Digital recording or reproducing
- G11B20/10527—Audio or video recording; Data buffering arrangements
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B27/00—Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
- G11B27/005—Reproducing at a different information rate from the information rate of recording
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B2220/00—Record carriers by type
- G11B2220/20—Disc-shaped record carriers
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B2220/00—Record carriers by type
- G11B2220/20—Disc-shaped record carriers
- G11B2220/21—Disc-shaped record carriers characterised in that the disc is of read-only, rewritable, or recordable type
- G11B2220/215—Recordable discs
- G11B2220/218—Write-once discs
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B2220/00—Record carriers by type
- G11B2220/20—Disc-shaped record carriers
- G11B2220/25—Disc-shaped record carriers characterised in that the disc is based on a specific recording technology
- G11B2220/2537—Optical discs
- G11B2220/2545—CDs
Definitions
- The present invention relates to an audio recording / reproducing device, and more particularly to an audio reproducing device such as a learning / education device or an audio electronic book.
- An object of the present invention is to solve the above-mentioned problems of the prior art by providing a playback device that stores a sufficient amount of audio data on a predetermined recording medium and, during reproduction, produces audio output of long duration that is close to natural reading aloud.
- Disclosure of the invention
- To achieve this object, an audio recording / reproducing apparatus according to the present invention basically has the following technical configuration.
- The apparatus comprises voice signal storage means for converting a non-voice portion included in a voice signal into a predetermined non-voice portion display data signal and storing the voice signal, and voice reproduction means for reproducing the stored voice signal at a desired utterance speed. More specifically, it comprises non-voice portion display conversion means for converting a non-voice portion included in an input voice signal into a predetermined non-voice portion display data signal; storage means for storing the input voice signal including the non-voice portion display data signal converted by the non-voice portion display conversion means; and voice reproduction means for reproducing the input voice signal stored in the storage means at a desired utterance speed.
- Digital audio data is stored on the storage medium in a form in which the non-voice portions are substantially deleted, and the non-voice time is added back at the time of reproduction. The storage medium can therefore hold a sufficient amount of audio data, and because the silent time is restored during playback, audio output of long duration similar to natural reading can be obtained.
- FIG. 1 is a diagram showing an embodiment of the recording means of the present invention.
- FIG. 2 is a diagram showing an embodiment of the reproducing means of the present invention.
- FIGS. 3 (A), 3 (B), 4 (A), and 4 (B) are diagrams for explaining an embodiment of the present invention.
- FIGS. 5A and 5B are diagrams showing an example of a method for identifying the length of a silent part in the present invention.
- FIG. 6 is a diagram showing another embodiment of the present invention.
- FIG. 7 is a flowchart specifically showing the non-speech processing unit shown in FIG.
- FIGS. 8 (A) and 8 (B) are diagrams showing another example of a signal in a case where a silent part is substantially deleted.
- FIG. 9 is a block diagram showing a configuration of another embodiment of the present invention.
- FIG. 10 is a flowchart for performing the determination of the silent part in the present invention.
- FIGS. 11 to 12 are flowcharts showing the operation procedure for executing the silent part deleting method in another specific example of the present invention shown in FIG.
- FIGS. 13 (A) to 13 (C) are diagrams illustrating an example of determining the plus / minus component mode in the present invention.
- FIG. 14 is a flowchart for explaining the operation procedure of the rearrangement means in the specific example of FIG. 9 according to the present invention.
- FIG. 15 is a waveform diagram showing an example of an audio input signal used in the present invention.
- FIG. 1 is a block diagram schematically showing the configuration of a specific example of the recording means 100 for the input audio signal in the audio recording / reproducing apparatus according to the present invention.
- The recording means 100 shown there includes a storage means 11 for storing the input audio signal.
- (11) is a recording medium, which is mainly a digital storage medium such as an optical disk, a magneto-optical disk, or a magnetic disk.
- (111) is a writing means, which comprises a writing head, a head driver, and the like.
- (1) is an analog audio input means, which comprises a microphone, a filter, an amplifier, and the like.
- (2) is an A/D conversion means for converting an analog audio signal into a digital audio signal. The A/D conversion means (2) may also incorporate digital signal compression means such as ADPCM.
- (3) is a non-speech detection means, which detects a non-speech part automatically or visually.
- (4) is a conversion unit, which receives the output signals of the non-speech detection means (3) and the A/D conversion means (2) and outputs a digital audio signal based on the input from the non-speech detection means (3). It serves to delete the non-speech part or to convert it to another code.
- The non-speech detection means (3) and the conversion unit (4) constitute the non-speech part display conversion means 5 of the present invention.
- The conversion unit (4) may perform algorithmic processing using a CPU, a DSP, or the like.
- FIG. 1 shows a configuration in which analog audio is first converted to digital audio and the silent part is then substantially deleted.
- The present invention is not limited to this. The silent part may instead be substantially deleted during the conversion process, or while the audio is still analog.
- The non-voice part of the input voice signal handled in the present invention means a part with no voice or a part close to having no voice, for example the gap between syllables.
- The non-speech part display conversion means 5 substantially deletes the non-speech part. Substantial deletion here means, for example, deleting all or part of the non-speech part, or converting the non-speech part into another code.
- Examples of converting the non-voice portion into another code include converting it into a signal indicating that the portion has been deleted, converting it into a signal carrying information on the duration of the non-voice portion, or converting it into a signal indicating the location of the non-voice portion within the input audio signal.
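- The conversion options listed above can be illustrated with a short sketch. It is not the patented implementation: the token name `SILENCE`, the `threshold` and `min_run` parameters, and the use of 808h as the zero level here (a value mentioned later in the description) are all assumptions made for the example. A run of near-zero samples is replaced by a single token that records only its duration.

```python
SILENCE = "SIL"  # hypothetical marker name for a deleted non-voice portion


def encode(samples, zero=0x808, threshold=8, min_run=4):
    """Replace runs of near-zero samples with (SILENCE, run_length) tokens.

    Runs shorter than min_run are kept as ordinary samples.
    All parameter values are illustrative assumptions.
    """
    out, run = [], []

    def flush():
        # Emit the buffered quiet run either as one token or unchanged.
        if len(run) >= min_run:
            out.append((SILENCE, len(run)))
        else:
            out.extend(run)
        run.clear()

    for s in samples:
        if abs(s - zero) < threshold:
            run.append(s)      # still inside a quiet run
        else:
            flush()            # quiet run (if any) has ended
            out.append(s)
    flush()
    return out
```

For example, `encode([0x900, 0x808, 0x809, 0x807, 0x808, 0x808, 0x700])` yields `[0x900, ("SIL", 5), 0x700]`: five samples are stored as one token.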
- FIG. 2 schematically shows the configuration of a specific example of the audio reproducing means 200 used in the audio recording / reproducing apparatus according to the present invention.
- The sound reproducing means 200 of FIG. 2 is hereinafter referred to as the reproducing unit.
- (11) is a recording medium, which is shown in FIG.
- Reference numeral (112) denotes a reading means, which comprises a pickup for reading, a means for rotating the recording medium (11), a means for sliding the pickup, and the like.
- (12) is a detecting means, which detects the substantially deleted silent part in the digital audio output by the reading means (112), restores the detected silent part, forms it anew, or converts it into a signal having the same meaning, and outputs the result.
- (13) is an adjusting means, which combines the digital audio signal output from the reading means (112) with the non-audio signal output from the detecting means (12), and outputs the combined signal.
- The detecting means (12) and the adjusting means (13) constitute the audio reproducing means 17 of the present invention.
- The adjusting means (13) may be implemented as algorithmic processing on a single CPU, DSP, one-chip microcomputer, or the like. In this case, both means
- (14) is a D/A conversion means for converting the digital audio output from the adjusting means (13) into analog audio.
- The D/A conversion means (14) has restoration means.
- The D/A conversion means (14) may constitute part of the above-mentioned sound reproducing means 17, and may also serve as the detecting means (12) and the adjusting means (13).
- (15) is an amplifying means, which electrically amplifies the analog audio.
- The amplifying means (15) may further be given a frequency filter characteristic.
- (16) is an utterance means, which comprises a speaker, an earphone, or both.
- The recording unit and the reproducing unit may be either integrated or separate.
- The analog audio input to the analog audio input means (1) is filtered and amplified, and then converted by the A/D conversion means (2) into a digital audio signal as shown in FIG. 3 (A).
- The digital audio signal is input to the non-speech detection means (3) and the conversion means (4).
- The non-speech detection means (3) detects the non-speech part (31) shown in FIG. 3 (A), and the conversion means (4) converts the non-speech part 31 into a digital signal 32 (FIG. 3 (B)) indicating the position at which the silent part 31 was substantially deleted. The result is written to the recording medium (11) via the writing means (111).
- Each digital voice string represents, for example, one syllable, one phrase, one paragraph, or the span from one silent part to the next silent part.
- Reproduction of the digital audio signal recorded on the recording medium (11) is described next with reference to FIG. 2, which shows the reproducing means 200.
- The recording medium (11) is read by the reading means (112).
- The detecting means (12) detects a non-voice portion display data signal, such as the deleted non-voice portion 32 shown in FIG. 3 (B), converts it into a non-voice digital signal having either a predetermined time width or the original time width determined from information included in the signal, and outputs it to the adjusting means (13).
- The adjusting means (13) inserts the non-voice digital signal input from the detecting means (12) at the deleted positions of the digital audio signal, read from the recording medium (11), from which the non-voice parts were substantially deleted. The combined digital audio signal is then output to the D/A conversion means (14).
- The amplifying means (15) amplifies the analog audio signal, optionally filters it, and outputs it to the utterance means (16).
- The utterance means (16) outputs the sound through a speaker or earphones.
- The utterance speed can be adjusted freely by numerically increasing or decreasing the amount of non-voice digital signal, so low-speed utterance is easy to implement. An adjustment knob may be mounted on the device so that the listener can make this adjustment.
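- The speed adjustment described above can be sketched as follows. This is an illustration under stated assumptions, not the patented circuit: the token format `("SIL", n)`, the `pause_scale` parameter standing in for the adjustment knob, and the zero level 808h are choices made for the example. Voiced samples pass through unchanged; only the restored pauses are lengthened or shortened.

```python
SILENCE = "SIL"  # hypothetical marker name for a deleted non-voice portion


def decode(tokens, pause_scale=1.0, zero=0x808):
    """Expand (SILENCE, n) tokens back into zero-level samples.

    pause_scale > 1 lengthens every pause (slower utterance),
    pause_scale < 1 shortens it (faster utterance).
    """
    out = []
    for t in tokens:
        if isinstance(t, tuple) and t[0] == SILENCE:
            n = max(0, round(t[1] * pause_scale))
            out.extend([zero] * n)   # re-insert the silent interval
        else:
            out.append(t)            # ordinary audio sample
    return out
```

With `tokens = [0x900, ("SIL", 4), 0x700]`, `decode(tokens)` restores four zero-level samples between the two voiced samples, while `decode(tokens, pause_scale=2.0)` restores eight.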
- The digital audio including the converted display data signal, in which the non-speech part is substantially deleted, may also be recorded on a recording medium as shown in FIGS. 4 (A) and 4 (B).
- The non-speech detection means (3) and the conversion means (4) convert the silent part (41) of the original digital audio signal shown in FIG. 4 (A) into another code (42), as shown in FIG. 4 (B).
- The digital audio signal shown in FIG. 4 (B) is written to the recording medium (11) via the writing means (111).
- The other code (42) shown in FIG. 4 (B) may be not only a simple mark but also a code of several bits carrying information on the non-speech time width and information indicating the characteristics of the non-speech part.
- The recording medium (11) thus holds the digital audio signal shown in FIG. 4 (B).
- The reading means (112) reads the digital audio signal recorded on the recording medium (11), from which the silent part has been substantially deleted, and outputs it to the detecting means (12) and the adjusting means (13).
- The detecting means (12) detects the code that was added in place of the deleted silent part of the input digital audio signal, decodes the code, and outputs a signal according to the decoded content to the adjusting means (13).
- The content of the other code (42) shown in FIG. 4 (B) is, as described above, the time width of the original non-voice portion at that position.
- The adjusting means (13) combines the signal input from the detecting means (12) with the digital audio signal, input from the reading means (112), from which the silent part was deleted, and outputs the digital audio signal with the non-voice part added or reproduced (FIG. 4 (A)) to the D/A conversion means (14).
- The operation from the D/A conversion means (14) onward is the same as described above, so its description is omitted.
- FIG. 15 shows an example of a speech input signal used in the present invention, but the present invention is not limited to this.
- The window (WD) in FIG. 5 (A) is set in advance for the non-voice section.
- Lth is a threshold value for determining that there is no voice, and is set in both the (+) and (-) directions.
- The codes A to D shown in FIG. 5 (A) are determined in advance, and the initial value of the time width between the codes A to D is also set in advance.
- This time width is only an initial value and can be changed. With ts as the current time, the smallest tn that satisfies expression (1) between the times ts + 1 and ta is found.
- V (ti) is the voltage value at a time ti that is a predetermined time before or after the current time ts. Since four codes A to D are used in the present embodiment, they can be expressed in about 2 bits, so the non-speech part on the recording medium is replaced with a short code string. The number of codes is preferably as small as possible, but is not particularly limited.
- The codes A to D determined in the above-described process are recorded on the recording medium (11) via the writing means (111).
- First, the first window (WD1) is set. Four different time points are defined within the window to serve as the checkpoints A to D. If the initial time of the window WD1 is ts, checkpoint D is placed at the position corresponding to time ts + 1, and the time interval between the initial time and checkpoint D is (1/4)·Δt.
- Checkpoint C is placed at the position corresponding to time ts + 2, and the time interval between checkpoint D and checkpoint C is (1/4)·Δt.
- Checkpoint B is placed at the position corresponding to time ts + 3, and the time interval between checkpoint C and checkpoint B is Δt.
- Checkpoint A is placed at the position corresponding to time ts + 4, and the time interval between checkpoint B and checkpoint A is Δt.
- An audio signal N is input from an external input, and the input audio voltage V (n) is compared with the predetermined threshold value Lth described above.
- Suppose that the voltage V (n) of the audio input signal falls below the threshold value Lth at time ts and satisfies the relationship of equation (1) above.
- The window WD1 described above is then set, and during the inspection time defined by the window WD1 the input audio voltage V (n) is continuously compared with the threshold value Lth at a predetermined sampling interval.
- In this example, the audio input signal N satisfies the relationship of equation (1) throughout the elapsed time up to ts + 4 defined in the window WD1. It is therefore determined that the non-speech part continues during this time, and the code A is assigned to that non-speech part as its identification code.
- The next window WD2 is then set at that time.
- The set time of the second window WD2 is made longer than the set time of the first window WD1.
- In the second window WD2, checkpoint D is placed at the position corresponding to time ts + 1, and the time interval between the initial time and checkpoint D is (1/4)·Δt.
- Checkpoint C is placed at the position corresponding to time ts + 2, and the time interval between checkpoint D and checkpoint C is (1/4)·Δt.
- Checkpoint B is placed at the position corresponding to time ts + 3, and the time interval between checkpoint C and checkpoint B is 2·Δt + 3.
- Checkpoint A is placed at the position corresponding to time ts + 4, and the time interval between checkpoint B and checkpoint A is 2·Δt + 3.
- In this example, the voltage V (n) of the input audio signal exceeds the threshold value Lth just before checkpoint A in the second window WD2, which shows that the non-voice part has ended.
- A code B is therefore assigned as the identification code to the non-voice portion of the input voice signal in window WD2.
- In this specific example, the code string A·B is thus assigned to the non-voice portion of the audio input signal, and this identification code is read out during reproduction.
- The reproduction operation is executed while a non-speech part, whose duration corresponds to the identification code A·B, is inserted at the predetermined position of the audio signal being reproduced.
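- The windowed coding walked through above can be sketched in a few lines. The sketch rests on several assumptions that the description does not settle: durations are counted in samples, checkpoint spacing follows the WD1 intervals (Δt/4, Δt/4, Δt, Δt), each later window simply scales Δt by a hypothetical `growth` factor, a silence that ends before checkpoint D produces no code, and the code for a window is the last checkpoint reached while the signal was still silent. The names `window_offsets`, `encode_silence`, and `decode_silence` are invented for the example.

```python
CHECKPOINTS = ("D", "C", "B", "A")  # ordered from earliest to latest


def window_offsets(dt):
    """Checkpoint positions measured from the start of a window."""
    return {"D": dt // 4, "C": dt // 2, "B": dt // 2 + dt, "A": dt // 2 + 2 * dt}


def encode_silence(duration, base_dt=4, growth=2):
    """Return the code string (list of 'A'..'D') for a silent run."""
    codes, dt, remaining = [], base_dt, duration
    while True:
        offsets = window_offsets(dt)
        reached = [c for c in CHECKPOINTS if offsets[c] <= remaining]
        if not reached:
            break                      # silence ended before checkpoint D
        code = reached[-1]             # last checkpoint still silent
        codes.append(code)
        if code != "A":
            break                      # silence ended inside this window
        remaining -= offsets["A"]      # whole window was silent
        dt *= growth                   # next window is longer
    return codes


def decode_silence(codes, base_dt=4, growth=2):
    """Return the silent duration represented by a code string."""
    total, dt = 0, base_dt
    for code in codes:
        total += window_offsets(dt)[code]
        dt *= growth
    return total
```

Under these assumptions a 27-sample silence is coded as `["A", "B"]`, matching the A·B example, and decodes to 22 samples: the coding is lossy, because a duration is quantised to the last checkpoint passed.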
- The digital audio recorded on the recording medium (11) is read by the reading means (112).
- The detection means (12) detects the signal indicating the start of the silent state and the codes A to D shown in FIG. 5 (A), restores them to a silent section having a time width corresponding to the codes, and outputs it to the adjusting means (13).
- The adjusting means (13) inserts the non-speech part output from the detecting means (12) at the positions of the codes A to D in the digital audio.
- The detection means (12) may also increase part or all of the time widths of the codes A to D, for example so that the width increases automatically in proportion to the number of repetitions.
- Because the silent part can be automatically replaced with a small number of codes during recording, the method is convenient and efficient. Because the restoration processing time is short, it causes no problem for the reproduced sound output.
- The device configured using the above-described embodiment is preferably of a portable size, and in the case of a learning book, a function for repeated voice output or a bookmark-like function may be added. Since the size of the device is also affected by the size of the recording medium, the recording medium should be small and have a high capacity; a CD-ROM, a mini magneto-optical disk, a 3.5-inch floppy disk, digital audio tape, and the like are suitable.
- The digital audio is not particularly limited: it may be synthesized speech, or natural speech that has been A/D-converted or compressed, and refers to speech converted by any existing method.
- First, in step (2a), a predetermined amount of the digitized audio data is input. The predetermined amount is, for example, a unit of 1024 data, and depends on the storage element used for temporary storage. This buffering may be unnecessary in some cases.
- In step (2b), in order to distinguish the control code, which is set in advance as a code indicating silence and the like, from the audio data, any data in the audio data (handled in blocks of 64 data) that is the same as or similar to the control code is changed. This change is made, for example, by adding 1 to (incrementing) the data.
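- A minimal sketch of the escaping performed in step (2b), assuming a hypothetical reserved value `CONTROL_CODE = 0x000` (the description does not state the actual control code) and integer samples. Any sample equal to the reserved value is incremented so that it can never be mistaken for the control code on playback.

```python
CONTROL_CODE = 0x000  # hypothetical reserved value; not specified in the description


def escape_samples(samples, control_code=CONTROL_CODE):
    """Return a copy in which samples equal to the control code are incremented by 1."""
    return [s + 1 if s == control_code else s for s in samples]


block = [0x000, 0x7FF, 0x808, 0x000]
print(escape_samples(block))  # [1, 2047, 2056, 1]
```

The change alters the affected samples by one quantisation step, which is the price paid for keeping the control code unambiguous.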
- In step (2c), a calculation is performed to determine the start and end of silence.
- The audio data is divided into predetermined blocks (for example, one block is defined as 64 data), and data such as the audio amplitude distribution in the block is obtained.
- In step (2d), it is detected how much of the amplitude distribution obtained in step (2c) lies within a predetermined range (for example, 800h ≤ X ≤ 80Fh). If 90% or more of the data lies within this silence range, a provisional silence start is assumed.
- If the result of step (2d) is (yes), step (2e) checks whether the silence start flag is on. If the flag is off (no), the start of silence is determined here, and step (2f) turns on the silence start flag for the provisional silence start point. If the silence start flag is already on in step (2e) (yes), the data lies within a silent interval, and the process proceeds to step (2g), where the next predetermined number of data blocks is read. Step (2h) then determines whether the data blocks have been exhausted. If not (no), the process returns to the silence range calculation of step (2c). If so (yes), step (2i) determines whether all data has been processed. If not (no), the process returns to step (2a), where another 1024 data are read and temporarily stored, and processing continues from step (2a).
- If the result of step (2d) is (no), that is, the data is not within the silent range, step (2j) determines whether this is the end of silence, that is, whether a given percentage or more of the data lies outside the silent range. If step (2j) finds the end of silence (yes), the process proceeds to step (2k). If not (no), the data is treated as a non-silent part and the process proceeds to step (2g), where the next predetermined number of data blocks is read. Step (2k) determines whether the silence start flag is on. If it is off (no), the process proceeds to step (2g). If the silence start flag is on (yes), the process proceeds to the next step.
- In step (2m), the number of bytes in the silent section is calculated. If the interval between the provisional silence start point and the provisional silence end point is at least a predetermined time, they are set as the true silence start point and the true silence end point. If the duration of the silence is shorter than the predetermined time, it is invalidated and not treated as a silent section. This prevents the short silences that occur between sounds in continuous reading from being deleted. Also in step (2m), a control code based on the silence detection and a code indicating the number of bytes of the silent section are determined and stored in the audio data. In step (2n), the silence start flag is turned off, the series of silence detection processing ends, and the routine proceeds to step (2g), where the next data block is read.
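- The block-based detection of steps (2c) to (2n) can be sketched as below. The block size of 64, the range 800h to 80Fh, and the 90% criterion come from the description; the minimum run length `min_blocks` and the simplification that any block failing the 90% test ends the silence are assumptions (the description's own end-of-silence percentage is not legible). The function name `find_silences` is invented for the example.

```python
BLOCK = 64                          # data per block, as in the description
SILENT_LO, SILENT_HI = 0x800, 0x80F  # silence range from the description
SILENT_FRACTION = 0.9               # 90% of a block must be in range


def is_silent_block(block):
    """True if at least 90% of the block lies inside the silence range."""
    inside = sum(SILENT_LO <= x <= SILENT_HI for x in block)
    return inside >= SILENT_FRACTION * len(block)


def find_silences(samples, min_blocks=2):
    """Return (start, length) pairs, in samples, for confirmed silent sections.

    min_blocks is a hypothetical stand-in for the 'predetermined time':
    shorter provisional silences are invalidated.
    """
    runs, start = [], None
    for i in range(0, len(samples), BLOCK):
        silent = is_silent_block(samples[i:i + BLOCK])
        if silent and start is None:
            start = i                              # provisional silence start
        elif not silent and start is not None:
            if i - start >= min_blocks * BLOCK:    # long enough: confirm it
                runs.append((start, i - start))
            start = None                           # flag off again
    if start is not None and len(samples) - start >= min_blocks * BLOCK:
        runs.append((start, len(samples) - start))
    return runs
```

Given one loud block, three silent blocks, one loud block, one silent block, and one loud block, the function reports only the three-block silence; the isolated silent block is too short and is left in the audio.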
- As described above, the non-speech part in the present invention refers to a soundless part or a part close to soundless, such as the gap between syllables.
- Substantial coding of the non-speech part means, for example, the state in which all or part of the non-speech part indicated by 31 in FIG. 3 (A) is represented as indicated by 32 in FIG. 3 (B), or in which the non-speech part indicated by 41 in FIG. 4 (A) is converted into another code (42) shown in FIG. 4 (B).
- FIG. 6 is a block diagram showing the configuration of another specific example of the audio recording / reproducing apparatus according to the present invention. (11) is a storage medium, that is, a storage means: an optical disk such as a CD or MD, a magnetic disk, an IC storage medium, or the like. Media that are small and have a high capacity are particularly preferred.
- Reference numeral (22) denotes a drive element, which is a motor such as a spindle motor or a thread motor for driving the storage medium 11 and for moving the pickup. It may be absent depending on the type of the storage medium.
- (23) is an RF amplifier, which for example amplifies and shapes the read signal.
- (24) is an adjusting means, which is composed of a DSP or the like and, when a general-purpose CD reproducing apparatus is used, is provided with error correction processing, a PLL, and an EFM demodulating means.
- The drive means (25) controls the rotation speed of the drive element (22), the reading position, and the like.
- The input control means (27) is composed of a microcomputer or the like, and outputs control signals, based on the input signal from the external input (71) and the output signal of the non-speech conversion means (26), to the adjusting means (24) and the non-speech conversion means (26).
- The RF amplifier (23), the adjusting means (24), and the non-speech conversion means (26) constitute the non-speech part display conversion means 20.
- (28) is a D/A conversion means, which converts a digital voice signal into an analog signal and outputs it. If the data has been compressed, for example by ADPCM or ATC, the means further includes the corresponding decompression process.
- (29) is an amplification means for amplifying and outputting the analog audio.
- (30) is an audio output means, which is composed of a speaker, an earphone, and the like.
- the signal at (34) is the RF amplification means
- The adjusting means (24) performs error correction and EFM demodulation on the input data and outputs the result. The drive means (25) outputs signals that, for example, adjust the rotation speed of the storage medium and the movement of the pickup.
- The non-speech conversion means (26) detects coded non-speech data in the output from the adjusting means (24), converts it into audio data representing silence, and outputs the data.
- The non-speech conversion means (26) can easily change the range of the silent audio data according to the signal from the input control means (27).
- The input control means (27) supplies the adjusting means (24) with a signal corresponding to the input received from the external input (71), such as a key input or a thumb input.
- The input control means (27) outputs a signal to the non-speech conversion means (26) for adding to or reducing the signal range that represents the non-speech part, in order to make the speech slower or faster.
- The audio data output by the non-speech conversion means (26) is converted to analog audio by the D/A conversion means (28), amplified by the amplification means (29), and reproduced and output by the audio output means (30).
- Reading of the storage medium (11) is set to a specification of 44.1 kHz with a resolution of about 16 bits, as in a general CD player; on the other hand, the D/A conversion etc.
- The non-speech conversion means (26) may therefore output a control signal to the input control means (27) in order to temporarily stop reading and adjust for the delay. In that case, the input control means (27) causes the adjusting means (24) to perform operations such as stopping reading, decreasing the rotation speed, or pausing while keeping the rotation speed constant.
- A predetermined amount of digital audio data is fetched from the adjusting means (24) and temporarily stored in memory. This memory is unnecessary when non-voice processing is performed sequentially.
- A signal for stopping or suppressing reading from the storage medium (11) is output to the input control means (27) as needed.
- The input control means (27) passes this signal to the adjusting means (24) to control the driving of the drive element (22) and the like.
- After a predetermined amount of voice data is fetched in step 21, step 2a checks with the input control means 27 whether a parameter has been input. If so (yes), step 2b sets the parameter value according to the input. Step 22 then checks whether silent output is in progress by examining the flag indicating that silence is being output. If the flag is set (yes), the process proceeds to step 26 and the silent output process is performed. If the flag is cleared (no), the process proceeds to step 23. In step 23, the audio data is searched, and if a control code is found along the way, the parameters are set.
- In step 24, it is determined whether processing of the audio data has been completed. If it has (yes), the operation proceeds to reading more data, and this pass ends (END). If not (no), the flow shifts to step 25, which determines whether the data is a silence control code. If it is not, the data is taken to represent voice and the process proceeds to step 27; if it is, the process proceeds to the silent output processing of step 26. In the silent output processing of step 26, the silent-output flag is set, and data indicating a 0-level voice (for example, 808h) is output.
- The parameter set in step 2b is added to or subtracted from the parameter indicating the length of the silent section, and the resulting value is decremented by 1 on each pass.
- When the adjusted parameter has been counted down to 0, the silent-output flag is cleared and the silent output ends.
- The input parameter set in step 2b is then cleared.
- After the silent output process of step 26, the process waits in step 32 before handling the next data.
- The waiting time in step 32 is about 125 microseconds.
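The silent output loop described in steps 26 and 32 can be sketched as follows. This is only an illustration under stated assumptions, not the patent's implementation: the function names, the representation of the step 2b input as a signed `adjust` count, and the clamping of the adjusted length at zero are all hypothetical. Each emitted sample stands for one pass through step 26 followed by the roughly 125 microsecond wait, which equals one sample period at an 8 kHz sample frequency.

```python
ZERO_LEVEL = 0x808       # 12-bit value the description gives for a 0-level sample
SAMPLE_PERIOD_US = 125   # wait of step 32; one sample period at 8 kHz


def expand_silence(section_length, adjust=0):
    """Return the zero-level samples output for one silent section.

    section_length: stored parameter giving the length of the silent section.
    adjust: hypothetical user parameter (step 2b), added to or subtracted
            from the stored length to slow down or speed up playback.
    """
    remaining = max(section_length + adjust, 0)  # assumption: never negative
    samples = []
    while remaining > 0:           # the silent-output flag is "set" while counting
        samples.append(ZERO_LEVEL)
        remaining -= 1             # decremented by 1 on each pass
    return samples                 # the flag is "cleared" once the count reaches 0


def silence_duration_us(section_length, adjust=0):
    """Playback time of the silent section in microseconds."""
    return len(expand_silence(section_length, adjust)) * SAMPLE_PERIOD_US
```

With these assumptions, a stored length of 8000 plays back as one second of silence, and a positive or negative `adjust` lengthens or shortens every pause without touching the voiced data.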
- Step 27 determines whether the data is even-numbered. If it is (yes), a 4-bit shift is performed and processing moves on to the odd-numbered data, so that the data can be handled in byte units.
- In step 29 the 4-bit audio data is expanded to 12 bits.
- The decompression algorithm is, for example, as follows. For the first data, the previous data (Y) is taken to be the 0 level (808h); from the second time onward, the data obtained by the following calculation is used as the previous data. The reference value is subtracted from the 4-bit audio data (X), the result is multiplied by the multiplication rate (m), and the previous 12-bit data (Y) is added to obtain the 12-bit audio data.
- $\text{12-bit audio data} = ((X - \text{reference value}) \times m) + Y$
- For the first calculation, the previous data is set to the 0 level.
- The data is expanded to 12 bits because the audio was digitized at 12 bits; neither this bit width nor the algorithm described above is limiting.
- The multiplication rate (m) is added to the data at the time of recording in order to increase accuracy at the time of restoration; it is not strictly required.
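The expansion formula above can be tried out with a short sketch. The function name, the default reference value of 7, and the clamping of results to the 12-bit range are assumptions made for illustration; the description only gives the formula and the 808h starting level.

```python
ZERO_LEVEL = 0x808  # starting value for the previous data (Y)


def expand_to_12bit(codes, reference=7, m=1, start=ZERO_LEVEL):
    """Expand 4-bit codes to 12-bit samples using
    new = (X - reference) * m + previous."""
    previous = start
    samples = []
    for x in codes:
        current = (x - reference) * m + previous
        current = max(0, min(0xFFF, current))  # assumption: clamp to 12 bits
        samples.append(current)
        previous = current  # the result becomes the previous data next time
    return samples
```

For example, with a reference value of 7 and a multiplication rate of 4, the codes 7, 8, 6 leave the level unchanged, raise it by 4, and lower it by 4 again.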
- In step 30, when the audio data is odd-numbered, it is checked whether the audio data has ended; this check is made only on odd-numbered data. For example, a counter corresponding to the total amount of audio data is decremented by 1, and when it reaches 0 the termination operation of step 24 is performed.
- The parameters set for controlling the utterance speed, and the operations on those parameters, are not limited to those described above and may be implemented by other methods, such as an interrupt.
- The processing is not limited to software and may be implemented in hardware. A device built according to the above embodiment is preferably small enough to be portable.
- Since the size of the device also depends on the size of the recording medium, the recording medium should be small and have a high capacity. Suitable media include, but are not limited to, CD-ROMs, mini optical disks, micro floppy disks, and digital audio tapes.
- The digital voice is not particularly limited: it may be natural voice that has been A/D-converted and compressed, or voice converted by other existing methods.
- In FIG. 9, (91) is an audio input means, composed of a microphone for generating an analog audio signal, a filter circuit, an amplifier circuit, and the like.
- (92) is an A/D conversion means, which comprises a sampling circuit, an A/D converter circuit, and the like, and which further includes, as necessary, circuits implementing various PCM schemes such as ADPCM or compression functions such as PWM and ATC.
- (93) is a non-speech detecting means, which first changes, deletes, or moves any data in the digital audio data whose values could be confused with the code indicating non-speech or with the parameter code indicating duration.
- The silent part is then detected and associated with the code indicating silence and the parameter code indicating its time width, and these codes are inserted into the audio data.
- (94) is a non-speech deleting means, which deletes from the digital voice data the data of the non-speech interval that follows the non-speech control code, as given by the non-speech interval byte count.
- (95) is a component control code conversion means.
- Analog audio is converted into digital data by a 12-bit conversion means, while the recording medium is designed to record 4-bit data. The component control code serves as the conversion parameter when the 12-bit data is converted to 4 bits, or when the conversion is performed in the reverse direction.
- It is also a means for converting portions of the digital audio signal with a large rate of change, or other characteristic portions, into component control codes.
- Portions where the rate of change of the audio signal is large, and other characteristic portions, normally require more digital information to be restored adequately. The component control code conversion means (95) replaces the data of such portions with data describing them, i.e., component control codes, and thereby reduces the amount of digital information, so that the analog audio waveform can still be obtained even when the audio data recorded on the recording medium is reduced to as few as about 4 bits.
- (96) is a rearrangement means. If control data such as a silence control code and voice data are mixed within a predetermined section, the silence control code part is replaced with voice data indicating silence, and the control code is stored at the beginning of the following section.
- Voice data and a control code are mixed in a predetermined section when, for example, a processing means such as a CPU handles 8 bits at a time, the voice data string is divided into 8-bit units for ease of processing, and both audio data and control data fall within the same 8 bits.
- In that case the code in the lower 4 bits is moved to the beginning of the next predetermined section, and voice data indicating silence is substituted in its place, so that the code can be distinguished from the voice data.
- The rearrangement means also includes a means for 4-bit encoding of digital audio.
- (97) is a recording signal adjusting means for converting the input signal into the form required for writing to the recording medium.
- It includes conventional means such as EFM modulation.
- The recording signal adjusting means (96) is composed of one or more DSPs and microcomputers. The choice is made according to the size of the apparatus, the required processing capacity, and the like.
- The mixed data string, in which the voice data and the control codes are combined, is input to the silence deletion means (94), and the voice data representing silence is deleted.
- In step (3a) the mixed data is read, either directly from the non-speech detecting means (93) or indirectly via a buffer or the like.
- In step (3b) it is determined whether the first data is a silence detection control code. If it is (yes), the process proceeds to step (3d). If not (no), a silence detection control code with a silence section byte count of 0 is forcibly stored in the buffer memory, and the process then proceeds to step (3d). In step (3d) it is determined whether all data has been processed. If not (no), the mixed data sequence is read again.
- In step (3f) it is determined whether the data is a silence detection control code. If so (yes), step (3h) stores only the silence detection control code and the byte count of the silence section, and skips the rest. Then, in step (3i), the data pointer is advanced by the number of bytes in the silence section belonging to that control code, so that the data shrinks by that number of bytes. If the data in step (3f) is not a silence detection control code (no), it is taken to be audio data and stored in the buffer. By repeating the above, all data is organized into a data string consisting only of voice data, silence detection control codes, and silence section byte count codes. After this deletion has been performed, either sequentially or for a fixed amount of data, the output of the silence deletion means (94) is input to the component control code conversion means (95).
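Steps (3a) to (3i) can be sketched as a single pass over the mixed data. The marker value `SILENCE_CODE`, the one-item byte count that follows it, and the function name are assumptions for illustration; the description does not give concrete values at this stage. The sketch also assumes that audio values colliding with the marker have already been changed by the detecting means (93), as described earlier.

```python
SILENCE_CODE = 0xFF  # hypothetical marker value, not taken from the description


def delete_silence(mixed):
    """Keep audio data plus (code, count) pairs; drop the silent samples.

    mixed: list of ints in which each SILENCE_CODE is followed by a count
           and then by that many silent samples.
    """
    out = []
    # Step (3b): force the stream to start with a silence code (count 0).
    if not mixed or mixed[0] != SILENCE_CODE:
        out += [SILENCE_CODE, 0]
    i = 0
    while i < len(mixed):                 # step (3d): until all data is processed
        if mixed[i] == SILENCE_CODE:      # step (3f)
            count = mixed[i + 1]
            out += [SILENCE_CODE, count]  # step (3h): store only code and count
            i += 2 + count                # step (3i): skip the silent samples
        else:
            out.append(mixed[i])          # ordinary audio data
            i += 1
    return out
```

The output is shorter than the input by exactly the number of silent samples, while each count remains available so that the silence can be regenerated at playback.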
- In step (4a) the data output by the silence deletion means (94) is read.
- Data is read in units of 256 and processed one item at a time.
- In step (4b) it is determined whether the data being processed is a silence detection control code. If it is (yes), only the silence detection control code and the silence section byte count are stored in the buffer, the data is otherwise skipped, and the procedure goes to step (4d)'. If it is not, the flow proceeds to step (4d). In step (4d), the difference between the previous data and the current data is obtained, and the maximum difference within the same mode is tracked. In step (4d), it is determined whether the mode has changed. If it has (yes), the process proceeds to step (4e). If the mode is the same (no), the process returns to step (4a) and handles the next data.
- In step (4e) it is determined whether the mode change is a change to the plus/minus component mode.
- The plus/minus component mode applies to data that changes little, contains both plus and minus components, and in which the difference between adjacent data is less than a specified value.
- In step (4e), the flow takes the (yes) branch if the data corresponds to the plus/minus component mode and the (no) branch otherwise. On the (yes) branch, the plus/minus control code corresponding to the plus/minus component mode and the multiplication rate code are stored in the voice code data.
- The multiplication rate code compensates for the limited number of bits available to represent the amount of change in the audio data; the amount of change is expressed with the help of the multiplication rate.
- The criterion for the multiplication rate code is arbitrary and is chosen according to the digital bit representation of the audio data and the sample frequency. When the sample frequency is 8 kHz and the audio and control code data recorded on the recording medium are 4 bits wide, the multiplication rate is set, for example, as follows.
- In the plus/minus mode the reference value is 7, with 0 to 6 representing the minus component and 8 to 14 (Eh) representing the plus component.
- The maximum change amount in this case is 7.
- The multiplication rate per unit of change is calculated as maximum difference (B) (FIG. 13(A)) ÷ maximum change amount.
- When step (4e) is (no), the process proceeds to step (4g), which determines whether the difference data matches the plus component mode. If it does, the data is treated as plus component mode.
- The plus control code and the multiplication rate are stored in the audio data.
- The multiplication rate in this case is set, for example, as follows.
- The maximum change amount is 14, since values are expressed as 1 to 14 (Eh) with 0 as the reference value.
- In step (4h) the multiplication rate (maximum difference ÷ maximum change amount) is calculated to determine the multiplication rate per unit of change.
- The plus control code and the multiplication rate are stored in the portion of the voice data detected as plus component mode.
- If step (4g) is (no), step (4i) stores the minus control code and the multiplication rate in the appropriate section of the audio data. In the minus component mode, data indicating the plus direction is not required, and values are expressed as 0 to 13 (Dh) with 14 as the reference value.
- The maximum change amount is 14, and the multiplication rate per unit of change is calculated as maximum difference ÷ maximum change amount.
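The three component modes and their multiplication rates can be summarized in a small lookup. The reference values and maximum change amounts are those listed above; the mode names, the function name, and the choice to round the rate up to a whole number of at least 1 are assumptions, since the description does not say how a fractional rate is handled.

```python
import math

# mode: (reference value, maximum change amount), as listed in the description
MODES = {
    "plus_minus": (7, 7),   # codes 0 to 6 are minus, 8 to 14 (Eh) are plus
    "plus": (0, 14),        # codes 1 to 14 (Eh)
    "minus": (14, 14),      # codes 0 to 13 (Dh)
}


def multiplication_rate(max_difference, mode):
    """Return (reference value, multiplication rate) for a run of data.

    rate = maximum difference / maximum change amount,
    rounded up here (assumption) so the largest step still fits in 4 bits.
    """
    reference, max_change = MODES[mode]
    rate = max(1, math.ceil(abs(max_difference) / max_change))
    return reference, rate
```

For instance, a run in plus/minus mode whose largest sample-to-sample difference is 70 gets a rate of 10, so that the code range of 7 steps on either side of the reference covers it.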
- The converted data is input to the rearrangement means (96).
- The rearrangement means (96) encodes the 12-bit audio data into 4 bits and rearranges the audio data and control codes in the mixed data. The operation of the rearrangement means (96) in FIG. 9 is described with reference to the flowchart (FIG. 14).
- In step (6a) the data is read and divided into blocks of a fixed size for processing.
- In step (6b) it is determined whether the data is a control code. If it is not (no), the data is audio data and the 4-bit encoding is performed in step (6c). The 4-bit encoding may be performed on the basis of the component and multiplication rate information. An example of a method for converting from 12 bits to 4 bits follows.
- For the first data, the previous data is taken to be the 0 level (808h); from the second time onward, the data obtained by the following calculation is used as the previous data.
- $\text{4-bit audio data} = ((Y(n) - Y(n-1)) / m) + \text{reference value}$, where $Y(n)$ is the current 12-bit data and $Y(n-1)$ is the previous data. The component and the multiplication rate are those determined by the component control code conversion means (95) described earlier.
- The component control code and the multiplication rate remain valid until they are rewritten.
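The 12-bit to 4-bit conversion can be sketched as the inverse of the expansion formula. The function name, the rounding of the quotient, the clamping to codes 0 to 14 (so that Fh stays free for the control code), and the choice to track the value a decoder would reconstruct as the previous data are assumptions for illustration.

```python
ZERO_LEVEL = 0x808  # previous data used for the first sample


def encode_to_4bit(samples, reference=7, m=1, lo=0, hi=14):
    """Encode 12-bit samples as 4-bit codes using
    code = (current - previous) / m + reference."""
    previous = ZERO_LEVEL
    codes = []
    for y in samples:
        x = round((y - previous) / m) + reference
        x = max(lo, min(hi, x))   # assumption: clamp so Fh is never produced
        codes.append(x)
        # Assumption: follow the value a decoder would rebuild, so that
        # rounding and clamping errors do not accumulate.
        previous = (x - reference) * m + previous
    return codes
```

Encoding the samples 2056, 2060, 2056 with a reference value of 7 and a multiplication rate of 4 gives the codes 7, 8, 6, which expand back to the same three samples.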
- In step (6c) it is determined whether the 4-bit audio data is even-numbered or odd-numbered. If it is even (yes), the encoded data is saved in the upper 4-bit register. If it is odd (no), the encoded data is saved in the lower 4-bit register. Once both the upper and lower bits are in place, step (6g) stores the upper 4 bits and the lower 4 bits, 1 byte in total, in the buffer memory. If the data is a control code in step (6b), step (6h) determines whether the audio data so far ends on odd-numbered data. If it does (yes), voice data indicating silence is stored where the control code would have gone, and the control code is stored in the next upper 4 bits.
- In step (6j) the component control code (type, multiplication rate, etc.) accompanying the control code (Fh) is additionally stored. If the voice data ends on even-numbered data in step (6h) (no), the processing of step (6j) described above is performed directly.
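The packing of 4-bit codes into bytes, with control codes kept out of the lower half of a byte shared with audio, can be sketched as follows. The control code value Fh comes from the description; the tuple-based input format, the function name, the use of the reference value 7 as the silent filler, and the padding of a trailing half byte are assumptions for illustration.

```python
CONTROL_CODE = 0xF    # control code value named in the description (Fh)
SILENT_NIBBLE = 0x7   # assumption: reference value 7, i.e. "no change"


def pack_nibbles(items):
    """Pack audio codes and control codes into bytes, upper nibble first.

    items: list of ("audio", code) or ("control", [type, rate, ...]) tuples.
    """
    nibbles = []
    for kind, value in items:
        if kind == "audio":
            nibbles.append(value & 0xF)
        else:
            if len(nibbles) % 2 == 1:          # code would land in a lower nibble
                nibbles.append(SILENT_NIBBLE)  # store silent audio there instead
            nibbles.append(CONTROL_CODE)       # code now starts an upper nibble
            nibbles.extend(v & 0xF for v in value)  # type, multiplication rate, ...
    if len(nibbles) % 2 == 1:
        nibbles.append(SILENT_NIBBLE)          # assumption: pad the final byte
    return bytes((nibbles[i] << 4) | nibbles[i + 1]
                 for i in range(0, len(nibbles), 2))
```

Because every control code begins on a byte boundary, an 8-bit processor can recognize it by testing the upper 4 bits of a byte, without first splitting the byte into audio and control halves.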
- The data in which the audio data and the control codes have been arranged as described above is written to the temporary or main recording medium (18) via the recording signal adjusting means (97) and the write pickup (97P).
- The present invention can reproduce the reading voice of a book for a sufficient time even on a generally available recording medium, makes the utterance speed variable during reproduction, and has the effect that the output voice does not differ from ordinary reading.
- The present invention stores digital audio data in a storage medium in a form in which the substantially silent portions are encoded, and adds this silent time back during reproduction, so that the storage medium can hold enough audio.
- Because this code contains information on the original silent portion, sound output close to natural reading can be obtained for a long time. Because silent data of any desired time width can be added, a voice playback device with variable reading speed is realized, suited to material such as study books or commentaries that a user may wish to hear slowly or quickly.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP94913792A EP0652560A4 (en) | 1993-04-21 | 1994-04-21 | DEVICE FOR RECORDING AND PLAYING BACK VOICE. |
KR1019940704661A KR950702323A (ko) | 1993-04-21 | 1994-12-20 | Apparatus for recording and reproducing voice |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP5/116599 | 1993-04-21 | ||
JP5116599A JPH06308992A (ja) | 1993-04-21 | 1993-04-21 | Voice-type electronic book |
Publications (1)
Publication Number | Publication Date |
---|---|
WO1994024667A1 true WO1994024667A1 (en) | 1994-10-27 |
Family
ID=14691150
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP1994/000661 WO1994024667A1 (en) | 1993-04-21 | 1994-04-21 | Apparatus for recording and reproducing voice |
Country Status (3)
Country | Link |
---|---|
JP (1) | JPH06308992A (ja) |
KR (1) | KR950702323A (ja) |
WO (1) | WO1994024667A1 (ja) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO1996007999A1 (en) * | 1994-09-05 | 1996-03-14 | Ellen Rubin Lebowitz | Reading tutorial system |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7020663B2 (en) | 2001-05-30 | 2006-03-28 | George M. Hay | System and method for the delivery of electronic books |
JP6389348B1 (ja) * | 2018-03-23 | 2018-09-12 | 株式会社アセンド | Voice data optimization system |
JP6386690B1 (ja) * | 2018-06-27 | 2018-09-05 | 株式会社アセンド | Voice data optimization system |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPS59195307A (ja) * | 1983-04-20 | 1984-11-06 | Casio Comput Co Ltd | Voice information recording system |
JPS6035795A (ja) * | 1983-08-05 | 1985-02-23 | 赤井電機株式会社 | Signal pitch converter |
JPS62125577A (ja) * | 1985-11-26 | 1987-06-06 | Nec Corp | Voice storage and reproduction device |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH03184171A (ja) * | 1989-12-13 | 1991-08-12 | Hitachi Ltd | Electronic book reproducing device |
JPH03248398A (ja) * | 1990-12-13 | 1991-11-06 | Sharp Corp | Recording and reproducing method for a digital recording and reproducing machine |
1993
- 1993-04-21 JP JP5116599A patent/JPH06308992A/ja active Pending
1994
- 1994-04-21 WO PCT/JP1994/000661 patent/WO1994024667A1/ja not_active Application Discontinuation
- 1994-12-20 KR KR1019940704661A patent/KR950702323A/ko not_active Application Discontinuation
Also Published As
Publication number | Publication date |
---|---|
JPH06308992A (ja) | 1994-11-04 |
KR950702323A (ko) | 1995-06-19 |
Legal Events
Code | Title | Description |
---|---|---|
AK | Designated states | Kind code of ref document: A1; Designated state(s): KR US |
AL | Designated countries for regional patents | Kind code of ref document: A1; Designated state(s): AT BE CH DE DK ES FR GB GR IE IT LU MC NL PT SE |
ENP | Entry into the national phase | Ref country code: US; Ref document number: 1994 356324; Date of ref document: 19941220; Kind code of ref document: A; Format of ref document f/p: F |
WWE | Wipo information: entry into national phase | Ref document number: 1994913792; Country of ref document: EP |
121 | Ep: the epo has been informed by wipo that ep was designated in this application | |
WWP | Wipo information: published in national office | Ref document number: 1994913792; Country of ref document: EP |
WWW | Wipo information: withdrawn in national office | Ref document number: 1994913792; Country of ref document: EP |