WO2011148519A1 - Dwelling unit device for interphone system for residential complex - Google Patents

Dwelling unit device for interphone system for residential complex

Info

Publication number
WO2011148519A1
Authority
WO
WIPO (PCT)
Prior art keywords
unit
packet
call
voice
processing
Prior art date
Application number
PCT/JP2010/062581
Other languages
French (fr)
Japanese (ja)
Inventor
実 福島
恵一 ▲吉▼田
哲平 鷲
幸夫 岡田
和生 土橋
克彦 木村
Original Assignee
パナソニック電工株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by パナソニック電工株式会社
Priority to JP2012517086A (patent JP5544012B2)
Priority to CN201080067044.6A (patent CN102918825B)
Publication of WO2011148519A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M9/00 Arrangements for interconnection not involving centralised switching
    • H04M9/08 Two-way loud-speaking telephone systems with means for conditioning the signal, e.g. for suppressing echoes for one or both directions of traffic
    • H04M9/082 Two-way loud-speaking telephone systems with means for conditioning the signal, e.g. for suppressing echoes for one or both directions of traffic, using echo cancellers

Definitions

  • The present invention relates to a dwelling unit device that is used in an intercom system for collective housing and installed in each dwelling unit of the collective housing.
  • An intercom system for collective housing has been provided that includes a common unit device (lobby intercom) installed at the common entrance of the apartment building, a dwelling unit device installed inside each dwelling unit, and a doorphone slave unit installed at the outside entrance of each dwelling unit.
  • A signal trunk line is connected to the common unit device, and each dwelling unit device is connected to a dwelling unit line branching from the signal trunk line.
  • The dwelling unit device inside the dwelling unit and the doorphone slave unit at the outside entrance are connected by a slave unit connection line.
  • Another dwelling unit device may be connected to each dwelling unit device by an in-house connection line.
  • The dwelling unit device connected to the dwelling unit line is called the dwelling unit main unit.
  • The dwelling unit device connected to the dwelling unit main unit by the in-house connection line is called the dwelling unit sub-master unit.
  • An intercom system has also been described in which voice is transmitted over the signal trunk line and the dwelling unit lines by a packet transmission method, enabling calls between the dwelling unit devices of different dwelling units.
  • a call direction switching process and an echo suppression process for a hands-free call are performed.
  • the common unit device and the plurality of dwelling units can be digitally communicated, and the signal trunk line and the dwelling unit line connecting the common unit device and each dwelling unit are used.
  • Call processing that compensates for voice loss due to packet loss, delay, and fluctuation (jitter) accompanying packet transmission is necessary.
  • a conventional inexpensive device that is, a device that transmits voice by an analog transmission method
  • an analog transmission method is adopted as a voice transmission method between the dwelling unit (dwelling unit main unit) and the door phone slave unit or between the dwelling unit main unit and the dwelling unit sub-master unit.
  • it is necessary to perform a call direction switching process and an echo suppression process for hands-free call (speech call), but consider the case where digital data is transmitted through the signal trunk line as described above.
  • the voice loss compensation process essential for the packet transmission method is not necessary for the analog transmission method.
  • An object of the present invention is to provide a dwelling unit device for an apartment intercom system that can use a packet transmission method for voice transmission via the signal trunk line and an analog transmission method for voice transmission in the vicinity of the home that does not pass through the signal trunk line, while suppressing the complexity and cost increase of the circuit configuration and improving call quality.
  • The dwelling unit device of the present invention is used in an intercom system for collective housing that comprises a common unit device installed at the common entrance of the collective housing, a dwelling unit device installed in each dwelling unit of the collective housing, a doorphone slave unit installed at the exterior entrance, a signal trunk line connected to the common unit device, a dwelling unit line branching from the signal trunk line and connected to each dwelling unit device, and a slave unit connection line connecting the dwelling unit device and the doorphone slave unit.
  • Call voice is transmitted between the common unit device and the dwelling unit devices, and between dwelling unit devices, by the packet transmission method via the signal trunk line and the dwelling unit lines.
  • Between the dwelling unit device and the doorphone slave unit, call voice is transmitted by an analog transmission method through the slave unit connection line.
  • The dwelling unit device includes a microphone and a speaker; a transmission processing unit that transmits voice packets containing voice data for calls and control packets containing control data for call control via the dwelling unit line and the signal trunk line; and an analog signal transmission unit that transmits an analog audio signal via the slave unit connection line.
  • The analog audio signal output from the microphone is converted into audio data, and audio data is converted into an analog audio signal and output to the speaker.
  • It further includes a storage unit that stores first software, for call processing of voice data transmitted by the analog transmission method, and second software, for call processing of voice data transmitted by the packet transmission method; and a control unit that instructs execution of call processing.
  • The control unit instructs the call processing unit to execute the first software when the doorphone call detection unit detects a call, and to execute the second software when control data for call control is received from the common unit device or another dwelling unit device.
  • The call processing unit thus executes the first software when the other party's terminal uses the analog transmission method and the second software when it uses the packet transmission method. It is therefore possible to use a packet transmission method for voice transmission via the signal trunk line and an analog transmission method for voice transmission in the vicinity of the home that does not pass through the signal trunk line, while suppressing the complexity and cost increase of the circuit configuration, and call quality can be improved.
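The selection rule above can be sketched in a few lines. This is only an illustration under stated assumptions: the event names, the `select_software` function, and the string labels are hypothetical and do not come from the patent text.

```python
from enum import Enum


class CallEvent(Enum):
    """Hypothetical events that the control unit reacts to."""
    DOORPHONE_CALL_DETECTED = 1   # call detected from the doorphone slave unit (analog side)
    CONTROL_PACKET_RECEIVED = 2   # call-control data received over the packet side


def select_software(event: CallEvent) -> str:
    """Return which stored software the call processing unit should execute."""
    if event is CallEvent.DOORPHONE_CALL_DETECTED:
        return "first_software"    # call processing for analog-transmitted voice
    if event is CallEvent.CONTROL_PACKET_RECEIVED:
        return "second_software"   # call processing for packet-transmitted voice
    raise ValueError(f"unsupported event: {event!r}")
```

The point of the design is that one call processing unit serves both transmission paths, so only the loaded software changes and no second signal-processing circuit is needed.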
  • The second software includes a program for acoustic echo suppression processing, which suppresses the acoustic echo generated by acoustic coupling between the microphone and the speaker, and a program for residual echo suppression processing, which suppresses the residual echo that the acoustic echo suppression processing cannot suppress.
  • Since the second software includes the program for acoustic echo suppression processing and the program for residual echo suppression processing, the call quality in the packet transmission method can be further improved.
  • The second software includes a fluctuation absorption processing program for absorbing fluctuations in transmission delay in the transmission processing unit.
  • Since the second software includes the fluctuation absorption processing program, the call quality in the packet transmission method can be further improved.
  • The dwelling unit device has a fluctuation absorbing buffer that accumulates the voice data included in the voice packets received by the transmission processing unit.
  • The fluctuation absorption processing program preferably causes the call processing unit to perform a counting step, which counts the voice data packets stored in the fluctuation absorbing buffer at a period no longer than the packetization period of the voice packets and calculates a packet count value, and a buffer size changing step, which inserts or deletes packets in the fluctuation absorbing buffer based on the packet count value calculated in the counting step.
  • Because the call processing unit performs the buffer size changing step based on the packet count value calculated in the counting step, call delay can be reduced and call quality can be further improved.
  • In the buffer size changing step, the fluctuation absorption processing program preferably causes the call processing unit to calculate a representative value of the packet count value from the past history of packet count values, to delete a packet from the fluctuation absorbing buffer when the representative value is larger than a predetermined reference value, and to insert a packet into the fluctuation absorbing buffer when the representative value is smaller than the reference value.
  • In this case, prevention of packet depletion and reduction of call delay can be realized with higher accuracy.
  • The fluctuation absorption processing program preferably causes the call processing unit to record the reception time of the latest packet and, in the counting step, to calculate the packet count value by setting the count value of the latest packet to the difference between the calculation time of the packet count value and the reception time, divided by the packetization period, and setting the count value of each packet other than the latest packet to 1.
  • Because the count value of packets other than the latest packet is set to 1, the reception time needs to be recorded only for the latest packet, which saves capacity in the recording medium that stores the reception time.
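The counting step can be illustrated with a small function. This is a minimal sketch that follows the wording above literally; the function name, the millisecond units, and the handling of an empty buffer are assumptions made for illustration.

```python
def packet_count(stored_packets: int,
                 calc_time_ms: float,
                 latest_rx_time_ms: float,
                 packet_period_ms: float) -> float:
    """Count buffered packets, weighting only the latest one fractionally.

    Every packet except the latest counts as 1. The latest packet counts as
    (calculation time - reception time) / packetization period, as the text states.
    """
    if stored_packets <= 0:
        return 0.0  # assumption: an empty buffer simply counts as zero
    latest = (calc_time_ms - latest_rx_time_ms) / packet_period_ms
    return (stored_packets - 1) + latest
```

For example, with three packets stored, a 20 ms packetization period, and the latest packet received 10 ms before the calculation time, the function returns 2.5. Only one timestamp has to be kept, whatever the buffer depth.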
  • The fluctuation absorption processing program preferably causes the call processing unit to hold the packet count values of the past N times (N is a positive integer) in the counting step and, in the buffer size changing step, to use the n-th smallest (n is a positive integer less than N) of the past N packet count values as the representative value.
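The representative value and the resulting insert or delete decision can be sketched as follows. The class and function names are hypothetical, and returning "keep" when the representative value equals the reference value is an assumption, since the text only covers the larger and smaller cases.

```python
from collections import deque


class CountHistory:
    """Holds the past N packet count values and yields the n-th smallest."""

    def __init__(self, n_hold: int, n_rank: int):
        if not 1 <= n_rank < n_hold:
            raise ValueError("n_rank must be a positive integer less than n_hold")
        self._values = deque(maxlen=n_hold)  # oldest values drop out automatically
        self._rank = n_rank

    def add(self, count: float) -> None:
        self._values.append(count)

    def representative(self):
        """Return the n-th smallest held value, or None if too few values are held."""
        if len(self._values) < self._rank:
            return None
        return sorted(self._values)[self._rank - 1]


def buffer_action(representative: float, reference: float) -> str:
    """Decide the buffer size change from the representative value."""
    if representative > reference:
        return "delete"   # buffer holds more than needed: shorten the delay
    if representative < reference:
        return "insert"   # buffer is running low: guard against depletion
    return "keep"         # assumption: no change when the two values are equal
```

Using a low-ranked value rather than the mean makes the decision track the near-worst-case buffer occupancy, which is what determines whether the buffer runs dry.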
  • Preferably, in the counting step, the fluctuation absorption processing program causes the call processing unit to determine whether a spike delay has occurred based on the past N packet count values and, when it determines that a spike delay has occurred, to extract the past M (M is a positive integer with M ≤ N) packet count values from the past N packet count values; in the buffer size changing step, it causes the call processing unit to calculate, as the representative value, the m-th (m is an integer less than M) of the past M packet count values extracted in the counting step.
  • the representative value can be calculated while eliminating a spike delay that rarely occurs.
  • When the packet count value is zero consecutively in the counting step, the fluctuation absorption processing program preferably causes the call processing unit to calculate, as the packet count value, a negative value whose absolute value increases as the number of consecutive zeros increases.
  • Because a negative value whose absolute value grows with the number of consecutive zeros is used as the packet count value, the packet count value reflects the difference between the case where packets are being received periodically and the number of stored packets merely happens to be 0 at the calculation time, and the case where packets cannot be received regularly. A packet is therefore less likely to be deleted in the latter case than in the former.
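A short sketch of the consecutive-zero rule follows. The text only requires a negative value whose absolute value grows with the length of the zero run; the linear step of 1 per additional zero and the class name are assumptions.

```python
class ZeroRunCounter:
    """Replaces repeated zero counts with increasingly negative values."""

    def __init__(self):
        self._zero_run = 0  # number of consecutive zero counts seen so far

    def adjust(self, raw_count: int) -> int:
        if raw_count > 0:
            self._zero_run = 0
            return raw_count
        self._zero_run += 1
        # First zero stays 0; the 2nd, 3rd, ... consecutive zeros become -1, -2, ...
        return -(self._zero_run - 1)
```

Feeding the raw counts 2, 0, 0, 0, 1, 0 through `adjust` gives 2, 0, -1, -2, 1, 0, so a sustained outage pulls the history down far more than a single momentary zero does.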
  • The second software includes a program for audio data loss compensation processing that, when all or part of the voice data is missing, compensates for the missing part using voice data that is not missing.
  • Because the missing part is compensated using voice data that is not missing, the call quality in the packet transmission method can be further improved.
  • The dwelling unit device has a fluctuation absorbing buffer that accumulates the voice data included in the voice packets received by the transmission processing unit, and the fluctuation absorption processing program includes a counting step, which counts the voice data packets stored in the fluctuation absorbing buffer to calculate a packet count value, and a buffer size changing step, which inserts or deletes packets in the fluctuation absorbing buffer based on the packet count value calculated in the counting step.
  • When one packet is deleted from the fluctuation absorbing buffer in the buffer size changing step and two or more consecutive valid packets containing voice data are present, the program preferably causes the call processing unit to delete the packet by overlap-adding two consecutive valid packets located in the middle of the run of consecutive valid packets. In the present invention, since the call processing unit deletes a packet by overlap-adding two consecutive valid packets located in the middle, voice deterioration due to the packet loss concealment process can be reduced.
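Deleting one packet by overlap-add amounts to merging two adjacent packets into a single packet of the same length. The sketch below uses a plain linear cross-fade as the overlap window; the window shape and the function name are assumptions, since the text does not specify them.

```python
def overlap_add_merge(first, second):
    """Merge two consecutive packets of samples into one packet by cross-fading.

    The first packet fades out while the second fades in, so the two packets
    collapse into one without a hard splice.
    """
    if len(first) != len(second) or not first:
        raise ValueError("packets must be non-empty and of equal length")
    n = len(first)
    merged = []
    for i in range(n):
        w = (i + 0.5) / n  # fade-in weight for the second packet, strictly between 0 and 1
        merged.append((1.0 - w) * first[i] + w * second[i])
    return merged
```

The buffer then holds one packet fewer, and the listener hears a smooth transition instead of a dropped packet that would have to be concealed later.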
  • When a packet is inserted into the fluctuation absorbing buffer in the buffer size changing step and there are two consecutive valid packets, the fluctuation absorption processing program preferably causes the call processing unit to insert an invalid packet that does not contain voice between these two valid packets. In the present invention, if there are two consecutive valid packets, the call processing unit inserts an invalid packet that does not contain voice between the two valid packets.
  • The second software includes a program for audio data loss detection processing that detects loss of all or part of the audio data output from the transmission processing unit, a program for pitch detection processing that detects the pitch of the voice from the audio data, and a program for audio data loss compensation processing that, when the loss detection processing detects missing audio data, compensates for the missing audio data based on the pitch detected by the pitch detection processing.
  • The pitch detection processing program preferably causes the call processing unit to perform a process of setting, as a reference signal, the audio signal spanning a time width from the current time into the past; a process of detecting the pitch of the audio signal by sliding the reference signal from the current time toward the past with respect to the audio signal and obtaining the correlation between the reference signal and the audio signal; and a process of increasing the time width of the reference signal as the slide amount increases.
  • Because the time width of the reference signal increases as the slide amount of the reference signal increases, the pitch of the audio signal immediately before the loss occurrence time can be detected accurately.
  • The pitch detection processing program preferably causes the call processing unit to set the time width of the reference signal to a predetermined initial time width until the slide amount of the reference signal reaches a predetermined slide reference value. According to the present invention, even when the slide amount of the reference signal is small, a time width of at least a certain amount is ensured for the reference signal, and the correlation between the reference signal and the audio signal can be obtained more accurately.
  • the pitch detection processing program causes the call processing unit to perform processing for obtaining a correlation between the reference signal and the voice signal by an average amplitude difference function method.
  • the correlation between the reference signal and the audio signal can be obtained with high accuracy with a relatively small amount of calculation.
  • the pitch detection processing program causes the call processing unit to perform a process of obtaining a correlation between the reference signal and the voice signal using an average amplitude difference function of Expression (1).
  • ⁇ ( ⁇ ) is the correlation value
  • N is the time width of the reference signal
  • x (j) is the reference signal
  • x (j ⁇ ) is the audio signal
  • k + 1 is the starting point of the reference signal
  • a represents a predetermined coefficient
  • represents the slide amount of the reference signal.
  • the correlation between the reference signal and the audio signal can be obtained with higher accuracy by using Expression (1).
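The pitch search described in the preceding items can be sketched with the textbook average magnitude difference function. This is not Expression (1), which is not reproduced in this text; the normalization by the window width, the rule that the width equals the coefficient times the slide amount, and all default parameter values are assumptions made for illustration.

```python
import math


def reference_width(tau, init_width=16, slide_ref=8, coeff=2.0):
    """Time width of the reference signal for slide amount tau (assumed rule).

    The width stays at init_width until tau reaches slide_ref, then grows with tau.
    """
    if tau < slide_ref:
        return init_width
    return max(init_width, int(coeff * tau))


def amdf(x, tau, width):
    """Mean absolute difference between the newest `width` samples and the
    same samples shifted `tau` samples into the past."""
    end = len(x)
    start = end - width
    if start - tau < 0:
        raise ValueError("not enough past samples for this slide amount")
    return sum(abs(x[j] - x[j - tau]) for j in range(start, end)) / width


def detect_pitch(x, tau_min, tau_max):
    """Return the slide amount (in samples) with the smallest AMDF value."""
    return min(range(tau_min, tau_max + 1),
               key=lambda tau: amdf(x, tau, reference_width(tau)))


# Usage: a sine wave with a period of 8 samples.
signal = [math.sin(2 * math.pi * i / 8) for i in range(64)]
pitch_period = detect_pitch(signal, 4, 12)
```

A small AMDF value means strong similarity, so the minimum marks the pitch period. The function needs only subtractions and absolute values, which fits the remark above about a relatively small amount of calculation.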
  • The second software includes a program for audio data loss detection processing that detects loss of all or part of the audio data output from the transmission processing unit; a program for pitch detection processing that detects the pitch of the voice from the audio data; a program for audio data loss compensation processing that, when the loss detection processing detects missing audio data, compensates for the missing audio data based on the pitch detected by the pitch detection processing; and a program for speech speed conversion processing that expands or compresses the audio data using the pitch detected by the pitch detection processing.
  • Compared with a configuration in which the audio data loss compensation program and the speech speed conversion program are each equipped with their own pitch detection program, the memory consumed in loading the programs can be reduced.
  • the pitch detection process counts a predetermined detection cycle and repeatedly detects the pitch in synchronization with the detection cycle.
  • When the audio data loss detection process detects a loss of audio data, the pitch is detected at the time the loss is detected, and the detection cycle is restarted from that detection time. In the present invention, the quality of the voice after the audio data loss compensation process can be maintained.
  • the pitch detection process detects only a pitch in a predetermined frequency range. In the present invention, since the pitch detection in an unnecessary frequency range is not performed, the processing load can be reduced.
  • the speech speed conversion process detects a voice section of the voice data and converts only the voice data of the voice section.
  • Since the speech speed conversion process is not performed in sections other than the voice section (for example, silent sections), the processing load of the speech speed conversion process can be reduced.
  • the audio data loss detection processing is performed in synchronization with a first time interval obtained by dividing a time length of the audio data for one packet by a positive integer and the input timing of the audio data. It is preferable that the pitch detection process detects the pitch in synchronization with the detection period obtained by multiplying the first time interval by a positive integer and the first time interval. In the present invention, the pitch detection process detects the pitch in synchronization with the detection period obtained by multiplying the first time interval by a positive integer and the first time interval. There is an advantage that the control becomes simple.
  • When speech speed conversion is started while the audio data loss detection process is detecting a loss of audio data, the speech speed conversion process preferably performs the conversion using the pitch detected by the pitch detection process immediately beforehand. According to the present invention, deterioration in voice quality due to the speech speed conversion process can be suppressed.
  • When speech speed conversion is performed while the audio data loss detection process is detecting a loss of audio data, the speech speed conversion process preferably performs the conversion using the pitch that the pitch detection process detects from the audio data compensated by the audio data loss compensation process. In the present invention, even when the speech speed conversion process is started while audio data is missing, the pitch detection process only needs to be executed at a constant detection cycle, which has the advantage of simplifying control.
  • the pitch detection process discriminates between a voice section and a non-voice section of the voice data, and makes the detection period in the non-voice section longer than the detection period in the voice section.
  • Since pitch detection is performed with a relatively short detection period in the voice section, the quality of the speech speed conversion process is ensured, and since pitch detection is performed with a relatively long detection period in the non-voice section, the processing load can be reduced.
  • the second software includes a voice switch processing program that reduces a loop gain of a closed loop formed by an acoustic echo path generated by acoustic coupling between the microphone and a speaker and suppresses howling.
  • The voice switch processing program preferably causes the call processing unit to perform processing that estimates the feedback gain of the acoustic echo path; calculates, based on the estimated feedback gain, the sum of a reception-side attenuation, which attenuates the received voice data from the transmission processing unit, and a transmission-side attenuation, which attenuates the transmitted voice data input to the transmission processing unit; monitors the transmitted and received voice data to estimate the call state; determines the distribution of the transmission-side attenuation and the reception-side attenuation according to the estimated call state and the calculated sum; and reduces the sum according to the amount by which the estimated feedback gain decreases.
  • Because the call processing unit determines the distribution of the transmission-side attenuation and the reception-side attenuation according to the estimated call state and the calculated sum, and reduces the sum according to the amount by which the estimated feedback gain decreases, call quality in the packet transmission method can be further improved.
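The voice switch logic can be sketched in two steps: derive the total attenuation from the estimated feedback gain, then split it between the two directions by call state. The decibel formulation, the 6 dB margin, the state names, and the even split outside single talk are all assumptions; the text does not give the actual rule.

```python
def total_attenuation_db(feedback_gain_db: float, margin_db: float = 6.0) -> float:
    """Total loss needed to keep the closed-loop gain below 0 dB by a margin.

    A smaller estimated feedback gain gives a smaller total, as the text requires.
    """
    return max(0.0, feedback_gain_db + margin_db)


def distribute(total_db: float, call_state: str):
    """Split the total into (transmission-side, reception-side) attenuation."""
    if call_state == "receiving":      # far end is talking: attenuate our transmit path
        return (total_db, 0.0)
    if call_state == "transmitting":   # near end is talking: attenuate our receive path
        return (0.0, total_db)
    return (total_db / 2.0, total_db / 2.0)  # idle or double talk: share the loss
```

As the echo path estimate improves and the estimated feedback gain falls, the total shrinks, so the call behaves less like a half-duplex switch and more like a full-duplex conversation.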
  • the power supply device includes an extension connection line to which a communication device installed in a house is connected, and an extension analog signal transmission unit that transmits an analog voice signal through the extension connection line. It is preferable that the voice data processed by executing the first software in the call processing unit is transmitted from the extension analog signal transmission unit to the call device via the extension connection line. According to the present invention, an extension call can be made with the call device by an analog transmission method.
  • The first software preferably includes a speech speed conversion processing program that detects the pitch of the voice from the digital voice signal obtained by A/D converting the analog voice signal and uses that pitch to expand or compress the digital voice signal. In the present invention, since the first software includes a program for converting the speech speed, the speech of the other party can be made faster or slower even in a call using the analog transmission method.
  • FIG. 4A is a block diagram for explaining an operation during an intercom call with the door phone slave unit according to the first embodiment of the present invention.
  • FIG. 4B is a block diagram for explaining an operation during an extension call with the sub-master unit according to the first embodiment of the present invention.
  • FIG. 5A is a block diagram for explaining an operation during an interphone call with the lobby interphone according to the first embodiment of the present invention
  • FIG. 5B explains an operation during an interphone call with the management room device according to the first embodiment of the present invention.
  • FIG. 5C is a block diagram for explaining an operation during an interphone call with another dwelling unit according to Embodiment 1 of the present invention
  • FIG. 5D is a lobby interphone or management room device according to Embodiment 1 of the present invention.
  • FIG. 6 is a block diagram for explaining an operation during an interphone call between the mobile phone and the sub-master.
  • Embodiment 1 of this invention It is a figure explaining the process of the template setting part and pitch detection part of Embodiment 1 of this invention.
  • the graph of the correlation value of Embodiment 1 of the present invention is shown. It is a flowchart which shows the audio
  • FIG. 23A is a schematic diagram showing processing at the time of packet insertion by the buffer size changing unit
  • FIG. 23B is a schematic diagram showing processing at the time of packet deletion by the buffer size changing unit.
  • 30A and 30B are explanatory diagrams of processing in which the buffer size changing unit deletes one invalid packet.
  • 31A and 31B are explanatory diagrams of processing in which the buffer size changing unit inserts one packet by overlap addition.
  • 32A and 32B are diagrams for explaining processing when five packets are inserted into the jitter buffer at one time.
  • 33A, 33B, and 33C are diagrams for explaining processing when a valid packet corresponding to a deleted invalid packet is received after the invalid packet is deleted.
  • FIGS. 34A and 34B are diagrams illustrating processing when the buffer size changing unit inserts a concealed packet in place of an invalid packet into the jitter buffer. It is the flowchart which showed the deletion process by the buffer size change part.
  • Embodiment 1 of the present invention will be described in detail with reference to FIGS. First, an intercom system for an apartment house including a dwelling unit according to the present invention will be described.
  • The intercom system for an apartment house in this embodiment includes a common unit device (lobby interphone) LI installed at the common entrance (lobby) of the apartment house, a dwelling unit A installed in each residence of the apartment house (only one is shown), a door phone slave unit B installed at the entrance of each residence, a signal trunk line Ls connected to the lobby interphone LI, a dwelling unit line Ld branching from the signal trunk line Ls and connected to the dwelling unit A, and a slave unit connection line Lb connecting the dwelling unit A and the door phone slave unit B.
  • The system also includes a control device CT connected to the dwelling unit A and the lobby interphone LI via the signal trunk line Ls and the dwelling unit line Ld, and a management room device X that exchanges voice information and the like with the lobby interphone LI and each dwelling unit A.
  • Call devices (sub-master units) C are installed in the residence, and the dwelling unit (master unit) A and the sub-master unit C are connected by the extension connection line Lc.
  • The door phone slave unit B includes a microphone and a speaker, a call button that accepts a visitor's call operation, and a communication unit that transmits a call signal and transmits and receives voice signals (analog transmission) to and from the dwelling unit A via the slave unit connection line Lb.
  • a visitor image captured by the camera is analog-transmitted from the doorphone slave unit B to the dwelling unit A via the slave unit connection line Lb.
  • the dwelling unit A transfers the video transmitted from the door phone slave unit B to the sub-master unit C via the extension connection line Lc.
  • The dwelling unit A and the sub-master unit C display the video transmitted from the door phone slave unit B on a monitor (display unit 3). If the response button of the dwelling unit A is pressed, a call can be made between the dwelling unit A and the door phone slave unit B, and if the response button of the sub-master unit C is pressed, a call can be made between the sub-master unit C and the door phone slave unit B.
  • The sub-master unit C includes a microphone and a speaker, a call button for receiving an extension call operation, and a communication unit that transmits a call signal and transmits and receives audio signals (analog transmission) via the extension connection line Lc.
  • The lobby interphone LI includes an imaging device that captures the image of the visitor, a microphone and a speaker, a numeric keypad or touch panel with which the visitor enters the dwelling unit number of the residence being visited, a transmission unit that packet-transmits voice information and video information via the signal trunk line Ls, and the like.
  • When the numeric keypad or touch panel is operated and the lobby interphone LI accepts input of the dwelling unit number of a residence, it transmits a packet storing the dwelling unit number in its data field and a packet storing the video of the visitor captured by the imaging device (video information) in its data field from the transmission unit to the address of the control device CT via the signal trunk line Ls (packet transmission).
  • the management room device X includes a microphone and a speaker, a numeric keypad or a touch panel for an administrator to input a dwelling unit number of a contact destination, a transmission unit for transmitting voice information through the signal trunk line Ls, and the like.
  • the control unit transmits the packet storing the dwelling unit number in the data field from the transmission unit via the signal trunk line Ls. Send to CT address.
  • The control device CT stores the correspondence between the address assigned to the dwelling unit A of each dwelling unit and the dwelling unit number of that dwelling unit. It compares the dwelling unit number stored in the data field of a packet received from the lobby intercom LI or the management room device X with this correspondence and converts it into an address. It then sends to the signal trunk line Ls a packet that holds this address in its destination address field and a call command notifying a call from the lobby intercom LI or the management room device X in its data field, together with the packet storing the video information in its data field.
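  • The lookup performed by the control device CT can be sketched as follows. This is only an illustration: the `Packet` type, the `kind` tag, the table contents, and the function name are hypothetical and do not appear in the original description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    dest: str    # destination address field
    kind: str    # hypothetical tag: "control" or "video"
    data: bytes  # data field


# Hypothetical correspondence table: dwelling unit number -> address of dwelling unit A.
UNIT_TO_ADDRESS = {"101": "addr-0x11", "102": "addr-0x12"}


def build_call_packets(unit_number, caller, video=None):
    """Convert a dwelling unit number into an address and build the outgoing packets."""
    address = UNIT_TO_ADDRESS[unit_number]  # raises KeyError for an unknown unit number
    packets = [Packet(address, "control", f"CALL from {caller}".encode())]
    if video is not None:
        # The video packet is forwarded to the same destination address.
        packets.append(Packet(address, "video", video))
    return packets
```

  • A call from the management room device X would carry no video, so only the control packet would be produced in that case.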
  • Since the lobby intercom LI, the management room device X, and the control device CT described above are conventionally known, detailed illustration and description of them are omitted.
  • The dwelling unit A includes a control unit 1, a microphone 2a and a speaker 2b, a call processing unit 2, a display unit 3, a video processing unit 4, a storage unit 5, a call detection unit 6, a transmission processing unit 7, a secondary master communication processing unit 8, an analog signal transmission unit 9, a first conversion processing unit 10, a second conversion processing unit 11, a first switching unit 12, a second switching unit 13, a third switching unit 14, and the like.
  • The analog voice signal (transmission voice signal) output from the microphone 2a is amplified by the amplifier AMP1, converted into a digital voice signal (transmission voice data) by the A/D converter 10a of the first conversion processing unit 10, and input to the call processing unit 2.
  • The digital voice signal (received voice signal) after call processing by the call processing unit 2 is converted into an analog received voice signal by the D/A converter 10b of the first conversion processing unit 10, amplified by the amplifier AMP2, and output to the speaker 2b.
  • In the case of a door phone call or an extension call to be described later, the digital transmission voice signal (transmission voice data) processed by the call processing unit 2 is converted into an analog transmission voice signal by the D/A converter 11a of the second conversion processing unit 11.
  • For packet transmission, the digital transmission voice signal after call processing by the call processing unit 2 is output directly to the transmission processing unit 7.
  • The analog reception voice signal output from the analog signal transmission unit 9 is amplified by the amplifier AMP4, converted into a digital reception voice signal (reception voice data) by the A/D converter 11b of the second conversion processing unit 11, and input to the call processing unit 2.
  • the digital received voice signal output from the transmission processing unit 7 is directly input to the call processing unit 2.
  • the analog signal transmission unit 9 is composed of a conventionally known 2-wire / 4-wire converter (hybrid transformer).
  • the first switching unit 12 is connected to the two-wire side of the analog signal transmission unit 9.
  • the first switching unit 12 selectively switches between a state in which the two-wire side of the analog signal transmission unit 9 is connected to the slave unit connection line Lb and a state in which the analog signal transmission unit 9 is connected to the second switching unit 13.
  • The second switching unit 13 selectively switches between a state in which the first switching unit 12 is connected to the extension connection line Lc and a state in which it is not connected.
  • the third switching unit 14 selectively switches between a state in which the slave unit connection line Lb and the extension connection line Lc are connected and a state in which it is not connected. Note that the switching of the first to third switching units 12, 13, and 14 is all controlled by the control unit 1.
  • the control unit 1 has a microcomputer as a main component and controls the entire dwelling unit A including the switching control.
  • the display unit 3 includes a display device such as a liquid crystal display, a driver circuit that drives the display device, a touch panel as an input device, and the like.
  • the video processing unit 4 performs signal processing on the video signal received from the transmission processing unit 7 and displays the video on the display unit 3. Specifically, a video (still image or moving image) of a visitor packet-transmitted from the lobby interphone LI is displayed on the display unit 3.
  • The call processing unit 2 includes a microprocessor, an ASIC (Application Specific Integrated Circuit), or a DSP (Digital Signal Processor) that performs the various controls and calculations for call processing, and applies various kinds of signal processing (call processing) to the voice data (transmission voice data and received voice data).
  • the storage unit 5 includes an electrically rewritable nonvolatile semiconductor memory (flash memory or the like), and stores first software and second software.
  • the first software is composed of a collection of a plurality of programs for performing various call processing on the audio signal transmitted by the analog signal transmission unit 9 by the analog transmission method.
  • the second software is composed of a collection of a plurality of programs for performing various call processing on the audio signal transmitted by the packet transmission method by the transmission processing unit 7. Details of each program will be described later.
  • the transmission processing unit 7 performs packet transmission with the control device CT and other dwelling units A via the signal trunk line Ls (including the dwelling unit line Ld, the same applies hereinafter).
  • The transmission processing unit 7 divides the control signal (control data) created by the control unit 1 to create packets (control packets), and likewise divides the transmission voice signal (transmission voice data) created by the call processing unit 2 to create packets (voice packets). Further, the transmission processing unit 7 encodes the control packets and the voice packets, converts (modulates) the encoded bit string into an electric signal, and sends the electric signal to the signal trunk line Ls.
  • the transmission processing unit 7 converts (demodulates) an electric signal flowing through the signal trunk line Ls into a bit string and decodes a packet (voice packet, control packet, video packet) from the demodulated bit string.
  • The transmission processing unit 7 discards a decoded packet if its address does not match its own address (the address of the dwelling unit A). If the address matches, it outputs the data contained in the data field of the packet to the video processing unit 4 if it is video data (video signal), to the control unit 1 if it is control data (control signal), and to the call processing unit 2 if it is audio data (voice signal).
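  • The address check and routing described above can be sketched as follows. The dictionary-based packet layout, the `kind` tag, and the sink names are assumptions made for illustration only.

```python
def dispatch(packet, own_address, sinks):
    """Route one decoded packet; return the sink name, or None if the packet is discarded."""
    if packet["dest"] != own_address:
        return None  # not addressed to this dwelling unit A: discard
    routes = {
        "video": "video_processing_unit_4",
        "control": "control_unit_1",
        "voice": "call_processing_unit_2",
    }
    sink = routes[packet["kind"]]
    sinks[sink].append(packet["data"])  # hand the data field to the matching unit
    return sink
```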
  • The secondary master communication processing unit 8 encodes and frequency-modulates the control data for the secondary master created by the control unit 1 and transmits it to the secondary master C via the extension connection line Lc. It also frequency-demodulates and decodes the control signal transmitted from the sub-master unit C via the extension connection line Lc and passes the resulting control data to the control unit 1.
  • the door phone call between the dwelling unit A and the door phone slave unit B will be described.
  • When the call button of the door phone slave unit B is operated by a visitor, a call signal is transmitted from the door phone slave unit B via the slave unit connection line Lb.
  • the call detection unit 6 that has detected the call signal outputs a call detection signal to the control unit 1.
  • the control unit 1 sounds a ringing tone from the speaker 2b.
  • If the door phone slave unit B is equipped with a camera, the camera is activated to image the visitor, and the captured video is transmitted from the door phone slave unit B via the slave unit connection line Lb.
  • the video transmitted through the slave unit connection line Lb is displayed on the display unit 3 by the video processing unit 4.
  • The control unit 1 controls the first switching unit 12 so that the two-wire side of the analog signal transmission unit 9 is connected to the slave unit connection line Lb, and switches the third switching unit 14 to the disconnected state.
  • The call processing unit 2 executes the first software to perform the call processing, so that a resident of the dwelling unit and a visitor can make a doorphone call using the dwelling unit A and the doorphone slave unit B.
  • The control unit 1 that has received the call detection signal causes the secondary master communication processing unit 8 to transmit the doorphone call control signal and switches the third switching unit 14 to the connected state so that the slave unit connection line Lb is connected to the extension connection line Lc.
  • The video is thereby transmitted to the secondary master unit C via the extension connection line Lc.
  • In the secondary master unit C, a ringing tone is generated from the speaker and the video of the visitor is displayed on the monitor. Then, when the resident who has heard the ringing tone confirms the video of the visitor displayed on the monitor and operates the response button of the secondary master unit C, a doorphone response control signal is transmitted from the secondary master unit C to the dwelling unit A via the extension connection line Lc.
  • The control signal (control data) of the doorphone response is output from the secondary master communication processing unit 8 to the control unit 1, and the control unit 1 that has received the control data keeps the third switching unit 14 in the connected state.
  • As a result, a resident of the dwelling unit and a visitor can make a doorphone call using the sub-master C and the doorphone slave unit B.
  • In this case, the call processing unit 2 of the dwelling unit A does not perform any call processing.
  • an extension call between the dwelling unit A and the secondary master unit C will be described.
  • a control signal for extension call is transmitted from the secondary master unit C via the extension connection line Lc.
  • an extension call control signal (control data) is output from the secondary master communication processing unit 8 to the control unit 1.
  • Upon receiving the extension call control data, the control unit 1 causes the speaker 2b to sound a ringing tone.
  • The control unit 1 controls the first switching unit 12 so that the two-wire side of the analog signal transmission unit 9 is connected to the second switching unit 13, and controls the second switching unit 13 so that the first switching unit 12 is connected to the extension connection line Lc.
  • The control unit 1 instructs the call processing unit 2 to load and execute the first software stored in the storage unit 5. Then, as shown in FIG. 4B, when the call processing unit 2 executes the first software to perform the call processing, residents of the same dwelling unit can make an extension call using the dwelling unit A and the sub-master unit C.
  • The extension call control signal transmitted from one secondary master unit C is received not only by the dwelling unit A but also by the other secondary master unit C.
  • When the response button is operated on the other secondary master unit C that has received the control signal, a communication path is formed between the two secondary master units C and C via the extension connection line Lc, and residents of the same dwelling unit can make an extension call using the sub-masters C and C.
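  • The switch settings described above for the analog calls can be collected in a small table, as in the sketch below. The key names are hypothetical, and `None` marks a setting that the description does not state for that call type.

```python
# Settings of the first, second and third switching units (12, 13, 14) per call type.
# None means the description does not state the setting for that unit.
SWITCH_SETTINGS = {
    "doorphone_call_at_A": {
        "sw12": "Lb", "sw13": None, "sw14": "disconnected", "first_software": True,
    },
    "doorphone_call_at_C": {
        "sw12": None, "sw13": None, "sw14": "connected", "first_software": False,
    },
    "extension_call_A_C": {
        "sw12": "sw13", "sw13": "Lc", "sw14": None, "first_software": True,
    },
}


def runs_first_software(call_type):
    """True when the call processing unit 2 of dwelling unit A performs call processing."""
    return SWITCH_SETTINGS[call_type]["first_software"]
```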
  • The first software includes a voice switch processing program for switching the call direction, an acoustic echo canceller processing program for suppressing acoustic echo, a line echo canceller processing program for suppressing line echo, and a speech speed conversion processing program for slowing down or speeding up the speed (speech speed) of the other party's voice output from the speaker 2b.
  • the call processing unit 2 executing the first software includes a voice switch VS, an acoustic side echo canceller EC1, a line side echo canceller EC2, and a speech rate conversion processing unit SE as shown in FIG.
  • The voice switch VS, the acoustic side echo canceller EC1, the line side echo canceller EC2, and the speech rate conversion processing unit SE are realized by a signal processing circuit, such as a DSP, constituting the call processing unit 2 executing the voice switch processing program, the acoustic side echo canceller processing program, the line-side echo canceller processing program, and the speech rate conversion processing program, respectively.
  • the first and second conversion processing units 10 and 11 are not shown.
  • The acoustic side echo canceller EC1 has a conventionally known structure comprising an adaptive filter ADF1 and a subtractor SUB1. The adaptive filter ADF1 adaptively identifies the impulse response of the feedback path (acoustic echo path) H_AC formed by the acoustic coupling between the speaker 2b and the microphone 2a, and the subtractor SUB1 subtracts the echo component (acoustic echo) estimated from the reference signal (the output signal to the first conversion processing unit 10) from the input signal from the first conversion processing unit 10 (the transmission voice signal), thereby suppressing the echo component.
  • The line-side echo canceller EC2 likewise has a conventionally known configuration including an adaptive filter ADF2 and a subtractor SUB2. The adaptive filter ADF2 adaptively identifies the impulse response of the feedback path (line echo path) H_LIN formed by the impedance mismatch between the analog signal transmission unit 9 and the transmission path (slave unit connection line Lb or extension connection line Lc), and the subtractor SUB2 subtracts the echo component (line echo) estimated from the reference signal (the output signal to the second conversion processing unit 11, that is, the transmission voice signal) from the received voice signal, thereby suppressing the echo component.
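  • The description only says that the echo cancellers combine an adaptive filter with a subtractor and does not name the adaptation algorithm. The following is a minimal sketch assuming a normalized LMS (NLMS) update, which is one common choice; the function name and the parameters `taps`, `mu`, and `eps` are illustrative assumptions.

```python
def nlms_echo_cancel(reference, observed, taps=8, mu=0.5, eps=1e-8):
    """Return the echo-suppressed signal e[n] = observed[n] - estimated echo[n]."""
    w = [0.0] * taps    # adaptive filter coefficients (role of ADF1 or ADF2)
    buf = [0.0] * taps  # most recent reference samples, newest first
    out = []
    for x, d in zip(reference, observed):
        buf = [x] + buf[:-1]
        echo_estimate = sum(wi * bi for wi, bi in zip(w, buf))
        e = d - echo_estimate  # role of the subtractor SUB1 or SUB2
        norm = sum(b * b for b in buf) + eps
        # NLMS coefficient update (assumed algorithm, not stated in the description).
        w = [wi + mu * e * bi / norm for wi, bi in zip(w, buf)]
        out.append(e)
    return out
```

  • For the acoustic side, `reference` would correspond to the signal sent toward the speaker 2b and `observed` to the microphone signal; for the line side, the roles are taken by the transmission voice signal and the received voice signal.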
  • a voice switch VS is provided between the acoustic echo canceller EC1 and the line echo canceller EC2.
  • The voice switch VS includes a transmission side attenuator 100 for attenuating the transmission voice signal, a reception side attenuator 101 for attenuating the reception voice signal, and an insertion loss amount control unit 102 for controlling the attenuation amounts (insertion loss amounts) of the transmission side and reception side attenuators 100 and 101.
  • the insertion loss amount control unit 102 includes a total loss amount calculation unit 103 and an insertion loss amount distribution processing unit 104.
  • The total loss amount calculation unit 103 estimates the acoustic side feedback gain α of the path that returns from the output point Rout of the receiving side attenuator 101 to the input point Tin of the transmitting side attenuator 100 via the acoustic echo path H_AC (hereinafter referred to as the "acoustic side feedback path"), and the line-side feedback gain β of the path that returns from the output point Tout of the transmitting side attenuator 100 to the input point Rin of the receiving side attenuator 101 via the line echo path H_LIN (hereinafter referred to as the "line side feedback path"). Based on the estimated values α′ and β′ of the feedback gains α and β on the acoustic side and the line side, it calculates the total amount of loss to be inserted into the closed loop (the sum of the insertion losses of the transmitting side attenuator 100 and the receiving side attenuator 101).
  • The insertion loss amount distribution processing unit 104 monitors the transmission voice signal and the reception voice signal to estimate the call state and, according to the estimation result and the value calculated by the total loss amount calculation unit 103, determines how the attenuation amounts (insertion loss amounts) are distributed between the transmission side attenuator 100 and the receiving side attenuator 101.
  • The total loss calculation unit 103 estimates the short-time average power of the input signal (transmission voice signal) of the transmission side attenuator 100 using a rectifier smoother, a low-pass filter, or the like, and likewise estimates the short-time average power of the output signal (received voice signal) of the receiving side attenuator 101. It obtains the minimum of the estimated time average power of the output signal of the receiving side attenuator 101 within the maximum delay time assumed for the acoustic side feedback path H_AC, and takes the value obtained by dividing the estimated time average power of the input signal of the transmission side attenuator 100 by this minimum as the estimated value α′ of the acoustic feedback gain α.
  • Similarly, the total loss calculation unit 103 estimates the short-time average power of the input signal (received voice signal) of the reception side attenuator 101 and of the output signal (transmission voice signal) of the transmission side attenuator 100 using a rectifier smoother, a low-pass filter, or the like. It obtains the minimum of the estimated time average power of the output signal of the transmission side attenuator 100 within the maximum delay time assumed for the line side feedback path H_LIN, and takes the value obtained by dividing the estimated time average power of the input signal (received voice signal) of the receive side attenuator 101 by this minimum as the estimated value β′ of the line side feedback gain β.
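  • The estimation procedure above can be sketched as follows. The one-pole smoother standing in for the "rectifier smoother", the smoothing constant, the sample-based delay window, and the function names are assumptions for illustration.

```python
def short_time_level(signal, smoothing=0.05):
    """Rectify the signal and smooth it with a one-pole low-pass filter."""
    level, out = 0.0, []
    for x in signal:
        level += smoothing * (abs(x) - level)
        out.append(level)
    return out


def estimate_feedback_gain(returned, sent, max_delay, floor=1e-9):
    """Estimate level(returned)[n] / min(level(sent)[n - max_delay .. n]) for every n."""
    level_returned = short_time_level(returned)
    level_sent = short_time_level(sent)
    gains = []
    for n in range(len(returned)):
        window = level_sent[max(0, n - max_delay): n + 1]
        # Dividing by the window minimum gives a cautious (upper-side) gain estimate.
        gains.append(level_returned[n] / max(min(window), floor))
    return gains
```

  • For α′, `sent` would be the output of the receiving side attenuator 101 and `returned` the input of the transmitting side attenuator 100; for β′ the two attenuators swap roles.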
  • The total loss amount calculation unit 103 calculates the total loss amount Lt necessary to obtain a desired gain margin MG from the estimated values α′ and β′ of the acoustic side feedback gain α and the line side feedback gain β, and outputs the value Lt to the insertion loss amount distribution processing unit 104.
  • The insertion loss distribution processing unit 104 monitors the input and output signals of the transmitting side attenuator 100 and of the receiving side attenuator 101, and determines the call state (receiving state, transmitting state, etc.) from information such as the power levels of these signals and the presence or absence of speech. It then adjusts the attenuation amounts (insertion loss amounts) of the attenuators 100 and 101 so that the total loss Lt is distributed between the transmitting side attenuator 100 and the receiving side attenuator 101 at a ratio that depends on the determined call state.
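  • The description does not give the distribution ratios, so the sketch below uses hypothetical ones: all of Lt on the transmitting side while receiving, all of it on the receiving side while transmitting, and an even split otherwise. The state names and function names are likewise assumptions.

```python
# Hypothetical share of the total loss Lt assigned to the transmitting side attenuator 100.
TX_SHARE = {"receiving": 1.0, "transmitting": 0.0, "idle": 0.5}


def distribute_loss(total_loss_db, call_state):
    """Split the total loss [dB] into (transmit side loss, receive side loss)."""
    tx_loss = total_loss_db * TX_SHARE[call_state]
    rx_loss = total_loss_db - tx_loss
    return tx_loss, rx_loss


def db_to_gain(loss_db):
    """Convert an insertion loss in dB into a linear amplitude factor."""
    return 10.0 ** (-loss_db / 20.0)
```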
  • The total loss calculation unit 103 has two operation modes: an update mode, in which it calculates and adaptively updates the total loss amount to be inserted into the closed loop based on the estimated values α′ and β′ of the feedback gains α and β as described above, and a fixed mode, in which the total loss amount is fixed to a predetermined initial value.
  • The total loss amount calculation unit 103 operates in the fixed mode during the period from the start of the call with the other party's call terminal until the echo cancellers EC1 and EC2 on the acoustic side and the line side have sufficiently converged, and operates in the update mode in the period after the echo cancellers EC1 and EC2 on the acoustic side and the line side have sufficiently converged.
  • The total loss amount calculation unit 103 regards the echo cancellers EC1 and EC2 on the acoustic side and the line side as having sufficiently converged when, after the start of a call, the estimated values α′ and β′ of the acoustic feedback gain α and the line feedback gain β both remain below a predetermined threshold ε (for example, a value 10 dB to 15 dB smaller than the estimated values α′ and β′ at the start of the call) continuously for a predetermined time (several hundred milliseconds).
  • It then switches the operation mode to the update mode, in which the total loss amount is adaptively updated based on the estimated values α′ and β′.
  • the initial value of the total loss amount in the fixed mode is set to a value sufficiently larger than the total loss amount updated as needed in the update mode.
  • While the total loss amount calculation unit 103 is operating in the fixed mode, the initial total loss amount is inserted into the closed loop, so that unpleasant echoes (acoustic echoes and line echoes) and howling can be suppressed and a stable half-duplex call is realized. Once the echo cancellers EC1 and EC2 on the acoustic side and the line side have sufficiently converged after the start of the call, the operation mode of the total loss calculation unit 103 is switched from the fixed mode to the update mode, and the total loss amount inserted into the closed loop decreases to a value sufficiently lower than the initial value, so that two-way simultaneous calls can be realized.
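  • The switch from the fixed mode to the update mode can be sketched as a small state holder. The class name, the sample-count form of the "predetermined time", and the single shared threshold argument are assumptions for illustration.

```python
class ModeSelector:
    """Stay in the fixed mode until both gain estimates stay below the threshold long enough."""

    def __init__(self, threshold, hold_samples):
        self.threshold = threshold        # predetermined threshold
        self.hold_samples = hold_samples  # predetermined time, expressed in samples
        self.count = 0
        self.mode = "fixed"

    def step(self, alpha_est, beta_est):
        if self.mode == "fixed":
            if alpha_est < self.threshold and beta_est < self.threshold:
                self.count += 1
            else:
                self.count = 0  # the condition must hold continuously
            if self.count >= self.hold_samples:
                self.mode = "update"
        return self.mode
```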
  • After the transition from the fixed mode to the update mode, the total loss calculation unit 103 estimates the acoustic side feedback gain α and the line side feedback gain β at a predetermined sampling period to obtain the estimated values α′(n) and β′(n) (step 1).
  • From the product of these two estimated values α′(n) and β′(n) and the gain margin MG, it calculates by the following equation the desired total loss amount Lr(n) required to keep the gain margin of the closed loop at MG [dB] (step 2).
  • Lr(n) = 20 log|α′(n)·β′(n)| + MG
  • Here, α′(n), β′(n), and Lr(n) denote the estimated values of the feedback gains and the desired total loss amount calculated at the n-th sampling after the transition to the update mode.
  • The total loss amount calculation unit 103 then determines the n-th total loss amount from the desired value Lr(n) calculated by the above equation and the previous ((n−1)-th) total loss amount Lt(n−1), that is, the total loss amount obtained in the previous processing.
  • By limiting each increase or decrease of the total loss amount by the total loss calculation unit 103 to a small value δi or δd, an unnatural impression on the listener can be avoided even when the acoustic side feedback gain α and the line side feedback gain β change drastically, as happens just after the start of a call with the other party's call terminal (door phone slave unit B or sub master unit C) while the acoustic side and line side echo cancellers EC1 and EC2 are actively updating their coefficients toward convergence.
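  • A possible form of this update is sketched below. It assumes that the desired total loss is Lr(n) = 20 log|α′(n)·β′(n)| + MG and that the total loss moves toward Lr(n) by at most one small step per sampling without overshooting; the exact update rule, the function names, and the step arguments are assumptions, not taken from the description.

```python
import math


def desired_total_loss(alpha_est, beta_est, margin_db):
    """Assumed form: Lr(n) = 20 log10|alpha'(n) * beta'(n)| + MG, in dB."""
    return 20.0 * math.log10(abs(alpha_est * beta_est)) + margin_db


def update_total_loss(previous_loss, desired_loss, step_up, step_down):
    """Move Lt(n-1) toward Lr(n) by at most step_up or step_down (assumed rule)."""
    if desired_loss > previous_loss:
        return min(previous_loss + step_up, desired_loss)
    if desired_loss < previous_loss:
        return max(previous_loss - step_down, desired_loss)
    return previous_loss
```

  • Limiting the step size trades tracking speed for smoothness: the inserted loss follows large jumps in the gain estimates only gradually.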
  • The speech rate conversion processing unit SE converts the speech rate of the original speech by expanding or compressing the received speech.
  • Specifically, based on the conventionally well-known speech speed conversion algorithm called PICOLA (Pointer Interval Controlled OverLap and Add), the speech speed is converted (made faster or slower) by inserting or deleting waveforms in pitch units.
  • “Pitch” is the pitch of the voice determined by the vibration period of the vocal cords. If the vibration period of the vocal cords is short, the voice will be high, and if the vibration period is long, the voice will be low.
  • When the speech speed conversion processing unit SE performs the speech speed conversion process during a doorphone call with the doorphone slave unit B or an extension call with the sub-master unit C, the speech speed of the other party's voice output from the speaker 2b of the dwelling unit A can be made faster or slower than the speed at which the other party actually spoke.
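  • The core operation of inserting or deleting one pitch period with an overlap-add cross-fade can be sketched as below. This is a simplified illustration and not a full PICOLA implementation: the pitch period is assumed to be known already, pointer control is omitted, and the function names are hypothetical.

```python
def _crossfade(fade_out, fade_in):
    """Linearly fade from one pitch period to the other (both of equal length)."""
    n = len(fade_out)
    return [fade_out[i] * (1 - i / n) + fade_in[i] * (i / n) for i in range(n)]


def _two_periods(signal, start, period):
    a = signal[start:start + period]
    b = signal[start + period:start + 2 * period]
    if len(b) < period:
        raise ValueError("need two full pitch periods after 'start'")
    return a, b


def delete_pitch_period(signal, start, period):
    """Speed up: replace two consecutive pitch periods by one cross-faded period."""
    a, b = _two_periods(signal, start, period)
    return signal[:start] + _crossfade(a, b) + signal[start + 2 * period:]


def insert_pitch_period(signal, start, period):
    """Slow down: insert one cross-faded period between two consecutive periods."""
    a, b = _two_periods(signal, start, period)
    return signal[:start + period] + _crossfade(b, a) + signal[start + period:]
```

  • Each call shortens or lengthens the signal by exactly one pitch period, so the pitch itself is preserved while the duration changes.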
  • the intercom call between the dwelling unit A and the lobby intercom LI will be described.
  • A packet storing the dwelling unit number in its data field and a packet storing the video (video data) of the visitor captured by the imaging device in its data field are transmitted (packet transmission) from the transmission unit to the address of the control device CT via the signal trunk line Ls.
  • the control device CT sends a packet storing a call command for notifying a call from the lobby intercom LI in the data field and a packet storing the video data in the data field to the signal trunk line Ls.
  • When the transmission processing unit 7 receives the packets via the dwelling unit line Ld, it outputs the call command (control signal) stored in the data field of the packet to the control unit 1 and outputs the video data stored in the data field to the video processing unit 4.
  • On receiving the call command, the control unit 1 causes the speaker 2b to sound a ringing tone.
  • the video processing unit 4 processes the video signal received from the transmission processing unit 7 and causes the display unit 3 to display the video of the visitor.
  • When the resident who has heard the ringing tone confirms the video of the visitor displayed on the display unit 3 of the dwelling unit A and then operates the response button, the control unit 1 instructs the call processing unit 2 to load and execute the second software stored in the storage unit 5. Then, as shown in FIG. 5A, when the call processing unit 2 executes the second software to perform the call processing, the resident of the dwelling unit and the visitor can make an interphone call using the dwelling unit A and the lobby intercom LI.
  • The lobby interphone LI, shown on the left side, has almost the same configuration as the dwelling unit A on the right side of FIG. 5A except for the speech speed conversion processing unit SE, so components having the same functions as those of the dwelling unit A are given the same reference numerals.
  • When the manager operates the numeric keypad or the touch panel and the management room device X accepts the input of the dwelling unit number of a dwelling unit, a packet storing the dwelling unit number in its data field is transmitted (packet transmission) from the transmission unit to the address of the control device CT via the signal trunk line Ls.
  • the control device CT sends a packet storing a call command for notifying a call from the management room device X in the data field to the signal trunk line Ls.
  • When the transmission processing unit 7 receives the packet via the dwelling unit line Ld, the call command (control signal) stored in the data field of the packet is output to the control unit 1.
  • The control unit 1 then causes the speaker 2b to sound a ringing tone.
  • The control unit 1 instructs the call processing unit 2 to load and execute the second software stored in the storage unit 5. Then, as shown in FIG. 5B, the call processing unit 2 executes the second software to perform the call processing, so that the resident of the dwelling unit and the manager can make an interphone call using the dwelling unit A and the management room device X.
  • The management room device X has substantially the same configuration as the dwelling unit A on the right side of FIG., so the same reference numerals are used.
  • When the secondary master unit C responds to a call from the lobby intercom LI or the management room device X, the call processing unit 2 of the dwelling unit A executes the second software as shown in FIG., so that the residents of the dwelling unit and the visitors or managers can make interphone calls using the sub-master C and the lobby intercom LI or the management room device X.
  • the intercom call between the dwelling units A installed in different dwelling units will be described.
  • When the resident operates the numeric keypad and the dwelling unit A accepts the input of the dwelling unit number of another dwelling unit, a packet storing the dwelling unit number in its data field is transmitted (packet transmission) from the transmission unit to the address of the control device CT via the signal trunk line Ls.
  • the control device CT sends a packet storing a call command for notifying the call from the dwelling unit A in the data field to the signal trunk line Ls.
  • The call command (control signal) stored in the data field of the packet is output to the control unit 1.
  • the control unit 1 causes the speaker 2b to ring.
  • the control unit 1 instructs the call processing unit 2 to load and execute the second software stored in the storage unit 5. Then, as shown in FIG. 5C, the call processing unit 2 in the dwelling unit A of each dwelling unit executes the second software to perform call processing, so that residents in different dwelling units use the dwelling unit A. Intercom calls can be made.
  • The second software includes a voice switch processing program for switching the call direction, an acoustic echo canceller processing program for suppressing acoustic echo, an echo suppressor processing program for suppressing residual echo, a voice data loss compensation processing program that compensates for loss of voice data due to packet loss, noise, and the like associated with packet transmission, a fluctuation absorption processing program that absorbs delay and fluctuation (jitter) associated with packet transmission, and a speech speed conversion processing program for slowing down or speeding up the speed (speech speed) of the other party's voice output from the speaker 2b.
  • The call processing unit 2 executing the second software includes a voice switch VS, an acoustic echo canceller EC1, an echo suppressor ES, a speech speed conversion processing unit SE, a voice data loss compensation unit VC, and a fluctuation absorption processing unit JA.
  • The voice switch VS, the acoustic side echo canceller EC1, the echo suppressor ES, the speech speed conversion unit SE, the voice data loss compensation unit VC, and the fluctuation absorption processing unit JA are realized by a signal processing circuit, such as a DSP, constituting the call processing unit 2 executing the respective processing programs.
  • the voice switch VS has the same configuration as the voice switch VS when the first software is executed, and therefore detailed illustration of the configuration is omitted.
  • The voice switch VS in the second software differs from the voice switch VS in the first software in that the total loss amount calculated by the total loss amount calculation unit 103 is reduced according to the amount by which the estimated value α′ of the acoustic feedback gain α decreases.
  • In the analog transmission system, the total loss calculation unit 103 needs to calculate the total loss amount taking into account two feedback gains, the acoustic side feedback gain α and the line side feedback gain β.
  • In the packet transmission system, on the other hand, no line side feedback path is formed, so there is no need to consider the line side feedback gain β. Therefore, in the voice switch VS in the second software, reducing the total loss amount calculated by the total loss amount calculation unit 103 according to the amount by which the estimated value α′ of the acoustic feedback gain α decreases, as described above, allows a two-way simultaneous call to be realized more reliably.
  • The echo suppressor ES is provided between the transmission processing unit 7 and the voice switch VS in the signal path of the transmission voice signal, and attenuates residual echo (acoustic echo that could not be suppressed by the acoustic echo canceller EC1; the same applies hereinafter). In the packet transmission system, which divides voice data into packets for transmission, the transmission delay is longer than in the analog transmission system, and residual echo that the acoustic echo canceller EC1 cannot suppress occurs, so the amount of echo suppression must be increased by the echo suppressor ES. Note that the echo suppressor ES must attenuate the residual echo effectively without attenuating the voice signal that should be transmitted (transmission voice signal).
  • The echo suppressor ES therefore attenuates the transmission voice signal in conjunction with the voice switch VS; specifically, it operates as shown in the flowchart of FIG. That is, the echo suppressor ES constantly monitors the state of the voice switch VS (the call state, receiving or transmitting, estimated by the insertion loss distribution processing unit 104) (step 1). When the voice switch VS is in the receiving state, it assumes that there is no transmission voice signal to be sent on the signal path and attenuates the input signal by multiplying it by an attenuation coefficient (step 2).
  • When the voice switch VS is not in the receiving state, the echo suppressor ES determines that there is no residual echo to be cancelled or that there is a transmission voice signal to be transmitted, does not multiply the input signal by the attenuation coefficient, and outputs the input signal as it is without attenuation (step 3).
  • In this way, the residual echo that arises in the signal path of the transmission voice signal because of the transmission delay can be attenuated by the echo suppressor ES.
  • As a result, two-way simultaneous calls can be reliably realized even with the packet transmission method.
  • If the echo suppressor ES were to attenuate the transmission voice signal while the voice switch VS is not in the receiving state, for example in the transmitting state, the voice of the near-end speaker (the resident talking on the dwelling unit A) could be attenuated inadvertently, producing unnatural variation (inflection) in the volume of the near-end speaker as heard from the other party's call device.
  • Because the echo suppressor ES attenuates the input signal when the voice switch VS is in the receiving state and does not attenuate it when the voice switch VS is not in the receiving state, only the unpleasant echo (residual echo) during a call is attenuated, without causing any such inflection.
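  • The gating rule of steps 1 to 3 can be sketched as follows. This is a minimal illustration, not the DSP program itself: the function name `echo_suppressor`, the state strings, and the default attenuation coefficient of 0.25 are assumptions introduced for the example.

```python
def echo_suppressor(samples, voice_switch_state, attenuation=0.25):
    """Attenuate one frame only while the voice switch is in the receiving state.

    samples            -- transmission-path samples for one frame
    voice_switch_state -- "receiving" or "transmitting" (hypothetical labels)
    attenuation        -- hypothetical attenuation coefficient, 0 < value <= 1
    """
    if voice_switch_state == "receiving":
        # Step 2: no near-end speech is assumed, so the frame is treated
        # as residual echo and multiplied by the attenuation coefficient.
        return [s * attenuation for s in samples]
    # Step 3: near-end speech may be present, so the frame passes unchanged.
    return list(samples)
```

  • Keeping the decision tied to the voice switch state means the near-end speaker's voice is never scaled while the unit is transmitting.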
  • The speech speed conversion processing unit SE is realized by executing the same program as the speech speed conversion processing program included in the first software, and its description is therefore omitted.
  • FIG. 9 is a waveform diagram of an audio signal for explaining the basic principle of audio data loss compensation processing (hereinafter abbreviated as “compensation processing”).
  • In FIG. 9, the vertical axis indicates the intensity of the received voice signal input from the transmission processing unit 7 to the call processing unit 2, and the horizontal axis indicates time.
  • When a packet loss occurs, the voice data loss compensation processing unit VC sets the received voice signal of a predetermined period immediately before the packet loss as a reference signal (template).
  • The template is then slid toward the past from the time of the packet loss over the received voice signal, the correlation between the template and the received voice signal is calculated, and the basic period (pitch) of the received voice signal immediately before the packet loss is detected.
  • Next, the received voice signal for one pitch is extracted going back from the time of the packet loss and applied repeatedly to the loss period (the period in which voice data is missing), so that the loss period is compensated by the received voice signal for one pitch.
  • This works because, for example, when the speaker utters the sound "A", the sound is divided into segments of about 20 msec (packetization) and each segment is carried in one voice packet, so the received voice signal for one pitch immediately before the packet loss is likely to be repeated in the loss period.
  • The voice data loss compensation processing unit VC includes a delay fluctuation absorbing buffer (jitter buffer) 20, a timer 21, a packet loss detection unit 22, a detection processing unit 23, and a compensation processing unit 24. Each of these units is realized by the DSP of the call processing unit 2 executing a voice data loss compensation processing program.
  • The transmission processing unit 7 outputs the received voice signal (received voice data) to the jitter buffer 20 in chronological order according to the sequence number.
  • The voice packet header includes a time stamp in addition to the sequence number.
  • The sequence number indicates the transmission order of the voice packets, and the time stamp indicates the relative position of the voice signal in the original voice waveform.
  • The jitter buffer 20 temporarily holds the received voice data output from the transmission processing unit 7, delays it for a predetermined time, and outputs it to the detection processing unit 23, thereby absorbing the delay fluctuation of the voice packets.
  • The timer 21 is used when the packet loss detection unit 22 detects a packet loss.
  • The packet loss detection unit 22 starts the timer 21 when the jitter buffer 20 outputs received voice data to the detection processing unit 23. If the time measured by the timer 21 exceeds a predetermined time, beyond which a packet loss is assumed to have occurred, before the jitter buffer 20 outputs the next received voice data, it determines that a packet loss has occurred.
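  • The timer-based loss decision can be sketched as below. The class name `PacketLossDetector`, its method names, and the millisecond time base are assumptions made for illustration; the text specifies only that a timer restarts on each jitter buffer output and that exceeding a predetermined time signals a loss.

```python
class PacketLossDetector:
    """Declares a loss when no buffer output follows within `timeout_ms`."""

    def __init__(self, timeout_ms):
        self.timeout_ms = timeout_ms   # hypothetical threshold, in milliseconds
        self.started_at = None         # time of the most recent buffer output

    def on_output(self, now_ms):
        # The jitter buffer handed data to the detection stage: restart the timer.
        self.started_at = now_ms

    def loss_detected(self, now_ms):
        # A loss is assumed once the timer has run past the threshold
        # without a new output arriving.
        if self.started_at is None:
            return False
        return (now_ms - self.started_at) > self.timeout_ms
```

  • With 20 msec packets, a threshold somewhat above 20 msec would be a natural choice, though the text does not give a value.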
  • When a packet loss is detected by the packet loss detection unit 22, the detection processing unit 23 performs basic period (pitch) detection on the received voice data output from the jitter buffer 20; when no packet loss is detected, it performs no processing on the received voice data. The detection processing unit 23 holds received voice data for a certain past period.
  • The detection processing unit 23 includes a template setting unit 23a and a pitch detection unit 23b.
  • When a packet loss has occurred, the template setting unit 23a sets, as a template, the received voice data of a predetermined time width going back from the loss occurrence time.
  • The template setting unit 23a increases the time width of the template as the pitch detection unit 23b increases the slide amount of the template.
  • The pitch detection unit 23b slides the template set by the template setting unit 23a toward the past from the loss occurrence time over the received voice data, obtains the cross-correlation between the template and the received voice data, and detects the pitch of the received voice signal immediately before the loss occurrence time from the slide amount at which the strongest correlation peak between the template and the received voice data appears.
  • FIG. 10 is a waveform diagram of a received voice signal for explaining the processing of the template setting unit 23a and the pitch detection unit 23b.
  • The vertical axis in FIG. 10 shows the intensity of the received voice signal, and the horizontal axis shows time as a number of samples.
  • The template TJ shown in FIG. 10 is the template used in the conventional compensation process.
  • Conventionally, the received voice signal for a predetermined past period from the loss occurrence time RT was set as the template TJ. The template TJ was then slid toward the past from the loss occurrence time RT over the received voice signal to obtain the cross-correlation between the received voice signal and the template TJ, and the pitch of the received voice signal was detected from the slide amount of the template TJ at which the strongest correlation peak was obtained.
  • FIG. 11 is a graph showing the calculation result of the correlation value between the template TJ and the received voice signal when the conventional template TJ is used.
  • The correlation value is calculated using the conventionally known average magnitude difference function (AMDF).
  • In FIG. 11, the vertical axis indicates the correlation value, and the horizontal axis indicates time as a number of samples, with the loss occurrence time RT taken as 0.
  • Because FIG. 11 shows the correlation value obtained by AMDF, the smaller the value, the stronger the correlation between the received voice signal and the template TJ.
  • A downward-convex correlation peak PK1 appears at 37 samples, followed by a downward-convex correlation peak PK2 at 47 samples, and thereafter downward-convex correlation peaks appear repeatedly at a period of approximately 37 samples.
  • The correlation peak PK1 is smaller than the correlation peak PK2. Therefore, the conventional method detects 37 samples as the pitch of the received voice signal.
  • However, the pitch of the received voice signal immediately before the loss occurrence time RT is 47 samples. The conventional method therefore does not detect the pitch of the received voice signal immediately before the loss occurrence time RT accurately.
  • The likely reason is that the time width of the template TJ is much larger than 47 samples: the template TJ includes only one period of the received voice signal with the pitch of 47 samples, which is the pitch to be detected, but three periods of the received voice signal with the pitch of 37 samples, which is not to be detected, so a strong correlation peak appeared at 37 samples.
  • As a result, the pitch of 47 samples cannot be detected.
  • In the present embodiment, therefore, the time width of the template TM is increased as the slide amount of the template TM is increased.
  • When the template TM has been slid to some extent, as with the template TM shown in the third row of FIG. 10, the template includes only the received voice signal with the pitch of 47 samples, which is the pitch to be detected.
  • The template TM in the fourth row of FIG. 10, by contrast, includes a received voice signal with a pitch of 37 samples in addition to the received voice signal with the pitch of 47 samples. The correlation between the third-row template TM and the received voice signal is therefore stronger than the correlation between the fourth-row template TM and the received voice signal, and the pitch of the received voice signal immediately before the loss occurrence time RT can be detected with high accuracy.
  • The pitch detection unit 23b adopts, for example, the AMDF shown in equation (1) as the correlation calculation.
  • φ(τ) is the correlation value
  • N is the time width of the template TM
  • x(j) is the template TM
  • x(j − τ) is the received voice signal
  • k + 1 is the starting point of the template TM
  • a is a constant determined in advance
  • τ indicates the slide amount of the template TM
  • j indicates the sampling number of each sampling point of the received voice signal.
  • The template setting unit 23a sets the time width of the template TM to a predetermined initial time width until the slide amount of the template TM reaches a predetermined slide reference value.
  • Because the time width of the template TM is set to the initial time width, the time width of the template TM is at least a certain size even when the slide amount is small.
  • The correlation between the template TM and the received voice signal (input signal) can therefore be obtained with higher accuracy.
  • Although the time width of the template TM is held at the initial time width until the slide amount of the template TM reaches the slide reference value, the amount of calculation can be reduced by making the initial time width relatively short.
  • As the initial time width, it is preferable to adopt the assumed minimum value of the pitch of the received voice signal.
  • As the slide reference value, the initial time width may be adopted, for example.
  • FIG. 12 is a diagram for explaining processing of the template setting unit 23a and the pitch detection unit 23b.
  • Each point on the straight line shown in FIG. 12 indicates a sampling point of the received voice signal.
  • The rightmost sampling point indicates the loss occurrence time RT, and sampling points further to the left are further in the past.
  • The loss occurrence time RT is taken as the 0th sampling point.
  • The pitch of the received voice signal is about 3 msec at the shortest, which corresponds to 24 samples at a sampling frequency of 8 kHz. The initial time width may therefore be 24 samples, for example.
  • The template setting unit 23a first sets the received voice signals x(k + 1) to x(k + 4) as the template TM0.
  • The pitch detection unit 23b calculates the correlation value φ(0) between the template TM0 and the received voice signal x(j − 0) using equation (1).
  • In this case, the template TM0 is applied to the received voice signals x(k + 1) to x(k + 4).
  • When the slide amount is 1, the template TM0 is applied to the received voice signals x(k) to x(k + 3).
  • When the slide amount reaches 5, the template setting unit 23a sets the received voice signals x(k + 1) to x(k + 5) as the template TM5.
  • The pitch detection unit 23b obtains the correlation value φ(5) between the template TM5 and the received voice signal x(j − 5) using equation (1).
  • In this case, the template TM5 is applied to the received voice signals x(k − 4) to x(k).
  • The pitch detection unit 23b repeats the above processing until τ reaches the maximum slide amount τmax, obtaining φ(τ) each time. As a result, the time width of the template TM is increased as the slide amount increases.
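  • The sliding, widening template search can be sketched as follows. This is an illustration under stated assumptions rather than the program of the embodiment: equation (1) is taken to be the usual AMDF normalized by the template width, the template width is taken to be the larger of the initial time width and a times the slide amount, and the pitch is taken at the global minimum instead of after explicit peak detection. The names `amdf`, `template_width`, and `detect_pitch` are hypothetical.

```python
def amdf(x, tau, width):
    """Mean absolute difference between the newest `width` samples of x
    (the template) and the window of equal length `tau` samples earlier."""
    end = len(x)
    start = end - width
    if start - tau < 0:
        raise ValueError("not enough history for this slide amount")
    total = sum(abs(x[j] - x[j - tau]) for j in range(start, end))
    return total / width


def template_width(tau, initial_width, a=1.0):
    """Template width: the initial width at first, then a * tau (assumed rule)."""
    return max(initial_width, int(round(a * tau)))


def detect_pitch(x, min_lag, max_lag, initial_width=24, a=1.0):
    """Return the slide amount in [min_lag, max_lag] with the smallest AMDF."""
    best_tau, best_value = None, float("inf")
    for tau in range(min_lag, max_lag + 1):
        value = amdf(x, tau, template_width(tau, initial_width, a))
        if value < best_value:          # smaller AMDF means stronger correlation
            best_tau, best_value = tau, value
    return best_tau
```

  • For a test signal that repeats every 47 samples, for example `[i % 47 for i in range(300)]`, searching slide amounts 20 to 80 returns 47, because the AMDF is exactly zero only at that slide amount.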
  • FIG. 13 shows a graph of the correlation value φ(τ) obtained for the received voice signal shown in FIG. 10 using the method according to the present embodiment.
  • In FIG. 13, the vertical axis indicates the correlation value φ(τ), and the horizontal axis indicates time as a number of samples.
  • The correlation value φ(τ) is calculated by AMDF. Therefore, as in FIG. 11, a correlation peak with a lower correlation value indicates a stronger correlation between the received voice signal and the template TM.
  • The correlation peak PK1, which appears when the template TM is shifted by 47 samples, is the smallest.
  • The pitch detection unit 23b therefore detects 47 samples, the time at which the minimum correlation peak PK1 appears, as the pitch of the received voice signal immediately before the loss occurrence time RT. This shows that the pitch detection unit 23b can detect 47 samples, which is the pitch of the received voice signal immediately before the loss occurrence time RT shown in FIG. 10.
  • The compensation processing unit 24 extracts the received voice signal for one pitch, as detected by the pitch detection unit 23b, going back from the loss occurrence time RT, and uses the extracted received voice signal to compensate for the loss period in which the packet loss occurred.
  • For example, when the received voice signal shown in FIG. 10 is input to the compensation processing unit 24 and the pitch detection unit 23b detects 47 samples as the pitch, the received voice signal for 47 samples going back from the loss occurrence time RT is extracted, and the extracted received voice signal is applied repeatedly until the end of the loss period to compensate for the loss period.
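  • The repetition step can be sketched as below. The function name `conceal_loss` and its list-based interface are assumptions for illustration; the sketch simply tiles the last pitch cycle across the loss period and does no smoothing at the boundaries.

```python
def conceal_loss(history, pitch, loss_len):
    """Fill a loss period by repeating the last `pitch` samples of `history`.

    history  -- samples received before the loss, oldest first
    pitch    -- detected pitch, in samples
    loss_len -- number of missing samples to synthesize
    """
    if pitch <= 0 or pitch > len(history):
        raise ValueError("pitch must be between 1 and len(history)")
    cycle = history[-pitch:]            # one pitch going back from the loss time
    return [cycle[i % pitch] for i in range(loss_len)]
```

  • With an 8 kHz signal and 20 msec packets, `loss_len` would be 160 samples per lost packet, so a 47-sample cycle is repeated a little more than three times.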
  • FIG. 14 is a flowchart showing the procedure of the operation (audio data loss compensation processing) of the audio data loss compensation processing unit VC.
  • The pitch detection unit 23b sets a reference sampling point k so that k + 1 becomes the starting point of the template TM, and assigns a sampling number to each sampling point (step S4).
  • The pitch detection unit 23b calculates the correlation value between the template TM and the received voice signal using equation (1) (step S5).
  • If τ ≥ the slide reference value (YES in step S7), the pitch detection unit 23b advances the process to step S8; if τ < the slide reference value (NO in step S7), the process returns to step S5.
  • That is, the template TM of the initial time width is slid toward the past over the received voice signal until the slide amount reaches the slide reference value.
  • In step S8, if τ < τmax (NO in step S8), the process returns to step S3, and the processes of steps S3 to S8 are repeated until τ ≥ τmax. Thereby, the time width of the template TM is increased as the slide amount τ increases.
  • When τ ≥ τmax (YES in step S8), the pitch detection unit 23b detects correlation peaks from the correlation values calculated in step S5, identifies the slide amount of the correlation peak at which the correlation between the template TM and the received voice signal is strongest, and detects the pitch from the identified slide amount (step S9).
  • Here, the correlation peak with the minimum correlation value indicates the strongest correlation between the template TM and the received voice signal.
  • The pitch detection unit 23b may calculate the pitch by multiplying the identified slide amount by the sampling period of the voice signal.
  • The compensation processing unit 24 extracts the received voice signal according to the pitch detected in step S9, and compensates for the loss period using the extracted received voice signal (step S10).
  • a is set within the range of 1 to 2 until the slide amount of the template TM exceeds a predetermined change reference value.
  • After that, the value of a may be gradually decreased so as to approach 1 as the slide amount approaches the maximum slide amount (τmax).
  • As the change reference value, the above-described slide reference value can be adopted, for example.
  • In this way, when the slide amount is small, the time width of the template TM can be set larger than the slide amount, and when the slide amount is large, the time width of the template TM can be set to about the slide amount. This prevents the accuracy of the correlation calculation from dropping because the time width of the template TM has become too small when the slide amount is small.
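  • One possible schedule for a is sketched below. The text says only that a is decreased gradually toward 1; the linear decrease, the starting value of 2, and the name `width_factor` are assumptions chosen for the example.

```python
def width_factor(tau, change_ref, tau_max, a_start=2.0):
    """Hypothetical schedule for the constant a as the slide amount tau grows.

    a stays at `a_start` up to the change reference value, then falls
    linearly (an assumed shape) so that it reaches 1 at tau_max.
    """
    if tau <= change_ref:
        return a_start
    if tau >= tau_max:
        return 1.0
    progress = (tau - change_ref) / (tau_max - change_ref)
    return a_start - (a_start - 1.0) * progress
```

  • Multiplying the slide amount by this factor gives a template wider than the slide amount early in the search and roughly equal to it near τmax.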
  • As described above, the received voice signal with a certain time width going back from the packet loss occurrence time RT is set as the template TM. The template TM is then slid toward the past from the present time over the received voice signal, the correlation between the template TM and the received voice signal is obtained, and the pitch of the received voice signal is detected.
  • The time width of the template TM increases as the slide amount increases. Therefore, at a relatively early stage where the slide amount is small, there is a moment at which the template TM consists of roughly one pitch of the received voice signal immediately before the present time. At that moment, a strong correlation peak appears between the template TM and the received voice signal. As the slide amount increases further, the time width of the template TM increases accordingly and the template TM comes to include a plurality of frequency components, so no correlation peak stronger than the one obtained at that moment can be obtained. The pitch of the received voice signal immediately before the present time can therefore be detected accurately.
  • The fluctuation absorption processing unit JA includes a jitter buffer 30, a counting unit 31, a buffer size changing unit 32, a reception time recording unit 33, a reference value storage unit 34, a concealment processing unit 35, an output unit 36, and an observation history holding unit 37.
  • These units are realized by the DSP of the call processing unit 2 executing a fluctuation absorption processing program in the second software.
  • The jitter buffer 30 is shared with the jitter buffer 20 of the voice data loss compensation processing unit VC.
  • The reception time recording unit 33 records the time (time stamp) at which the transmission processing unit 7 receives a voice packet (received voice packet) in association with the sequence number of the received packet.
  • The jitter buffer 30 is configured by, for example, a ring buffer, and accumulates the packets received by the transmission processing unit 7 in chronological order. As a result, fluctuations in the transmission delay of the voice packets transmitted via the signal trunk line Ls are absorbed. As the size of the jitter buffer 30, a size larger than a reference value described later is adopted.
  • The counting unit 31 calculates a packet count value by counting the number of packets accumulated in the jitter buffer 30 at a predetermined period (count period) that is equal to or less than the period at which voice is packetized (packetization period).
  • The packet count value calculated by the counting unit 31 is held in the observation history holding unit 37.
  • The observation history holding unit 37 is composed of, for example, a volatile semiconductor memory, and holds the past N packet count values (N is a positive integer) calculated by the counting unit 31.
  • FIG. 16 is an explanatory diagram of packet count value calculation processing by the count unit 31. As shown in FIG. 16, the count unit 31 calculates a packet count value at the count cycle Tb.
  • The counting unit 31 sets the count value to ΔT/Ta for a packet PS received within the past packetization period Ta before the calculation time Tk, which is the timing at which the packet count value is calculated, where ΔT is the difference between the calculation time Tk and the reception time. For a packet PL received more than the packetization period Ta before the calculation time Tk, it sets the count value to 1. The packet count value is calculated from these count values. That is, the count value of a packet PS becomes smaller as its reception time approaches the calculation time Tk and the difference ΔT decreases.
  • For the packet PS, the reception time is used in calculating the packet count value, so the reception time must be held.
  • For the packet PL, the reception time is not needed for calculating the packet count value, so it need not be recorded.
  • Even so, at the next calculation time Tk+1 the counting unit 31 can still acquire the reception times of the packets received within the past packetization period Ta. In this way, the capacity of the reception time recording unit 33 can be saved.
  • The buffer size changing unit 32 reads the past N packet count values calculated by the counting unit 31 from the observation history holding unit 37 and takes the nth smallest of the N values as the representative value of the packet count value. If the representative value is larger than a predetermined reference value, it deletes packets stored in the jitter buffer 30; if the representative value is smaller than the reference value, it inserts packets into the jitter buffer 30. The reference value is stored in the reference value storage unit 34.
  • When the representative value is smaller than the reference value, the buffer size changing unit 32 may insert packets into the jitter buffer 30 so that the representative value becomes not less than the reference value and less than the reference value + 1. For example, when the representative value is 2.1 and the reference value is 4, two packets are inserted into the jitter buffer 30 so that the representative value becomes 4.1. Likewise, when the representative value is larger than the reference value, the buffer size changing unit 32 may delete packets from the jitter buffer 30 so that the representative value becomes not less than the reference value and less than the reference value + 1. For example, when the representative value is 4.2 and the reference value is 2, two packets are deleted from the jitter buffer 30 so that the representative value becomes 2.2.
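  • The adjustment rule above reduces to a small calculation, sketched here. The function names `representative_value` and `packets_to_adjust` are hypothetical, and a signed return value (positive for insertion, negative for deletion) is a convention chosen for the example.

```python
import math


def representative_value(count_history, n):
    """Return the nth smallest of the stored packet count values (n starts at 1)."""
    return sorted(count_history)[n - 1]


def packets_to_adjust(representative, reference):
    """Whole packets to add so that reference <= representative < reference + 1.

    A positive result means insert that many packets, a negative result
    means delete that many, and 0 means leave the jitter buffer alone.
    """
    return math.ceil(reference - representative)
```

  • For the two examples in the text, `packets_to_adjust(2.1, 4)` gives 2 (insert two packets) and `packets_to_adjust(4.2, 2)` gives -2 (delete two packets).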
  • As n, it is preferable to adopt the value of N ⁇ ⁇ rounded to an integer.
  • As the reference value, a value determined in advance based on the call delay time that the intercom system for collective housing allows in an interphone call (a call using the packet transmission method) is adopted. If the number of packets stored in the jitter buffer 30 is larger than the reference value, the number of packets waiting for output in the jitter buffer 30 increases and call delay occurs. Therefore, as described above, when the representative value, which is the nth smallest packet count value, is larger than the reference value, call delay can be prevented by deleting packets from the jitter buffer 30.
  • Conversely, when the representative value is smaller than the reference value, packets are inserted into the jitter buffer 30.
  • The concealment processing unit 35 performs packet loss concealment processing on invalid packets (packets that do not include voice; the same applies hereinafter) inserted into the jitter buffer 30 and when packets are depleted in the jitter buffer 30.
  • As the packet loss concealment processing, the following technique may be adopted, for example: the pitch of the received voice signal is detected from the received voice signal preceding the invalid packet, the voice waveform of the section one pitch before the end of the valid packet (a packet that includes voice; the same applies hereinafter) immediately preceding the invalid packet is extracted, and a voice waveform obtained by repeating this waveform for the duration of the packetization period (for example, 20 msec) is generated as the received voice signal of the invalid packet.
  • For the pitch detection, the same method as the pitch detection process in the voice data loss compensation processing described above may be employed.
  • When the number of packets stored in the jitter buffer 30 exceeds the reference value, the output unit 36 reads packets (received voice data) from the jitter buffer 30 in chronological order in synchronization with the packetization period Ta and outputs them to the signal path of the received voice signal.
  • For invalid packets, and when the jitter buffer 30 is depleted, the output unit 36 causes the concealment processing unit 35 to execute the packet loss concealment processing and outputs the resulting voice data.
  • The observation history holding unit 37 is configured by, for example, a non-volatile semiconductor memory, and holds the past N packet count values calculated by the counting unit 31.
  • FIG. 17 is a diagram for explaining the role of the jitter buffer 30.
  • A packet including a received voice signal is transmitted from the other party's call terminal (the lobby interphone LI, the management room device X, or another dwelling unit) at the packetization period (20 msec in the illustrated example).
  • FIG. 17 shows a situation in which eight packets numbered 1 to 8 (sequence numbers) are transmitted at intervals of 20 msec.
  • The packets transmitted from the other party's call terminal are received by the dwelling unit A via the signal trunk line Ls.
  • However, the time taken for voice packets transmitted from the other party's call terminal at the packetization period to reach the dwelling unit A (the transmission delay) differs greatly from packet to packet, so that so-called transmission delay fluctuation occurs. The reception intervals of voice packets at the dwelling unit A are therefore unequal.
  • The jitter buffer 30 is provided to absorb this transmission delay fluctuation.
  • Here, the buffer size of the jitter buffer 30 is three packets.
  • The output unit 36 starts output by performing decoding and D/A conversion on the first packet at time T1, when the delay time Td has elapsed since the first packet was received.
  • At time T2, the output time of the second packet 20 msec after time T1, the jitter buffer 30 holds the second packet, so the output unit 36 can output the second packet at time T2.
  • The third packet, however, has an extremely large transmission delay; it has not reached the dwelling unit A at time T3, and the jitter buffer 30 is depleted. The output unit 36 therefore cannot output the third packet at time T3, and sound dropout (loss of voice data) occurs.
  • The third to seventh packets reach the dwelling unit A in quick succession after the congestion is eliminated.
  • The jitter buffer 30 includes the fifth and sixth packets. However, since the jitter buffer 30 is empty, the seventh packet is not discarded but is stored in the jitter buffer 30. Therefore, the seventh packet is output from the output unit 36 at time T7.
  • When the buffer size of the jitter buffer 30 is fixed, it must be made sufficiently larger than the assumed transmission delay fluctuation. Making the buffer size of the jitter buffer 30 sufficiently large and the delay time Td sufficiently long prevents sound dropouts, but a long delay time Td increases the number of packets waiting for output in the jitter buffer 30 and causes call delay.
  • FIG. 18 shows an example of a transmission delay characteristic graph showing the relationship between the transmission delay and the frequency of occurrence of the transmission delay.
  • In FIG. 18, the vertical axis indicates the occurrence frequency, and the horizontal axis indicates the transmission delay.
  • FIG. 19 is a diagram for explaining an optimum buffer size of the jitter buffer 30.
  • In FIG. 19, dmin represents the minimum transmission delay and dmax represents the maximum transmission delay.
  • The transmission delay of the (k − 1)th packet is dmin, that of the kth packet is d, and that of the (k + 1)th packet is dmax.
  • In this case, the optimum output waiting time for the output unit 36 is as follows. i) A packet that arrives with delay dmax is output immediately. ii) A packet that arrives with delay dmin is output after waiting dmax − dmin. iii) A packet that arrives with delay d is output after waiting dmax − d.
  • The buffer size buf of the jitter buffer 30 may therefore be set so that buf ≥ dmax − dmin.
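  • The waiting rule and the resulting buffer bound can be written out directly, as in the sketch below. The function names and the use of milliseconds in the usage note are assumptions for illustration.

```python
def output_wait(delay, d_max):
    """Time a packet should wait in the jitter buffer before output.

    Every packet is held until its total latency equals d_max, so a packet
    that arrived with transmission delay `delay` waits d_max - delay.
    """
    if delay > d_max:
        raise ValueError("packet arrived later than d_max and cannot be scheduled")
    return d_max - delay


def min_buffer_span(d_min, d_max):
    """Smallest buffering time that covers the whole delay spread."""
    return d_max - d_min
```

  • With dmin = 10 msec and dmax = 70 msec, a packet delayed 30 msec waits 40 msec and the buffer must cover at least 60 msec, which is three packets at a 20 msec packetization period.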
  • However, if dmax of the transmission delay characteristic becomes extremely large, that is, if the tail at the right end of the transmission delay characteristic graph becomes extremely long, the buffer size buf increases.
  • Moreover, because the frequency of occurrence decreases as the transmission delay increases, observing the true dmax would require observing the transmission delay of a huge number of packets. For this reason, in the graph of FIG. 18, dmax is taken to be not the true dmax but the value obtained by discarding the upper few percent of the transmission delay distribution. In this case, packet depletion occurs when a transmission delay exceeding the value regarded as dmax occurs.
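  • Discarding the upper few percent of observed delays can be sketched as follows. The function name `practical_d_max` and the default of 5 percent are assumptions; the text does not state the exact percentage.

```python
def practical_d_max(delays, discard_percent=5):
    """Largest observed delay after dropping the top `discard_percent` percent.

    delays          -- observed transmission delays (any consistent unit)
    discard_percent -- whole-number percentage of the slowest samples to ignore
    """
    if not delays:
        raise ValueError("at least one delay observation is required")
    ordered = sorted(delays)
    dropped = len(ordered) * discard_percent // 100
    dropped = min(dropped, len(ordered) - 1)   # always keep at least one sample
    return ordered[len(ordered) - dropped - 1]
```

  • Any packet whose delay exceeds the returned value arrives too late for its slot, which corresponds to the packet depletion case described above.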
  • FIG. 20 is a flowchart showing the fluctuation absorption processing of the fluctuation absorption processing unit JA.
  • The counting unit 31 determines whether the packet count value calculation timing has arrived, that is, whether the count period Tb has elapsed since the packet count value was last calculated (step S1). If the counting unit 31 determines that the calculation timing has arrived (YES in step S1), it counts the number of packets currently accumulated in the jitter buffer 30 (step S2). If it determines that the calculation timing has not arrived (NO in step S1), it returns the process to step S1.
  • The counting unit 31 then executes the packet count value calculation process to calculate the packet count value (step S3).
  • FIG. 21 is a flowchart showing details of packet count value calculation processing.
  • The counting unit 31 specifies the current time as the packet count value calculation time (step S21).
  • If the control unit 1 of the dwelling unit A has a clock function, the calculation time can be specified using that clock function.
  • The counting unit 31 specifies, among the packets stored in the jitter buffer 30, the reception time of each packet received within the past packetization period Ta before the calculation time Tk, as shown in FIG. 16 (step S22). It does so by looking up the reception time recorded in the reception time recording unit 33 in association with each packet's sequence number.
  • The counting unit 31 calculates the difference ΔT between the calculation time Tk and the reception time for each packet received within the past packetization period Ta (step S23).
  • The counting unit 31 calculates ΔT/Ta for each packet received within the past packetization period Ta, and sets this ΔT/Ta as the count value of that packet (step S24).
  • The counting unit 31 sets the count value to 1 for each packet, among the packets stored in the jitter buffer 30, that was received more than the packetization period Ta before the calculation time Tk (step S25).
  • The counting unit 31 calculates the packet count value by summing the count values set in steps S24 and S25 over the packets stored in the jitter buffer 30 (step S26). For example, if one packet was received more than the packetization period Ta before the calculation time Tk, and two packets were received within the packetization period Ta before the calculation time Tk with reception times Ti and Tj, the packet count value is 1 + (Tk − Ti)/Ta + (Tk − Tj)/Ta.
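  • Steps S22 to S26 amount to a weighted count, sketched below. The function name `packet_count_value` and the list-of-reception-times interface are assumptions made for the example.

```python
def packet_count_value(reception_times, now, packetization_period):
    """Weighted number of packets held in the jitter buffer at time `now`.

    A packet received at least one packetization period ago counts as 1.
    A more recent packet counts as (now - reception_time) / period, so a
    packet that has only just arrived contributes almost nothing.
    """
    total = 0.0
    for received_at in reception_times:
        elapsed = now - received_at
        if elapsed < 0:
            raise ValueError("reception time lies in the future")
        if elapsed >= packetization_period:
            total += 1.0                               # step S25
        else:
            total += elapsed / packetization_period    # steps S23 and S24
    return total
```

  • With Ta = 20 msec, Tk = 100 msec and reception times of 70, 90 and 95 msec, the result is 1 + 10/20 + 5/20 = 1.75, matching the form of the example in the text.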
  • The counting unit 31 deletes from the reception time recording unit 33 the reception times of packets received more than Ta − Tb before the calculation time Tk (step S27).
  • the counting unit 31 causes the observation history holding unit 37 to hold the packet count value at the calculation time Tk. In this case, the count unit 31 deletes the oldest packet count value from the observation history holding unit 37 so that the number of packet count values held in the observation history holding unit 37 is N.
  • the buffer size changing unit 32 specifies the nth smallest packet count value among the N packet count values stored in the observation history holding unit 37 as a representative value (step S5).
  • FIG. 22 is a schematic diagram showing the relationship between the packet count value and the calculation time of the packet count value.
  • the vertical axis shows the packet count value
  • the horizontal axis shows the calculation time of the packet count value.
  • The buffer size changing unit 32 determines whether the representative value is greater than the reference value (step S6). If representative value ≥ reference value + 1 (YES in step S6), it deletes from the jitter buffer 30 the number of packets that brings the representative value to at least the reference value and less than the reference value + 1 (step S7).
  • The buffer size changing unit 32 subtracts the number of packets deleted in step S7 from each of the N packet count values held in the observation history holding unit 37, thereby updating the N packet count values and the observation history (step S8). For example, if the number of deleted packets is 1, 1 is subtracted from all N packet count values. In this way, the deletion of packets from the jitter buffer 30 is reflected in the observation history.
  • When the representative value is less than the reference value + 1 (NO in step S6) and is equal to or larger than the reference value (NO in step S9), the buffer size changing unit 32 neither deletes packets from nor inserts packets into the jitter buffer 30 (step S10).
  • When the representative value is less than the reference value (YES in step S9), the buffer size changing unit 32 inserts into the jitter buffer 30 the number of packets that brings the representative value to at least the reference value and less than the reference value + 1 (step S11).
  • the buffer size changing unit 32 adds the number of packets inserted in step S11 to each of the N packet count values held in the observation history holding unit 37, and updates the N packet count values. Then, the observation history is updated (step S12). For example, if the number of inserted packets is 1, 1 is added to all N packet count values. Thereby, the fact that the packet is inserted into the jitter buffer 30 is reflected in the observation history.
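  • The decision in steps S5 to S12 can be sketched as follows. The function name and return format are hypothetical, and the use of floor and ceiling to obtain the number of packets is an assumption: it is one way to bring the representative value into the range from the reference value to less than the reference value + 1.

```python
import math


def adjust_history(history, n, reference):
    """Decide on deletion or insertion from the last N packet count values.

    history: the N packet count values in the observation history.
    n: 1-based rank of the value used as representative (nth smallest).
    reference: the reference value.
    Returns (action, number_of_packets, updated_history).
    """
    representative = sorted(history)[n - 1]                # step S5
    if representative >= reference + 1:                    # YES in step S6
        k = math.floor(representative - reference)         # step S7
        return "delete", k, [v - k for v in history]       # step S8
    if representative >= reference:                        # NO in step S9
        return "none", 0, list(history)                    # step S10
    k = math.ceil(reference - representative)              # step S11
    return "insert", k, [v + k for v in history]           # step S12
```

  • For example, with the history [4.5, 3.25, 5.0, 6.0], n = 2, and a reference value of 2, the representative value is 4.5, two packets are deleted, and every entry in the history is reduced by 2.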
  • After step S8, the process returns to step S1, and when the next packet count value calculation time comes, the processes from step S2 onward are executed.
  • FIG. 23A is a schematic diagram showing processing at the time of packet insertion by the buffer size changing unit 32
  • FIG. 23B is a schematic diagram showing processing at the time of packet deletion by the buffer size changing unit 32.
  • In FIG. 23A, the buffer size changing unit 32 inserts an invalid packet between the fourth packet and the fifth packet, which are valid packets.
  • In FIG. 23B, the buffer size changing unit 32 overlaps the fourth packet and the fifth packet, which are valid packets, so that two packet lengths become one packet length, thereby deleting one packet.
  • As described above, the packet count value is calculated from the number of packets stored in the jitter buffer 30, and the nth smallest packet count value among the past N packet count values is specified as the representative value. If the representative value is larger than the reference value, packets are deleted from the jitter buffer 30. Consequently, when the past history of the packet count value shows that the number of packets stored in the jitter buffer 30 tends to exceed the reference value and output delay is occurring, packets are deleted from the jitter buffer 30 and the output delay is reduced.
  • Conversely, if the representative value is smaller than the reference value, packets are inserted into the jitter buffer 30, so packet depletion can be prevented.
  • The count unit 31 calculates the packet count value by setting the count value of the latest packet to ΔT / Ta, where ΔT is the difference between the calculation time Tk and the reception time of the latest packet, and setting the count value of every other packet to 1.
  • When packets received within the packetization period Ta before the calculation time Tk are accumulated in the jitter buffer 30, the counting unit 31 identifies the packet PS with the latest reception time among them and sets the count value of this latest packet PS to ΔT / Ta.
  • the count unit 31 uniformly sets the count value to 1 for the packets PL1 and PL2 other than the latest packet PS among the packets stored in the jitter buffer 30.
  • After the packet count value calculation process is completed, the reception record recorded in the reception time recording unit 33 is deleted.
  • Steps S31, S33, S34, and S36 in FIG. 25 are the same as steps S21, S23, S24, and S26 in FIG. 21.
  • the counting unit 31 specifies the reception time of the latest packet among the packets received in the past in the packetization period Ta from the calculation time Tk in the jitter buffer 30. Further, the count unit 31 uniformly sets the count value to 1 for packets other than the latest packet from the calculation time Tk (step S35).
  • In step S37, the count unit 31 deletes the reception time of the latest packet from the reception time recording unit 33.
  • When the packet count value is calculated by the above-described method, the reception time needs to be recorded only for the latest packet, so the capacity of the reception time recording unit 33 can be further saved.
  • It is preferable that the fluctuation absorption processing unit JA determines whether a spike delay has occurred and, if so, shortens the window width of the past packet count values to be referred to and calculates the representative value from the packet count values within the shortened window width.
  • the count unit 31 stores the calculated packet count value in the observation history holding unit 37 in association with an index for indicating the time-series order of each packet count value. Specifically, since the observation history holding unit 37 holds the packet count value of the past N times, the count unit 31 has an index of N for the latest packet count value and an index of 1 for the oldest packet count value. Thus, an index is added to the past N packet count values so that the index increases as the calculation time becomes new.
  • The counting unit 31 determines the presence or absence of a spike delay based on the past N packet count values held in the observation history holding unit 37. When it determines that a spike delay has occurred, it extracts the past M (M < N) packet count values from the past N packet count values.
  • the counting unit 31 determines the presence or absence of a spike delay as follows.
  • FIG. 26 is a graph for explaining the determination processing for the presence or absence of spike delay.
  • the vertical axis indicates the packet count value
  • the horizontal axis indicates the index.
  • Here, N = 100.
  • the count unit 31 specifies a packet count value that is equal to or less than the reference value.
  • the packet count values at points PP1 to PP6 are below the reference value.
  • the count unit 31 specifies the smallest index, that is, the oldest point, and the largest index, that is, the latest point among packet count values equal to or less than the reference value.
  • the counting unit 31 specifies the points PP1 and PP6.
  • The count unit 31 obtains the difference ΔI between the minimum index and the maximum index.
  • The counting unit 31 determines that a spike delay has occurred if the difference ΔI is smaller than a predetermined threshold, and determines that no spike delay has occurred if the difference ΔI is larger than the threshold.
  • FIG. 27 is a graph showing the relationship between the packet count value and the index when spike delay occurs.
  • the vertical axis represents the packet count value
  • the horizontal axis represents the index.
  • the packet count values at points PP1 to PP5 are equal to or less than the reference value.
  • the point PP1 has the smallest index
  • the point PP5 has the largest index.
  • The difference ΔI between the index of the point PP1 and the index of the point PP5 is smaller than the threshold value. Therefore, the count unit 31 determines that a spike delay has occurred.
  • When the count unit 31 determines that a spike delay has occurred as shown in FIG. 27, it extracts the past M packet count values counted back from the calculation time Tk.
  • the buffer size changing unit 32 calculates the m-th smallest packet count value among the past M packet count values as a representative value. Thereafter, the buffer size changing unit 32 compares the representative value with the reference value, and inserts or deletes the packet in the jitter buffer 30.
  • As m, a value obtained by rounding M × α to an integer can be adopted.
  • the window width of the past packet count value to be referred to is narrowed, and a packet is inserted into or deleted from the jitter buffer 30. Therefore, the representative value can be calculated in such a manner that spike delays that rarely occur are eliminated.
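  • The spike delay determination and the shortened window can be sketched as follows. The function and parameter names are hypothetical. The text does not state what happens when no packet count value is at or below the reference value, or when ΔI equals the threshold; the sketch assumes that neither case counts as a spike delay.

```python
def spike_delay_detected(history, reference, threshold):
    """history[i] carries index i + 1 (1 = oldest, N = latest)."""
    low = [i + 1 for i, value in enumerate(history) if value <= reference]
    if not low:
        return False                      # assumption: nothing to evaluate
    delta_i = max(low) - min(low)         # difference between the indices
    return delta_i < threshold


def representative_value(history, reference, threshold, n, m_window, m):
    """Return the representative value, narrowing the window on a spike.

    n: rank used on the full history of N values (nth smallest).
    m_window: M, the number of most recent values kept on a spike (M < N).
    m: rank used inside the shortened window (mth smallest).
    """
    if spike_delay_detected(history, reference, threshold):
        window = history[-m_window:]      # the past M packet count values
        return sorted(window)[m - 1]
    return sorted(history)[n - 1]
```

  • In the sketch, low values confined to a narrow range of indices are treated as a spike, and the representative value is then taken only from the most recent M values.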
  • When calculation times at which the number of accumulated packets is 0 occur consecutively, it is preferable to calculate the packet count value as follows.
  • When the number of accumulated packets is 0 at consecutive calculation times, the count unit 31 calculates, as the packet count value, a negative value whose absolute value increases as the number of consecutive zero counts increases.
  • FIGS. 28A and 28B are diagrams for explaining the processing of the counting unit 31.
  • In FIG. 28A, packets are received immediately after the packet count value calculation times Tk-4, Tk-3, Tk-2, and Tk-1 in each section of the count cycle Tb.
  • The output unit 36 reads each packet (received voice data) from the jitter buffer 30 within its section, before the next packet count value calculation time Tk-3, Tk-2, Tk-1, or Tk arrives. For example, a packet received immediately after the calculation time Tk-4 is read out before the next calculation time Tk-3 arrives. Therefore, at each calculation time Tk-4, Tk-3, Tk-2, Tk-1, and Tk, the number of stored packets in the jitter buffer 30 is zero, and the count unit 31 calculates the packet count value as 0 at each of these calculation times.
  • The situation of the signal trunk line Ls differs greatly between FIGS. 28A and 28B. In FIG. 28A, packets reach the dwelling unit A periodically, and the output unit 36 can output them continuously. In FIG. 28B, however, packets do not reach the dwelling unit A, so the output unit 36 cannot output continuously.
  • The counting unit 31 performs the following processing. First, it compares the difference between the calculation time (current time) and the latest packet reception time with the count cycle Tb. If the difference is smaller than the count cycle Tb, it determines that the situation is that of FIG. 28A. If the difference is greater than the count cycle Tb, it determines that no packet has been received since the previous calculation time, that is, that the situation is that of FIG. 28B, and performs the following processing. As shown in FIG. 28B, the number of accumulated packets is 0 at the calculation time Tk-3 and again 0 at the calculation time Tk-2, so the number of consecutive times is 1. In this case, the count unit 31 calculates 0 as the packet count value at the calculation time Tk-2.
  • At the calculation time Tk-1, the number of consecutive times is 2, so the count unit 31 calculates −1, the value obtained by subtracting 1 from 2 and multiplying the result by −1, as the packet count value.
  • Likewise, at the calculation time Tk, the number of consecutive times is 3, so the count unit 31 calculates −2, the value obtained by subtracting 1 from 3 and multiplying the result by −1, as the packet count value. In general, the counting unit 31 calculates (number of consecutive times − 1) × (−1) as the packet count value.
  • In this way, the packet count value can be calculated so as to distinguish the case where packets are received periodically and the number of stored packets merely happens to be zero at the calculation time, as shown in FIG. 28A, from the case where packets are not received, as shown in FIG. 28B. Therefore, in the case of FIG. 28B, packets are less likely to be deleted from the jitter buffer 30 than in the case of FIG. 28A.
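  • The handling of an empty jitter buffer can be sketched as follows. The function name and the way the consecutive count is carried between calls are hypothetical. The text compares the difference only as smaller or greater than Tb; treating a difference exactly equal to Tb as "no packet received" is an assumption of this sketch.

```python
def empty_buffer_count(tk, latest_rx_time, tb, consecutive):
    """Packet count value when the jitter buffer holds 0 packets at time tk.

    latest_rx_time: reception time of the most recently received packet.
    tb: count cycle Tb.
    consecutive: number of consecutive times so far (0 if none).
    Returns (packet_count_value, updated_consecutive).
    """
    if tk - latest_rx_time < tb:
        # FIG. 28A: a packet arrived since the previous calculation time and
        # was already read out, so the count value is simply 0.
        return 0, 0
    # FIG. 28B: no packet has arrived since the previous calculation time.
    consecutive += 1
    return -(consecutive - 1), consecutive
```

  • Called at successive calculation times with no new packets, the sketch yields 0, −1, −2, and so on, matching (number of consecutive times − 1) × (−1).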
  • When the buffer size changing unit 32 deletes one packet from the jitter buffer 30 and two or more valid packets containing voice are stored consecutively, it overlap-adds two consecutive valid packets located in the middle of the run of consecutive valid packets, thereby deleting one packet.
  • FIGS. 29A, 29B, and 29C are explanatory diagrams of processing in which the buffer size changing unit 32 deletes one packet by overlap addition. FIG. 29A shows the jitter buffer 30 before deletion, and FIG. 29B shows the jitter buffer 30 after deletion.
  • the read pointer RP indicates the start address of the jitter buffer 30 having a ring buffer structure
  • the write pointer WP indicates the end address of the jitter buffer 30.
  • Each box indicates one packet, and the number in each box indicates the time-series order of the packet.
  • A white box indicates an invalid packet.
  • A gray box indicates a valid packet.
  • the packets are combined into one packet by addition, and one packet is deleted.
  • One packet can be deleted by overlap addition wherever valid packets are consecutive, but performing the overlap addition in a section with many consecutive valid packets makes it possible to reduce voice deterioration when packet loss concealment processing is performed.
  • overlap addition using triangular window functions RF1 and RF2 can be adopted as shown in FIG. 29C.
  • The buffer size changing unit 32 applies window function processing using the triangular window function RF1 to the audio signal of the fifth packet and window function processing using the triangular window function RF2 to the audio signal of the sixth packet, adds the two windowed audio signals to generate one audio signal, and packetizes it into one packet, thereby performing overlap addition.
  • As the triangular window function RF1, a linear function with a time width of 20 msec, a maximum value of 1, and a minimum value of 0 whose value decreases as time passes can be adopted.
  • As the triangular window function RF2, a linear function with a time width of 20 msec, a maximum value of 1, and a minimum value of 0 whose value increases as time passes can be adopted.
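  • The overlap addition with the two triangular windows can be sketched as follows. The function name and the representation of a packet as a list of samples are assumptions; the number of samples in a 20 msec packet depends on the sampling rate, which the sketch leaves to the caller.

```python
def overlap_add(first, second):
    """Merge two equal-length packets of samples into one packet.

    first: samples of the earlier packet, weighted by the falling window RF1.
    second: samples of the later packet, weighted by the rising window RF2.
    """
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("packets must have the same length of 2 or more")
    last = len(first) - 1
    merged = []
    for i, (a, b) in enumerate(zip(first, second)):
        rf2 = i / last            # RF2: rises linearly from 0 to 1
        rf1 = 1.0 - rf2           # RF1: falls linearly from 1 to 0
        merged.append(rf1 * a + rf2 * b)
    return merged
```

  • Because RF1 and RF2 sum to 1 at every sample, a signal that is identical in both packets passes through unchanged, and the result has the length of a single packet.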
  • the buffer size changing unit 32 deletes the invalid packet if there is an invalid packet inserted in the past.
  • FIGS. 30A and 30B are explanatory diagrams of processing in which the buffer size changing unit 32 deletes one invalid packet.
  • FIG. 30A shows the jitter buffer 30 before deletion.
  • FIG. 30B shows the jitter buffer 30 after deletion.
  • the third and fourth packets are invalid packets. Therefore, the buffer size changing unit 32 deletes one packet by deleting either the third or the fourth packet.
  • The buffer size changing unit 32 may preferentially extract invalid packets located in a continuous area, randomly select one invalid packet from the extracted invalid packets, and delete it.
  • the buffer size changing unit 32 inserts an invalid packet between these two valid packets if there are two consecutive valid packets.
  • FIGS. 31A and 31B are explanatory diagrams of processing in which the buffer size changing unit 32 inserts one packet.
  • FIG. 31A shows the jitter buffer 30 before insertion.
  • FIG. 31B shows the jitter buffer 30 after insertion.
  • one invalid packet is inserted between the fifth valid packet and the sixth valid packet. This is because inserting one invalid packet between the fifth valid packet and the sixth valid packet increases the number of consecutive valid packets.
  • the buffer size changing unit 32 inserts invalid packets in the middle of a section where the number of consecutive valid packets is large.
  • the buffer size changing unit 32 has a predetermined upper limit value for the number of packets that can be inserted or deleted at a time.
  • FIGS. 32A and 32B are diagrams for explaining processing when five packets are inserted into the jitter buffer 30 at once. FIG. 32A shows the jitter buffer 30 before insertion, and FIG. 32B shows the jitter buffer 30 after insertion.
  • In FIGS. 32A and 32B, five invalid packets are inserted between the first valid packet and the second valid packet. In this case, because invalid packets are continuous, voice deterioration may increase. Therefore, an upper limit is set for the number of invalid packets inserted.
  • “at once” refers to one process executed when the above-described count cycle Tb has been reached.
  • If the upper limit value is set to 3 in FIG. 32A, then even if five invalid packets need to be inserted, only three invalid packets are inserted.
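  • The upper limit and the choice of insertion position can be sketched together as follows. This is an illustration under assumptions: the jitter buffer is modeled as a list of booleans (True for a valid packet, False for an invalid packet), the function names are hypothetical, and inserting all packets at the midpoint of the longest run of valid packets is one reading of "in the middle of a section where the number of consecutive valid packets is large".

```python
def longest_valid_run(buffer):
    """Return (start, length) of the longest run of valid (True) packets."""
    best_start, best_len = 0, 0
    start = None
    # A trailing False sentinel closes a run that reaches the buffer end.
    for i, valid in enumerate(buffer + [False]):
        if valid and start is None:
            start = i
        elif not valid and start is not None:
            if i - start > best_len:
                best_start, best_len = start, i - start
            start = None
    return best_start, best_len


def insert_invalid_packets(buffer, requested, upper_limit):
    """Insert at most upper_limit invalid packets; return (buffer, count)."""
    count = min(requested, upper_limit)       # cap the number inserted at once
    start, length = longest_valid_run(buffer)
    position = start + (length + 1) // 2      # middle of the longest run
    return buffer[:position] + [False] * count + buffer[position:], count
```

  • With six consecutive valid packets, a request for five insertions, and an upper limit of 3, the sketch inserts three invalid packets after the third valid packet.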
  • When, after an invalid packet has been deleted, a valid packet corresponding to the deleted invalid packet is received, the buffer size changing unit 32 replaces another invalid packet with the received valid packet.
  • FIGS. 33A, 33B, and 33C are diagrams for explaining processing when a valid packet corresponding to a deleted invalid packet is received after deleting the invalid packet.
  • FIG. 33A shows the jitter buffer 30 before deletion
  • FIG. 33B shows the jitter buffer 30 after deletion
  • FIG. 33C shows the jitter buffer 30 after replacement.
  • the third invalid packet has been deleted. Thereafter, as shown in FIG. 33C, the third valid packet corresponding to the third invalid packet is received.
  • the buffer size changing unit 32 replaces the fourth invalid packet with the received third valid packet. As a result, the third valid packet can be restored, and voice deterioration can be reduced.
  • When a valid packet is received, the buffer size changing unit 32 determines whether an invalid packet corresponding to it is accumulated in the jitter buffer 30. If the corresponding invalid packet is accumulated in the jitter buffer 30, the buffer size changing unit 32 determines whether another invalid packet is stored next to it. If so, it deletes that neighboring invalid packet and inserts the received valid packet at the deleted location, so that the neighboring invalid packet and the received valid packet are exchanged.
  • the buffer size changing unit 32 may determine that a valid packet corresponding to an invalid packet has been received when a packet having the same sequence number as that of the invalid packet is accumulated in the jitter buffer 30.
  • Instead of inserting an invalid packet, the buffer size changing unit 32 may cause the concealment processing unit 35 to execute packet loss concealment processing using the preceding valid packet, thereby generating a concealed packet, and insert the concealed packet into the jitter buffer 30.
  • FIGS. 34A and 34B are diagrams for explaining processing when the buffer size changing unit 32 inserts a concealed packet into the jitter buffer 30 in place of an invalid packet. FIG. 34A shows the jitter buffer 30 before insertion, and FIG. 34B shows the jitter buffer 30 after insertion.
  • a concealed packet is inserted between the third valid packet and the fourth valid packet.
  • As a result, when the output unit 36 reads a packet (voice data) from the jitter buffer 30, it is not necessary to execute the packet loss concealment process, and the processing delay caused by the packet loss concealment process at the time of output can be reduced.
  • the buffer size changing unit 32 preferably inserts an invalid packet between two consecutive packets including vowel sounds. Thereby, the voice generated by executing the packet loss concealment process on the inserted invalid packet is continuously connected to the voice included in the preceding and succeeding packets, and voice deterioration can be reduced.
  • FIG. 35 is a flowchart showing the deletion process by the buffer size changing unit 32.
  • In step S51, the buffer size changing unit 32 determines whether the number of packet deletion requests is equal to or less than a predetermined maximum packet deletion number (upper limit value). If the number of deletion requests is equal to or less than the upper limit value (YES in step S51), the deletion count value DN is set to the number of deletion requests (step S52). If the number of deletion requests is larger than the upper limit value (NO in step S51), the deletion count value DN is set to the upper limit value (step S53).
  • If the maximum number of consecutive valid packets is 2 or more in step S54, the buffer size changing unit 32 determines whether the maximum continuous number is at least twice the deletion count value DN (step S55). The comparison is made against twice the deletion count value DN because deleting one packet requires overlap-adding two packets, so deleting DN packets requires 2 × DN consecutive valid packets.
  • When the buffer size changing unit 32 determines that the maximum continuous number is at least twice the deletion count value DN (YES in step S55), it deletes DN packets by overlap addition and updates the deletion count value DN by subtracting the number of deleted packets from it (step S58).
  • On the other hand, when the maximum continuous number is less than twice the deletion count value DN (NO in step S55), the buffer size changing unit 32 deletes as many packets as can be deleted by overlap addition, updates the deletion count value DN by subtracting the number of deleted packets from it (step S56), and returns the process to step S54.
  • In step S54, if the maximum number of consecutive valid packets is 1 or less, invalid packets are deleted, and the deletion count value DN is updated by subtracting the number of deleted packets from it (step S57).
  • In step S59, the buffer size changing unit 32 determines whether the deletion count value DN is 0. If the deletion count value DN is 0 (YES in step S59), the process ends.
  • If the deletion count value DN is not 0 (NO in step S59), the buffer size changing unit 32 deletes a valid packet if one exists (YES in step S60) and ends the process (step S61). In this case, since the valid packet to be deleted is not continuous with other valid packets, it is simply deleted without overlap addition. On the other hand, if there is no valid packet (NO in step S60), the process ends as it is.
  • FIG. 36 is a flowchart showing the insertion processing by the buffer size changing unit 32.
  • In step S71, the buffer size changing unit 32 determines whether the number of packet insertion requests is equal to or less than a predetermined maximum packet insertion number (upper limit value). If the number of insertion requests is equal to or less than the maximum number of insertions (YES in step S71), the number of insertions is set to the number of insertion requests (step S72). If the number of insertion requests is larger than the maximum number of insertions (NO in step S71), the number of insertions is set to the maximum number of insertions (step S73).
  • Step S75 the process is terminated.
  • The buffer size changing unit 32 inserts as many invalid packets as the number of insertions in the middle of the section of consecutive valid packets (step S76), and the process ends.
  • The buffer size changing unit 32 inserts as many invalid packets as the number of insertions immediately after the valid packet (step S77), and the process ends.
  • As described above, when one packet is deleted from the jitter buffer 30, one packet is generated by overlap-adding two packets located in the middle of a section where two or more valid packets are continuous. Therefore, voice quality degradation can be reduced.
  • packet loss concealment processing performed by the concealment processing unit 35 of the fluctuation absorption processing unit JA can be replaced by the voice data loss compensation processing by the voice data loss compensation processing unit VC described above.
  • The call processing unit 2 executes the first software when the other party's call terminal uses the analog transmission method and executes the second software when the other party's call terminal uses the packet transmission method, so that call processing suitable for each transmission method can be selectively executed.
  • The packet transmission method is used for voice transmission via the signal trunk line Ls, and the analog transmission method is used for voice transmission around the dwelling unit that does not pass through the signal trunk line Ls, so that call quality can be improved.
  • Embodiment 2 Hereinafter, the second embodiment of the present invention will be described in detail with reference to FIGS.
  • The same reference numerals are assigned to the same elements as those in the intercom system for multi-dwelling houses of Embodiment 1, and their description is omitted.
  • Since both the voice data loss compensation process and the speech speed conversion process in the first embodiment described above use the pitch of the voice, a pitch detection process for detecting the pitch of the voice must be performed.
  • If the voice data loss compensation processing program and the speech speed conversion processing program are each equipped with a pitch detection processing program (program module), the memory for loading the programs is wasted.
  • Therefore, in this embodiment, the pitch detection processing program for detecting the pitch of the voice is made independent of the voice data loss compensation processing program and the speech speed conversion processing program, and the pitch detected by the pitch detection processing is shared by the voice data loss compensation processing and the speech speed conversion processing. This reduces wasteful consumption of memory.
  • The speech speed conversion processing unit SE of the present embodiment may also execute processing other than speech speed conversion processing, such as voice quality conversion processing, speech segment detection processing, speech enhancement processing, speaker discrimination processing, and speech recognition processing.
  • The call processing unit 2 of the present embodiment includes an acoustic echo canceller EC1, a voice switch VS, a voice data loss detection unit 15, a pitch detection unit 16, a voice data loss compensation processing unit VC, and a speech speed conversion processing unit SE.
  • The voice data loss detection unit 15 detects loss of the voice data output from the transmission processing unit 7. When the voice data output from the jitter buffer of the transmission processing unit 7 is not continuous, it regards the voice data as lost and sets a detection flag. Causes of voice data loss include packet loss, delay, and jitter (fluctuation) associated with transmission, as described in the first embodiment.
  • Based on the detection flag from the voice data loss detection unit 15 and the counter inside the pitch detection unit 16, the pitch detection unit 16 detects the pitch of the voice from the voice data (voice data that has been loss-compensated or voice data that has not; the same applies hereinafter).
  • As a specific method of pitch detection, for example, a method of calculating the autocorrelation of the speech while changing the frame length and estimating the frame length with the highest correlation as the pitch of the speech may be used.
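  • Pitch estimation by autocorrelation can be sketched as follows. The function name and parameters are hypothetical, the frame length is expressed as a lag in samples, and normalizing the correlation by the energy of the two segments is a choice made for this sketch rather than something the text specifies.

```python
import math


def detect_pitch_lag(samples, min_lag, max_lag):
    """Return the lag (in samples) with the highest normalized correlation.

    samples: voice samples as a list of floats.
    min_lag, max_lag: search range for the pitch period, in samples.
    """
    if not 0 < min_lag <= max_lag < len(samples):
        raise ValueError("lag range must fit inside the sample buffer")
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        head = samples[:-lag]             # signal
        tail = samples[lag:]              # signal shifted by the lag
        num = sum(x * y for x, y in zip(head, tail))
        den = math.sqrt(sum(x * x for x in head) * sum(y * y for y in tail))
        score = num / den if den > 0 else 0.0
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag
```

  • For a sine wave with a period of 40 samples, searching lags from 20 to 60 returns 40. Limiting the lag range also limits the range of pitch frequencies that are examined.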
  • When the voice data loss detection unit 15 detects loss of voice data (when the detection flag is set), the voice data loss compensation processing unit VC compensates for the loss based on the pitch detected by the pitch detection unit 16.
  • the audio data loss compensation processing unit VC extracts audio data for one pitch from past audio data held in the buffer and makes up for it so that the audio is not interrupted. However, if there is no missing voice data, the voice data missing compensation processing unit VC outputs the input voice data as it is without missing compensation.
  • the speech rate conversion processing unit SE converts the speech rate of the original speech by expanding or compressing the speech data output from the speech data loss compensation processing unit VC.
  • For example, the speech speed is converted (made faster or slower) by inserting or deleting waveforms in units of pitch based on a conventionally known speech speed conversion algorithm called PICOLA (Pointer Interval Controlled OverLap and Add). These units are realized by causing a DSP (Digital Signal Processor) to execute a predetermined program.
  • If the voice data loss compensation processing unit VC and the speech speed conversion processing unit SE each perform pitch detection processing individually, the processing load increases when the voice data loss compensation processing and the speech speed conversion processing are executed simultaneously in the call processing unit 2.
  • The call processing unit 2 of the present embodiment has only one pitch detection unit 16, and both the voice data loss compensation processing unit VC and the speech speed conversion processing unit SE use the pitch detected by this common pitch detection unit 16. Because both units share the pitch detected by the pitch detection unit 16, an increase in the processing load (the processing load on the DSP) can be suppressed when the voice data loss compensation processing and the speech speed conversion processing are executed simultaneously.
  • The pitch detection unit 16 in the present embodiment counts a predetermined detection cycle Tx and repeatedly detects the pitch in synchronization with the detection cycle Tx. When the voice data loss detection unit 15 detects that voice data is missing, the pitch detection unit 16 detects the pitch at the detection time point t1 of the missing voice data and restarts the detection cycle Tx from the detection time point t1.
  • That is, when the pitch detection unit 16 repeatedly detects the pitch in synchronization with a constant detection cycle Tx, the pitch detected by the pitch detection unit 16 stays close to the pitch of the speech section in which the speech speed conversion processing unit SE executes the speech speed conversion process, so the quality of the speech after speech speed conversion can be maintained. It is desirable to set the detection cycle Tx to a time during which the voice can be regarded as steady, for example, about 10 milliseconds.
  • when voice data loss is detected, the pitch detection unit 16 detects the pitch immediately, regardless of the detection cycle Tx, so the quality of the voice data loss compensation processing performed by the voice data loss compensation processing unit VC can be maintained.
  • the pitch detection unit 16 detects only pitches in a predetermined frequency range. Since the frequency of the voice waveform in a normal voice call lies within a range from somewhat over one hundred hertz to several hundred hertz, detecting only pitches in that range avoids pitch detection in unnecessary frequency ranges and reduces the processing load.
  • the speech speed conversion processing unit SE detects the speech section of the speech data and converts only the speech data in the speech section. That is, the processing load in the speech speed conversion process can be reduced by not performing the speech speed conversion process in a section other than the speech section (for example, a silent section).
  • the voice data loss detection unit 15 and the pitch detection unit 16 perform the voice data loss detection process and the pitch detection process at time intervals of ⁇/4.
  • the control of the timing at which the pitch detection unit 16 executes the pitch detection process is simplified.
  • when the speech speed conversion processing unit SE performs speech speed conversion while a loss of voice data is being detected, using the pitch detected by the pitch detection unit 16 immediately before the loss was detected makes it possible to suppress deterioration in speech quality due to the speech speed conversion processing.
  • the speech speed conversion may be performed using the pitch detected by the pitch detection unit 16 from the voice data compensated by the voice data loss compensation processing unit VC. In this way, even when the speech speed conversion process is started while voice data is missing, the pitch detection unit 16 only needs to execute the pitch detection process at a constant detection cycle Tx, which has the advantage that the control of the timing at which the pitch detection unit 16 executes the pitch detection process becomes simple.
  • the dwelling unit A has a recording unit (not shown) that can record the audio data output from the audio data loss compensation processing unit VC.
  • speech speed conversion processing is performed by the speech speed conversion processing unit SE.
  • the ease of listening is improved by performing the speech speed conversion process not only on the speech section but also on the non-speech section.
  • the speech speed conversion process is performed even for a non-speech section during a normal call, a delay due to the speech speed conversion process increases, which hinders natural conversation.

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Interconnected Communication Systems, Intercoms, And Interphones (AREA)
  • Telephone Function (AREA)
  • Telephonic Communication Services (AREA)

Abstract

In a dwelling unit device (A), when the call terminal at the other end of the communication is of the analog transmission type, a call processing unit (2) executes first software; when the call terminal at the other end is of the packet transmission type, the call processing unit (2) executes second software. Call processing suited to each transmission type can thus be executed selectively.

Description

Dwelling unit device for an intercom system for apartment complexes
 The present invention relates to a dwelling unit device that is used in an intercom system for an apartment complex and is installed in each dwelling unit of the complex.
 Conventionally, intercom systems for apartment complexes have been provided that include a common unit device (lobby intercom) installed at the shared entrance of the complex, a dwelling unit device installed inside each dwelling unit, and a doorphone slave unit installed at the outer entrance of each dwelling unit. A signal trunk line is connected to the common unit device, and each dwelling unit device is connected to a dwelling unit line that branches from this signal trunk line. In each dwelling unit, the dwelling unit device inside the unit and the doorphone slave unit at the outer entrance are connected by a slave unit connection line. In addition, another dwelling unit device may be connected within a dwelling unit by an in-home connection line. In that case, the dwelling unit device connected to the dwelling unit line is called the dwelling unit master device, and the dwelling unit device connected to the master device by the in-home connection line is called the dwelling unit sub-master device. Japanese Patent Publication No. 2010-28771 describes an intercom system for an apartment complex in which voice is transmitted over the signal trunk line and the dwelling unit lines by a packet transmission method, so that other dwelling unit devices (master devices) can talk with each other while the common unit device is in a call with a dwelling unit device.
 A dwelling unit device performs various kinds of call processing, for example call direction switching and echo suppression for hands-free (loudspeaker) calls. Furthermore, when the common unit device and the dwelling unit devices are capable of digital communication, as in the conventional example described in the prior document above, and digital data is carried on the signal trunk line and dwelling unit lines that connect them, a system that transmits voice in packets also needs call processing that compensates for voice dropouts caused by the packet loss, delay, and delay fluctuation (jitter) that accompany packet transmission, in order to improve call quality.
 On the other hand, conventional inexpensive devices, that is, devices that transmit voice by an analog transmission method, are sometimes used as the doorphone slave unit or the dwelling unit sub-master device. In that case, an analog transmission method is adopted for voice transmission between the dwelling unit device (master device) and the doorphone slave unit, or between the master device and the sub-master device. The analog transmission method also requires call direction switching and echo suppression for hands-free (loudspeaker) calls. However, the voice loss compensation processing that is essential for the packet transmission method, used when digital data is carried over the signal trunk line as described above, is unnecessary for the analog transmission method.
 The dwelling unit device (master device) must therefore execute call processing for both the analog transmission method and the packet transmission method. Implementing these kinds of call processing in separate hardware (call processing circuits) complicates the circuit configuration and raises cost.
 Accordingly, an object of the present invention is to provide a dwelling unit device for an apartment-complex intercom system that, while limiting circuit complexity and cost, allows the packet transmission method to be used for voice transmission via the signal trunk line and the analog transmission method to be used for nearby in-home voice transmission that does not pass through the signal trunk line, and that can improve call quality.
 The dwelling unit device of the present invention is used in an intercom system for an apartment complex that has a common unit device installed at the shared entrance of the complex, dwelling unit devices installed in the individual dwelling units, doorphone slave units installed at the outer entrances, a signal trunk line connected to the common unit device, dwelling unit lines that branch from the signal trunk line and connect to the dwelling unit devices, and slave unit connection lines that connect each dwelling unit device to its doorphone slave unit. Between the common unit device and the dwelling unit devices, and between dwelling unit devices, call voice is transmitted by the packet transmission method via the signal trunk line and the dwelling unit lines. Between a dwelling unit device and its doorphone slave unit, call voice is transmitted by the analog transmission method via the slave unit connection line.
 The dwelling unit device comprises: a microphone and a speaker; a transmission processing unit that transmits voice packets containing voice data for calls and control packets containing control data for call control via the dwelling unit line and the signal trunk line; an analog signal transmission unit that transmits analog voice signals via the slave unit connection line; a first conversion processing unit that converts the analog voice signal output from the microphone into voice data and converts voice data into an analog voice signal for output to the speaker; a second conversion processing unit that converts the analog voice signal received by the analog signal transmission unit into voice data and converts voice data into an analog voice signal for output to the analog signal transmission unit; a call processing unit that performs predetermined call processing on voice data; a doorphone call detection unit that detects a call from the doorphone slave unit; a storage unit that stores first software for call processing of voice data transmitted by the analog transmission method and second software for call processing of voice data transmitted by the packet transmission method; and a control unit that instructs the call processing unit to execute call processing. In the first feature of the present invention, the control unit instructs the call processing unit to execute the first software when the doorphone call detection unit detects a call, and to execute the second software when control data for call control is received from the common unit device or from another dwelling unit device.
 In the present invention, the call processing unit executes the first software when the other party's call terminal uses the analog transmission method and the second software when it uses the packet transmission method. This makes it possible, while limiting circuit complexity and cost, to use the packet transmission method for voice transmission via the signal trunk line and the analog transmission method for nearby in-home voice transmission that does not pass through the signal trunk line, and to improve call quality.
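The selection rule of the first feature can be illustrated with a minimal sketch. The names `CallSource`, `FIRST_SOFTWARE`, `SECOND_SOFTWARE`, and `select_software` are hypothetical, and the stage lists are assumptions drawn loosely from the kinds of processing the description mentions; the source does not define these identifiers.

```python
from enum import Enum


class CallSource(Enum):
    DOORPHONE_CALL = "doorphone_call"    # call detected by the doorphone call detection unit
    CONTROL_PACKET = "control_packet"    # call-control data received over the trunk line


# Hypothetical stage names (assumption): analog calls need direction switching and
# echo suppression; packet calls additionally need jitter and loss handling.
FIRST_SOFTWARE = ("direction_switching", "echo_suppression")
SECOND_SOFTWARE = FIRST_SOFTWARE + ("fluctuation_absorption", "loss_compensation")


def select_software(source: CallSource) -> tuple:
    """Return the processing stages the control unit would ask the call processing unit to run."""
    if source is CallSource.DOORPHONE_CALL:
        return FIRST_SOFTWARE
    if source is CallSource.CONTROL_PACKET:
        return SECOND_SOFTWARE
    raise ValueError(f"unknown call source: {source!r}")
```

The point of the design is that one call processing unit runs either stage list, so no second processing circuit is needed for the analog path.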
 In one embodiment, the second software preferably includes a program for acoustic echo suppression processing, which suppresses the acoustic echo caused by acoustic coupling between the microphone and the speaker, and a program for residual echo suppression processing, which suppresses the residual echo that the acoustic echo suppression processing cannot fully suppress. Because the second software includes both programs, call quality in the packet transmission method can be further improved.
 In one embodiment, the second software preferably includes a program for fluctuation absorption processing, which absorbs fluctuations (jitter) in the transmission delay at the transmission processing unit. Because the second software includes this program, call quality in the packet transmission method can be further improved.
 In one embodiment, the dwelling unit device includes a fluctuation absorbing buffer that accumulates the voice data contained in the voice packets received by the transmission processing unit. The fluctuation absorption processing program preferably causes the call processing unit to perform a counting step, which counts the number of voice data packets accumulated in the fluctuation absorbing buffer at a period no longer than the packetization period of the voice packets and calculates a packet count value, and a buffer size changing step, which inserts a packet into or deletes a packet from the fluctuation absorbing buffer based on the packet count value calculated in the counting step. Because the buffer size is changed based on the packet count value, packet depletion can be prevented and call delay reduced, which further improves call quality.
 In one embodiment, the fluctuation absorption processing program preferably causes the call processing unit, in the buffer size changing step, to calculate a representative value of the packet count value from the past history of packet count values, to delete a packet from the fluctuation absorbing buffer when the representative value is larger than a predetermined reference value, and to insert a packet into the fluctuation absorbing buffer when the representative value is smaller than the reference value. This allows packet depletion to be prevented and call delay to be reduced with higher accuracy.
 In one embodiment, the fluctuation absorption processing program preferably causes the call processing unit to record the reception time of the latest packet and, in the counting step, to calculate the packet count value by setting the count value of the latest packet to the difference between the calculation time (the timing at which the packet count value is calculated) and the reception time, divided by the packetization period, and by setting the count value of every other packet to 1. Because every packet other than the latest one counts as 1, the reception time needs to be recorded only for the latest packet, which saves capacity in the recording medium used to record reception times.
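As a rough illustration of this counting rule, the sketch below assumes that the packet count value is the sum of the per-packet count values; the text gives the per-packet values but does not spell out how they are combined. The function name, the millisecond units, and the handling of an empty buffer are likewise assumptions.

```python
def packet_count(num_buffered: int, calc_time_ms: float,
                 latest_rx_time_ms: float, packet_period_ms: float) -> float:
    """Packet count value with a fractional contribution from the latest packet.

    Every buffered packet except the latest counts as 1; the latest packet counts as
    (calculation time - reception time) / packetization period.
    Summing the per-packet values is an assumption of this sketch.
    """
    if num_buffered == 0:
        return 0.0  # nothing buffered; treated as zero here (assumption)
    latest = (calc_time_ms - latest_rx_time_ms) / packet_period_ms
    return (num_buffered - 1) + latest
```

For example, with three buffered packets, a 20 ms packetization period, and the latest packet received 10 ms before the calculation time, the sketch yields 2 + 0.5 = 2.5. Only one timestamp is stored regardless of buffer depth.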
 In one embodiment, the fluctuation absorption processing program preferably causes the call processing unit, in the counting step, to hold the past N packet count values (N is a positive integer) and, in the buffer size changing step, to use as the representative value the n-th smallest (n is a positive integer less than N) of the past N packet count values. This allows packet depletion to be prevented and call delay to be reduced with higher accuracy.
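The representative-value rule and the insert/delete decision from the preceding embodiments can be sketched as follows. `CountHistory` and `buffer_action` are hypothetical names, and the behavior before N values have been collected (clamping the rank to the values available) is an assumption; the source does not describe a warm-up phase.

```python
from collections import deque


class CountHistory:
    """Keeps the last N packet count values and picks the n-th smallest as representative."""

    def __init__(self, N: int, n: int):
        if not 0 < n < N:
            raise ValueError("n must satisfy 0 < n < N")
        self.n = n
        self.values = deque(maxlen=N)  # old values fall off automatically

    def add(self, count: float) -> None:
        self.values.append(count)

    def representative(self) -> float:
        if not self.values:
            raise ValueError("no packet count values recorded yet")
        ordered = sorted(self.values)
        # Until N values exist, clamp the rank to what is available (assumption).
        return ordered[min(self.n, len(ordered)) - 1]


def buffer_action(representative: float, reference: float) -> str:
    """Delete a packet when the buffer runs deep, insert one when it runs shallow."""
    if representative > reference:
        return "delete"
    if representative < reference:
        return "insert"
    return "keep"
```

Choosing a low-order statistic rather than the mean makes the decision follow the shallow end of the recent buffer depth, which is where underrun risk shows up.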
 In one embodiment, the fluctuation absorption processing program preferably causes the call processing unit, in the counting step, to determine from the past N packet count values whether a spike delay has occurred and, if it determines that a spike delay has occurred, to extract the past M packet count values (M is a positive integer with M < N) from the past N packet count values; and, in the buffer size changing step, to calculate as the representative value the m-th smallest (m is an integer less than M) of the M packet count values extracted in the counting step. This allows the representative value to be calculated while excluding spike delays, which occur only rarely.
 In one embodiment, the fluctuation absorption processing program preferably causes the call processing unit, in the counting step, when the packet count value is zero consecutively, to calculate as the packet count value a negative value whose absolute value increases as the number of consecutive zeros increases. The packet count value can thus reflect the difference between the case where packets are being received regularly but the number of accumulated packets happens to be zero at the calculation time, and the case where packets are not being received regularly. Consequently, packets are less likely to be deleted in the latter case than in the former.
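A small sketch of the consecutive-zero rule follows. The text only requires a negative value whose magnitude grows with the length of the zero run; the linear growth, the `step` parameter, and the choice to report the first zero as 0 are assumptions of this sketch, as is the class name.

```python
class ZeroRunCounter:
    """Turns runs of zero packet counts into increasingly negative count values."""

    def __init__(self, step: float = 1.0):
        self.step = step      # growth per additional consecutive zero (assumption)
        self.zero_run = 0     # current number of consecutive zero counts

    def adjust(self, raw_count: float) -> float:
        if raw_count > 0:
            self.zero_run = 0
            return raw_count
        self.zero_run += 1
        if self.zero_run == 1:
            return 0.0        # a single zero may be coincidental
        return -self.step * (self.zero_run - 1)
```

Feeding the raw counts 2, 0, 0, 0, 1, 0 through `adjust` gives 2, 0.0, -1.0, -2.0, 1, 0.0, so a sustained outage pulls a low-order representative value down and discourages packet deletion.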
 In one embodiment, the second software preferably includes a program for voice data loss compensation processing which, when all or part of the voice data contained in a voice packet received by the transmission processing unit is missing, uses the voice data that is not missing to compensate for all or part of the missing voice data. Because the missing portion is compensated using voice data that is not missing, call quality in the packet transmission method can be further improved.
 In one embodiment, the dwelling unit device includes a fluctuation absorbing buffer that accumulates the voice data contained in the voice packets received by the transmission processing unit, and the fluctuation absorption processing program preferably causes the call processing unit to perform a counting step, which counts the number of voice data packets accumulated in the fluctuation absorbing buffer and calculates a packet count value, and a buffer size changing step, which inserts a packet into or deletes a packet from the fluctuation absorbing buffer based on the packet count value calculated in the counting step. When one packet is to be deleted from the fluctuation absorbing buffer in the buffer size changing step and two or more valid packets containing voice data exist consecutively, the program preferably causes the call processing unit to delete a packet by overlap-adding two consecutive valid packets located in the middle of those consecutive valid packets. Because the deletion is done by overlap-adding two consecutive valid packets in the middle, the voice degradation caused by packet loss concealment processing can be reduced.
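The deletion by overlap-add can be sketched as below. This is a minimal illustration under several assumptions: packets are lists of samples of equal length, `None` marks a packet without voice data, the cross-fade is linear, the first run of two or more valid packets is used, and the buffer is returned unchanged when no such run exists. None of these details are given in the source, and the function names are hypothetical.

```python
def overlap_add(first: list, second: list) -> list:
    """Cross-fade two equal-length packets into a single packet of the same length."""
    n = len(first)
    if n == 0 or n != len(second):
        raise ValueError("packets must be non-empty and of equal length")
    merged = []
    for i in range(n):
        w = (i + 0.5) / n                      # fade-in weight for the second packet
        merged.append((1.0 - w) * first[i] + w * second[i])
    return merged


def delete_one_packet(buffer: list) -> list:
    """Shorten the buffer by one packet by merging the middle pair of a valid run."""
    i = 0
    while i < len(buffer):
        if buffer[i] is None:
            i += 1
            continue
        j = i
        while j < len(buffer) and buffer[j] is not None:
            j += 1
        if j - i >= 2:
            mid = i + (j - i) // 2 - 1         # left packet of the middle pair
            merged = overlap_add(buffer[mid], buffer[mid + 1])
            return buffer[:mid] + [merged] + buffer[mid + 2:]
        i = j
    return list(buffer)                        # no run of two valid packets (unspecified case)
```

Merging in the middle of a run keeps both neighbors of the merged packet intact, so the waveform stays continuous at both joins.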
 In one embodiment, when a packet is to be inserted into the fluctuation absorbing buffer in the buffer size changing step and two consecutive valid packets exist, the fluctuation absorption processing program preferably causes the call processing unit to insert an invalid packet, which contains no voice, between those two valid packets. Because the invalid packet is inserted between two consecutive valid packets, the voice degradation caused by packet loss concealment processing can be reduced.
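The insertion rule can be sketched in the same style. Here `None` stands for an invalid packet containing no voice, and appending at the end when no pair of consecutive valid packets exists is an assumption, since the text does not cover that case; `insert_invalid_packet` is a hypothetical name.

```python
def insert_invalid_packet(buffer: list) -> list:
    """Lengthen the buffer by one packet, placing an invalid packet between valid neighbors."""
    for i in range(len(buffer) - 1):
        if buffer[i] is not None and buffer[i + 1] is not None:
            # The gap is surrounded by real voice on both sides.
            return buffer[:i + 1] + [None] + buffer[i + 1:]
    return list(buffer) + [None]   # no consecutive valid pair (unspecified case)
```

Placing the gap between two valid packets gives a later concealment stage real voice data on both sides of the gap to work from.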
 In one embodiment, the second software preferably includes a program for voice data loss detection processing, which detects the loss of all or part of the voice data output by the transmission processing unit; a program for pitch detection processing, which detects the pitch of the voice from the voice data; and a program for voice data loss compensation processing, which compensates for the lost voice data based on the pitch detected by the pitch detection processing when the voice data loss detection processing detects a loss. The pitch detection processing program preferably causes the call processing unit to set, as a reference signal, a voice signal of a certain time width extending from the present into the past; to slide the reference signal over the voice signal from the present toward the past and obtain the correlation between the reference signal and the voice signal, thereby detecting the pitch of the voice signal; and to increase the time width of the reference signal as the slide amount of the reference signal increases. Because the time width of the reference signal grows with the slide amount, the pitch of the voice signal immediately before the loss can be detected accurately.
 In one embodiment, the pitch detection processing program preferably causes the call processing unit to set the time width of the reference signal to a predetermined initial time width until the slide amount of the reference signal reaches a predetermined slide reference value. This secures at least a certain time width for the reference signal even when the slide amount is small, so the correlation between the reference signal and the voice signal can be obtained more accurately.
 In one embodiment, the pitch detection processing program preferably causes the call processing unit to obtain the correlation between the reference signal and the voice signal by the average amplitude difference function method. This allows the correlation to be obtained accurately with a relatively small amount of computation.
 In one embodiment, the pitch detection processing program preferably causes the call processing unit to obtain the correlation between the reference signal and the voice signal using the average amplitude difference function of Expression (1).
[Expression (1): equation image JPOXMLDOC01-appb-M000002]
Here, φ(τ) is the correlation value, N is the time width of the reference signal, x(j) is the reference signal, x(j−τ) is the voice signal, k+1 is the starting point of the reference signal, a is a predetermined coefficient, and τ is the slide amount of the reference signal. Using Expression (1) allows the correlation between the reference signal and the voice signal to be obtained more accurately.
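Expression (1) itself is available only as an image, so the sketch below does not reproduce it. It shows a generic average amplitude difference function in which the reference window grows with the slide amount, under stated assumptions: the window width is taken as `max(initial_width, int(a * lag))`, the window is the most recent stretch of the signal, and the pitch lag is the slide amount with the smallest average difference. The function names and default values are hypothetical.

```python
import math


def amdf(x: list, lag: int, width: int) -> float:
    """Mean absolute difference between the latest `width` samples and the span `lag` earlier."""
    end = len(x)
    start = end - width
    if start - lag < 0:
        raise ValueError("not enough signal history for this lag and width")
    return sum(abs(x[j] - x[j - lag]) for j in range(start, end)) / width


def detect_pitch_lag(x: list, lag_min: int, lag_max: int,
                     a: float = 2.0, initial_width: int = 40):
    """Return the lag (in samples) with the smallest AMDF value, or None if none fits."""
    best_lag, best_val = None, math.inf
    for lag in range(lag_min, lag_max + 1):
        # The window grows with the slide amount but never drops below the
        # initial width (the growth rule itself is an assumption).
        width = max(initial_width, int(a * lag))
        if len(x) - width - lag < 0:
            break                              # history too short for larger lags
        val = amdf(x, lag, width)
        if val < best_val:
            best_lag, best_val = lag, val
    return best_lag
```

On a sine wave with a period of 50 samples and a search range of 20 to 80 samples, the sketch returns a lag of 50. An absolute-difference measure needs no multiplications inside the loop, which is consistent with the low computation cost the text attributes to this method.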
 In the second feature of the present invention, the second software includes a program for voice data loss detection processing, which detects the loss of all or part of the voice data output by the transmission processing unit; a program for pitch detection processing, which detects the pitch of the voice from the voice data; a program for voice data loss compensation processing, which compensates for the lost voice data based on the pitch detected by the pitch detection processing when the voice data loss detection processing detects a loss; and a program for speech speed conversion processing, which expands or compresses the voice data using the pitch detected by the pitch detection processing. Because the voice data loss compensation processing and the speech speed conversion processing share the pitch detected by the pitch detection processing, less memory is consumed for loading programs than in a configuration in which the loss compensation program and the speech speed conversion program each carry their own pitch detection program.
 In one embodiment, the pitch detection processing preferably counts a predetermined detection cycle and repeatedly detects the pitch in synchronization with that cycle; when the voice data loss detection processing detects a loss of voice data, the pitch detection processing preferably detects the pitch at the time the loss is detected and restarts counting of the detection cycle from that time. This maintains the quality of the voice after voice data loss compensation processing.
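The timing rule in this embodiment can be sketched as a small scheduler. The discrete time step, the class name `PitchScheduler`, and the `tick` interface are assumptions made for illustration; the source describes the behavior, not an interface.

```python
class PitchScheduler:
    """Decides at each time step whether pitch detection should run.

    Pitch detection runs once per detection cycle, and immediately when a
    voice data loss is detected; in both cases the cycle count restarts.
    """

    def __init__(self, cycle_steps: int):
        if cycle_steps <= 0:
            raise ValueError("cycle_steps must be positive")
        self.cycle_steps = cycle_steps
        self.elapsed = 0

    def tick(self, loss_detected: bool) -> bool:
        """Advance one time step; return True if pitch detection should run now."""
        self.elapsed += 1
        if loss_detected or self.elapsed >= self.cycle_steps:
            self.elapsed = 0   # restart the detection cycle from this point
            return True
        return False
```

With a cycle of 4 steps and a loss at step 6, detection runs at steps 4, 6, and 10: the loss both triggers an immediate detection and shifts the phase of the following cycles.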
 In one embodiment, the pitch detection processing preferably detects only pitches in a predetermined frequency range. Because pitch detection is not performed in unnecessary frequency ranges, the processing load can be reduced.
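One way to see the saving: in a lag-based pitch search, a frequency range translates into a range of lags, and only those lags need to be evaluated. The sketch below does that conversion; the function name, the 8 kHz sampling rate, and the 100 to 400 Hz range used in the example are assumptions, not values from the source.

```python
import math


def lag_range(sample_rate_hz: float, f_min_hz: float, f_max_hz: float) -> tuple:
    """Convert a pitch frequency range into the range of lags (in samples) to search."""
    if not 0 < f_min_hz <= f_max_hz:
        raise ValueError("require 0 < f_min_hz <= f_max_hz")
    lag_min = math.ceil(sample_rate_hz / f_max_hz)    # highest pitch -> shortest period
    lag_max = math.floor(sample_rate_hz / f_min_hz)   # lowest pitch -> longest period
    return lag_min, lag_max
```

For example, `lag_range(8000, 100, 400)` gives `(20, 80)`, so a search at that assumed sampling rate would evaluate 61 candidate lags instead of every lag the signal history allows.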
 In one embodiment, the speech speed conversion processing preferably detects the voice sections of the voice data and converts the speech speed of only the voice data in those sections. Because speech speed conversion is not performed in sections other than voice sections (for example, silent sections), the processing load of the speech speed conversion processing can be reduced.
In one embodiment, the voice data loss detection processing preferably detects a loss of voice data in synchronization with both a first time interval, obtained by dividing the time length of one packet of voice data by a positive integer, and the input timing of the voice data; and the pitch detection processing preferably detects the pitch in synchronization with both the detection cycle, which is a positive integer multiple of the first time interval, and the first time interval itself. Because the pitch detection processing runs on the same time grid as the loss detection, the timing at which the pitch detection processing is executed is simple to control.
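The relationship between the two intervals can be worked through with small numbers. In the Python sketch below, the 20 ms packet length, the divisor 4, and the multiplier 2 are assumed example values, and the function names are hypothetical.

```python
from fractions import Fraction


def timing_grid(packet_ms, divisor, multiplier):
    """Return (first time interval, pitch detection cycle) in milliseconds."""
    first_interval = Fraction(packet_ms, divisor)     # packet length / positive integer
    detection_cycle = first_interval * multiplier     # positive integer multiple
    return first_interval, detection_cycle


def pitch_ticks(n_ticks, multiplier):
    """Indices of loss-check ticks on which pitch detection also runs."""
    return [k for k in range(n_ticks) if k % multiplier == 0]


first, cycle = timing_grid(packet_ms=20, divisor=4, multiplier=2)
ticks = pitch_ticks(n_ticks=8, multiplier=2)
```

With these example values the loss check runs every 5 ms and pitch detection every 10 ms, so every pitch detection coincides with a loss check and a single timer can drive both.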
In one embodiment, when the speech speed conversion processing performs speech speed conversion while the voice data loss detection processing is detecting a loss of voice data, it preferably uses the pitch that the pitch detection processing detected immediately before the loss was detected. This suppresses the degradation of voice quality caused by the speech speed conversion processing.
In one embodiment, when the speech speed conversion processing performs speech speed conversion while the voice data loss detection processing is detecting a loss of voice data, it preferably uses the pitch that the pitch detection processing detects from the voice data compensated by the voice data loss compensation processing. Even when the speech speed conversion processing starts while voice data is missing, the pitch detection processing need only run at a constant detection cycle, so the timing at which the pitch detection processing is executed is simple to control.
In one embodiment, the pitch detection processing preferably distinguishes between voice sections and non-voice sections of the voice data and makes the detection cycle in non-voice sections longer than the detection cycle in voice sections. In voice sections, pitch detection runs at a relatively short cycle, which secures the quality of the speech speed conversion processing; in non-voice sections, it runs at a relatively long cycle, which reduces the processing load.
In a third feature of the present invention, the second software preferably includes a program for voice switch processing, which suppresses howling by reducing the loop gain of the closed loop formed by the acoustic echo path that arises from acoustic coupling between the microphone and the speaker. The voice switch processing program preferably causes the call processing unit to: estimate the feedback gain of the acoustic echo path; calculate, on the basis of the estimated feedback gain, the sum of a receiving-side attenuation, which attenuates the received voice data output from the transmission processing unit, and a transmitting-side attenuation, which attenuates the transmitted voice data input to the transmission processing unit; monitor the transmitted and received voice data to estimate the call state; determine the distribution of the transmitting-side attenuation and the receiving-side attenuation according to the estimated call state and the calculated sum; and reduce the sum according to the amount by which the estimated feedback gain decreases. Because the call processing unit determines the distribution of the two attenuations according to the estimated call state and the calculated sum, and reduces the sum as the estimated feedback gain decreases, call quality in the packet transmission method can be further improved.
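The voice switch logic described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the 6 dB margin, the level-ratio test for the call state, the even split in the neutral state, and all function names are hypothetical and are not taken from the specification.

```python
def total_loss_db(feedback_gain_db, margin_db=6.0):
    """Total attenuation that keeps the loop gain below 0 dB by the given margin.
    A smaller estimated feedback gain yields a smaller total."""
    return max(0.0, feedback_gain_db + margin_db)


def estimate_state(tx_level, rx_level, ratio=2.0):
    """Classify the call state from the transmitted and received signal levels."""
    if rx_level > ratio * tx_level:
        return "receiving"
    if tx_level > ratio * rx_level:
        return "transmitting"
    return "neutral"


def split_loss(total_db, state):
    """Return (transmitting-side loss, receiving-side loss) for the call state."""
    if state == "receiving":        # far end is talking: attenuate the transmit path
        return total_db, 0.0
    if state == "transmitting":     # near end is talking: attenuate the receive path
        return 0.0, total_db
    return total_db / 2, total_db / 2   # otherwise share the loss evenly


state = estimate_state(tx_level=0.1, rx_level=1.0)
tx_loss, rx_loss = split_loss(total_loss_db(10.0), state)
```

The point of the sketch is the separation of concerns: the feedback gain estimate sets how much attenuation is needed in total, while the call state only decides where that attenuation is placed.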
In a fourth feature of the present invention, the device preferably comprises an extension connection line, to which a call device installed in the residence is connected, and an extension analog signal transmission unit, which transmits analog voice signals over the extension connection line; voice data that has undergone call processing by the call processing unit executing the first software is transmitted from the extension analog signal transmission unit to the call device via the extension connection line. This enables extension calls with the call device by the analog transmission method.
In a fifth feature of the present invention, the first software preferably includes a program for speech speed conversion processing that detects the pitch of the voice from the digital voice signal obtained by A/D-converting the analog voice signal and uses that pitch to expand or compress the digital voice signal. Because the first software includes a speech speed conversion program, the voice can be reproduced faster or slower than the other party spoke it, even in calls using the analog transmission method.
Preferred embodiments of the present invention are now described in further detail. Other features and advantages of the present invention will be better understood with reference to the following detailed description and the accompanying drawings.
FIG. 1 is a block diagram showing the dwelling unit device of Embodiment 1 of the present invention, together with a system configuration diagram of the interphone system for a residential complex that includes the dwelling unit device.
FIG. 2 is a block diagram of the call processing unit of Embodiment 1 while it is executing the first software.
FIG. 3 is a flowchart for explaining the processing of the voice switch of Embodiment 1.
FIG. 4A is a block diagram for explaining the operation of Embodiment 1 during an interphone call with the doorphone slave unit, and FIG. 4B is a block diagram for explaining the operation of Embodiment 1 during an extension call with the sub-master unit.
FIG. 5A is a block diagram for explaining the operation of Embodiment 1 during an interphone call with the lobby interphone, FIG. 5B is a block diagram for explaining the operation during an interphone call with the management room device, FIG. 5C is a block diagram for explaining the operation during an interphone call with another dwelling unit device, and FIG. 5D is a block diagram for explaining the operation during an interphone call between the lobby interphone or the management room device and the sub-master unit.
FIG. 6 is a block diagram of the call processing unit of Embodiment 1 while it is executing the second software.
FIG. 7 is a flowchart for explaining the processing of the echo suppressor of Embodiment 1.
FIG. 8 is a block diagram showing the voice data loss compensation processing unit of Embodiment 1.
FIG. 9 is a waveform diagram of a voice signal (received voice signal) for explaining the basic principle of the voice data loss compensation processing of Embodiment 1.
FIG. 10 is a waveform diagram of a received voice signal for explaining the processing of the template setting unit and the pitch detection unit of Embodiment 1.
FIG. 11 is a graph showing the calculated correlation values between the template and the received voice signal when a conventional template is used.
FIG. 12 is a diagram for explaining the processing of the template setting unit and the pitch detection unit of Embodiment 1.
FIG. 13 is a graph of the correlation values in Embodiment 1.
FIG. 14 is a flowchart showing the voice data loss compensation processing of Embodiment 1.
FIG. 15 is a block diagram showing the fluctuation absorption processing unit of Embodiment 1.
FIG. 16 is an explanatory diagram of the calculation of the packet count value by the count unit of Embodiment 1.
FIG. 17 is a diagram for explaining the role of the jitter buffer of Embodiment 1.
FIG. 18 is a diagram showing an example of a transmission delay characteristic, which relates transmission delay to frequency of occurrence.
FIG. 19 is a diagram for explaining the optimal buffer size of the jitter buffer of Embodiment 1.
FIG. 20 is a flowchart showing the fluctuation absorption processing of Embodiment 1.
FIG. 21 is a flowchart showing the details of the calculation of the packet count value in Embodiment 1.
FIG. 22 is a graph showing the relationship between the packet count value of Embodiment 1 and the time at which the packet count value was calculated.
FIG. 23A is a schematic diagram showing the processing performed by the buffer size changing unit when inserting a packet, and FIG. 23B is a schematic diagram showing the processing performed by the buffer size changing unit when deleting a packet.
FIG. 24 is an explanatory diagram of another method of calculating the packet count value in Embodiment 1.
FIG. 25 is a flowchart showing another process for calculating the packet count value in Embodiment 1.
FIG. 26 is a graph for explaining the process of determining whether a spike delay is present in Embodiment 1.
FIG. 27 is a graph showing the relationship between the packet count value and the index when a spike delay has occurred in Embodiment 1.
FIGS. 28A and 28B are diagrams for explaining the processing of the count unit of Embodiment 1.
FIGS. 29A, 29B, and 29C are explanatory diagrams of the processing in which the buffer size changing unit deletes one packet by overlap-add.
FIGS. 30A and 30B are explanatory diagrams of the processing in which the buffer size changing unit deletes one invalid packet.
FIGS. 31A and 31B are explanatory diagrams of the processing in which the buffer size changing unit inserts one packet by overlap-add.
FIGS. 32A and 32B are diagrams for explaining the processing when five packets are inserted into the jitter buffer at once.
FIGS. 33A, 33B, and 33C are diagrams for explaining the processing when, after an invalid packet has been deleted, the valid packet corresponding to the deleted invalid packet is received.
FIGS. 34A and 34B are diagrams for explaining the processing when the buffer size changing unit inserts a concealed packet into the jitter buffer in place of an invalid packet.
FIG. 35 is a flowchart showing the deletion processing performed by the buffer size changing unit.
FIG. 36 is a flowchart showing the insertion processing performed by the buffer size changing unit.
FIG. 37 is a block diagram of the call processing unit of Embodiment 2 of the present invention, in which the voice pitch is shared by the voice data loss compensation processing unit and the speech speed conversion processing unit.
FIG. 38 is a diagram for explaining the operation of the pitch detection unit of Embodiment 2.
FIGS. 39A and 39B are diagrams for explaining the operation of the voice data loss detection unit and the pitch detection unit of Embodiment 3 of the present invention.
FIG. 40 is a diagram for explaining the operation of Embodiment 3.
FIG. 41 is a diagram for explaining the operation of Embodiment 3.
FIG. 42 is a diagram for explaining the operation of Embodiment 3.
(Embodiment 1)
Hereinafter, Embodiment 1 of the present invention is described in detail with reference to FIGS. 1 to 36. First, the interphone system for a residential complex that includes the dwelling unit device according to the present invention is described.
As shown in FIG. 1, the interphone system for a residential complex in this embodiment comprises: a common unit device (lobby interphone) LI installed at the common entrance (lobby) of the residential complex; dwelling unit devices A (only one is shown) installed in the individual dwelling units of the complex; a doorphone slave unit B installed at the outer entrance of each dwelling unit; a signal trunk line Ls connected to the lobby interphone LI; dwelling unit lines Ld, which branch from the signal trunk line Ls and connect to the dwelling unit device A of each dwelling unit; and a slave unit connection line Lb, which connects the dwelling unit device A and the doorphone slave unit B. The system also has a control device CT, connected to the dwelling unit devices A and the lobby interphone LI via the signal trunk line Ls and the dwelling unit lines Ld, and a management room device X, which is installed in the building manager's room or a similar location and exchanges voice information and the like with the lobby interphone LI and each dwelling unit device A via the signal trunk line Ls. In addition, one or more call devices (sub-master units) C (two in the illustrated example) are installed in the dwelling unit, and the dwelling unit device (master unit) A and the sub-master units C are connected by an extension connection line Lc.
The doorphone slave unit B comprises a microphone and a speaker, a call button that accepts a visitor's call operation, and a communication unit that transmits a call signal via the slave unit connection line Lb and exchanges voice signals with the dwelling unit device A by analog transmission. If the doorphone slave unit B has a camera, the image of the visitor captured by the camera is transmitted in analog form from the doorphone slave unit B to the dwelling unit device A via the slave unit connection line Lb. The dwelling unit device A forwards the video transmitted from the doorphone slave unit B to the sub-master unit C via the extension connection line Lc. The dwelling unit device A and the sub-master unit C each display the video transmitted from the doorphone slave unit B on a monitor (display unit 3). When the response button of the dwelling unit device A is pressed, a call becomes possible between the dwelling unit device A and the doorphone slave unit B; when the response button of the sub-master unit C is pressed, a call becomes possible between the sub-master unit C and the doorphone slave unit B.
The sub-master unit C comprises a microphone and a speaker, a call button that accepts a call operation for an extension call, a communication unit that transmits a call signal and exchanges voice signals by analog transmission via the extension connection line Lc, and other components.
The lobby interphone LI comprises an imaging device that captures images of visitors, a microphone and a speaker, a numeric keypad or touch panel with which a visitor enters the dwelling unit number of the dwelling unit being visited, a transmission unit that packet-transmits voice information and video information over the signal trunk line Ls, and other components. When the numeric keypad or touch panel is operated and the lobby interphone LI accepts the input of a dwelling unit number, its transmission unit sends, by packet transmission over the signal trunk line Ls to the address of the control device CT, a packet whose data field stores that dwelling unit number and packets whose data fields store the image of the visitor (video information) captured by the imaging device.
The management room device X comprises a microphone and a speaker, a numeric keypad or touch panel with which the manager enters the dwelling unit number of the dwelling unit to be contacted, a transmission unit that packet-transmits voice information over the signal trunk line Ls, and other components. When the numeric keypad or touch panel is operated and the management room device X accepts the input of a dwelling unit number, its transmission unit sends a packet whose data field stores that dwelling unit number over the signal trunk line Ls to the address of the control device CT.
The control device CT stores the correspondence between the address assigned to the dwelling unit device A of each dwelling unit and the dwelling unit number of that dwelling unit. It checks the dwelling unit number stored in the data field of a packet received from the lobby interphone LI or the management room device X against this correspondence and converts it into an address. It then sends out onto the signal trunk line Ls a packet that stores this address in the destination address field and stores, in the data field, a call command notifying the call from the lobby interphone LI or the management room device X, together with packets whose data fields store the video information. Because the lobby interphone LI, the management room device X, and the control device CT described above are well known, detailed illustration and description of their configurations are omitted.
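The address translation performed by the control device CT can be sketched as a table lookup. In this Python illustration the table contents, the dictionary-based packet layout, the `"CALL"` command string, and the function name `build_call_packets` are all assumptions; the specification does not define a packet format at this level.

```python
# Hypothetical table: dwelling unit number -> address of that unit's device A.
ADDRESS_TABLE = {"101": 0x0101, "102": 0x0102}


def build_call_packets(unit_number, caller, video_frames=()):
    """Translate a dwelling unit number into a call packet plus video packets."""
    dest = ADDRESS_TABLE[unit_number]          # raises KeyError for an unknown number
    packets = [{"dest": dest, "data": {"command": "CALL", "caller": caller}}]
    for frame in video_frames:                 # one packet per chunk of video information
        packets.append({"dest": dest, "data": {"video": frame}})
    return packets


packets = build_call_packets("102", caller="LI", video_frames=[b"f0", b"f1"])
```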
The dwelling unit device A comprises a control unit 1, a microphone 2a and a speaker 2b, a call processing unit 2, a display unit 3, a video processing unit 4, a storage unit 5, a call detection unit 6, a transmission processing unit 7, a sub-master communication processing unit 8, an analog signal transmission unit 9, a first conversion processing unit 10, a second conversion processing unit 11, a first switching unit 12, a second switching unit 13, a third switching unit 14, and other components.
The analog voice signal (transmitted voice signal) output from the microphone 2a is amplified by the amplifier AMP1, converted into a digital transmitted voice signal (transmitted voice data) by the A/D converter 10a of the first conversion processing unit 10, and input to the call processing unit 2. The digital voice signal (received voice signal) that has undergone call processing in the call processing unit 2 is converted into an analog received voice signal by the D/A converter 10b of the first conversion processing unit 10, amplified by the amplifier AMP2, and output to the speaker 2b.
In the case of a doorphone call or an extension call, described later, the digital transmitted voice signal (transmitted voice data) that has undergone call processing in the call processing unit 2 is converted into an analog transmitted voice signal by the D/A converter 11a of the second conversion processing unit 11, amplified by the amplifier AMP3, and output to the analog signal transmission unit 9. In the case of an interphone call or a call between dwelling units, described later, the digital transmitted voice signal that has undergone call processing in the call processing unit 2 is instead output directly to the transmission processing unit 7. Likewise, the analog received voice signal output from the analog signal transmission unit 9 is amplified by the amplifier AMP4, converted into a digital received voice signal (received voice data) by the A/D converter 11b of the second conversion processing unit 11, and input to the call processing unit 2, whereas the digital received voice signal output from the transmission processing unit 7 is input directly to the call processing unit 2. The analog signal transmission unit 9 consists of a well-known 2-wire/4-wire converter (hybrid transformer).
The first switching unit 12 is connected to the two-wire side of the analog signal transmission unit 9. The first switching unit 12 selectively switches the two-wire side of the analog signal transmission unit 9 between a state in which it is connected to the slave unit connection line Lb and a state in which it is connected to the second switching unit 13. The second switching unit 13 selectively switches the first switching unit 12 between a state in which it is connected to the extension connection line Lc and a state in which it is not. The third switching unit 14 selectively switches between a state in which the slave unit connection line Lb and the extension connection line Lc are connected and a state in which they are not. The switching of the first to third switching units 12, 13, and 14 is controlled by the control unit 1 in every case.
The control unit 1 has a microcomputer as its main component and controls the dwelling unit device A as a whole, including the switching control described above. The display unit 3 has a display device such as a liquid crystal display, a driver circuit that drives the display device, a touch panel serving as an input device, and other components. As described later, the video processing unit 4 processes the video signal received from the transmission processing unit 7 and causes the display unit 3 to display the video. Specifically, the image of a visitor (still image or moving image) that is packet-transmitted from the lobby interphone LI is displayed on the display unit 3.
The call processing unit 2 comprises a microprocessor, an ASIC (Application Specific Integrated Circuit), a DSP (Digital Signal Processor), or the like, and performs the various controls and calculations needed for call processing; it applies various kinds of signal processing (call processing) to the digital voice signals (transmitted voice data and received voice data). The storage unit 5 consists of an electrically rewritable nonvolatile semiconductor memory (such as flash memory) and stores the first software and the second software. The first software is a collection of programs for performing various kinds of call processing on voice signals transmitted by the analog transmission method through the analog signal transmission unit 9. The second software is a collection of programs for performing various kinds of call processing on voice signals transmitted by the packet transmission method through the transmission processing unit 7. The individual programs are described in detail later.
The transmission processing unit 7 performs packet transmission with the control device CT and other dwelling unit devices A via the signal trunk line Ls (which here and below includes the dwelling unit line Ld). The transmission processing unit 7 divides the control signal (control data) created by the control unit 1 to create packets (control packets), and likewise divides the transmitted voice signal (transmitted voice data) created by the call processing unit 2 to create packets (voice packets). It encodes the control packets and voice packets, converts (modulates) the encoded bit strings into electrical signals, and sends them onto the signal trunk line Ls. It also converts (demodulates) the electrical signals flowing through the signal trunk line Ls into bit strings and decodes packets (voice packets, control packets, and video packets) from the demodulated bit strings. If the address of a decoded packet does not match its own address (the address of the dwelling unit device A), the transmission processing unit 7 discards the packet. If the address matches, it outputs the data contained in the data field of the packet to the video processing unit 4 if the data is video data (a video signal), to the control unit 1 if the data is control data (a control signal), and to the call processing unit 2 if the data is voice data (a voice signal).
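The receive-side filtering and routing can be sketched as follows. This Python illustration assumes a dictionary-based packet with an explicit `kind` field and models each destination unit as a list; both are simplifications introduced for the sketch, and the function name `dispatch` is hypothetical.

```python
def dispatch(packet, own_address, sinks):
    """Route a decoded packet to the unit that consumes its payload type.

    Returns the payload kind that was delivered, or None if the packet was
    discarded because it is addressed to another device.
    """
    if packet["dest"] != own_address:
        return None                      # address mismatch: discard the packet
    kind = packet["kind"]                # "video", "control", or "voice"
    sinks[kind].append(packet["data"])   # hand the payload to its consumer
    return kind


# Stand-ins for the video processing unit 4, control unit 1, and call processing unit 2.
sinks = {"video": [], "control": [], "voice": []}
delivered = dispatch({"dest": 7, "kind": "voice", "data": b"v"}, 7, sinks)
dropped = dispatch({"dest": 8, "kind": "video", "data": b"x"}, 7, sinks)
```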
The sub-master communication processing unit 8 encodes and frequency-modulates the control data for the sub-master unit created by the control unit 1 and transmits it to the sub-master unit C via the extension connection line Lc. It also frequency-demodulates and decodes the control signal transmitted from the sub-master unit C via the extension connection line Lc and passes the resulting control data to the control unit 1.
Next, the operation of the interphone system for a residential complex in this embodiment is described. First, the doorphone call between the dwelling unit device A and the doorphone slave unit B is described. When a visitor operates the call button of the doorphone slave unit B, the doorphone slave unit B transmits a call signal via the slave unit connection line Lb. In the dwelling unit device A, the call detection unit 6 detects the call signal and outputs a call detection signal to the control unit 1. On receiving the call detection signal, the control unit 1 sounds a ringing tone from the speaker 2b. If the doorphone slave unit B has a camera, the camera is activated after the call button is operated and captures an image of the visitor, and the captured video is transmitted from the doorphone slave unit B via the slave unit connection line Lb. The dwelling unit device A displays the video arriving over the slave unit connection line Lb on the display unit 3 by means of the video processing unit 4. When a resident who has heard the ringing tone checks the image of the visitor on the display unit 3 and operates the response button (not shown) provided on the dwelling unit device A, the control unit 1 controls the first switching unit 12 to connect the two-wire side of the analog signal transmission unit 9 to the slave unit connection line Lb, switches the third switching unit 14 to the disconnected state, and instructs the call processing unit 2 to load and execute the first software stored in the storage unit 5. As shown in FIG. 4A, the call processing unit 2 then executes the first software and performs call processing, so that the resident of the dwelling unit and the visitor can hold a doorphone call using the dwelling unit device A and the doorphone slave unit B.
Here, on receiving the call detection signal, the control unit 1 also causes the sub-master communication processing unit 8 to transmit a doorphone-call control signal and switches the third switching unit 14 to the connected state, so that the video arriving via the slave unit connection line Lb is forwarded to the sub-master unit C via the extension connection line Lc. In the sub-master unit C that receives the control signal, a ringing tone sounds from its speaker and the video of the visitor is displayed on its monitor. When a resident who hears the ringing tone checks the video of the visitor on the monitor and operates the response button of the sub-master unit C, a doorphone-response control signal is transmitted from the sub-master unit C to the dwelling unit device A via the extension connection line Lc. In the dwelling unit device A, the sub-master communication processing unit 8 outputs the doorphone-response control signal (control data) to the control unit 1, and the control unit 1, on receiving the control data, keeps the third switching unit 14 in the connected state. As a result, the resident of the dwelling unit and the visitor can hold a doorphone call using the sub-master unit C and the doorphone slave unit B. In this case, the call processing unit 2 of the dwelling unit device A performs no call processing at all.
Next, an extension call between the dwelling unit device A and the sub-master unit C will be described. When a resident operates the extension call button of the sub-master unit C, the sub-master unit C transmits an extension-call control signal via the extension connection line Lc. In the dwelling unit device A, the sub-master communication processing unit 8 outputs the extension-call control signal (control data) to the control unit 1. On receiving the extension-call control data, the control unit 1 sounds a ringing tone from the speaker 2b. When another resident who hears the ringing tone operates the response button provided on the dwelling unit device A, the control unit 1 controls the first switching unit 12 to connect the two-wire side of the analog signal transmission unit 9 to the second switching unit 13, and controls the second switching unit 13 to connect the first switching unit 12 to the extension connection line Lc. The control unit 1 further instructs the call processing unit 2 to load and execute the first software stored in the storage unit 5. Then, as shown in FIG. 4B, the call processing unit 2 executes the first software to perform call processing, so that residents of the same dwelling unit can hold an extension call with each other using the dwelling unit device A and the sub-master unit C.
Note that the extension-call control signal transmitted from one sub-master unit C is received not only by the dwelling unit device A but also by the other sub-master unit C. When the response button is operated on the other sub-master unit C that has received the control signal, a speech path is formed between the two sub-master units C, C via the extension connection line Lc, and residents of the same dwelling unit can hold an extension call with each other using their respective sub-master units C, C.
Here, the call processing that the call processing unit 2 performs by executing the first software will be described. The first software includes a voice switch processing program that switches the call direction, an acoustic-side echo canceller processing program that suppresses acoustic echo, a line-side echo canceller processing program that suppresses line echo, and a speech rate conversion processing program that slows down or speeds up the rate (speech rate) of the other party's voice output from the speaker 2b.
As shown in FIG. 2, the call processing unit 2 executing the first software comprises a voice switch VS, an acoustic-side echo canceller EC1, a line-side echo canceller EC2, and a speech rate conversion processing unit SE. The voice switch VS, the acoustic-side echo canceller EC1, the line-side echo canceller EC2, and the speech rate conversion processing unit SE are each realized by a signal processing circuit such as a DSP constituting the call processing unit 2 executing, respectively, the voice switch processing program, the acoustic-side echo canceller processing program, the line-side echo canceller processing program, and the speech rate conversion processing program. The first and second conversion processing units 10 and 11 are omitted from FIG. 2.
The acoustic-side echo canceller EC1 has a conventionally known configuration consisting of an adaptive filter ADF1 and a subtractor SUB1. The adaptive filter ADF1 adaptively identifies the impulse response of the feedback path (acoustic echo path) H_AC formed by acoustic coupling between the speaker 2b and the microphone 2a, and the subtractor SUB1 subtracts the echo component (acoustic echo) estimated from the reference signal (the output signal to the first conversion processing unit 10) from the input signal from the first conversion processing unit 10 (the transmitted voice signal), thereby suppressing the echo component. The line-side echo canceller EC2 likewise has a conventionally known configuration consisting of an adaptive filter ADF2 and a subtractor SUB2. The adaptive filter ADF2 adaptively identifies the impulse response of the feedback path (line echo path) H_LIN formed by reflection due to impedance mismatch between the analog signal transmission unit 9 and the transmission path (the slave unit connection line Lb or the extension connection line Lc) and by acoustic coupling between the speaker and the microphone in the other party's loudspeaking call device (the doorphone slave unit B or the sub-master unit C), and the subtractor SUB2 subtracts the echo component (line echo) estimated from the reference signal (the output signal to the second conversion processing unit 11, that is, the transmitted voice signal) from the received voice signal, thereby suppressing the echo component.
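As an illustration only, the adaptive-filter-plus-subtractor structure described above can be sketched as follows. The text says only that the configuration is conventionally known; the normalized LMS (NLMS) update rule, the class name NlmsEchoCanceller, and the parameters taps, step_size, and eps are assumptions made for this sketch and are not taken from the embodiment.

```python
class NlmsEchoCanceller:
    """Adaptive filter (ADF) plus subtractor (SUB), using an assumed NLMS update."""

    def __init__(self, taps, step_size=0.5, eps=1e-8):
        self.weights = [0.0] * taps   # estimated impulse response of the echo path
        self.ref = [0.0] * taps       # most recent reference samples, newest first
        self.step_size = step_size
        self.eps = eps                # avoids division by zero for a silent reference

    def process(self, reference, observed):
        """Return the observed sample with the estimated echo subtracted."""
        # Shift the newest reference sample into the delay line.
        self.ref = [reference] + self.ref[:-1]
        # Adaptive filter output: echo estimated from the reference signal.
        echo_estimate = sum(w * x for w, x in zip(self.weights, self.ref))
        # Subtractor: remove the estimated echo from the observed signal.
        residual = observed - echo_estimate
        # NLMS coefficient update, normalized by the reference energy.
        energy = sum(x * x for x in self.ref) + self.eps
        gain = self.step_size * residual / energy
        self.weights = [w + gain * x for w, x in zip(self.weights, self.ref)]
        return residual
```

For EC1 the reference would be the signal sent toward the speaker and the observed signal the microphone input; for EC2 the roles are the transmitted and received voice signals on the line side.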
A voice switch VS is provided between the acoustic-side echo canceller EC1 and the line-side echo canceller EC2. The voice switch VS comprises a transmitting-side attenuator 100 that attenuates the transmitted voice signal, a receiving-side attenuator 101 that attenuates the received voice signal, and an insertion loss amount control unit 102 that controls the attenuation amount (insertion loss amount) of each of the transmitting-side and receiving-side attenuators 100, 101. The insertion loss amount control unit 102 consists of a total loss amount calculation unit 103 and an insertion loss amount distribution processing unit 104. The total loss amount calculation unit 103 estimates the acoustic-side feedback gain α of the path that returns from the output point Rout of the receiving-side attenuator 101 to the input point Tin of the transmitting-side attenuator 100 via the acoustic echo path H_AC (hereinafter, the "acoustic-side feedback path"), and also estimates the line-side feedback gain β of the path that returns from the output point Tout of the transmitting-side attenuator 100 to the input point Rin of the receiving-side attenuator 101 via the line echo path H_LIN (hereinafter, the "line-side feedback path"). Based on the estimates α′, β′ of the acoustic-side and line-side feedback gains α, β, it calculates the total amount of loss to be inserted into the closed loop (the sum of the attenuation amount, i.e. insertion loss amount, of the transmitting-side attenuator 100 and that of the receiving-side attenuator 101). The insertion loss amount distribution processing unit 104 monitors the transmitted voice signal and the received voice signal to estimate the call state, and determines how the attenuation amounts (insertion loss amounts) are allocated between the transmitting-side attenuator 100 and the receiving-side attenuator 101 according to this estimate and the value calculated by the total loss amount calculation unit 103.
The total loss amount calculation unit 103 uses a rectifier-smoother, a low-pass filter, or the like to estimate the short-time average power of the input signal of the transmitting-side attenuator 100 (the transmitted voice signal), and likewise estimates the short-time average power of the output signal of the receiving-side attenuator 101 (the received voice signal). It finds the minimum of the estimated time-average power of the output signal of the receiving-side attenuator 101 over the maximum delay time assumed for the acoustic-side feedback path H_AC, and takes the value obtained by dividing the estimated time-average power of the input signal of the transmitting-side attenuator 100 by this minimum as the estimate α′ of the acoustic-side feedback gain α. Similarly, the total loss amount calculation unit 103 uses a rectifier-smoother, a low-pass filter, or the like to estimate the short-time average power of the input signal of the receiving-side attenuator 101 (the received voice signal) and that of the output signal of the transmitting-side attenuator 100 (the transmitted voice signal). It finds the minimum of the estimated time-average power of the output signal of the transmitting-side attenuator 100 over the maximum delay time assumed for the line-side feedback path H_LIN, and takes the value obtained by dividing the estimated time-average power of the input signal of the receiving-side attenuator 101 by this minimum as the estimate β′ of the line-side feedback gain β. The total loss amount calculation unit 103 then calculates, from the estimates α′, β′ of the acoustic-side feedback gain α and the line-side feedback gain β, the total loss amount Lt required to obtain the desired gain margin MG, and outputs the value Lt to the insertion loss amount distribution processing unit 104.
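The estimation procedure above can be sketched as follows, under stated assumptions. The rectifier-smoother is modelled here as a one-pole low-pass filter applied to the rectified (absolute-value) signal, so the ratio is an amplitude ratio compatible with a 20 log expression; the class name FeedbackGainEstimator and the parameters smoothing and max_delay_samples are hypothetical and do not appear in the embodiment.

```python
from collections import deque


class FeedbackGainEstimator:
    """Estimates a feedback gain as (returned level) / (minimum sent level in a window)."""

    def __init__(self, smoothing, max_delay_samples):
        self.smoothing = smoothing        # one-pole smoothing coefficient, 0 < smoothing <= 1
        self.sent_level = 0.0             # smoothed level of the signal entering the path
        self.returned_level = 0.0         # smoothed level of the signal coming back
        # Sent levels over the maximum delay assumed for the feedback path.
        self.sent_history = deque(maxlen=max_delay_samples)

    def step(self, sent_sample, returned_sample):
        """Update with one sample pair; return the gain estimate, or None if undefined."""
        a = self.smoothing
        # Rectify (absolute value) and smooth both signals.
        self.sent_level = (1.0 - a) * self.sent_level + a * abs(sent_sample)
        self.returned_level = (1.0 - a) * self.returned_level + a * abs(returned_sample)
        self.sent_history.append(self.sent_level)
        # Minimum sent level within the assumed maximum delay.
        floor = min(self.sent_history)
        if floor <= 0.0:
            return None  # no signal has entered the path yet, so no estimate
        return self.returned_level / floor
```

For α′, sent_sample would be the output of the receiving-side attenuator 101 and returned_sample the input of the transmitting-side attenuator 100; for β′ the roles of the two attenuators are exchanged.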
The insertion loss amount distribution processing unit 104 monitors the input and output signals of the transmitting-side attenuator 100 and of the receiving-side attenuator 101, and determines the call state (receiving state, transmitting state, and so on) from information such as the relative power levels of these signals and the presence or absence of voice. It then adjusts the attenuation amounts (insertion loss amounts) of the attenuators 100, 101 so that the total loss amount Lt is distributed between the transmitting-side attenuator 100 and the receiving-side attenuator 101 at a ratio that depends on the determined call state.
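A minimal sketch of such a distribution is given below. The text does not state the actual ratios, so the share table TX_SHARE, the state names, and the function name distribute_loss_db are purely hypothetical; the only property taken from the description is that the two attenuation amounts always sum to the total loss amount Lt.

```python
# Hypothetical share of the total loss placed on the transmitting side per call state.
# While receiving, the loss is put on the transmit path so the far-end voice passes.
TX_SHARE = {"receiving": 1.0, "transmitting": 0.0, "idle": 0.5}


def distribute_loss_db(total_loss_db, call_state):
    """Split the total loss Lt [dB] into (transmit-side, receive-side) losses."""
    tx_loss = total_loss_db * TX_SHARE[call_state]
    rx_loss = total_loss_db - tx_loss  # the two parts always sum to Lt
    return tx_loss, rx_loss
```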
The total loss amount calculation unit 103 has two operation modes: an update mode, in which it calculates and adaptively updates the total amount of loss to be inserted into the closed loop based on the estimates α′, β′ of the feedback gains α, β as described above, and a fixed mode, in which it fixes the total loss amount at a predetermined initial value. The total loss amount calculation unit 103 operates in the fixed mode during the period from the start of a call with the other party's call terminal until the acoustic-side and line-side echo cancellers EC1, EC2 have sufficiently converged, and operates in the update mode after they have sufficiently converged. Specifically, the total loss amount calculation unit 103 regards the acoustic-side and line-side echo cancellers EC1, EC2 as sufficiently converged at the point when the estimates α′, β′ of the acoustic-side feedback gain α and the line-side feedback gain β have both remained below a predetermined threshold ε (for example, a value 10 dB to 15 dB smaller than the respective estimates α′, β′ at the start of the call) continuously for a predetermined time (several hundred milliseconds) or longer after the start of the call. Before that point it operates in the fixed mode, fixing the total loss amount at the initial value; after that point it switches to the update mode, adaptively updating the total loss amount based on the estimates α′, β′. The initial value of the total loss amount in the fixed mode is set sufficiently larger than the total loss amount that is updated from time to time in the update mode.
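The mode switching described above can be sketched as a small state machine. This is an illustrative sketch only: the class name ModeSelector, the per-gain thresholds, and the expression of the hold time as a sample count (hold_samples) are assumptions, and the sketch never returns to the fixed mode because the description mentions only the one-way switch.

```python
class ModeSelector:
    """Switches from 'fixed' to 'update' once both gain estimates stay below threshold."""

    def __init__(self, alpha_threshold, beta_threshold, hold_samples):
        self.alpha_threshold = alpha_threshold
        self.beta_threshold = beta_threshold
        self.hold_samples = hold_samples  # required run length below the thresholds
        self.mode = "fixed"               # a call always starts in the fixed mode
        self._below_count = 0

    def step(self, alpha_est, beta_est):
        """Feed one pair of estimates and return the current mode."""
        if self.mode == "fixed":
            if alpha_est < self.alpha_threshold and beta_est < self.beta_threshold:
                self._below_count += 1
            else:
                self._below_count = 0     # the run must be continuous
            if self._below_count >= self.hold_samples:
                self.mode = "update"
        return self.mode
```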
Thus, immediately after the start of a call, while the acoustic-side and line-side echo cancellers EC1, EC2 have not yet sufficiently converged, the total loss amount calculation unit 103 operating in the fixed mode inserts the sufficiently large initial total loss amount into the closed loop, so that unpleasant echo (acoustic echo and line echo) and howling are suppressed and a stable half-duplex call is achieved. Once time has passed since the start of the call and the acoustic-side and line-side echo cancellers EC1, EC2 have sufficiently converged, the operation mode of the total loss amount calculation unit 103 switches from the fixed mode to the update mode and the total loss amount inserted into the closed loop falls to a value sufficiently lower than the initial value, so that simultaneous two-way conversation can be achieved.
The specific operation of the total loss amount calculation unit 103 in the update mode will now be described with reference to the flowchart of FIG. 3.
From the point at which it moves from the fixed mode to the update mode, the total loss amount calculation unit 103 executes the estimation processing for the acoustic-side feedback gain α and the line-side feedback gain β at a predetermined sampling period and calculates the estimates α′(n), β′(n) (step 1). From the product of these two estimates α′(n), β′(n) and the gain margin MG, it calculates, by the following equation, the desired total loss amount Lr(n) required to keep the gain margin of the closed loop at MG [dB] (step 2).
  Lr(n) = 20 log|α′(n)·β′(n)| + MG [dB]
Here, α′(n), β′(n), and Lr(n) denote, respectively, the feedback gain estimates and the desired total loss amount calculated at the n-th sampling after the transition to the update mode. The total loss amount calculation unit 103 then compares the n-th desired total loss amount Lr(n) calculated from the above equation with the previous ((n-1)-th) total loss amount Lt(n-1), that is, the total loss amount that was determined in the previous processing and actually inserted. If the desired total loss amount Lr(n) calculated this time is larger than the previous total loss amount Lt(n-1), the value obtained by adding a small increment Δi [dB] to the previous total loss amount is taken as the current total loss amount, Lt(n) = Lt(n-1) + Δi (steps 3 and 4). If the desired total loss amount Lr(n) calculated this time is smaller than the previous total loss amount Lt(n-1), the value obtained by subtracting a small decrement Δd [dB] from the previous total loss amount is taken as the current total loss amount, Lt(n) = Lt(n-1) - Δd (steps 5 and 6).
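Steps 2 to 6 can be sketched as follows. The function and parameter names are hypothetical. The logarithm is taken as base 10, as is usual for decibel values. Leaving Lt unchanged when Lr(n) equals Lt(n-1) is an assumption, since the description covers only the larger and smaller cases.

```python
import math


def desired_total_loss_db(alpha_est, beta_est, margin_db):
    """Step 2: Lr(n) = 20*log10|alpha'(n)*beta'(n)| + MG, in dB."""
    return 20.0 * math.log10(abs(alpha_est * beta_est)) + margin_db


def update_total_loss_db(prev_loss_db, desired_db, increment_db, decrement_db):
    """Steps 3 to 6: move Lt one small step toward the desired value Lr(n)."""
    if desired_db > prev_loss_db:
        return prev_loss_db + increment_db   # Lt(n) = Lt(n-1) + delta_i
    if desired_db < prev_loss_db:
        return prev_loss_db - decrement_db   # Lt(n) = Lt(n-1) - delta_d
    return prev_loss_db                      # equal case: unchanged (assumption)
```

Because each call changes Lt by at most one step, a sudden jump in the estimates changes the inserted loss only gradually over successive sampling periods.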
By limiting each increase or decrease of the total loss amount by the total loss amount calculation unit 103 to the small value Δi or Δd in this way, any audible unnaturalness can be eliminated even when the acoustic-side feedback gain α and the line-side feedback gain β are changing rapidly, as they do immediately after the start of a call with the other party's call terminal (the doorphone slave unit B or the sub-master unit C), when the acoustic-side and line-side echo cancellers EC1, EC2 are actively updating their coefficients toward convergence.
The speech rate conversion processing unit SE converts the speech rate of the original voice by expanding or compressing the voice (the received voice). For example, it converts the speech rate (faster or slower) by inserting or deleting waveforms in units of pitch, based on a conventionally known speech rate conversion algorithm called PICOLA (Pointer Interval Controlled OverLap and Add). "Pitch" here means the height of the voice as determined by the vibration period of the vocal cords: the shorter the vibration period, the higher the voice, and the longer the vibration period, the lower the voice. Therefore, if the speech rate conversion processing unit SE is made to perform speech rate conversion during a doorphone call with the doorphone slave unit B or an extension call with the sub-master unit C, the speech rate of the other party's voice sounded from the speaker 2b of the dwelling unit device A can be made faster or slower than the rate at which the other party actually spoke.
Next, an intercom call between the dwelling unit device A and the lobby intercom LI will be described. When the lobby intercom LI accepts the input of the dwelling unit number of one of the dwelling units, entered by a visitor operating its numeric keypad or touch panel, it transmits (by packet transmission) from its transmission unit, via the signal trunk line Ls and addressed to the control device CT, a packet whose data field contains that dwelling unit number and a packet whose data field contains the video (video data) of the visitor captured by the imaging device. The control device CT sends out onto the signal trunk line Ls a packet whose data field contains a call command for notifying the call from the lobby intercom LI, and a packet whose data field contains the video data.
In the dwelling unit device A installed in the dwelling unit with that dwelling unit number, when the transmission processing unit 7 receives the packets via the dwelling unit line Ld, it outputs the call command (control signal) stored in the data field of the packet to the control unit 1 and outputs the video data stored in the data field to the video processing unit 4. On receiving the call command, the control unit 1 sounds a ringing tone from the speaker 2b. The video processing unit 4 processes the video signal received from the transmission processing unit 7 and displays the video of the visitor on the display unit 3. When a resident who hears the ringing tone checks the video of the visitor displayed on the display unit 3 of the dwelling unit device A and then operates the response button, the control unit 1 instructs the call processing unit 2 to load and execute the second software stored in the storage unit 5. Then, as shown in FIG. 5A, the call processing unit 2 executes the second software to perform call processing, so that the resident of the dwelling unit and the visitor can hold an intercom call using the dwelling unit device A and the lobby intercom LI. As shown on the left side of FIG. 5A, the lobby intercom LI has almost the same configuration as the dwelling unit device A on the right side of FIG. 5A, except for the speech rate conversion processing unit SE; to simplify the description, parts having functions in common with those of the dwelling unit device A are given the same reference numerals.
Next, an intercom call between the dwelling unit device A and the management room device X will be described. When the management room device X accepts the input of the dwelling unit number of one of the dwelling units, entered by the manager operating its numeric keypad or touch panel, it transmits (by packet transmission) from its transmission unit, via the signal trunk line Ls and addressed to the control device CT, a packet whose data field contains that dwelling unit number. The control device CT sends out onto the signal trunk line Ls a packet whose data field contains a call command for notifying the call from the management room device X.
In the dwelling unit device A installed in the dwelling unit with that dwelling unit number, when the transmission processing unit 7 receives the packet via the dwelling unit line Ld, it outputs the call command (control signal) stored in the data field of the packet to the control unit 1. On receiving the call command, the control unit 1 sounds a ringing tone from the speaker 2b. When a resident who hears the ringing tone operates the response button, the control unit 1 instructs the call processing unit 2 to load and execute the second software stored in the storage unit 5. Then, as shown in FIG. 5B, the call processing unit 2 executes the second software to perform call processing, so that the resident of the dwelling unit and the manager can hold an intercom call using the dwelling unit device A and the management room device X. As shown on the left side of FIG. 5B, the management room device X has almost the same configuration as the dwelling unit device A on the right side of FIG. 5B, except for the speech rate conversion processing unit SE; to simplify the description, parts having functions in common with those of the dwelling unit device A are given the same reference numerals.
It is also possible to answer a call from the lobby intercom LI or the management room device X at the sub-master unit C. When such a call is answered at the sub-master unit C, the call processing unit 2 of the dwelling unit device A executes the second software to perform call processing as shown in FIG. 5D, so that the resident of the dwelling unit and the visitor or the manager can hold an intercom call using the sub-master unit C and the lobby intercom LI or the management room device X.
Next, an intercom call between dwelling unit devices A installed in different dwelling units will be described. When a dwelling unit device A accepts the input of the dwelling unit number of another dwelling unit, entered by a resident operating its numeric keypad, it transmits (by packet transmission) from its transmission unit, via the signal trunk line Ls and addressed to the control device CT, a packet whose data field contains that dwelling unit number. The control device CT sends out onto the signal trunk line Ls a packet whose data field contains a call command for notifying the call from the dwelling unit device A.
In the other dwelling unit device A installed in the dwelling unit with that dwelling unit number, when the transmission processing unit 7 receives the packet via the dwelling unit line Ld, it outputs the call command (control signal) stored in the data field of the packet to the control unit 1. On receiving the call command, the control unit 1 sounds a ringing tone from the speaker 2b. When a resident who hears the ringing tone operates the response button, the control unit 1 instructs the call processing unit 2 to load and execute the second software stored in the storage unit 5. Then, as shown in FIG. 5C, the call processing unit 2 in the dwelling unit device A of each dwelling unit executes the second software to perform call processing, so that residents of different dwelling units can hold an intercom call with each other using their respective dwelling unit devices A.
Here, the call processing that the call processing unit 2 performs by executing the second software will be described. The second software includes a voice switch processing program that switches the call direction, an acoustic-side echo canceller processing program that suppresses acoustic echo, an echo suppressor processing program that suppresses residual echo, a voice data loss compensation processing program that compensates for voice data missing because of packet loss in packet transmission, a jitter absorption processing program that absorbs the delay and fluctuation (jitter) accompanying packet transmission, and a speech rate conversion processing program that slows down or speeds up the rate (speech rate) of the other party's voice output from the speaker 2b.
As shown in FIG. 6, the call processing unit 2 executing the second software includes a voice switch VS, an acoustic-side echo canceller EC1, an echo suppressor ES, a speech speed conversion processing unit SE, a voice data loss compensation processing unit VC, and a fluctuation absorption processing unit JA. These units are realized when a signal processing circuit such as a DSP constituting the call processing unit 2 executes, respectively, the voice switch processing program, the acoustic-side echo canceller processing program, the echo suppressor processing program, the speech speed conversion processing program, the voice data loss compensation processing program, and the fluctuation absorption processing program. The first and second conversion processing units 10 and 11 are omitted from FIG. 6.
The acoustic-side echo canceller EC1 has the same configuration as the acoustic-side echo canceller EC1 used when the first software is executed, so its detailed configuration is not illustrated. The voice switch VS likewise has the same configuration as the voice switch VS used when the first software is executed, so its detailed configuration is not illustrated either. However, the voice switch VS of the second software differs from that of the first software in that the total loss amount calculated by the total loss amount calculation unit 103 is reduced in accordance with the decrease in the estimated value α' of the acoustic-side feedback gain α. That is, in the voice switch VS of the first software, which supports the analog transmission method, the total loss amount calculation unit 103 must calculate the total loss amount taking into account two feedback gains, the acoustic-side feedback gain α and the line-side feedback gain β. In the packet transmission method, by contrast, no feedback path is formed, so the line-side feedback gain β need not be considered. Therefore, in the voice switch VS of the second software, reducing the total loss amount calculated by the total loss amount calculation unit 103 in accordance with the decrease in the estimated value α' of the acoustic-side feedback gain α, as described above, makes it possible to realize two-way simultaneous conversation more reliably.
The echo suppressor ES is provided between the transmission processing unit 7 and the voice switch VS in the signal path of the transmitted voice signal, and attenuates residual echo (acoustic echo that the acoustic-side echo canceller EC1 could not suppress; the same applies hereinafter). In the packet transmission method, in which voice data is divided into packets for transmission, the transmission delay is longer than in the analog transmission method, and residual echo that the acoustic-side echo canceller EC1 cannot fully suppress occurs, so the amount of echo suppression must be increased by the echo suppressor ES. The echo suppressor ES must attenuate the residual echo effectively while leaving the voice signal to be sent (the transmitted voice signal) unattenuated.
The echo suppressor ES attenuates the transmitted voice signal in conjunction with the voice switch VS; specifically, it operates as shown in the flowchart of FIG. 7. The echo suppressor ES constantly monitors the state of the voice switch VS, that is, the call state (receiving state or transmitting state) estimated by the insertion loss distribution processing unit 104 (step 1). When the voice switch VS is in the receiving state, the echo suppressor ES assumes that there is no transmitted voice signal to be sent on the signal path, and attenuates the input signal by multiplying it by a predetermined attenuation coefficient before outputting it (step 2). When the voice switch VS is not in the receiving state, the echo suppressor ES judges that there is no residual echo to be removed or that there is a transmitted voice signal to be sent, and outputs the input signal at its original level without applying the attenuation coefficient (step 3).
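The three steps of FIG. 7 can be illustrated with a minimal Python sketch. The names `CallState` and `echo_suppress` and the default coefficient 0.1 are hypothetical and chosen only for illustration; the source states only that a "predetermined attenuation coefficient" is applied in the receiving state.

```python
from enum import Enum


class CallState(Enum):
    """Call state as estimated by the voice switch (hypothetical names)."""
    RECEIVING = "receiving"
    TRANSMITTING = "transmitting"


def echo_suppress(samples, state, attenuation=0.1):
    """Return the send-path samples after state-dependent attenuation.

    In the receiving state every sample is multiplied by the attenuation
    coefficient (step 2); otherwise the samples pass through unchanged
    (step 3). The value 0.1 is an assumed placeholder.
    """
    if state is CallState.RECEIVING:
        return [s * attenuation for s in samples]
    return list(samples)
```

Passing the state in as an argument mirrors step 1, in which the suppressor only observes the voice switch's estimate and makes no speech detection of its own.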
Thus, even when a transmission delay occurs in the voice transmitted to and from the other party's call device (the lobby intercom LI, the management room device X, or another dwelling unit device A), the residual echo that arises in the signal path of the transmitted voice signal because of that delay can be attenuated by the echo suppressor ES. As a result, two-way simultaneous conversation can be reliably realized even with the packet transmission method. If the echo suppressor ES attenuated the transmitted voice signal when the voice switch VS is not in the receiving state, for example in the transmitting state, the voice of the near-end talker (the resident talking on the dwelling unit device A) would be attenuated by mistake, and the near-end talker's voice heard at the other party's call device could fluctuate in level, becoming alternately louder and quieter. In this embodiment, however, the echo suppressor ES attenuates the input signal when the voice switch VS is in the receiving state and does not attenuate it when the voice switch VS is not in the receiving state, so only the unpleasant echo (residual echo) during a call is attenuated, without causing such fluctuation in the voice level. The speech speed conversion processing unit SE is realized by executing the same program as the speech speed conversion processing program contained in the first software, so its description is omitted.
FIG. 9 is a waveform diagram of a voice signal for explaining the basic principle of the voice data loss compensation processing (hereinafter abbreviated as "compensation processing"). In FIG. 9, the vertical axis indicates the intensity of the received voice signal input from the transmission processing unit 7 to the call processing unit 2, and the horizontal axis indicates time. When reception of a voice packet fails and a packet loss (loss of voice data) occurs, the voice data loss compensation processing unit VC sets the received voice signal of a predetermined period immediately before the packet loss as a reference signal (template).
Next, this template is slid over the received voice signal from the time of the packet loss toward the past while the correlation between the template and the received voice signal is computed, and the fundamental period (pitch) of the received voice signal immediately before the packet loss is detected. Then, one pitch of the received voice signal is extracted, going back from the time of the packet loss, and that signal is applied repeatedly to the loss period (the period in which voice data is missing; the same applies hereinafter) to compensate for it. The loss period is compensated with one pitch of the received voice signal for the following reason: when a talker utters a sound such as "ah", that sound is divided into segments of about 20 msec (packetized), each carried in one voice packet, so the signal in the loss period is very likely a repetition of the one-pitch received voice signal immediately before the packet loss.
As shown in FIG. 8, the voice data loss compensation processing unit VC includes a delay fluctuation absorbing buffer (jitter buffer) 20, a timer 21, a packet loss detection unit 22, a detection processing unit 23, and a compensation processing unit 24. Each of these units is realized by the DSP of the call processing unit 2 executing the voice data loss compensation processing program.
The header of each voice packet stores a number (sequence number) assigned in order when the original voice signal is divided (packetized), and the original voice signal can be restored by joining the voice data (received voice signal) of the voice packets in sequence-number order. The transmission processing unit 7 outputs the received voice signal (received voice data) to the jitter buffer 20 in chronological order according to the sequence numbers. In addition to the sequence number, the header of a voice packet contains a time stamp. The sequence number indicates the transmission order of the voice packets, and the time stamp indicates the relative position of the voice signal in the original voice waveform.
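As a rough illustration of restoring the signal from sequence numbers, the following Python sketch joins payloads in sequence-number order. The function name `reassemble` and the `(sequence_number, payload)` tuple layout are assumptions made for the example; sequence-number wraparound and missing packets are deliberately ignored.

```python
def reassemble(packets):
    """Join voice payloads in ascending sequence-number order.

    `packets` is an iterable of (sequence_number, payload) pairs, where
    payload is a list of samples. This layout is hypothetical; the source
    only says that the header carries a sequence number and a time stamp.
    """
    restored = []
    for _, payload in sorted(packets, key=lambda p: p[0]):
        restored.extend(payload)
    return restored
```

For example, packets arriving as sequence numbers 2, 0, 1 are emitted in the order 0, 1, 2.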
The jitter buffer 20 temporarily holds the received voice data output from the transmission processing unit 7 and outputs it to the detection processing unit 23 after a predetermined delay, thereby absorbing the delay fluctuation of the voice packets.
The timer 21 is used when the packet loss detection unit 22 detects a packet loss. The packet loss detection unit 22 starts the timer 21 when the jitter buffer 20 outputs received voice data to the detection processing unit 23. If the time measured by the timer 21 exceeds a predetermined time, beyond which a packet loss is assumed to have occurred, before the jitter buffer 20 outputs the next received voice data, the packet loss detection unit 22 determines that a packet loss has occurred.
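The timer-based decision can be sketched as below. The class name `PacketLossDetector`, its method names, and the use of caller-supplied time values in seconds are all assumptions made for illustration; the source does not give the threshold value.

```python
class PacketLossDetector:
    """Timeout-based loss detection (a sketch, not the patented circuit)."""

    def __init__(self, timeout_s):
        # Assumed threshold: time after which a loss is presumed.
        self.timeout_s = timeout_s
        self.last_output_s = None

    def on_buffer_output(self, now_s):
        """Restart the timer whenever the jitter buffer outputs voice data."""
        self.last_output_s = now_s

    def loss_detected(self, now_s):
        """True if no new output arrived within the timeout."""
        if self.last_output_s is None:
            return False
        return (now_s - self.last_output_s) > self.timeout_s
```

Supplying the current time as an argument, rather than reading a clock inside the class, keeps the sketch deterministic and easy to test.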
When the packet loss detection unit 22 detects a packet loss, the detection processing unit 23 performs fundamental period (pitch) detection on the received voice data output from the jitter buffer 20; when no packet loss is detected, it does nothing to the received voice data. The detection processing unit 23 holds the received voice data of a fixed period in the past.
The detection processing unit 23 includes a template setting unit 23a and a pitch detection unit 23b. The template setting unit 23a sets, as the template, received voice data of a predetermined time width extending into the past from the loss occurrence time at which the packet loss occurred. The template setting unit 23a increases this time width as the pitch detection unit 23b increases the slide amount of the template.
The pitch detection unit 23b slides the template set by the template setting unit 23a over the received voice data from the loss occurrence time toward the past, obtains the cross-correlation between the template and the received voice data, and detects the pitch of the received voice signal immediately before the loss occurrence time from the slide amount at which the strongest correlation peak between the template and the received voice data appears.
FIG. 10 is a waveform diagram of a received voice signal for explaining the processing of the template setting unit 23a and the pitch detection unit 23b. In FIG. 10, the vertical axis indicates the intensity of the received voice signal, and the horizontal axis indicates time in samples. The template TJ shown in FIG. 10 is the template used in conventional compensation processing.
Conventionally, when a packet loss occurs, the received voice signal of a predetermined period going back from the loss occurrence time RT, for example, is set as the template TJ. The template TJ is then slid over the received voice signal from the loss occurrence time RT toward the past to obtain the cross-correlation between the received voice signal and the template TJ, and the pitch of the received voice signal is detected from the slide amount of the template TJ at which the strongest correlation peak is obtained.
FIG. 11 is a graph showing the computed correlation values between the template TJ and the received voice signal when the conventional template TJ is used. In FIG. 11, the correlation values are calculated with the well-known average magnitude difference function (AMDF). The vertical axis indicates the correlation value, and the horizontal axis indicates time in samples, with the loss occurrence time RT taken as 0. Because FIG. 11 shows AMDF values, a smaller value means a stronger correlation between the received voice signal and the template TJ.
In FIG. 11, a downward correlation peak PK1 first appears at 37 samples, then a downward correlation peak PK2 appears at 47 samples, and thereafter downward correlation peaks recur with a period of approximately 37 samples. The value of the correlation peak PK1 is smaller than that of the correlation peak PK2. The conventional method therefore detects 37 samples as the pitch of the received voice signal.
As FIG. 10 shows, however, the pitch of the received voice signal immediately before the loss occurrence time RT is 47 samples. The conventional method thus fails to detect the pitch of the received voice signal immediately before the loss occurrence time RT accurately.
The likely cause is that the time width of the template TJ is far larger than 47 samples: the template TJ contains only one period of the received voice signal with a pitch of 47 samples, which is the detection target, but as many as three periods of the received voice signal with a pitch of 37 samples, which is not the target, so a strong correlation peak appears at 37 samples.
In this case, compensation processing is performed by extracting 37 samples of the received voice signal going back from the loss occurrence time RT and applying them repeatedly to the loss period.
As a result, it becomes difficult to join the waveform in the loss period smoothly to the waveform outside the loss period, and difficult to perform the compensation processing accurately.
Conversely, if the time width of the template is smaller than 47 samples, a pitch of 47 samples cannot be detected.
The detection processing unit 23 of this embodiment therefore increases the time width of the template TM as the slide amount of the template TM increases, as shown in FIG. 10.
Consequently, once the template TM has been slid to some extent, as with the template TM shown in the third row of FIG. 10, the template contains almost exclusively the received voice signal with a pitch of 47 samples, which is the detection target. The template TM in the fourth row of FIG. 10, on the other hand, contains the received voice signal with a pitch of 37 samples in addition to that with a pitch of 47 samples. The correlation between the third-row template TM and the received voice signal is therefore stronger than that between the fourth-row template TM and the received voice signal, and the pitch of the received voice signal immediately before the loss occurrence time RT can be detected accurately.
The pitch detection unit 23b preferably uses, for example, the AMDF shown in equation (1) as the correlation calculation.
Figure JPOXMLDOC01-appb-M000003
Here, φ(τ) is the correlation value, N is the time width of the template TM, x(j) is the template TM, x(j-τ) is the received voice signal, k+1 is the starting point of the template TM, a is a predetermined coefficient, τ is the slide amount of the template TM, and j is the sampling number of each sampling point of the received voice signal.
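Equation (1) survives in this text only as an image placeholder. Working from the symbol definitions above and from the worked example around FIG. 12, one form consistent with them is sketched below. The 1/N normalization and the relation N = aτ (which applies once τ has reached the slide reference value, with a = 1 in the worked example) are assumptions and are not taken from the original equation.

```latex
\phi(\tau) = \frac{1}{N} \sum_{j=k+1}^{k+N} \bigl| x(j) - x(j-\tau) \bigr| ,
\qquad N = a\,\tau
```

In this form the template occupies samples x(k+1) through x(k+N), and a smaller φ(τ) means a stronger correlation, in line with the description of FIG. 11 and FIG. 13.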
The template setting unit 23a preferably keeps the time width of the template TM at a predetermined initial time width until the slide amount of the template TM reaches a predetermined slide reference value.
In this way, the time width of the template TM is set to the initial time width while the slide amount is comparatively small, so at least a certain time width is secured even for small slide amounts, and the correlation between the template TM and the received voice signal (input signal) can be obtained more accurately.
Furthermore, because the time width of the template TM stays at the initial time width until the slide amount reaches the slide reference value, making the initial time width comparatively short reduces the amount of computation.
The initial time width is preferably about the minimum pitch expected of the received voice signal. The slide reference value may be set, for example, equal to the initial time width.
FIG. 12 is a diagram for explaining the processing of the template setting unit 23a and the pitch detection unit 23b. Each point on the straight line in FIG. 12 is a sampling point of the received voice signal. The rightmost sampling point is the loss occurrence time RT, and sampling points farther to the left are farther in the past. The loss occurrence time RT is taken as the 0th sampling point. The pitch of the received voice signal is about 3 msec at its shortest, which corresponds to 24 samples at a sampling frequency of 8 kHz. The initial time width may therefore be set to, for example, 24 samples, but for convenience of explanation FIG. 12 uses an initial time width of 4 for the template TM, a = 1, and a slide reference value of 5.
First, when a packet loss occurs, the pitch detection unit 23b sets τ = 0. Because the initial time width of the template TM is 4, it sets the fourth sampling point to the left of the loss occurrence time RT as the reference sampling point k, assigns sampling numbers that increase by 1 from k toward the loss occurrence time RT, and assigns sampling numbers that decrease by 1 from k toward the past.
The template setting unit 23a then sets the received voice signal x(k+1) to x(k+4) as the template TM0.
The pitch detection unit 23b then uses equation (1) to calculate the correlation value φ(0) between the template TM0 and the received voice signal x(j-0). In this case the template TM0 is matched against the voice signal x(k+1) to x(k+4).
Next, the pitch detection unit 23b sets τ = 1 and, as for τ = 0, uses equation (1) to calculate the correlation value φ(1) between the template TM0 and the voice signal x(j-1). In this case the template TM0 is matched against the voice signal x(k) to x(k+3).
In the same way, the template TM0 is slid over the received voice signal toward the past until τ = 4, and φ(2), φ(3), and φ(4) are calculated using equation (1).
Next, when the pitch detection unit 23b sets τ = 5, τ ≥ slide reference value (= 5) holds, so it sets the fifth sampling point to the left of the loss occurrence time RT as the reference sampling point k. The template setting unit 23a then sets the voice signal x(k+1) to x(k+5) as the template TM5, and the pitch detection unit 23b uses equation (1) to obtain the correlation value φ(5) between the template TM5 and the voice signal x(j-5). In this case the template TM5 is matched against the voice signal x(k-4) to x(k).
Next, the pitch detection unit 23b sets τ = 6 and sets the sixth sampling point to the left of the loss occurrence time RT as the reference sampling point k. The template setting unit 23a then sets the received voice signal x(k+1) to x(k+6) as the template TM6, and the pitch detection unit 23b uses equation (1) to obtain the correlation value φ(6) between the template TM6 and the received voice signal x(j-6). In this case the template TM6 is matched against the voice signal x(k-5) to x(k).
Thereafter, the pitch detection unit 23b repeats the above processing to obtain φ(τ) until τ reaches the maximum slide amount τmax. The time width of the template TM thus increases as the slide amount increases.
FIG. 13 is a graph of the correlation value φ(τ) obtained for the received voice signal of FIG. 10 using the method of this embodiment. In FIG. 13, the vertical axis indicates the correlation value φ(τ), and the horizontal axis indicates time in samples. The correlation value φ(τ) in FIG. 13 is calculated by AMDF, so, as in FIG. 11, a correlation peak with a lower value means a stronger correlation between the received voice signal and the template TM.
In FIG. 13, a downward correlation peak PK1 appears approximately 47 samples from the loss occurrence time RT (= 0), a downward correlation peak PK2 appears approximately 37 samples after PK1, and thereafter a downward correlation peak appears approximately every 37 samples. The values of the correlation peaks grow as the slide amount increases, meaning that the correlation between the template TM and the received voice signal weakens. At a sampling frequency of 8 kHz, 37 samples correspond to 37 × 0.125 msec = 4.625 msec, and 47 samples correspond to 47 × 0.125 msec = 5.875 msec.
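The sample-to-time conversions quoted in this description follow from the 8 kHz sampling frequency (one sample every 0.125 msec). A small helper, with the hypothetical name `samples_to_msec`, makes the arithmetic explicit:

```python
def samples_to_msec(n_samples, fs_hz=8000):
    """Convert a sample count to milliseconds at sampling frequency fs_hz."""
    return n_samples * 1000.0 / fs_hz


# 37 and 47 samples are the two pitches discussed for FIG. 10 and FIG. 13;
# 24 samples is the shortest expected pitch (about 3 msec).
durations = {n: samples_to_msec(n) for n in (24, 37, 47)}
```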
That is, among the correlation peaks shown in FIG. 13, the correlation peak PK1, obtained when the template TM is shifted by 47 samples, is the smallest.
The pitch detection unit 23b therefore detects 47 samples, the position at which the smallest correlation peak PK1 appears, as the pitch of the received voice signal immediately before the loss occurrence time RT. This shows that the pitch detection unit 23b correctly detects the 47-sample pitch of the received voice signal immediately before the loss occurrence time RT shown in FIG. 10.
The compensation processing unit 24 extracts, going back from the loss occurrence time RT, one pitch of the received voice signal as detected by the pitch detection unit 23b, and performs compensation processing that fills the loss period in which the packet loss occurred with the extracted received voice signal.
For example, if the received voice signal shown in FIG. 10 is input and the pitch detection unit 23b detects a pitch of 47 samples, the compensation processing unit 24 extracts 47 samples of the received voice signal going back from the loss occurrence time RT and applies the extracted signal repeatedly until the end of the loss period to compensate for the loss period.
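The repetition step can be sketched in Python as follows. The function name `conceal_loss` and its argument layout are hypothetical; the sketch only repeats the last pitch period and does not smooth the joins between repetitions.

```python
def conceal_loss(history, pitch, loss_len):
    """Fill a loss period by repeating the last `pitch` samples of `history`.

    `history` holds the received samples up to the loss occurrence time,
    `pitch` is the detected pitch in samples, and `loss_len` is the number
    of missing samples to synthesize.
    """
    if pitch <= 0 or pitch > len(history):
        raise ValueError("pitch must be between 1 and len(history)")
    cycle = history[-pitch:]  # one pitch period just before the loss
    return [cycle[i % pitch] for i in range(loss_len)]
```

With a 47-sample pitch and a 20 msec loss at 8 kHz (160 samples), the cycle would be repeated a little more than three times.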
FIG. 14 is a flowchart showing the procedure of the operation of the voice data loss compensation processing unit VC (the voice data loss compensation processing). For convenience of explanation, a = 1 in the flowchart of FIG. 14. First, when the packet loss detection unit 22 detects a packet loss (step S1), the pitch detection unit 23b sets τ = 0 (step S2).
Next, the template setting unit 23a sets, from the received voice signal, a template TM whose time width depends on the value of τ (step S3). If τ < slide reference value, the template setting unit 23a sets the time width of the template TM to the initial time width; if τ ≥ slide reference value, it sets the time width of the template TM to N = τ.
Next, the pitch detection unit 23b sets the reference sampling point k so that k+1 is the starting point of the template TM, and assigns a sampling number to each sampling point (step S4).
Next, the pitch detection unit 23b calculates the correlation value between the template TM and the received voice signal using Equation (1) (step S5).
Next, the pitch detection unit 23b sets τ = τ + 1 (step S6). If τ ≥ slide reference value (step S7), that is, if the slide amount of the template TM has reached the slide reference value, the pitch detection unit 23b advances the process to step S8; if τ < slide reference value (step S7), it returns the process to step S5. By repeating steps S5 to S7, the template TM with the initial time width is slid toward the past over the received voice signal until the slide amount reaches the slide reference value.
In step S8, if τ < τmax, the process returns to step S3, and steps S3 to S8 are repeated until τ ≥ τmax. As a result, the time width of the template TM increases as the slide amount τ increases.
When τ ≥ τmax in step S8, the pitch detection unit 23b detects correlation peaks from the correlation values calculated in step S5, identifies the slide amount of the peak at which the correlation between the template TM and the received voice signal is strongest, and detects the pitch from that slide amount (step S9). When Equation (1) is used, the correlation peak with the smallest correlation value indicates the strongest correlation between the template TM and the received voice signal.
The pitch detection unit 23b may calculate the pitch by multiplying the identified slide amount by the sampling period of the voice signal.
Next, the compensation processing unit 24 extracts the received voice signal according to the pitch detected in step S9 and uses the extracted signal to compensate for the loss period (step S10).
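Steps S2 to S9 can be sketched roughly as below. Equation (1) is not reproduced in this passage, so the sketch assumes the usual AMDF form, a mean absolute difference between the template and the signal shifted by τ. Dividing by the template width (so that scores at different widths are comparable), the lower search bound `tau_min` (which avoids the trivial zero-lag match), and all function and parameter names are assumptions made for this illustration.

```python
import math

def amdf_pitch(signal, init_width, slide_ref, tau_min, tau_max, a=1.0):
    """Return the slide amount (in samples) with the smallest AMDF score.

    signal     : samples received up to the loss occurrence point RT
    init_width : initial time width of the template TM, in samples
    slide_ref  : slide reference value; beyond it the width grows with tau
    a          : width factor (1 <= a < 2); the flowchart uses a = 1
    """
    end = len(signal)  # the template ends at the loss occurrence point
    best_tau, best_score = None, math.inf
    for tau in range(tau_min, tau_max):
        # Fixed width at first, then a width that follows the slide amount.
        width = init_width if tau < slide_ref else int(a * tau)
        start = end - width
        if start - tau < 0:
            break  # not enough past samples for this slide amount
        score = sum(abs(signal[i] - signal[i - tau])
                    for i in range(start, end)) / width
        if score < best_score:  # smaller AMDF value = stronger correlation
            best_tau, best_score = tau, score
    return best_tau
```

Multiplying the returned slide amount by the sampling period gives the pitch as a time, as the text describes.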
In the description of FIG. 12, the template setting unit 23a sets a = 1, but this is not a limitation. The value of a may be set to a predetermined fixed value in the range 1 ≤ a < 2 until the slide amount of the template TM exceeds a predetermined change reference value; once the slide amount exceeds the change reference value, a may be gradually decreased toward 1 as the slide amount approaches the maximum slide amount (τmax). The slide reference value described above can be used as the change reference value, for example.
As a result, when the slide amount is small, the time width of the template TM can be set somewhat larger than the slide amount, and when the slide amount is large, the time width can be set to roughly the slide amount. This prevents the loss of correlation accuracy that would result from the template TM becoming too narrow at small slide amounts.
For the correlation calculation, a well-known method such as cross-correlation or the average square difference function may be used instead of the AMDF shown in Equation (1).
In this way, with the voice data loss compensation processing unit VC of this embodiment, a span of the received voice signal of a certain time width, going back from the packet loss occurrence point RT, is set as the template TM. The template TM is then slid over the received voice signal from the present toward the past, the correlation between the template TM and the received voice signal is obtained, and the pitch of the received voice signal is detected.
The time width of the template TM increases as the slide amount increases. Consequently, at a relatively early stage, while the slide amount is still small, there is a moment at which the template TM consists of the one pitch of received voice signal almost immediately before the present. At that moment a strong correlation peak appears between the template TM and the received voice signal. As the slide amount grows, the time width of the template TM grows with it and the template TM comes to contain multiple frequency components, so a correlation peak as strong as the one obtained at that moment can no longer be obtained. The pitch of the received voice signal almost immediately before the present can therefore be detected accurately.
As shown in FIG. 15, the fluctuation absorption processing unit JA includes a jitter buffer 30, a counting unit 31, a buffer size changing unit 32, a reception time recording unit 33, a reference value storage unit 34, a concealment processing unit 35, an output unit 36, and an observation history holding unit 37. These units are realized by the DSP of the call processing unit 2 executing the fluctuation absorption processing program in the second software. The jitter buffer 30 is shared with the jitter buffer 20 of the voice data loss compensation processing unit VC.
The reception time recording unit 33 records the time (time stamp) at which the transmission processing unit 7 received a voice packet (received voice packet), in association with the sequence number of that packet.
The jitter buffer 30 is, for example, a ring buffer and stores the packets received by the transmission processing unit 7 in chronological order. This absorbs the fluctuation in transmission delay of voice packets transmitted over the signal trunk line Ls. The size of the jitter buffer 30 is larger than the reference value described later.
The counting unit 31 calculates a packet count value by counting the number of packets stored in the jitter buffer 30 at a predetermined period (count period) that is no longer than the period at which voice is packetized (packetization period). The packet count value calculated by the counting unit 31 is held in the observation history holding unit 37. The observation history holding unit 37 is, for example, a volatile semiconductor memory and holds the past N packet count values (N is a positive integer) calculated by the counting unit 31.
FIG. 16 is an explanatory diagram of the packet count value calculation performed by the counting unit 31. As shown in FIG. 16, the counting unit 31 calculates the packet count value at the count period Tb.
For a packet PS received within the past packetization period Ta counted back from the calculation time Tk, which is the timing at which the packet count value is calculated, the counting unit 31 sets the count value to ΔT/Ta, where ΔT is the difference between the calculation time Tk and the reception time. For a packet PL received at or before the point one packetization period Ta before the calculation time Tk, it sets the count value to 1. The packet count value is calculated from these count values. In other words, the count value of a packet PS becomes smaller as its reception time approaches the calculation time Tk, because the difference ΔT becomes smaller.
Because the reception time of a packet PS is used in calculating the packet count value, that reception time must be retained. The reception time of a packet PL, by contrast, is not needed for the calculation and need not be kept.
Accordingly, when the packet count value calculation ends, the counting unit 31 deletes from the reception time recording unit 33 the reception times of packets received at or before the point that precedes the calculation time Tk by the difference between the packetization period Ta and the count period Tb (= Ta − Tb).
As a result, at time Tk+1, the next calculation time for the packet count value, the reception times of the packets received within the past packetization period Ta are still held in the reception time recording unit 33, so the counting unit 31 can obtain them at time Tk+1. This saves capacity in the reception time recording unit 33.
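The pruning rule can be illustrated with a small sketch. Representing the reception time recording unit 33 as a dictionary from sequence number to reception time, and the function name `prune_reception_times`, are assumptions made for the example.

```python
def prune_reception_times(reception_times, t_k, ta, tb):
    """Drop reception times no longer needed after the calculation at t_k.

    reception_times : dict mapping sequence number -> reception time
    t_k             : calculation time Tk
    ta, tb          : packetization period Ta and count period Tb (tb <= ta)
    """
    # At the next calculation time (t_k + tb), only packets received within
    # the last `ta` matter, i.e. packets received after t_k - (ta - tb).
    cutoff = t_k - (ta - tb)
    return {seq: t for seq, t in reception_times.items() if t > cutoff}
```

For instance, with Ta = 20 and Tb = 10 (hypothetical values in milliseconds) and Tk = 20, only the entries received after time 10 survive, which are exactly the ones that can still fall inside the window at Tk+1 = 30.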
The buffer size changing unit 32 reads the past N packet count values calculated by the counting unit 31 from the observation history holding unit 37 and takes the n-th smallest of them as the representative value of the packet count value. If the representative value is larger than a predetermined reference value, it deletes packets stored in the jitter buffer 30; if the representative value is smaller than the reference value, it inserts packets into the jitter buffer 30. The reference value is stored in the reference value storage unit 34.
When the representative value is smaller than the reference value, the buffer size changing unit 32 inserts packets into the jitter buffer 30 so that the representative value becomes at least the reference value and less than the reference value + 1. For example, if the representative value is 2.1 and the reference value is 4, two packets are inserted into the jitter buffer 30 so that the representative value becomes 4.1. When the representative value is larger than the reference value, the buffer size changing unit 32 deletes packets from the jitter buffer 30 so that the representative value becomes at least the reference value and less than the reference value + 1. For example, if the representative value is 4.2 and the reference value is 2, two packets are deleted from the jitter buffer 30 so that the representative value becomes 2.2.
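The number of packets to insert or delete follows directly from the target range. The sketch below is one way to compute it; the function name `packets_to_adjust` and the sign convention (positive for insertion, negative for deletion) are assumptions made for the example.

```python
import math

def packets_to_adjust(representative, reference):
    """Number of packets that brings the representative value into
    the range [reference, reference + 1).

    Returns a positive number to insert, a negative number to delete,
    and 0 when the representative value is already in range.
    """
    if representative < reference:
        # Smallest whole number of packets that lifts the value to >= reference.
        return math.ceil(reference - representative)
    if representative >= reference + 1:
        # Largest whole number of packets that keeps the value >= reference.
        return -math.floor(representative - reference)
    return 0
```

The two examples in the text come out as expected: a representative value of 2.1 against a reference value of 4 yields +2, and 4.2 against 2 yields −2.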
The value of n is preferably N × α rounded to an integer. The reference value is determined in advance from the call delay time that the interphone system for a residential complex allows in an interphone call (a call using packet transmission). If the number of packets stored in the jitter buffer 30 is larger than the reference value, the number of packets waiting for output in the jitter buffer 30 grows and call delay occurs. Therefore, when the representative value, which is the n-th smallest packet count value, is larger than the reference value as described above, deleting packets from the jitter buffer 30 prevents call delay.
Conversely, when the representative value, which is the n-th smallest packet count value, is smaller than the reference value, packets are inserted into the jitter buffer 30. This allows the probability that the number of stored packets falls to or below the reference value to be set to α (= n/N) %.
The concealment processing unit 35 performs packet loss concealment on invalid packets (packets that contain no voice; the same applies below) inserted into the jitter buffer 30, and also performs packet loss concealment when the jitter buffer 30 runs out of packets. As the packet loss concealment, the following method may be used, for example: the pitch of the received voice signal is detected from the received voice signal preceding the invalid packet; the voice waveform of the section one pitch back from the end of the received voice signal of the valid packet (a packet that contains voice; the same applies below) immediately before the invalid packet is extracted; and the waveform obtained by repeating the extracted waveform over one packetization period (for example, 20 msec) is generated as the received voice signal of the invalid packet. The pitch may be detected by the same method as the pitch detection in the voice data loss compensation processing described above.
When the number of packets stored in the jitter buffer 30 reaches or exceeds the reference value, the output unit 36 reads packets (received voice data) from the jitter buffer 30 in chronological order, in synchronization with the packetization period Ta, and outputs them to the signal path of the received voice signal. If a packet taken from the jitter buffer 30 is an invalid packet containing no voice, the output unit 36 has the concealment processing unit 35 perform packet loss concealment and outputs the resulting voice data.
The observation history holding unit 37 is, for example, a non-volatile semiconductor memory and holds the past N packet count values calculated by the counting unit 31.
FIG. 17 is a diagram for explaining the role of the jitter buffer 30. As shown in FIG. 17, packets containing the received voice signal are transmitted from the other party's call terminal (the lobby interphone LI, the management room device X, or the dwelling unit device of another dwelling unit) at the packetization period (20 msec in the illustrated example). FIG. 17 shows eight packets, numbered 1 to 8 (sequence numbers), being transmitted at 20 msec intervals.
Packets transmitted from the other party's call terminal are received by the dwelling unit device A via the signal trunk line Ls. Because many packets (voice packets, video packets, and control packets) are multiplexed over the signal trunk line Ls, the time a voice packet sent at the packetization period takes to reach the dwelling unit device A (the transmission delay) varies greatly from packet to packet, producing what is called transmission delay fluctuation. The dwelling unit device A therefore receives voice packets at unequal intervals.
The jitter buffer 30 is provided to absorb this transmission delay fluctuation. In FIG. 17, the buffer size of the jitter buffer 30 is three packets. The output unit 36 applies decoding and D/A conversion to the first packet and starts output at time T1, when the delay time Td has elapsed since the first packet was received.
In the case of FIG. 17, at time T2, the output time of the second packet 20 msec after time T1, the jitter buffer 30 already holds the second packet. The output unit 36 can therefore output the second packet at time T2.
The third packet, on the other hand, has an extremely large transmission delay and has not reached the dwelling unit device A at time T3, so the jitter buffer 30 has run out of packets. The output unit 36 therefore cannot output the third packet at time T3, and a sound dropout (missing voice data) occurs.
The third to seventh packets reach the dwelling unit device A in quick succession once the congestion clears. When the seventh packet arrives, the fifth and sixth packets are in the jitter buffer 30, but because the jitter buffer 30 still has free space, the seventh packet is stored rather than discarded. The seventh packet is therefore output from the output unit 36 at time T7.
Because the characteristics of transmission delay fluctuation change dynamically in this way, a jitter buffer 30 with a fixed buffer size has to be made sufficiently longer than the expected transmission delay fluctuation. Making both the buffer size of the jitter buffer 30 and the delay time Td sufficiently long prevents sound dropouts, but a long delay time Td increases the number of packets waiting for output in the jitter buffer 30 and causes call delay.
FIG. 18 shows an example of a transmission delay characteristic graph relating the transmission delay to its frequency of occurrence. In FIG. 18, the vertical axis is the frequency of occurrence and the horizontal axis is the transmission delay. FIG. 19 is a diagram for explaining the optimum buffer size of the jitter buffer 30. In FIG. 18, dmin is the minimum transmission delay and dmax is the maximum transmission delay. In FIG. 19, the transmission delay of the (k−1)-th packet is dmin, that of the k-th packet is d, and that of the (k+1)-th packet is dmax.
In this case, the optimum output waiting times for the output unit 36 are as follows: i) a packet that arrives with delay dmax is output immediately; ii) a packet that arrives with delay dmin is output after waiting dmax − dmin; iii) a packet that arrives with delay d is output after waiting dmax − d.
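The three cases reduce to a single rule, wait for dmax minus the packet's own delay, as the small sketch below shows. The function name `playout_wait` and the delay values used in the usage note are hypothetical.

```python
def playout_wait(delay, d_max):
    """Waiting time before output for a packet that arrived with `delay`.

    Every packet is effectively played out as if it had experienced the
    maximum delay d_max, so faster packets wait longer.
    """
    return max(0, d_max - delay)
```

With dmin = 10, d = 25 and dmax = 40 (illustrative values in milliseconds), the waiting times are 30, 15 and 0 respectively.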
To avoid running out of packets in the jitter buffer 30, the buffer size buf of the jitter buffer 30 should therefore satisfy buf ≥ dmax − dmin. However, if dmax of the transmission delay characteristic becomes extremely large, that is, if the tail at the right end of the graph in FIG. 18 becomes extremely long, the buffer size buf becomes large. In addition, as the graph in FIG. 18 shows, the frequency of occurrence falls as the transmission delay increases, so observing the true dmax would require observing the transmission delay of an enormous number of packets. For this reason, in the graph of FIG. 18, dmax is taken to be not the true dmax but the value that cuts off the top few percent of the transmission delay distribution. In this case, packets run out whenever a transmission delay larger than the value regarded as dmax occurs.
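As a rough illustration of regarding a trimmed value as dmax, the sketch below discards the largest few percent of observed delays and derives the smallest buffer span satisfying buf ≥ dmax − dmin. The function names, the integer `drop_percent` parameter, and the sample delays in the usage note are assumptions made for the example; the text does not specify how the cut-off is computed.

```python
def trimmed_d_max(delays, drop_percent):
    """Largest observed delay after discarding the top `drop_percent` percent."""
    if not delays:
        raise ValueError("delays must not be empty")
    ordered = sorted(delays)
    dropped = (len(ordered) * drop_percent) // 100  # whole samples to discard
    keep = max(1, len(ordered) - dropped)
    return ordered[keep - 1]

def min_buffer_span(delays, drop_percent):
    """Smallest buffer span satisfying buf >= dmax - dmin for the trimmed dmax."""
    return trimmed_d_max(delays, drop_percent) - min(delays)
```

For 100 observed delays of 1 to 100 ms, dropping the top 5 percent treats 95 ms as dmax and gives a span of 94 ms; any packet delayed by more than 95 ms would then find the buffer empty.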
To prevent packets from running out, it is therefore preferable to set the value regarded as dmax large. If that value is too large, however, the buffer size buf grows, the number of packets waiting for output in the jitter buffer 30 increases, and output delay results. In an interphone call using packet transmission, this output delay appears as call delay, so it should be kept as low as possible. The processing described above is executed to prevent packets from running out while also preventing call delay.
FIG. 20 is a flowchart showing the fluctuation absorption processing of the fluctuation absorption processing unit JA. First, in step S1, the counting unit 31 determines whether the count period Tb has elapsed since the packet count value was last calculated, so that the next calculation timing has arrived. If it determines that the calculation timing has arrived (YES in step S1), it counts the number of packets currently stored in the jitter buffer 30 (step S2). If it determines that the calculation timing has not arrived (NO in step S1), it returns the process to step S1.
Next, the counting unit 31 executes the packet count value calculation processing to calculate the packet count value (step S3).
FIG. 21 is a flowchart showing the details of the packet count value calculation processing. First, the counting unit 31 takes the current time as the calculation time of the packet count value (step S21). The control unit 1 of the dwelling unit device A has a clock function, which is used to determine the calculation time.
Next, among the packets stored in the jitter buffer 30, the counting unit 31 identifies the reception time of each packet received within the past packetization period Ta counted back from the calculation time Tk, as shown in FIG. 16 (step S22). Here, the counting unit 31 identifies the reception time of each packet by identifying the sequence number associated with the reception time recorded in the reception time recording unit 33.
Next, for each packet received within the past packetization period Ta counted back from the calculation time Tk, the counting unit 31 calculates the difference ΔT between the calculation time Tk and the reception time (step S23). It then calculates ΔT/Ta for each of these packets and sets ΔT/Ta as the count value of that packet (step S24).
Next, for the packets stored in the jitter buffer 30 that were received at or before the point one packetization period Ta before the calculation time Tk, the counting unit 31 sets the count value to 1 (step S25).
Next, the counting unit 31 calculates the packet count value by counting the packets stored in the jitter buffer 30 using the count values set in steps S24 and S25 (step S26). For example, if one packet was received at or before the point one packetization period Ta before the calculation time Tk, and two packets were received within the past packetization period Ta, with reception times Ti and Tj, the packet count value is 1 + (Tk − Ti)/Ta + (Tk − Tj)/Ta.
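Steps S23 to S26 amount to a weighted count, which can be sketched as below. The function name `packet_count_value`, its argument layout, and the numeric values in the usage note are assumptions made for the example.

```python
def packet_count_value(n_older, recent_times, t_k, ta):
    """Packet count value at calculation time t_k.

    n_older      : number of buffered packets received at least `ta` before t_k
                   (each counts as 1; their reception times are not needed)
    recent_times : reception times of buffered packets received within the
                   last packetization period `ta`
    """
    total = float(n_older)
    for t_rx in recent_times:
        delta_t = t_k - t_rx               # difference between Tk and reception time
        total += min(delta_t / ta, 1.0)    # fractional count, capped at 1
    return total
```

With Ta = 20, Tk = 25, one older packet, and two recent packets received at Ti = 15 and Tj = 18 (hypothetical times in milliseconds), the result is 1 + 10/20 + 7/20 = 1.85, matching the form of the expression in the text.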
Next, the counting unit 31 deletes from the reception time recording unit 33 the reception times of packets received at or before the point Ta − Tb before the calculation time Tk (step S27).
Returning to the flowchart of FIG. 20, in step S4 the counting unit 31 stores the packet count value at the calculation time Tk in the observation history holding unit 37. At this point, the counting unit 31 deletes the oldest packet count value from the observation history holding unit 37 so that the number of packet count values held there is N.
Next, the buffer size changing unit 32 identifies the n-th smallest of the N packet count values stored in the observation history holding unit 37 as the representative value (step S5).
FIG. 22 is a schematic diagram showing the relationship between the packet count value and its calculation time; the vertical axis is the packet count value and the horizontal axis is the calculation time. In FIG. 22, N = 9 and n = 3. Because the packet count value at time Tk−7, second from the left in FIG. 22, is the third smallest, the buffer size changing unit 32 identifies the packet count value at time Tk−7 as the representative value.
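Selecting the representative value is an order-statistic lookup, as the short sketch below shows. The function name `representative_value` and the nine sample values in the usage note are made up for illustration and are not read from FIG. 22.

```python
def representative_value(history, n):
    """Return the n-th smallest packet count value (n is 1-based)."""
    if not 1 <= n <= len(history):
        raise ValueError("n must be between 1 and len(history)")
    return sorted(history)[n - 1]
```

For a history of N = 9 values such as `[3.2, 2.9, 4.1, 3.8, 2.4, 2.1, 3.5, 4.4, 3.0]`, ordered from Tk−8 to Tk, n = 3 selects 2.9, the second entry from the left, which mirrors the situation described for time Tk−7.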
Next, the buffer size changing unit 32 determines whether the representative value is larger than the reference value. If representative value ≥ reference value + 1 (YES in step S6), it deletes from the jitter buffer 30 the number of packets that brings the representative value to at least the reference value and less than the reference value + 1 (step S7).
Next, the buffer size changing unit 32 subtracts the number of packets deleted in step S7 from each of the N packet count values held in the observation history holding unit 37, thereby updating the N packet count values and the observation history (step S8). For example, if one packet was deleted, 1 is subtracted from all N packet count values. The deletion of packets from the jitter buffer 30 is thus reflected in the observation history.
If, in step S6, the representative value is less than the reference value + 1 (NO in step S6) and is at least the reference value (NO in step S9), the buffer size changing unit 32 neither deletes packets from nor inserts packets into the jitter buffer 30 (step S10).
If representative value < reference value (YES in step S9), the buffer size changing unit 32 inserts into the jitter buffer 30 as many packets as are needed to bring the representative value to at least the reference value and below the reference value + 1 (step S11).
Next, the buffer size changing unit 32 adds the number of packets inserted in step S11 to each of the N packet count values held in the observation history holding unit 37, thereby updating the N packet count values and the observation history (step S12). For example, if one packet was inserted, 1 is added to all N packet count values. In this way, the insertion of packets into the jitter buffer 30 is reflected in the observation history.
When the processing of step S8, S10 or S12 ends, the processing returns to step S1, and when the next packet count value calculation time arrives, the processing from step S2 onward is executed.
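The decision made in steps S6 to S12 can be sketched as follows. This is a minimal illustration, not the implementation described in the specification: the function names and the return format are hypothetical, and the number of packets to delete or insert is derived here as floor(representative − reference) and ceil(reference − representative), which is one way to satisfy the condition that the representative value ends up at or above the reference value and below the reference value + 1.

```python
import math

def select_representative(history, n):
    """Return the n-th smallest packet count value (n is 1-based)."""
    return sorted(history)[n - 1]

def adjust_buffer(history, n, reference):
    """Decide how many packets to delete or insert and update the history.

    Returns (action, count, new_history), where action is "delete",
    "insert" or "none". The names and return format are illustrative.
    """
    rep = select_representative(history, n)
    if rep >= reference + 1:                       # step S6: YES
        count = math.floor(rep - reference)        # step S7 (assumed formula)
        return "delete", count, [v - count for v in history]  # step S8
    if rep < reference:                            # step S9: YES
        count = math.ceil(reference - rep)         # step S11 (assumed formula)
        return "insert", count, [v + count for v in history]  # step S12
    return "none", 0, list(history)                # step S10
```

With N = 9 and n = 3 as in FIG. 22, a call such as `adjust_buffer(history, 3, reference)` shifts every stored packet count value by the same amount, so the representative value of the updated history falls in the target range.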
FIG. 23A is a schematic diagram showing the processing performed by the buffer size changing unit 32 when inserting a packet, and FIG. 23B is a schematic diagram showing the processing performed when deleting a packet. In the example of FIG. 23A, the buffer size changing unit 32 inserts an invalid packet between the fourth and fifth packets, both of which are valid packets. In the example of FIG. 23B, the buffer size changing unit 32 overlap-adds the fourth and fifth packets, both of which are valid packets, so that two packet lengths become one packet length, thereby deleting one packet.
Thus, in the fluctuation absorption processing unit JA, a packet count value is calculated from the number of packets stored in the jitter buffer 30, and the n-th smallest of the past N packet count values is specified as the representative value. If the representative value is larger than the reference value, packets are deleted from the jitter buffer 30. Consequently, when the history of packet count values shows that the number of packets stored in the jitter buffer 30 tends to exceed the reference value and output delay occurs, packets are deleted from the jitter buffer 30 and the output delay is reduced. Conversely, when the history shows that the number of stored packets tends to be below the reference value and packet depletion is likely, packets are inserted into the jitter buffer 30, so packet depletion can be prevented.
Next, another method of calculating the packet count value in the fluctuation absorption processing will be described. In this method, only the reception time of the latest packet is recorded in the reception time recording unit 33.
The count unit 31 sets the count value of the latest packet to ΔT/Ta, where ΔT is the difference between the calculation time Tk and the reception time of the latest packet, sets the count value of every other packet to 1, and calculates the packet count value from these count values.
As shown in FIG. 24, when the jitter buffer 30 holds packets that were received within one packetization period Ta before the calculation time Tk, the count unit 31 identifies, among those packets, the packet PS with the latest reception time and sets the count value of the latest packet PS to ΔT/Ta. For the packets PL1 and PL2 stored in the jitter buffer 30 other than the latest packet PS, the count unit 31 uniformly sets the count value to 1. In this case, the count unit 31 only needs to know the reception time of the latest packet PS among the packets received within the packetization period Ta before the calculation time Tk, so after the packet count value calculation processing ends, it deletes the reception record held in the reception time recording unit 33.
The packet count value calculation processing described above will now be explained in detail with reference to the flowchart of FIG. 25. Steps S31, S33, S34 and S36 in FIG. 25 are the same as steps S21, S23, S24 and S26 in FIG. 21, so their description is omitted. In step S32 of FIG. 25, the count unit 31 identifies the reception time of the latest packet among the packets in the jitter buffer 30 that were received within the packetization period Ta before the calculation time Tk. For packets other than the latest packet, the count unit 31 uniformly sets the count value to 1 (step S35). Then, in step S37, the count unit 31 deletes the reception time of the latest packet from the reception time recording unit 33.
When the packet count value is calculated by this method, the reception time needs to be recorded only for the latest packet, so the capacity of the reception time recording unit 33 can be reduced further.
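As a rough sketch of this alternative calculation, the function below counts the latest packet as ΔT/Ta and every other stored packet as 1. The function name and arguments are hypothetical. It also assumes that the latest received packet is still in the buffer, and that when no packet arrived within Ta before Tk every stored packet simply counts as 1; the text does not spell out that second case.

```python
def packet_count_latest_only(num_stored, calc_time, latest_rx_time, ta):
    """Packet count value using only the latest packet's reception time.

    num_stored      -- number of packets currently in the jitter buffer
    calc_time       -- calculation time Tk
    latest_rx_time  -- reception time of the latest packet, or None
    ta              -- packetization period Ta (same time unit as above)
    """
    if num_stored == 0:
        return 0.0
    if latest_rx_time is not None:
        delta_t = calc_time - latest_rx_time
        if 0 <= delta_t <= ta:
            # Latest packet counts as delta_t / Ta, all others count as 1.
            return (num_stored - 1) + delta_t / ta
    # Assumption: no packet arrived within Ta, so every packet counts as 1.
    return float(num_stored)
```

For example, with three stored packets, Ta = 20 and the latest packet received 10 time units before Tk, the packet count value is 2 + 10/20 = 2.5.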
In voice transmission by packet transmission, packets can suddenly stall on the transmission path, and the resulting spike-like delay variation (spike delay) can cause sound interruptions of 500 msec or more. It is therefore preferable for the fluctuation absorption processing unit JA to determine whether a spike delay has occurred and, if it has, to shorten the window of past packet count values that is referenced and to calculate the representative value from the packet count values within the shortened window.
To this end, the count unit 31 stores each calculated packet count value in the observation history holding unit 37 together with an index indicating its position in the time series. Specifically, because the observation history holding unit 37 holds the past N packet count values, the count unit 31 indexes them so that the index increases as the calculation time becomes more recent: the latest packet count value has index N and the oldest has index 1. The count unit 31 also determines, based on the past N packet count values held in the observation history holding unit 37, whether a spike delay has occurred, and if it determines that one has occurred, it extracts the most recent M (M < N) packet count values from the past N packet count values.
The count unit 31 determines whether a spike delay has occurred as follows. FIG. 26 is a graph for explaining the spike delay determination processing. In FIG. 26, the vertical axis shows the packet count value, and the horizontal axis shows the index. Here, N = 100.
First, the count unit 31 identifies the packet count values that are equal to or less than the reference value. In the example of FIG. 26, the packet count values at points PP1 to PP6 are equal to or less than the reference value. Next, among those packet count values, the count unit 31 identifies the point with the smallest index, that is, the oldest point, and the point with the largest index, that is, the latest point. In the example of FIG. 26, the count unit 31 identifies points PP1 and PP6.
Next, the count unit 31 obtains the difference ΔI between the smallest index and the largest index. If the difference ΔI is smaller than a predetermined threshold, the count unit 31 determines that a spike delay has occurred; if the difference ΔI is larger than the threshold, it determines that no spike delay has occurred.
FIG. 27 is a graph showing the relationship between the packet count value and the index when a spike delay has occurred. In FIG. 27, the vertical axis shows the packet count value, and the horizontal axis shows the index. In the example of FIG. 27, the packet count values at points PP1 to PP5 are equal to or less than the reference value. Point PP1 has the smallest index, point PP5 has the largest index, and the difference ΔI between the two indices is smaller than the threshold. The count unit 31 therefore determines that a spike delay has occurred.
When the count unit 31 determines that a spike delay has occurred, as in FIG. 27, it extracts the M most recent packet count values counted back from the calculation time Tk. Here, M can be the value obtained by multiplying ΔI by a predetermined coefficient β (0 < β ≦ 1), that is, β·ΔI, rounded to an integer.
The buffer size changing unit 32 then specifies the m-th smallest of the past M packet count values as the representative value. After that, the buffer size changing unit 32 compares the representative value with the reference value and inserts packets into or deletes packets from the jitter buffer 30. Here, m can be the value of M × α rounded to an integer.
Thus, when a spike delay occurs, the window of past packet count values that is referenced is narrowed before packets are inserted into or deleted from the jitter buffer 30. The representative value can therefore be calculated in a way that excludes the effect of spike delays, which occur only rarely.
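The spike delay determination and the narrowed window can be sketched as below. Several details are assumptions made for illustration only: the function names are hypothetical; a history with no value at or below the reference value is treated as "no spike"; M and m are clamped to at least 1; Python's built-in `round` is used as the rounding rule; and the rank outside a spike is taken as N × α in the same way as m = M × α.

```python
def detect_spike(history, reference, threshold):
    """history[0] has index 1 (oldest); history[-1] has index N (latest).

    Returns (spike_detected, delta_i); delta_i is None when no packet
    count value is at or below the reference value.
    """
    low = [i for i, v in enumerate(history, start=1) if v <= reference]
    if not low:
        return False, None          # assumption: nothing to evaluate
    delta_i = max(low) - min(low)
    return delta_i < threshold, delta_i

def representative_value(history, reference, threshold, alpha, beta):
    """Pick the representative value, narrowing the window on a spike."""
    spike, delta_i = detect_spike(history, reference, threshold)
    window = list(history)
    if spike:
        m_window = max(1, min(len(history), round(beta * delta_i)))
        window = history[-m_window:]            # most recent M values
    rank = max(1, round(len(window) * alpha))   # m = M x alpha, rounded
    return sorted(window)[rank - 1]
```

In a history where a short burst of low values sits close together, the narrowed window drops the burst, so the representative value reflects the recent steady state rather than the spike.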
In addition, when the number of stored packets is 0 at consecutive calculation times, the fluctuation absorption processing unit JA preferably calculates the packet count value as follows.
Specifically, when the number of stored packets is 0 at consecutive calculation times, the count unit 31 calculates, as the packet count value, a negative value whose absolute value increases as the number of consecutive occurrences increases.
FIGS. 28A and 28B are diagrams for explaining this processing of the count unit 31. In FIG. 28A, in each interval of the count cycle Tb, a packet is received immediately after each of the packet count value calculation times Tk-4, Tk-3, Tk-2 and Tk-1. In each interval, the output unit 36 reads the packet (received voice data) from the jitter buffer 30 after the packet is received and before the next calculation time Tk-3, Tk-2, Tk-1 or Tk passes. For example, the packet received immediately after the calculation time Tk-4 is read before the next calculation time Tk-3 passes. As a result, the number of packets stored in the jitter buffer 30 is 0 at each of the calculation times Tk-4, Tk-3, Tk-2, Tk-1 and Tk, and the count unit 31 would calculate a packet count value of 0 at each of these times.
In FIG. 28B, on the other hand, one packet is received shortly before the calculation time Tk-4, and no packet is received after that. The packet received shortly before the calculation time Tk-4 is read between the calculation time Tk-4 and the next calculation time Tk-3. In this case, the number of stored packets is 1 at the calculation time Tk-4 but 0 at each of the other calculation times Tk-3, Tk-2, Tk-1 and Tk, so the count unit 31 would again calculate a packet count value of 0 at each of the calculation times Tk-3, Tk-2, Tk-1 and Tk.
However, the state of the signal trunk line Ls differs greatly between FIG. 28A and FIG. 28B. In FIG. 28A, packets arrive at the dwelling unit device A regularly, so the output unit 36 can output continuously. In FIG. 28B, packets do not arrive at the dwelling unit device A regularly, so the output unit 36 cannot output continuously.
To distinguish these cases, the count unit 31 performs the following processing. First, it compares the difference between the calculation time (current time) and the reception time of the latest packet with the count cycle Tb. If the difference is smaller than the count cycle Tb, the count unit 31 judges that the situation is that of FIG. 28A and ends the processing. If the difference is larger than the count cycle Tb, it judges that no packet has been received since the previous calculation time, that is, that the situation is that of FIG. 28B, and performs the following processing. As shown in FIG. 28B, the number of stored packets is 0 at the calculation time Tk-3 and is again 0 at the calculation time Tk-2, so at the calculation time Tk-2 the number of consecutive occurrences of a stored packet count of 0 is one. In this case, the count unit 31 calculates 0 as the packet count value at the calculation time Tk-2.
At the calculation time Tk-1, the number of consecutive occurrences of a stored packet count of 0 is two. The count unit 31 therefore subtracts 1 from the number of consecutive occurrences, 2, multiplies the result by -1, and calculates the resulting value, -1, as the packet count value at the calculation time Tk-1. At the calculation time Tk, the number of consecutive occurrences is three, so the count unit 31 subtracts 1 from 3, multiplies the result by -1, and calculates the resulting value, -2, as the packet count value at the calculation time Tk. In general, the count unit 31 calculates (number of consecutive occurrences - 1)·(-1) as the packet count value.
This makes it possible to calculate the packet count value in a way that distinguishes the case of FIG. 28A, in which packets are received regularly but the number of stored packets happens to be 0 at the calculation times, from the case of FIG. 28B, in which packets are not received regularly. Consequently, in the case of FIG. 28B, packets are less likely to be deleted from the jitter buffer 30 than in the case of FIG. 28A.
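The following sketch reproduces the values given for FIG. 28B, namely 0 at Tk-3, 0 at Tk-2, -1 at Tk-1 and -2 at Tk. The class and attribute names are hypothetical, and two points are assumptions: the consecutive count is reset whenever a packet is stored or the FIG. 28A situation is detected, and a difference exactly equal to Tb is treated like the FIG. 28B case.

```python
class ZeroRunCounter:
    """Tracks consecutive zero-packet observations with no recent arrival."""

    def __init__(self, tb):
        self.tb = tb              # count cycle Tb
        self.prev_zero = False    # previous observation was a FIG. 28B zero
        self.consecutive = 0      # consecutive occurrences, as in the text

    def update(self, stored, calc_time, latest_rx_time):
        """Return the packet count value for this calculation time."""
        if stored > 0:
            self.prev_zero = False
            self.consecutive = 0
            return stored
        if calc_time - latest_rx_time < self.tb:
            # FIG. 28A: packets still arrive regularly; plain zero.
            self.prev_zero = False
            self.consecutive = 0
            return 0
        # FIG. 28B: no packet since the previous calculation time.
        if self.prev_zero:
            self.consecutive += 1
        self.prev_zero = True
        # (consecutive occurrences - 1) * (-1), never above zero.
        return -max(0, self.consecutive - 1)
```

Feeding the counter the FIG. 28B sequence (one stored packet at Tk-4, then zeros with no further arrivals) produces 1, 0, 0, -1, -2, while the FIG. 28A sequence produces 0 at every calculation time.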
Next, the processing for inserting packets into or deleting packets from the jitter buffer 30 will be described in detail. When deleting one packet from the jitter buffer 30, if there are two or more consecutive valid packets containing voice, the buffer size changing unit 32 overlap-adds two consecutive valid packets located in the middle of that run of valid packets, thereby deleting one packet.
FIGS. 29A, 29B and 29C are explanatory diagrams of the processing in which the buffer size changing unit 32 deletes one packet by overlap addition. FIG. 29A shows the jitter buffer 30 before deletion, and FIG. 29B shows the jitter buffer 30 after deletion.
The read pointer RP shown in FIGS. 29A, 29B and 29C indicates the start address of the jitter buffer 30, which has a ring buffer structure, and the write pointer WP indicates the end address of the jitter buffer 30. In FIG. 29, each cell represents one packet, and the number in each cell indicates the position of the packet in the time series. White cells represent invalid packets, and gray cells represent valid packets.
In the case of FIG. 29A, the overlap addition is applied not to the section made up of the first and second valid packets but to the fifth and sixth valid packets, which lie within the section made up of the fourth to seventh valid packets. As shown in FIG. 29B, these two packets are combined into one packet by overlap addition, so that one packet is deleted.
If overlap addition were performed in the section of the first and second valid packets shown in FIG. 29A, the packet generated by the overlap addition would be followed by an invalid packet, and the voice degradation caused by packet loss concealment processing could become large. If the fifth and sixth valid packets are overlap-added instead, the packets before and after the resulting packet are both valid packets, so the voice degradation caused by packet loss concealment processing can be kept small.
In other words, one packet can be deleted by overlap addition wherever two or more valid packets are consecutive, but performing the overlap addition in a section with a larger number of consecutive valid packets results in less voice degradation when packet loss concealment processing is performed.
Therefore, when the jitter buffer 30 contains a plurality of sections of consecutive valid packets, the overlap addition is performed using valid packets in the middle of the section with the larger number of consecutive valid packets.
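One way to locate the two packets to overlap-add is sketched below, assuming the buffer contents are available as a list of booleans (True for a valid packet) ordered from the read pointer. The function name is hypothetical, and how ties between equally long sections and odd-length sections are resolved is an assumption, since the text does not specify it.

```python
def middle_pair_of_longest_run(valid):
    """Return 0-based indices (i, i + 1) of the middle pair of valid packets
    in the longest run of consecutive valid packets, or None if no run has
    at least two packets. On ties the earliest run wins (assumption)."""
    best_start, best_len = None, 0
    start = None
    # A trailing False closes a run that extends to the end of the buffer.
    for i, flag in enumerate(list(valid) + [False]):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            length = i - start
            if length > best_len:
                best_start, best_len = start, length
            start = None
    if best_len < 2:
        return None
    first = best_start + (best_len - 2) // 2
    return first, first + 1
```

For the layout of FIG. 29A (packets 1 and 2 valid, packet 3 invalid, packets 4 to 7 valid), the function returns the 0-based indices 4 and 5, which correspond to the fifth and sixth packets.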
As the overlap addition, overlap addition using the triangular window functions RF1 and RF2 shown in FIG. 29C can be adopted. Specifically, the buffer size changing unit 32 applies window function processing using the triangular window function RF1 to the voice signal of the fifth packet, applies window function processing using the triangular window function RF2 to the voice signal of the sixth packet, adds the two windowed voice signals to generate one voice signal, and packetizes the result into a single packet.
As the triangular window function RF1, a linear function with a time width of 20 msec, a maximum value of 1 and a minimum value of 0, whose value decreases as time passes, can be adopted. As the triangular window function RF2, a linear function with a time width of 20 msec, a maximum value of 1 and a minimum value of 0, whose value increases as time passes, can be adopted.
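A minimal sketch of this overlap addition is shown below, treating each packet as a list of samples of equal length. The function name is hypothetical, and the number of samples per 20 msec packet depends on the sampling rate, which is not assumed here. RF1 falls linearly from 1 to 0 and RF2 rises linearly from 0 to 1 across the packet, so the two weights sum to 1 at every sample.

```python
def overlap_add(first, second):
    """Merge two equal-length packets of samples into one packet.

    `first` is weighted by a falling linear window (RF1) and `second`
    by a rising linear window (RF2); the weighted signals are summed.
    """
    if len(first) != len(second):
        raise ValueError("packets must have the same number of samples")
    n = len(first)
    if n < 2:
        raise ValueError("need at least two samples per packet")
    merged = []
    for i in range(n):
        rf2 = i / (n - 1)      # rises from 0 to 1
        rf1 = 1.0 - rf2        # falls from 1 to 0
        merged.append(rf1 * first[i] + rf2 * second[i])
    return merged
```

The merged packet starts with the waveform of the earlier packet and ends with the waveform of the later one, which avoids a discontinuity at either boundary.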
When deleting a packet from the jitter buffer 30, if an invalid packet inserted in the past is present, the buffer size changing unit 32 deletes that invalid packet.
FIGS. 30A and 30B are explanatory diagrams of the processing in which the buffer size changing unit 32 deletes one invalid packet. FIG. 30A shows the jitter buffer 30 before deletion, and FIG. 30B shows the jitter buffer 30 after deletion.
In FIG. 30A, the third and fourth packets are invalid packets. The buffer size changing unit 32 therefore deletes one packet by deleting either the third or the fourth packet. When the jitter buffer 30 contains a plurality of invalid packets, one invalid packet may, for example, be selected at random and deleted. Alternatively, when two or more invalid packets are consecutive, the buffer size changing unit 32 may preferentially extract the invalid packets in the consecutive region and randomly select one of the extracted invalid packets for deletion.
When inserting a packet into the jitter buffer 30, if two consecutive valid packets are present, the buffer size changing unit 32 inserts an invalid packet between those two valid packets.
FIGS. 31A and 31B are explanatory diagrams of the processing in which the buffer size changing unit 32 inserts one packet. FIG. 31A shows the jitter buffer 30 before insertion, and FIG. 31B shows the jitter buffer 30 after insertion.
As shown in FIGS. 31A and 31B, one invalid packet is inserted between the fifth valid packet and the sixth valid packet. This is because inserting the invalid packet between the fifth and sixth valid packets leaves a larger number of consecutive valid packets.
For example, even if the invalid packet were inserted between the first valid packet and the second valid packet, valid packets would exist before and after the inserted invalid packet, so packet loss concealment processing would still be possible.
In that case, however, the second valid packet would have invalid packets on both sides, and the number of consecutive valid packets would become small. If the invalid packet is inserted between the fifth valid packet and the sixth valid packet, all of the valid packets remain consecutive. When packet loss concealment processing is performed, voice degradation is smaller when the number of consecutive valid packets is larger. Therefore, when the jitter buffer 30 contains a plurality of sections of consecutive valid packets, the buffer size changing unit 32 inserts the invalid packet in the middle of the section with the larger number of consecutive valid packets.
In addition, an upper limit is set in advance on the number of packets that the buffer size changing unit 32 can insert or delete at one time.
FIGS. 32A and 32B are diagrams for explaining the processing when five packets are inserted into the jitter buffer 30 at one time. FIG. 32A shows the jitter buffer 30 before insertion, and FIG. 32B shows the jitter buffer 30 after insertion. In FIGS. 32A and 32B, five invalid packets are inserted between the first valid packet and the second valid packet. In this case, the invalid packets are consecutive, so voice degradation may increase. For this reason, an upper limit is placed on the number of invalid packets inserted. Here, "at one time" refers to a single execution of the processing that is performed each time the count cycle Tb described above arrives.
For example, if the upper limit is set to 3 in FIG. 32A, only three invalid packets are inserted even when five invalid packets need to be inserted.
This prevents the number of consecutive invalid packets from reaching or exceeding a certain number, so the voice degradation caused by packet loss concealment processing can be kept small.
Furthermore, if a valid packet corresponding to a deleted invalid packet is received after the invalid packet has been deleted, and another invalid packet is present besides the deleted one, the buffer size changing unit 32 replaces that other invalid packet with the received valid packet.
FIGS. 33A, 33B and 33C are diagrams for explaining the processing when a valid packet corresponding to a deleted invalid packet is received after the invalid packet has been deleted. FIG. 33A shows the jitter buffer 30 before deletion, FIG. 33B shows the jitter buffer 30 after deletion, and FIG. 33C shows the jitter buffer 30 after replacement.
As shown in FIGS. 33A and 33B, the third invalid packet is deleted. After that, as shown in FIG. 33C, the third valid packet, which corresponds to the third invalid packet, is received.
In this case, because the fourth packet, which follows the third invalid packet, is an invalid packet, the buffer size changing unit 32 replaces this fourth invalid packet with the received third valid packet. The third valid packet is thereby restored, and voice degradation can be reduced.
When a packet is stored in the jitter buffer 30, the buffer size changing unit 32 determines whether an invalid packet corresponding to the stored packet is held in the jitter buffer 30. If the corresponding invalid packet is held in the jitter buffer 30, the buffer size changing unit 32 determines whether another invalid packet is stored immediately after it. If so, it deletes that next invalid packet and inserts the received valid packet at the position of the deleted packet, thereby replacing the next invalid packet with the received valid packet.
On the other hand, if no invalid packet corresponding to the packet stored in the jitter buffer 30 is held in the jitter buffer 30, or if no invalid packet is stored immediately after the corresponding invalid packet, the buffer size changing unit 32 does not perform the replacement. The buffer size changing unit 32 may judge that a valid packet corresponding to an invalid packet has been received when a packet having the same sequence number as the invalid packet is stored in the jitter buffer 30.
In addition, when inserting a packet between two consecutive valid packets, the buffer size changing unit 32 may cause the concealment processing unit 35 to execute packet loss concealment using the preceding valid packet, thereby generating a concealed packet, and may insert this packet into the jitter buffer 30.
FIGS. 34A and 34B illustrate the processing in which the buffer size changing unit 32 inserts a concealed packet into the jitter buffer 30 in place of an invalid packet. FIG. 34A shows the jitter buffer 30 before insertion, and FIG. 34B shows the jitter buffer 30 after insertion.
As shown in FIGS. 34A and 34B, a concealed packet is inserted between the third valid packet and the fourth valid packet.
As a result, the output unit 36 no longer needs to execute packet loss concealment when it reads a packet (voice data) from the jitter buffer 30, which reduces the processing delay that packet loss concealment would otherwise add at output time.
When inserting an invalid packet, the buffer size changing unit 32 preferably inserts it between two consecutive packets that contain vowel sounds. The voice generated by executing packet loss concealment on the inserted invalid packet then joins continuously with the voice contained in the preceding and following packets, which reduces voice degradation.
FIG. 35 is a flowchart showing the deletion processing performed by the buffer size changing unit 32.
First, in step S51, the buffer size changing unit 32 determines whether the requested number of packet deletions is less than or equal to a predetermined maximum number of packet deletions (upper limit). If the requested number is less than or equal to the upper limit (YES in step S51), the deletion count value DN is set to the requested number (step S52). If the requested number is greater than the upper limit (NO in step S51), the deletion count value DN is set to the upper limit (step S53).
Next, if the maximum number of consecutive valid packets in the jitter buffer 30 is 2 or more ("2 or more" in step S54), the buffer size changing unit 32 determines whether this maximum consecutive number is at least twice the deletion count value DN (step S55). The comparison is made against twice DN because deleting one packet requires two packets to be overlap-added, so twice as many valid packets as the deletion count value DN are needed.
If the buffer size changing unit 32 determines that the maximum consecutive number is at least twice the deletion count value DN (YES in step S55), it deletes DN packets by overlap addition, subtracts the number of deleted packets from DN, and updates DN (step S58).
If, on the other hand, the maximum consecutive number is less than twice the deletion count value DN (NO in step S55), the buffer size changing unit 32 deletes as many packets as can be deleted by overlap addition, subtracts the number of deleted packets from DN, updates DN (step S56), and returns the processing to step S54.
For example, when the maximum consecutive number is 7 and DN × 2 is 8 (DN = 4), six of the seven consecutive valid packets are overlap-added in pairs, so that three packets are deleted. The deletion count value is then updated to DN = 1 (= 4 - 3).
If, in step S54, the maximum number of consecutive valid packets is 1 or less ("1 or less" in step S54), invalid packets are deleted, the number of deleted packets is subtracted from the deletion count value DN, and DN is updated (step S57).
For example, if the deletion count value DN is 4 and there are three invalid packets, the three invalid packets are deleted and DN is updated to 1 (= 4 - 3).
In step S59, the buffer size changing unit 32 determines whether the deletion count value DN has reached 0. If DN is 0 (YES in step S59), the processing ends.
If the deletion count value DN has not reached 0 (NO in step S59) and a valid packet exists (YES in step S60), the buffer size changing unit 32 deletes a valid packet and ends the processing (step S61). Because the valid packet deleted here is not adjacent to any other valid packet, it is simply deleted without overlap addition. If no valid packet exists (NO in step S60), the processing ends as is.
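The arithmetic of steps S51 to S58 can be sketched as below. This is only an illustrative reading of the flowchart text: the function names are hypothetical, and the sketch covers a single pass over one run of consecutive valid packets rather than the full loop of FIG. 35.

```python
from typing import Tuple


def clamp_request(requested: int, upper_limit: int) -> int:
    """Steps S51 to S53: cap the requested number at the upper limit."""
    return min(requested, upper_limit)


def overlap_add_deletions(max_run: int, dn: int) -> Tuple[int, int]:
    """One pass of steps S55, S56, S58 for a run of `max_run` consecutive
    valid packets. Returns (packets deleted, updated DN)."""
    if max_run < 2:
        # Overlap addition needs at least two adjacent valid packets.
        return 0, dn
    if max_run >= 2 * dn:
        deleted = dn             # S58: enough pairs to delete all DN packets
    else:
        deleted = max_run // 2   # S56: each pair of packets removes one packet
    return deleted, dn - deleted
```

For the worked example in the text, `overlap_add_deletions(7, 4)` pairs up six of the seven packets, deletes three, and leaves DN = 1.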
FIG. 36 is a flowchart showing the insertion processing performed by the buffer size changing unit 32.
First, in step S71, the buffer size changing unit 32 determines whether the requested number of packet insertions is less than or equal to a predetermined maximum number of packet insertions (upper limit). If the requested number of insertions is less than or equal to the maximum number of insertions (YES in step S71), the number of insertions is set to the requested number (step S72). If the requested number of insertions is greater than the maximum number of insertions (NO in step S71), the number of insertions is set to the maximum number of insertions (step S73).
Next, if the maximum number of consecutive valid packets in the jitter buffer 30 is 0 ("0" in step S74), the buffer size changing unit 32 inserts invalid packets, as many as the number of insertions, starting from the head of the jitter buffer 30 (step S75) and ends the processing.
If the maximum number of consecutive valid packets in the jitter buffer 30 is 2 or more ("2 or more" in step S74), the buffer size changing unit 32 inserts invalid packets, as many as the number of insertions, in the middle of the run of consecutive valid packets (step S76) and ends the processing.
If the maximum number of consecutive valid packets in the jitter buffer 30 is 1 ("1" in step S74), the buffer size changing unit 32 inserts invalid packets, as many as the number of insertions, immediately after the valid packet (step S77) and ends the processing.
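The choice of insertion position in steps S71 to S77 can be sketched as follows. The buffer is modeled here as a list of booleans (True for a valid packet, False for an invalid one); this model, the function names, and the tie-breaking rule (the first longest run is used) are assumptions made for illustration.

```python
from typing import List, Tuple


def longest_valid_run(buf: List[bool]) -> Tuple[int, int]:
    """Return (start index, length) of the first longest run of valid packets."""
    best_start, best_len, start = 0, 0, None
    for i, valid in enumerate(buf + [False]):  # sentinel closes a trailing run
        if valid and start is None:
            start = i
        elif not valid and start is not None:
            if i - start > best_len:
                best_start, best_len = start, i - start
            start = None
    return best_start, best_len


def insert_invalid(buf: List[bool], requested: int, max_insert: int) -> List[bool]:
    """Return a new buffer with invalid packets inserted per steps S71 to S77."""
    count = min(requested, max_insert)   # S71 to S73
    start, run = longest_valid_run(buf)  # S74
    if run == 0:
        pos = 0                          # S75: head of the buffer
    elif run == 1:
        pos = start + 1                  # S77: right after the valid packet
    else:
        pos = start + run // 2           # S76: middle of the run
    return buf[:pos] + [False] * count + buf[pos:]
```

With four consecutive valid packets and one requested insertion, the invalid packet lands between the second and third valid packets, so it has valid neighbors on both sides.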
Thus, when one packet is to be deleted from the jitter buffer 30, the two packets located in the middle of a run of two or more consecutive valid packets are overlap-added to generate a single packet. One packet is removed in this way while voice quality degradation is reduced.
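As a rough illustration of merging two packets into one by overlap addition, the sketch below uses a linear cross-fade. The cross-fade window is an assumption; this passage does not specify the weighting used, and the function name is hypothetical.

```python
from typing import List, Sequence


def overlap_add(first: Sequence[float], second: Sequence[float]) -> List[float]:
    """Merge two equal-length packets into one by fading the first packet out
    while fading the second packet in (linear weights, assumed here)."""
    if len(first) != len(second):
        raise ValueError("packets must have the same number of samples")
    n = len(first)
    if n == 1:
        return [(first[0] + second[0]) / 2]
    merged = []
    for i in range(n):
        w = i / (n - 1)  # weight rises from 0.0 to 1.0 across the packet
        merged.append(first[i] * (1 - w) + second[i] * w)
    return merged
```

The merged packet starts with the samples of the first packet and ends with those of the second, so its boundaries still match the neighboring packets.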
When a packet is inserted into the jitter buffer 30 and two consecutive valid packets exist, the invalid packet is inserted between them and is therefore flanked by two valid packets. When packet loss concealment is executed on this invalid packet, the packet can be concealed using the valid packets before and after it, so that voice continuity is preserved and the voice is reproduced smoothly.
The packet loss concealment performed by the concealment processing unit 35 of the fluctuation absorption processing unit JA may be replaced by the voice data loss compensation processing of the voice data loss compensation processing unit VC described above.
As described above, in the dwelling unit A of this embodiment, the call processing unit 2 executes the first software when the other party's call terminal uses the analog transmission method and executes the second software when it uses the packet transmission method, so that call processing suited to each transmission method is executed selectively. As a result, while circuit complexity and cost increases are kept down, the packet transmission method can be used for voice transmission via the signal trunk line Ls, the analog transmission method can be used for voice transmission in and around the dwelling that does not pass through the signal trunk line Ls, and call quality can be improved.
(Embodiment 2)
Embodiment 2 of the present invention is described in detail below with reference to FIGS. 37 and 38. For clarity, elements similar to those of the intercom system for an apartment house of Embodiment 1 are given the same reference signs, and their description is omitted.
Both the voice data loss compensation processing and the speech speed conversion processing in Embodiment 1 use the pitch of the voice, so each requires pitch detection processing that detects the pitch of the voice. However, if the program for voice data loss compensation and the program for speech speed conversion each include their own pitch detection program (program module), the memory into which the programs are loaded is consumed wastefully. This embodiment is therefore characterized in that the pitch detection program is made independent of the voice data loss compensation and speech speed conversion programs, and the pitch detected by the pitch detection processing is shared by both. This suppresses wasteful memory consumption.
The call processing unit 2 of this embodiment is described below. The speech speed conversion processing unit SE of this embodiment may instead execute processing other than speech speed conversion, such as voice quality conversion, voice section detection, voice enhancement, speaker identification, or voice recognition.
As shown in FIG. 37, the call processing unit 2 of this embodiment includes an acoustic echo canceller EC1, a voice switch VS, a voice data loss detection unit 15, a pitch detection unit 16, a voice data loss compensation processing unit VC, and a speech speed conversion processing unit SE. The voice data loss detection unit 15 detects loss of the voice data output from the transmission processing unit 7. When the voice data output from the jitter buffer of the transmission processing unit 7 is not continuous, it treats this as a loss of voice data and sets a detection flag. As described in Embodiment 1, the causes of voice data loss include packet loss, delay, and jitter (fluctuation) accompanying transmission.
Based on the detection flag from the voice data loss detection unit 15 and a counter inside the pitch detection unit 16, the pitch detection unit 16 detects the pitch of the voice from the voice data output from the voice data loss compensation processing unit VC (voice data that either has or has not been loss-compensated; the same applies hereinafter). One conceivable pitch detection method is to calculate the autocorrelation of the voice while varying the frame length and to estimate the frame length giving the highest correlation as the pitch of the voice. The voice data loss compensation processing unit VC compensates for the loss of voice data based on the pitch detected by the pitch detection unit 16 when the voice data loss detection unit 15 detects a loss of voice data (when the detection flag is set). Specifically, the voice data loss compensation processing unit VC extracts one pitch period of voice data from the past voice data held in a buffer and fills the gap with it so that the voice is not interrupted. If no voice data is missing, the voice data loss compensation processing unit VC outputs the input voice data as is, without loss compensation.
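The autocorrelation method and the one-pitch fill described above can be sketched as follows. This is a simplified illustration under stated assumptions: the function names are hypothetical, the correlation is normalized (a choice not stated in the text), and the lag search bounds stand in for a restricted pitch frequency range.

```python
import math
from typing import List, Sequence


def estimate_pitch_lag(samples: Sequence[float], min_lag: int, max_lag: int) -> int:
    """Return the lag (in samples) with the highest normalized autocorrelation."""
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        n = len(samples) - lag
        if n <= 0:
            break
        num = sum(samples[i] * samples[i + lag] for i in range(n))
        e0 = sum(samples[i] ** 2 for i in range(n))
        e1 = sum(samples[i + lag] ** 2 for i in range(n))
        denom = math.sqrt(e0 * e1)
        score = num / denom if denom > 0 else 0.0
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def fill_gap(history: Sequence[float], lag: int, gap_len: int) -> List[float]:
    """Fill a gap by repeating the last pitch period of the past voice data."""
    period = list(history[-lag:])
    return [period[i % lag] for i in range(gap_len)]
```

At an assumed sampling rate of 8 kHz, a lag of 40 samples corresponds to a pitch of 200 Hz, and limiting `min_lag` and `max_lag` limits the search to the pitch range of interest.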
The speech speed conversion processing unit SE converts the speech speed of the original voice by stretching or compressing the voice data output from the voice data loss compensation processing unit VC. For example, it converts the speech speed (faster or slower) by inserting or deleting waveforms in pitch units, based on a well-known speech speed conversion algorithm called PICOLA (Pointer Interval Controlled OverLap and Add). Each of these units is realized by causing a DSP (Digital Signal Processor) to execute a predetermined program.
If the voice data loss compensation processing unit VC and the speech speed conversion processing unit SE each performed pitch detection individually, the processing load would increase whenever the call processing unit 2 executed voice data loss compensation and speech speed conversion at the same time. In contrast, the call processing unit 2 of this embodiment has only one pitch detection unit 16, and both the voice data loss compensation processing unit VC and the speech speed conversion processing unit SE use the pitch detected by this common pitch detection unit 16. Sharing the detected pitch in this way suppresses the increase in processing load (the program processing load on the DSP) when voice data loss compensation and speech speed conversion are executed at the same time.
As shown in FIG. 38, the pitch detection unit 16 of this embodiment counts a predetermined detection cycle Tx and repeatedly detects the pitch in synchronization with the detection cycle Tx. When the voice data loss detection unit 15 detects a loss of voice data, the pitch detection unit 16 detects the pitch at the detection time t1 of the loss and restarts the count of the detection cycle Tx from that time t1. Because the pitch detection unit 16 repeatedly detects the pitch in synchronization with a fixed detection cycle Tx, the difference between the pitch of the voice section on which the speech speed conversion processing unit SE executes speech speed conversion and the pitch detected by the pitch detection unit 16 is reduced, so that the quality of the voice after speech speed conversion is maintained. The detection cycle Tx is desirably set to a time over which the voice can be regarded as stationary, for example about 10 milliseconds.
Voice data loss compensation, on the other hand, must compensate a longer section than speech speed conversion and therefore requires more accurate pitch detection. Accordingly, when the voice data loss detection unit 15 detects a loss of voice data, the pitch detection unit 16 detects the pitch immediately, regardless of the detection cycle Tx, which maintains the quality of the voice data loss compensation performed by the voice data loss compensation processing unit VC.
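The detection timing described for FIG. 38 (periodic detection every Tx, plus an immediate detection and a restarted count when a loss is detected) can be simulated with the sketch below. The 1 ms time step, the function name, and the assumption that the first detection occurs at time 0 are all illustrative choices.

```python
from typing import Iterable, List


def detection_times(duration_ms: int, tx_ms: int, loss_times_ms: Iterable[int]) -> List[int]:
    """Return the times (ms) at which pitch detection runs.

    Detection runs every tx_ms; a detected loss triggers detection at once
    and restarts the count of the detection cycle from that time."""
    losses = set(loss_times_ms)
    times = []
    next_due = 0
    for t in range(duration_ms):
        if t in losses or t == next_due:
            times.append(t)
            next_due = t + tx_ms  # count restarts from this detection
    return times
```

For a 10 ms cycle with a loss detected at 23 ms, detection runs at 0, 10, 20, 23, 33, and 43 ms: the loss both triggers an extra detection and shifts the following cycle.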
The pitch detection unit 16 desirably detects only pitches within a predetermined frequency range. Because the frequency of the voice waveform in an ordinary voice call falls within a range from a hundred and several tens of hertz to a thousand and several hundred hertz, detecting only pitches within that range avoids pitch detection in unneeded frequency ranges and thus reduces the processing load.
The speech speed conversion processing unit SE desirably detects the voice sections of the voice data and converts the speech speed of only the voice data in those sections. Not performing speech speed conversion in sections other than voice sections (for example, silent sections) reduces the processing load of speech speed conversion.
(Embodiment 3)
Embodiment 3 of the present invention is described in detail below with reference to FIGS. 39A to 42. For clarity, elements similar to those of the intercom system for an apartment house of Embodiment 2 are given the same reference signs, and their description is omitted.
The voice data loss detection unit 15 of this embodiment detects loss of voice data in synchronization with the input timing of the voice data and with a first time interval T1 (= τ/m), obtained by dividing the time length τ of one packet of voice data by a positive integer m. The pitch detection unit 16 of this embodiment detects the pitch in synchronization with the first time interval T1 and with a detection cycle Tx (= n × τ/m), obtained by multiplying the first time interval T1 by a positive integer n.
The execution timing of voice data loss detection and pitch detection when m = n = 4 is described with reference to FIGS. 39A and 39B. As shown in FIG. 39A, the voice data loss detection unit 15 and the pitch detection unit 16 execute voice data loss detection and pitch detection, respectively, at intervals of τ/4. If, as shown in FIG. 39B, the start of speech speed conversion is instructed at time t = t0, the speech speed conversion processing unit SE executes speech speed conversion using the latest pitch detected by the pitch detection unit 16 immediately before that time (t = t0).
Synchronizing the timing of voice data loss detection with the timing of pitch detection in this way has the advantage of simplifying control of the timing at which the pitch detection unit 16 executes pitch detection.
As shown in FIG. 40, if a loss of voice data has been detected at the time the start of speech speed conversion is instructed (time t = t0), the speech speed conversion processing unit SE can suppress the voice quality degradation caused by speech speed conversion by performing the conversion with the pitch that the pitch detection unit 16 detected immediately before the loss was detected.
Alternatively, as shown in FIG. 41, if a loss of voice data has been detected at the time the start of speech speed conversion is instructed (time t = t0), the speech speed conversion processing unit SE may perform speech speed conversion using the pitch that the pitch detection unit 16 detects from the voice data compensated by the voice data loss compensation processing unit VC. In this way, even when speech speed conversion starts while voice data is missing, the pitch detection unit 16 need only execute pitch detection at the fixed detection cycle Tx, which has the advantage of simplifying control of the timing at which the pitch detection unit 16 executes pitch detection.
Consider a case in which the dwelling unit A of this embodiment has a recording unit (not shown) capable of recording the voice data output from the voice data loss compensation processing unit VC, and the speech speed conversion processing unit SE applies speech speed conversion to the recorded voice data. When a recording is played back, intelligibility improves further if speech speed conversion is applied to non-voice sections as well as voice sections. During a normal call, by contrast, applying speech speed conversion to non-voice sections as well increases the delay caused by the conversion and hinders natural conversation. When speech speed conversion is applied to non-voice sections in this way, the detection cycle Tx2 in non-voice sections is desirably made longer than the detection cycle Tx1 in voice sections (Tx1 < Tx2), as shown in FIG. 42. Pitch detection is then performed at the relatively short detection cycle Tx1 in voice sections, which secures the quality of speech speed conversion, and at the relatively long detection cycle Tx2 in non-voice sections, which reduces the processing load.
Although the present invention has been described with reference to several preferred embodiments, various modifications and variations can be made by those skilled in the art without departing from the true spirit and scope of the invention, that is, the claims.

Claims (27)

  1.  A dwelling unit for an intercom system for an apartment house, the intercom system having a shared unit device installed at a common entrance of the apartment house, dwelling units installed in the respective dwellings of the apartment house, a door phone slave unit installed at an outer entrance of the apartment house, a signal trunk line connected to the shared unit device, dwelling unit lines branched from the signal trunk line and connected to the respective dwelling units, and a slave unit connection line connecting the dwelling unit and the door phone slave unit, wherein call voice is transmitted between the shared unit device and the dwelling units, and between the dwelling units, by a packet transmission method via the signal trunk line and the dwelling unit lines, and call voice is transmitted between the dwelling unit and the door phone slave unit by an analog transmission method via the slave unit connection line,
     the dwelling unit comprising: a microphone and a speaker; a transmission processing unit that transmits, via the dwelling unit line and the signal trunk line, voice packets containing voice data for calls and control packets containing control data for call control; an analog signal transmission unit that transmits analog voice signals via the slave unit connection line; a first conversion processing unit that converts the analog voice signal output from the microphone into voice data, and converts voice data into an analog voice signal and outputs it to the speaker; a second conversion processing unit that converts the analog voice signal received by the analog signal transmission unit into voice data, and converts voice data into an analog voice signal and outputs it to the analog signal transmission unit; a call processing unit that performs predetermined call processing on voice data; a door phone call detection unit that detects a call from the door phone slave unit; a storage unit that stores first software for call processing of voice data transmitted by the analog transmission method and second software for call processing of voice data transmitted by the packet transmission method; and a control unit that instructs the call processing unit to execute call processing,
     wherein the control unit instructs the call processing unit to execute the first software when the door phone call detection unit detects the call, and instructs the call processing unit to execute the second software when control data for call control is received from the shared unit device or a dwelling unit.
  2.  The dwelling unit for an intercom system for an apartment house according to claim 1, wherein the second software includes a program for acoustic echo suppression processing that suppresses acoustic echo caused by acoustic coupling between the microphone and the speaker, and a program for residual echo suppression processing that suppresses residual echo that the acoustic echo suppression processing cannot fully suppress.
  3.  The dwelling unit for an intercom system for an apartment house according to claim 1 or 2, wherein the second software includes a program for fluctuation absorption processing that absorbs fluctuations in transmission delay in the transmission processing unit.
  4.  The dwelling unit for an intercom system for an apartment house according to claim 3, further comprising a fluctuation absorbing buffer that accumulates the voice data contained in the voice packets received by the transmission processing unit,
    wherein the fluctuation absorption processing program causes the call processing unit to perform: a counting step of counting, at a period no longer than the packetization period of the voice packets, the number of packets of voice data accumulated in the fluctuation absorbing buffer to calculate a packet count value; and a buffer size changing step of inserting a packet into, or deleting a packet from, the fluctuation absorbing buffer based on the packet count value calculated in the counting step.
  5.  The dwelling unit for an intercom system for an apartment house according to claim 4, wherein, in the buffer size changing step, the fluctuation absorption processing program causes the call processing unit to calculate a representative value of the packet count value from the past history of packet count values, to delete a packet from the fluctuation absorbing buffer when the calculated representative value is larger than a predetermined reference value, and to insert a packet into the fluctuation absorbing buffer when the representative value is smaller than the reference value.
  6.  The dwelling unit for an intercom system for an apartment house according to claim 4 or 5, wherein the fluctuation absorption processing program causes the call processing unit to record the reception time of the latest packet and, in the counting step, to calculate the packet count value by setting the count value of the latest packet to the difference between the calculation time (the time at which the packet count value is calculated) and the reception time, divided by the packetization period, and by setting the count value of every packet other than the latest packet to 1.
  7.  The dwelling unit for an intercom system for an apartment house according to claim 5, wherein the fluctuation absorption processing program causes the call processing unit, in the counting step, to hold the past N packet count values (N being a positive integer) and, in the buffer size changing step, to use as the representative value the n-th smallest of the past N packet count values (n being a positive integer less than N).
  8.  The dwelling unit for an intercom system for an apartment house according to claim 5, wherein the fluctuation absorption processing program causes the call processing unit, in the counting step, to determine from the past N packet count values whether a spike delay has occurred and, when it determines that a spike delay has occurred, to extract the past M packet count values (M being a positive integer satisfying M < N) from the past N packet count values, and, in the buffer size changing step, to calculate as the representative value the m-th smallest (m being an integer less than M) of the past M packet count values extracted in the counting step.
  9.  The dwelling unit for an intercom system for an apartment house according to any one of claims 4 to 8, wherein, in the counting step, when the packet count value is zero on consecutive occasions, the fluctuation absorption processing program causes the call processing unit to calculate, as the packet count value, a negative value whose absolute value increases as the number of consecutive zeros increases.
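The counting and buffer-size decisions of claims 5, 6, 7 and 9 can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `packet_count` and `JitterMonitor`, the default parameters, and the specific negative-count rule (the k-th consecutive zero is reported as -(k-1)) are assumptions chosen only to satisfy the wording of the claims. Spike-delay handling (claim 8) is omitted because the claims do not say how a spike is detected.

```python
from collections import deque


def packet_count(num_buffered, now, last_rx_time, packet_period):
    """Count buffered packets in the manner of claim 6.

    Every packet counts as 1 except the latest one, which counts as
    (now - last_rx_time) / packet_period.
    """
    if num_buffered == 0:
        return 0.0
    latest = (now - last_rx_time) / packet_period
    return (num_buffered - 1) + latest


class JitterMonitor:
    """Keeps the last N counts and decides whether to insert or delete."""

    def __init__(self, history_len=8, nth=2, reference=2.0):
        if not 0 < nth < history_len:
            raise ValueError("nth must satisfy 0 < nth < history_len")
        self.history = deque(maxlen=history_len)  # past N packet count values
        self.nth = nth                            # use the n-th smallest value
        self.reference = reference                # predetermined reference value
        self.zero_run = 0                         # consecutive zero counts seen

    def observe(self, count):
        # Assumed rule for claim 9: the k-th consecutive zero becomes -(k - 1).
        if count == 0:
            self.zero_run += 1
            if self.zero_run > 1:
                count = -float(self.zero_run - 1)
        else:
            self.zero_run = 0
        self.history.append(count)

    def representative(self):
        # Wait until N values are available, then take the n-th smallest.
        if len(self.history) < self.history.maxlen:
            return None
        return sorted(self.history)[self.nth - 1]

    def decide(self):
        rep = self.representative()
        if rep is None:
            return "hold"
        if rep > self.reference:
            return "delete"   # buffer is deeper than needed
        if rep < self.reference:
            return "insert"   # buffer is running too shallow
        return "hold"
```

Taking a low order statistic rather than the mean makes the decision track the shallowest recent buffer levels, which is what determines whether playback is about to underrun.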
  10.  The dwelling unit for an intercom system for an apartment house according to any one of claims 1 to 9, wherein the second software includes a program for voice data loss compensation processing that, when all or part of the voice data contained in a voice packet received by the transmission processing unit is missing, compensates for all or part of the missing voice data using voice data that is not missing.
  11.  The dwelling unit for an intercom system for an apartment house according to claim 3, further comprising a fluctuation absorbing buffer that accumulates the voice data contained in the voice packets received by the transmission processing unit,
    wherein the fluctuation absorption processing program causes the call processing unit to perform a counting step of counting the number of packets of voice data accumulated in the fluctuation absorbing buffer to calculate a packet count value, and a buffer size changing step of inserting a packet into, or deleting a packet from, the fluctuation absorbing buffer based on the packet count value calculated in the counting step, and wherein, when one packet is to be deleted from the fluctuation absorbing buffer in the buffer size changing step and two or more valid packets containing voice data exist consecutively, the program causes the call processing unit to perform the deletion by overlap-adding two consecutive valid packets located in the middle of those consecutive valid packets.
  12.  The dwelling unit for an intercom system for an apartment house according to claim 11, wherein, when a packet is to be inserted into the fluctuation absorbing buffer in the buffer size changing step and two consecutive valid packets exist, the fluctuation absorption processing program causes the call processing unit to insert an invalid packet containing no voice between those two valid packets.
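The deletion and insertion rules of claims 11 and 12 can be illustrated with a small buffer model. This is a sketch under stated assumptions: packets are lists of samples, an invalid packet is represented by `None`, the overlap-add uses a linear cross-fade, and the longest run of valid packets is the one chosen for deletion. None of these choices is specified in the claims, and all function names are hypothetical.

```python
def overlap_add(a, b):
    """Merge two equal-length packets with a linear cross-fade (assumed window)."""
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("packets must be non-empty and of equal length")
    if n == 1:
        return [0.5 * (a[0] + b[0])]
    return [a[i] * (1 - i / (n - 1)) + b[i] * (i / (n - 1)) for i in range(n)]


def longest_valid_run(buf):
    """Return (start, length) of the longest run of valid (non-None) packets."""
    best_start, best_len, start = 0, 0, None
    for i, p in enumerate(buf + [None]):  # sentinel closes a trailing run
        if p is not None:
            if start is None:
                start = i
        elif start is not None:
            if i - start > best_len:
                best_start, best_len = start, i - start
            start = None
    return best_start, best_len


def delete_one(buf):
    """Shorten the buffer by one packet by merging the middle pair of a valid run."""
    start, length = longest_valid_run(buf)
    if length < 2:
        return list(buf)  # case not covered by claim 11; left unchanged here
    i = start + (length - 2) // 2  # index of the first packet of the middle pair
    return buf[:i] + [overlap_add(buf[i], buf[i + 1])] + buf[i + 2:]


def insert_one(buf):
    """Lengthen the buffer by one invalid packet placed between two valid packets."""
    for i in range(len(buf) - 1):
        if buf[i] is not None and buf[i + 1] is not None:
            return buf[:i + 1] + [None] + buf[i + 1:]
    return list(buf)  # case not covered by claim 12; left unchanged here
```

Merging packets in the middle of a run, rather than at its edge, keeps the cross-fade away from the boundary with a missing packet, where a separate loss compensation step may already be modifying the signal.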
  13.  The dwelling unit for an intercom system for an apartment house according to any one of claims 1 to 12, wherein the second software includes: a program for voice data loss detection processing that detects loss of all or part of the voice data output by the transmission processing unit; a program for pitch detection processing that detects the pitch of the voice from the voice data; and a program for voice data loss compensation processing that, when the voice data loss detection processing detects loss of voice data, compensates for the missing voice data based on the pitch detected by the pitch detection processing,
    wherein the pitch detection processing program causes the call processing unit to perform: processing of setting, as a reference signal, a voice signal of a certain time width extending from the present into the past; and processing of detecting the pitch of the voice signal by sliding the reference signal from the present toward the past relative to the voice signal and obtaining the correlation between the reference signal and the voice signal, while increasing the time width of the reference signal as the slide amount of the reference signal increases.
  14.  The dwelling unit for an intercom system for an apartment house according to claim 13, wherein the pitch detection processing program causes the call processing unit to set the time width of the reference signal to a predetermined initial time width until the slide amount of the reference signal reaches a predetermined slide reference value.
  15.  The dwelling unit for an intercom system for an apartment house according to claim 13 or 14, wherein the pitch detection processing program causes the call processing unit to obtain the correlation between the reference signal and the voice signal by the average amplitude difference function method.
  16.  The dwelling unit for an intercom system for an apartment house according to claim 15, wherein the pitch detection processing program causes the call processing unit to obtain the correlation between the reference signal and the voice signal using the average amplitude difference function of expression (1).
    [Expression (1): equation image JPOXMLDOC01-appb-M000001, not reproduced in this text]
    where φ(τ) is the correlation value, N is the time width of the reference signal, x(j) is the reference signal, x(j−τ) is the voice signal, k+1 is the starting point of the reference signal, a is a predetermined coefficient, and τ is the slide amount of the reference signal.
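The pitch search of claims 13 to 15 can be sketched with an average amplitude difference function (often abbreviated AMDF) whose comparison window grows with the lag. Expression (1) is not reproduced in this text, so the code below is not a transcription of it: the normalisation by the window length, the growth rule in `window_len`, and every default parameter are assumptions made for illustration only.

```python
import math


def window_len(lag, n0, lag0, a):
    """Reference-signal width: fixed at n0 up to lag0, then growing with the lag.

    The linear growth with coefficient `a` is an assumed rule.
    """
    if lag <= lag0:
        return n0
    return n0 + int(a * (lag - lag0))


def amdf(x, lag, n):
    """Mean absolute difference between the newest n samples and the same
    samples shifted `lag` samples into the past."""
    end = len(x)
    start = end - n
    if start - lag < 0:
        raise ValueError("not enough signal history for this lag and width")
    return sum(abs(x[j] - x[j - lag]) for j in range(start, end)) / n


def detect_pitch_lag(x, min_lag, max_lag, n0=40, lag0=40, a=1.0):
    """Return the lag (in samples) with the smallest AMDF value."""
    best_lag, best_val = None, math.inf
    for lag in range(min_lag, max_lag + 1):
        n = window_len(lag, n0, lag0, a)
        value = amdf(x, lag, n)
        if value < best_val:
            best_lag, best_val = lag, value
    return best_lag
```

For a signal with a period of 50 samples, `detect_pitch_lag(x, 20, 80)` should return 50. Keeping the window short for small lags reduces computation, while widening it for large lags keeps at least about one pitch period inside the comparison.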
  17.  The dwelling unit for an intercom system for an apartment house according to claim 3, wherein the second software includes: a program for voice data loss detection processing that detects loss of all or part of the voice data output by the transmission processing unit; a program for pitch detection processing that detects the pitch of the voice from the voice data; a program for voice data loss compensation processing that, when the voice data loss detection processing detects loss of voice data, compensates for the missing voice data based on the pitch detected by the pitch detection processing; and a program for speech speed conversion processing that expands or compresses the voice data using the pitch detected by the pitch detection processing.
  18.  The dwelling unit for an intercom system for an apartment house according to claim 17, wherein the pitch detection processing counts a predetermined detection period and repeatedly detects the pitch in synchronization with the detection period, and, when the voice data loss detection processing detects loss of voice data, detects the pitch at the time the loss is detected and restarts counting of the detection period from that time.
  19.  The dwelling unit for an intercom system for an apartment house according to claim 17 or 18, wherein the pitch detection processing detects only pitches within a predetermined frequency range.
  20.  The dwelling unit for an intercom system for an apartment house according to claim 17, wherein the speech speed conversion processing detects voice sections of the voice data and applies speech speed conversion only to the voice data in those voice sections.
  21.  The dwelling unit for an intercom system for an apartment house according to claim 18, wherein the voice data loss detection processing detects loss of voice data in synchronization with the input timing of the voice data and with a first time interval obtained by dividing the time length of one packet of the voice data by a positive integer, and the pitch detection processing detects the pitch in synchronization with the first time interval and with the detection period, the detection period being a positive integer multiple of the first time interval.
  22.  The dwelling unit for an intercom system for an apartment house according to claim 17, wherein, when speech speed conversion is performed while the voice data loss detection processing is detecting loss of voice data, the speech speed conversion processing performs the conversion using the pitch that the pitch detection processing detected immediately before the loss was detected.
  23.  The dwelling unit for an intercom system for an apartment house according to claim 17, wherein, when speech speed conversion is performed while the voice data loss detection processing is detecting loss of voice data, the speech speed conversion processing performs the conversion using the pitch that the pitch detection processing detected from the voice data compensated by the voice data loss compensation processing.
  24.  The dwelling unit for an intercom system for an apartment house according to claim 18, wherein the pitch detection processing discriminates between voice sections and non-voice sections of the voice data and makes the detection period in non-voice sections longer than the detection period in voice sections.
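The detection timing described in claims 18, 21 and 24 amounts to a counter that fires periodically, fires immediately on a detected loss, and runs more slowly outside voice sections. The sketch below assumes the counter advances once per first time interval; the class name `PitchScheduler` and the period values are hypothetical.

```python
class PitchScheduler:
    """Decides, once per first time interval, whether to run pitch detection."""

    def __init__(self, voiced_period=4, unvoiced_period=16):
        # Periods are expressed as integer multiples of the first time interval.
        self.voiced_period = voiced_period
        self.unvoiced_period = unvoiced_period
        self.counter = 0

    def tick(self, loss_detected, voiced):
        """Return True when the pitch should be detected at this tick."""
        if loss_detected:
            # Detect at the moment of loss and restart the period count.
            self.counter = 0
            return True
        self.counter += 1
        period = self.voiced_period if voiced else self.unvoiced_period
        if self.counter >= period:
            self.counter = 0
            return True
        return False
```

Restarting the count at a loss avoids a second detection immediately afterwards, and the longer period in non-voice sections saves computation where no pitch is expected.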
  25.  The dwelling unit for an intercom system for an apartment house according to any one of claims 1 to 24, wherein the second software includes a program for voice switch processing that suppresses howling by reducing the loop gain of the closed loop formed by the acoustic echo path caused by acoustic coupling between the microphone and the speaker, and the voice switch processing program causes the call processing unit to: estimate the feedback gain of the acoustic echo path; calculate, based on the estimated feedback gain, the total of a receiving-side attenuation amount that attenuates the received voice data output from the transmission processing unit and a transmitting-side attenuation amount that attenuates the transmitted voice data input to the transmission processing unit; monitor the transmitted and received voice data to estimate the call state; determine the distribution between the transmitting-side attenuation amount and the receiving-side attenuation amount according to the estimated call state and the calculated total; and reduce the total according to the amount by which the estimated feedback gain decreases.
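The attenuation bookkeeping of claim 25 can be sketched in decibels. Everything concrete here is an assumption: the 6 dB safety margin, the three call states, the level threshold, and the fixed distribution table are illustrative values, and the function names are hypothetical.

```python
def total_attenuation_db(feedback_gain_db, margin_db=6.0):
    """Total insertion loss needed to keep the loop gain below -margin_db.

    A smaller estimated feedback gain yields a smaller total.
    """
    return max(0.0, feedback_gain_db + margin_db)


def estimate_state(tx_level_db, rx_level_db, threshold_db=-40.0):
    """Crude call-state estimate from the transmit and receive levels."""
    if rx_level_db > threshold_db and rx_level_db >= tx_level_db:
        return "receive"
    if tx_level_db > threshold_db:
        return "transmit"
    return "idle"


# Share of the total attenuation placed on the transmitting side per state.
SEND_SHARE = {"transmit": 0.0, "receive": 1.0, "idle": 0.5}


def split_attenuation(total_db, state):
    """Return (transmitting-side, receiving-side) attenuation in dB."""
    send_db = total_db * SEND_SHARE[state]
    return send_db, total_db - send_db
```

Because the two attenuations always sum to the computed total, the loop gain stays bounded in every call state, while the active direction is attenuated least.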
  26.  The dwelling unit for an intercom system for an apartment house according to any one of claims 1 to 25, further comprising an extension connection line to which a call device installed in the dwelling is connected, and an extension analog signal transmission unit that transmits analog voice signals via the extension connection line, wherein voice data that has undergone call processing by the call processing unit executing the first software is transmitted from the extension analog signal transmission unit to the call device via the extension connection line.
  27.  The dwelling unit for an intercom system for an apartment house according to any one of claims 1 to 26, wherein the first software includes a program for speech speed conversion processing that detects the pitch of the voice from a digital voice signal obtained by A/D converting the analog voice signal and expands or compresses the digital voice signal using that pitch.
PCT/JP2010/062581 2010-05-24 2010-07-27 Dwelling unit device for interphone system for residential complex WO2011148519A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP2012517086A JP5544012B2 (en) 2010-05-24 2010-07-27 Apartment unit intercom system dwelling unit and apartment house intercom system
CN201080067044.6A CN102918825B (en) 2010-05-24 2010-07-27 Dwelling master unit multidwelling intercom system

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
JP2010118723 2010-05-24
JP2010-118723 2010-05-24
JP2010-129196 2010-06-04
JP2010129196 2010-06-04

Publications (1)

Publication Number Publication Date
WO2011148519A1 (en) 2011-12-01

Family

ID=45003524

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2010/062581 WO2011148519A1 (en) 2010-05-24 2010-07-27 Dwelling unit device for interphone system for residential complex

Country Status (4)

Country Link
JP (1) JP5544012B2 (en)
CN (1) CN102918825B (en)
TW (1) TWI442759B (en)
WO (1) WO2011148519A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2014110554A (en) * 2012-12-03 2014-06-12 Denso Corp Hands-free speech apparatus

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2824917A1 (en) * 2013-07-08 2015-01-14 Fermax Design & Development, S.L.U. Two-wire multichannel video door system
US9947334B2 (en) * 2014-12-12 2018-04-17 Qualcomm Incorporated Enhanced conversational communications in shared acoustic space
JP5984029B1 (en) * 2015-12-24 2016-09-06 パナソニックIpマネジメント株式会社 Doorphone system and communication control method

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2001211254A (en) * 2000-01-27 2001-08-03 Matsushita Electric Ind Co Ltd Information terminal and information terminal system
JP2003324525A (en) * 2002-05-06 2003-11-14 Sharp Corp System and method for virtual multiline telephony in a home-network telephone
JP2005109833A (en) * 2003-09-30 2005-04-21 Aiphone Co Ltd Interphone device
JP2008061005A (en) * 2006-08-31 2008-03-13 Aiphone Co Ltd Apartment building intercom system
JP2010028771A (en) * 2008-07-24 2010-02-04 Panasonic Electric Works Co Ltd Intercom system for multiple dwelling houses

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0588041U (en) * 1992-04-27 1993-11-26 日通工株式会社 Home bus
CA2327813A1 (en) * 1999-12-07 2001-06-07 Kazuo Yahiro Information terminal and information terminal system
FR2911598B1 (en) * 2007-01-22 2009-04-17 Soitec Silicon On Insulator SURFACE RUGOSIFICATION METHOD

Also Published As

Publication number Publication date
TWI442759B (en) 2014-06-21
JP5544012B2 (en) 2014-07-09
CN102918825B (en) 2014-05-07
CN102918825A (en) 2013-02-06
JPWO2011148519A1 (en) 2013-07-25
TW201143350A (en) 2011-12-01

Similar Documents

Publication Publication Date Title
JP5061853B2 (en) Echo canceller and echo cancel program
US20080205632A1 (en) Packet voice system with far-end echo cancellation
CN103391381A (en) Method and device for canceling echo
JP5086769B2 (en) Loudspeaker
JPS59193660A (en) Conference telephone set
JP5544012B2 (en) Apartment unit intercom system dwelling unit and apartment house intercom system
JP3385221B2 (en) Echo canceller
JP5821022B2 (en) External line transfer device for intercom system for apartment houses
JP5923705B2 (en) Call signal processing device
US8737601B2 (en) Echo canceller
JP2003051879A (en) Speech device
JP5963077B2 (en) Telephone device
JP4543896B2 (en) Echo cancellation method, echo canceller, and telephone repeater
JP4380688B2 (en) Telephone device
JP3864915B2 (en) Loudspeaker calling system for apartment houses
JP4079008B2 (en) Loudspeaker calling system for apartment houses
JP2007124163A (en) Call apparatus
JP3903933B2 (en) Telephone device
JP4346414B2 (en) Signal processing device, computer program
JP2004274683A (en) Echo canceler, echo canceling method, program, and recording medium
JP4655719B2 (en) Intercom system for housing complex
JP3442535B2 (en) Echo canceller
JP2005333586A (en) Interphone
JP2004260491A (en) Voice switching apparatus
JP2003324369A (en) Battery operated calling apparatus

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase (Ref document number: 201080067044.6; Country of ref document: CN)
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 10852185; Country of ref document: EP; Kind code of ref document: A1)
WWE Wipo information: entry into national phase (Ref document number: 2012517086; Country of ref document: JP)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 10852185; Country of ref document: EP; Kind code of ref document: A1)