WO1994006246A1 - Moving picture encoder - Google Patents
Moving picture encoder
- Publication number
- WO1994006246A1, PCT/JP1993/001213
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- audio signal
- channel audio
- estimated
- circuit
- sound source
- Prior art date
Links
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S1/00—Two-channel systems
- H04S1/002—Non-adaptive circuits, e.g. manually adjustable or static, for enhancing the sound image or the spatial distribution
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/14—Systems for two-way working
- H04N7/141—Systems for two-way working between two video terminals, e.g. videophone
- H04N7/148—Interfacing a video terminal to a particular transmission medium, e.g. ISDN
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/14—Systems for two-way working
- H04N7/15—Conference systems
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R3/00—Circuits for transducers, loudspeakers or microphones
- H04R3/005—Circuits for transducers, loudspeakers or microphones for combining the signals of two or more microphones
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R2201/00—Details of transducers, loudspeakers or microphones covered by H04R1/00 but not provided for in any of its subgroups
- H04R2201/40—Details of arrangements for obtaining desired directional characteristic by combining a number of identical transducers covered by H04R1/40 but not provided for in any of its subgroups
- H04R2201/403—Linear arrays of transducers
Definitions
- the present invention relates to an encoding device for encoding a video signal, and in particular to a moving picture encoding device that identifies a target portion of the image from an audio signal obtained together with the video signal and encodes the identified image region with an increased code amount. Background technology
- video and audio are transmitted over a communication line such as a telephone line, but the code amount that can be transmitted per channel is limited, so the image information is encoded before transmission to keep the video signal data within that code amount.
- when the code amount that can be transmitted per unit time is insufficient, the code amount per image is determined by the transmission rate when transmitting a moving picture, so as to preserve natural motion.
- encoding is performed so that the entire screen has a uniform resolution, but this has the disadvantage that the face of the other party is unclear.
- the human eye does not attend to the entire screen but tends to concentrate on a part of interest; therefore, if the image quality of the part of interest is improved, low resolution in the other parts does not matter.
- this system shoots the caller with a TV camera, detects moving parts in the image from the resulting video signal, and estimates the speaker's face area from the detected region;
- the estimated human face area can then be displayed more clearly than the other areas.
- an object of the present invention is to provide a moving picture encoding device that can accurately estimate the position of a speaker from the video and audio signals, accurately extract the speaker's region on the screen, and thus display the region of the screen where the speaker appears clearly.
- an image transmitting apparatus that encodes and transmits a video signal
- a television camera that captures an image of a subject and generates a video signal;
- a plurality of microphones, spaced apart from each other, that collect the sound of the subject captured by the television camera and output audio signals;
- a sound source position estimating circuit that estimates a sound source position from audio signals obtained from the plurality of microphones
- a coding circuit that encodes the video signal by assigning a larger code amount to the image area within a predetermined range centered on the sound source position estimated by the position estimating circuit than to the other image areas, so that this area has a high resolution;
- the present invention provides a moving picture coding apparatus comprising these elements.
- the television camera captures an image of the subject and outputs a video signal.
- a plurality of microphones spaced apart from each other in front of the subject pick up the sound, and the sound source position estimating circuit estimates the position of the sound source in the subject from the audio signals of the collected channels.
- the coding circuit encodes the video signal of the image area within a predetermined range around the sound source position estimated by the estimating circuit with a larger code amount than the other image areas, so that this area has a high resolution.
- high-resolution encoding can be performed mainly on the periphery of the sound source position on the screen, so that moving image encoding that can encode a video signal so that a speaker can be more clearly displayed can be realized.
- by matching the image area within a predetermined range around the estimated sound source position to the subject's face area on the screen, the video signal can be coded so that the speaker's face area has a high resolution.
- FIG. 1 is a block diagram illustrating a configuration example of an image encoding unit of a video conference system according to an embodiment of the present invention.
- FIG. 2 is a diagram for explaining the embodiment of the present invention, and is a diagram showing a configuration of a conference room of the video conference system according to the present invention.
- FIG. 3 is a block diagram showing a configuration of a sound source position estimating unit shown in FIG.
- 4A and 4B are circuit diagrams showing a configuration of the sound source position estimation circuit shown in FIG.
- FIG. 5 is a diagram for explaining an estimation method of the sound source position estimation circuit shown in FIG.
- FIG. 6 is a diagram for explaining a method of determining a priority coding area of the image coding unit shown in FIG.
- FIG. 7 is a block circuit diagram of the image encoding unit shown in FIG. BEST MODE FOR CARRYING OUT THE INVENTION
- the present invention estimates a sound source position from audio signals of a plurality of channels, and encodes an image with emphasis around the estimated sound source position.
- an image encoding device that employs an encoding method.
- FIG. 2 shows a schematic configuration of a conference room of a video conference system having the image encoding device of the present invention. In this figure, one television camera captures three conference attendees.
- a television camera 12 is provided in front of the desk 9 and generates an image signal by capturing images of the conference attendees A 1 to A 3 sitting side by side with the desk 9.
- the audio signals input by the right and left microphones 11R and 11L and the video signal input by the TV camera 12 are input to the image estimation encoding unit 10 shown in Fig. 1, which is the image processing system, and are encoded there so as to fit within a predetermined code amount per screen.
- the audio signal is also supplied to an audio signal processing system (not shown), where it is converted to a digital signal.
- the encoded audio signal is sent to the transmission path together with the encoded video signal and transmitted to the other party.
- the image estimation encoding unit 10, which is the image processing system, estimates the position of the speaker's face area in the images of the conference attendees A1 to A3 captured by the TV camera 12, and encodes the area around the estimated position with emphasis.
- the image estimation encoding unit 10 includes a sound source position estimation unit 13, a sound source position information storage unit 14, an image encoding unit 15, and an image memory 16.
- the image memory 16 is a memory for temporarily storing image data obtained by digitally converting the video signal obtained from the TV camera 12 on a screen-by-screen basis. It has the capacity to store the image data, and sequentially updates and stores the image data.
- the sound source position estimating unit 13 estimates the sound source position. That is, the estimating unit 13 estimates the position of the speaker from the audio signal outputs of the microphones 11R and 11L, and also estimates the position of the sound source on the image from the positions of the left and right microphones 11L and 11R found in the image data held in the image memory 16.
- the sound source position information storage unit 14 stores the information of the sound source position estimated by the sound source position estimating unit 13, together with information on the time at which the estimation was performed. The time information is either provided from outside or obtained from a clock circuit provided in the image estimation coding unit 10.
- the image encoding unit 15 encodes and outputs the image data held in the image memory 16 using the information in the sound source position information storage unit 14. That is, it performs encoding so as to display the area centered on the speaker's position more clearly. Based on the speaker position information stored in the sound source position information storage unit 14, the image coding unit 15 determines the region of the speaker position on the image as a weighted coding region, assigns the code amount M(i) to the video signal in the weighted coding region and the code amount M(0) to the video signal in the other regions, and encodes the video signal of each area to fit within the allocated code amount.
- the sound source position estimating unit 13 includes a delay circuit 31, an estimating circuit 32, a subtracting circuit 33, and a sound source position estimating circuit 34.
- the delay circuit 31 delays the left channel audio input signal obtained by the left microphone 11 L
- the estimating circuit 32 estimates the delayed left channel audio signal output from the delay circuit 31, and outputs the estimated left channel audio signal.
- the subtraction circuit 33 receives the delayed left channel audio signal output from the delay circuit 31 and the estimated left channel audio signal output from the estimation circuit 32 as inputs, and estimates the left channel audio signal from the left channel audio signal. Is subtracted to obtain the difference signal.
- the difference signal is fed back to the estimation circuit 32, whereby the estimation circuit 32 estimates and outputs an estimated left channel audio signal such that the difference signal becomes zero.
- the estimating circuit 32 estimates the left channel audio signal from the right channel audio signal obtained by the right microphone 11R, with reference to the delayed left channel audio input signal, and obtains the estimated impulse response sequence H(k).
- the sound source position estimating circuit 34 estimates the sound source position using the estimated impulse response sequence H (k) obtained by the estimating circuit 32.
- the conference attendees are photographed by the television camera 12 and the sound is collected by the microphones 11R and 11L on the desk 9 at the same time.
- the video signal from the television camera 12 is sent to the image encoding unit 15, and the audio signals from the microphones 11 R and 11 L are sent to the sound source position estimation unit 13.
- the sound source position estimating unit 13 estimates the position of the sound source based on the audio signals, and the estimation result is stored in the sound source position information storage unit 14.
- the image coding unit 15 identifies the area on the television screen corresponding to the sound source position, using the latest sound source position information stored in the sound source position information storage unit 14, encodes that area with the predetermined code amount M(i), encodes the other area with the code amount M(0), and transmits the result.
- the left channel input audio signal YL0(ω) is applied to the delay circuit 31, which guarantees causality in the estimation circuit 32; it therefore carries a flat delay.
- the left channel input audio signal YL0(ω) can then be expressed as the following YL(ω) through the transfer function FL(ω) that includes the delay circuit 31.
- the estimation circuit 32 uses the right-channel audio signal YR(ω) and the left-channel audio signal YL(ω) to obtain the transfer function G(ω) that yields the left-channel audio signal YL(ω) from the right-channel audio signal YR(ω), and generates an estimated transfer function Gp(ω) for this transfer function G(ω).
- G(ω) = FL(ω) / FR(ω) ... (4)
- the generation of the estimated transfer function Gp(ω) for the above transfer function G(ω) is specifically performed as follows.
- the estimating circuit 32 first calculates an estimated left channel audio signal yp(k) in the time domain, using the right channel audio signal YR(ω).
- the estimating circuit 32 includes an adaptive transversal filter 32a for calculating the estimated left-channel audio signal yp(k) in the time domain as shown in Fig. 4A, and a correction circuit 32b for sequentially updating the estimated impulse response sequence Hp(k) of the transfer function G(ω) as shown in Fig. 4B.
- the adaptive transversal filter 32a and the correction circuit 32b operate in synchronization with a system clock provided from a clock source (not shown).
- the adaptive transversal filter 32a receives the input right channel audio signal sequentially, holding the values X(k) to X(k-n+1) for each time component in the shift registers 41-1 to 41-(n-1).
- the correction circuit 32b calculates the estimated impulse response sequence hp1(k) to hpn(k) by performing the operation of expression (10) described later, and supplies each time component of the sequence to the corresponding multiplier 42-1 to 42-n of the adaptive transversal filter 32a.
- the multipliers 42-1 to 42-n multiply the estimated impulse response sequence hp1(k) to hpn(k) with the right channel audio signals X(k) to X(k-n+1) obtained via the shift registers, component by component, to obtain the estimated left channel audio signal for each time component.
- the estimated left channel audio signal yp(k) is obtained by the adder 43 summing these per-component products.
- first, the right channel audio signal X(k) is input to the n-stage shift registers 41-1 to 41-n, each stage having a delay of one sample period, and a time series vector as shown in equation (5) is generated.
- Hp(k) = (hp1(k), hp2(k), ..., hpn(k))
- from these, an estimated left channel audio signal yp(k), which is an estimated value of the left channel audio signal y(k), can be obtained.
- the estimation of the estimated impulse response sequence Hp(k) in the estimation circuit 32 is achieved by the correction circuit 32b sequentially performing, for example, the following operation using the time series vectors X(k) to X(k-n+1) obtained as the inputs and outputs of the n-stage shift registers in the adaptive transversal filter 32a.
- Hp(k+1) = Hp(k) + α e(k) X(k) / ||X(k)||² ... (10)
- in equation (10), e(k) is the output of the subtraction circuit 33 in Fig. 3; with the estimated left channel audio signal written yp(k), this output is given by
- e(k) = y(k) - yp(k) ... (11)
- that is, the output e(k) of the subtraction circuit 33 is the difference signal between the left channel audio signal y(k) and the estimated left channel audio signal yp(k).
- α is a coefficient that determines the convergence speed and stability of equation (10); the distance difference referred to below is the difference between the distances from the sound source 51 to the left and right microphones 11L and 11R.
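Equations (10) and (11) together form a learning-identification (normalized LMS) update. A minimal runnable sketch under assumed conditions: white-noise test signals, and illustrative values for the step size `alpha`, tap count `n_taps`, and regularizer `eps` (none of these numbers come from the patent):

```python
import numpy as np

def estimate_impulse_response(x_right, y_left, n_taps=16, alpha=0.5, eps=1e-8):
    """Sketch of estimation circuit 32: adapt Hp(k) so the filtered
    right-channel signal predicts the (delayed) left channel.
    Implements Hp(k+1) = Hp(k) + alpha*e(k)*X(k)/||X(k)||^2   (Eq. 10)
    with       e(k)    = y(k) - yp(k)                          (Eq. 11)."""
    h = np.zeros(n_taps)                                   # Hp(k)
    for k in range(n_taps - 1, len(x_right)):
        x_vec = x_right[k - n_taps + 1:k + 1][::-1]        # X(k) = [x(k)..x(k-n+1)]
        e = y_left[k] - h @ x_vec                          # e(k), Eq. 11
        h = h + alpha * e * x_vec / (x_vec @ x_vec + eps)  # Eq. 10
    return h

# simulate a pure 5-sample inter-microphone delay and identify it
rng = np.random.default_rng(0)
x = rng.standard_normal(4000)
y = np.concatenate([np.zeros(5), x[:-5]])
h_est = estimate_impulse_response(x, y)
```

On this synthetic delay the largest coefficient lands at tap 5, which is exactly the peak index Mx that the sound source position estimating circuit 34 reads off the converged sequence.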
- the image estimation encoding unit 10 finds the positions of the left and right microphones 11L and 11R from the image data held in the image memory 16 and obtains the distance between them; using the output e(k) of the subtraction circuit 33, the correction circuit 32b can then calculate the estimated impulse response sequence Hp(k) by performing the operation of equation (10).
- the sound source position is estimated by the sound source position estimating circuit 34 from the estimated impulse response sequence H p (k) obtained by the above processing. This estimation is performed as follows.
- let Mx be the index of the term that takes the maximum value among the coefficients of the estimated impulse response sequence Hp(k),
- T the sampling period
- V the sound speed
- n the number of taps
- Δℓ = v T (Mx - n/2) ... (12)
- the left and right microphones 11L and 11R are connected by a straight line 52, and a straight line 53, parallel to line 52 and a fixed distance away from it, is assumed; the sound source 51 is taken to lie on line 53. The sound source position is then given by the distance along line 53 from its intersection with the line 54 that passes through the center point Po of the left and right microphones 11L and 11R perpendicular to line 52.
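Numerically, circuit 34's estimation can be sketched in two steps: take the distance difference from equation (12), then find the offset of the sound source 51 along line 53 whose left/right path-length difference matches it. The patent's closed-form geometric equation is not legible in the source, so this sketch scans candidate offsets instead; the baseline distance `big_d` of line 53 and all numeric values are illustrative assumptions:

```python
import math

def distance_difference(mx, n, t_sample, v=340.0):
    """Eq. (12) as reconstructed: delay of peak tap Mx relative to the
    n/2-tap causality delay, times sampling period T and sound speed v."""
    return v * t_sample * (mx - n / 2)

def source_offset(delta_l, d, big_d, step=0.001, span=5.0):
    """Scan horizontal offsets a along line 53 (at distance big_d from
    baseline 52, microphones at -d and +d) for the one whose path-length
    difference to the two microphones best matches delta_l."""
    best_a, best_err = 0.0, float("inf")
    steps = int(2 * span / step)
    for i in range(steps + 1):
        a = -span + i * step
        diff = math.hypot(big_d, a + d) - math.hypot(big_d, a - d)
        err = abs(diff - delta_l)
        if err < best_err:
            best_a, best_err = a, err
    return best_a
```

For example, with microphones 1 m apart (d = 0.5) and line 53 at 2 m, a source 1 m off-center produces a path difference of about 0.438 m, and the scan recovers the 1 m offset.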
- when the data of the sound source position Pa estimated as described above is input to the image encoding unit 15 via the sound source position information storage unit 14, the image region centered on the sound source position is taken as the weighted coding region, and the image data corresponding to this region is encoded with a larger code amount than the image data of the other regions. This encoding will now be described in detail.
- the image memory 16 stores image data of one frame, divided for example into 44 × 36 blocks of 8 pixels × 8 lines each.
- the image data stored in the image memory 16 is sequentially sent to the image encoding unit 15 in block units.
- the image encoding unit 15 comprises, as shown in Fig. 7, an orthogonal transform (DCT) circuit 71 connected to the readout terminal of the image memory 16, a quantization circuit 72 connected to the output terminal of the DCT circuit 71, a variable length coding circuit 73 connected to the output terminal of the quantization circuit 72, and a quantization step size determination circuit 74 connected to the control terminal of the quantization circuit 72.
- the image coding unit 15 further includes a marker recognition circuit 75 and a weighted coding area determination circuit 76.
- the marker recognition circuit 75 recognizes, from the image data read out of the image memory 16, two markers 61a and 61b provided at the positions of the left and right microphones 11L and 11R, and finds the distance 2d' between the microphones 11L and 11R on the screen. The markers are input to the device by the operator when the microphones are placed in the conference room.
- the obtained distance 2d' is input to the weighted coding region determination circuit 76, which calculates, from this distance and the sound source position information read from the sound source position information storage unit 14, the distance a' from the center of the span 2d' to the position 62 of the speaker, by equation (14).
- the weighted coding area determination circuit 76 determines the area 63 of a preset width 2w', centered on the speaker position 62, as the weighted coding area.
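Equation (14) itself is not legible in the source. If the estimated real-world offset a (measured from the center point Po) simply scales by the on-screen microphone spacing, the mapping of circuit 76 can be sketched as follows; the linear d'/d scaling and all parameter names are assumptions:

```python
def weighted_area(a, d, d_screen, w_screen, screen_width):
    """Hypothetical screen mapping for determination circuit 76.
    a: estimated source offset from the mic midpoint (real units),
    d: half the real microphone spacing, d_screen: half the on-screen
    spacing d', w_screen: half-width w' of area 63, screen_width: WL.
    Assumes on-screen positions scale linearly by d'/d (the patent's
    Eq. 14 is not legible; this scaling is an assumption)."""
    center = screen_width / 2.0
    a_screen = a * d_screen / d              # a' = a * d'/d  (assumed)
    speaker_x = center + a_screen            # speaker position 62
    left = max(0.0, speaker_x - w_screen)
    right = min(screen_width, speaker_x + w_screen)
    return left, right
```

Clamping to [0, screen_width] keeps area 63 on-screen when the speaker sits near an edge.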
- the step size determination circuit 74 determines the step size for coding the image data in the weighted coding area with a larger code amount than the image data in other areas.
- the quantization circuit 72 quantizes the image data read out of the image memory 16 and orthogonally transformed by the DCT circuit 71 with the determined step size, that is, with the corresponding code amount.
- the image data corresponding to the weighted coding area 63 is quantized with the determined step size when input to the quantization circuit 72, while the image data of the other areas is quantized with a step size coarser than that used for the weighted area.
- the quantized image data is subjected to variable-length encoding by a variable-length encoding circuit 73, and is output as encoded image data.
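A runnable sketch of the pipeline formed by circuits 71, 72 and 74: each 8 × 8 block is DCT-transformed, then quantized with a fine step inside the weighted coding area and a coarser step outside. The step values and the block-column test for area membership are illustrative assumptions, not values from the patent:

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis, as applied by the DCT circuit 71."""
    k = np.arange(n)
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)
    return m

def quantize_frame(frame, weighted_cols, step_fine=8.0, step_coarse=32.0):
    """Quantization circuit 72 with per-block step sizes chosen as the
    step size determination circuit 74 would: fine inside area 63
    (block columns listed in weighted_cols), coarse elsewhere."""
    n = 8
    d = dct_matrix(n)
    h, w = frame.shape
    out = np.zeros((h, w), dtype=np.int64)
    for by in range(h // n):
        for bx in range(w // n):
            block = frame[by * n:(by + 1) * n, bx * n:(bx + 1) * n]
            coeffs = d @ block @ d.T                     # 2-D DCT
            step = step_fine if bx in weighted_cols else step_coarse
            out[by * n:(by + 1) * n, bx * n:(bx + 1) * n] = np.round(coeffs / step)
    return out
```

For a flat frame of value 100, the DC coefficient of every 8 × 8 block is 800, so a weighted column quantizes it to 100 (step 8) while the others get 25 (step 32): the coarser step is what discards detail outside area 63.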
- when the image data encoded as described above is sent to the receiving side and displayed on the receiving monitor, the image of the speaker is displayed at a higher resolution than the other areas.
- time information may be stored as follows.
- the sound source position estimating unit 13 estimates the sound source position Pa in the sound source position estimating circuit 34 based on the term having the maximum value among the coefficients of the estimated impulse response sequence H p (k).
- the information on the sound source position Pa estimated by the sound source position estimating unit 13, and the time at which the estimation was performed, are stored in the sound source position information storage unit 14 under the control of a control device (not shown). If a sound source position Pa(t) estimated in the past lies within a fixed width w to the left or right of the latest sound source position Pa, the storage unit 14 is controlled by the control device to erase the stored information of that past position Pa(t).
- the sound source position information storage unit 14 stores the information of the current speaker's utterance position and the last utterance positions of the N persons who have spoken in the past, as follows.
- T(i) is the time elapsed since speaker i last spoke, and
- L(i) is data indicating the position where speaker i last spoke.
- T(1) is the time at which the above arithmetic processing was performed on the voice samples of the current speaker, and
- L (1) is data indicating the position where the current speaker speaks.
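The bookkeeping described above for storage unit 14 (keep the latest position and time, erase any past speaker position within width w of it) can be sketched as follows; representing the history as a list of (L, T) pairs, newest first, is an assumption about the data layout, not something the patent specifies:

```python
def update_history(history, pos, now, width_w):
    """Store the latest (position, time) pair and drop past entries whose
    position L lies within width_w of the new position pos, mimicking the
    control device erasing them from storage unit 14 (same speaker)."""
    kept = [(L, T) for (L, T) in history if abs(L - pos) > width_w]
    return [(pos, now)] + kept
```

A nearby old entry (e.g. position 10 when the new position is 12 and w = 3) is treated as the same speaker and replaced, while distant entries survive.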
- the image encoding unit 15 encodes an image as described above, based on the information of the latest speaker position L (1) stored in the sound source position information storage unit 14.
- the code amount of the entire screen is M
- the width of the entire screen is WL
- the importance of the speaker i's weighted coding area is R (i)
- the importance of the area other than the weighted coding area is R(0).
- the importance levels R(i) and R(0) can be set freely; if, for example, higher importance is given to persons who spoke more recently, they can be set as in expression (16).
- the code amount M(i) of the weighted coding region of the latest speaker (the image region of the latest speaker) and the code amount M(0) of the region other than the weighted coding region are then determined, for example, as
- M(0) = M (WL - 2w') R(0) / RΣ ... (17), where RΣ is the sum of the width-weighted importance values of the regions.
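Reading expressions (16) and (17) as a split of the total code amount M in proportion to each region's width times its importance (a hedged reconstruction; the exact closed form is not legible in the source), the allocation can be computed as:

```python
def allocate_code(m_total, wl, w2, r_speaker, r_other):
    """Split m_total between the weighted coding region (width w2 = 2w',
    importance r_speaker = R(i)) and the rest of the screen (width
    wl - w2, importance r_other = R(0)), proportionally to width x importance.
    The proportional form is an assumed reconstruction of Eq. (17)."""
    r_sum = w2 * r_speaker + (wl - w2) * r_other   # normalizer R_sum
    m_i = m_total * w2 * r_speaker / r_sum         # M(i)
    m_0 = m_total * (wl - w2) * r_other / r_sum    # M(0)
    return m_i, m_0
```

With a screen of width 100, a 20-wide speaker region, and importances 4 : 1, the speaker region receives half the bit budget despite covering a fifth of the screen, which is the intended emphasis.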
- as described above, by estimating the sound source position from the audio signals of multiple channels picked up by multiple microphones arranged at different positions, and from the microphone positions on the video screen that includes the speaker, the image area of the speaker on the video screen can be accurately extracted; by assigning a larger code amount to that image area during coding, a moving picture coding method that can display the speaker's image area clearly is obtained.
- the present invention is not limited to the above-described embodiment, and may be implemented with appropriate modifications without departing from its gist.
- in the estimation circuit 32 of the sound source position estimating unit 13 in the above-described embodiment, an adaptive transversal filter in the time domain is used, but other circuit configurations, such as an adaptive filter in the frequency domain, may be used.
- although the learning identification method (normalized LMS) has been described as an example of the estimation algorithm, other learning algorithms such as the steepest descent method can be used.
- the sound source position estimating circuit 34 estimates the sound source position based on the term having the maximum value among the coefficients of the estimated impulse response sequence Hp(k), but other methods may be used as well.
- the method of determining the weighted coding area in the image coding unit 15 is not limited to the above; another method, such as detecting a face area within the area 63, may be used. Likewise, the importance may be set by other methods, for example by considering both the time since the last utterance and the total speaking time up to the present.
- since the subject is almost fixed in position and the television camera maintains the same viewing angle with respect to the subject, the position of the subject on the screen does not change unless the subject itself moves; therefore, through the importance setting in the image coding unit 15, a VIP can always be encoded in high definition.
- as the coding method of the image coding unit 15 in the above-described embodiment, a method of giving a large code amount to the weighted coding area 63 in every frame and performing fine coding has been described.
- alternatively, the resolution may be weighted according to the recency of utterance, for example higher resolution for the most recent speaker and lower resolution for earlier speakers.
- two channels are used for voice input, but three or more channels may be used.
- two-dimensional estimation of the sound source position is possible by giving a vertical offset to the arrangement of the microphones; in this case a single point on the screen can be estimated as the sound source, so the sound source position can be estimated with higher accuracy.
- the sound is converted from the audio signal of the plurality of channels.
Landscapes
- Engineering & Computer Science (AREA)
- Signal Processing (AREA)
- Physics & Mathematics (AREA)
- Multimedia (AREA)
- Acoustics & Sound (AREA)
- Health & Medical Sciences (AREA)
- Otolaryngology (AREA)
- General Health & Medical Sciences (AREA)
- Computer Networks & Wireless Communication (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
- Circuit For Audible Band Transducer (AREA)
- Closed-Circuit Television Systems (AREA)
Description
Claims
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US08/211,724 US5594494A (en) | 1992-08-27 | 1993-08-27 | Moving picture coding apparatus |
EP93919585A EP0615387B1 (en) | 1992-08-27 | 1993-08-27 | Moving picture encoder |
DE69326751T DE69326751T2 (de) | 1992-08-27 | 1993-08-27 | Bewegtbildkodierer |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP4/228572 | 1992-08-27 | ||
JP22857292 | 1992-08-27 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO1994006246A1 true WO1994006246A1 (en) | 1994-03-17 |
Family
ID=16878468
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP1993/001213 WO1994006246A1 (en) | 1992-08-27 | 1993-08-27 | Moving picture encoder |
Country Status (5)
Country | Link |
---|---|
US (1) | US5594494A (ja) |
EP (1) | EP0615387B1 (ja) |
CA (1) | CA2122371C (ja) |
DE (1) | DE69326751T2 (ja) |
WO (1) | WO1994006246A1 (ja) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6850265B1 (en) | 2000-04-13 | 2005-02-01 | Koninklijke Philips Electronics N.V. | Method and apparatus for tracking moving objects using combined video and audio information in video conferencing and other applications |
CN108769874A (zh) * | 2018-06-13 | 2018-11-06 | 广州国音科技有限公司 | 一种实时分离音频的方法和装置 |
Families Citing this family (27)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP3017384B2 (ja) | 1993-07-19 | 2000-03-06 | シャープ株式会社 | 特徴領域抽出装置 |
US6313863B1 (en) * | 1994-07-29 | 2001-11-06 | Canon Kabushiki Kaisha | Image communication apparatus and system |
FR2728753A1 (fr) * | 1994-12-21 | 1996-06-28 | Grenier Yves | Dispositif de prise de sons comprenant un systeme video pour le reglage de parametres et procede de reglage |
US6496607B1 (en) * | 1998-06-26 | 2002-12-17 | Sarnoff Corporation | Method and apparatus for region-based allocation of processing resources and control of input image formation |
ES2367099T3 (es) * | 1998-11-11 | 2011-10-28 | Koninklijke Philips Electronics N.V. | Disposición de localización de señal mejorada. |
US6269483B1 (en) * | 1998-12-17 | 2001-07-31 | International Business Machines Corp. | Method and apparatus for using audio level to make a multimedia conference dormant |
KR100293456B1 (ko) | 1998-12-30 | 2001-07-12 | 김영환 | 오디오/비디오 신호의 코딩 장치 및 방법_ |
US6288753B1 (en) * | 1999-07-07 | 2001-09-11 | Corrugated Services Corp. | System and method for live interactive distance learning |
US20010017650A1 (en) * | 1999-12-23 | 2001-08-30 | Mitsubishi Denki Kabushiki Kaisha | Method and apparatus for transmitting a video image |
US6605674B1 (en) * | 2000-06-29 | 2003-08-12 | Ondeo Nalco Company | Structurally-modified polymer flocculants |
US7002617B1 (en) * | 2000-07-20 | 2006-02-21 | Robert Samuel Smith | Coordinated audio and visual omnidirectional recording |
US20020140804A1 (en) * | 2001-03-30 | 2002-10-03 | Koninklijke Philips Electronics N.V. | Method and apparatus for audio/image speaker detection and locator |
EP1425909A4 (en) * | 2001-08-07 | 2006-10-18 | Polycom Inc | SYSTEM AND METHOD FOR HIGH RESOLUTION VIDEOCONFERENCE |
US20030220971A1 (en) * | 2002-05-23 | 2003-11-27 | International Business Machines Corporation | Method and apparatus for video conferencing with audio redirection within a 360 degree view |
US20040001091A1 (en) * | 2002-05-23 | 2004-01-01 | International Business Machines Corporation | Method and apparatus for video conferencing system with 360 degree view |
US7444068B2 (en) * | 2002-06-28 | 2008-10-28 | Hewlett-Packard Development Company, L.P. | System and method of manual indexing of image data |
GB2414369B (en) * | 2004-05-21 | 2007-08-01 | Hewlett Packard Development Co | Processing audio data |
GB2415584B (en) | 2004-06-26 | 2007-09-26 | Hewlett Packard Development Co | System and method of generating an audio signal |
JP2006148861A (ja) * | 2004-10-21 | 2006-06-08 | Matsushita Electric Ind Co Ltd | Imaging signal processing apparatus and method |
FR2886799A1 (fr) * | 2005-06-03 | 2006-12-08 | France Telecom | Method and device for controlling movement of a line of sight, videoconference system, terminal, and program for implementing the method |
FR2886800A1 (fr) * | 2005-06-03 | 2006-12-08 | France Telecom | Method and device for controlling movement of a line of sight, videoconference system, terminal, and program for implementing the method |
JP2009143454A (ja) * | 2007-12-14 | 2009-07-02 | Fujitsu Ten Ltd | Vehicle control device and vehicle state monitoring method |
US8697990B2 (en) | 2012-07-12 | 2014-04-15 | Wirepath Home Systems, Llc | Power products with selectable mounting and related assemblies and kits |
JP2014143678A (ja) * | 2012-12-27 | 2014-08-07 | Panasonic Corp | Audio processing system and audio processing method |
KR20140127508A (ko) * | 2013-04-25 | 2014-11-04 | Samsung Electronics Co., Ltd. | Audio processing apparatus and audio processing method |
US20190082255A1 (en) * | 2017-09-08 | 2019-03-14 | Olympus Corporation | Information acquiring apparatus, information acquiring method, and computer readable recording medium |
CN110719430A (zh) * | 2018-07-13 | 2020-01-21 | Hangzhou Hikvision Digital Technology Co., Ltd. | Image data generation method and apparatus, electronic device, and storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPS5921186A (ja) * | 1982-06-28 | 1984-02-03 | Western Electric Company, Incorporated | Method of generating a video signal |
JPS6129163B2 (ja) * | 1977-02-21 | 1986-07-04 | Mitsubishi Electric Corp | |
JPS6364120B2 (ja) * | 1982-11-05 | 1988-12-09 | ||
JPH0396999A (ja) * | 1989-09-08 | 1991-04-22 | Aisin Seiki Co Ltd | Sound collecting device |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPS6129163A (ja) * | 1984-07-19 | 1986-02-10 | Toshiba Corp | IC module unit |
JPS6243285A (ja) * | 1985-08-21 | 1987-02-25 | Hitachi Ltd | Video conference speaker identification method |
JPS6364120A (ja) * | 1986-09-04 | 1988-03-22 | Mitsubishi Electric Corp | Printer control method for a terminal device |
US5206721A (en) * | 1990-03-08 | 1993-04-27 | Fujitsu Limited | Television conference system |
1993
- 1993-08-27 EP EP93919585A patent/EP0615387B1/en not_active Expired - Lifetime
- 1993-08-27 US US08/211,724 patent/US5594494A/en not_active Expired - Fee Related
- 1993-08-27 WO PCT/JP1993/001213 patent/WO1994006246A1/ja active IP Right Grant
- 1993-08-27 DE DE69326751T patent/DE69326751T2/de not_active Expired - Fee Related
- 1993-08-27 CA CA002122371A patent/CA2122371C/en not_active Expired - Fee Related
Non-Patent Citations (1)
Title |
---|
See also references of EP0615387A4 * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6850265B1 (en) | 2000-04-13 | 2005-02-01 | Koninklijke Philips Electronics N.V. | Method and apparatus for tracking moving objects using combined video and audio information in video conferencing and other applications |
CN108769874A (zh) * | 2018-06-13 | 2018-11-06 | Guangzhou Guoyin Technology Co., Ltd. | Method and apparatus for real-time audio separation |
Also Published As
Publication number | Publication date |
---|---|
CA2122371A1 (en) | 1994-03-17 |
US5594494A (en) | 1997-01-14 |
DE69326751D1 (de) | 1999-11-18 |
EP0615387A4 (en) | 1994-07-12 |
DE69326751T2 (de) | 2000-05-11 |
CA2122371C (en) | 1998-03-03 |
EP0615387B1 (en) | 1999-10-13 |
EP0615387A1 (en) | 1994-09-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO1994006246A1 (en) | Moving picture encoder | |
US5555310A (en) | Stereo voice transmission apparatus, stereo signal coding/decoding apparatus, echo canceler, and voice input/output apparatus to which this echo canceler is applied | |
US6675145B1 (en) | Method and system for integrated audiovisual speech coding at low bitrate | |
US5778082A (en) | Method and apparatus for localization of an acoustic source | |
EP1711019B1 (en) | Motion compensated temporal filtering for noise reduction pre-processing of digital video data | |
US8379074B2 (en) | Method and system of tracking and stabilizing an image transmitted using video telephony | |
US8130257B2 (en) | Speaker and person backlighting for improved AEC and AGC | |
CN106664501B (zh) | 基于所通知的空间滤波的一致声学场景再现的系统、装置和方法 | |
JP4872871B2 (ja) | Sound source direction detection device, sound source direction detection method, and sound source direction detection camera | |
JP5857674B2 (ja) | Image processing device and image processing system | |
JP6703525B2 (ja) | Method and apparatus for enhancing a sound source | |
CN110289009B (zh) | Sound signal processing method and device, and interactive smart device | |
JPH08205156A (ja) | Image quality evaluation device for digitally compressed and reproduced images | |
US11076127B1 (en) | System and method for automatically framing conversations in a meeting or a video conference | |
JP2004118314A (ja) | Speaker detection system and video conference system using the same | |
US11842745B2 (en) | Method, system, and computer-readable medium for purifying voice using depth information | |
JPH06217276A (ja) | Moving picture encoding device | |
US11875800B2 (en) | Talker prediction method, talker prediction device, and communication system | |
JPH0761043B2 (ja) | Stereo audio transmission and storage system | |
JP3724008B2 (ja) | Image information conversion device and coefficient data generation device | |
Bulla et al. | High Quality Video Conferencing: Region of Interest Encoding and Joint Video/Audio Analysis | |
JPH0758939B2 (ja) | Stereo signal transmission method, encoding device, and decoding device | |
WO2023120244A1 (ja) | Transmission device, transmission method, and program | |
CN110121890B (zh) | Method and apparatus for processing an audio signal, and computer-readable medium | |
Vahedian et al. | Improving videophone subjective quality using audio information |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states |
Kind code of ref document: A1 Designated state(s): CA US |
|
AL | Designated countries for regional patents |
Kind code of ref document: A1 Designated state(s): DE FR GB |
|
WWE | Wipo information: entry into national phase |
Ref document number: 08211724 Country of ref document: US |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2122371 Country of ref document: CA |
|
WWE | Wipo information: entry into national phase |
Ref document number: 1993919585 Country of ref document: EP |
|
121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | ||
WWP | Wipo information: published in national office |
Ref document number: 1993919585 Country of ref document: EP |
|
WWG | Wipo information: grant in national office |
Ref document number: 1993919585 Country of ref document: EP |