GB2328832A - Apparatus, method and system for audio and video conferencing and telephony - Google Patents

Apparatus, method and system for audio and video conferencing and telephony

Info

Publication number
GB2328832A
Authority
GB
United Kingdom
Prior art keywords
video
signal
audio
radio frequency
input
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
GB9718201A
Other versions
GB9718201D0 (en)
Inventor
Timothy Mark Burke
Douglas J Newlin
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Motorola Solutions Inc
Original Assignee
Motorola Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Motorola Inc
Publication of GB9718201D0
Publication of GB2328832A
Legal status: Withdrawn

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/14 Systems for two-way working
    • H04N7/141 Systems for two-way working between two video terminals, e.g. videophone
    • H04N7/148 Interfacing a video terminal to a particular transmission medium, e.g. ISDN
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/14 Systems for two-way working
    • H04N7/15 Conference systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Telephonic Communication Services (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Television Systems (AREA)

Abstract

A video access apparatus 110 provides for audio and video teleconferencing and telephony via a first communication channel 103 coupled to a primary station 105 having communication with a network 140, such as the public switched telephone network or an ISDN network. The video access apparatus 110 includes a video network interface 210 for reception of a first protocol signal to form a received protocol signal and for transmission of a second protocol signal to form a transmitted protocol signal; a radio frequency modulator/demodulator 205 to convert a baseband output video signal to a radio frequency output video signal and to convert a radio frequency input video signal to a baseband input video signal; a user interface 215 for reception of a first control signal of a plurality of control signals; and a processor arrangement 190 responsive, through a set of program instructions, and in response to the first control signal, to convert the received protocol signal to the baseband output video signal and to an output audio signal, and to convert the baseband input video signal and an input audio signal to the second protocol signal. This provides an audio/video conferencing or telephony apparatus which is mobile and can be configured for additional locations.

Description

APPARATUS, METHOD AND SYSTEM FOR AUDIO AND VIDEO CONFERENCING
AND TELEPHONY
Cross-Reference to Related Application
This application is related to Newlin et al., United States Patent Application Serial No. 08/658,792, filed June 5, 1996, entitled "Audio/Visual Communication System and Method Thereof", Motorola Docket No. PD05634AM, incorporated by reference herein, with priority claimed for all commonly disclosed subject matter.
Field of the Invention
This invention relates in general to audio and video communications systems and, more specifically, to an apparatus, method and system for audio and video conferencing and telephony.
Background of the Invention
Currently, audio and video (visual) conferencing capabilities are implemented as computer-based systems, such as in personal computers ("PCs"), as stand-alone, "roll about" room systems, and as video phones. These systems typically require new and significant hardware, software and programming, and also require significant communications network connections, for example, multiple channels of an Integrated Services Digital Network ("ISDN") connection or a T1/E1 connection.
For example, stand-alone, "roll about" room systems for audio and video conferencing typically require dedicated hardware at significant expense, in the tens of thousands of dollars, utilizing dedicated video cameras, television displays, microphone systems, and the additional video conferencing equipment. Such systems may also require as many as six (or more) contiguous ISDN B channels (or T1/E1 DS0s), each operating at 64 kbps (kilobits per second). PC-based systems also typically require, at a minimum, ISDN basic rate interface service, consisting of 2 ISDN B channels (each operating at 64 kbps) plus one D channel (operating at 16 kbps). Such communication network capability is also expensive and potentially unnecessary, particularly when the additional channels are not in continuous use.
Current audio/visual telephony or conferencing systems are also limited to providing such audio/visual functionality only at designated nodes, i.e., the specific system location, and are neither mobile nor distributed (having multiple locations). Stand-alone, "roll about" room systems allow such audio and video conferencing only within or at that particular physical location. Video phones are also currently limited to their installed locations. Similarly, PC-based systems provide such functionality only at the given PC having the necessary network connections (such as ISDN) and having the specified audio/visual conferencing equipment, such as a video camera, microphone, and the additional computer processing boards which provide for the audio/visual processing. For other PCs to become capable of such audio/visual conferencing functionality, they must also be equipped with any necessary hardware, software, programming and network connections.
Such conventional audio/visual conferencing systems are also difficult to assemble, install, and use. For example, the addition of audio/visual functionality to a PC requires the addition of a new PC card, camera, microphone, the installation of audio/visual control software, and the installation of new network connections, such as ISDN. In addition, such network connectivity may require additional programming of the PC with necessary ISDN-specific configuration information, such as configuration information specific to the central office switch type of the service provider and ISDN service profile identifier ("SPID") information. Video conference call set up procedures typically are also difficult and complicated utilizing these current systems.
Conventional audio/visual telephony and conferencing equipment is also limited to communication with similar equipment at the far end (remote location). For example, video phone systems which utilize typical telephone systems ("POTS" - plain old telephone service) transmit information in analog form, for example, as trellis code modulated data, at V.34 and V.34bis rates (e.g., highest rates of approximately 28.8 to 33 kbps). Such POTS-based video phone systems would not be compatible with ISDN audio/visual conferencing and telephony systems which transmit information in digital form, such as utilizing Q.931 message signaling, Q.921 LAPD datalink, and Q.910 physical interface digital protocols, with data rates of 128 kbps (two B channels) or more with additional channels or DS0s.
In addition, such current audio/visual telephony and conferencing equipment is relatively expensive and, in most instances, sufficiently expensive to be prohibitive for in-home or other consumer use. For example, the cost of roll-about, room-based systems is typically tens of thousands of dollars. PC-based videoconferencing systems, with ISDN network connections, are also expensive, with costs in the thousands of dollars.
Accordingly, a need has remained for audio/visual conferencing and telephony systems, equipment, and methods which may operate at more than one designated node or location within the user premises, or may be mobile, or may be configured as needed for additional locations. In addition, such a system should be compatible for use with other existing video conferencing systems, should be user friendly, easy to install and use, and should be relatively less expensive for in-home purchase and use by consumers.
Brief Description of the Drawings
FIG. 1 is a block diagram illustrating an audio/video network configuration for a video access apparatus in accordance with the present invention.
FIG. 2 is a high level block diagram illustrating a first embodiment of a video access apparatus and a first embodiment of a video conferencing system in accordance with the present invention.
FIG. 3 is a detailed block diagram illustrating a second embodiment of a video access apparatus and a second embodiment of a video conferencing system in accordance with the present invention.
FIG. 4 is a block diagram illustrating a CATV RF transceiver of the preferred apparatus embodiment in accordance with the present invention.
FIG. 5 is a block diagram illustrating a microprocessor subsystem and communications ASIC of the preferred apparatus embodiment in accordance with the present invention.
FIG. 6 is a block diagram illustrating an audio/video compression and decompression subsystem of the preferred apparatus embodiment in accordance with the present invention.
FIG. 7 is a block diagram illustrating a user audio interface of the preferred apparatus embodiment in accordance with the present invention.
FIG. 8 is a block diagram illustrating an RF modulator of the preferred apparatus embodiment in accordance with the present invention.
FIG. 9 is a block diagram illustrating an RF demodulator of the preferred apparatus embodiment in accordance with the present invention.
FIG. 10 is a block diagram illustrating a camera interface of the preferred apparatus embodiment in accordance with the present invention.
FIG. 11 is a flow diagram illustrating the method of the preferred embodiment of the present invention.
FIG. 12 is a flow diagram illustrating the telephony and video conference control methodology in accordance with the preferred embodiment of the present invention.
Detailed Description of the Invention
As mentioned above, a need has remained for audio/visual conferencing and telephony systems, apparatus, and methods which may operate at more than one designated node or location within user premises, or may be mobile, or may be configured as needed for additional locations. As discussed in greater detail below, the preferred embodiment of the present invention provides such audio and visual conferencing and telephony capability at one or more locations within the user premises, may be mobile, and may be configured as needed for additional locations. In accordance with the preferred embodiment, the audio/visual conferencing and telephony system utilizes equipment typically found in consumers' homes or premises, such as existing televisions, video cameras or camcorders, and telephones. In addition, such a system is designed to be compatible for use with other existing video conferencing systems, may be utilized over a variety of connected telecommunications networks (such as ISDN or POTS), is user friendly, easy to install and use, and should be relatively less expensive for in-home purchase and use by consumers.
FIG. 1 is a block diagram illustrating a configuration of an audio/video network 100 for a video access apparatus 110 in accordance with the present invention. As illustrated in FIG. 1, video access apparatus 110-1 through video access apparatus 110-n (individually and collectively referred to as a video access apparatus(es) 110) may have an outdoor location, for example, at subscriber premises 109-1 (video access apparatus 110-1), or may have indoor locations, for example, at subscriber premises 109-2 and 109-n (video access apparatus 110-2 and video access apparatus 110-n). The video access apparatus 110 illustrated in FIG. 1 may have a first embodiment as illustrated in FIG. 2 or a second and preferred embodiment as video access apparatus 150 illustrated in FIG. 3, and as used herein, reference to any of the embodiments of the video access apparatus 110 or 150 shall be understood to mean and include the other apparatus embodiment or its equivalents. Referring to FIG. 1, in accordance with the present invention, the video access apparatus 110 provides audio and video telephony and conferencing services over a first communication channel 103 which, in the preferred embodiment, is hybrid fiber coaxial cable ("HFC") utilized in the audio/video network 100 (which may have multiple configurations). Also in the preferred embodiment utilizing HFC, the video access apparatus 110 (or 150) is also referred to as a video cable access unit. The first communication channel 103, in turn, is connected through a primary station 105 to a cable television network ("CATV") video services infrastructure 102, and through a local digital switch 135 to a network 140. The network 140, for example, may be a public switched telephone network ("PSTN") or an Integrated Services Digital Network ("ISDN"), or any combination of such existing or future telecommunications networks.
Continuing to refer to FIG. 1, a primary station 105, also referred to as head end equipment, includes a control unit referred to in the preferred embodiment as a cable control unit ("CCU") 115, a network interface (or telecommunications network interface) 130, a combiner 104, and is coupleable to the CATV video services infrastructure 102. The CCU 115 consists of a communications controller 125 and a bank of transceivers 120-1 through 120-n, also referred to as cable port transceiver ("CPX") cards in the preferred embodiment. The communications controller 125 transmits and receives industry standard time division multiplexed ("TDM") signals, via the network interface 130, to and from a local digital switch ("LDS") which connects to the rest of the network 140. In the preferred embodiment, incoming (received) signals to the communications controller 125 are converted to an internal signaling format, may also have TDM time slots interchanged, and are then routed to the transceivers 120-1 through 120-n.
The transceivers 120-1 through 120-n convert the received signals to frequencies (e.g., radio frequencies ("RF")), preferably frequencies compatible with cable television (CATV) networks. The primary station 105 provides concentration of the resources of the network 140 through time slot and frequency management techniques. The audio/video network 100 comprises the primary station 105 (with the network interface 130 for connection to the network 140 and the coupleability to the CATV video services infrastructure 102), along with a plurality of video access apparatuses, such as video access apparatuses 110-1 through 110-n (connected to the primary station 105 over the first communication channel 103).
In the preferred embodiment, the signaling over the audio/video network 100 uses a protocol referred to as "CACS" (for Cable Access Signaling), for transmission and reception of data such as voice, video, computer files and programs, and other information (collectively referred to as data). CACS is a multi-layered protocol consisting of a plurality of 768 kbps π/4-DQPSK (differential quadrature phase shift keying) modulated RF carriers using TDM framing in the downstream path (from the primary station 105 to a video access apparatus 110) and TDMA (time division multiple access) in the upstream path (to the primary station 105 from a video access apparatus 110). In the preferred embodiment, each CACS carrier supports as many as eight time slots of individually addressable user data packets, in which each packet contains 160 bits of user data (the "payload") plus address and error correction information. The preferred CACS frame rate is 400 frames per second, providing a net user data throughput of 64 kbps (kilobits per second) for each assigned time slot. Time slots also may be concatenated to provide even greater data rates, for example, up to 512 kbps when all eight time slots are assigned to a single user.
As a consequence, N x 64 kbps services may be supported with the CACS protocol, where N is the number of assigned time slots. In the case of connectivity for ordinary telephony, commonly known as POTS (Plain Old Telephone Service), a single time slot is used, in which digital PCM (pulse code modulated) audio samples are transported in the payload of the CACS time slot. In the case of connectivity for higher rate services, such as basic rate ISDN (two 64 kbps B channels plus one 16 kbps D channel), two or more time slots are used to transport the user (bearer) data. For video conferencing and telephony service, compressed digital audio and video signals may occupy from one to multiple time slots per carrier (e.g., 8 time slots per carrier), depending on the method of compression used and the desired quality of the service, and depending upon the number of video network interfaces 210 (or CATV RF transceivers 245) utilized in the video access apparatus 110 (or 150) discussed below.
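For illustration only, the short Python sketch below reproduces the throughput arithmetic quoted above; the constant and function names are illustrative and do not appear in the patent.

```python
# Hypothetical sketch of the CACS throughput arithmetic described above.

PAYLOAD_BITS_PER_SLOT = 160   # user data bits per time slot per CACS frame
FRAMES_PER_SECOND = 400       # CACS frame rate
SLOTS_PER_CARRIER = 8         # individually addressable time slots per carrier

def user_rate_bps(num_slots: int) -> int:
    """Net user throughput for N assigned (concatenated) time slots."""
    if not 1 <= num_slots <= SLOTS_PER_CARRIER:
        raise ValueError("a single CACS carrier offers 1 to 8 time slots")
    return num_slots * PAYLOAD_BITS_PER_SLOT * FRAMES_PER_SECOND

assert user_rate_bps(1) == 64_000    # one slot: POTS or one ISDN B channel
assert user_rate_bps(2) == 128_000   # two slots: basic rate ISDN bearer data
assert user_rate_bps(8) == 512_000   # all eight slots assigned to one user
```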
Also in the preferred embodiment, modulated CACS RF carriers occupy an RF bandwidth of 600 kHz and may be assigned anywhere within the downstream and upstream CATV frequency bands. Typically, in domestic, North American CATV systems, the downstream band has been designated from 50 to 750 MHz, with an upstream band designated from 5 to 40 MHz. Referring to FIG. 1, for transmission to the user premises 109-1 through 109-n, the transceivers 120-1 through 120-n receive a TDM data stream from the communications controller 125 and create CACS frames of eight time slots, along with associated overhead signaling information (including error control data), resulting in a 768 kbps data stream. The data stream is then converted to a π/4-DQPSK signal, which in turn is then upconverted in frequency from baseband to an RF carrier within the CATV downstream band. This π/4-DQPSK signal may then be optionally combined (in the combiner 104 of the primary station 105) with other video services signals from the CATV video services infrastructure 102, and transmitted over the first communication channel 103.
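As a simple illustration of these band limits, the following sketch (a hypothetical helper, not part of the patent) checks whether a 600 kHz CACS carrier fits entirely within the downstream or upstream band quoted above.

```python
# Hypothetical check that a 600 kHz CACS carrier lies wholly within the
# North American CATV bands named in the text.

CARRIER_BW_HZ = 600_000
DOWNSTREAM_BAND_HZ = (50_000_000, 750_000_000)   # 50 - 750 MHz downstream
UPSTREAM_BAND_HZ = (5_000_000, 40_000_000)       # 5 - 40 MHz upstream

def carrier_fits(center_hz: float, band: tuple) -> bool:
    """True if a carrier centred at center_hz fits wholly within the band."""
    low, high = band
    half_bw = CARRIER_BW_HZ / 2
    return (center_hz - half_bw) >= low and (center_hz + half_bw) <= high

print(carrier_fits(450e6, DOWNSTREAM_BAND_HZ))   # True: mid-band downstream
print(carrier_fits(5.1e6, UPSTREAM_BAND_HZ))     # False: overlaps the 5 MHz band edge
```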
At the receiving end, as discussed in greater detail below, a video access apparatus 110 downconverts the CACS carrier to baseband and demodulates the π/4-DQPSK signal, resulting in received CACS frames. Time slot information (i.e., the data in the payload) is then extracted from the CACS frames and transferred to an audio codec in the case of telephony (a POTS call), or transferred to an audio/video compression and decompression subsystem in the case of a video conferencing call or session. Conversely, for upstream transmission, voice or video data originating, respectively, from the audio codec or an audio/video compression and decompression subsystem, is put into CACS protocol formatted TDMA data packets. The TDMA data packets are then converted into a π/4-DQPSK signal, upconverted to an RF carrier, and injected into the upstream path of the audio/video network 100, on first communication channel 103. In turn, one of the transceivers 120-1 through 120-n receives the upstream signal from a video access apparatus 110, RF downconverts the signal to baseband and demodulates the π/4-DQPSK signal, resulting in a received TDMA data packet. The user data is then extracted from the packet and transferred to the communications controller 125, which reformats the user data into an appropriate network signal (analog or digital) and, through the network interface 130, transmits the network signal, multiplexed with other signals, to the network 140 (via the local digital switch 135).
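A minimal sketch of the downstream routing rule just described appears below; the class, enum and function names are hypothetical and are not the patent's firmware, they simply restate that payload data from a received time slot goes to the audio codec for a POTS call or to the audio/video compression and decompression subsystem for a video call.

```python
# Minimal, hypothetical sketch of the downstream payload routing rule.

from dataclasses import dataclass
from enum import Enum, auto

class CallType(Enum):
    POTS = auto()
    VIDEO = auto()

@dataclass
class CacsSlotPayload:
    data: bytes          # 160 bits (20 bytes) of user data per frame
    call_type: CallType

def route_downstream(slot: CacsSlotPayload) -> str:
    """Name the subsystem that should consume this slot's payload."""
    if slot.call_type is CallType.POTS:
        return "audio codec (PCM voice samples)"
    return "audio/video compression and decompression subsystem"

print(route_downstream(CacsSlotPayload(bytes(20), CallType.POTS)))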
In the preferred embodiment, the CACS protocol consists of three types of signaling channels which use designated time slots on CACS carriers. A first type of signaling channel, referred to as a broadcast channel, is utilized to transmit general system information, only in the downstream direction to the various video access apparatuses 110, and to transmit information such as terminating alerts to a video access apparatus 110 when a call is to be received from the network 140. A plurality of a second type of signaling channel, referred to as access channels, are used by the various video access apparatuses 110 to gain access to the audio/video network 100 or the network 140. A plurality of a third type of signaling channel, referred to as traffic channels, are full-duplex and are used to transport user data to and from the network 140.
In the preferred embodiment, traffic channels may consist of one or more time slots and are assigned to users based on demand (trunked) from a pool of available time slots.
A traffic channel is assigned for the duration of a call (POTS or video), and upon call termination, is subsequently released to the pool of available time slots. When a video access apparatus 110 first powers up, it registers with the CCU 115 by first scanning the downstream spectrum for a CACS broadcast channel, synchronizing with that channel, and obtaining information concerning a location of an access channel. On the access channel, the video access apparatus 110 requests an assignment of a traffic channel, and then transmits a registration message over the assigned traffic channel of the plurality of traffic channels. After registration is complete, the video access apparatus 110 may make or receive calls through the network 140.
If a call origination is required, the video access apparatus 110 makes a request to the CCU 115 for the required number of time slots through the access channel. The CCU 115 then grants the request and assigns a traffic channel (carrier frequency and associated time slot(s)). If a call delivery is required, the CCU 115 alerts the identified, addressed video access apparatus 110 of an incoming call over the broadcast channel. Via the access channel, the video access apparatus 110 then requests a traffic channel. The CCU 115 grants the request and a traffic channel is assigned.
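The registration and call-origination exchanges described in the last few paragraphs can be summarized as an ordered sequence of steps. The sketch below is a hypothetical illustration only: the class, method and message names, and the example carrier frequency, are assumptions; the patent defines the message flow, not this code.

```python
# Hypothetical sketch of power-up registration and call origination over CACS.

from dataclasses import dataclass, field

@dataclass
class TrafficChannel:
    carrier_hz: float
    time_slots: tuple

@dataclass
class CCU:
    """Stand-in for the cable control unit granting channel requests."""
    free_slots: list = field(default_factory=lambda: list(range(8)))

    def grant_traffic_channel(self, num_slots: int) -> TrafficChannel:
        if num_slots > len(self.free_slots):
            raise RuntimeError("no capacity available: request is blocked")
        slots = tuple(self.free_slots.pop(0) for _ in range(num_slots))
        return TrafficChannel(carrier_hz=453.0e6, time_slots=slots)  # example frequency

def power_up_and_register(ccu: CCU) -> None:
    # 1. scan the downstream spectrum for a CACS broadcast channel (omitted)
    # 2. synchronize and learn the access channel location (omitted)
    # 3. via the access channel, request assignment of a traffic channel
    traffic = ccu.grant_traffic_channel(num_slots=1)
    # 4. transmit the registration message over the assigned traffic channel
    print(f"REGISTRATION sent on time slots {traffic.time_slots}")

def originate_call(ccu: CCU, num_slots: int) -> TrafficChannel:
    """Request the number of time slots the call needs, then set it up."""
    traffic = ccu.grant_traffic_channel(num_slots)
    print(f"CALL SETUP on {traffic.carrier_hz / 1e6:.1f} MHz, slots {traffic.time_slots}")
    return traffic

ccu = CCU()
power_up_and_register(ccu)
originate_call(ccu, num_slots=2)   # e.g. a 2 x 64 kbps video call
```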
In the preferred embodiment, the CACS protocol also provides the capability for transferring calls to other available carrier frequencies and time slots, especially in the event of high noise conditions. Preferably, the quality of all user traffic channels is continuously monitored, and if the quality starts to degrade due to noise, the call is transferred to another RF carrier having less noise.
FIG. 2 is a high level block diagram illustrating a first embodiment of a video access apparatus, namely, video access apparatus 110, and illustrating a video conferencing system 200, in accordance with the present invention. The video conferencing system 200, in accordance with the present invention, includes a video access apparatus 110, audio 220, one or more video displays 225-1 through 225-n (individually and collectively referred to as video display(s) 225), camera interface 235, and video camera 230. The video access apparatus 110 is coupleable to a first communication channel 103, for communication via a primary station 105 with a network 140 and with a CATV video services infrastructure 102 as discussed above, and is coupled to a second communication channel 227, typically located within or about the user (or subscriber) premises 109. For example, the second communication channel 227 may be an internal 75 Ohm coaxial cable typically utilized with cable television. The audio 220 is coupled to the video access apparatus 110, and may include a microphone and speaker or, as discussed below with reference to FIG. 3, may be preferably embodied as a telephone. One or more video displays 225 are utilized to display the incoming video portion of an audio and video conferencing call or session (incoming in the sense of having been transmitted to the video access apparatus 110 from another location), may also include a speaker for output of the incoming audio portion of an audio and video conferencing call or session, and are implemented utilizing one or more televisions in the preferred embodiment. The video camera 230 is utilized to generate the outgoing video portion of an audio and video conferencing call or session (outgoing in the sense of being transmitted from the video access apparatus to another location), may also include a microphone for generation of the outgoing audio portion of an audio and video conferencing call or session, and is implemented utilizing an ordinary video camera or camcorder in the preferred embodiment. The camera interface 235 is utilized to modulate the video output signal from the video camera 230 for transmission on the second communication channel 227 to the video access apparatus 110 and, as discussed in greater detail below, the camera interface 235 also may be directly incorporated within the video camera 230.
Continuing to refer to FIG. 2, the video access apparatus 110 includes a video network interface 210, a radio frequency (RF) modulator and demodulator 205 (also referred to as an RF modulator/demodulator 205), a user interface 215, and a processor arrangement 190. The video network interface 210 is coupleable to the first communication channel 103 for reception of a first protocol signal, such as a π/4-DQPSK TDM signal, to form a received protocol signal; and for transmission of a second protocol signal, such as digital data in a TDMA format, to form a transmitted protocol signal, such as a π/4-DQPSK TDMA signal. These various protocol signals may also utilize protocols and modulation types other than those utilized within the CACS protocol such as, for example, more general PSK (phase shift keying) or QPSK (quadrature phase shift keying) modulation methods, OFDM (orthogonal frequency division multiplexing), or QAM (quadrature amplitude modulation). Also, as used herein, input and output directions are defined to avoid confusion between incoming and outgoing signals because, for example, an incoming signal to the video access apparatus 110 from the network 140 will also be an outgoing signal from the video access apparatus 110 when transmitted to a video display 225 on the second communication channel 227. As a consequence, as used herein, input and output directions are defined at the interface between the video access apparatus 110, on the one hand, and the second communication channel 227 or audio 220, on the other hand, as follows: an input signal, such as an input video or audio signal, is input to the video access apparatus 110 from the second communication channel 227 (or, in the case of input audio, from the audio 220), and may originate, for example, from the video camera 230, and will be transmitted from the video access apparatus 110 to the network 140; conversely, an output signal, such as an output video or audio signal, is output from the video access apparatus 110 to the second communication channel 227 (or, in the case of output audio, to the audio 220), and may originate, for example, from a remote location via the network 140, is received by the video access apparatus 110 via the first communication channel 103, and will be transmitted or output by the video access apparatus 110 on the second communication channel 227 to a video display 225 or output to audio 220.
Continuing to refer to FIG. 2, the RF modulator and demodulator 205 is utilized to convert a baseband output video signal (from the processor arrangement 190) to a radio frequency output video signal, for transmission on the second communication channel 227 and reception by one or more of the video displays 225, and to convert a radio frequency input video signal (from the camera interface 235) to a baseband input video signal, for input to the processor arrangement 190.
The user interface 215 is utilized for reception of a control signal of a plurality of control signals, such as a request to place a telephony call, a request to place an audio and video conference call, and other control signals such as alerting signals of incoming telephony or audio and video conference calls. The processor arrangement 190 is coupled to the video network interface 210, to the radio frequency modulator/demodulator 205 and to the user interface 215. As explained in greater detail below, the processor arrangement 190 may be comprised of a single integrated circuit ("IC"), or may include a plurality of integrated circuits or other components connected or grouped together, such as microprocessors, digital signal processors, ASICs, associated memory (such as RAM and ROM), and other ICs and components. As a consequence, as used herein, the term processor arrangement should be understood to equivalently mean and include a single processor, or arrangement of processors, microprocessors, controllers, or some other grouping of integrated circuits which perform the functions discussed in greater detail below. For example, in the preferred embodiment, the processor arrangement 190 is implemented as illustrated in FIG. 3, and includes a communications ASIC (application specific integrated circuit) 250, an audio/video compression and decompression subsystem 265, and a microprocessor subsystem 260. As discussed in greater detail below, the methodology of the present invention may be programmed and stored, as a set of program instructions for subsequent execution, in the processor arrangement 190 and its associated memory and other equivalent components. In the preferred embodiment, the processor arrangement 190 is utilized, in conjunction with a stored set of program instructions and in response to any control signals entered by the user or received from the network 140, first, to convert the received protocol signal (from the video network interface 210) both to a baseband output video signal (to be modulated by the RF modulator/demodulator 205 and transmitted to a video display 225) and to an output audio signal (transmitted to the audio 220 or combined with the baseband output video signal, modulated and transmitted to the video display 225); and second, to convert both a baseband input video signal (the demodulated input video signal having originated from the camera interface 235) and an input audio signal (from the audio 220 or combined with the baseband input video signal having originated from the video camera 230 and the camera interface 235), to the second protocol signal (to be modulated and transmitted by the video network interface 210 to the network 140). The functions of each of the components of the video access apparatus 110 are discussed in greater detail below with reference to FIGs. 3 - 10.
FIG. 3 is a high level block diagram illustrating a second embodiment of a video access apparatus, namely, video access apparatus 150, and illustrating a second embodiment of a video conferencing system 300, in accordance with the present invention. The second apparatus embodiment, namely, the video access apparatus 150 illustrated in FIG. 3, is the preferred apparatus embodiment of the invention, and is in all other respects equivalent to and may be utilized in a manner identical to the first embodiment, video access apparatus 110, illustrated in FIGs. 1 and 2. Similarly, the second embodiment of the video conferencing system, video conferencing system 300, is also the preferred system embodiment of the present invention, and is in all other respects equivalent to and may be utilized in a manner identical to the first embodiment, video conferencing system 200, illustrated in FIG. 2.
As illustrated in FIG. 3, the video access apparatus 150 includes a microprocessor subsystem 260, an audio/video compression and decompression subsystem 265, and a communications ASIC 250, which form the processor arrangement 190 discussed above with reference to FIG. 2.
The video access apparatus 150 also includes a CATV radio frequency (RF) transceiver 245 (which equivalently functions as the video network interface 210 illustrated in FIG. 2), a user/audio interface 255 (which equivalently functions as the user interface 215 illustrated in FIG. 2); and an RF modulator 270 and RF demodulator 275 (which together equivalently function as the RF modulator/demodulator 205 illustrated in FIG. 2). The preferred embodiment of the video access apparatus 150 illustrated in FIG. 3 also includes a first directional coupler 280, a second directional coupler 290, and a filter 285. Also as mentioned above, when a data rate may be needed which is higher than that which may be accommodated by all available time slots per carrier, additional CATV RF transceivers 245 may also be utilized to provide additional time slots on additional carriers. The functions of each of these components are explained in greater detail below.
Also as illustrated in FIG. 3, the second embodiment of a video conferencing system 300 includes (as an audio interface) one or more telephones 295-1 through 295-n (individually and collectively referred to as telephone(s) 295, and which telephones 295 equivalently function as the audio 220 illustrated in FIG. 2); the video access apparatus 150; a video camera 230; a camera interface 235 (which also may be combined or incorporated within the video camera 230); one or more televisions 240-1 through 240-n (which are individually and collectively referred to as television(s) 240, and which equivalently function as the video displays 225 illustrated in FIG. 2); and a second communication channel 227 which, as mentioned above, is preferably a coaxial cable in the user (or subscriber) premises.
Referring to FIG. 3, the video access apparatus 150 provides both telephony (POTS) and audio/video conferencing service using common household appliances for interaction with the user (or subscriber) in the video conferencing system 300, such as telephones 295-1 through 295-n for entry of control signals and for audio input and output; video camera 230 for video input (such as a video camcorder); and television(s) 240 for video output (as or in lieu of video displays 225). When providing POTS service, the video access apparatus 150 interfaces with the typical, existing twisted-pair cabling 294 in the user (or subscriber) premises so that any telephone in the user premises, such as telephones 295-1 through 295-n, may be used. The video access apparatus 150 also provides line current and traditional "BORSHT" functions for typical (POTS) telephone service, as explained in greater detail below.
When providing video conferencing service, any of the plurality of telephones 295-1 through 295-n (individually and collectively referred to as telephone(s) 295) may be used for call (conference) establishment or set up and for audio input and output. The radio frequency output video signal (from the video access apparatus 150) may be displayed on any of the televisions 240 connected to the second communication channel 227 (such as a CATV coaxial cable) within the user premises, using a vacant channel within the CATV downstream frequency band (for example, channel 3 or 4). The radio frequency output video signal is originally received from the network 140 in a modulated digital form, such as digital data modulated and encoded utilizing a protocol such as CACS, which may be referred to as a received or first protocol signal. The first protocol signal is received over the audio/video network 100, having been transmitted via, for example, the primary station 105 and the network 140, from another, second user premises. The first protocol signal, typically consisting of compressed digital data, is received by the video access apparatus 150, which decompresses the data and converts it to a baseband output video signal, such as an NTSC/PAL composite video signal (NTSC being a video format typically utilized in North America and Japan, with PAL being a video format typically utilized in Europe). This baseband output video signal (on line 271) is then RF modulated (using RF modulator 270) onto an available video RF carrier and injected into the second communication channel 227 (e.g., coaxial cable) at the user premises using a directional coupler 290 (preferably 4 port). The radio frequency output video signal is then sent to all television receivers, such as televisions 240, within the user premises, such as a home or office. The directional coupler 290 is used in the preferred embodiment to provide directional signal injection while providing isolation with any connected CATV network.
The video signal originating in the user premises and to be transmitted via the primary station 105 and the network 140 to another, second user premises (or other location), originates from a video camera (or camcorder) 230 that produces a video signal, such as an NTSC/PAL composite video signal, which is also preferably modulated on channel 3 or 4 (61.25 or 67.25 MHz). This RF video signal from the video camera 230 is connected or coupled to a camera interface 235, which utilizes an offset mixer to shift the RF video signal (typically on a 61.25 or 67.25 MHz carrier) up to a spectrum higher than typical CATV frequencies, such as the 1.2 GHz or 900 MHz bands. For those video cameras 230 which may not include a modulator to shift the NTSC/PAL composite video signal to channel 3 or 4, such modulation may be incorporated into the camera interface 235; conversely, the functions of the camera interface 235 may also be incorporated directly into the video camera 230. The shifted video signal from the camera interface 235, referred to as a radio frequency input video signal, is then injected into the same second communication channel 227 (also connected to the televisions 240) which transmits the radio frequency input video signal back to the video access apparatus 150. The video access apparatus 150 receives the radio frequency input video signal from the directional coupler (at 1.2 GHz or 900 MHz) and demodulates the signal to baseband using RF demodulator 275, to form the baseband input video signal (on line 272). The baseband input video signal is then converted to digital form and compressed, to form a second protocol signal, such as a TDMA signal, and is then π/4-DQPSK modulated (to form a transmitted protocol signal) and transmitted over the audio/video network 100. In the preferred embodiment, by using a vacant video channel at 1.2 GHz or 900 MHz, interference with the downstream and upstream CATV services tends to be avoided. The 1.2 GHz or 900 MHz signal is also filtered out of the feed-through cable or link 287 by a low pass filter 285, so that the signal is highly attenuated before it may leave the video access apparatus 150.
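A rough arithmetic sketch of the camera-interface frequency shift follows; the local oscillator value is an assumption chosen purely for illustration, since the patent only names the channel 3/4 carriers and the 900 MHz / 1.2 GHz target bands.

```python
# Hypothetical offset-mixer arithmetic for the camera interface 235.

CHANNEL_3_CARRIER_HZ = 61.25e6   # NTSC channel 3 visual carrier
CHANNEL_4_CARRIER_HZ = 67.25e6   # NTSC channel 4 visual carrier

def shifted_carrier_hz(camera_carrier_hz: float, lo_hz: float) -> float:
    """Upper mixing product of the offset mixer in the camera interface."""
    return camera_carrier_hz + lo_hz

# With an assumed LO near 1.14 GHz, a channel 3 camera signal lands at about
# 1.2 GHz, above the normal CATV spectrum, so it does not collide with
# downstream programming or with the upstream CACS carriers.
assumed_lo_hz = 1.14e9
print(f"{shifted_carrier_hz(CHANNEL_3_CARRIER_HZ, assumed_lo_hz) / 1e9:.3f} GHz")
```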
While the primary function of the video access apparatus 110 (or 150) and the video conferencing system 200 (or 300) is to provide full-duplex video communications, other secondary functions are also available in the preferred embodiment. For example, one such secondary function is a "loop back" function which allows the user to view the video from the video camera 230 on the screen of a television 240 or video display 225, such that the RF input video signal is demodulated (from 1.2 GHz or 900 MHz), remodulated onto a video RF carrier, and utilized for the RF output video signal. Such a loop back feature is especially valuable for surveillance, such as for home security or for baby monitoring. Also, a picture-in-picture (or multiple window) function may be provided, in which a user may view a small window of the video from video camera 230 along with the received video from another location, for example, to provide baby monitoring within the small window while simultaneously watching a movie or video received from a CATV network.
In addition, the video access apparatus 110 (or 150) may be frequency agile, such that video conferencing may occur on any channel. While video conferencing on typically empty channels such as channels 3 or 4 may be preferable, in accordance with the present invention, video conferencing on additional channels is also feasible. For example, an existing video channel may be blanked out or eliminated, utilizing a notch filter, for any length of time, and the various input and output video signals inserted or overlaid into the now empty (filtered or muted) channel. Such frequency agility and injection of an audio/video signal, in the presence of existing programming, is one of many truly unique features of the present invention.
FIG. 4 is a block diagram illustrating a CATV RF transceiver 245 of the preferred apparatus embodiment in accordance with the present invention. In the preferred embodiment, the CATV RF transceiver 245 is frequency agile, providing upconversion and downconversion of the CACS signals to and from any available CACS carrier, with frequency control provided by the microprocessor subsystem 260.
Referring to FIG. 4, a first protocol signal, such as a CACS π/4-DQPSK modulated downstream carrier in the 50 - 750 MHz CATV band, is received from the first communication channel 103 and filtered in the filter 305 (having a 50 - 750 MHz bandwidth), and in heterodyne downconverter 310, is heterodyne downconverted to baseband, with this incoming baseband signal having in-phase ("I") and quadrature ("Q") components (or signals). The local oscillators for the heterodyne downconverter are provided by a frequency synthesizer subsystem 315. The I and Q components are then square root raised cosine ("SRRC") filtered in a first SRRC filter 320 to remove noise and other distortions. The filtered I and Q components are then mixed up to an intermediate frequency (IF) signal at 1.2 MHz, in the up mixer 325, for transfer to the communications ASIC 250 on bus 261 (or on another line connecting the up mixer 325 to the communications ASIC 250).
In the preferred embodiment, the CACS carrier has a symbol rate of 384 kilosymbols/second and is transmitted with an excess bandwidth factor of 0.5, and with an occupied channel bandwidth of 600 kHz.
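These figures are mutually consistent, as the brief check below shows: with square-root raised-cosine filtering, occupied bandwidth is roughly the symbol rate times one plus the excess bandwidth factor, and π/4-DQPSK carries two bits per symbol.

```python
# Worked check of the carrier figures quoted above.

SYMBOL_RATE = 384_000      # symbols per second
EXCESS_BANDWIDTH = 0.5     # SRRC roll-off factor

occupied_bw_hz = SYMBOL_RATE * (1 + EXCESS_BANDWIDTH)
print(occupied_bw_hz)                  # 576000.0 Hz, inside the 600 kHz channel

assert SYMBOL_RATE * 2 == 768_000      # reproduces the 768 kbps carrier rate
```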
Continuing to refer to FIG. 4, a second protocol signal, such as a 768 kb/s TDMA burst, originating from the communications ASIC 250, is applied to a π/4-DQPSK waveform generator or modulator 330, which outputs baseband I and Q components (signals). The I and Q signals are SRRC filtered (in second SRRC filter 335) and then upconverted in RF upconverter 340 to the 5 - 40 MHz CATV upstream band, to form a transmit (or transmitted) protocol signal. As in the downconverter 310, local oscillators for the RF upconverter 340 are provided by the frequency synthesizer subsystem 315. The transmit power of the TDMA burst is programmable by the microprocessor 350 of the microprocessor subsystem 260 (discussed below with reference to FIG. 5) to provide network gain control, by the audio/video network 100, over any individual video access apparatus 110 or 150 connected to the audio/video network 100.
FIG. 5 is a block diagram illustrating a microprocessor subsystem 260 and communications ASIC 250 of the preferred apparatus embodiment in accordance with the present invention. The communications ASIC 250 is utilized in the preferred apparatus embodiment to provide low-level baseband functions to support a protocol such as CACS.
Functionally, communications ASIC 250 may be separated into a receive section and a transmit section (not separately illustrated in FIG. 5). In the receive section, the IF signal at 1.2 MHz (from the up mixer 325 of the CATV transceiver 245) contains the π/4-DQPSK modulated CACS signal. This downstream CACS π/4-DQPSK TDM signal is coherently demodulated, to provide baseband binary data as well as recovery of symbol and bit timing information. A TDM frame is then synchronized and decoded, time slot data is extracted, and error control checking is performed. Such supervisory data, as well as user data in the payload, is then made available to the microprocessor subsystem 260 via the bus 261, which may be an address/data bus. The user data may also be directly routed out of the communications ASIC 250 for delivery to the audio codec 410 (FIG. 7) or the audio/video compression and decompression subsystem 265 (FIG. 6). In the transmit section of the communications ASIC 250, control data originating from the microprocessor 350, and compressed audio and video data from the audio/video compression and decompression subsystem 265, are transferred to the communications ASIC 250, to create an audio/video data stream. The audio/video data stream is then formatted with synchronization and error control information, resulting in binary TDMA bursts, which are then transferred to the CATV transceiver 245 for subsequent modulation and transmission as a transmitted protocol signal over the first communication channel 103. In the preferred embodiment, the communications ASIC 250 also provides other functions to support the video access apparatus 150, including TDMA time alignment, sleep mode control for low power operation, data buffering for rate control, and interrupt generation of POTS interface control signals.
Continuing to refer to FIG. 5, the microprocessor subsystem 260 consists of a microprocessor 350 or other processing unit, such as the Motorola MC68LC302, and memory 360, which includes random access memory (RAM) and read-only memory (ROM), with communication to the communications ASIC 250 and the audio/video compression and decompression subsystem 265 provided over the bus 261. The read only memory portion of memory 360 also utilizes flash programmable memory, such that the memory contents may be downloaded over the audio/video network 100 using a protocol such as CACS. As a consequence, different versions of operating software (program instructions), such as upgrades, may be implemented without modifications to the video access apparatus 150 and without user intervention.
Continuing to refer to FIG. 5, the microprocessor subsystem 260 provides device control and configuration, as well as higher layer CACS functions, such as call processing, and is also used to implement an ISDN protocol stack when required for video calls. Because the microprocessor subsystem directly interfaces with the communications ASIC 250 with access to the CACS channel user data, a high speed data link may be established between the communications ASIC 250 and the audio/video compression and decompression subsystem 265 using the microprocessor subsystem 260 as the data exchange and protocol conversion device. User audio, in the form of a pulse code modulated (PCM) data stream, may also be routed through the microprocessor 350 to the audio/video compression and decompression subsystem 265 from the DSP 415 of the user/audio interface 255.
FIG. 6 is a block diagram illustrating an audio/video compression and decompression subsystem 265 of the preferred apparatus embodiment in accordance with the present invention. The audio/video compression and decompression subsystem 265 performs video compression of the baseband input video signal (originating from the video camera 230 and camera interface 235), and decompression of the video data contained in the payload of the received, demodulated first protocol signal (such as a CACS signal), for subsequent display on the television(s) 240. The audio/video compression and decompression subsystem 265 includes a video processing digital signal processor (DSP) 365, a red-green-blue digital to analog converter 370, a red-green-blue analog to digital converter 390, an encoder 375, and an audio/video input processor 380. The video processing DSP (or video processing DSP subsystem) 365 is a high-speed programmable DSP (or DSP arrangement or subsystem, such as a Motorola MC56303 with associated support components, including memory and a hardware acceleration ASIC (discussed below)), utilized to implement different video and audio compression and decompression algorithms, depending on the transmission rate and/or video conferencing standard at the remote end (i.e., the other premises with which the video access apparatus is communicating). The program code for the video processing DSP 365 may also be downloaded from the microprocessor subsystem memory 360, which may also be downloaded through the audio/video network 100 using a protocol such as CACS. As a consequence, video functionality of the video access apparatus 150, including new algorithms, may be changed or upgraded on-the-fly, also without any hardware changes and without user intervention.
Continuing to refer to FIG. 6, compressed video data received from the network 140 (as, for example, a π/4-DQPSK TDM CACS protocol signal), having been previously demodulated, demultiplexed and reformatted into video data by the communications ASIC 250 and the microprocessor subsystem 260, is transferred to the video processing DSP 365 where it is decompressed and converted to red-green-blue ("RGB") digital video signals. The RGB digital video signals are then converted to RGB analog signals, by the RGB digital to analog ("D/A") converter 370, such as the Motorola MC44200. The analog RGB signals, along with a composite synchronization signal, are then applied to an encoder 375, preferably an NTSC/PAL encoder such as a Motorola MC13077, resulting in an NTSC/PAL composite video signal, which may also be referred to as a baseband output video signal. The NTSC/PAL composite video signal is then transferred to the RF modulator 270 for upconversion to a radio frequency (to form the radio frequency output video signal), followed by transmission on the second communication channel 227 and display on the television 240.
For subsequent transmission over the network 140 of an input video signal (originating from the video camera 230 and the camera interface 235), a baseband input video signal, such as an NTSC/PAL composite video camera or camcorder signal, is received from the RF demodulator 275. The baseband input video signal is transferred to an audio/video input processor 380, such as a Motorola MC44011, which converts the baseband input video signal to analog RGB signals, while also providing a genlocked sampling clock for subsequent digitizing of the video signals. These input analog RGB signals are then converted to digital RGB signals by an RGB analog to digital converter 390, such as the Motorola MC44250, and transferred to the video processing DSP 365. The video processing DSP 365 compresses the digital RGB signals, and transfers the resulting data stream to the communications ASIC 250 or microprocessor subsystem 260 for protocol encoding and modulation, for subsequent delivery to the network 140. In the preferred embodiment, the audio/video compression and decompression subsystem 265 may also include additional random access memory for use by the video processing DSP 365 for partial or full storage of pixel data of an input/output video frame. Also in the preferred embodiment, a hardware acceleration ASIC is used to assist the video processing DSP 365 in processing speed intensive tasks, such as discrete cosine transforms associated with the compression and decompression processes.
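For reference, the two video paths just described can be summarized as ordered lists of stages; the sketch below is descriptive only (not an implementation), using the part numbers given in the text.

```python
# Descriptive summary of the two video paths through the subsystem 265.

DOWNSTREAM_VIDEO_PATH = [
    "communications ASIC 250 / microprocessor subsystem 260: demodulated, demultiplexed CACS video payload",
    "video processing DSP 365: decompress to digital RGB",
    "RGB D/A converter 370 (MC44200): digital RGB to analog RGB",
    "encoder 375 (MC13077): analog RGB to NTSC/PAL composite (baseband output video)",
    "RF modulator 270: baseband to RF carrier on the second communication channel 227",
]

UPSTREAM_VIDEO_PATH = [
    "RF demodulator 275: camera RF signal to NTSC/PAL composite (baseband input video)",
    "audio/video input processor 380 (MC44011): composite to analog RGB, genlocked sampling clock",
    "RGB A/D converter 390 (MC44250): analog RGB to digital RGB",
    "video processing DSP 365: compress digital RGB",
    "communications ASIC 250 / microprocessor subsystem 260: protocol encoding for the network 140",
]

for stage in DOWNSTREAM_VIDEO_PATH + UPSTREAM_VIDEO_PATH:
    print(stage)
```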
FIG. 7 is a block diagram illustrating a user audio interface 255 of the preferred apparatus embodiment in accordance with the present invention. The user audio interface 255 is designed to interface with standard household telephone sets, including wireless devices and speakerphones, such as telephones 295-1 through 295-n. The user audio interface 255 is intended to support both audio POTS calls and video calls. In the preferred embodiment, POTS calls are processed in a "transparent" mode, such that placing and receiving telephone calls occur as if no video call functions were present. Also in the preferred embodiment, video calls are processed as an exception, requiring a designated or predetermined dialing sequence entered by the user to invoke a video call.
Referring to FIG. 7, a SLIC (Subscriber Loop Interface Circuit) 400 provides "BORSHT" functions for telephone service within the user premises, such as those normally provided by a network central office, including DC (direct current) power for the telephone (Battery); Overvoltage protection; Ring trip detection and facilitation of ringing insertion; Supervision features such as hook status and dial pulsing; Hybrid features such as two-wire differential to four-wire single-ended conversions and suppression of longitudinal signals at the two-wire input; and Testing. The SLIC 400 communicates with the telephones 295-1 through 295-n through an ordinary telephone line, such as twisted pair cabling 294, which has tip and ring lines. The ring generator 405 provides high-voltage AC (alternating current) signals to ring the telephones 295-1 through 295-n. Connected to the SLIC 400, the audio codec 410 provides analog-to-digital conversion for voice digitizing of the input (voice) audio signal originating from the microphone portion of one or more of the telephones 295-1 through 295-n, to form an input (PCM) digital voice data stream or signal, and digital-to-analog conversion for voice recovery from an output (PCM) digital voice data stream or signal (to create the output audio signal to the speaker portion of the telephones 295-1 through 295-n), as well as band limiting and signal restoration for PCM systems. The output and input (PCM) digital voice data streams connect directly to the voice processing DSP 415. The voice processing DSP 415, such as a Motorola MC56166, contains program memory and data memory to perform signal processing functions such as DTMF/dial pulse detection and generation, call progress tone (dial tone, busy tone) generation, PCM-to-linear and linear-to-PCM conversion, and speech prompt playback. The voice processing DSP 415 may also provide V.34 and V.34bis modem functions to additionally support POTS or other analog-based video calls. The voice processing DSP 415 interfaces with the microprocessor subsystem 260 and the communications ASIC 250 over the bus 261. The memory 420 (connected to the voice processing DSP 415), in the preferred embodiment, includes high density read only memory (referred to as speech ROM) containing PCM encoded (or compressed) speech segments used for interaction with the user, such as in prompting the user for keypad DTMF or dial pulse entry when in the video calling mode. In addition, optional speech random access memory may be used for user voice storage functions, and electrically alterable, programmable non-volatile (flash) memory for storage of programs (and updates) or algorithms.
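As a side note, under the common assumption of standard telephony PCM (8 kHz sampling, 8 bits per sample; the patent does not state the codec format explicitly), the resulting voice data rate matches the capacity of a single CACS time slot described earlier.

```python
# Rough arithmetic, assuming standard telephony PCM (8 kHz, 8-bit samples);
# the patent does not specify the codec format explicitly.

SAMPLE_RATE_HZ = 8_000
BITS_PER_SAMPLE = 8

pcm_rate_bps = SAMPLE_RATE_HZ * BITS_PER_SAMPLE
assert pcm_rate_bps == 64_000   # fits exactly in one 64 kbps CACS time slot
print(pcm_rate_bps)
```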
The user audio interface 255, in the preferred embodiment, operates in one of two modes, first, for telephony (POTS), and second, for video conferencing (calling). The telephony (POTS) mode is user transparent, as a default mode which is entered whenever the user goes off hook. As discussed in greater detail below, the video conferencing mode is entered as an exception, through the user entering (dialing) a specific, predetermined sequence which, in the preferred embodiment, is not recognized as a telephony sequence. In the telephony (POTS) mode, the voice processing DSP 415 generates the customary "dial" tone when the user telephone (of the telephones 295-1 through 295-n) goes off hook. The user then enters the dialing sequence, just as in known or customary telephone dialing. The voice processing DSP 415 decodes the dialing digits and stores them in a calling memory buffer of memory 420. Upon decoding the first two digits entered (which are not the first two digits of the specific predetermined video call sequence), the voice processing DSP 415 recognizes that the requested call is not a video call and, as a consequence, signals the microprocessor subsystem 260 to initiate a POTS call through the audio/video network 100 using a protocol such as CACS. When the call is granted (by the network 140) and the audio link with the local digital switch 135 is established, the voice processing DSP 415 forwards the stored digits to the local digital switch 135 and connects the audio paths between the user's telephone(s) and the network 140. From this point on, the voice processing DSP 415 will not decode any dialed digits and will simply pass through the input and output PCM digital voice data stream, until the user's telephone goes on hook and the call is terminated.
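A minimal sketch of this dial-decode rule follows; the prefix value and the function name are assumptions for illustration only, since the patent merely requires a two-digit sequence unused in ordinary POTS dialing.

```python
# Hypothetical sketch of the dial-decode rule described above.

VIDEO_CALL_PREFIX = "**"   # assumed value; the patent only requires a two-digit
                           # sequence unused in ordinary POTS dialing

def classify_after_two_digits(digits: str) -> str:
    if len(digits) < 2:
        return "keep collecting digits"
    if digits[:2] == VIDEO_CALL_PREFIX:
        return "enter video call mode and play the call-option prompt"
    return "initiate POTS call; forward stored digits and pass through thereafter"

print(classify_after_two_digits("*"))        # keep collecting digits
print(classify_after_two_digits("**3"))      # video call mode
print(classify_after_two_digits("555123"))   # POTS call
```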
Alternatively for a telephony session, the audio/user interface 255 may create or maintain a connection to a central office of a network 140, to provide transparency for telephony. Once the entry of the specific predetermined sequence for video mode is detected, the audio/user interface 255 breaks or terminates the central office connection, and enters video mode, under local control of the video access apparatus 150 (or 110).
As indicated above, the user initiates the video conferencing mode as an exception to the normal telephony mode, by entering a specific predetermined sequence which is recognized by the voice processing DSP 415 as a non-telephony sequence and, additionally in the preferred embodiment, as the predetermined sequence specific to the video mode. This methodology is also discussed below with reference to the flow chart of FIG. 12. For the video conference mode of the preferred embodiment, the first two digits of the specific, predetermined sequence are unique and specifically unused in a standard POTS call and, as a consequence, may specifically signal the voice processing DSP 415 to enter the video call mode. Alternatively, other specific, predetermined sequences could be programmed by the user for recognition as a video conference mode by the voice processing DSP 415. Immediately after decoding the two special digits or other specific predetermined sequence, the voice processing DSP 415 generates or plays a speech prompt sequence, such as "Please select a call option or press the '#' key for help", which is stored in the speech ROM portion of memory 420. The action taken by the voice processing DSP 415 will then depend upon the sequence entered or key pressed by the user following the initial prompt. For example, if the '#' key is pressed, the user may hear a menu of commands such as, for example, the following:
"To place a Directory call, press 1"
"To update the call Directory, press 2"
"To place a manual video call, press 3"
"To mute the camera, press 4"
"To view the camera on your television, press 5"
"To hear this menu again, press #"
Thus, in the preferred embodiment, an automated and user friendly prompting sequence is used to guide the user through placing a video conference call. Once the entry is complete, the information is then passed from the voice processing DSP 415 to the microprocessor subsystem 260, which will then attempt to connect the call through the network 140. If successful, the audio paths (input and output audio signals) will be connected through to the telephones 2951 through 295n, the output video path will be connected through to the television 240 or other video display 225, and the input video path will be connected from the camera interface 235 (originating from the video camera 230). Alternatively, under user control, the output audio path may also be connected to a television 240, for broadcast over the speakers within the television(s) 240, and the input audio path may also originate from a microphone within the video camera 230 and be connected via the camera interface 235. This alternate path may be particularly useful when the user desires to video tape the video conference, for example, utilizing an ordinary VCR coupled to the television 240. The video call terminates when the telephone goes on hook, or another control signal is entered via the user interface 215 or user/audio interface 255.
It should be noted that in the preferred embodiment, a simple directory feature may be used to simplify the video calling process. For example, after the user goes off hook and presses the '*' key three times followed by a single digit '1', '2', ... '9', a call automatically may be placed using a sequence of numbers stored in the directory for that digit. This feature may be necessary or desirable under a variety of circumstances, for example, when an ISDN call may require the entry of two separate 10-digit numbers to connect the call through the network 140. Also as an option in the preferred embodiment, a more sophisticated system may store a simple name tag or other alphanumeric entry associated with the directory entry, created by the user, and played back to the user by the voice processing DSP 415. For example, a prompt in response to making a directory call may be: "To call 'grandma', press 1"; "To call 'mother', press 2"; "To call 'work', press 3"; in which the speech segments "grandma", "mother", and "work" are spoken by the user, recorded and stored in memory 420. More sophisticated systems may include speaker/voice recognition techniques, to recognize the user selection, eliminating the need to press any keys on a telephone keypad or other manual entry of information into the user interface 215 or user/audio interface 255. It should also be noted that video call control functions, such as camera muting, unmuting, and local playback (loop back), also may be selected with the same user interface. Other sophisticated systems may also include use of the video display 225 or television 240 for on-screen visual display of a menu of options, with corresponding entry of user control signals, such as call control and placement information, occurring in a variety of ways, such as through the keypad of the telephones 295, through an infrared remote control link with the video access apparatus 150 (or 110), or through the input video path via the second communication channel 227. These various methods of user prompting, on-screen display, and user feedback are especially useful to guide the user through the process of placing a video call, and help to make the audio video conferencing system 300 (or 200) especially user friendly. In addition, these various methods also illustrate the "tri-ality" of the use of a telephone 295 in the preferred embodiment, for telephony, for audio input and output, and for call control.
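A directory entry of the kind described, a single speed-dial digit mapped to one or more stored network numbers plus an optional recorded name tag, can be modeled as a small table keyed by that digit. The sketch below is a hypothetical illustration only; the field names, the example numbers, and the two-number ISDN case are assumptions rather than disclosed structure.

```python
from dataclasses import dataclass

@dataclass
class DirectoryEntry:
    numbers: list            # one or more numbers, e.g. two for some ISDN calls
    name_tag: bytes = b""    # PCM speech segment recorded by the user

# Hypothetical example entries; the numbers are placeholders.
directory = {
    "1": DirectoryEntry(numbers=["3125550101", "3125550102"]),
    "2": DirectoryEntry(numbers=["8475550199"]),
}

ERROR_PROMPT = b""           # placeholder for an error/help segment in speech ROM

def place_directory_call(digit, dial, play_speech):
    """Announce (if a name tag exists) and dial every number stored for `digit`."""
    entry = directory.get(digit)
    if entry is None:
        play_speech(ERROR_PROMPT)
        return False
    if entry.name_tag:
        play_speech(entry.name_tag)
    for number in entry.numbers:
        dial(number)
    return True
```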
FIG. 8 is a block diagram illustrating an RF modulator 270 of the preferred apparatus embodiment in accordance with the present invention. The RF modulator 270 converts the baseband output video signal from the audio/video compression and decompression subsystem 265, such as an NTSC/PAL composite video signal, to a radio frequency output video signal, such as an amplitude modulated vestigial sideband RF signal, which may be viewed via the receiver of the user's television 240, for example, when tuned to channel 3 or 4. The RF modulator 270 may be implemented in a variety of ways, including through use of a video modulator 425, such as a Motorola MC1373, followed by a gain stage (amplifier) 430, utilized in the preferred embodiment to overcome losses from the directional coupler 290 which feeds the RF output video signal into the second communication channel 227, such as the coaxial cable system in the user premises. A switchable notch filter may also be used to remove current programming from a particular channel (RF video carrier), while inserting the radio frequency output video signal into the second communication channel 227.
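For reference, before the vestigial-sideband filtering the picture carrier produced by such a modulator can be idealized as a conventional amplitude-modulated carrier. The channel 3 carrier frequency below is the standard NTSC value; the modulation index m is illustrative only, and the expression ignores the negative-sense modulation and VSB shaping used in practice:

$$ s(t) = A_c \left[ 1 + m\,v(t) \right] \cos(2\pi f_c t), \qquad f_c = 61.25\ \text{MHz (channel 3)},\quad 0 < m < 1. $$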
FIG. 9 is a block diagram illustrating an RF demodulator 275 of the preferred apparatus embodiment in accordance with the present invention. In the preferred embodiment, the RF demodulator 275 is a full heterodyne receiver tuned to a specific channel in the 900 MHz band or 1.2 GHz band, to receive the radio frequency input video signal from the camera interface 235 (originating from the video camera 230). The radio frequency input video signal, fed into the RF demodulator 275 from the directional coupler 290, is bandpass filtered (at either 900 MHz or 1.2 GHz) in pre-filter 435, then mixed down to an intermediate frequency (IF) of, for example, 45 MHz, using the mixer 440 and a fixed reference oscillator 445. The signal is then surface acoustic wave (SAW) filtered by the SAW filter 450, or otherwise bandpass filtered, and transferred to a (color) TV IF subsystem 460, such as a Motorola MC44301, which provides amplification, AM detection (demodulation), and automatic fine tuning, resulting in a baseband input video signal (baseband composite input video signal). This baseband input video signal is then transferred to the audio/video compression and decompression subsystem 265 for further processing as discussed above.
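The mixer 440 and fixed reference oscillator 445 perform a standard heterodyne translation, so the 45 MHz IF fixes the oscillator at one of two frequencies for a 900 MHz input channel; which injection side is used is not specified, so both are shown purely as a worked illustration:

$$ f_{IF} = \lvert f_{RF} - f_{LO} \rvert \;\Longrightarrow\; f_{LO} = 900\ \text{MHz} \pm 45\ \text{MHz} = 945\ \text{MHz}\ \text{or}\ 855\ \text{MHz}. $$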
FIG. 10 is a block diagram illustrating a camera interface 235 of the preferred apparatus embodiment in accordance with the present invention. The camera interface 235 is used in conjunction with a video camera (or camcorder) 230 that outputs its signal as an RF video carrier on channel 3 or 4 (61.25 or 67.25 MHz), and is used to upconvert the video carrier to an RF carrier at 900 MHz or 1.2 GHz without demodulation and modulation of the video signal. As illustrated in FIG. 10, the input video signal from the video camera 230 is mixed up to the required output frequency using an offset mixer 465, a fixed reference oscillator 470, and a bandpass filter 475.
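The offset mixer 465 performs the complementary translation in the transmit direction: the channel 3 or 4 carrier is added to the fixed reference oscillator 470 frequency to reach the 900 MHz (or 1.2 GHz) band, and the bandpass filter 475 selects the sum product while rejecting the oscillator and difference terms. The oscillator value below is a worked illustration only, not a disclosed design value:

$$ f_{out} = f_{in} + f_{LO}, \qquad 61.25\ \text{MHz} + 838.75\ \text{MHz} = 900\ \text{MHz}. $$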
Although not illustrated in FIG. 10, if additional input video signals are desired from, for example, additional video cameras, the input video signals may also be multiplexed. This feature may be desirable, for example, when the system is to be used for surveillance of multiple points or locations, or when the user desires to transmit additional windows or screens within screens.
Alternatively, as mentioned above, the camera interface 235 may be directly incorporated within the video camera 230. In addition, for those video cameras producing an NTSC/PAL composite video signal (rather than an RF video carrier on channel 3 or 4), an additional stage may be added within the camera interface 235 to modulate the NTSC/PAL composite video signal to an RF video carrier prior to offset mixing by offset mixer 465, or in lieu of offset mixing, directly modulating the NTSC/PAL composite video signal to 900 MHz or 1.2 GHz to form the RF input video signal.
Although not illustrated in the various apparatus diagrams, the video access apparatus 110 (or 150) may be dual powered, deriving supply voltages from both power provided by the audio/video network 100 and power provided by the user premises. The power provided by the audio/video network 100 is used for those circuits that support basic telephony (POTS) service. The power provided by the user premises is used for those circuits that support video. Alternatively, the video access apparatus 110 (or 150) may be completely powered by the audio/video network 100. As a consequence, if a power failure occurs in the user premises, basic telephony service may still be operational, or if fully powered by the audio/video network 100, complete audio/video conferencing and telephony may still be operational.
FIG. 11 is a flow diagram illustrating the method of the preferred embodiment of the present invention. As illustrated in FIG. 11, the method begins, start step 500, with receiving a first protocol signal, such as a CACS signal, to form a received protocol signal, step 505. In the preferred embodiment, step 505 is performed in the video network interface 210 or in the CATV RF transceiver 245. Next, in step 515, the received protocol signal is converted to a baseband output video signal and an output audio signal. In the preferred embodiment, step 515 is performed by the video network interface 210 and processor arrangement 190, or by the CATV RF transceiver 245, the communications ASIC 250, and the microprocessor subsystem 260. In the preferred embodiment utilizing audio 220 or telephones 295 for audio output and input, an important feature of the present invention is the independence of the output audio signal from the output video signal. In the event that a television 240 or other video display 225 is also to be used for audio output, the output audio signal may be combined with the baseband output video signal (rather than separating out the audio portion and separately routing it to audio 220 or telephones 2951 through 295n). Next, in step 525, the baseband output video signal (and possibly output audio signal as well) is modulated to form a radio frequency output video (and audio) signal, also referred to as a composite output video signal, and in step 535, the RF output video (and audio) signal is transmitted. In the preferred embodiment, steps 525 and 535 are performed by the RF modulator/demodulator 205 or the RF modulator 270.
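Steps 505 through 535 amount to a downstream pipeline: receive the protocol signal, decode it into independent video and audio, and remodulate the video onto an RF channel for the television. The sketch below restates that ordering in code purely for illustration; the callable names stand in for the hardware blocks (video network interface, processor arrangement, RF modulator) and are not part of the disclosure.

```python
def downstream_path(receive_protocol, decode_av, rf_modulate, send_to_tv, route_audio):
    """One pass of the receive direction (steps 505, 515, 525, 535)."""
    received = receive_protocol()                      # step 505: e.g. a CACS frame
    baseband_video, audio_out = decode_av(received)    # step 515: split video/audio
    rf_video = rf_modulate(baseband_video)             # step 525: onto channel 3/4
    send_to_tv(rf_video)                               # step 535: out over the coax
    route_audio(audio_out)                             # audio kept independent of video
```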
Concurrently with steps 505, 515, 525 and 535 (involving receiving (at a local location) video conference information transmitted from another location, such as a remote location), in the preferred embodiment, steps 510, 520, 530 and 540 are also occurring (involving transmitting (from a local location) video conference information to another location, such as a remote location). In step 510, a radio frequency input video signal and an input audio signal are received. As indicated above, in the preferred embodiment, the input video signal and input audio signal are each independent of the other. In the preferred embodiment, the radio frequency input video signal is received by the RF demodulator 275 or the RF modulator/demodulator 205 from the camera interface 235, and an input audio signal is received by either the audio 220 and user interface 215, or telephones 2951 through 295n.
Alternatively, the input audio signal may also be received by a microphone in the video camera 230 and included as part of the RF input video signal from the camera interface 235. Next, preferably in the RF demodulator 275 or the RF modulator/demodulator 205, in step 520 the RF input video (and possibly audio) signal is demodulated to form a baseband input video (and possibly audio) signal. In step 530, the baseband input video signal and the input audio signal are converted to a second protocol signal, such as a TDMA format signal, preferably by the processor arrangement 190, or by the microprocessor subsystem 260 and the communications ASIC 250. In step 540, the second protocol signal is modulated and transmitted to form a transmitted protocol signal, such as a π/4 MPSK TDMA signal (an upstream CACS signal), preferably by the video network interface 210 or the CATV RF transceiver 245. Following steps 535 and 540, when the video conference has been terminated, step 545, such as by going on hook, the process may end, return step 550, and if the video conference has not been terminated in step 545, the method continues, returning to steps 505 and 510.
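Steps 510 through 540 are the mirror-image upstream pipeline. Again, the sketch is only an illustration of the step ordering; the callable names are placeholders for the RF demodulator, processor arrangement, and video network interface.

```python
def upstream_path(receive_rf_and_audio, rf_demodulate, encode_av, transmit_protocol):
    """One pass of the transmit direction (steps 510, 520, 530, 540)."""
    rf_video_in, audio_in = receive_rf_and_audio()            # step 510: camera + phone
    baseband_video_in = rf_demodulate(rf_video_in)            # step 520
    protocol_signal = encode_av(baseband_video_in, audio_in)  # step 530: e.g. TDMA format
    transmit_protocol(protocol_signal)                        # step 540: upstream burst
```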
FIG. 12 is a flow diagram illustrating the telephony and video conference control methodology in accordance with the preferred embodiment of the present invention. FIG. 12 also illustrates the multiple roles of a telephone, such as telephones 2951 through 295n, in the system of the present invention, including providing telephony (POTS), providing video call control, and providing the audio portion of the video conference. Referring to FIG. 12, beginning with start step 600, a request for service is detected, step 605, such as going off hook or receiving an incoming alert signal. Next, in step 610, a user indication or alert is provided, such as a dial tone or an incoming ring signal, and signaling information is collected, such as DTMF digits of a phone number or of the predetermined video conference sequence. When a video conference has been requested in step 615, such as through entry of the predetermined sequence or receipt of an incoming message from the network 140, then the method proceeds to step 635. When a video conference has not been requested in step 615, the method proceeds to request or set up a telephony call, such as generating DTMF tones and connecting an audio path between the user's telephone and the network 140, step 620, followed by entering the transparent telephony mode and transmitting audio (typically PCM) data to the network 140, step 625. The audio data will typically be CACS encoded from the video access apparatus 110 (or 150), and transformed into an appropriate format (e.g., ISDN, POTS, etc.) by the primary station 105 for transmission to the network 140. When the telephony call is terminated, step 630, the method may end, return step 660.
Continuing to refer to FIG. 12, when a video conference has been requested in step 615, the method proceeds to step 635 and initializes the video conference control system, such as playing an initial speech prompt as discussed above. Next, in step 640, the video input request type is collected and the corresponding requested service is performed, such as originating a video conference call using a directory, updating a video conference call directory, manually originating a video conference call, muting an input (audio or video), providing loop back (e.g., local self-view such as monitoring or other surveillance), playing help or error messages or menu options, or exiting the video conferencing control system. In step 645, a video conference call is requested or set up (such as for an incoming video call), and in step 650, the video conference mode is entered, with protocol encoded audio and video data being transmitted to the network 140. When the video conference call is terminated in step 655, such as by going on hook, the method may end, return step 660.
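The FIG. 12 flow reduces to a small dispatcher: detect a request for service, decide from the collected signaling whether it is an ordinary telephone call or a video conference request, and branch accordingly. The sketch below mirrors that branching; the predicate and handler names are placeholders rather than claimed structure.

```python
def service_loop(detect_request, collect_signaling, is_video_request,
                 run_pots_call, run_video_menu, run_video_call):
    """Simplified restatement of the FIG. 12 control flow (steps 600-660)."""
    while True:
        detect_request()                   # step 605: off hook or incoming alert
        signaling = collect_signaling()    # step 610: dial tone/ring, collected digits
        if is_video_request(signaling):    # step 615
            run_video_menu(signaling)      # steps 635-640: prompts, directory, muting
            run_video_call()               # steps 645-650: A/V data to the network 140
        else:
            run_pots_call(signaling)       # steps 620-625: transparent telephony
        # steps 630/655: each handler returns when its call is terminated (on hook)
```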
Numerous advantages from the video access apparatus 110 and video access apparatus 150, and from the various video conferencing systems 200 and 300, are readily apparent. First, because the output video signal is modulated and transmitted over the second communication channel 227, such as over an entire coaxial cable system within the user premises, the audio/visual conferencing and telephony system of the preferred embodiment may operate at more than one designated node or location within user premises, for example, utilizing any telephone and television within the user premises, providing multiple viewing points and multiple participation points. Such broadcast capability of the video conferencing functionality is truly unique to the present invention. In addition, the audio/visual conferencing and telephony system of the preferred embodiment may be mobile, utilizing the video camera 230 and camera interface 235 from a myriad of locations within the user premises and, indeed, from anywhere the second communication channel 227 (such as a coaxial cable) may reach. As a consequence, the user is not confined to a single location, such as at a PC or in a dedicated conference room, for video conferencing capability. In addition, the system may be configured as needed for additional locations, for example, simply by adding or removing televisions and video cameras.
In addition, in accordance with the preferred embodiment, the audio/visual conferencing and telephony system utilizes equipment typically found in consumers' homes or premises, such as existing televisions, video cameras or camcorders, and telephones. As a consequence, the system may be implemented at relatively low cost, especially compared to the currently available PC-based or stand alone video conference systems. In addition, and in contrast with prior art video conferencing systems, the system of the present invention is designed to be compatible for use with other existing video conferencing systems, for example, those which may utilize ISDN networks. Use of the present invention is not limited to the cable systems of current CATV networks, but may be utilized with connections to ISDN (H.320), POTS (H.324), and other systems such as T1 and E1 networks. Moreover, the system of the present invention is user friendly, easy to install and use, and should be relatively less expensive for in-home purchase and use by consumers.
Another interesting feature of the apparatus and system embodiments of the present invention is the multiple functionality of the user interface, for example, the dual use of a telephone (as a user interface) for control of the video conference call and for the audio portion of the video conference call. This feature is also in stark contrast to the prior art systems, which typically require special switching and special network operations for call placement and call control.
Such duality is in addition to the concomitant use of the telephone for POTS service. Yet another significant feature of the preferred embodiment of the present invention is the transparency of telephony operation, such that a user need not be aware of the video conferencing capability to place or receive a telephone call.
Other special features of the preferred embodiment include dual network and premises powering of the video access apparatus, or complete network powering, enabling continued functionality even during power outages. Yet another significant feature of the present invention is the "loop back" operation, such that the same system may also be utilized for surveillance, such as baby monitoring, in addition to conferencing. Another significant feature of the present invention is the independence of the audio portion from the video portion of an audio/video conference. Lastly, the video conferencing capability illustrated is also protocol independent, such that other communication protocols may be utilized in lieu of or in addition to the CACS protocol of the preferred embodiment.
From the foregoing, it will be observed that numerous variations and modifications may be effected without departing from the spirit and scope of the novel concept of the invention. It is to be understood that no limitation with respect to the specific methods and apparatus illustrated herein is intended or should be inferred. It is, of course, intended to cover by the appended claims all such modifications as fall within the scope of the claims.

Claims (10)

What is claimed is:
1. A video access apparatus, comprising:
a video network interface coupleable to a first communication channel for reception of a first protocol signal to form a received protocol signal and for transmission of a second protocol signal to form a transmitted protocol signal;
a radio frequency modulator and demodulator to convert a baseband output video signal to a radio frequency output video signal and to convert a radio frequency input video signal to a baseband input video signal;
a user interface for reception of a first control signal of a plurality of control signals; and
a processor arrangement, the processor arrangement coupled to the video network interface, to the radio frequency modulator and demodulator, and to the user interface, the processor arrangement responsive, through a set of program instructions, and in response to the first control signal, to convert the received protocol signal to the baseband output video signal and to an output audio signal, the processor arrangement further responsive to convert the baseband input video signal and an input audio signal to the second protocol signal.
2. The video access apparatus of claim 1 wherein the video access apparatus is coupled through a second communication channel to a video monitor for display of the radio frequency output video signal.
3. The video access apparatus of claim 1 wherein the video access apparatus is coupled, via the user interface, to a telephone for input of the input audio signal.
4. The video access apparatus of claim 1 wherein the video access apparatus is coupled, via the user interface, to a telephone for output of the output audio signal.
5. The video access apparatus of claim 1, further comprising a camera interface to receive an input video signal and to convert the input video signal to the radio frequency input video signal.
6. The video access apparatus of claim 1 wherein the video access apparatus is coupled, via the user interface, to a telephone for entry of the plurality of control signals.
7. The video access apparatus of claim 1 wherein the processor arrangement is further responsive to remove a preexisting video signal from a radio frequency carrier, and via the radio frequency modulator and demodulator, to simultaneously inject the radio frequency output video signal onto the radio frequency carrier.
8. An audio and video conferencing system, the audio and video conferencing system coupleable to a communication channel for audio and video transmission and reception, the audio and video conferencing system comprising:
an audio interface;
a video display;
a video camera;
a camera interface coupled to the video camera;
a video access apparatus coupled to the audio interface, and the video access apparatus coupled to the video display and to the camera interface via the communication channel.
9. A method of audio/video conferencing, the method comprising:
(a) receiving a first protocol signal to form a received protocol signal;
(b) converting the received protocol signal to a baseband output video signal and an output audio signal;
(c) modulating the baseband output video signal to form a radio frequency output video signal;
(d) transmitting the radio frequency output video signal and the output audio signal;
(e) receiving a radio frequency input video signal and an input audio signal;
(f) demodulating the radio frequency input video signal to form a baseband input video signal;
(g) converting the baseband input video signal and the input audio signal to a second protocol signal; and
(h) transmitting the second protocol signal to form a transmitted protocol signal.
10. A video access apparatus, comprising:
a radio frequency transceiver coupleable to a first communication channel for reception of a π/4-DQPSK modulated time division multiplexed signal, and for transmission of a π/4-DQPSK modulated time division multiple access signal, a binary TDMA burst modulated by the radio frequency transceiver to form the π/4-DQPSK modulated time division multiple access signal;
a radio frequency modulator to convert a baseband NTSC/PAL encoded composite output video signal to a radio frequency amplitude modulated vestigial sideband output video signal;
a radio frequency demodulator to convert a radio frequency amplitude modulated vestigial sideband input video signal to a baseband NTSC/PAL encoded composite input video signal;
an audio/user interface for reception of a first control signal of a plurality of control signals, for reception of an input analog audio signal and conversion of the input analog audio signal to an input digital audio signal, and for conversion of an output digital audio signal to an output analog audio signal and for output of the output analog audio signal;
a communications ASIC and a microprocessor subsystem coupled to the radio frequency transceiver and the audio/user interface, the communications ASIC and the microprocessor subsystem responsive, through a set of program instructions, and in response to the first control signal, to coherently demodulate and decode the π/4-DQPSK modulated time division multiplexed signal to form an output video signal digital data stream, the communications ASIC and the microprocessor subsystem further responsive to convert the input digital audio signal and a compressed input video signal data stream to the binary TDMA burst; and
an audio/video compression and decompression subsystem coupled to the communications ASIC and the microprocessor subsystem, and further coupled to the radio frequency modulator and to the radio frequency demodulator, the audio/video compression and decompression subsystem responsive, through a set of program instructions, to convert the baseband NTSC/PAL encoded composite input video signal to the compressed input video signal data stream, and to decompress and convert the output video signal digital data stream to the baseband NTSC/PAL encoded composite output video signal.
GB9718201A 1996-08-30 1997-08-29 Apparatus,method and system for audio and video conferencing and telephony Withdrawn GB2328832A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US70610096A 1996-08-30 1996-08-30

Publications (2)

Publication Number Publication Date
GB9718201D0 GB9718201D0 (en) 1997-11-05
GB2328832A true GB2328832A (en) 1999-03-03

Family

ID=24836221

Family Applications (1)

Application Number Title Priority Date Filing Date
GB9718201A Withdrawn GB2328832A (en) 1996-08-30 1997-08-29 Apparatus,method and system for audio and video conferencing and telephony

Country Status (5)

Country Link
CN (1) CN1183695A (en)
DE (1) DE19737906A1 (en)
GB (1) GB2328832A (en)
ID (1) ID18200A (en)
RU (1) RU97114924A (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU2002255556B2 (en) * 2001-02-16 2006-06-01 Imatte, Inc. Interactive teleconferencing display system
NO318911B1 (en) * 2003-11-14 2005-05-23 Tandberg Telecom As Distributed composition of real-time media
NO20040978A (en) * 2004-03-05 2005-06-13 Tandberg Telecom As Procedure for uninterrupted conference call
US8447809B2 (en) * 2008-02-29 2013-05-21 Via Technologies, Inc. System and method for network conference
CN102404547B (en) * 2011-11-24 2014-04-30 中兴通讯股份有限公司 Method and terminal for realizing video conference cascade

Also Published As

Publication number Publication date
RU97114924A (en) 1999-06-20
CN1183695A (en) 1998-06-03
ID18200A (en) 1998-03-12
DE19737906A1 (en) 1998-04-09
GB9718201D0 (en) 1997-11-05

Similar Documents

Publication Publication Date Title
US6134223A (en) Videophone apparatus, method and system for audio and video conferencing and telephony
US6011579A (en) Apparatus, method and system for wireline audio and video conferencing and telephony, with network interactivity
US5877821A (en) Multimedia input and control apparatus and method for multimedia communications
US5534914A (en) Videoconferencing system
US5374952A (en) Videoconferencing system
US6831969B2 (en) Caller ID display system for telephony over a packet switched network
EP0851653B1 (en) Local telephone service over a cable network using packet voice
US6346964B1 (en) Interoffice broadband communication system using twisted pair telephone wires
US6201562B1 (en) Internet protocol video phone adapter for high bandwidth data access
EP1116409B1 (en) Access control means, communication device, communication system and television receiver
JPH11506887A (en) Reverse Path Assignment and Contention Resolution Techniques for Broadband Communication Systems
JPH10500541A (en) Frequency-sensitive broadband communication system
US6920206B2 (en) Call progress information in cable telephony
GB2263041A (en) System for integrated distribution of switched voice and television on coaxial cable and with video signal transmission originating from subscriber locations
GB2320657A (en) Wireless audio and video conferencing and telephony
WO1998015124A1 (en) Videophone apparatus, method and system for wireline audio and video conferencing and telephony
GB2328832A (en) Apparatus,method and system for audio and video conferencing and telephony
US5777664A (en) Video communication system using a repeater to communicate to a plurality of terminals
CN1187090A (en) Apparatus, method and system for wireless audio and video conferencing and telephony
CN1244991A (en) Videophone apparatus, method and system for wireline audio and video conference and telephony
JP2001016327A (en) Internet telephone system and terminal
WO1998015123A1 (en) Apparatus, method and system for wireline audio and video conferencing and telephony
CN1232592A (en) Apparatus, method and system for wireline audio and video conferencing and telephony
JP3192564B2 (en) Video communication system and multiplex transmission signal repeater
CN1179062A (en) TV telephone device, method and system for audio frequency and TV meeting and telephone

Legal Events

Date Code Title Description
WAP Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1)