TELEPHONIC ADDRESSING FOR ESTABLISHING SIMULTANEOUS VOICE AND COMPUTER NETWORK CONNECTIONS
Background of the Invention
The present invention relates to establishing a communication session between users connected to a computer network, and more specifically, to exchanging computer networking data packets over a public data network simultaneously with a telephone call placed over a public telephone network.
Internetworking (i.e., the interconnection of many computer networks) allows the interaction of very large numbers of computers and computer users. The most well known example is the Internet. Computers connected to the Internet may be widely separated geographically and utilize many different hardware and software configurations. In order to achieve communication sessions between any two endpoints on the Internet, an addressing system and various standard protocols for exchanging computer data packets have been developed.
Each packet sent over the Internet includes fields that specify the source and destination address of the packet according to Internet Protocol (IP) addresses assigned to the network interface nodes involved. Currently assigned addresses comprise 32 bits, although future standards allow for 128 bit addresses. The 32 bit addresses are normally written by breaking the 32 bits into 4 groups of 8 bits each and writing the decimal equivalents of each group separated by periods (e.g., 208.25.106.10). Since numerical IP addresses are inconvenient to use and remember, a protocol known as the domain name system (DNS) is used for assigning and accessing logical names. DNS servers deployed within the Internet perform a translation function between a logical domain name such as "sprint.com" and its numerical equivalent "208.25.106.10". After receiving an IP address back from a DNS server, a computer can forward data packets to the IP address and establish a connection or session with the remote computer.
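The dotted notation described above can be illustrated with a minimal sketch. The function name `to_dotted_decimal` is hypothetical and chosen only for this illustration; the standard `ipaddress` module is used solely to obtain the 32-bit value of the example address from the text.

```python
import ipaddress


def to_dotted_decimal(value: int) -> str:
    """Split a 32-bit value into four 8-bit groups and join them with periods."""
    if not 0 <= value < 2**32:
        raise ValueError("value does not fit in 32 bits")
    # Extract the groups from the most significant byte to the least significant.
    groups = [(value >> shift) & 0xFF for shift in (24, 16, 8, 0)]
    return ".".join(str(group) for group in groups)


# The example address from the text, expressed as a single 32-bit integer.
example = int(ipaddress.IPv4Address("208.25.106.10"))
print(to_dotted_decimal(example))
```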
While the DNS system works well for hosted content (e.g., material made available for browsing by commercial and private entities), it is not well suited to ad hoc communications or exchanges of data between individuals. Hosting a website and registering an IP address within the DNS system is expensive and time consuming. Furthermore, due to an impending shortage of IP addresses and the cost for maintaining use of each IP address, many Internet service providers assign IP addresses dynamically to their individual users. In other words, when a user signs on to their service, they are temporarily assigned an IP address from an address pool assigned to their service provider. The user occupies that IP address only for their current session.
Even when individual users have their own static IP addresses, and when other users can remember the IP address of a user with whom they would like to establish a connection session over the Internet (e.g., for voice or video telephony), configuring their hardware or software is too complex for many users. This is one reason why e-mail is such a popular and successful Internet application. A mail server with an easy to remember domain name acts as intermediary between two individual users. Using a simple application program and the recipient's account name on the mail server (i.e., their e-mail address), text messages and computer files can be exchanged. The exchange, however, does not allow the users to interact in real time. Thus, there is a need for a way to allow two or more individual users to establish interactive connection sessions over the Internet without requiring overt knowledge of the other's IP address and without complicated configurations or set-ups.
The present invention is also directed to advances in video telephony.
Video telephony comprises the exchange of both audio and video between a caller and called party. If video telephony were deployed on a large scale, it would dramatically improve user-to-user communications and provide greater efficiency to business entities. For example, large-scale video telephony would significantly reduce business travel expenses. Unfortunately, the mass deployment of video telephony has failed to materialize.
Various systems have been developed to provide video telephony. For example, computers can be used to place video calls over the Internet (provided the IP addresses of both parties are explicitly known in advance). Alternatively, video telephones can be used to place video calls over video telephone networks. Unfortunately, these video telephony systems are complex to install and operate, and they are often expensive. The cost and complexity have inhibited mass deployment. There is a need for video telephony technology that is easier to install and operate, and that is less expensive than current systems.
In response to this need, the present invention provides a central server allowing two or more individual users to establish interactive connection sessions over the Internet without requiring overt knowledge of the other's IP address and without complicated configurations or set-ups. Each user registers with the central server, resulting in a database of users and their current IP addresses. A calling user sends a request to the central server to establish a connection with a called user. The central server can either relay all network message packets between the users for the duration of a "call", or it may provide the IP addresses to the users so that they can exchange packets directly. To reduce processing load and the corresponding size of the central server, the provider of the central server may find it preferable to provide the IP addresses to the calling and/or called users so that it does not have to act as intermediary for all packet exchanges (e.g., receiving each packet, detecting sources and intended destinations, and rewriting each packet header). Handing off the connection, however, may be impeded if the existing sessions include any firewalls.
Many different types of firewalls have been developed to block certain types of communication through the firewall. Blocking of particular packets within user traffic directed at the firewall can be performed based on several different criteria, such as IP address where the traffic originated, domain names of the source or destination of the traffic, the protocol in which the traffic is formatted, and the port sending or receiving the traffic, among others. Firewalls can also perform proxy services or perform network address translation (NAT) or port address translation (PAT) in which a user's local (i.e., private) equipment IP address is translated into a global (i.e., public) IP address of the firewall, so that a particular computer is not directly accessible from outside the firewall.
In the presence of firewalls, some users may only be able to participate in a connection session that they initiate. Thus, a calling user may not be able to get any response to packets it sends to an IP address that it received from the central server. If a firewall is performing address translation, then the IP address reported by the central server is the global address of the firewall and not the local equipment address of the user. Thus, while the user behind the firewall will continue to communicate with the central server (since the user initiated that session when it signed on or registered with the central server), the user will not communicate with a calling user who sends a packet to the global address of the firewall.
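The effect of address translation described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that a user's computer reports its own local address inside a message and that the central server compares that value with the source address it observes on the packet. The function name and the comparison approach are hypothetical and are not taken from the text.

```python
import ipaddress


def appears_translated(reported_local_ip: str, observed_source_ip: str) -> bool:
    """Return True when the address a user reports differs from the address the server sees."""
    reported = ipaddress.ip_address(reported_local_ip)
    observed = ipaddress.ip_address(observed_source_ip)
    return reported != observed


# Hypothetical values: a private local address and a documentation-range global address.
print(appears_translated("192.168.1.20", "203.0.113.5"))
print(appears_translated("203.0.113.5", "203.0.113.5"))
```

When the two addresses differ, the address stored for the user is that of the firewall, so a packet sent to it by another user may go unanswered.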
The functions of identifying the called telephone number, forwarding a call request to the central server, and conducting a packet exchange during a data call are preferably performed by a specific software application program referred to herein as a call client. A particular call client may include provision for exchanging certain types of data for preselected purposes and according to predefined protocols. Sharing other types of data or other types of computer resources between the users may exceed the capabilities of any particular call client. Thus, it would be desirable to share such computer resources independently of the call client.
During a video telephony call, it would be desirable for the users to share additional types of data, such as still images or photographs, without requiring complicated set-up or installation or complex procedures.
The present invention further has application to conducting various types of transactions over a public data network. Electronic commerce and other uses of the Internet have rapidly developed. A user may navigate with their web browser to an information provider's (e.g., a seller or manufacturer's) web page to view textual, audio, and graphic information about goods or services prior to making a selection decision or about how to operate, maintain, or repair previously purchased goods or services. While many people have accepted or even embraced the Internet, others are reluctant to use it for many reasons such as unfamiliarity, lack of understanding, worries over security of credit card information, or personal preference. In many instances, it may take a very experienced Internet user to find much of the information that is available. Inexperienced users may have difficulty locating the information they desire or even knowing where to look (i.e., knowing which websites or webpages are relevant).
A more traditional manner of obtaining information from providers of information, goods, or services has been by telephone enquiries (e.g., using toll-free telephone numbers). A drawback of the telephonic method is that information that can be provided is limited to audio information (either prerecorded or spoken by a live operator).
Thus, it would be desirable to provide a means of acquiring information with the simplicity of telephone calls while providing the ability to display video images to the calling person.
When the called user (e.g., information provider) has an auto-attendant (i.e., a computerized automated telephone response system) for screening and/or directing incoming calls to live human operators or agents, the IP address of the operator that will handle an incoming telephone call cannot be known in advance. Furthermore, it may be very desirable for both the auto-attendant and the live operator to provide video images to the calling user. Network data call set-up must take these factors into account.
Two Internet applications that have been implemented to provide real-time interaction between users are chat and instant messaging. In chat applications, a group of users access a chat server which relays communications from each individual user to each of the users in the group (i.e., a chat room). Thus, when one user types text within the chat application on their computer, the text is transmitted to the chat server which then relays or forwards that text to all the users active in the particular chat session for displaying within the chat application on the active users' computers. Chat servers which utilize video data have also been realized. In a typical chat environment, chat rooms may be available to any users requesting access to them based on predefined subject areas of the chat rooms. Thus, a user can interact in the chat room with other users that they do not know in advance.
Instant messaging (IM) is similar to chat applications except that each user sets up a personalized contact list in advance of other users with whom they may wish to exchange instant messages. When an IM user goes on-line, their IM application sends a message to the IM server. The IM server identifies which other users in the user's contact list are also on line and then sends status messages to the user and the contacts, enabling any of them to initiate a private exchange of messages.
An important issue in Internet communications is the bandwidth or speed at which any particular connection operates. In the case of prior art video conferencing using the Internet (such as video chat), insufficiency of the bandwidth utilized for a video call has caused poor voice and picture quality. In the video telephony system of the present invention, the voice channel provides more than enough bandwidth to ensure that a good quality voice transmission is obtained. In addition, removing the voice data from the Internet transmission frees up more of the available bandwidth for the video data in that channel. Moreover, since the actual understanding of the video telephony conversations by the participants depends more on the voice signals than on the video signals, the overall satisfaction with video telephony is increased even when video quality may be somewhat lacking. Another important issue related to bandwidth is network latency, which is the delay between when a signal is sent by the sender and when it is received by the recipient. Such delays during a two-way communication can cause unnatural conversation. In the system of the present invention wherein voice signals and video signals are delivered by separate communication channels, the further potential exists for reception of the signals to become unsynchronized.
Summary of the Invention
The present invention solves the problem of determining the IP address at which an Internet user can be reached by introducing a central server that stores information associating each registered user's IP address with identifying information well known or easily discovered by other users, namely their telephone number. In addition, a telephone call is established simultaneously with establishing the computer network session, thereby enhancing the user interaction regardless of the type of computer data to be exchanged (e.g., video frames, computer files, etc.). In one embodiment, the computer network session is automatically established in response to the act of dialing a person's number on a telephone. The present invention has the advantage of detecting the presence of firewalls for each user of the service and dynamically adjusting the call characteristics to enable point-to-point communication between the calling and called users whenever possible.
The present invention facilitates greater sharing of information, regardless of protocol or data format, by creating a virtual server on one user's computer for serving the shared information simultaneously to both users as clients of the virtual server. The present invention provides the ability to share prerecorded motion video (e.g., video and audio data from a digital camcorder) uploaded to one user's computer by streaming compressed data from a streaming video server simultaneously to both users as clients of the video server. The present invention has the advantage that control of video images transmitted to the calling user is transferred in conjunction with the transfer of the telephone call from an auto-attendant to a live operator.
The present invention provides an on-demand ordering system for goods or services employing a computer network communication session that is established automatically in response to a telephone call made from a requestor to a provider. The computer network data call provides video images synchronized to menu selections presented by an automated telephone response system.
The present invention relates to enhancing communication between individual users of a computer network, such as the Internet, by providing a transition from a chat or instant messaging environment to a video telephony call.
The present invention further has the advantage that the voice and video presentation at the receiving end maintains synchronization by adapting to current latency conditions.
Brief Description of the Drawings
Figure 1 is a block diagram showing the interconnection of users over the Internet to the central server of the present invention.
Figure 2 is a block diagram showing a user connection model of the present invention.
Figure 3 is a flow diagram of one preferred embodiment of the invention.
Figure 4 is a block diagram showing a first embodiment of packet flow.
Figure 5 is a block diagram showing a second embodiment of packet flow.
Figure 6 is a block diagram showing an alternative embodiment of user equipment initiating the user connection of the present invention.
Figure 7 is a block diagram showing an alternative embodiment for initiating a telephone call portion of the user connection from within the computer network.
Figure 8 illustrates a video telephony system in an example of the invention.
Figure 9 illustrates video system operation in an example of the invention.
Figure 10 illustrates a user system in an example of the invention.
Figure 11 illustrates user system operation in an example of the invention.
Figure 12 illustrates user system operation in an example of the invention.
Figure 13 illustrates user system operation in an example of the invention.
Figure 14 illustrates a user system in an example of the invention.
Figure 15 illustrates a server system in an example of the invention.
Figure 16 illustrates server system operation in an example of the invention.
Figure 17 illustrates server system operation in an example of the invention.
Figure 18 is a block diagram showing the interconnection of users to the Internet through respective firewalls.
Figure 19 is a flowchart showing dynamic control of call characteristics to obtain a direct point-to-point network session between a calling user and a called user even though a firewall may be present.
Figure 20 is a flowchart showing detection of an address translating firewall associated with a registered user.
Figure 21 is a block diagram showing the elements within each computer for accomplishing the sharing of resources between the computers.
Figure 22 is a block diagram showing the elements of the computer hosting the shared resources in greater detail.
Figure 23 is a block diagram showing the elements of the remote computer accessing the shared resources in greater detail.
Figure 24 is a flowchart showing preferred embodiments for establishing the private sharing of computer resources between network users.
Figure 25 is a block diagram showing the interconnection of a video camera to one computer for sharing prerecorded motion video with a remote computer.
Figure 26 is a block diagram showing software functional blocks within the resident computer.
Figure 27 is a flowchart showing a preferred embodiment for sharing prerecorded motion video.
Figure 28 is a block diagram of a first embodiment of a system for exchanging video image data between a user and both an auto-attendant and a live operator wherein the process for coordinating the data call is transparent to the user.
Figure 29 is a block diagram of a second embodiment of a system for exchanging video image data between a user and both an auto-attendant and a live operator using an image server controllable by either the auto-attendant or the live operator.
Figure 30 is a block diagram showing an auto-attendant in greater detail.
Figure 31 is a flowchart showing a preferred method of the present invention for coordinating a telephone call in conjunction with a data call or data calls.
Figure 32 is a block diagram of a communication model for the ordering system of the present invention.
Figure 33 is a block diagram showing a provided ordering system in greater detail.
Figure 34 is a flowchart showing preferred embodiments for acquiring goods or services from a provider using the present invention.
Figure 35 is a block diagram showing the handling of multiple, simultaneous calls to the provider system.
Figure 36 is a flowchart showing a method of handling multiple, simultaneous calls to the provider system.
Figure 37 is a block diagram showing integration of a video telephony system with a central server also providing chat and/or instant messaging services.
Figure 38 illustrates the user environment of the integrated services of the present invention.
Figure 39 is a flowchart showing one preferred method of accessing the integrated services of the present invention.
Figure 40 is a flowchart showing a sequence of events after a user selects a video telephony connection using the present invention.
Figure 41 is a block diagram showing the elements within each computer for accomplishing the sharing of still images between the computers.
Figure 42 is a block diagram showing the elements of the computers in greater detail.
Figure 43 is a flowchart showing a preferred embodiment of a method for sharing still images in conjunction with a video telephony call.
Figure 44 is a block diagram graphically depicting the operation of the present invention.
Figure 45 is a block diagram showing the video telephony system of the present invention.
Figure 46 is a plot showing relative latency periods and the delay introduced for the voice signals of the present invention.
Figure 47 is a plot showing relative latency periods and the delay introduced for the voice signals of the present invention when video latency becomes excessive.
Figure 48 is a chart relating a determined value of the video latency to the remedial actions taken by the present invention.
Figure 49 is a block diagram showing user equipment for a video telephony call using the present invention.
Figure 50 is a schematic diagram showing the buffer of Figure 5 in greater detail.
Figure 51 is a block diagram showing the flow of video data signals.
Figure 52 is a flowchart showing a preferred embodiment of the present invention.
Detailed Description of Preferred Embodiments
Figures 1-17 and the following description depict specific examples to teach those skilled in the art how to make and use the best mode of the invention. For the purpose of teaching inventive principles, some conventional aspects have been simplified or omitted. Those skilled in the art will appreciate variations from these examples that fall within the scope of the invention. Those skilled in the art will appreciate that the features described below can be combined in various ways to form multiple variations of the invention. As a result, the invention is not limited to the specific examples described below, but only by the claims and their equivalents.
Referring to Figure 1, a plurality of user computers 10, 11, and 12, and a central server 13 are internetworked via the Internet 14. A plurality of routers 15 within Internet 14 direct packets between various endpoints or nodes. Computers 10 and 11 are shown as being connected to Internet routers belonging to Internet Service Providers (ISP's) 16 and 17, respectively. The connections to the ISP's may be by dial-up, digital subscriber line (DSL), cable modem, or integrated access device (IAD), for example. Central server 13 and computer 12 are shown directly connected to a router.
Network communication comprises data messages or packets transferred between separate endpoints, such as between computers 10, 11, or 12 (as clients) and central server 13. The packet transfer is accomplished by routers 15 using the IP addresses contained in each packet. Central server 13 typically has a fixed IP address that is listed on the DNS servers accessible to each computer. Each computer user can easily communicate with central server 13 by supplying its logical name (e.g., www.sprint.exchange.com) which is automatically resolved by their browser into an IP address by consulting a DNS server. Exchanging packets between users 10, 11, and 12 themselves cannot usually be accomplished in the same way because the users and their IP addresses are not listed in the DNS system.
The present invention facilitates exchanging data messages between two individual users by providing a specialized directory or look-up within central server 13. As shown in Figure 2, the present invention preferably functions to simultaneously establish a voice telephone call between the two individual computer users. In certain embodiments, the voice call serves as the user action that initiates the computer processing to establish the computer-to-computer connection. In addition, the voice call provides a way to alert the called party of the request to establish the computer connection and then serves to enhance the interaction between the two users during the exchange of computer data. However, the present invention also provides other methods for initiating the computer processing, and a simultaneous voice telephone call is not necessary in the present invention.
Regarding the embodiment with a simultaneous voice telephone call in Figure 2, computers 10 and 11 have associated telephones 18 and 19 used by the same respective users. The computers and telephones may be fixed installations (e.g., in a residence or a business office) or may be mobile devices (e.g., laptop computer and cellular phone), as long as both are accessible to each user at the same time. The telephones are connected to the public switched telephone network (PSTN) 20. Central server 13 provides a user look-up and interconnecting service for registered users. For security and/or billing purposes, access to the service preferably is tied to user ID's and passwords. A user may be given an ID and password with initial sign-up for the service. Each user would manually configure the telephone number that they want to be associated with. When the user is "on-line" (i.e., has their computer turned on and connected to Internet 14), their computer sends a registration message to central server 13 to notify it that the user is available. Central server 13 can inspect the registration message to determine the current IP address and port number at which the user resides for its current connection session. Alternatively, the user may manually configure their IP address in some circumstances. In any case, central server 13 contains a database of currently active, registered users. Each user entry in the database includes fields for user ID, password, telephone number, IP address (including port number), and user status, for example.
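The database of registered users described above can be sketched as a small in-memory directory keyed by telephone number. This is a minimal illustration only: the class and field names are hypothetical, the telephone numbers and addresses are placeholder values, and a real service would store a password hash rather than the password itself.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class UserEntry:
    """One database entry with the fields named in the text."""
    user_id: str
    password: str      # illustration only; a real system would keep a hash
    telephone: str
    ip_address: str
    port: int
    status: str = "active"


class UserDirectory:
    """Hypothetical look-up table of currently registered users."""

    def __init__(self) -> None:
        self._by_telephone: Dict[str, UserEntry] = {}

    def register(self, entry: UserEntry) -> None:
        # Adding a user or re-registering replaces any earlier entry.
        self._by_telephone[entry.telephone] = entry

    def unregister(self, telephone: str) -> None:
        self._by_telephone.pop(telephone, None)

    def lookup(self, telephone: str) -> Optional[UserEntry]:
        return self._by_telephone.get(telephone)


directory = UserDirectory()
directory.register(UserEntry("user2", "secret", "555-0102", "198.51.100.11", 5004))
found = directory.lookup("555-0102")
print(found.ip_address if found else "not registered")
```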
In the connection model of Figure 2, a user #1 dials telephone 18 to make a voice call to a user #2 at telephone 19. The telephone number dialed by user #1 is captured as a target telephone identifier number and sent to computer 10 being used by user #1. Computer 10 forwards the target telephone number to central server 13 as part of an access request for establishing a connection with user #2. Central server 13 looks up the target number in its database. When it finds the target number, central server 13 identifies the IP address associated with user #2 and sends an initiation message to computer 11 being used by user #2. The initiation message identifies user #1 (preferably by both telephone number and user ID) and the type of data to be exchanged (i.e., the application program to receive the data). User #2 answers the telephone voice call and learns that an initiation message was sent to their computer. Using computer 11, user #2 can verify the calling party as user #1 and can indicate whether they accept the computer network connection with user #1. Once user #2 accepts, data messages can be exchanged between application programs running on computers 10 and 11. The application programs can be written to perform file transfers of various types of files, video data or frames for video telephony, or other real-time data or control signals. Data exchange can continue until either user deactivates their application program.
The sequence of events occurring in the present embodiment is shown in greater detail in Figure 3, in which user #1 events are in the left column, central server events in the center column, and user #2 events in the right column. In step 21, user #1 invokes the real-time interconnection service of the present invention. This can be configured as part of the normal start-up of their computer or can result from manually launching a software application or client program after start-up has finished. When the service is invoked by user #1, a registration message is sent to the central server in step 22. The registration message preferably includes the user ID and password assigned to user #1. The registration message would typically also include the telephone number being used by user #1 and an IP address. Although the IP address may be explicitly added to the message by user #1, the IP address (and port number) is typically embedded in each packet forwarded by the network and the central server preferably extracts the automatically embedded IP address and port number so that the user does not need to know it. Alternatively, the telephone number and/or IP address may have been configured on the central server during a previous connection session of user #1, in which case the registration message only needs to contain the user ID and password so that the central server knows that user #1 is active and ready to receive data calls. In step 23, the central server receives the registration message and adds the new user to the database or updates the user status, as necessary.
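The registration handling of steps 22 and 23 can be sketched as follows. The sketch models a received packet as a simple object carrying the source address and port that the network embedded in it, so that the user never supplies the address explicitly. The `Packet` class, the `credentials` table, and the message field names are assumptions made for this illustration.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Packet:
    """Hypothetical view of a received packet: embedded source address plus message body."""
    src_ip: str
    src_port: int
    payload: dict


def handle_registration(packet: Packet, credentials: Dict[str, str],
                        database: Dict[str, dict]) -> bool:
    """Add or update a user entry using the address embedded in the packet."""
    user_id = packet.payload.get("user_id")
    expected = credentials.get(user_id)
    if expected is None or expected != packet.payload.get("password"):
        return False  # unknown user or wrong password
    entry = database.setdefault(user_id, {})
    # The telephone number may be omitted if it was configured in an earlier session.
    if "telephone" in packet.payload:
        entry["telephone"] = packet.payload["telephone"]
    entry["ip"] = packet.src_ip
    entry["port"] = packet.src_port
    entry["status"] = "active"
    return True


credentials = {"user1": "secret"}
database: Dict[str, dict] = {}
first = Packet("198.51.100.10", 40000,
               {"user_id": "user1", "password": "secret", "telephone": "555-0101"})
later = Packet("198.51.100.77", 41000, {"user_id": "user1", "password": "secret"})
print(handle_registration(first, credentials, database))
print(handle_registration(later, credentials, database), database["user1"])
```

The second registration carries only the user ID and password, yet the entry keeps the previously configured telephone number and picks up the new address.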
Separately, user #2 invokes the real-time interconnection service in step 24. User #2 sends a registration message in step 25, and the central server receives the registration message and adds user #2 to the database or updates the user status, as necessary. Thereafter, the central server may periodically exchange further messages with each registered user to keep the user status current and to maintain an open session with each user, for example. When a user shuts down their application program or their computer, an unregister message (not shown) may also be sent to the central server.
During the time that user #1 is on-line, user #1 desires to exchange computer data with user #2. In step 27, user #1 initiates an attempt to contact user #2 and set up the data exchange. User #1 identifies user #2 by virtue of user #2's telephone number. This target telephone number may preferably be captured from the act of dialing it on user #1's telephone equipment. According to one example which is described in more detail below, a dedicated module may be connected to user #1's telephone to detect the DTMF tones while dialing and to send the dialed number to user #1's computer. The target telephone number for user #2 is included in an access request message sent to the central server in step 28.
In step 30, the central server looks up the target telephone number and gets the IP address (and port number) associated with user #2. The initiation message is sent by the central server in step 31.
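Steps 30 and 31 can be sketched as a look-up followed by construction of the initiation message. The function name, the dictionary layout, and the message fields below are hypothetical; the text specifies only that the message identifies the caller (preferably by telephone number and user ID) and the type of data to be exchanged.

```python
from typing import Dict, Optional, Tuple


def handle_access_request(database: Dict[str, dict], caller_id: str,
                          target_number: str, data_type: str
                          ) -> Optional[Tuple[Tuple[str, int], dict]]:
    """Find the called user by telephone number and build the initiation message."""
    callee = next(
        (entry for entry in database.values()
         if entry["telephone"] == target_number and entry["status"] == "active"),
        None,
    )
    if callee is None:
        return None  # the called number is not registered or not on-line
    caller = database[caller_id]
    initiation = {
        "type": "initiation",
        "caller_id": caller_id,
        "caller_telephone": caller["telephone"],
        "data_type": data_type,
    }
    # The destination is the address and port stored for the called user.
    return (callee["ip"], callee["port"]), initiation


database = {
    "user1": {"telephone": "555-0101", "ip": "198.51.100.10", "port": 40000, "status": "active"},
    "user2": {"telephone": "555-0102", "ip": "198.51.100.11", "port": 40001, "status": "active"},
}
result = handle_access_request(database, "user1", "555-0102", "video")
print(result)
```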
User #2 receives the initiation message in step 32. If not already running, the user #2 computer launches the appropriate client application for responding to the initiation message and then prompts user #2 to either accept or reject the access request. If rejected, then user #2 generates a reject message in step 33 and sends it to the central server. In step 34, the central server forwards the reject message to user #1, which then terminates the data portion of the attempted communication session in step 35 (the voice telephone call is accepted, rejected, or terminated separately).
If user #2 accepts the attempted contact and the request for data exchange, then user #2 causes their computer to generate an accept message in step 36 (e.g., by clicking an "accept" button in an application interface) and sends it to the central server. In step 37, the central server determines any needed configurations for accomplishing the data exchange and then configures the user #1 and user #2 endpoints in step 38. The two main configurations for the data exchange will be described in connection with Figures 4 and 5. The user #1 and user #2 computers accept the configuration and then begin to exchange the data messages or packets in step 39. Other configuration issues, such as the configuration of the client application programs exchanging the actual data messages can be handled within the access request message, the initiation message, the accept message, and/or other packets exchanged between the endpoints, for example.
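The accept and reject branches of steps 33 through 38 can be sketched as a small dispatch at the central server. The function name, the message dictionaries, and the `mode` values ("relay" and "direct", standing for the two packet exchange configurations) are assumptions made for this illustration.

```python
from typing import List, Tuple


def handle_reply(reply_type: str, mode: str = "relay") -> List[Tuple[str, dict]]:
    """Return the (recipient, message) pairs the central server sends after a reply from user #2."""
    if reply_type == "reject":
        # The reject message is forwarded to user #1, which ends the data portion of the attempt.
        return [("user1", {"type": "reject"})]
    if reply_type == "accept":
        # Both endpoints receive the configuration chosen for the data exchange.
        config = {"type": "configure", "mode": mode}
        return [("user1", config), ("user2", config)]
    raise ValueError(f"unexpected reply type: {reply_type}")


print(handle_reply("reject"))
print(handle_reply("accept", mode="direct"))
```

The voice telephone call is handled separately in either branch, so nothing in this sketch touches it.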
A first packet exchange configuration is shown in Figure 4 wherein central server 13 performs a relay function such that all packets exchanged between computer 10 and computer 11 pass through central server 13. In other words, after a desired user (called party) accepts the data call and central server 13 notifies the first user (calling party) of the acceptance, both endpoints continue to address their sent packets to central server 13. At central server 13, each packet is redirected by substitution of IP addresses. For example, a packet sent from computer 10 including its own IP address as the source address of the packet and the IP address of central server 13 as the destination address of the packet is modified after being received by central server 13 to have the central server's address as its source address and to have the IP address of computer 11 as its destination address. After modification, central server 13 sends the packet back to its router and on to computer 11. The same operations are used to send packets from computer 11 to computer 10. The embodiment of Figure 4 has the advantage that greater privacy of a user's IP address is maintained since each user's computer only needs to see the IP address of central server 13. Furthermore, this configuration can readily function in the presence of network address translation (NAT) firewalls at the endpoints. Specific steps to deal with firewalls are shown in copending application U.S. Serial No. (Sprint Docket 1805), filed concurrently herewith, and incorporated herein by reference in its entirety.
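The address substitution performed in the relay configuration can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the Packet type, the PEERS session table, the relay function, and all addresses are hypothetical, and a real relay would operate on actual IP headers rather than on a simplified record.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    src: str        # source IP address
    dst: str        # destination IP address
    payload: bytes


SERVER_IP = "203.0.113.1"            # hypothetical address of central server 13
PEERS = {                            # hypothetical session table: endpoint -> peer
    "198.51.100.10": "198.51.100.11",
    "198.51.100.11": "198.51.100.10",
}


def relay(packet: Packet) -> Packet:
    """Rewrite a packet received by the server so it is forwarded to the peer."""
    if packet.dst != SERVER_IP:
        raise ValueError("packet was not addressed to the relay")
    peer = PEERS[packet.src]         # KeyError if the sender has no active session
    # The server becomes the source, so the peer never sees the sender's address.
    return replace(packet, src=SERVER_IP, dst=peer)


out = relay(Packet("198.51.100.10", SERVER_IP, b"hello"))
```

Because both directions pass through the same function, neither endpoint learns the other's IP address, which is the privacy property noted above.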
Figure 5 shows an alternative configuration in which direct packet exchange between computers 10 and 11 is realized. Central server 13 provides a look-up function and a connection initiation function. If desired user #2 (called party) accepts a data call, then central server 13 provides the IP address of computer 11 to computer 10 and provides the IP address of computer 10 to computer 11. Thereafter, each computer can send packets addressed to the other computer and the packets are no longer relayed through central server 13. This embodiment has the advantage that central server 13 may be reduced in size since less traffic flows through it.
To initiate a data exchange or data call of the present invention, the target telephone number(s) must be presented to the central server. Most preferably, a simultaneous telephone call is being established along with the data call. One way in which to capture the called party's target telephone number is shown in Figure 6. In this embodiment, user #1 utilizes a service provider that deploys an integrated access device (IAD) 41 with each user. Computer 10 and telephone 18 are each coupled to IAD 41. The service provider maintains a multi-service access platform (MSAP) 42 which connects to PSTN 20 and Internet 14. IAD 41 of user #1 and IADs of other users are connected to MSAP 42. A high speed digital connection between IAD 41 and MSAP 42 carries all the voice telephony and computer data signals for user #1. MSAP 42 separates these signals and routes them as appropriate to PSTN 20 or Internet 14. IAD 41 may be comprised of an IAD1101 Integrated Access Device and MSAP 42 may be comprised of a 6732 Multiple Service Access Platform, each commercially available from Cisco Systems, Inc.
When initiating a telephone call via IAD 41, the called telephone number is detected as computer data and sent to MSAP 42 to establish a voice call. In the present embodiment, IAD 41 is modified to additionally send the called telephone number and the start time of the call to computer 10. In response, an application program running on computer 10 parses the information from IAD 41 and forwards an access request to the central server via the Internet.
In a further embodiment of the invention, the data call can be initiated from within the computer network rather than by the dialing of the telephone voice call. For instance, the first user may simply enter the desired user's target telephone number manually into the application program for exchanging the data messages. Alternatively, a network-based directory can be set up by a user and stored on central server 13 to facilitate making a data call. Furthermore, the data call initiation need not be done by the actual recipients of the data call. As shown in Figure 7, another computer 43 connected to Internet 14 may generate an access request that identifies both user #1 and user #2 as called parties. Central server 13 could also initiate a data call itself, such as for a scheduled download that two users have set up in advance. Any such network initiated data call preferably also requires entry of appropriate user ID(s) and password(s).
Although a telephone voice call is not necessary, the embodiment of Figure 7 shows how a voice call can still be established. In this embodiment, however, the telephone voice call is established in response to the initiation of the data call instead of the other way around. A private branch exchange (PBX) 44 is coupled to central server 13. As a data connection is being established, central server 13 triggers PBX 44 to initiate a telephone voice call over PSTN 20 to the target telephone numbers associated with each endpoint of the data call. The use of the present invention in conjunction with video telephony will now be described in greater detail.
Video Telephony System Configuration and Operation - Figures 8-9
Figure 8 illustrates video telephony system 100 in an example of the invention. Video telephony system 100 comprises user systems 101-104, public data network 110, server system 111, and public telephone network 120. User systems 101-104 communicate with one another and with server system 111 over public data network 110. User systems 101-104 also communicate with one another over public telephone network 120.
Although various systems could be used within the context of the invention, a few exemplary systems are given for illustrative purposes. Examples of public telephone network 120 include local and long distance telephone companies. Examples of public data network 110 include Internet service providers and the Internet. Server system 111 could be a conventional Internet server configured with software to implement the invention. User systems 101-104 could include a conventional telephone and personal computer in addition to special purpose circuitry and software to implement the invention.
Figure 9 illustrates video system 100 operation in an example of the invention. Public data network 110 is not shown for clarity, although it should be appreciated that communications with server system 111 occur over public data network 110. Before a video call, both user systems 101 and 104 transfer log-in messages to server system 111 when they are ready to initiate and/or receive video calls. The log-in messages generally include user names and passwords, user data addresses and telephone numbers, and any video call preferences. Server system 111 checks the passwords against the user names, and if they are valid, associates each related user name/data address/telephone number with an indication that the user is ready to receive and/or initiate video calls. Server system 111 also logs any video call preferences. Subsequently, user system 101 transfers a request for a telephone call over public telephone network 120 to user system 104. This telephone call request could be as simple as picking up a telephone and dialing a telephone number. Public telephone network 120 transfers a telephone call request to user system 104, typically by processing the dialed telephone number to ring a telephone. If user system 104 grants the telephone call request, such as by answering the ringing telephone, a telephone call is established between user systems 101 and 104 over public telephone network 120. In response to the telephone call request, user system 101 transfers a video call request to server system 111. The video call request has the called party telephone number, and some caller identification information, such as user name and password, user
telephone number and data address, and any other video call parameters. One such parameter is whether the requested video call is bidirectional or unidirectional. Another parameter is the type of video compression and encryption that is used. Server system 111 uses the called party telephone number to check if the called party is ready to receive video call requests, and since user system 104 has logged-in, server system 111 transfers a video call request to user system 104. User system 104 may then present the called party with a prompt, such as an instant message or tone, to accept the video call request. If the called party accepts the video call request, user system 104 transfers a video call acceptance to server system 111. The acceptance may also have video call preferences for the called party that server system 111 resolves against the preferences of the caller. Server system 111 transfers video call start messages to user systems 101 and 104 indicating the resolved video call parameters. In response to the video call start messages, user systems 101 and 104 generate and transfer video to server system 111. This transferred video has some indicia indicating the caller and called party, so server system 111 can associate the received video with the video call.
Server system 111 uses the user system 104 data address to transfer caller video to user system 104. Server system 111 uses the user system 101 data address to transfer called party video to user system 101. Prior to this transfer, server system 111 may interwork the video to provide compatibility at the receiving end. User systems 101 and 104 receive and display the respective video to establish the video call. Eventually, user systems 101 and 104 indicate telephone call termination to public telephone network 120, typically by hanging up their telephones. In response to telephone call termination, user system 101 transfers a telephone call termination message to server system 111. In response, server system 111 transfers a video call termination message to user system 104, and systems 101, 104, and 111 terminate the video call. In some cases, user system 104 may detect telephone call termination and transfer a termination message to server system 111 to eliminate the need for the termination message from server system 111.
In one variation to the above system, the actual video transfer may be directly between user systems 101 and 104 over public data network 110. The server system 111 would set-up the video call and provide the appropriate data addresses to user systems 101 and 104 for a peer-to-peer video transfer over public data network 110.
General User System Configuration and Operation - Figures 10-13
Figure 10 illustrates user system 300 in an example of the invention. User system 300 comprises video system 301, data communication system 302, telephone system 303, and control system 304. Control system 304 comprises telephone interface 305 and data interface 306. Data communication system 302 is coupled to a public data communication network, and telephone interface 305 is coupled to a public telephone network.
Video system 301 could be any system configured to generate and/or display video. Video system 301 may include a camera for generating video of a caller or called party. Video system 301 may include a television or computer monitor to display video. Telephone system 303 could be any system configured to initiate telephone calls over a public telephone network, and could be integrated into other systems, such as computers, appliances, and televisions. Telephone system 303 could utilize wireless, wire-line, optical, or other communication media. Control system 304 could be any system configured to initiate a video call using systems 301-302 in response to the initiation of a telephone call by telephone system 303. Telephone interface 305 is configured to receive a called number from telephone system 303 if the telephone number is contemporaneously used to establish a telephone call over the public telephone network. In response to receiving the called telephone number, data interface 306 is configured to transfer the called telephone number to data communication system 302 for transfer to a server system over the public data network. Data system 302 could be any system configured to transfer the called telephone number to the server system over the public data network and to exchange video between the public data network and video system 301.
It should be appreciated that systems 301-304 could be integrated together or with other systems. Various combinations of equipment could be used to implement user system 300. Some examples of devices that could incorporate data system 302 include, but are not limited to, a computer, set-top box, telephone, network interface card, digital assistant, information appliance, and stand-alone device. Some examples of devices that could incorporate control system 304 include, but are not limited to, a computer, telephone, modem, network interface card, set-top box, and stand-alone device. In addition, the functionality of data system 302 and control system 304 could be provided by a processing system that retrieves and executes software that is stored on a storage system. The storage system could comprise a disk, tape, integrated circuit, server, or some other memory device.
Figures 11-12 illustrate the operation for user system 300 to initiate video calls in an example of the invention. User system 300 awaits an indication from the caller to begin initiating video calls. This indication could be an input to any of systems 301-304. In response to the indication, data communication system 302 transfers a log-in message over the public data network to the server system. For example, data communication system 302 could be configured to automatically transfer the log-in message upon system start. The log-in message indicates that user system 300 is ready to initiate video calls, and possibly, to also receive video calls. The log-in message may include information such as the user name and password, user data address and telephone number, video call parameters, and other user data or registration information. Telephone interface 305 awaits either a telephone call from telephone system 303 to the public telephone network or an indication from the caller to stop initiating video calls. If such a stop indication is received, data communication system 302 transfers a log-off message to the server system. For example, data communication system 302 could be configured to automatically transfer the log-off message upon system shutdown. The log-off message indicates that user system 300 is not ready to initiate or receive video calls.
If telephone system 303 initiates a telephone call, telephone interface 305 obtains the called telephone number, and in response, data interface 306 transfers the called number to data communication system 302. Data communication system 302 transfers a video call request to the server system. The video call request includes the called telephone number and other information, such as the caller user name and password, caller telephone number and data address, and video call parameters. The video call parameters indicate if the requested video call is bidirectional or unidirectional, and the direction if unidirectional - caller to called party or called party to caller. The video call parameters may also indicate requested video quality and security. The information in the video call request is populated by control system 304 and/or data communication system 302.
Data communication system 302 then awaits a video call start message. If the video call start message is not received, for example if the video call is unavailable or rejected, telephone interface 305 awaits either a telephone call or an indication from the caller to stop initiating video calls. If received (see Figure 12), the video call start message indicates the actual video call parameters for the video call. If the video call is bidirectional or unidirectional from caller to called party, then video system 301 generates video of the caller that data communication system 302 transfers to the server system. If the video call is bidirectional or unidirectional from called party to caller, then data communication system 302 receives video from the server system that video system 301 displays. Systems 301-302 would typically apply compression, encryption, and other video technologies to the video. At this point between the caller and called party, a telephone call exists over the public telephone network and a video call exists over the public data network. If the caller terminates the telephone call, such as by hanging up the telephone, telephone interface 305 determines that the telephone call has been terminated, and as a result, data interface 306 indicates the telephone call termination to data communication system 302. Data communication system 302 then transfers a video call termination message to the server system and video generation, transfer, receipt, and display are terminated by user system 300. The video call is similarly terminated if a video call termination message is received from the server system or if the caller indicates that the video call should be terminated. User system 300 then awaits an additional telephone call or an indication from the caller to stop initiating video call requests.
Figures 12-13 illustrate the operation for user system 300 to receive video calls in an example of the invention. In Figure 13, user system 300 awaits an indication from the user to begin receiving video call requests. This indication could be an input to any of systems 301-304. In response to the start indication, data communication system 302 transfers a log-in message over the public data network to the server system. For example, data communication system 302 could be configured to automatically transfer the log-in message upon system start. The log-in message indicates that user system 300 is ready to receive video call requests, and may include information such as the user name and password, user data address and telephone number, video call parameters, and other user data or registration information. Data communication system 302 awaits a video call request from the server system or an indication from the user to stop receiving video call requests. If a stop indication is received, data communication system 302 transfers a log-off message to the server system. For example, data communication system 302 could be configured to automatically transfer the log-off message upon system shut-down.
If data communication system 302 receives a video call request, then the user is notified of the requested video call. The notification could be given from any of systems 301-304 and could include screen displays, tones, or other user signals. Although not shown for clarity, telephone system 303 will receive a contemporaneous telephone call from the public telephone network. The user indicates if the telephone call and/or the video call is accepted. Typically, the telephone call is accepted by operating telephone system 303 to answer the call. The video call may be accepted with an input to one of systems 301-304, such as by pressing a DTMF key, pressing a button, or selecting from a screen display. User system 300 could be configured to automatically accept or reject the video call based on whether the corresponding telephone call is answered.
If the video call is not accepted, then data communication system 302 transfers a video call rejection to the server system and awaits either a video call request from the server system or an indication from the user to stop receiving video call requests. If the video call is accepted, then data communication system 302 transfers a video call acceptance to the server system. The video call acceptance may indicate accepted video call parameters. Data communication system 302 then awaits a video call start message indicating the actual parameters for the video call. If the video call start message is received, processing proceeds as indicated above for Figure 12.
User System Incorporating Conventional Telephone and Computer - Figure 14
Figure 14 illustrates user system 700 in an example of the invention.
Advantageously, user system 700 is configured for use with conventional telephones, personal computers, and communication services. Advantageously, this makes video telephony easy for a user to implement by simply adding one small device to conventional devices and services. User system 700 includes computer system 701, telephone 703, and interface device 704. Computer system 701 is connected to Internet link 734 that provides Internet service. Telephone 703 is connected to telephone link 731. Telephone link 735 provides telephone service. It should be appreciated that links 734-735 may share the same physical media, especially to egress the user premises. Computer system 701 includes user interface 711, communication interface 712, processing system 713, and storage system 714. User interface 711 includes video equipment 718. Storage system 714 stores operating software 716 and video software 717. Interface device 704 includes RJ-11 jacks 721-722, Dual-Tone Multi-Frequency (DTMF) decoder 723, call sensor 724, controller 725, and computer interface 726.
RJ-11 jack 721 is coupled to telephone 703 by telephone link 731. RJ-11 jack 722 is coupled to telephone link 735. Computer interface 726 is coupled to communication interface 712 by Universal Serial Bus (USB) 733. Alternatively, USB 733 could be a serial cable. Communication interface 712 is also coupled to Internet link 734. Computer system 701 uses Transmission Control Protocol (TCP) port 80 or any other port assigned by the user to exchange messages with the server system.
User interface 711 includes a keyboard and mouse. Video equipment 718 includes a camera and monitor. Communication interface 712 includes a USB or serial port and a Digital Subscriber Line (DSL) modem or some other broadband access system. Processing system 713 includes a computer microprocessor and other circuitry. Storage system 714 includes a hard disk drive and other circuitry. Processing system 713 retrieves and executes operating software 716 and video software 717 from storage system 714. Software 716-717 could comprise an application program, firmware, or some other form of machine-readable processing instructions. Operating software 716 includes an operating system, networking software, and other utilities typically loaded onto a personal computer. When executed by processing system 713, video software 717 directs processing system 713 to operate in accord with the invention.
Interface device 704 could be a stand-alone enclosure that derives power from the telephone line, battery, AC connection, or another source. RJ-11 jacks 721-722, DTMF decoder 723, call sensor 724, and computer interface 726 could be conventional components. Controller 725 comprises processing circuitry configured to operate in accord with the invention. Interface device 704 can be turned on and off to control video call initiation. In operation, operating software 716 directs processing system 713 to retrieve and execute video software 717 in response to computer start-up or user input. Video software 717 directs processing system 713 to operate as follows. Processing system 713 maintains a set of user options that can be viewed and altered through user interface 711. A table of possible options follows.
Processing system 713 generates and transfers a log-in message through communication interface 712 and over the public data network to the server system. The log-in message includes user name and password, user telephone number and data address, and video call parameters.
Telephone 703 is operated to transfer DTMF digits to the public telephone network. The public telephone network processes the DTMF digits to extend the call to the called party. DTMF decoder 723 monitors the telephone connection between jacks 721-722 to detect and decode any DTMF tones transmitted by telephone 703 to the public telephone network. DTMF decoder 723 indicates the decoded digits to controller 725. Controller 725 forms the called number from the decoded digits and transfers a telephone call initiation message through computer interface 726 and USB connection 733.
Processing system 713 receives the telephone call initiation message from communication interface 712. Processing system 713 implements the video call initiation options, and if the video call should proceed, processing system 713 generates a video call request including the called telephone number, user name and password, user data address and telephone number, and video call parameters. Processing system 713 transfers the video call request message through communication interface 712 to Internet link 734 for delivery to the server system over the Internet. Processing system 713 then awaits a video call start message from the server system with the parameters for the video call. When the video call start message is received, processing system 713 implements the video call parameters and user options. For a bidirectional video call, processing system 713 directs the exchange of video between video equipment 718 and Internet link 734. Video equipment 718 displays the video.
Eventually, telephone 703 is placed on-hook. Call sensor 724 monitors the telephone connection between jacks 721-722 to detect the on-hook condition. Typically, call sensor 724 monitors line current to detect off-hook and on-hook conditions. Call sensor 724 indicates the on-hook condition to controller 725. Controller 725 transfers a telephone call termination message through computer interface 726 and USB connection 733.
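The behavior of controller 725 described above, collecting decoded digits into a called number and reporting call events to the computer, can be sketched in Python as follows. This is a hedged illustration only: the Controller class, its method names, and the dictionary message format are hypothetical. In particular, the text does not state how the controller decides that dialing is complete, so the sketch assumes an external trigger (for example, an inter-digit timeout) calls dialing_complete.

```python
from typing import List, Optional


class Controller:
    """Hypothetical model of controller 725: collects digits, reports call events."""

    def __init__(self) -> None:
        self._digits: List[str] = []
        self._off_hook = False

    def on_hook_state(self, off_hook: bool) -> Optional[dict]:
        # Input from the call sensor. A transition to on-hook ends the call.
        was_off_hook, self._off_hook = self._off_hook, off_hook
        if was_off_hook and not off_hook:
            self._digits.clear()
            return {"type": "call_termination"}
        return None

    def on_digit(self, digit: str) -> None:
        # Input from the DTMF decoder; digits are ignored while on-hook.
        if self._off_hook:
            self._digits.append(digit)

    def dialing_complete(self) -> Optional[dict]:
        # Assumption: an external rule (e.g., a timeout) decides dialing is done.
        if not self._digits:
            return None
        return {"type": "call_initiation", "called_number": "".join(self._digits)}


controller = Controller()
controller.on_hook_state(True)          # handset lifted
for d in "9135550102":
    controller.on_digit(d)
start = controller.dialing_complete()   # message sent toward the computer
end = controller.on_hook_state(False)   # handset replaced
```

The two returned messages stand in for the telephone call initiation and termination messages that the interface device passes to the computer over the USB connection.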
Processing system 713 receives the telephone call termination message through communication interface 712. In response, processing system 713 generates and transfers a video call termination message through communication interface 712 to Internet link 734 for delivery to the server system over the Internet. Processing system 713 directs video equipment 718 to stop the generation and display of video. It should be appreciated that the user may operate their telephone in the normal manner and corresponding video calls are automatically established over the Internet. The telephone call provides the audio, and the Internet connection transfers the video. As indicated, options are available to exert various levels of user control over the process.
To receive a video call, processing system 713 receives a video call request from the server system over Internet link 734 and through communication interface 712. Processing system 713 implements any user options and may notify the user through user interface 711 or video equipment 718. If the video call is accepted, processing system 713 transfers a video call acceptance to the server system through communication interface 712 and over Internet link 734. When the video call start message is received, processing system 713 implements the video call parameters and user options. For a bidirectional video call, processing system 713 directs the exchange of video between video equipment 718 and Internet link 734. Video equipment 718 displays the video. Video call termination may proceed as indicated above or a video call termination message may be received from the server system. If a video call termination message is received, then processing system 713 directs video equipment 718 to stop the generation and display of video.
It should be appreciated that the user may invoke video software 717 to dynamically control video calls. For example, video software 717 may allow the user to terminate video calls in one or both directions during the call. Video software 717 may allow the user to adjust user options during the call. Video software 717 may allow the user to initiate a previously rejected video call during the telephone call. Computer system 701, telephone 703, and interface device 704 can be configured to operate together for additional user control. In such a scenario, the user could transfer DTMF digits that are decoded by interface device 704 and transferred to computer system 701 to exert control. For example, incoming video calls could be accepted or rejected by transferring specific DTMF sequences from telephone 703. Video calls could be terminated by the user in a similar fashion.
Interface device 704 could be further equipped with a tone generator to alert the user to various conditions. For example, interface device 704 could transfer a special tone to telephone link 731 for the user to hear to indicate that a video call is available for the telephone call. The tone could be played in response to a video call request or start message from the server system.
Server System - Figures 15-17
Figure 15 illustrates server system 800 in an example of the invention. Server system 800 includes user interface 801, network interface 802, processing system 803, and storage system 804. Storage system 804 stores operating software 806 and video software 807. Network interface 802 is coupled to Internet connection 817.
Processing system 803 uses network interface 802 to communicate over the Internet with user systems.
Processing system 803 retrieves and executes operating software 806 and video software 807 from storage system 804. Software 806-807 could comprise an application program, firmware, or some other form of machine-readable processing instructions. Operating software 806 includes an operating system, networking software, and other utilities typically loaded onto an Internet server. When executed by processing system 803, video software 807 directs processing system 803 to control server system 800 in accord with the invention.
Figures 16-17 illustrate server system 800 operation in an example of the invention. Server system 800 maintains a database of users including user names and passwords, user telephone numbers and data addresses, and possibly user preferences. If server system 800 receives a log-in message, the user password is checked and if it is valid, the user database is modified to indicate that the user is ready to initiate and/or receive video calls. If server system 800 receives a log-off message, then the user password is checked and if it is valid, the user database is modified to indicate that the user is not ready to initiate and/or receive video calls. The user database may also be modified by querying the users and receiving responses indicating user video call specifications.
If server system 800 receives a video call request, the user database is checked using the called telephone number from the request to determine if the called party is ready to receive video calls. If not, a video call unavailable message is returned to the caller. If so, server system 800 sends a video call request to the called party. If a video call rejection is received from the called party, then server system 800 sends a video call rejection to the caller. If a video call acceptance is received from the called party, server system 800 resolves video call parameters and sends video call start messages to both the caller and called party including the video call parameters. Parameter resolution may entail determining if the caller will receive called party video. If so, this is indicated in the video start messages.
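The branching just described can be summarized in a short Python sketch. It is a simplified illustration under stated assumptions: the users table, the function names, the message dictionaries, and the example numbers and addresses are hypothetical, and the parameter resolution rule shown (the caller receives called party video only if the caller requested it and the called party agreed to send it) is one plausible rule rather than the one the system necessarily applies.

```python
from typing import Dict


# Hypothetical user database keyed by telephone number.
users: Dict[str, dict] = {
    "9135550101": {"address": "198.51.100.10", "ready": True},
    "9135550102": {"address": "198.51.100.11", "ready": True},
    "9135550103": {"address": "198.51.100.12", "ready": False},
}


def handle_video_call_request(called_number: str) -> dict:
    """Decide the server's first response to a video call request."""
    user = users.get(called_number)
    if user is None or not user["ready"]:
        return {"type": "video_call_unavailable", "to": "caller"}
    return {"type": "video_call_request", "to": user["address"]}


def handle_called_party_reply(accepted: bool, caller_wants_video: bool,
                              called_sends_video: bool) -> dict:
    """Turn the called party's answer into the message sent next."""
    if not accepted:
        return {"type": "video_call_rejection", "to": "caller"}
    # Parameter resolution (assumed rule): both sides must agree.
    return {
        "type": "video_call_start",
        "to": "both",
        "caller_receives_video": caller_wants_video and called_sends_video,
    }
```

A called number that is absent from the table is treated the same as a user who has logged off, which matches the "unavailable" outcome in the text.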
Server system 800 uses the called party telephone number to retrieve the called party data address. Server system 800 may use the caller telephone number to retrieve the caller data address. If server system 800 receives video from the caller, it addresses the caller video to the called party data address and transfers the caller video for delivery to the called party. If server system 800 receives video from the called party, it addresses the called party video to the caller data address and transfers the called party video for delivery to the caller. In some cases, it may be necessary for processing system 803 to interwork the video so it is compatible with both caller and called party. For example, the caller and called party may use different quality or encryption levels that are interworked by processing system 803.
If server system 800 receives a video call termination message from one user, it transfers a video call termination message to the other user. Server system 800 then modifies the database to return each user to their pre-call status. Server system 800 also generates billing information for the video call.
Server system 800 may be configured to download software to the user systems. The software could be the video software used to control the user systems as described above. The software could provide upgrades from older versions. The software could provide video processing, compression, and encryption. The software could provide system diagnostics and trouble-shooting to recommend optimal system software and settings.
It should be appreciated that the processing and control discussed above could be distributed in various ways between the user system and the server system. For example, the server system could maintain and implement user preferences and provide user prompts.
The handling of firewalls in the context of a real-time interconnection service will be described in greater detail with reference to Figures 18-20. As shown in Figure 18, computer 11 is connected to Internet 14 through a respective firewall 41. Computers 10 and 11 contain application programs 42 and 43 that are adapted to interact with central server 13 and then to exchange data messages (e.g., files, video frames, etc.) with other users and to display or otherwise utilize the exchanged data.
Within the total user group that registers with central server 13, there would typically be a mix of users with and without firewalls. The users with firewalls typically will not respond to network packets they receive that are not in reply to network packets that they initiated. Thus, even if central server 13 provides computer 10 with the global IP address of firewall 41, any packets it sends there will not actually reach computer 11. In this situation where the only firewall exists on the called user side, the present invention solves the problem by dynamically reversing the roles of the users' computers for establishing the network session between the two computers. Thus, the first packets sent directly point-to-point between the two users are initiated by the user with a firewall, regardless of which user is the calling user.
The method of the invention is shown in greater detail in Figure 19. After the central server receives an access request between a calling user and a called user, it checks to determine whether the called user has a firewall in step 50. If the called user does not have a firewall, then the central server sends the called user's global IP address to the calling user in step 51. In step 52, the calling user sends packets directly to the called user's IP address in order to establish a TCP/IP network session with the called user. Once the network session is established, the application programs can perform tasks of identification, accepting or rejecting a call, transferring information, displaying exchanged data, and terminating a call, for example.
If the called user has a firewall, then the central server checks in step 53 to determine whether the calling user has a firewall. If not, then the central server sends the calling user's global IP address to the called user in step 54. In step 55, the called user sends packets directly to the calling user's IP address in order to establish a TCP/IP network session between the two users. Although the roles of called and calling users are reversed in establishing the TCP/IP network session, the original roles are retained for other call aspects such as the called user's decision whether or not to accept a call. If it is determined in step 53 that the calling user also has a firewall, then the respective network sessions between the two users and the central server are used. Thus, in step 56, the central server relays the packets between the calling and the called users (e.g., as shown in Figure 4).
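The decision flow of Figure 19 (steps 50 through 56) can be summarized in a short sketch. The function name and the returned labels are illustrative only; the specification does not prescribe any particular encoding.

```python
def choose_connection(calling_has_firewall, called_has_firewall):
    """Pick which party sends the first direct packets, or fall back to relay."""
    if not called_has_firewall:
        # Steps 51-52: the calling user receives the called user's global IP
        # address and initiates the TCP/IP session.
        return "caller_initiates"
    if not calling_has_firewall:
        # Steps 54-55: roles are reversed; the called user (behind the
        # firewall) initiates the session toward the calling user.
        return "called_initiates"
    # Step 56: both users are behind firewalls, so the central server relays
    # packets over each user's existing session with the server.
    return "server_relay"
```

Only session establishment is reversed in the second branch; the called user's decision whether to accept the call is unaffected.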
Figure 20 shows the registration and firewall detection process in greater detail. In step 60, when a user's application program creates a registration message for transmission to the central server, it includes in the message the local IP address being used by the computer in its local network. In step 61, the central server receives this information identifying the local IP address. It also inspects the header information of received packets and determines the global IP address from which the message transfer was initiated. The central server compares the local IP address and the global IP address in step 62 to determine whether they match.
If the two addresses do not match, then a firewall is present and the central server stores "firewall present" data (such as a firewall flag) in step 63 as part of the user profile in its database. If the addresses match, then a firewall is not detected and the corresponding user profile is set to reflect the lack of a firewall in step 64. Even if the IP addresses match, it may be desirable to send a test message (e.g., using a different source address for the central server) in order to determine that the user will receive such a message. If the test message does not receive a reply, then the central server may instead indicate a firewall is present in step 63, for example.
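A minimal sketch of the comparison in steps 62 through 64 follows, using Python's standard `ipaddress` module. The function name and the optional `test_reply_received` argument (modeling the outcome of the optional test message) are assumptions made for illustration.

```python
import ipaddress


def firewall_present(local_ip, global_ip, test_reply_received=None):
    """Return True if the user profile should carry the firewall flag.

    local_ip:  address reported inside the registration message (step 60)
    global_ip: source address seen in the packet headers (step 61)
    test_reply_received: None if no test message was sent, otherwise whether
        the user replied to it
    """
    addresses_match = ipaddress.ip_address(local_ip) == ipaddress.ip_address(global_ip)
    if not addresses_match:
        return True   # step 63: addresses differ, so a firewall is present
    if test_reply_received is False:
        return True   # addresses match but the test message went unanswered
    return False      # step 64: no firewall detected
```

Parsing both strings into address objects before comparing avoids false mismatches caused by purely textual differences in how the same address is written.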
In step 65, the central server sends periodic messages to the registered user in order to keep the current session open between them and to update the status of the user. In particular, an address translating firewall will close out a session that is inactive for a predetermined time as short as a few minutes. The central server and/or the application program will exchange periodic messages to avoid the time-out, which would be a particular problem where a firewall is in place since the central server could not reestablish a session. Any TCP/IP connection session, even without any firewalls in place, may time out after a predetermined time. Thus, transmission of periodic messages can be performed after both steps 63 and 64. In the event that a firewall in place for any particular user goes undetected by the central server, a user may have an unanticipated failure to establish a network session with another user. Any such failure can be reported to the central server and then the next lower connection strategy of Figure 19 is tried (i.e., the called user establishing the session if the first attempt was by the calling user, or the central server relaying all the packets for the call).
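The keep-alive timing and the fallback to the next lower connection strategy might be sketched as below. The strategy labels, the `safety_factor` margin, and the function names are hypothetical; the specification states only that periodic messages are exchanged and that a failed attempt falls back to the next strategy of Figure 19.

```python
# Ordered from most direct to least direct, following Figure 19.
STRATEGIES = ("caller_initiates", "called_initiates", "server_relay")


def next_lower_strategy(failed):
    """Return the strategy to try after `failed`, or None if none remains."""
    position = STRATEGIES.index(failed)
    if position + 1 < len(STRATEGIES):
        return STRATEGIES[position + 1]
    return None


def keepalive_due(last_activity_s, now_s, idle_timeout_s, safety_factor=0.5):
    """Decide whether a periodic message should be sent now.

    A message is due once the session has been idle for a fraction
    (safety_factor, an assumed margin) of the firewall's idle time-out.
    """
    return (now_s - last_activity_s) >= idle_timeout_s * safety_factor
```

For example, with an assumed two-minute firewall time-out, a message would be sent after one minute of inactivity.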
The use of either of the connection methods of Figure 4 or Figure 5 is transparent to the users. Once either type of data call is established and the call clients are exchanging data messages over the internetwork, the sharing of computer resources is expanded beyond the functionality of the call clients as shown in Figure 21.
Computer 10 includes a network interface 40 and a call client 41 performing the functions already described. Computer 10 runs a server application 42 for hosting a shared resource 43 such as particular audio or video media, HTML pages, or a database, for example. In addition, computer 10 runs a client application 44 which is designed to access or otherwise interact with or display shared resource 43. A user interface 45 may, for example, include operating system software and input/output devices (e.g., monitor, mouse, and keyboard) by which a user interacts with (e.g., provides user commands to) call client 41, server application 42, and client application 44.
Similarly, computer 11 includes a network interface 50, a call client 51, a client application 52, and a user interface 53 for remotely accessing shared resource 43 via Internet 14.
Figure 22 shows the operation of computer 10 for serving the shared resource between both computers in greater detail. In establishing the data call (e.g., a video telephony call), call client 41 creates a network session 46 between itself (as referenced within computer 10 by the local IP address of computer 10 and the port address used by call client 41) and, depending upon the connection mode, either central server 13 or remote computer 11 (as referenced within computer 10 by a remote IP address and port address which were provided by central server 13). Using conventional network protocols, data is exchanged between computers 10 and 11. In a preferred embodiment in which call client 41 establishes a video telephony call, the one-way or two-way video data is passed between session 46 and video software 47. Video software 47 processes video from a video camera and forwards it to session 46. Video software 47 also processes remote video data received from session 46 and feeds it to a display interface within the overall user interface.
A user command is generated within the user interface to request the sharing of computer resources other than that within the functionality of call client 41 (e.g., a user mouse clicks a program launcher for the desired resource). Server application 42 and client application 44 are launched if not already active. One example of a resource shared in this manner is streaming of compressed, prerecorded video. Client application 44 uses the data or other shared resource in the manner desired by the user, and server application 42 serves the shared data or other resource simultaneously to the local user and one or more remote users. Thus, server application 42 creates a remote session 48 for exchanging network packets with the remote user (e.g., via central server 13) and a local session 49 for communicating with client application 44.
Local session 49 utilizes the local port numbers of the two applications for communicating the data or other resource between served resource 43 and client application 44. Remote session 48 obtains remote session address and port information from session 46 in call client 41. For example, when creating remote session 48, server application 42 may issue a request via the operating system/user interface to call client 41 for the IP address and port address for the remote call client in the remote computer. Call client 41 reports this session information to server application 42 which then establishes its remote session 48 in one of two ways. In a first method, a separate network session is created by sending an initiation message to remote computer 11. In the initiation message, server application 42 provides its distinct port address rather than the port address of call client 41. Thus, call client 41 and server application 42 can communicate with the remote user in parallel. In a second method, call client 41 either terminates or goes into hibernation and server application 42 takes over the existing network session. In other words, server application 42 assumes the port address used by call client 41 in the existing session and no new initiation message is sent.
Figure 23 shows remote computer 11 where the shared resource does not reside. In response to the request for sharing the resource, client application 52 is launched if not already running. A session 56 obtains remote IP address and port address information of computer 10 and creates or accepts a network session as described above. Data utilization software 57 exchanges data with the remote server application via session 56.
The overall method of the present invention is shown in Figure 24. In step 60, multiple users sign-on or register with the central server. A calling user launches their call client on their computer in step 61. Preferably, the calling user makes a telephone call to the called user, and the act of dialing the telephone number may send a signal to the computer for automatically launching the call client if it is not already running. Alternatively, no telephone call is necessary and the calling user may enter a telephone number or other identifying information of the called user into the call client. In step 62, the phone number or other identifying information is sent to the central server and a data call is established with the called user.
In step 63, either user initiates a request via their user interface for sharing of resources not accessible to the call clients. If the request is initiated by the user that is remote from the shared resources, then their call client forwards the request. The server application is launched on the computer where the shared resources reside in step 64. In step 65, both computers launch appropriate client applications for accessing the served data from the server application, such as a media player or a browser.
In step 66, one or both call clients report IP addresses and port addresses of the other computers to the server application and/or the client application(s). For example, the remote IP address used in the call client of the computer where the shared resource resides is reported to the server application. Also, the remote IP address used in the call client of the computer not containing the shared resource is reported to the client application running in that computer.
Based on the reported IP addresses and ports, the network session between the server application and the client application of the remote computer follows either one of two methods as shown in Figure 24. In step 67, a second session between the two computers is created by means of either the server application or the remote client application sending a session initiation message to the other using the existing IP address information but using a new port address for the origination application. A new port address for the other application can be identified in a response to the session initiation message. Alternatively, both call clients either terminate or hibernate in step 68. In step 69, the server application and the remote client application use the existing session's IP addresses and ports. In step 70, both client applications interact with the server application in order to access the shared resource simultaneously.
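The two alternatives (a second session on a new port in step 67, or taking over the existing session in steps 68 and 69) differ only in which local port the server application uses. A sketch under that reading follows; the `Endpoint` type, the method labels, and the function name are hypothetical.

```python
from typing import NamedTuple, Optional


class Endpoint(NamedTuple):
    ip: str
    port: int


def session_endpoints(call_local: Endpoint, call_remote: Endpoint,
                      method: str, new_local_port: Optional[int] = None):
    """Return (local, remote) endpoints for the resource-sharing session.

    call_local / call_remote are the endpoints reported by the call client
    for the existing data call.
    """
    if method == "new_session":
        # Step 67: same IP addresses, but a distinct port for the originating
        # application. The initiation message is first sent to the known
        # remote endpoint; the reply may name a new remote port.
        if new_local_port is None:
            raise ValueError("a new local port is required for a new session")
        return Endpoint(call_local.ip, new_local_port), call_remote
    if method == "takeover":
        # Steps 68-69: the call clients terminate or hibernate and the
        # existing addresses and ports are reused unchanged.
        return call_local, call_remote
    raise ValueError(f"unknown method: {method}")
```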
While the present invention has been described with respect to two users sharing a particular resource, the invention also contemplates that three or more users could simultaneously share a resource or participate in a video telephony call. In that case, the server application would multicast to each of the remote computers, for example.
The exchange of prerecorded motion video according to the present invention is shown generally in Figure 25. Prerecorded motion video as used herein refers primarily to any digitized video clips (with or without accompanying audio) of a private nature to be shared between private individuals, such as video taping of family events. Such clips may typically be recorded onto tape, DVD, or solid state memory using a portable digital video recorder (i.e., a digital camcorder). A camcorder 71 having the desired motion video clip is connected by user #1 to computer 10 for transferring the clip thereto. Preferably, a firewire interface (i.e., IEEE standard 1394) is used for the transfer of uncompressed video and audio. Computer 10 preferably compresses the desired clip to facilitate transfer over Internet 14 to computer 11. Computer 10 includes a display monitor 72 and computer 11 includes a display monitor 75. During a video telephony call, call windows 73 and 76 show live video received from the other endpoint of the video telephony data call. To share a compressed motion video clip, computer 10 launches a streaming video server and streams the clip simultaneously to media players launched in computers 10 and 11. Media player windows 74 and 77 on monitors 72 and 75, respectively, display the stream so that both users are seeing the same part of a clip at the same time. Due to bandwidth limitations of the network connection, it may be desirable to shut down the video telephony portion of the data call, at least during streaming of the prerecorded motion video clip. The voice telephone call may preferably remain open, thereby allowing the users to discuss the clip as it is viewed. During streaming, the controls of both media players are preferably active so that viewing of the clip is jointly controlled (i.e., either user can stop, play, pause, rewind, or fast forward the clip by mouse clicking on the corresponding control buttons in media player windows 74 and 77). Alternatively, the media player controls may be set up so that only one user (e.g., the sending user) can control the viewing of the clip.
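The joint and single-user control modes can be illustrated with a small shared playback state held by the streaming server. The class, its attribute names, and the action labels are assumptions for illustration; the specification does not define a data structure for playback state.

```python
class SharedPlayback:
    """Hypothetical playback state shared by all media players in a call."""

    def __init__(self, duration_s, controllers=None):
        self.duration_s = duration_s
        self.controllers = controllers  # None means any user may control
        self.position_s = 0.0
        self.playing = False

    def command(self, user, action, seconds=0.0):
        """Apply a control action; return False if the user may not control."""
        if self.controllers is not None and user not in self.controllers:
            return False
        if action == "play":
            self.playing = True
        elif action == "pause":
            self.playing = False
        elif action == "stop":
            self.playing = False
            self.position_s = 0.0
        elif action == "fast_forward":
            self.position_s = min(self.duration_s, self.position_s + seconds)
        elif action == "rewind":
            self.position_s = max(0.0, self.position_s - seconds)
        else:
            raise ValueError(f"unknown action: {action}")
        return True
```

Because a single state object drives the stream sent to every player, both users see the same part of the clip regardless of who issued the command.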
As shown in Figure 26, a motion video clip from a digital camcorder is received by a serial interface 80 (e.g., a firewire IEEE 1394 interface) and the clip is stored in storage 81 (e.g., a hard disk drive) in an uncompressed file format, such as DV. Alternatively, an analog video signal (e.g., from an analog camcorder) could be coupled to an analog-to-digital converter and stored in storage 81. Prior to sharing over the Internet, the stored clip is preferably compressed in a compression software block 82, and a compressed file is stored in storage 83. The compressed file format may be MPEG-2, for example.
Streaming video server application 84 accesses the compressed clip in storage 83 for streaming through a network session 85 to the remote computer via the Internet. It also locally streams the compressed clip to media player 86. The connections to session 85 and to media player 86 include both the streamed data and signaling, so that both media players can control playback of the clip.
Although compression block 82 is shown separate from video server 84, they may be integrated into a common software product.
Figure 27 shows a preferred method of the present invention. In step 100, a video source such as a digital camcorder is connected to a computer and a prerecorded motion video clip is transferred. If the clip is being uploaded for the purpose of sharing over the Internet, then the user may optionally compress the clip at that time in step 101. If a clip is to be edited prior to sharing, then compressing the file would be deferred.
When a user desires to share a clip with a remote user, a telephone call is placed between the users and a video telephony data call is automatically established in step 102. During the video telephony call, either user initiates a request for sharing the prerecorded motion video clip in step 103. The request preferably includes the identification of the specific compressed file to be streamed.
In step 104, the streaming video server application and the media player applications are launched. In step 105, IP addresses and ports are configured for the streaming session between the streaming video server and the media players. As previously described, either a new session can be initiated or the existing session for the video telephony data call can be used. If the existing session is used, then the video telephony call must be halted or terminated as shown in step 106. Even if a new session is initiated, it may be desirable to terminate the video telephony call at least temporarily to provide sufficient bandwidth for transmission of the streaming video clip. In step 107, a check is made to determine if the requested clip is already compressed. If it is not, then each user is notified in step 108 that compression is taking place (e.g., by displaying a text message within the media player windows). Then the clip is compressed and stored in step 109.
Using the compressed file, the prerecorded motion video clip is streamed to both media players from the beginning of the file in step 110. During streaming, the stream may be altered in step 111 in response to any commands from either of the media players. In step 112, the streaming applications (i.e., the streaming video server and the media players) are terminated or halted in response to reaching the end of the clip or in response to manual control action by either user, for example. With streaming terminated or halted, the video telephony call can be restored within the call window in step 113. The video telephony call may be terminated in step 114 manually by a control action on either computer or by hanging up of the telephones for the voice call, for example.
While the present invention has been described with respect to two users sharing a particular video, the invention also contemplates that three or more users could simultaneously view a clip or participate in a video telephony call. In that case, the streaming video server would multicast to each of the remote computers, for example.
In the situation wherein the called user is a commercial enterprise having a telephone system including a computerized, automated telephone response system (referred to herein as an auto-attendant) and the calling user desires to have their telephone call eventually connected to a live operator or other resources that are organized in a pool with a plurality of telephone numbers or extensions, the calling user does not have knowledge of the telephone number of the operator to which the telephone call may be transferred. The present invention coordinates the handling of the network data call(s) so that video image content may be provided to the calling user seamlessly during each phase of the call. In Figure 28, a provider system 25 includes a private branch exchange (PBX) 26, an auto-attendant 27, an operator station 28, and an operator station 29. PBX 26 receives telephone calls made to a primary telephone number of the provider and can couple any particular incoming call to any one of several telephones within provider system 25. A PBX system is not necessary if telephone calls can otherwise be transferred from auto-attendant 27 to an operator station.
The system shown in Figure 28 is useful in many types of commercial or noncommercial enterprises. For example, a vendor of goods or services, such as a travel agency, can have incoming calls go to auto-attendant 27 so that the calling user can indicate the type of service desired (e.g., to allow the call to be forwarded to an operator specializing in certain types of travel). During the auto-attendant phase of a telephone call, it may also be desirable to present information to the caller about the travel agency or about the service they are requesting. This information may include audio information from auto-attendant 27 and preferably includes video information transmitted as part of a network data call (e.g., video clips or a slideshow of a highlighted vacation package). When the telephone call is eventually transferred to an agent, it is desirable to continue the video portion of the call so that 1) the caller can see the agent as part of a video telephony session, and/or 2) the agent can initiate other video images such as additional clips, slideshows, or text.
Provider system 25 may also function as a helpdesk or consultant for various kinds of information assistance. For example, a manufacturer or seller of electronic products requiring in-home set-up by the purchaser can provide a toll free telephone number for set-up assistance. The caller can be connected to auto-attendant 27 so that a particular video clip covering a specific product or question can be shown to the caller via the data network call (e.g., how to configure a VCR). If the caller needs assistance from a live operator, then the telephone call can be transferred. To maintain the video capability, the live operator is given control over the existing data call or a new data call is initiated by the operator's computer. The operator can then assist the caller using a live video telephony feed or by showing additional prerecorded clips, still images, or text.
As shown in Figure 28, auto-attendant 27 includes an auto-attendant computer 30 and an automated telephone response system 31. Automated telephone response system 31 is coupled by a telephone line to PBX 26. Automated telephone response system 31 may be a separate hardware unit or may be comprised of hardware and software included in auto-attendant computer 30.
Provider system 25 may include many operator stations, but only two are shown to simplify the drawing. Operator station 28 includes a live operator or agent 32 using an operator telephone 33 and an operator computer 34. A video camera 35 is connected to computer 34 for sending images of operator 32 as part of a video telephony call. Operator telephone 33 is connected to PBX 26 so that a telephone call from a calling user can be transferred to operator 32. Operator computer 34 is coupled to Internet 14 so that a data call can be maintained with user computer 10. Auto-attendant computer 30 is networked with operator computer 34 (and the computers at other operator stations) in order to share caller information to facilitate transfer of an existing data call or creation of a new one. More specifically, user telephone 18 first establishes a telephone call to the provider system. A first data network session is established between user computer 10 and auto-attendant computer 30 using the database of central server 13. Auto-attendant computer 30 transmits predetermined or user selected video images to user computer 10. The telephone call is transferred or forwarded to operator telephone 33 (either automatically or in response to a signal from the calling user). A second data network session is established between operator computer 34 and user computer 10 and further video images are exchanged. The second data network session can be initiated by operator computer 34 based on user telephone number and/or IP address information shared by auto-attendant computer 30 or obtained by operator 32 from the calling user over the telephone call after it is transferred. Alternatively, a second data call is not necessary if the first data call can be handed off between computers (e.g., IP addresses of computers within provider system 25 can be dynamically reassigned).
Figure 29 shows an alternative embodiment employing an image server 36 for serving all video images to user computer 10 (thereby requiring only one data network session). Image server 36 is coupled via a local network with auto-attendant computer 30 and operator computer 34, and is also coupled to Internet 14 (it is the IP address of image server 36 that is stored in the database of central server 13). Image server 36 preferably includes a call client that establishes a data call with user computer 10 and that can be subsequently controlled by either auto-attendant computer 30 or operator computer 34. A database of video images (e.g., prerecorded clips to be transmitted in streaming format, graphic images, or text displays) may preferably be stored in image server 36 for transmission within any current network data call in response to requests received from computers 30 or 34 over the local network. Alternatively, computers 30 or 34 may also supply the video data to image server 36 (e.g., a live video feed originating from video camera 35).
Figure 30 shows auto-attendant computer 30 (specifically configured for the embodiment of Figure 28) in greater detail. Computer 30 includes a telephone interface 40 for interconnecting an automated telephone menu system client (i.e., software program) 41 to the PBX in order to receive telephone calls from calling users. Automated telephone menu system (ATMS) client 41 may be very similar to existing commercially available systems such as the PIVR Call Centre Solution from Pulse Software and Consulting of Markham, Ontario, Canada, for example.
ATMS client 41 is coupled to a call client 42 which effectuates the data network call via a network interface 43. ATMS client 41 presents selection menus to the caller using audio prompts transmitted via the telephone call. The menus may include choices for browsing to and then receiving particular video clips and choices for transferring to a live operator, for example. ATMS client 41 is responsive to return audio signals from the user (either DTMF tones or spoken commands) constituting their selection signals. Thus, telephone interface 40 and/or ATMS client 41 preferably include a DTMF tone detector and/or a voice recognition system. In an alternative embodiment, menu prompts from the ATMS client and return selection signals from the requestor can be signaled via the data call using conventional computer interface methods.
As user selections are made within ATMS client 41, a video ID signal is provided to call client 42 to identify content in a video image database 44 contextually appropriate for the current location in the menu. Continuing with the travel agency example, if a calling user chooses to learn about Hawaiian vacations then ATMS client 41 sends an ID signal corresponding to current Hawaiian travel packages to call client 42. The identified video clip is then transmitted over the Internet to the user's computer. A separate server client 45 may optionally be launched in parallel with call client 42 for purposes of streaming the video to the user.
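One way to derive a video ID from the caller's current menu location is a lookup keyed by the path of selections made so far, as sketched below. The table contents, the clip identifiers, and the fallback to the nearest enclosing menu level are assumptions for illustration; the specification says only that the ID corresponds to the current location in the menu.

```python
def video_id_for(menu_path, table):
    """Return the video ID for the deepest menu level that has one.

    menu_path: sequence of selections made so far, e.g. ["vacations", "hawaii"]
    table:     mapping from path tuples to video IDs; the empty tuple, if
               present, holds a default clip for the top-level menu
    """
    path = tuple(menu_path)
    while True:
        if path in table:
            return table[path]
        if not path:
            return None       # no clip registered anywhere along the path
        path = path[:-1]      # fall back to the enclosing menu level
```

For the travel agency example, a hypothetical table might map `("vacations", "hawaii")` to an identifier for the current Hawaiian travel packages clip.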
An overall method of the present invention is shown in greater detail in Figure 31, wherein actions relating to the telephone call are on the left-hand side of the Figure and actions relating to the network data call are on the right-hand side. In step 50, the calling user dials the telephone number of the provider system. The provider's telephone number may have been obtained from print or television advertisements, from product documentation (e.g., a user's manual), or from a telephone book (e.g., yellow pages), for example. The dialed telephone number may be a direct line to the auto-attendant or a PBX may be included in the provider system. If there is a PBX, then the PBX automatically forwards the incoming telephone call to the auto-attendant. The auto-attendant receives the call and plays a greeting message in step 51. Various menu choices or prompts are played (e.g., audibly produced) by the auto-attendant in step 52. In step 53, the calling user indicates menu selections by transmitting selection input signals, such as DTMF tones or spoken commands.
As the telephone call progresses, a data call is also initiated. In a preferred embodiment, the dialed telephone number is captured by the user's computer and a data call initiation message is sent to the central server in step 60. In step 61, the central server looks up the provider's telephone number and retrieves the IP address of the auto-attendant (or of the image server in the alternative embodiment). The data call is then established between the user computer and the auto-attendant.
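The lookup performed by the central server in step 61 amounts to resolving a telephone number to a registered IP address. A minimal sketch follows; the directory structure, the digit-only normalization, and the example values are assumptions and are not described in the specification.

```python
def lookup_data_address(directory, dialed_number):
    """Return the IP address registered for a telephone number, or None.

    directory:     mapping from digit-only telephone numbers to IP addresses
    dialed_number: number as captured from the user's dialing, possibly
                   containing punctuation or spaces
    """
    # Assumed normalization: keep digits only so that formatting differences
    # in the captured number do not prevent a match.
    normalized = "".join(ch for ch in dialed_number if ch.isdigit())
    return directory.get(normalized)
```

In the alternative embodiment, the directory entry for the provider's number would hold the address of the image server rather than that of the auto-attendant.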
As selections are identified in step 53, a first set of corresponding video images are transmitted in step 63 by the provider to the user's computer under the control of the auto-attendant. As shown by the dashed line, menu prompts in step 52 and user responses in step 53 are repeated as the user navigates through the menu system. The calling user may also eventually decide in step 54 to request being connected to a live operator. Alternatively, a provider system could be structured such that all incoming telephone calls are forwarded to live operators after the caller has viewed a predetermined video clip, for example. In step 55, the user's telephone call is placed into a queue for the next live operator available to take the call (or the next available specialist in a certain topic if one was requested). The auto-attendant may continue to transmit video images (either predetermined images or some selected by the user) while waiting for an operator. In step 56, the operator becomes available and the telephone call is transferred. Preferably, the telephone number and/or IP address of the calling user are handed off from the auto-attendant to the operator's computer when the telephone call is transferred. This information may be obtained from the call client that established the initial data call, for example.
In a preferred embodiment, a new data call between the operator's computer and the user's computer is established in step 65. The data call can be initiated based upon the user's telephone number being forwarded to the central server. If the IP address of the user's computer is known, then the operator's computer could instead initiate a data call directly to that IP address. However, if a firewall is present, then the data call would still require sending at least some packets via the central server. In the alternative embodiment wherein an image server handles the data call during both the auto-attendant phase and the live operator phase, a new data call is not needed. Instead, the operator's computer obtains control of the selection of video images sent to the user. The operator computer preferably identifies the proper data call that is open between the image server and the user computer by means of the telephone number or IP address information handed off by the auto-attendant.
The operator speaks with the calling user and handles any requests in step 57. Such requests may include requests for information assistance (e.g., product or service helpdesk) or inquiries for purchasing goods or services. When appropriate, the operator may control the transmission of a second set of video images to the user's computer in step 66. Upon completion of the transaction, the telephone call is terminated in step 58 and the data call is terminated in step 67.
The adaptation of a network data call to other uses in e-commerce is shown in Figure 32. The central database on central server 13 may be partitioned into a user database and a provider database, if desired. The user or requestor of goods or services uses computer 10 and telephone 18 as previously described. The provider of the goods or services uses a provider system 21 including a provider computer 22 connected to Internet 14 and an automated telephone response system 23. Response system 23 may be a stand-alone device or may be comprised of software and hardware interfaces implemented within computer 22.
Although not shown in this example, a live human operator could also interact with the user (via the telephone call or the data call) and could perform many of the functions of response system 23.
The provider has on-demand goods or services 24 which are delivered to the user/requestor as a result of interaction with system 21. Any type of goods or service can be provided using the present invention, such as mail-order goods, information services, multimedia entertainment services, or the like. These may be provided for payment or for free.
Figure 33 shows provider system 21 in greater detail. Provider computer 22 includes a telephone interface 30 for interconnecting an automated telephone menu system client (i.e., software program) 31 to the public switched telephone network in order to receive telephone calls from requestors. Automated telephone menu system (ATMS) client 31 may be very similar to existing commercially available systems such as the PIVR Call Centre Solution from Pulse Software and Consulting of Markham, Ontario, Canada, for example.
ATMS client 31 is coupled to a call client 32 which effectuates the data network call via a network interface 33. ATMS client 31 presents selection menus to a requestor using audio prompts transmitted via the telephone call. ATMS client 31 is responsive to return audio signals from the user (either DTMF tones or spoken commands) constituting selection signals by which the user 1) browses the menus, and 2) indicates a selection of the goods or services. Thus, telephone interface 30 and/or ATMS client 31 preferably include a DTMF tone detector and/or a voice recognition system. In an alternative embodiment, menu prompts from the ATMS client and return selection signals from the requestor can be signaled via the data call using conventional computer interface methods.
As ATMS client 31 navigates through its menu system, a video ID signal is provided to call client 32 to identify content in a video image database 34 contextually appropriate for the current location in the menu. For example, where the present invention is used for acquiring on-demand video services (e.g., pay per view), the menu may be comprised of video programs available and the contextual video content to be shown may be comprised of a "trailer" or preview of the video program. A separate server client 35 may optionally be launched in parallel with call client 32 for purposes of streaming the video to the requestor.
Once a requestor completes their selection of goods or services, the item selection(s) are sent to an order processing client 36. Preferably, the requestor's billing/shipping address and credit card information are stored by and retrieved from the central server. Alternatively, the requestor can be prompted to provide these through the ATMS client (or a live operator). Order processing client 36 may verify that any requested goods are in stock, electronically obtain credit card approval, and perform other accounting functions, for example. Order information is then provided to an order fulfillment system 37 which actually retrieves the goods or services 38 and delivers them to the requestor.
By way of one example, the requested service may be to view an on-demand video/audio program via the requestor's Internet connection. Thus, a content server 40 containing the program may be triggered by order fulfillment system 37 to transmit the program to the requestor's computer. The IP address of the requestor's computer is obtained from call client 32 through order processing client 36, for example. The program may be transmitted using the same data call or a separate network session can be established. In yet another example, a selected video/audio program may instead be provided over a non-Internet connection, such as a cable television connection (e.g., cable pay-per-view).
A method of the present invention is shown in greater detail in Figure 34. In step 41, both the provider's computer and the user/requestor's computer are registered with the central server. At such time that the requestor decides to acquire a good or service, the requestor initiates a telephone call to the provider in step 42. The provider's telephone number may have been obtained from print or television advertisements or from a telephone book (e.g., yellow pages), for example.
In a first preferred embodiment, the contact to the central server to set up the data call is done with the requestor as the calling party. Thus, a call client is launched in the requestor's computer and the dialed telephone number of the provider is captured and sent to the central server by the requestor's computer in step 43. Alternatively, the provider may need to be the calling party for purposes of the network data call. This can be achieved by having the central server command the call clients accordingly, or, as shown in step 44, the requestor's phone number may be captured by the provider's computer and sent to the central server by the provider to initiate the data call.
In step 45, the data network call is established so that network packets are exchanged between the call clients of the requestor and provider computers. Preferably, a welcome message or other initial still or motion video image is automatically sent from the provider to the requestor in step 46 immediately after the data call is established. In step 47, the provider system sends menu prompts to the requestor via the telephone call and/or the data call. In step 48, the requestor sends menu selection signals to navigate through the menu options of the ordering system and to view contextual video synchronized with the particular locations within the menu. For example, a first menu prompt might say "press 1 for drama movies, 2 for children's movies, and 3 for comedy." A still image may be sent to the requestor within the data call having a graphic to reinforce the choices. After pressing a selection, a second menu may present choices for specific movie titles (e.g., speaking the titles over the telephone call and showing the titles in a graphic over the data call). After selecting a movie title (e.g., pressing a digit on the requestor's telephone or clicking a button in the call client), a trailer for the selected movie may appear on the requestor's computer, allowing the requestor to decide whether or not to order the movie for pay-per-view. In playing the trailer or other video, the present invention can conserve bandwidth over the computer network by playing the audio portion of the trailer over the telephone call, for example.
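The menu navigation of steps 47 and 48 can be illustrated with a minimal sketch. The menu tree, the node names, the video ID strings, and the function navigate are all hypothetical and chosen only for illustration; the sketch merely shows how each menu location might carry a video ID that is handed to the call client as the requestor presses digits.

```python
# Hypothetical menu tree: each node has an audio prompt, a video ID for the
# contextual image or clip, and a mapping from keypad digits to child nodes.
MENU = {
    "root": {
        "prompt": "Press 1 for drama movies, 2 for children's movies, and 3 for comedy.",
        "video_id": "genre-graphic",
        "choices": {"1": "drama", "2": "children", "3": "comedy"},
    },
    "drama": {"prompt": "Drama titles.", "video_id": "drama-titles", "choices": {}},
    "children": {"prompt": "Children's titles.", "video_id": "children-titles", "choices": {}},
    "comedy": {"prompt": "Comedy titles.", "video_id": "comedy-titles", "choices": {}},
}


def navigate(menu, digits, start="root"):
    """Walk the menu for a sequence of DTMF digits.

    Returns the final node and the list of video IDs that would be handed
    to the call client along the way. An invalid digit leaves the current
    menu position unchanged (an assumption of this sketch).
    """
    node = start
    video_ids = [menu[node]["video_id"]]
    for digit in digits:
        child = menu[node]["choices"].get(digit)
        if child is None:
            continue  # unrecognized selection: stay at the same menu
        node = child
        video_ids.append(menu[node]["video_id"])
    return node, video_ids
```

For instance, a requestor who presses 1 at the first prompt would move from the genre menu to the drama menu, and the call client would be given the video ID for the drama titles graphic.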
As shown in Figure 34, steps 47 and 48 may be repeated until the requestor reaches a final selection. The requestor sends item selection signals (e.g., DTMF tones, a spoken selection, or clicking an order button in the call client) in step 50 to obtain their selection of goods or services. In step 51, the provider accepts the order and obtains billing information, preferably from the central server so that the requestor need not be asked for it. In step 52, the provider delivers the selected item(s) and collects payment, if any. During steps 51 and 52, the telephone call and the data call are terminated whenever they are no longer needed.
In order to avoid confusion at the provider computer, it may be desirable to defer the initiation of a data call until a telephone call is actually answered by the provider. In other words, the call client of the requestor's computer waits until it receives a signal indicating that the telephone call was accepted. Thus, the provider computer can be sure that an incoming data call matches the telephone call that it is processing. The acceptance signal may be manually generated or may be detected electronically using an add-on device that also captures the dialed telephone number and transfers it to the requestor's computer.
In the event that the provider system is capable of receiving multiple telephone calls and data calls simultaneously, then steps must be taken to ensure that each phone call is properly associated with the corresponding data call with the same requestor. As shown in Figure 35, provider system 21 receives a phone call #1 and a phone call #2 substantially simultaneously. A data call A and a data call B are also received at about the same time. Due to latency differences in the computer network transmission paths, the data calls may not be received in the same order as the telephone calls. Thus, the telephone and data calls must include identifying information so that they can be properly matched together. As shown in Figure 35, the matching can be accomplished using IP address-to-telephone number associations from the central server (either by separate query to the central server or as information embedded in the data calls themselves). In a preferred embodiment, each telephone call must include information identifying the telephone number from which the call is originating. Typically, this can be provided by an automatic number identification (ANI) signal (also known as Caller ID). However, this signal may be blocked (i.e., turned off). If blocked, then it may be necessary for the ATMS client to prompt a requestor to manually transfer their telephone number via the telephone call.
One embodiment of a preferred method for matching telephone calls to data calls is shown in Figure 36. In step 60, a telephone call is received. A check is made in step 61 to determine whether ANI data (e.g., the incoming telephone number transmitted by the PSTN between the first and second rings) was captured. If not, then the provider system prompts the requestor to input their telephone number (by either touch-tone input or by speaking) and the number is received from the requestor in step 62.
With the incoming telephone number obtained, a check is made in step 63 to determine whether there is an existing data call having identifying information that matches the incoming telephone number. The data calls may preferably include packets containing the requestor's telephone number. Such packets can be included in a protocol used between the requestor and provider computers, for example. If a matching number is found, then the matching telephone and data calls are associated by the provider system in step 64. If a match is not yet found, then a check is made in step 65 to determine whether there are pending data calls not yet associated with a telephone call and for which the corresponding telephone number data is unknown. If such a data call is found, then a query message is sent to the central server in step 66 for the telephone number to be associated with the requestor's IP address for those data calls. When the telephone numbers, if any, are received in response to the query, the provider system determines in step 67 whether any match the telephone number of the incoming telephone call. If yes, then the matching telephone and data calls are associated by the provider system in step 68. If there is still no match, then a return is made to step 63 to recheck any new data calls. If step 65 determines that there are no data calls without telephone number data available, then a check is made in step 69 to determine whether the incoming telephone call is still active. If not, then the method ends at step 70 for that telephone call. Otherwise, a return is made to step 63 to continue to monitor incoming data calls.
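One pass through steps 63 to 68 of Figure 36 might be sketched as follows. This is only an illustration under stated assumptions: the DataCall record, the function match_phone_call, and the query_central_server callable (standing in for the query message of step 66) are hypothetical names, and the retry loop of step 69 is left to the caller.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class DataCall:
    ip_address: str
    phone_number: Optional[str] = None  # None if the packets carried no number
    matched: bool = False               # True once associated with a phone call


def match_phone_call(
    caller_number: str,
    data_calls: List[DataCall],
    query_central_server: Callable[[str], Optional[str]],
) -> Optional[DataCall]:
    """Try once to associate an incoming telephone call with a data call."""
    # Steps 63-64: look for a pending data call that already carries the number.
    for call in data_calls:
        if not call.matched and call.phone_number == caller_number:
            call.matched = True
            return call
    # Steps 65-68: for pending data calls with no number, ask the central
    # server which telephone number is registered for the IP address.
    for call in data_calls:
        if not call.matched and call.phone_number is None:
            call.phone_number = query_central_server(call.ip_address)
            if call.phone_number == caller_number:
                call.matched = True
                return call
    # No match yet; the caller rechecks while the telephone call stays active.
    return None
```

A provider system would call such a function again each time a new data call arrives, for as long as the telephone call remains active.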
Depending upon the capabilities of the provider system or whether live operators are used, incoming telephone calls may be placed into a queue until resources are available to handle a call. In one embodiment of the invention, it may not be necessary to associate the active telephone call with a particular data call. Instead, the video images for the active telephone call may be sent to all data calls (even those in fact associated with a telephone call that is waiting in the queue). The requestor with the active telephone call will thus see the desired images. Since the images will also be seen by requestors in the queue, no personal or confidential information would be included in the images. Furthermore, selection signals from the requestor could only be transmitted via the telephone call (i.e., the data call becomes a one-way broadcast to the requestors).
Figure 37 shows integration of video telephony with chat and instant messaging applications, wherein central server 13 runs a video telephony (VT) server application 50, a chat server application 51, and an instant messaging (IM) server application 52. Server applications 50-52 are programmed to interact in the manner described below, and a supervisory application may be provided. Server 13 also includes a VT user database 53, a chat user database 54, and an IM user database 55. Other equivalent structures will be apparent to those skilled in the art, such as a single shared database. Furthermore, server 13 could comprise several host computers acting in coordination.
Databases 53-55 include appropriate details for supporting each respective communication service, such as user ID's and passwords, billing information, IP addresses, telephone numbers, user preferences, and on-line status.
Computers 10 and 11 communicate with central server 13 via a public data network, such as the Internet. Telephones 18 and 19 are connected to a public switched telephone network, as is PBX 44 which is controlled by central server 13. Computer 10 includes a user interface with a display monitor and video camera 45 and computer 11 has a similar user interface 46.
The user interface for communicating with each server application is shown in greater detail in Figure 38. A display monitor 60 is preferably located in close proximity with a video camera 61. Each client application running on the user's computer runs in a respective program window, including a VT call window 62, a chat window 63, and an IM window 64. The user interface preferably also includes a pointing device such as a computer mouse for manipulating a mouse pointer 65 to activate various selection items within the program windows.
Chat program window 63 is controlled by a chat client running within the user's computer and communicating with the chat server application running in the central server. Relayed content within a current chat room is shown within a relayed content area 66. A plurality of thumbnail graphics 67 (e.g., pictorial or text representations) for each of the other users active within the current chat room are displayed in chat program window 63. The representations may be provided by the chat server application as part of the active user identifying information that it sends concerning each chat room participant to all the other participants.
According to the present invention, a thumbnail 67 of another user may be selected in order to initiate a separate (e.g., private) communication with the corresponding user. The separate communication may be by a video telephony call or within another application, such as another (restricted) chat room.
IM program window 64 is controlled by an IM client running within the user's computer and communicating with the IM server application running in the central server. A message window 70 shows the messages exchanged between the user and their contacts. A contact list window 71 shows a listing of all the other preselected users on the user's contact or buddy list. The list shows the on-line status of each contact, allowing the user to identify which contacts are currently available. According to the present invention, each listed contact that is on-line may be selected in order to initiate a video telephony call with the contact. Thus, mouse pointer 65 can be moved over the user selection item (e.g., the contact's name) and by mouse clicking, the selection item is activated and a sequence of events is initiated to complete a VT call with the contact.
When a VT call is established, the remote user may be seen within a video window 72 in VT program window 62.
One preferred method of the present invention is shown in Figure 39. At step 80, the central server maintains the databases of registered users for each of the communication services (e.g., VT, chat, and IM). When respective users are on-line, their computers log in to the services for which they are eligible in step 81. In step 82, the central server sends to an active user the identifying information (e.g., user ID, name, and/or picture) of all other active users or a subgroup of users within the service. In step 83, each user's computer displays the identifying information of the other active users within the program window for the corresponding communication service.
In step 84, user #1 selects user #2 for making a direct contact as a supplemental communication to the chat room or to the instant messaging window. In one embodiment, the supplemental communication may always be comprised of a VT call which may be initiated immediately after step 84 using the steps shown in Figure 10. Alternatively, the supplemental communication may be initiated from any of several communication channels including the VT application, the chat application, or the IM application. Thus, the central server may look up the user registrations for the called user and/or the calling user in step 85. The valid services within which a direct communication channel can be opened are reported back from the central server to user #1 in step 86. The user chooses the method of direct contact in step 87 and the central server opens the communication channel in step 88.
Figure 40 shows a preferred method for establishing a video telephony call.
In step 90, a video telephony call request signal is sent from user #1 to the central server (e.g., in response to user #1's choice between a VT call or a private chat room in step 87). In step 91, the central server forwards the call request to user #2. The user interface on user #2's computer generates a VT call prompt wherein user #2 can accept or decline the VT call. The prompt may take the form of a pop-up message with an accept button and a decline button that are chosen by clicking of the computer mouse. A check is made in step 93 to determine whether user #2 accepts the VT call. If not, then a decline message is sent in step 94. Otherwise, both computers launch a respective VT call client application (if not already running) and begin to exchange live video signals from their respective video cameras in step 95. In order to initiate the voice portion of the VT call, the central server commands the private branch exchange (PBX) to complete telephone calls to each user in step 96. In step 97, the telephone calls are bridged together and the users can speak to one another.
Alternatively, a telephone call can be established directly between user #1 and user #2 in response to a message from the central server to the call client of user #1, wherein the message includes the telephone number of user #2 (which may be encrypted). If user #1's telephone can be controlled by the call client (e.g., via the computer modem or an add-on telephone interface) then dialing of the telephone call can be automated. Alternatively, the call client could display the telephone number for manual dialing by user #1.
Once a video telephony call is established, still images can be exchanged as shown in Figure 41. Computer 10 includes a network interface 40 and a call client 41 performing the functions already described. A video camera 45 provides live video images to call client 41 which formats video frames for transmission as the video portion of the video telephony call. Computer 10 also runs an image viewer subclient application 42 for loading, displaying and transmitting graphical still images (e.g., compressed digital photographs) from a still image data memory 43. The image data is preferably stored in compressed graphic files, such as jpg files. Still images stored in memory 43 may be obtained from an image source 44 (e.g., a digital camera or an optical scanner) connected to computer 10 or could be downloaded from other computer sources (e.g., from the Internet or from floppy discs). A user interface 46 may, for example, include operating system software and input/output devices (e.g., monitor, mouse, and keyboard) by which a user interacts with (e.g., provides user commands to) call client 41 and image viewer subclient 42.
Viewer subclient 42 operates under control of call client 41. Call client 41 preferably includes a command for launching viewer subclient 42 such as a mouse button or a pulldown menu for indicating that the user wants to display and transmit still images in conjunction with an ongoing video telephony call. When it is running, viewer subclient 42 is linked to call client 41. The image data to be transmitted from viewer subclient 42 is preferably handled using the same IP address and port as are assigned to call client 41. Due to the coordinated interaction of call client 41 and viewer subclient 42, no separate network session needs to be created in order to exchange still images or subclient control commands with another user.
Computer 11 includes a network interface 50, a call client 51, an image viewer subclient 52, a video camera 53, and a user interface 53. Computer 11 may also have local still image data accessible by viewer subclient 52, but need not have any in order to receive and display the transmitted still image data from computer 10.
Figure 42 shows the operation of call client 41 and image viewer subclient 42 in greater detail. In establishing the data call (e.g., a video telephony call), call client 41 creates a network session 47 between itself (as referenced within computer 10 by the local IP address of computer 10 and the port address used by call client 41) and, depending upon the connection mode, either central server 13 or remote computer 11 (as referenced within computer 10 by a remote IP address and port address which were provided by central server 13). Using conventional network protocols, data is exchanged between computers 10 and 11. One-way or two-way video data is passed between session 47 and video software 48. Video software 48 processes video from the video camera and forwards it to session 47. Video software 48 also processes remote video data received from session 47 and feeds it to a display interface within the overall user interface.
Prior to viewer subclient 42 becoming active, all network traffic through session 47 is routed to/from video software 48. Once viewer subclient 42 is active and transmitting still images, a switch 49 is activated in call client 41 for properly directing the received network packets to the correct application. When subclient 42 is the one sending still images to a remote user, the image data itself is coupled directly to session 47, bypassing switch 49. Even while sending, subclient 42 may receive network traffic from the remote viewer subclient since either subclient can control the still image display (e.g., by generating pause, rewind, and other picture browsing commands). These received commands also pass through switch 49. The switching is preferably based upon a flag or other identifying data encoded at the appropriate protocol level within the packets generated by either viewer subclient.
An overall method of the present invention is shown in Figure 43. In step 60, multiple users sign on or register with the central server. A calling user launches their call client on their computer in step 61. Preferably, the calling user makes a telephone call to the called user, and the act of dialing the telephone number may send a signal to the computer for automatically launching the call client if it is not already running. Alternatively, no telephone call is necessary and the calling user may enter a telephone number or other identifying information of the called user into the call client. In step 62, the phone number or other identifying information is sent to the central server and a data call is established with the called user.
In step 63, a first user (i.e., either the calling or called user) initiates their image viewer subclient. The first user selects one or more images that they would like to transmit to the other user. For example, a series of photographs may be arranged into an ordered array or slideshow. Alternatively, such a slideshow can be defined in advance of the video telephony call and then selected in step 63. Any parameters for displaying and transmitting (i.e., playing back) the array or slideshow are selected by the user, such as display time for automatic advancing of the pictures.
In step 64, the first user generates a command in the user interface for initiating the actual transmission of the selected still image data to the other user (e.g., by selecting a send or start button in the viewer subclient). Consequently, the still image data is transmitted to the other user within the existing network session of the video telephony call. In step 65, the receiving user's call client recognizes the reception of still image data packets and launches its own image viewer subclient and loads and displays the still images as they are received. Thus, the image viewer subclients show the same still image or picture simultaneously, allowing the two users to view the still image and to still see and hear each other at the same time.
During the still image presentation, the call client at the receiving end switches incoming network packets between the live video software and the image viewer subclient in response to identifying data in the packets. Both users watch and control the picture array or slideshow in step 67. At the end of the presentation of still images, the users may terminate their image viewer subclients in step 68.
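The packet switching performed by the receiving call client might be sketched as follows. The specification does not define the packet layout or flag values, so the Packet record, the VIDEO and STILL constants, and the CallClientSwitch class are all assumptions made only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical flag values; the actual identifying data is not specified.
VIDEO = 0  # live video frame data for the video software
STILL = 1  # still image data or viewer commands for the viewer subclient


@dataclass
class Packet:
    kind: int       # flag identifying the destination application
    payload: bytes  # application data carried by the packet


class CallClientSwitch:
    """Routes packets received on one network session to the right consumer."""

    def __init__(
        self,
        video_handler: Callable[[bytes], None],
        viewer_handler: Callable[[bytes], None],
    ) -> None:
        self._handlers: Dict[int, Callable[[bytes], None]] = {
            VIDEO: video_handler,
            STILL: viewer_handler,
        }

    def dispatch(self, packet: Packet) -> None:
        handler = self._handlers.get(packet.kind)
        if handler is None:
            raise ValueError(f"unknown packet kind: {packet.kind}")
        handler(packet.payload)
```

Because both kinds of packets share one session (the same IP address and port), the flag is the only thing distinguishing live video from still image traffic in this sketch.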
The user experience of simultaneous video telephony and sharing of still images is shown in Figure 44. Still image data as used herein refers primarily to any digitized still images or graphics in a computer file format compatible with the image viewer subclients. Such images may typically be generated by a digital still camera 70, for example. Images are downloaded from camera 70 into computer files stored in computer 10 via a universal serial bus (USB) interface, for example. Computer 10 preferably compresses the image data to facilitate transfer over Internet 14 to computer 11.
Computer 10 includes a display monitor 72 and computer 11 includes a display monitor 75. During a video telephony call, call windows 73 and 76 show live video received from the other endpoint of the video telephony data call. To share still images, computers 10 and 11 launch viewer windows 74 and 77 on monitors 72 and 75, respectively, so that both users are seeing the same still images at the same time. Due to the low bandwidth required to send still image data, the video telephony call can be easily maintained at the same time, thereby allowing the users to see each other and to discuss the still images as they are viewed. During the still image presentation, several viewer controls are preferably active so that viewing of the images is jointly controlled (e.g., either user can navigate to a next or previous image or access a menu to modify the automatic display parameters by mouse clicking on the corresponding control buttons in viewer windows 74 and 77). Alternatively, the viewer controls may be set up so that only one user (e.g., the sending user) can control the viewing of the images.
While the present invention has been described with respect to two users sharing still images, the invention also contemplates that three or more users could simultaneously view images or participate in a video telephony call. In that case, the sending subclient would multicast to each of the remote computers, for example.
Regarding the video telephony call itself, the synchronization of video and voice channels is handled as follows. Referring to Figure 45, user equipment for a calling party in a video telephony system includes a calling telephone 10 and a calling computer 11. Computer 11 is connected to a video camera 12 for generating video signals to be transmitted in the video portion of a video telephony call. A display monitor 13 is connected to computer 11 for displaying video signals received in the video portion of the call.
Calling telephone 10 connects via a public switched telephone network (PSTN) 14 to a called telephone 15 of a called party. Calling computer 11 connects via the Internet 16 to a called computer 17 of the called party. Computer 17 is connected to a video camera 18 and a display monitor 19.
Voice signals from calling telephone 10 to called telephone 15 traverse PSTN 14 with a voice latency L1. Video signals from calling computer 11 to called computer 17 traverse Internet 16 with a video latency L2. Based on their known relative performance, video latency L2 is always greater than or equal to voice latency L1 (and is almost always greater). Furthermore, while the voice latency stays relatively fixed during a call (it depends mainly on distance of the call), the video latency can vary significantly during a call as network load rises and falls. Thus, reconstructed video images at the receiving end can become unsynchronized with the corresponding voice signals by varying degrees.
The present invention solves the synchronization problem by delaying transmission of voice signals into the PSTN by an amount that causes the voice signals to arrive at the receiving end more nearly simultaneously with the corresponding video signals. As shown in Figure 46, an undelayed voice signal has a voice latency time L1. The video signals arrive at the receiving end with a video latency time L2. The difference in latencies equals L2 minus L1. By delaying the voice signals by a delay equal to the difference L2-L1, the delayed voice signals arrive at a time comprised of the delay plus voice latency time L1. Thus, the voice signals arrive in synchrony with the video signals after a total time period equal to L2 (since the delay plus the voice latency equals L2-L1+L1).
It is known that latency of a voice telephone call above a certain threshold can lead to degradation of perceived call quality. For example, voice latencies of greater than about 100 milliseconds should normally be avoided. Therefore, the present invention preferably prevents adding delays for the voice transmission that would result in a total voice latency greater than the threshold. A predetermined maximum allowed voice latency, Max, is shown in Figure 47, which may have a value of about 100 milliseconds. A video latency L2 has a value in Figure 47 which is greater than maximum voice delay Max. If the delayed voice signal were to use a delay equal to the difference L2-L1, then the total effective voice latency would exceed Max. In order to keep the total voice latency below predetermined maximum voice latency Max, the added delay must not be allowed to exceed the difference between maximum voice latency Max and voice latency L1.
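The delay rule described above can be expressed as a short calculation. The function name added_voice_delay and its parameter names are hypothetical; the default of 100 milliseconds for Max follows the example value given above, and the default of zero for L1 is only a simplifying assumption of this sketch.

```python
def added_voice_delay(
    video_latency_ms: float,
    voice_latency_ms: float = 0.0,
    max_voice_latency_ms: float = 100.0,
) -> float:
    """Return the delay to add to the voice signal, in milliseconds.

    The desired delay is L2 - L1, which makes voice and video arrive
    together. It is capped at Max - L1 so that the total voice latency
    (added delay plus L1) never exceeds Max, and floored at zero because
    the voice signal cannot be advanced.
    """
    desired = video_latency_ms - voice_latency_ms      # L2 - L1
    ceiling = max_voice_latency_ms - voice_latency_ms  # Max - L1
    return max(0.0, min(desired, ceiling))
```

With L1 of 10 ms, a video latency of 60 ms gives an added delay of 50 ms and exact synchronization, while a video latency of 250 ms gives the capped delay of 90 ms, leaving the video 150 ms behind the voice.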
Figure 48 summarizes the actions taken to maintain synchronization in response to a current value of the video latency L2. When the video latency L2 is in a first range 20 between about L1 and about Max-L1, then a voice delay is added which is equal to video latency time L2 minus voice latency time L1. When L2 is in region 21 (i.e., above range 20), then the added voice delay is equal to about Max-L1. When the video latency time L2 becomes greater than predetermined maximum voice delay Max, then the preferred embodiment can no longer maintain exact synchronization. In order to minimize how far the video portion of the video telephony call falls behind, one embodiment of the invention takes the further step of reducing the information content of the video signals when the video latency time L2 is in a second range 22 (Figure 48) in order to expedite reception of succeeding video frames. The reduced information content can be obtained by dropping video frames from the transmitted signal, applying a greater compression ratio to the data, and/or reducing the resolution or screen size of the video frames. By sending less video data to the recipient, it is possible to favorably impact the network latency due to the overall traffic reduction in a particular path through the network.
Since voice latency is generally much smaller than video latency and since voice latency is substantially fixed, it is sufficient for purposes of the present invention to estimate its value as a constant. For instance, the estimate can be based on distance between the endpoints of the telephone call. In one embodiment, the estimate can be based on the area codes for the calling and called parties. Since voice latency will often be extremely short, it is also possible to estimate voice latency time as zero.
Video latency is preferably determined in real time. In one embodiment, time clocks in the calling computer and the called computer are synchronized. Then at least some of the network packets sent from one computer to the other are timestamped as they are being sent into the network. Once the packets are received, the time within a timestamp is compared with the time on the synchronized computer clock to determine the latency. It is known in the art to synchronize clocks in networked computers using the Network Time Protocol (NTP), for example.
In another embodiment for determining video network latency, a round trip time of a sequential message between the two computers can be measured and then divided in half. Thus, a "ping" message is sent from a first computer to a second computer. The second computer receives the first ping message after a network latency period L2 and immediately responds to the first computer with a second ping message. If not responding immediately, then the second computer may include in the second ping message an identification of the length of the delay between receiving the
first ping message and sending the second ping message back to the first computer. When the first computer receives the second ping message, it determines video latency L2 in response to the time elapsed between sending the first ping message and receiving the second ping message. Specifically, L2 may be equal to about one-half of the elapsed time (not including any identified delay in the second computer).
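The round trip calculation may be sketched as follows. The function and parameter names are assumptions made for this illustration; `remote_hold_ms` stands for the delay, if any, that the second computer reports between receiving the first ping message and sending the second.

```python
def one_way_latency_ms(sent_ms: float, received_ms: float,
                       remote_hold_ms: float = 0.0) -> float:
    """Estimate video latency L2 as half the round trip time.

    sent_ms        -- local clock reading when the first ping was sent
    received_ms    -- local clock reading when the second ping arrived
    remote_hold_ms -- delay reported by the second computer before replying
    """
    elapsed = received_ms - sent_ms
    # Exclude the time the reply was held at the far end, then halve.
    return max(0.0, (elapsed - remote_hold_ms) / 2.0)
```

Because both readings come from the first computer's own clock, this approach does not require the two computers' clocks to be synchronized; it does assume that latency is roughly the same in each direction.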
Specific hardware for implementing the present invention is shown in Figure 49. Computer 11 includes a call client 25 which performs such functions as identifying the called telephone number, forwarding a call request to a central server which completes a video telephony call, and conducting a video packet exchange during a video telephony data call. Thus, call client 25 handles the network transmission of live video images from a video camera coupled to the computer and the reception and displaying of live video images sent from the other user. Computer 11 includes a network interface controller (NIC) 26 for coupling computer 11 to the Internet via a broadband DSL connection or a similar connection. A voice unit 30 may be integrated within a conventional telephone or may be an add-on device for connecting to a conventional telephone for performing specialized functions according to the present invention. A DTMF decoder 31 is coupled to an outgoing signal line of a telephone which carries voice signals from a microphone (not shown) and dialing tones from a tone generator (not shown). DTMF decoder 31 detects a dialed telephone number and converts it into an electronic (e.g., digital) representation of the dialed telephone number. This representation is coupled to call client 25 for forwarding on to the central server to initiate a video call in the computer network, as described in the related applications mentioned above.
Voice unit 30 further includes a buffer 32 having a variable length for selecting from a plurality of signal delays for signals passing through buffer 32. Call client 25 preferably performs the determination of a delay as shown and described in connection with Figures 46-48. Once a delay is determined, it is provided from call client 25 to buffer 32 in the form of a control signal for implementing the corresponding delay. After delaying voice signals by a commanded time delay, the
delayed voice signals are coupled to a phone line 33 for transmission to the other party. To keep the delayed voice signals from coupling to the speaker of the local telephone, a duplex coil (not shown) may be used to couple the voice signals to phone line 33 as is known in the art.
To determine video network latency, computer 11 includes a network time protocol application for communicating with a similar application on the other party's remote computer and, if necessary, a time server connected to the Internet. Alternatively, the use of ping messages can be performed by call client 25 to determine the video latency (e.g., periodically throughout a video call).
Figure 50 shows buffer 32 in greater detail. The voice signals are input into a series of unit delay blocks 35. A multiplex switch 36 is set by the control signal to a desired position in order to obtain a predetermined delay.
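A software analogue of the series of unit delay blocks and the multiplex switch may be sketched as follows. The class `TappedDelayLine` and its methods are hypothetical names chosen for this illustration, and the sketch assumes sampled voice data with each unit delay block holding one sample; the hardware of Figure 50 is not limited to such an arrangement.

```python
from collections import deque


class TappedDelayLine:
    """A chain of unit delay blocks with a selectable output tap."""

    def __init__(self, num_blocks: int):
        # Index 0 holds the most recent past sample; all blocks start silent.
        self._blocks = deque([0] * num_blocks, maxlen=num_blocks)
        self._tap = 0  # 0 means no added delay

    def select_tap(self, tap: int) -> None:
        """Set the switch position; tap k delays the signal by k samples."""
        if not 0 <= tap <= len(self._blocks):
            raise ValueError("tap is outside the range of the delay line")
        self._tap = tap

    def push(self, sample):
        """Feed one input sample and return the sample at the selected tap."""
        out = sample if self._tap == 0 else self._blocks[self._tap - 1]
        # Shift the chain: the newest sample enters, the oldest falls off.
        self._blocks.appendleft(sample)
        return out
```

With three blocks and the tap set to position 2, feeding samples 1, 2, 3, 4 yields 0, 0, 1, 2, i.e., the input delayed by two sample periods.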
Figure 51 illustrates the progression of video data signals in the present invention. Video frames are captured one at a time in a block 40. A captured frame is compressed in a block 41. Video latency times are detected in block 42. Provided that normal video latency times are experienced (i.e., less than range 22 in Figure 48), a default compression is used (e.g., a standard resolution). The default compression may preferably involve creating base frames and difference frames wherein a base frame includes full detail of a frame and difference frames include only portions of a frame that change from frame to frame. After a predetermined number of difference frames have been transmitted, another base frame is sent.
Compressed video frame signals are formatted for transmission as network packets in a block 43. When video latency determined in block 42 reaches the second range, then the video information from block 41 that is formatted for transmission in block 43 is reduced. When dropping frames, preferably the difference frames are dropped first. If video latency fails to improve sufficiently, then base frames are also dropped. Information content may also be reduced by reducing resolution, as previously described.
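One possible frame dropping policy may be sketched as follows. The function name `thin_frames`, the representation of frames as (kind, data) pairs, and the numeric reduction levels are all assumptions made for this illustration. In particular, the sketch assumes that the second stage of reduction drops every other base frame in addition to all difference frames; other proportions could equally be used.

```python
def thin_frames(frames, level):
    """Return the frames to transmit at a given reduction level.

    frames -- list of (kind, data) pairs, kind being "base" or "diff"
    level  -- 0: send everything; 1: drop difference frames;
              2: additionally drop every other base frame (assumed policy)
    """
    if level <= 0:
        return list(frames)
    kept = []
    base_seen = 0
    for kind, data in frames:
        if kind == "diff":
            continue  # difference frames are dropped first
        base_seen += 1
        if level == 1 or base_seen % 2 == 1:
            kept.append((kind, data))
    return kept
```

Difference frames are dropped before base frames because a base frame carries full detail and can be displayed on its own, whereas a difference frame is meaningful only relative to earlier frames.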
Packets formatted in block 43 are sent to a network block 44 (e.g., the Internet), which may be subject to network congestion that affects the video latency time L2. Finally, the video signals are received and processed by the recipient in block 45. By adaptively adjusting the amount of video data being sent in response to the detected latency, latencies great enough to prevent voice synchronization can be avoided.
A method of the present invention is shown in Figure 52. A data call is initiated in step 50 and the calling and called computers may begin sending a live video signal. In step 51, at least one of the computers determines video latency time and voice latency time (i.e., each party is responsible for synchronizing their voice and video signals). In step 52, a check is made to determine whether video latency L2 is in the first range or higher. If it is not, then voice delay is turned off and a return is made to step 51 to re-determine the video latency. If it is, then buffering of voice signals at a corresponding delay (up to the maximum delay) is turned on in step 53.
Next, a check is made in step 54 to determine whether video latency L2 is in the second range. The second range may include only values above the first range or may include an overlap at the upper end of the first range. If not in the second range, then any previous reduction in video content is turned off and a return is made to step 51 to re-determine the video latency. If it is in the second range, then the information content of the video signal is reduced in step 55 as appropriate.
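One pass through steps 52 to 55 may be sketched as follows. The names `SyncDecision` and `sync_step` are hypothetical. The sketch assumes that the first range begins where L2 exceeds L1 and that, unless stated otherwise, the second range begins at Max; the parameter `second_range_start_ms` allows the second range to overlap the upper end of the first range.

```python
from typing import NamedTuple, Optional


class SyncDecision(NamedTuple):
    voice_delay_ms: float  # delay to command in the voice buffer (0 = off)
    reduce_video: bool     # whether to reduce video information content


def sync_step(l1_ms: float, l2_ms: float, max_ms: float = 100.0,
              second_range_start_ms: Optional[float] = None) -> SyncDecision:
    """Decide voice delay and video reduction for the current latencies."""
    if second_range_start_ms is None:
        second_range_start_ms = max_ms  # assumed start of the second range
    # Step 52: below the first range, voice delay is turned off.
    if l2_ms <= l1_ms:
        return SyncDecision(0.0, False)
    # Step 53: buffer voice by L2 - L1, up to the maximum delay Max - L1.
    delay = max(0.0, min(l2_ms - l1_ms, max_ms - l1_ms))
    # Steps 54 and 55: reduce video content only in the second range.
    return SyncDecision(delay, l2_ms > second_range_start_ms)
```

In use, such a function would be called each time the video latency is re-determined in step 51, with the returned delay supplied to the voice buffer and the returned flag supplied to the video compression stage.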