WO2002033558A1 - Multimedia sensor network - Google Patents

Multimedia sensor network

Info

Publication number
WO2002033558A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
network
sensor
sensors
Application number
PCT/US2001/031799
Other languages
English (en)
Inventor
Andrew A. Kostrzewski
Sookwang Ro
Tomasz P. Jannson
Chih-Jung Judy Chen
Original Assignee
Physical Optics Corporation
Application filed by Physical Optics Corporation
Priority to AU2002213121A1
Publication of WO2002033558A1


Classifications

    • G - PHYSICS
    • G08 - SIGNALLING
    • G08B - SIGNALLING OR CALLING SYSTEMS; ORDER TELEGRAPHS; ALARM SYSTEMS
    • G08B25/00 - Alarm systems in which the location of the alarm condition is signalled to a central station, e.g. fire or police telegraphic systems
    • G08B25/007 - Details of data content structure of message packets; data protocols
    • G - PHYSICS
    • G01 - MEASURING; TESTING
    • G01V - GEOPHYSICS; GRAVITATIONAL MEASUREMENTS; DETECTING MASSES OR OBJECTS; TAGS
    • G01V1/00 - Seismology; Seismic or acoustic prospecting or detecting
    • G01V1/22 - Transmitting seismic signals to recording or processing apparatus

Definitions

  • the present invention relates to an intelligent sensor network configured to process, compress and transmit highly synchronized data in real-time to a network.
  • the invention relates to transmitting homogenized TCP/IP-packetized data streams of sensor data from a sensor network to a user network (e.g., the Internet).
  • the purpose of human vision is not to detect the simple presence or absence of light, but rather to detect and identify objects.
  • Objects are defined principally by their contours, and the visual cortex registers the difference in brightness between adjacent visual images rather than faithfully recording the actual physical difference in light intensity.
  • the data is transmitted over a communication network to the end-user system, and finally it is displayed on the end-user system at a constant data rate.
  • each of these components (i.e., storage, network and end-system) must therefore sustain this constant data rate.
  • current multimedia requirements are 15-fps (frames per second) animation, 30-fps NTSC (National Television Systems Committee) television-quality video and 60-fps HDTV (High Definition Television) quality video.
  • current networks are unable to process, compress and transmit all different types of sensor data into a single, homogeneous, TCP/IP-packetized stream.
  • current networks do not support the interpretation of high-resolution spatial data (still imagery) with compressed temporal data (video).
  • the integration of the spatial and temporal data allows a user to view a high-resolution still image based on a highly compressed temporal data stream without the need for separate JPEG or similar still- image encoding and compression.
  • NETWORKS AND PROTOCOLS Networks are designed to share resources and can be as simple as two computers connected together or as complex as over 20 million computers connected together (the Internet). Other devices such as printers, disk drives, terminal servers and communication servers are also connected to networks. Certain network rules (protocols) govern all operations within a network and how devices outside of the network must interact.
  • the Internet includes client computers that access information and servers that sort and distribute information.
  • a program becomes a client when it sends a request to a server and waits for a response.
  • the client typically runs on a user's computer and facilitates access to information provided by the server (e.g., Netscape and Internet Explorer as WWW clients, xftp and Fetch as FTP clients).
  • Information is typically transmitted over the network in packets of data and each packet includes source and destination addresses, a packet length and time-to- live information.
  • the packets travel along links that are guided to the destination address by routers that optimize the travel path. After packets reach the destination address, they are reassembled.
  • Circuit-switched networks include a dedicated connection between two computers to guarantee capacity (e.g., the telephone system). Therefore, once a circuit is established, no other network activity will decrease the capacity of the circuit. Unfortunately, these networks are typically costly.
  • packet-switched networks (e.g., the Internet) divide the network traffic into packets to allow concurrent connections among computers. Multiple computers share the network, thereby decreasing the number of connections required. As network activity increases, however, the network resources available between the communicating computers decrease.
  • Star, hierarchical and loop configurations are examples of point-to-point network topologies wherein each computer communicates directly with its neighbor and relies on its neighbor to relay data around the network.
  • Bus and ring configurations are examples of broadcast network topologies wherein a message placed on the bus or ring contains the name of the intended receiving computer, all computers listen constantly, and if a listener identifies its own name, the message is captured. In this configuration, only one node can broadcast at a time.
  • Local Area Networks link several computers in a building and a LAN can itself be linked to other LANs.
  • LANs provide high-speed connections, but cannot span large distances.
  • typical LANs are usually bus-based (Ethernet) networks that span a small building and operate between 4 Mbps and 2 Gbps.
  • Wide Area Networks are not constrained by the physical distance between the endpoints in that they are intended to be used over large distances and operate at speeds between 9.6 Kbps and 45 Mbps.
  • networking protocols establish everything from low-level communication between computers up to how application programs communicate.
  • the ISO Open Systems Interconnection (OSI) model includes the following seven layers, ordered from the physical medium up:
  • the Physical Layer is the interface between the medium and the device that transmits bits (ones and zeros) and defines how the data is transmitted over the network, what control signals are used and the mechanical properties of the network.
  • the Data Link Layer provides low-level error detection and correction (e.g., if a packet is corrupted, this layer is responsible for retransmitting the packet).
  • the Network Layer is responsible for routing packets of data across the network (e.g., a large e-mail file is divided up into packets and each packet is addressed and sent out in this layer).
  • the Transport Layer is an intermediate layer that higher layers use to communicate to the network layer. This layer hides the complexities of low-level network communication from the higher layers.
  • the Session Layer is the user's (transparent) interface into the network. This layer manages the "current" connection (or session) to the network and maintains the communication flow.
  • the Presentation Layer ensures computers speak the same language by converting text to ASCII or EBCDIC form and encoding/decoding binary data for transport.
  • the Application Layer includes communication between the user's programs (e.g., a file transfer or e-mail program).
  • There are network protocols that support specific layers (such as XTP and TP5), and other protocols that are in the form of entire protocol suites (such as the Heidelberg Transport System and Tenet). All of these protocols combine different transmission modes with the possibility of resource reservation.
  • Stream Protocol-II (SP-II) and the Resource ReSerVation Protocol (RSVP) are both transport-level protocols that support guaranteed performance over one-way multicast (point-to-multipoint) communications.
  • Next Generation IP is designed to include format provisions for expanded addressing, multicast routing, labeling flows by QoS specifications and security and privacy. Unfortunately, few if any of these provisions have yet been developed or implemented.
  • current networks support a conventional teleconferencing system (160x120, 6-bit gray scale, 10 fps, 11.05 kHz audio, total bandwidth of 1 megabit/second).
  • any new network distributing real-time multimedia over the Internet will have to deliver highly-compressed data with a corresponding low transmission rate, that maintains frame-to-frame synchronization.
  • any good transmission system needs to be able to deal with the finite and possibly dynamic delay between the sender and the receiver.
  • television signal engineers need only worry about a small, fixed delay due to the distance between the transmitter and the television set, since the television signal propagates as an electromagnetic wave at the speed of light.
  • Multimedia streaming was developed to overcome, or at least temporarily stave off the effects of varying transmission delays.
  • Multimedia streaming buffers an amount of the data before presenting it to the user.
  • the rate of data output is independent of the input rate, as long as there is enough data in the cache to source the required amount of output. If the input rate begins to lag behind the output rate, eventually there will not be enough data in the cache to support the high output rate, and the data stream runs dry.
  • streaming applications may buffer anywhere from a few seconds of data to a few minutes of data.
  • Designers of streaming applications assume that the reservoir of data is never emptied by the output rate exceeding the input rate, and that the reservoir only dries up when the end of the multimedia stream is reached.
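  • A minimal circular-buffer sketch of this producer/consumer reservoir is shown below; the buffer size and the byte-oriented interface are illustrative assumptions, not a description of any particular streaming product.

```c
#include <stddef.h>

#define CACHE_SIZE 4096          /* illustrative reservoir size in bytes   */

/* Simple circular buffer: the network fills it, playback drains it.      */
struct stream_cache {
    unsigned char data[CACHE_SIZE];
    size_t head;                 /* next position to write (input side)    */
    size_t tail;                 /* next position to read (output side)    */
    size_t fill;                 /* bytes currently buffered               */
};

/* Input side: store incoming data, return how much actually fit.         */
size_t cache_write(struct stream_cache *c, const unsigned char *src, size_t n)
{
    size_t written = 0;
    while (written < n && c->fill < CACHE_SIZE) {
        c->data[c->head] = src[written++];
        c->head = (c->head + 1) % CACHE_SIZE;
        c->fill++;
    }
    return written;
}

/* Output side: if the input lags, `fill` reaches 0 and playback runs dry. */
size_t cache_read(struct stream_cache *c, unsigned char *dst, size_t n)
{
    size_t nread = 0;
    while (nread < n && c->fill > 0) {
        dst[nread++] = c->data[c->tail];
        c->tail = (c->tail + 1) % CACHE_SIZE;
        c->fill--;
    }
    return nread;
}
```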
  • the TCP/IP stack breaks the data down into a series of packets or frames. Buffers are created to hold the frames and the data contained in them. Due to the inherent asynchronous nature of the network, the buffers must be allocated and released dynamically from a pre-allocated series of buffer pools or from the global memory pool. The buffers have a variable lifetime depending on various conditions, including network throughput, whether the included frame is part of a stream transmission, or whether the packet is a datagram.
  • a common TCP/IP buffering mechanism (“mbufs”) allows for variable length frames. Small amounts of data are placed directly in the data area of the mbuf, but larger amounts of data require mbufs to be extended with clusters.
  • a cluster is a data structure that holds heterogeneous data for extending the data-carrying capability of an mbuf. The mbuf contains header information and the data area that is extendable with the cluster if the frame does not fit into the data area. While these buffering mechanisms help reduce delay and increase synchronization, these protocols still do not support reliable transmission of real-time, multimedia data over a network.
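  • As a rough illustration of the mbuf-and-cluster idea described above, the following C sketch shows a simplified buffer structure; the field names and sizes are illustrative assumptions, not the actual BSD mbuf layout.

```c
#include <stdlib.h>
#include <string.h>

#define MBUF_DATA_SIZE 108      /* illustrative small in-line data area    */
#define CLUSTER_SIZE   2048     /* illustrative external cluster size      */

/* Simplified mbuf: small frames fit in `data`, larger ones use a cluster. */
struct mbuf {
    struct mbuf *next;               /* next mbuf in the same frame chain  */
    size_t       len;                /* bytes of payload held here         */
    char        *payload;            /* points to `data` or to a cluster   */
    char         data[MBUF_DATA_SIZE];
    char        *cluster;            /* external storage, NULL if unused   */
};

/* Copy a frame into an mbuf, attaching a cluster when it does not fit.    */
static struct mbuf *mbuf_from_frame(const char *frame, size_t len)
{
    struct mbuf *m = calloc(1, sizeof *m);
    if (!m)
        return NULL;
    if (len <= MBUF_DATA_SIZE) {
        m->payload = m->data;                 /* small: store in-line      */
    } else {
        m->cluster = malloc(CLUSTER_SIZE);    /* large: extend with cluster */
        if (!m->cluster || len > CLUSTER_SIZE) {
            free(m->cluster);
            free(m);
            return NULL;
        }
        m->payload = m->cluster;
    }
    memcpy(m->payload, frame, len);
    m->len = len;
    return m;
}
```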
  • a socket is a way to speak to other programs using standard Unix file descriptors.
  • Any I/O process in Unix programs reads from or writes to a file descriptor.
  • a file descriptor is simply an integer associated with an open file. This open file, however, can be a network connection, a FIFO, a pipe, a terminal, an "on-a-disk file", or just about anything else. Therefore, file descriptors are the means by which current programs communicate with other programs over the Internet.
  • a connection is established by calling a socket system routine that returns a socket descriptor, and a communication is established through it using a set of specialized send and receive socket calls. The send and receive calls offer better control over the data transmission than calling the normal read and write calls directly.
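  • A minimal sketch of this socket workflow is shown below; the server address 127.0.0.1 and port 7000 are hypothetical, and error handling is abbreviated.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <sys/socket.h>

int main(void)
{
    /* Obtain a socket descriptor, analogous to opening a file descriptor. */
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) { perror("socket"); return 1; }

    struct sockaddr_in srv = {0};
    srv.sin_family = AF_INET;
    srv.sin_port   = htons(7000);              /* hypothetical port        */
    inet_pton(AF_INET, "127.0.0.1", &srv.sin_addr);

    if (connect(fd, (struct sockaddr *)&srv, sizeof srv) < 0) {
        perror("connect"); return 1;
    }

    /* send()/recv() give more control (e.g., flags) than write()/read().  */
    const char *msg = "hello";
    send(fd, msg, strlen(msg), 0);

    char buf[256];
    ssize_t n = recv(fd, buf, sizeof buf - 1, 0);
    if (n > 0) { buf[n] = '\0'; printf("reply: %s\n", buf); }

    close(fd);
    return 0;
}
```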
  • there are many kinds of sockets, including DARPA Internet addresses (Internet Sockets), path names on a local node (Unix Sockets), and CCITT X.25 addresses (X.25 Sockets).
  • two general types of internet sockets include “Datagram Sockets” and “Stream Sockets” (SOCK_STREAM and SOCK_DGRAM, respectively).
  • Datagram sockets are often referred to as “connectionless sockets” and stream sockets are two-way connected communication streams (e.g., if two items are output into the socket in the order "1, 2", they should arrive in the order "1, 2" at the opposite end).
  • stream sockets achieve this reliable, ordered delivery using the Transmission Control Protocol (TCP), and they use the Internet Protocol (IP) for routing.
  • Datagram sockets also use IP for routing, but they use the User Datagram Protocol (UDP: RFC-768) instead of TCP. Characteristic of datagram sockets is the fact that the packets may or may not arrive and they may arrive out of order. If, however, the packet arrives, the data within the packet will not contain errors.
  • the datagram sockets are connectionless because an open connection does not have to be maintained. In particular, a packet is assembled, it is labeled with an IP header with the destination information, and the packet is transmitted without the need for any connection. These sockets are generally used for packet-by-packet transfers of information including tftp and bootp.
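  • The connectionless case looks like the sketch below: a datagram is assembled, addressed, and handed to sendto() without any prior connection; the destination address 192.0.2.10 and port 6900 are placeholders.

```c
#include <stdio.h>
#include <unistd.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <sys/socket.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);   /* datagram (UDP) socket    */
    if (fd < 0) { perror("socket"); return 1; }

    struct sockaddr_in dst = {0};
    dst.sin_family = AF_INET;
    dst.sin_port   = htons(6900);              /* illustrative port        */
    inet_pton(AF_INET, "192.0.2.10", &dst.sin_addr);  /* illustrative host */

    /* No connect(): each packet carries its own destination address.      */
    const char payload[] = "sensor reading 42";
    if (sendto(fd, payload, sizeof payload, 0,
               (struct sockaddr *)&dst, sizeof dst) < 0)
        perror("sendto");

    close(fd);
    return 0;
}
```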
  • each of these applications implements its own protocol on top of UDP.
  • the tftp protocol specifies that, for each packet sent, the recipient sends back a packet acknowledging receipt (an "ACK" packet). If the sender does not receive the ACK packet within a predetermined time period, the sender retransmits the packet until the ACK packet is received.
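  • A sketch of that acknowledge-and-retransmit behaviour over UDP follows; the 2-second timeout, five-attempt limit, and single-byte ACK are illustrative assumptions rather than the actual tftp packet format.

```c
#include <stddef.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/time.h>

/* Send one datagram and retransmit until a (single-byte) ACK arrives.    */
static int send_with_ack(int fd, const struct sockaddr_in *dst,
                         const char *pkt, size_t len)
{
    struct timeval tv = { 2, 0 };                       /* 2 s timeout     */
    setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv);

    for (int attempt = 0; attempt < 5; attempt++) {
        sendto(fd, pkt, len, 0, (const struct sockaddr *)dst, sizeof *dst);

        char ack;
        if (recvfrom(fd, &ack, 1, 0, NULL, NULL) == 1)
            return 0;                                   /* ACK received    */
        /* Timeout: fall through and retransmit the same packet.           */
    }
    return -1;                                          /* give up         */
}
```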
  • a packet is born and then it is wrapped ("encapsulated") in a header (and maybe footer) by the first protocol (e.g., the tftp protocol), then the entire packet including the header (e.g., a tftp header) is encapsulated again by the next protocol (e.g., UDP), and then again by the next protocol (e.g., IP), then again by the final protocol on the hardware (physical) layer (e.g., Ethernet). Thereafter, when another computer receives the packet, the hardware strips the Ethernet header, the kernel strips the IP and UDP headers, the tftp program strips the tftp header, and then the receiving computer finally has the data.
  • the current Internet environment uses a system of network functionality called a Layered Network Model based on the 7-layer OSI model discussed above.
  • the same socket program is used without regard to how the data is physically transmitted (e.g., serial, thin Ethernet, AUI) because programs on the lower levels deal with these issues. Therefore, the actual network hardware and topology are transparent to the socket programmer.
  • the "stack" corresponding to the Layered Network Model includes (1) Application/Presentation/Session; (2) Transport; (3) Network; (4) Data Link; and (5) Physical.
  • FIGURE 2 illustrates the current implementation of TCP encapsulation including a sequence number field and an acknowledgement number field to maintain the order of the individual datagrams.
  • the state bits (Finish, Synch, Reset, Push, Acknowledge and Urgent) are used by the protocols on both ends to keep track of the state of the connection and manage the connection.
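  • The sequence/acknowledgement numbers and state bits mentioned above can be pictured with the simplified header below; the layout is illustrative only and ignores the byte-order and bit-packing details a real implementation must handle.

```c
#include <stdint.h>

/* Simplified TCP-style header (illustrative layout only). */
struct tcp_like_header {
    uint16_t src_port;
    uint16_t dst_port;
    uint32_t seq_number;     /* position of this data in the byte stream   */
    uint32_t ack_number;     /* next byte expected from the other side     */
    uint16_t flags;          /* FIN, SYN, RST, PSH, ACK, URG state bits    */
    uint16_t window;         /* receiver's advertised buffer space         */
    uint16_t checksum;
    uint16_t urgent_ptr;
};

/* State bits used by both ends to track and manage the connection. */
enum { TCP_FIN = 0x01, TCP_SYN = 0x02, TCP_RST = 0x04,
       TCP_PSH = 0x08, TCP_ACK = 0x10, TCP_URG = 0x20 };
```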
  • the actual data transmission is buffered and asynchronous. Unfortunately, this buffering and asynchronous transmission has resulted in little progress towards delivering real-time, highly-synchronous, streaming multimedia over a network.
  • a system would have to implement timer management associated with the retransmissions to compensate for lost packets and an intelligent, dynamic buffering manager to buffer packets until they are retransmitted or discarded.
  • the data link layer isolates the layers above from the physical and electrical transmission details. The majority of the implementations of the data link layer for TCP/IP do not implement any mechanism for reliability.
  • the upper part of the data link layer handles the framing details and the interface with the upper layers.
  • the lower part is often referred to as a device driver for the Network Interface Cards (NIC), including device management, interrupt handling and DMA control.
  • FIGURE 3 illustrates the data link layer encapsulation that includes the upper part and the lower part. After prepending the LLC headers in front of the data, the lower half of the link layer picks up the data and sets up the DMA and hardware for frame transmission.
  • the network layer encompasses the Internet domain knowledge, contains the routing protocols, and understands the Internet addressing scheme.
  • IP is the part of the network layer in the stack for handling the routing of packets across network interfaces. Domain naming and address management are also included in this layer. IP further includes a mechanism for fragmentation and reassembly of packets that exceed the data link layer's maximum transmission unit (MTU).
  • MTU is the maximum packet size that can be transmitted across a physical layer.
  • FIGURE 4 illustrates the IP layer encapsulation.
  • the IP layer prepends its headers on the front of the data it receives from the transport protocols, establishes the route for the packet according to routing tables, and inserts the appropriate address in the IP header.
  • the IP layer also calculates the checksum and sets the time-to-live (TTL) field.
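  • The checksum the IP layer computes over its header is the standard 16-bit ones'-complement sum; a straightforward version is sketched below.

```c
#include <stdint.h>
#include <stddef.h>

/* Standard Internet checksum: ones'-complement sum of 16-bit words. */
uint16_t ip_checksum(const void *header, size_t len)
{
    const uint16_t *words = header;
    uint32_t sum = 0;

    while (len > 1) {                 /* sum successive 16-bit words        */
        sum += *words++;
        len -= 2;
    }
    if (len == 1)                     /* odd trailing byte, zero-padded     */
        sum += *(const uint8_t *)words;

    while (sum >> 16)                 /* fold carries back into 16 bits     */
        sum = (sum & 0xFFFF) + (sum >> 16);

    return (uint16_t)~sum;            /* ones' complement of the sum        */
}
```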
  • the transport layer implements sequenced packet delivery known as connection-oriented transfer by incorporating retrying and sequencing necessary to correct for lost information at the lower layers.
  • UDP, in contrast, provides only connectionless transmission.
  • IP does not make any assumptions about the reliability in the underlying data link and physical layers. If a mechanism for reliable transfer is implemented, it is implemented above the network layer.
  • Associated with the transport layer are (1) sockets; and (2) the Application Programming Interface (API).
  • the API defines a standard way of using network features by implementing various end-to-end protocols and interfacing these protocols with application programs. Unlike the interaction between the protocols that is buried deep in the operating system kernel, transport protocols are directly available to application programs through the API.
  • the session layer was originally intended to keep track of a logged-in user talking to a central time-sharing system (e.g., Telnet).
  • TCP/IP only incorporates protocols through the transport layer, thereby resulting in the session layer not being differentiated from the application layer.
  • the presentation layer is included in the application layers and maps the user's view of the data as a structured entity to the lower layer protocol's view as a stream of bytes.
  • the application layer generally encompasses all applications of TCP/IP networking, including network file systems, web server or browsers, and client/server transaction protocols.
  • network management is in the application layer, although there are hooks to the lower layers to gather statistics.
  • a variation of the Layered Network Model collapses the layers into (1) an Application Layer (telnet, ftp, etc.); (2) a Host-to-Host Transport Layer (TCP, UDP); (3) an Internet Layer (IP and routing); and (4) a Network Access Layer (previously Network, Data Link, and Physical). These layers directly correspond to the encapsulation of the original data.
  • the data is simply sent out using a send command.
  • the packet is encapsulated in any of a variety of methods and sent out using a sendto command.
  • the kernel builds the Transport Layer and Internet Layer on top of the packet and the hardware builds the Network Access Layer.
  • the router strips the packet to the IP header, consults its routing table, and routes the packet.
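  • Conceptually, the router's table consultation is a longest-prefix match over destination addresses; the toy table below illustrates the idea with a handful of hard-coded routes whose prefixes and next hops are made up.

```c
#include <stdio.h>
#include <stdint.h>

struct route {
    uint32_t prefix;      /* network address (host byte order here)        */
    uint32_t mask;        /* netmask for the prefix                        */
    const char *next_hop; /* illustrative next-hop label                   */
};

/* Pick the matching route with the most specific (longest) mask. */
static const char *lookup(const struct route *tbl, int n, uint32_t dst)
{
    const struct route *best = NULL;
    for (int i = 0; i < n; i++)
        if ((dst & tbl[i].mask) == tbl[i].prefix &&
            (!best || tbl[i].mask > best->mask))
            best = &tbl[i];
    return best ? best->next_hop : "drop";
}

int main(void)
{
    struct route table[] = {
        { 0xC0A80100u, 0xFFFFFF00u, "eth0 (192.168.1.0/24)" },
        { 0xC0A80000u, 0xFFFF0000u, "eth1 (192.168.0.0/16)" },
        { 0x00000000u, 0x00000000u, "default gateway"       },
    };
    uint32_t dst = 0xC0A80137u;    /* 192.168.1.55 */
    printf("route: %s\n", lookup(table, 3, dst));
    return 0;
}
```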
  • the User Datagram Protocol is becoming an important player in the realm of multimedia protocols because it is essentially an interface to the low-level Internet Protocol, and it offers a fast checksum and I/O multiplexing through Berkeley sockets.
  • UDP is the choice of many multimedia network developers for applications that cannot be constrained by the flow control mechanism in TCP. In this environment, however, operation without any flow control in place quickly fills the local socket-level buffers, and the UDP datagrams are discarded before they even reach the physical network.
  • applications that generate datagrams faster than the kernel can handle them result in poor utilization of CPU time and observable performance degradation, including larger Round Trip Times (RTTs) and slower network throughput.
  • protocols such as TCP, with a highly-refined flow control mechanism, attempt to dynamically center in on that optimum transmission rate through the feedback loop formed by data transmission and subsequent data acknowledgment.
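  • The feedback loop described above can be pictured with a toy additive-increase/multiplicative-decrease controller; this is a simplified stand-in for TCP's actual congestion-control algorithms, with made-up constants.

```c
#include <stdio.h>

/* Toy AIMD rate controller: probe upward until loss, then back off. */
int main(void)
{
    double rate = 1.0;                /* transmission rate, Mbit/s (toy)   */
    const double capacity = 10.0;     /* bottleneck unknown to the sender  */

    for (int rtt = 0; rtt < 20; rtt++) {
        int acked = rate <= capacity; /* feedback: ACKs arrive or not      */
        if (acked)
            rate += 0.5;              /* additive increase per RTT         */
        else
            rate *= 0.5;              /* multiplicative decrease on loss   */
        printf("RTT %2d: rate %.2f Mbit/s\n", rtt, rate);
    }
    return 0;
}
```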
  • UDP networks that operate without flow control are an unlikely candidate in the current network environment to handle high-bandwidth, highly-synchronized multimedia data streams.
  • any network that is expected to handle this type of real-time, multimedia data must preserve the linear presentation of timed-sequenced data.
  • current networks or protocols that attempt to preserve this relationship significantly compromise the overall quality of the image.
  • the data to be sent over the network is wrapped in a layer that is recognized by the delivery service that will be used and contains an address for the destination and an address of the sender.
  • the best-effort delivery service of UDP involves one attempt at delivery and, upon failure, the discarding of the entire data packet.
  • RTP is the Internet-standard protocol for the transport of real-time data, including audio and video. It is used for media-on-demand as well as interactive services such as Internet telephony. RTP consists of a data element and a control element (RTCP).
  • the data element of RTP is a thin protocol providing support for applications with real-time properties such as continuous media (e.g., audio and video), including timing reconstruction, loss detection, security and content identification.
  • RTCP provides support for real-time conferencing of groups within an Internet, including source identification and support for gateways like audio and video bridges, as well as multicast-to-unicast translators.
  • UDP/IP is RTP's target networking environment, but there have been efforts to make RTP transport-independent so that it can be used over CLNP, IPX or other protocols.
  • RTP does not address the issue of resource reservation or quality of service control, but relies entirely on resource reservation protocols such as RSVP.
  • Asynchronous transfer mode (ATM) is a high-performance, cell-oriented switching and multiplexing technology that utilizes fixed-length packets to carry different types of traffic.
  • ATM is a technology that enables carriers to capitalize on a number of revenue opportunities through multiple ATM classes of services.
  • Services based on asynchronous transfer mode (ATM) and synchronous digital hierarchy (SDH) and synchronous optical network (SONET) architectures were developed to provide the infrastructure for the evolving multimedia market. Unfortunately, ATM provides little support for multicasting.
  • ATM technology has its history in the development of broadband ISDN in the 1970s and 1980s. From a technical view, ATM is an evolution of packet switching. Similar to packet switching for data (e.g., X.25, frame relay, transmission control protocol [TCP]/Internet protocol [IP]), ATM integrates the multiplexing and switching functions, and is typically a good match for bursty traffic (in contrast to circuit switching). Additionally, ATM allows communication between devices that operate at different speeds.
  • Unlike packet switching, ATM generally supports high-performance, multimedia networking and has been implemented in a broad range of networking devices including PCs, workstations, server network interface cards, switched-Ethernet and token-ring workgroup hubs, workgroup and campus ATM switches, ATM enterprise network switches, ATM multiplexers, ATM-edge switches, and ATM-backbone switches.
  • ATM is also a capability that can be offered as an end-user service by service providers (as a basis for tariffed services) or as a networking infrastructure for these and other services.
  • the most basic service building block is the ATM virtual circuit, which is an end-to-end connection that has defined end points and routes, but does not include dedicated bandwidth. Bandwidth is allocated on demand by the network as users have traffic to transmit.
  • ATM also defines the following international standards to meet a broad range of application needs:
  • all information is formatted into fixed-length cells consisting of 48 bytes (8 bits per byte) of payload and 5 bytes of cell header.
  • the fixed cell size ensures that time-critical information such as voice or video is not adversely affected by long data frames or packets.
  • the header is organized for efficient switching in high-speed hardware implementations and carries payload-type information, virtual-circuit identifiers, and header error check.
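  • The 53-byte cell layout (5-byte header plus 48-byte payload) can be sketched as below; the UNI header fields shown (GFC, VPI, VCI, payload type, CLP, HEC) follow the standard description, while the packing is simplified for illustration rather than matching the on-the-wire bit layout.

```c
#include <stdint.h>

#define ATM_PAYLOAD_BYTES 48   /* fixed payload size of every cell        */
#define ATM_HEADER_BYTES   5   /* fixed header size                       */

/* Simplified (unpacked) view of an ATM UNI cell header. */
struct atm_header {
    uint8_t  gfc;        /* generic flow control (4 bits on the wire)     */
    uint16_t vpi;        /* virtual path identifier (8 bits at the UNI)   */
    uint16_t vci;        /* virtual channel identifier (16 bits)          */
    uint8_t  pt;         /* payload type (3 bits)                         */
    uint8_t  clp;        /* cell loss priority (1 bit)                    */
    uint8_t  hec;        /* header error check over the first 4 bytes     */
};

struct atm_cell {
    struct atm_header header;
    uint8_t payload[ATM_PAYLOAD_BYTES];   /* always exactly 48 bytes      */
};
```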
  • the ATM connection standard organizes different streams of traffic in separate calls, thereby allowing the user to specify the resources required and allowing the network to allocate resources based on these needs. Multiplexing multiple streams of traffic on each physical facility (between the end user and the network or between network switches), combined with the ability to send the streams to many different destinations, results in cost savings through a reduction in the number of interfaces and facilities required to construct a network.
  • ATM defines two types of connections: virtual path connections (VPCs) and virtual channel connections (VCCs).
  • a virtual channel connection (or virtual circuit) is the basic unit, which carries a single stream of cells, in order, from user to user.
  • a collection of virtual circuits can be bundled together into a virtual path connection.
  • a virtual path connection can be created from end-to-end across an ATM network. In this case, the ATM network does not route cells belonging to a particular virtual circuit. All cells belonging to a particular virtual path are routed the same way through the ATM network, thus resulting in faster recovery in case of major failures.
  • An ATM network also uses virtual paths internally for the purpose of bundling virtual circuits together between switches.
  • Two ATM switches may have many different virtual channel connections between them, belonging to different users. These can be bundled by the two ATM switches into a virtual path connection that serves the purpose of a virtual trunk between the two switches. The virtual trunk is then handled as a single entity by, perhaps, multiple intermediate virtual path cross connects between the two virtual circuit switches.
  • Virtual circuits are statically configured as permanent virtual circuits (PVCs) or dynamically controlled via signaling as switched virtual circuits (SVCs). They can also be point-to-point or point-to-multipoint, thus providing a rich set of service capabilities. SVCs are the preferred mode of operation because they can be dynamically established, thus minimizing reconfiguration complexity.
  • ATM allows the user to specify the resources required on a per-connection basis (per SVC) dynamically.
  • There are five classes of service defined for ATM (per the ATM Forum UNI 4.0 specification). The QoS parameters for these service classes are as follows:
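  • For reference, the five UNI 4.0 service categories are commonly listed as constant bit rate (CBR), real-time variable bit rate (rt-VBR), non-real-time variable bit rate (nrt-VBR), available bit rate (ABR) and unspecified bit rate (UBR); the enumeration below simply names them, without reproducing the detailed per-class QoS parameter table.

```c
/* ATM Forum UNI 4.0 service categories (names only; QoS parameters such
 * as cell loss ratio and cell transfer delay are negotiated per class).  */
enum atm_service_class {
    ATM_CBR,       /* constant bit rate, e.g. circuit emulation            */
    ATM_RT_VBR,    /* real-time variable bit rate, e.g. compressed video   */
    ATM_NRT_VBR,   /* non-real-time variable bit rate                      */
    ATM_ABR,       /* available bit rate, with feedback-based flow control */
    ATM_UBR        /* unspecified bit rate, best effort                    */
};
```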
  • the technical parameters of ATM include:
  • traffic from circuit-based techniques (e.g., via T1 multiplexers or circuit switching) can be carried over ATM in circuit-emulation mode using ATM Adaptation Layer 1 (AAL1).
  • for example, a T1 1.544-Mbps circuit requires 1.74 Mbps of ATM bandwidth when transmitted in circuit-emulation mode.
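  • The quoted 1.74-Mbps figure is consistent with cell overhead: each 53-byte cell carries 47 bytes of user payload (one byte of the 48-byte payload being the AAL1 header), so the line rate scales by 53/47. The short calculation below reproduces the number; the 47-byte payload assumption corresponds to unstructured AAL1 circuit emulation.

```c
#include <stdio.h>

int main(void)
{
    const double t1_rate   = 1.544;   /* T1 circuit rate, Mbit/s           */
    const double cell_size = 53.0;    /* 5-byte header + 48-byte payload   */
    const double user_data = 47.0;    /* 48 bytes minus 1-byte AAL1 header */

    /* Every 47 bytes of T1 data occupy a full 53-byte ATM cell.           */
    double atm_rate = t1_rate * cell_size / user_data;
    printf("ATM bandwidth needed: %.2f Mbit/s\n", atm_rate);   /* ~1.74    */
    return 0;
}
```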
  • Quality of Service (QoS) requirements are typically expressed using the following QoS-related parameters:
  • Latency: the delay that an application can tolerate in delivering a packet of data
  • synchronized data refers to the digital data streams that are highly correlated in the time domain. The data occurs at regular intervals, contrary to the asynchronous communication between computers and devices.
  • Image compression reduces the amount of data necessary to represent a digital image by eliminating spatial and/or temporal redundancies in the image information.
  • Video compression is a four dimensional problem that includes scenes containing an object that continuously changes without jumps and has no edges. There are also scenes where there are cuts that are large jumps in the temporal domain and large jumps in the spatial domain (e.g., edges).
  • Image compression is generally classified into two categories: (1) lossless and (2) lossy.
  • Lossless compression reduces the amount of image data stored and transmitted without any loss of information, thereby resulting in no image degradation. In contrast, lossy compression results in information loss, thereby resulting in at least some image degradation.
  • the majority of compression standards use lossy compression to compress image data to fit a set of network constraints (e.g., limited memory for storage of data and/or limited bandwidth available for transmission of data). Lossy compression, however, would be unnecessary if the image data could be compressed enough to meet the network constraints using lossless compression of the data.
  • Lossy compression techniques typically use cosine-type transforms like DCT and wavelet compression that have a tendency to lose high frequency information due to limited bandwidth.
  • Fractal compression also suffers from high transmission bandwidth requirements and slow coding algorithms.
  • the compression of a digital signal reduces the bandwidth required for signal storage or transmission.
  • a high definition television (“HDTV") signal can require as much as 1 billion bits per second.
  • By reducing the amount of data by as much as a factor of fifty (e.g., to 20 million bits per second), present-day compression techniques facilitate compact storage and real-time transmission of complex signals.
  • Some well-known compression techniques include "JPEG,” “MPEG-1,” “MPEG-2,” “MPEG-4,” “H.261,” and “H.263.”
  • the primary goal of most of these compression techniques is to take an input stream of full-length video or audio, determine redundancies that exist in the signal, and encode those redundancies such that the input signal is compressed to be shorter in length. Compression can be used to eliminate spatial and temporal redundancies. For example, pixel values in a region of an image frame may be converted into information indicating that the region can be reproduced based upon another part of the same image frame or of a previous image frame, respectively.
  • the prior art compression algorithms generally rely on block-based, tile-based, or object-based encoding.
  • One of the image frames is divided into a number of square tiles, and the frame is compressed so that relatively less data is used for the image representation.
  • pixels for each tile will be separately compressed to remove either spatial redundancies within the same frame or temporal redundancies between frames.
  • a digital processing device compares pixels in each tile in one of the frames with image pixels found near the same tile location within the other frame.
  • the digital processing device compares pixels from a reference tile with pixel subsets of a fixed "search window" to determine a "closest match." After locating the "closest match", the digital processing device calculates a motion vector and a set of pixel value differences called "residuals."
  • the "search window" for each tile defines a maximum set of boundaries beyond which searching will not be performed for the "closest match.”
  • a portion of one of the images includes many pixels, wherein a pixel is the smallest element of a picture consisting only of a single color and intensity.
  • An image frame typically consists of hundreds of tiles in both of X and Y directions, and each tile may have, for example, eight rows and eight columns of pixels.
  • Searching for the "closest match" of a data block is conventionally performed within the fixed search window about an expected location of the closest match.
  • Each square subset of contiguous pixels is sequentially compared to the data block and the "closest match” is the particular subset which differs least from the data block.
  • a motion vector identifies the location of the "closest match" with respect to the expected location, and the associated residuals contain pixel-by-pixel differences also arranged in a square tile.
  • the motion vector and residuals are then encoded in a compact manner, usually through "run-length coding,” “quantization” and “Huffman coding.”
  • the digital processing device repeats this process for each tile until the entire image frame is compressed.
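  • The tile-by-tile search described above amounts to minimizing a pixel-difference cost over a search window; a compact exhaustive version is sketched below, with an 8x8 tile and a search range of plus or minus 7 pixels as illustrative parameters.

```c
#include <stdlib.h>

#define TILE   8     /* tile is 8x8 pixels                                 */
#define RANGE  7     /* search +/-7 pixels around the expected location    */

/* Sum of absolute differences between a reference tile and a candidate.  */
static long sad(const unsigned char *ref, const unsigned char *cur,
                int width, int rx, int ry, int cx, int cy)
{
    long total = 0;
    for (int y = 0; y < TILE; y++)
        for (int x = 0; x < TILE; x++)
            total += labs((long)ref[(ry + y) * width + (rx + x)] -
                          (long)cur[(cy + y) * width + (cx + x)]);
    return total;
}

/* Exhaustive search for the closest match; returns the motion vector.    */
void find_motion_vector(const unsigned char *ref, const unsigned char *cur,
                        int width, int height, int tx, int ty,
                        int *mvx, int *mvy)
{
    long best = -1;
    *mvx = *mvy = 0;
    for (int dy = -RANGE; dy <= RANGE; dy++) {
        for (int dx = -RANGE; dx <= RANGE; dx++) {
            int cx = tx + dx, cy = ty + dy;
            if (cx < 0 || cy < 0 || cx + TILE > width || cy + TILE > height)
                continue;                     /* stay inside the frame     */
            long cost = sad(ref, cur, width, tx, ty, cx, cy);
            if (best < 0 || cost < best) {    /* keep the closest match    */
                best = cost;
                *mvx = dx;
                *mvy = dy;
            }
        }
    }
    /* Residuals are then the per-pixel differences at the closest match.  */
}
```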
  • the image frame is completely recalculated from an already-decompressed frame by reconstructing each tile using motion vectors and residuals.
  • the various standards mentioned above generally operate in this manner, although some new standards call for subdividing images into variable-size objects instead of tiles.
  • the underlying compression principles, however, are similar.
  • A. MPEG-1. MPEG (Motion Pictures Experts Group) compression results from the desire to distribute and display motion pictures in digital form, such as by means of a computer system.
  • MPEG achieves a high compression rate by storing only the changes from one frame to another, instead of each entire frame.
  • the video information is then encoded using the DCT technique.
  • MPEG uses a type of lossy compression, since some data is removed, but the diminishment of data is generally imperceptible to the human eye.
  • the two major MPEG standards are: MPEG-1 and MPEG-2.
  • the most common implementations of the MPEG-1 standard provide a video resolution of 352x240 at 30fps. This produces video quality slightly below the quality of conventional VCR videos.
  • a second standard, MPEG-2, offers resolutions of 720x480 and 1280x720 at 60 fps.
  • MPEG-2 is often used by DVD-ROMs. MPEG-2 can compress a 2 hour video into a few gigabytes.
  • the MPEG video compression algorithm employs two basic techniques: block-based motion compensation for the reduction of the temporal redundancy, and transform domain (DCT) coding for the reduction of spatial redundancy.
  • the motion compensation technique is applied both in the forward (causal) and backward (non- causal) direction.
  • the remaining signal is coded using the transform-based technique.
  • the motion predictors (e.g., the motion vectors discussed above) are transmitted together with the spatial information.
  • the MPEG-2 standard uses the same set of algorithms as MPEG-1, and has an additional support for interlaced video sources and scalability options. Although there are minor differences in the syntax, the MPEG-2 standard is conceptually a super-set of MPEG-1.
  • three types of pictures are defined in MPEG: intra (I) pictures, predicted (P) pictures, and bidirectionally interpolated (B) pictures.
  • I pictures are coded independently of other pictures, whereas P pictures are coded with reference to a previous picture, which can be either an I or a P picture.
  • B pictures are intended to be compressed with a low bit rate, using both the previous and future references. The B pictures are never used as the references.
  • the MPEG standard does not impose any limit to the number of B pictures between the two references.
  • the I-frame is sent every fifteen frames regardless of video content.
  • inserting I-frames asynchronously into the video bitstream at the encoder in this way is wasteful and introduces artifacts, because there is no correlation between the I-frames and the B and P frames of the video, thereby resulting in wasted bandwidth.
  • bandwidth is wasted because the I-frame was unnecessary.
  • each picture is divided into blocks of 16 x 16 pixels, called macroblocks.
  • Each macroblock is predicted from the previous or future frame, by estimating the amount of the motion in the macroblock during the frame time interval.
  • the MPEG syntax specifies how to represent the motion information for each macroblock. It does not, however, specify how such vectors are to be computed. Due to the block-based motion representation, many implementations use block-matching techniques, where the motion vector is obtained by minimizing a cost function measuring the mismatch between the reference and the current block. Although any cost function can be used, the most widely-used choice is the absolute difference (AE).
  • AE is calculated at several locations in the search range.
  • a widely used fast-search technique is the Three-Step Search (TSS). This algorithm first evaluates the AE at the center and eight surrounding locations of a 32 x 32 search area. The location that produces the smallest AE then becomes the center of the next stage, and the search range is reduced by half. This sequence is repeated three times.
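  • A sketch of such a three-step search follows; the cost callback stands in for whatever mismatch measure the encoder uses (e.g., the AE above), and the initial step size of 8 is an assumption chosen to roughly cover a 32 x 32 area.

```c
/* Three-Step Search: evaluate 9 candidate offsets, re-center on the best,
 * halve the step, and repeat three times.  `cost` is whatever mismatch
 * measure the encoder uses (e.g., the absolute difference AE).           */
void three_step_search(long (*cost)(int dx, int dy, void *ctx), void *ctx,
                       int first_step, int *best_dx, int *best_dy)
{
    int cx = 0, cy = 0;                       /* current search center     */
    int step = first_step;                    /* e.g., 8 for a 32x32 area  */

    for (int stage = 0; stage < 3; stage++) { /* three stages              */
        long best = cost(cx, cy, ctx);
        int bx = cx, by = cy;
        for (int dy = -step; dy <= step; dy += step)
            for (int dx = -step; dx <= step; dx += step) {
                if (dx == 0 && dy == 0)
                    continue;                 /* center already evaluated  */
                long c = cost(cx + dx, cy + dy, ctx);
                if (c < best) { best = c; bx = cx + dx; by = cy + dy; }
            }
        cx = bx;                              /* re-center on the winner   */
        cy = by;
        step /= 2;                            /* halve the search range    */
    }
    *best_dx = cx;
    *best_dy = cy;
}
```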
  • for the reduction of spatial redundancy in each I picture, and of the prediction error in P and B pictures, the MPEG standard employs a DCT-based coding technique.
  • the two-dimensional DCT is separable, and it can be obtained by performing one-dimensional DCTs on columns and one-dimensional DCTs on rows.
  • An explicit formula for the 8 x 8 two-dimensional DCT can be written in terms of the pixel values and the frequency-domain transform coefficients.
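  • The standard form of that 8 x 8 two-dimensional DCT, written in terms of the pixel values f(x, y) and the frequency-domain coefficients F(u, v), is:

```latex
F(u,v) = \frac{1}{4}\, C(u)\, C(v) \sum_{x=0}^{7} \sum_{y=0}^{7} f(x,y)\,
         \cos\!\frac{(2x+1)u\pi}{16}\, \cos\!\frac{(2y+1)v\pi}{16},
\qquad
C(k) = \begin{cases} 1/\sqrt{2} & k = 0 \\ 1 & k \neq 0 \end{cases}
```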
  • the transformed DCT coefficients are then quantized to reduce the number of bits to represent them and also to increase the number of zero-value coefficients.
  • a combination of quantization and run-length coding contributes to most of the compression.
  • a uniform quantizer is used in the MPEG standard, with a different step size for each DCT coefficient position. Since the subjective perception of the quantization error varies with the frequency, higher frequency coefficients are quantized more coarsely, using a visually-weighted step-size.
  • different quantization matrices are used for intra-coded and inter-coded blocks, since the signal from intra-coding has different statistical characteristics from the signal resulting from prediction or interpolation.
  • Intra-coded blocks contain energy in all frequencies and are likely to produce blocking artifacts if too coarsely quantized.
  • blocks coded after the motion prediction contain predominantly high frequencies and can be subject to much coarser quantization.
  • E. ENTROPY CODING The quantized DCT coefficients are then rearranged into a one-dimensional array by scanning them in a zig-zag order. This rearrangement puts the DC coefficient at the first location of the array and the remaining AC coefficients are arranged from the low to high frequency, in both the horizontal and vertical directions. The assumption is that the quantized DCT coefficients at higher frequencies would likely be zero, thereby separating the non-zero and zero parts.
  • the rearranged array is coded into a sequence of the run-level pair.
  • the run is defined as the distance between two non-zero coefficients in the array.
  • the level is the non-zero value immediately following a sequence of zeros.
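  • A compact sketch of this run-level pairing over an already zig-zag-ordered array of quantized coefficients follows; the 64-entry array assumes an 8 x 8 block, and DC handling and end-of-block signalling are simplified.

```c
#include <stdio.h>

#define BLOCK_COEFFS 64    /* one zig-zag-scanned 8x8 block                */

struct run_level { int run; int level; };

/* Convert a zig-zag-ordered coefficient array into (run, level) pairs.
 * `run` counts the zeros preceding each non-zero coefficient (level).    */
int run_level_encode(const int *zz, struct run_level *out)
{
    int pairs = 0, run = 0;
    for (int i = 1; i < BLOCK_COEFFS; i++) {  /* DC coefficient handled    */
        if (zz[i] == 0) {                     /* separately in practice    */
            run++;
        } else {
            out[pairs].run = run;
            out[pairs].level = zz[i];
            pairs++;
            run = 0;
        }
    }
    return pairs;     /* trailing zeros are signalled by an end-of-block   */
}

int main(void)
{
    int zz[BLOCK_COEFFS] = { 35, -3, 0, 0, 2, 0, 0, 0, -1 };  /* rest zero */
    struct run_level rl[BLOCK_COEFFS];
    int n = run_level_encode(zz, rl);
    for (int i = 0; i < n; i++)
        printf("(run %d, level %d)\n", rl[i].run, rl[i].level);
    return 0;
}
```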
  • because the MPEG-1 standard was targeted at a bit rate of around 1.5 Mbits/s, it was assumed that the source video signal would be digitized at around 352 x 240 for 60 Hz systems (e.g., in the U.S.) and 352 x 288 for 50 Hz systems (e.g., in Europe).
  • the standard video signals carry twice the scan lines as the above sampling rates, with an interlaced scanning order. Therefore, the simplest way of creating a half-size digital picture was simply sampling only one field from each frame. The other field was always discarded. Since only one field from every frame is used, these sampled fields form a progressively-scanned video sequence.
  • MPEG-1 only addresses the coding parameters and algorithms for progressively-scanned sequences.
  • the syntax of the MPEG-1 standard does not constrain the bit rate or the picture size to any such values.
  • because MPEG-2 is targeted at coding broadcast-quality video signals, it is necessary to digitize the source video at its full bandwidth, resulting in both even and odd field pictures in the sequence. Since these two fields are separated by a time interval, coding the sequence using the MPEG-1 algorithm does not produce good-quality pictures, as MPEG-1 assumes that there is no time difference between successive lines in the picture.
  • the MPEG-2 standard provides a means of coding interlaced pictures by including two field-based coding techniques: field-based prediction and field-based DCT.
  • the term picture refers to either a frame or a field. Therefore, a coded representation of a picture may be reconstructed to a frame or a field.
  • the encoder has a choice of coding a frame as one frame picture or two field pictures. If the encoder decides to code the frame as field pictures, each field is coded independently of the other, i.e., two fields are coded as if they were two different pictures, each with one-half the vertical size of a frame.
  • each macroblock can be predicted (using motion compensation) on a frame or field basis.
  • the frame-based prediction uses one motion vector per direction (forward or backward) to describe the motion relative to the reference frame.
  • field-based prediction uses two motion vectors, one from an even field and the other from an odd field. Therefore, there can be up to four vectors (two per direction, and forward and backward directions) per macroblock.
  • the prediction is always field-based, but the prediction may be relative to either an even or odd reference field.
  • each macroblock in a frame picture can be coded using either frame-based or field-based DCT.
  • the frame-based DCT is the same as the DCT used in MPEG-1.
  • the field-based DCT operates on alternating rows, i.e., rows from the same field are grouped to form an 8 x 8 block.
  • the sensor network includes a set of interconnected sensors coupled to a control module.
  • the control module receives a set of sensed data from the sensors and generates a homogenized data stream based on the sensed data.
  • the communication bridge is coupled to the sensor network and buffers the homogenized data stream.
  • the user network is coupled to the communication bridge and receives the homogenized data stream from the sensor network. The user network transmits data back to the control module through the communication bridge.
  • Yet another object of this invention is to provide a method for providing multimedia data over a network including the steps of processing a set of multimedia information including a set of temporal data and a set of spatial data, compressing the set of temporal data and the set of spatial data, and interpreting the set of spatial data from the set of temporal data.
  • Another object of this invention is to provide a multimedia sensor network configured to integrate temporal data with spatial data.
  • the network includes a plurality of sensors configured to generate multimedia data, and a processor.
  • the processor processes, compresses and transmits the multimedia data.
  • the processor includes an encoder coupled to a local area network, wherein the local area network transmits compressed temporal data through a first communication channel and the compressed spatial data through a second communication channel.
  • Another object of this invention is to provide a network including a sensor network, an intelligent compression module, a communication bridge and a user network.
  • the sensor network includes a plurality of interconnected sensors coupled to a control module, wherein the control module receives a set of sensed data from the plurality of sensors including a set of temporal data and generates a homogenized data stream based on the sensed data.
  • the intelligent compression module is coupled to the sensor network and a set of spatial data is interpreted from the set of temporal data.
  • the communication bridge is coupled to the sensor network and buffers the homogenized data stream received from the sensor network.
  • the user network is coupled to the communication bridge, receives the homogenized data stream from the sensor network and transmits a set of input data to the control module through the communication bridge.
  • Another object of this invention is to provide a multimedia network including a sensor network, an intelligent compression module, a communication bridge, and a user network.
  • the sensor network includes a plurality of interconnected sensors coupled to a control module, wherein the control module receives a set of sensed data from the plurality of sensors including a set of temporal data and generates a data stream based on the sensed data.
  • the intelligent compression module is coupled to the sensor network and a set of spatial data is interpreted from the set of temporal data.
  • the communication bridge is coupled to the sensor network and includes a buffer manager to buffer the data stream received from the sensor network and a quality of service manager to guarantee a particular bandwidth for the transmission of the data stream.
  • the user network is coupled to the communication bridge, receives the data stream from the sensor network and transmits a set of input data to the control module through the communication bridge.
  • Another object of this invention is to provide a multimedia network including a sensor network, a communication bridge and a user network.
  • the sensor network includes a plurality of interconnected sensors coupled to a control module, wherein the control module receives a set of sensed data from the plurality of sensors including a set of temporal data and generates a data stream based on the sensed data.
  • the communication bridge is coupled to the sensor network and includes a buffer manager to buffer the data stream received from the sensor network and a quality of service manager to guarantee a particular bandwidth for the transmission of the data stream.
  • the user network is coupled to the communication bridge, receives the data stream from the sensor network and transmits a set of input data to the control module through the communication bridge.
  • Another object of this invention is to provide a tracking network including a sensor network, a communication bridge and a user network.
  • the sensor network includes a plurality of interconnected sensors coupled to a control module, wherein the plurality of sensors track a moving object in a monitoring area, and the control module receives a set of sensed data from the plurality of sensors and generates a data stream based on the sensed data.
  • the communication bridge is coupled to the sensor network and buffers the data stream received from the sensor network.
  • the user network is coupled to the communication bridge, receives the data stream from the sensor network and transmits a set of input data to the control module through the communication bridge.
  • Yet another object of this invention is to provide a tracking network including a sensor network, an intelligent compression module, a communication bridge and a user network.
  • the sensor network includes a plurality of interconnected sensors coupled to a control module, wherein the plurality of sensors track a moving object in a monitoring area, and the control module receives a set of sensed data including a set of temporal data from the plurality of sensors and generates a data stream based on the sensed data.
  • the intelligent compression module is coupled to the sensor network and a set of spatial data is interpreted from the set of temporal data.
  • the communication bridge is coupled to the sensor network and buffers the data stream received from the sensor network.
  • the user network is coupled to the communication bridge, receives the data stream from the sensor network and transmits a set of input data to the control module through the communication bridge.
  • Another object of this invention is to provide a tracking network including a motion detection network, a communication bridge and a user network.
  • the motion detection network includes a plurality of interconnected sensors coupled to a control module, wherein the plurality of sensors track at least one moving object in a monitoring area, and the control module receives a set of sensed data including a set of temporal data from the plurality of sensors, and generates a data stream based on the sensed data.
  • the communication bridge is coupled to the motion detection network and buffers the data stream received from the motion detection network.
  • the user network is coupled to the communication bridge, receives the data stream from the motion detection network and transmits a set of input data to the control module through the communication bridge.
  • the control module receives at least a set of location coordinates corresponding to the at least one moving object from a first sensor and transmits the set of location coordinates to a second sensor that tracks the at least one moving object.
  • Yet another object of this invention is to provide a method of tracking at least one moving object in a monitoring area including the steps of testing a plurality of interconnected sensors coupled to a control module in a motion detection network to determine if any of the plurality of sensors is activated based on movement of the at least one moving object, processing a set of sensed data, including a set of temporal data, from a first sensor to calculate a set of location coordinates associated with each of the at least one moving objects, tracking each of the at least one moving objects with a second sensor based on the set of location coordinates associated with each of the at least one moving objects, transmitting a second set of data from the second sensor to an object recognition module coupled to the control module to determine if each of the at least one moving objects is in a set of significant objects, and transmitting the second set of data from the second sensor to a user network through a communication bridge.
  • Another object of this invention is to provide a method of tracking at least one moving object in a monitoring area including the steps of testing a plurality of interconnected sensors coupled to a control module in a motion detection network to determine if any of the plurality of sensors is activated based on movement of the at least one moving object, processing a set of sensed data, including a set of temporal data, from a first sensor to calculate a set of location coordinates associated with each of the at least one moving objects, tracking each of the at least one moving objects with a second sensor based on the set of location coordinates associated with each of the at least one moving objects, transmitting a second set of data from the second sensor to an object recognition module coupled to the control module to determine if each of the at least one moving objects is in a set of significant objects, intelligently compressing the second set of data by interpreting a set of spatial data from the set of temporal data, and transmitting the compressed second set of data from the second sensor to a user network through a communication bridge.
  • Another object of this invention is to provide a method of fusing data in a sensor network including the steps of approximating an initial draft of a set of fuzzy rules corresponding to a set of sensed data from a plurality of interconnected sensors coupled to a control module, mapping the initial draft of the set of fuzzy rules to a location and a curve of a set of membership functions, fine-tuning the location of the set of membership functions for optimal performance of the set of fuzzy rules using a neural network, submitting a set of training data to a fuzzy rule base and the neural network, generating a set of initial fuzzy membership functions using the neural network, submitting the set of initial fuzzy membership functions to the fuzzy rule base, generating an actual output from the fuzzy rule base, comparing the actual output with a desired output contained in the set of training data, adjusting a set of neural network weights, thereby adjusting the set of membership functions, and presenting the adjusted set of membership functions to the fuzzy rule base until a difference between the actual output and the desired output is below a predetermined minimum threshold value.
  • Yet another object of this invention is to provide a network including a sensor network, a gateway software agent, a host software agent, a communication bridge, and a user network.
  • the sensor network includes a set of local area sensors, a set of middle area sensors, and a set of wide area sensors coupled to a control module.
  • the control module receives a set of sensed data from the set of local area, the set of middle area and the set of wide area sensors, and generates a data stream based on the set of sensed data.
  • the gateway software agent is coupled to the set of middle area sensors, intelligently filters a contextual meaning from the sensed data and determines whether the sensed data is meaningful.
  • the host software agent is coupled to the set of wide area sensors and collects, processes and transmits the sensed data to the control module.
  • the communication bridge is coupled to the sensor network and buffers the data stream received from the sensor network.
  • the user network is coupled to the communication bridge, receives the data stream from the sensor network and transmits a set of input data to the control module through the communication bridge.
  • another object of this invention is to provide a network including a sensor network, a communication bridge and a user network.
  • the sensor network includes a set of local area sensors, a set of middle area sensors, and a set of wide area sensors coupled to a control module.
  • the control module receives a set of sensed data from the set of local area, the set of middle area and the set of wide area sensors, and generates a data stream based on the set of sensed data.
  • the communication bridge is coupled to the sensor network and buffers the data stream received from the sensor network.
  • the user network is coupled to the communication bridge, receives the data stream from the sensor network and transmits a set of input data to the control module through the communication bridge.
  • Each of the set of local area sensors, the set of middle area sensors, and the set of wide area sensors monitors a limited region of a monitoring area, and a portion of each of the limited regions overlaps with the limited region corresponding to each of the set of local area, the set of middle area, and the set of wide area sensors.
  • FIGURE 1 is a diagram of a Layered Network Model according to the prior art
  • FIGURE 2 is a diagram of TCP encapsulation according to the prior art
  • FIGURE 3 is a diagram of the Data Link Layer encapsulation according to the prior art
  • FIGURE 4 is a diagram of the IP layer encapsulation according to the prior art
  • FIGURE 5 is a diagram of a multimedia network system according to the present invention.
  • FIGURE 6 is a diagram of the sensors and the Internet in the multimedia network system according to the present invention
  • FIGURE 7 is a diagram of the topography of a sensor network according to the present invention
  • FIGURE 8 is a diagram of a Gateway Software Agent (GSA) according to the present invention
  • FIGURE 9 is a diagram of a network of Gateway Software Agents according to the present invention.
  • FIGURE 10 is a flowchart of the transmitter flow of the sensor network according to the present invention.
  • FIGURE 11 is a flowchart of the receiver flow of the sensor network according to the present invention.
  • FIGURE 12 is a diagram of the hardware configuration for performing meaningful I-frame insertion according to the present invention.
  • FIGURE 13 is a flowchart of the error accumulation procedure in the motion estimation according to the present invention.
  • FIGURE 14 is a diagram of a system for encoding multiple channels of video data according to the present invention
  • FIGURE 15 is a diagram of the multimedia network system according to the present invention
  • FIGURE 16 is a diagram of a LAN design of the multimedia network system according to the present invention.
  • FIGURE 17 is a diagram of a single-channel encoder with a communication interface according to the present invention.
  • FIGURE 18 is a diagram of a decoder with a communication interface according to the present invention.
  • FIGURE 19 is a diagram of the frame data content of an original frame, a compressed meaningful I-frame at low compression, a compressed I-frame at high compression in the MPEG stream, and compressed MPEG daughter frames according to the present invention
  • FIGURE 20 is a diagram of a frame cycle corresponding to MPEG frames according to the prior art
  • FIGURE 21 is a diagram of standard MPEG time performance according to the present invention.
  • FIGURE 22 is a diagram of time performance related to inserting an I-frame at the transition point between scenes according to the present invention
  • FIGURE 23 is a diagram of the transmission of only the meaningful I-frames at a low compression rate according to the present invention
  • FIGURE 24 is a flowchart of the meaningful I-frame insertion procedure according to the present invention
  • FIGURE 25 is a diagram of a circular buffer according to the present invention
  • FIGURE 26 is a diagram of the memory of a circular buffer according to the present invention.
  • FIGURE 27 is a flowchart of the encoder process implementing buffer management according to the present invention
  • FIGURE 28 is a flowchart of the decoder process implementing buffer management according to the present invention
  • FIGURE 29 is a flowchart of the buffer write process according to the present invention.
  • FIGURE 30 is a flowchart of the buffer read process according to the present invention.
  • FIGURE 31 is a diagram of a neural network for a tracking system according to the present invention.
  • FIGURE 32 is a diagram of a tracking module according to the present invention
  • FIGURE 33 is a flowchart of the tracking process in the sensor network according to the present invention
  • FIGURE 34 is a flowchart of the object identification process in the sensor network according to the present invention.
  • FIGURE 35 is a diagram of the tracking of separate objects in the sensor network according to the present invention.
  • a multimedia network system 10 includes an intelligent, integrated sensor network 12 having a variety of sensors including a set of Local Area Sensors 14 (LAS), a set of Middle Area Sensors (MAS) 16, and a set of Wide Area Sensors (WAS) 18.
  • Sensor network 12 is coupled to a sensor fusion defect module 20, a tracking module 22, and a compression module 24.
  • Compression module 24 includes a bit selective error correction module 26 and intelligent agent module 28.
  • Sensor network 12 is coupled to a bridge 30 over a communication line 32.
  • Bridge 30 includes a QoS Manager 34 and a Buffer Manager 36.
  • Bridge 30 connects a highly synchronous, real-time TCP/IP -packetized multimedia data stream on line 32 from a sensor network 12 to an asynchronous user network 38 (e.g., the Internet) over a communication line 40.
  • Buffer Manager 36 intelligently and interactively changes the buffering settings of the data buffers in response to the network and process conditions, thereby minimizing latency in the network while maintaining frame-to-frame synchronization of the data. Buffer Manager 36 further minimizes the use of memory while maintaining the asynchronous communication in network 38 and over lines 32 and 40. Similarly, QoS Manager 34 reduces latency in network 38 while maintaining the quality of the full-frame multimedia data stream.
  • low-level implementation of TCP/IP in the present invention includes modifying the transport layer by implementing a Buffer Manager API and a QoS Manager API (as opposed to the standard Microsoft™ Winsock APIs) to optimally configure and manage the NIC transmitter (encoder) and NIC receiver (decoder) buffers on the physical layer.
  • the present invention operates on the layers below the application layer (e.g., the transport and physical layers) to ensure synchronous transmission of data in an asynchronous network environment.
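  • As a rough illustration of the interactive buffering described above, the following sketch adapts a buffer depth to measured latency and dropped frames; the class name, thresholds and update policy are assumptions for illustration only, not the Buffer Manager API itself.

        # Hypothetical adaptive buffer manager: grows the buffer to absorb network
        # jitter and shrinks it when latency exceeds the target, in the spirit of
        # the Buffer Manager described above.  All names and values are assumed.
        class AdaptiveBufferManager:
            def __init__(self, min_frames=2, max_frames=30, target_latency_ms=100):
                self.depth = min_frames              # current buffering depth in frames
                self.min_frames = min_frames
                self.max_frames = max_frames
                self.target_latency_ms = target_latency_ms

            def update(self, measured_latency_ms, frames_dropped):
                """Adjust buffering depth from the latest network measurements."""
                if frames_dropped > 0 and self.depth < self.max_frames:
                    self.depth += 1                  # absorb jitter at the cost of delay
                elif measured_latency_ms > self.target_latency_ms and self.depth > self.min_frames:
                    self.depth -= 1                  # trade jitter margin for lower latency
                return self.depth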
  • bridge 30 provides full-frame, multimedia data to network 38 by synchronizing and integrating the synchronous multimedia data streams and the asynchronous network streams.
  • data that is transmitted on lines 32 and 40 also includes a variety of sensor data such as temporal data (video), spatial data (still imagery), audio data, sensor telemetry, infrared data, control data, messages, laser data, magnetic data, thermal data, seismic data, motion data, and chemical data.
  • data transmitted by sensor network 12 to Internet 38 may comprise one-dimensional audio data or one-dimensional sensor data, as opposed to the "streaming" video and audio that is also transmitted to Internet 38.
  • WAS 18 in network 12 is mounted on a mobile platform and configured to transmit data using wireless communication signals.
  • each of a plurality of sensors 42 in sensor network 12 is TCP/IP-addressed, thereby allowing a user at a remote location to connect to Internet server 38 and directly communicate with any of sensors 42.
  • sensors 42 are high-resolution cameras that can be placed along the U.S./Mexican border, and a government agent sitting in an office in Washington, D.C. can connect to Internet server 38 and select any of sensors 42 to view real-time video of activity being recorded by cameras 42.
  • FIGURE 7 illustrates the hierarchic, "molecular" sensor network architecture of sensor network 12.
  • Network 12 includes LAS sensors 14, MAS sensors 16 and WAS sensors 18, wherein a communication link 44 connects WAS sensors 18 together, a communication link 46 connects WAS sensors 18 to MAS sensors 16, a communication link 48 connects MAS sensors 16 to LAS sensors 14, and a communication link 54 connects WAS sensors 18 to a control module 56.
  • An emergency communication link 50 connects LAS sensors 14 to each other and an emergency communication link 52 connects MAS sensors 16 to each other.
  • communication links 44, 46, 48, 50 and 52 are wireless communication links and communication link 54 is a satellite relay to control module 56.
  • LAS sensors 14 and MAS sensors 16 each have omnidirectional antennae, thereby simplifying network reconfiguration and sensor relocation.
  • MAS sensors 16 and WAS sensors 18 include stand-alone, highly-distributed 8 billion operations per second (BOPS) supercomputer-class processing power for the reduction of intra- and inter-network communication bandwidth and information overflow, and for supporting redundancy and the self-healing capability of network 12.
  • Compression module 24 (FIGURE 5) includes specific graphic ICs that perform simple arithmetic operations on 256 parallel processors at 8 BOPS, performing application-specific integrated circuit (ASIC) operations.
  • All sensor data in sensor network 12, including video, is homogenized into TCP/IP -packetized streams, thereby significantly simplifying the real-time multimedia data transfer over network 38 (e.g., the Internet).
  • the homogenization, or fusing of sensor data is the joining together of data from a variety of different sensors (e.g., a sensor pair fusing of vision and GPS for autonomous navigation).
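  • A minimal sketch of this homogenization, assuming a simple length-prefixed JSON packet layout (the field names, host and port are hypothetical, not the packet format of the invention):

        import json
        import socket
        import struct
        import time

        def packetize(sensor_id, sensor_type, payload):
            """Wrap one reading of any sensor type in a common, self-describing packet."""
            body = json.dumps({
                "id": sensor_id,
                "type": sensor_type,            # e.g. "seismic", "magnetic", "video"
                "timestamp": time.time(),
                "data": payload,
            }).encode("utf-8")
            return struct.pack("!I", len(body)) + body   # 4-byte length prefix

        def send_homogenized(readings, host="control.example", port=5000):
            """Send heterogeneous readings over a single TCP/IP stream."""
            with socket.create_connection((host, port)) as sock:
                for sensor_id, sensor_type, payload in readings:
                    sock.sendall(packetize(sensor_id, sensor_type, payload))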
  • the sensor fusion of data is described in detail in the invited paper "Soft Computing and Soft Communication For Synchronized Data," the Society of Photo-Optical Instrumentation Engineers Proceedings, Vol. 3812, pp. 55-67 (1999), incorporated herein by reference.
  • sensor network 12 is a highly redundant system with a self-healing and configurable architecture that is extremely tolerant to jamming, multi-path errors, and sensor failure with electromagnetic interference immunity in a frequency hopping spread spectrum wireless or fiber sensor communication network.
  • Sensor network 12 operates in a nearly-autonomous mode based on specialized compression in compression module 24 implementing bit selective error correction 26 and using intelligent agent module 28. In effect, sensor network 12 is insensitive to the physical communication layer because of the multiple options for communication paths between any two nodes on network 12 and the choice of a data routing network protocol that is redundant.
  • sensor data is optimized for a non-TCP protocol (e.g., TCP/IP with elements of ATM packetizing, UDP, etc.).
  • the high level of intranetworking is based on compression module 24 in which all types of multimedia data (including TV-grade video) are compressed into packetized digital streams up to 4000:1 (VGA) and negotiated through a Gateway Software Agent 58 (FIGURE 8).
  • This intelligent, high compression ratio further allows data rates to be as low as a few kilobits per second - even for imagery.
  • bit selective error correction module 26 performs bit correction at the TCP level as well as further hierarchic lossless and lossy data compression.
  • the minimum latency (100 ms or lower) is based on a low transmission bandwidth and bit-selective error correction, thereby reducing buffering.
  • Sensor network 12 is autonomously adaptable and tolerant to sensor failure, insofar that network 12 relies on progressive accuracy and automatic sensor network reconfigurability to compensate for sensor failure.
  • Sensor fusion defect module 20 processes any "requests for help" from sensor network 12 in the near-autonomous mode. In the preferred embodiment of the present invention, sensor network 12 is autonomous 90% of the time and only at critical moments will sensor fusion defect module 20 alert an operator for assistance in the form of 3-D teleoperation at WAS sensor 18.
  • the "request for help" from sensor network 12 that is processed by sensor fusion defect module 20 is in response to contradictory data acquired from stationary and/or mobile sensors.
  • a GPS system transmits satellite data of visual landmarks from a vertical point of view and this data is translated to a horizontal view using different techniques such as template matching. If the satellite data contradicts data being transmitted in network 12 from other sensors relating to the same visual landmarks, sensor fusion defect module processes a "request for help" to resolve the contradictory information.
  • sensor network 12's 8 BOPS of processing power eliminates the power constraints of conventional small communication platforms, thereby making sensor network 12 data-centric instead of node-centric.
  • Sensor network 12 is only limited by the fact that each sensor has a limited (but overlapping) view of a scene or a limited (but overlapping) monitoring area. In this sense, each sensor in network 12 synthesizes a partial scene based on its own data and that of its two closest neighbors. These data overlaps create redundancy that, in turn, increases the robustness of sensor network 12.
  • intelligent sensor network 12 includes the following categories of sensors interconnected through various communication links:
  • Local Area Sensors (LAS) 14, or "point" sensors, such as magnetic, simple seismic, simple chemical, temperature, and wind sensors that detect locally (e.g., reading 1 m/s and 10 Hz vibration), etc.;
  • Middle Area Sensors (MAS) 16 such as voice, spectrometers, x-rays, and complex chemical sensors that are non-imaging sensors of characteristics such as intensity vs. wavelength; and
  • "2-D” or “Wide Area Sensors” (WAS) 18 such as video, forward-looking infrared, imaging radar, and complex seismic sensors that include 2-D imagery, 3-D volumetric renderings, as well as 4-D video (3-D spectral video sequences) and higher dimensionality (e.g., hyperspectral video).
  • sensor fusion of data from a large number of heterogeneous sensors 14, 16 and 18 in sensor network 12 is based on neuro-fuzzy processing.
  • Fuzzy rules are generally derived based on human experts' knowledge and observation. This process of deriving the fuzzy rules is labor intensive and it may be impossible to retain the accuracy of the fuzzy rules as the number of sensors increases and new types of sensors are added.
  • sensor network 12 is preferably a system based on neuro-fuzzy processing that relies on human experts approximating initial drafts of the fuzzy rules and then mapping the rules to the location and curve of the membership functions. The neural network in network 12 fine-tunes the location of the membership functions to ensure the optimal performance of the fuzzy rules.
  • training data is submitted to both the rule base and the neural network.
  • the neural network presents the initial fuzzy membership functions to the fuzzy rule base.
  • the fuzzy rule base uses these functions to generate an actual output that is compared with the desired output contained in the training data.
  • the training algorithm changes the neural network weights, thus adjusting the membership functions.
  • the new functions are then presented to the rule base, and the process repeats until the difference between the actual and the desired outputs is minimized. Additional sets of training data are iteratively applied until the final membership function parameters, and therefore the membership function shapes, converge to final values.
  • the operation of the fuzzy rule base in the present invention closely mimics the operation represented by the training data.
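  • The training loop described above can be pictured with the following sketch; the Gaussian membership functions, the learning-rate update and the stand-in rule base are assumptions for illustration, not the patented neuro-fuzzy implementation.

        import numpy as np

        def gaussian(x, centres, width):
            # membership value of input x in each fuzzy set
            return np.exp(-((x - centres) ** 2) / (2.0 * width ** 2))

        def train_membership(centres, width, rule_base, training_data,
                             lr=0.05, tol=1e-3, max_epochs=500):
            """Nudge membership-function centres until the fuzzy rule base
            reproduces the desired outputs in the training data.
            rule_base(memberships) -> actual output for one sample."""
            centres = np.array(centres, dtype=float)
            for _ in range(max_epochs):
                total_err = 0.0
                for x, desired in training_data:
                    memberships = gaussian(x, centres, width)
                    actual = rule_base(memberships)
                    err = desired - actual
                    total_err += abs(err)
                    # neural-style weight update: shift centres toward the input
                    # in proportion to their activation and the output error
                    centres += lr * err * memberships * (x - centres)
                if total_err / len(training_data) < tol:   # below minimum threshold
                    break
            return centres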
  • This level of compression opens up additional channels, thereby offering sensor network 12 additional communication channels for situations that require a rerouting of the data or additional bandwidth. For example, if one of sensors 14, 16 or 18 is broken, open channels provide redundant communication paths for the data to bypass the broken sensor.
  • the natural network hierarchy of sensor network 12 is derived from the fact that the hierarchic level associated with each LAS sensor 14, MAS sensor 16 and WAS sensor 18 defines its communication platform level. For example, LAS sensor 14 does not require digitization (or it can use low-bandwidth digitization), thereby transmitting analog data to MAS sensor 16 over standard analog RF channels that require a few cubic centimeters of hardware and draw around 0.1 W of power. These types of sensors are primarily in "sleep" mode and are activated by a "wakeup" signal from MAS sensor 16. In an alternative embodiment of the present invention, LAS sensor 14 includes sophisticated analog video.
  • Gateway Software Agents 58 are self-organized fuzzy controllers that operate with a million operations per second (MOPS) of processing power.
  • Gateway Software Agent 58 includes an analog-to-digital converter 60, a template matching block 62 coupled to a filter bank 64, a decision making block 66 and a communication interface (CI) 68.
  • communication interface 68 is a "Harris prism" or some other common CI.
  • template matching block 62 may be replaced with a similar target/signature recognition system.
  • Template matching block 62 receives digitized input data from converter 60 and digitally cross-correlates the sample signal with a set of filter signals from filter bank 64, thereby generating a correlation peak D.
  • In decision making block 66, the magnitude of D is compared with a predefined threshold value T. In the alternative, the magnitude of D is compared with multiple threshold values in an advanced fuzzy logic system. If the correlation peak D is greater than or equal to the threshold value T, then a positive decision is sent to communication interface 68 and transmitted to WAS sensor 18.
  • the output signal is organized as a simple fuzzy sentence logic such as "audio data indicates human voice making it highly probable that a person is passing no farther than 100 m from the border".
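  • The gateway decision in the preceding paragraphs can be sketched as follows; the filter-bank structure, the single threshold and the example sentence are assumptions for illustration, not the actual agent:

        import numpy as np

        def gateway_decision(sample, filter_bank, threshold, transmit):
            """sample: 1-D digitized sensor signal; filter_bank: dict name -> template.
            Cross-correlate against every template and forward a short fuzzy
            sentence only if the best correlation peak clears the threshold."""
            best_name, best_peak = None, 0.0
            for name, template in filter_bank.items():
                peak = np.max(np.correlate(sample, template, mode="valid"))
                if peak > best_peak:
                    best_name, best_peak = name, peak
            if best_peak >= threshold:
                # a few bytes of contextual meaning instead of kilobits of raw data
                transmit(f"{best_name} detected with correlation {best_peak:.2f}")
                return True
            return False    # data judged not meaningful; nothing is forwarded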
  • MAS sensor 16 also collects and digitizes analog signals from LAS sensors 14 and formulates the output in a fuzzy sentence as described above.
  • the molecular architecture of sensor network 12 makes information filtering and transfer hierarchic. Moreover, sensor information filtering is performed intelligently — meaning within the "contextual meaning" of the information, thereby significantly reducing bandwidth because a fuzzy sentence output requires only a few bits to transmit. In contrast, digitized input data requires at least a few kilobits of bandwidth.
  • WAS sensor 18 communication is very complex as compared to LAS sensor 14 and MAS sensor 16 processing.
  • WAS sensor 18 requires 8 BOPS of processing power or even higher because the full frames with Mbits or Gbits of data must be evaluated quickly enough to maintain real-time operation.
  • each WAS sensor 18 includes a processor with 8 BOPS implemented on a 3 inch x 2 inch x 0.5 inch printed circuit board.
  • the complexity of processing power required for WAS sensor 18 is derived from template matching in 2-D space and processing a full-image frame of approximately 7 Mb of data in a fraction of a second.
  • Each WAS sensor 18 collects data from MAS sensor 16 and transmits the data over communication links 54 to control module 56.
  • INTELLIGENT COMPRESSION Sensor network 12 reduces bandwidth using compression module 24 by implementing simple compression relying on redundancy or intelligent compression based on intelligent or software agents similar to Gateway Software Agent 58.
  • Standard MPEG compression is based on a simple subtraction operation in that the MPEG method includes a mother frame and daughter frames that represent a pixel-by-pixel difference between the mother frame and a subsequent frame.
  • other prior art compression standards also eliminate redundancy in a similar manner, but these methods do not filter out important information from the frames.
  • Performing a simple operation like subtraction at high speeds requires operating with parallel systems.
  • compression module 24 includes significant processing power to calculate a simple subtraction operation numerous times. This processing power combined with the simple subtraction processing results in network system 10 evaluating a full-frame of data within 3 frames latency (90ms). Therefore, any latency issues in network system 10 are primarily based on operating with parallel systems.
  • compression module 24 of sensor network 12 includes a different compression algorithm such as template matching.
  • Template matching is based on pattern recognition techniques that include template-by-template (or pixel-by-pixel) comparisons to stored data in a database.
  • Sensor network 12 is configured to implement template matching based on a simple comparison of data, similar to the simple subtraction in MPEG.
  • template matching is an extremely slow process, thereby motivating users to turn to Fourier processing.
  • Fourier processing is complex and not invariant to off-plane movement (rotation).
  • Gateway Software Agent 58 in FIGURE 8 implements template matching of the characteristics of multidimensional distributions. As explained above, Gateway Software Agent 58 is simply a "gateway" in that it only decides whether the data information is meaningful or not. If the information is meaningful, then agent 58 transmits the information to WAS sensor 18, otherwise the data is not transmitted.
  • Above Gateway Software Agent 58 is a Host Software Agent 70, illustrated in FIGURE 9, that collects, processes and transmits visual information in the form of video, imagery, radar imagery and other sensed data, in addition to data from MAS sensor 16. Host Software Agent 70 collects, processes and transmits this data using a concept called progressive accuracy — meaning that the visual information transmission is organized in such a way that only an "information sample" is transmitted to control module 56.
  • Host Software Agent 70 negotiates with control module 56 to determine which subset of more complete information should be transmitted. Even at a compression ratio of 4000:1, the original VGA bandwidth of 221 Mbps in sensor network 12 is not always reduced to a level that avoids nodal overload. Therefore, the implementation of progressive accuracy in sensor network 12 guards against nodal overload by selecting and transmitting a "first cut" of critical data, quickly followed by more detailed information as required by control module 56.
  • In the preferred embodiment of the present invention, data from Host Software Agent 70 to control module 56 is packetized in the transport layer using TCP/IP.
  • the data is packetized using a combination of TCP/IP and ATM.
  • ATM includes fast flow control, hard QoS (as opposed to QoS emulation) and high voice quality
  • ATM also has a fixed cell size, a significant amount of operation system interrupts and high computational overhead.
  • the preferred embodiment of packetizing using TCP/IP is based on variable-size packets, a 6:1 decrease in operation system interrupts as compared to ATM, cheap Ethernet, and efficient trunk packets.
  • standard implementation of TCP/IP includes very slow flow control and no QoS provisions. Therefore, in implementing TCP/IP in the present application, sensor network 12 also includes bridge 30 supporting QoS Manager 34 and Buffer Manager 36.
  • LAS sensors 14 are coupled to MAS sensors 16 via communication links 48. LAS sensors 14 are also coupled to each other via communication links 50.
  • MAS sensors 16 are coupled to WAS sensors 18 via communication links 46 and to each other via communication links 52.
  • MAS sensor 16 is also coupled to Gateway Software Agent 58.
  • WAS sensors 18 include a flash memory 72 and are coupled to Host Software Agent 70 and control module 56.
  • a visualization module 74, a graphic overlay module 76 and a memory 78 are coupled to Host Software Agent 70, and visualization module 74 is also coupled to control module 56.
  • Host Software Agent 70 also transmits intelligent, filtered, sensor information within the "contextual meaning" of the information, thereby reducing transmission bandwidth, in addition to applying progressive accuracy to transmit a "first cut" of the critical data.
  • sensor network 12 is configured to identify and transmit to control module 56 an initial sampling of critical information based on processing by network 12.
  • this approach models the way in which a person first recognizes the contours and edges of an object, followed by a more detailed analysis of the entire scene.
  • network 10 defines and isolates a window of opportunity around a moving object relying on 8 BOPS of processing power in network 10.
  • the object in the window is assigned the majority of the available bandwidth while the background is assigned a very small percentage (but not zero) of the available bandwidth.
  • a user at control module 56 watches the defined window with the target object and determines whether the particular image requires further analysis. If the user requires additional data, a video clip from network system 10 is sent to control module 56. The user views either the original compressed video clip or a still image that has been compressed at a lower compression ratio.
  • Bo is the uncompressed VGA frame (7.3Mbits)
  • CD is the average compression ratio of an MPEG stream
  • CI is the compression ratio of only the I-frames in this synchronized I-frame MPEG cycle
  • CW is the window compression
  • the frame bit volume must be 0.73 kb, which is equivalent to 10,000:1 compression. Obviously, this 10,000:1 compression ratio is a prohibitively high compression ratio for compression module 24.
  • compression module 24 defines a window of opportunity that is 1/64 of the total area (640 x 480).
  • the 1/64 window is compressed with a relatively low compression ratio (e.g., 1000:1) and the balance of the image is compressed with a high compression ratio (e.g., 11,600:1) just to approximate the background of the image.
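  • A quick check of this budget with the figures quoted above (7.3 Mbits per uncompressed VGA frame, a 1/64 window at 1000:1 and the background at 11,600:1) recovers the roughly 0.73 kb per frame, or about 10,000:1 overall, targeted earlier; the snippet below is only a worked calculation:

        frame_bits = 7.3e6                 # uncompressed VGA frame, about 7.3 Mbits
        window_fraction = 1 / 64           # window of opportunity around the target

        window_bits = frame_bits * window_fraction / 1000             # 1000:1 on the window
        background_bits = frame_bits * (1 - window_fraction) / 11600  # 11,600:1 elsewhere
        total_bits = window_bits + background_bits

        print(f"window {window_bits:.0f} b + background {background_bits:.0f} b "
              f"= {total_bits:.0f} b per frame "
              f"(overall ratio about {frame_bits / total_bits:.0f}:1)")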
  • a full VGA video clip is transmitted to control module 56.
  • still images are transmitted to control module 56 using one of the formats outlined below:
  • the still image includes high resolution details not likely to be included in the compressed video clip.
  • still images of video clips in the prior art network systems are not representative of meaningful data because the insertion of the I-frame occurs every 15 frames in accordance with MPEG standards independent of the context of the frames instead of in response to scene changes.
  • the still images from the video stream from sensor network 12 are meaningful because the I-frames are intelligently inserted as needed corresponding to the beginning of a scene, and therefore represent the full temporal event based on constantly monitoring an accumulated error that is derived from an actual error between corresponding blocks of a current frame and a predicted frame corresponding to the current frame.
  • sensor network 12 and compression module 24 interpret meaningful spatial data (still imagery) from the temporal data (video) without resorting to an entirely different image compression method for the spatial data (e.g., JPEG) that would require additional processing resources.
  • compression module 24 simultaneously compresses in real-time the video data and the imagery at different compression ratios. As discussed above, the imagery is compressed at a relatively low compression ratio to preserve the image quality as much as possible.
  • a user views either the video or the high resolution still image, extracted from the video stream in sensor network 12, or both.
  • the still image is meaningful because compression module 24 encodes the I-frames based on changes in the scenes. This encoding method results in a still image that represents the entire scene. Both the insertion of the meaningful I-frame and the integration of video and imagery are discussed in detail below. Additionally, U.S. Application Nos. 09/617,621 filed July 17, 2000, 09/136,624 filed August 19, 1998, and 08/901,832 filed July 28, 1997, incorporated herein by reference, fully disclose data compression relying on meaningful I-frames to reconstruct a scene from a video clip.
  • the temporal and spatial data are transmitted to control module 56 in real-time.
  • sensor network 12 stores the temporal and/or spatial data in flash memory 72, thereby providing the user with transmission data "off-line".
  • flash memory 72 can store approximately 1 minute of uncompressed data (40 fps), but implementing even 2000:1 compression results in being able to store 2000 minutes of data (approximately 33 hours of video).
  • the survivability of a network is defined as the ratio of the number of users a random user can communicate with previous to the failure to the number for a given component failure. Since sensor network 12 is highly redundant, the network survivability coefficient is close to unity. Sensor network 12 is self-healing insofar that emergency communication lines 50 between LAS sensors 14 and emergency communication lines 52 between MAS sensors 16 will replace communication lines 44, 46 and 48. In this regard, the network routing protocol is based on a "mapping" function or lookup table.
  • FIGURE 10 illustrates the transmitter flow of the self-healing process of sensor network 12.
  • a first step 80 data is acquired and packetized in a step 82.
  • Sensor network 12 then establishes a communication channel with a receiving node in a step 84.
  • a data channel is requested from the receiving node in a step 86 and a step 88 tests if an acknowledgment signal is received. If an acknowledgment signal is not received in step 88, the process continues to a step 90 to test if the process timed out. If the process is not timed out yet, control passes back to step 88 to continuously test whether the acknowledgment signal has been received. If, however, the process is timed out in step 90, another receiving node is selected in a step 92 and control passes back to step 84 to establish a connection with a receiving node.
  • sensor network 12 determines in a step 94 if there is sufficient bandwidth available for transmission of the data. If sufficient bandwidth is not available, control passes back to step 84 to establish a communication channel with a receiving node. On the other hand, if sufficient bandwidth is available, the data header is sent in a step 96 and the data is sent to the destination node in a step 98.
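  • The transmitter flow of FIGURE 10 can be summarized in the following sketch; the callback names and the timeout value are placeholders, not an API defined by the invention:

        import time

        def transmit(packetized_data, candidate_nodes, request_channel, ack_received,
                     bandwidth_ok, send_header, send_data, timeout_s=2.0):
            """Try each receiving node in turn: request a channel, wait for an
            acknowledgment with a timeout, check bandwidth, then send."""
            nodes = list(candidate_nodes)
            while nodes:
                node = nodes.pop(0)                 # step 92 picks another receiving node
                request_channel(node)               # steps 84/86: connect and request a channel
                deadline = time.time() + timeout_s
                while not ack_received(node):       # step 88: wait for acknowledgment
                    if time.time() > deadline:      # step 90: timed out, try another node
                        break
                else:
                    if bandwidth_ok(node, packetized_data):   # step 94
                        send_header(node)                     # step 96
                        send_data(node, packetized_data)      # step 98
                        return node
                    # in FIGURE 10 control returns to step 84; here we simply
                    # move on to the next candidate node
            return None    # no receiving node accepted the data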
  • FIGURE 11 illustrates the receiver flow of the self-healing sensor network 12.
  • the process tests if the data was requested in a step 102. The process does not proceed past step 102 until the data matches the requested data.
  • the communication bandwidth is checked for sending new data. Thereafter, the communication bandwidth limit is communicated to the requesting node in a step 106.
  • the process waits for the data header and in a step 110 the data is received and buffered. The received data is then multiplexed with the acquired sensor data into a transport stream in a step 111. Finally, in a step 112, the data is transmitted to the destination node.
  • Soft computing differs from conventional (hard) computing in that, unlike hard computing, it is tolerant of imperfections, uncertainty, and partial truth.
  • Soft computing attempts to solve problems that are inherently well solved by people, but not well suited for classical algorithmic solution.
  • the basic processing tools of soft computing include fuzzy logic, neural networks, genetic algorithms, and simulated annealing. Combining these tools or merging them into combinations such as fuzzy neural networks, genetically tuned neural networks, or fuzzified neural networks makes them more flexible and increases their efficiency.
  • These soft computing techniques are efficient in search and optimization problems, especially when the search space is large, multidimensional, and not fully characterized. Standard optimization and search techniques such as steepest descent and dynamic programming fail under the same conditions.
  • the preferred embodiment of the present invention relies on soft computing techniques (e.g., neuro-fuzzy processing) to solve the problem of fusing together the data from heterogeneous sensors 14, 16 and 18 in sensor network 12. Furthermore, the present invention relies on soft computing techniques (e.g., genetic algorithms) for the intelligent video and still image compression discussed below.
  • the decision whether to insert an I-frame is based on analyzing the errors between the I-frame and the B and P frames into which it will be inserted. The differences are transmitted to a decoder.
  • Motion estimation systems also "skip" a number of frames (intermediate frames) that can be readily estimated because they typically include relatively few motion changes from the previous frame.
  • the hardware for performing I-frame insertion is depicted in block format including a host computer 114 communicating with a video processor board 116 over a PCI bus 118.
  • Host computer 114 is preferably a 500 MHz Pentium® PC (Pentium® is a registered trademark of Intel Corporation of Santa Clara, California).
  • a PCI bus controller 120 controls communications over PCI bus 118.
  • a memory (EPROM) 122 stores and transfers the compression coefficients to PCI bus controller 120 so all of the internal registers of PCI bus controller 120 are set upon start-up.
  • An input video processor 124 is a standard input processor responsible for scaling and dropping pixels from a frame.
  • Input video processor 124 includes a standard composite NTSC signal input 126 and a high-resolution Y/C signal input 128 having separated luminance and chrominance signals to prevent contamination.
  • Input video processor 124 scales the 720x480 resolution of the NTSC input 126 to the standard MPEG-1 resolution of 352x240.
  • Video processor 124 further includes an A/D converter (not shown) to convert the input signals from analog to a digital output.
  • An audio input processor 130 includes a left input stereo signal 132 and a right input stereo signal 134 that are converted using an A/D converter (not shown).
  • the output from audio input processor 130 is input into a digital signal processor (DSP) audio compression chip 132.
  • the output from DSP 132 is input into PCI bus controller 120 that sends the compressed audio onto PCI bus 118 for communication with host computer 114.
  • The video data from input video processor 124 is transmitted to an Application Specific Integrated Circuit (ASIC) 134, a DCT-based compression chip 136 and a motion estimator chip 138.
  • ASIC 134 is responsible for signal transport, buffering and formatting of the video data from input video processor 124 and controls both DCT-based compression chip 136 and motion estimator chip 138.
  • The outputs of ASIC 134, compression chip 136 and motion estimator chip 138 are fed to PCI bus controller 120 for sending the compressed video on PCI bus 118 for communication with host computer 114.
  • the compressed video stream from video processor board 116 undergoes lossless compression in host computer 114 using standard lossless compression techniques including statistical encoding and run-length coding.
  • the video and audio are multiplexed using standard methods into a standard video signal.
  • the packets containing the video and audio are interleaved into a single bit stream with proper labeling for synchronization during playback.
  • the errors that are calculated in motion estimator 138 between the current frame and the predicted third subsequent frame are transmitted to host computer 114 over PCI bus 118 for transmission to the decoder (not shown) to recreate the current frame using that error or difference signal and the motion vectors generated during motion estimation.
  • the error accumulation in the motion estimation procedure includes reading the error buffer in the compression processor 136 through PCI bus 118 at a step 140. Thereafter, in a step 142, that error is accumulated in an error buffer created in software in host computer 114 so that the accumulated error will equal the preexisting error plus the present error. At a step 144, the accumulated error is compared to a threshold error. If the accumulated error is larger than the threshold error, then a new I-frame is sent in a step 146 and the error buffer in the compression processor 136 is not read again for that particular frame.
  • the process loops back up to a step 148 to choose the next subsequent microblock in that particular frame. If there is a subsequent microblock in that frame, then the process continues to step 140 to read the error buffer in compression processor 136. This error is accumulated in the accumulated buffer at step 142 and compared to the threshold at step 144. This iterative looping continues until the accumulated error exceeds the threshold, at which point it is no longer necessary to test any more microblocks for that particular frame because the accumulated error became so high that host computer 114 determined that a new I-frame should be sent to restart the motion sequence.
  • the standard MPEG compression process will continue without modifications (e.g., the next B or P frame will be grabbed and compressed).
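  • A compact sketch of the accumulated-error test of FIGURE 13 follows; block_error() stands in for reading the error buffer of the compression processor, and the names are illustrative only:

        def needs_new_iframe(current_blocks, predicted_blocks, threshold, block_error):
            """Return True as soon as the accumulated microblock error for the
            current frame exceeds the threshold (step 144), signalling that a
            meaningful I-frame should be inserted (step 146)."""
            accumulated = 0.0
            for cur, pred in zip(current_blocks, predicted_blocks):   # step 148
                accumulated += block_error(cur, pred)                 # steps 140/142
                if accumulated > threshold:
                    return True
            return False       # continue the standard MPEG B/P sequence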
  • stereoscopic data consists of multiple channels of input that typically include two sources looking at an object from two points of view. Clearly there is a significant amount of redundant information between the two sources.
  • Another common application of multi-channels is capturing video in a "look-around" environment where multiple cameras are utilized to look at a range of scenery or a designated object, with each camera accounting for one channel of data representing a particular view of the designated object or scene (e.g., from a variety of angles).
  • a single camera may be used to look at a spectral image whereby the signal obtained is divided into separate channels based upon narrow bandwidth windows using filters.
  • hundreds of channels can be realized within a few nanometers.
  • the image data in each such channel contains a tremendous amount of correlated data vis-a-vis adjacent channels, each channel corresponding to a slightly different bandwidth. It is very inefficient to transmit the full video content of each of these channels.
  • data captured by a single source at different times may have a significant amount of correlated data, as may be the case when using a video phone from a particular environment to send information over the Internet. For example, if the user transmits a video phone message over the Internet on a subsequent day from the same place as the previous day, much of the surrounding information will stay the same, and only certain aspects of the transmission will change. Due to the amount of similar data from each of the transmissions, it is inefficient to encode and transmit all the information contained in each message.
  • FIGURE 14 illustrates a system 152 for encoding multiple channels of video data according to the present invention.
  • System 152 includes an encoder 154 with a series of multiple inputs 156 for receiving video data signals S1, S2, . . ., SN from multiple sources or channels 158.
  • encoder 154 processes the video data signals input from channels 158 in groups comprising a predetermined number of frames from each channel.
  • Firmware 160 is preferably artificial intelligence (AI) fuzzy logic software that controls the encoding process, including determining when I-frames should be inserted.
  • The AI/fuzzy logic software achieves high throughput, and consequently higher resolution of the video signals.
  • Encoder 154 further includes compression software 162 for further compressing particular portions of the encoded video data in accordance with standard video compression techniques, such as MPEG intra-frame video data compression. This additional level of data compression enhances efficient use of available bandwidth without sacrificing video quality.
  • Intelligent and interactive buffer management of a buffer 164 in encoder 154 and a parallel buffer 166 in a decoder 168 forms a bridge between the highly synchronized real-time video stream and an asynchronous network 170.
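  • One way to picture this bridge between the synchronous encoder and the asynchronous network is the minimal circular-buffer sketch below; the capacity and drop-oldest policy are assumptions, not the buffer management scheme of FIGURES 25-30:

        from collections import deque

        class FrameBridge:
            """Encoder writes frames at a fixed rate; the asynchronous network
            drains them whenever bandwidth allows.  Oldest frames are overwritten
            on overflow so end-to-end latency stays bounded."""
            def __init__(self, capacity=32):
                self.buf = deque(maxlen=capacity)   # old frames drop off automatically

            def write(self, frame):                 # called synchronously, once per frame
                self.buf.append(frame)

            def read(self):                         # called whenever the network is ready
                return self.buf.popleft() if self.buf else None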
  • After encoding at least a portion of the video data from each channel 158, encoder 154 transmits the resultant signals, in an appropriate sequence, to receiver/decoder 168, which includes firmware 172 to recreate the video images. After reconstructing the video images of each channel 158 based on the encoded transmitted signals, decoder 168 transmits the decoded signals to a display unit 174.
  • the number of skipped frames is typically dependent upon the type of video being compressed, such that for a high-action video, wherein the differences between successive frames of each channel are relatively large, fewer frames should be skipped because there is a higher risk that significant data may be lost, which may compromise video quality.
  • Predicted "B" frames for each channel correspond to the number of skipped frames, the B frames being filler frames for the "skipped" frames.
  • An intra-frame difference exceeding a threshold can trigger an I-frame, as can an initial inter-frame difference or a predicted inter-frame difference.
  • Encoder/transmitter 154 of FIGURE 14 preferably uses parallel processing such that while earlier encoded channels of data are being transmitted, the subsequent channels of video data are being encoded. Furthermore, although the method has been specifically described as encoding and transmitting one channel of data at a time, the channels can be encoded/transmitted in pairs to increase throughput.
  • network system 10 homogenizes sensor data from a variety of sensors 178 that includes, for example, a first video camera 180, a second video camera 182, an infrared sensor 184, a seismic sensor 186 and an imaging radar 188.
  • the homogenized and packetized data is transmitted to a video encoder and a sensor data formatting module.
  • data from camera sensors 180 and 182 is transmitted over communication lines 194 and 196, respectively, to encoder 190.
  • data from IR sensor 184, seismic sensor 186 and imaging radar 188 is transmitted over communication lines 198, 200 and 202, respectively.
  • a compression processor 204 is an application-specific integrated circuit (ASIC) board (e.g., PCMCIA packaging drawing approximately 0.5 W) configured for supercomputer-grade 8 BOPS video processing per non-local sensor.
  • Encoder 190 further includes a buffer 206 configured to synchronize data over an asynchronous local area wireless network 208 to a buffer 210 in a decoder 212.
  • Encoded data from video encoder 190 and formatted data from formatting unit 192 is transmitted to local area wireless network 208 via a bus 214 and a bus 216, respectively.
  • Data from instrumentation controls 218 is also transferred via a data transfer unit 220 to local area wireless network 208 via a bus 222.
  • Asynchronous spread spectrum network 208 transmits a synchronous stream of interactive video data to a display 224 and an instrumentation module 226 via decoder 212 over a bus 228 and 230, respectively.
  • Local area wireless network 208 is also coupled to a remote network 232.
  • Processor 204 integrates video with high-quality still imagery, thereby providing a user with temporal data (e.g., video) and spatial data (e.g., still imagery) through separate channels on bus 228.
  • a user can display both the video on a screen 234 and/or the still images on a separate screen 236.
  • a simplified asymmetric network example of sensor network 12 includes a series of cameras 238 coupled to a series of processors 240 using a 64 kbps or 128 kbps channel including a high-speed trunk line (2 Mbps).
  • Each processor 240 is further coupled to a series of displays 242 and a server 244.
  • FIGURE 17 illustrates a video encoder 246 for a single channel example that includes an image compression ASIC 248 and a motion estimation ASIC 250 coupled to a pair of SD RAM (256x32) 252 and 254, respectively.
  • Composite and S-Video are coupled to video processor 256 via a series of low pass filters 258.
  • a buffer 260 is coupled to SD RAM 252.
  • Video processor 256 communicates with a PCI interface 262 via a bus 264.
  • an address decoder 266 communicates with PCI interface 262 via a bus 268.
  • a buffer 270, a buffer 272, a buffer 274 and a set of fractional T1 controllers 276 communicate over a PCI bus 278.
  • a data module 280 is coupled to buffer 270 through a data interface 282. Audio is transmitted to an A/D converter 284 coupled to buffer 272 through an analog device unit 286. Similarly, a SRAM 288 is coupled to buffer 272 through analog device 286.
  • a video logic control unit 290 also communicates over PCI bus 278. The decompressed data is transmitted from buffer 274 to T1 controller 276 through an MPEG data FIFO unit 292 down to a line interface unit 294 and out to a network 296.
  • a decoder 298 is illustrated in FIGURE 18.
  • Data from network 296 is transmitted through a line interface unit 299 to a network controller 300 coupled to a C51 controller 302.
  • the data from the spread spectrum wireless network is processed by a chip set including an MPEG video decoder 304 and a video processor 306.
  • Video overlay graphics 308 are superimposed on the video interconnection before it is displayed.
  • Controller 300 transmits the data to a data FIFO queue 310 and then through an MUX 312 to decoder 304.
  • a PCI bus 314 is coupled to a 16-bit register 316 and video processor 306.
  • Register 316 is also coupled to decoder 304 through MUX 312.
  • Decoder 304 transmits the audio data to an MPEG audio decoder 318 that is coupled to an audio D/A converter 320.
  • the RGB signal from processor 306 is transmitted to a video D/A converter 322 and then to a display 324.
  • Compression processor 204 is configured for approximate real-time processing, computing, compression and transmission of multimedia data.
  • In network 10 there is a strong correlation between software and hardware design, leading to minimized processing overhead, thereby maximizing transmission speed and efficiency.
  • the processing speed of compression processor 204 is maximized at the expense of computing generally.
  • the processing speed of compression processor 204 is equivalent to 100 Pentiums, while the actual computing performed by processor 204 is restricted to only simple arithmetic operations.
  • network 10 in general and compression processor 204 in particular provide image processing, editing, or even doctoring in real-time, thereby creating an illusion that video about natural events is transmitted while, in fact, these events can be doctored "on the fly.”
  • processor 204 due to imagery transmission in only approximate or statistical form (yet tolerable to the human eye) and, therefore, significant data reduction (or compression), processor 204 provides imagery/video processing/transmission with a minimum transmission delay (or latency).
  • processor 204 relies upon highly specialized chip-set integrated circuit (IC) electronics that minimize processing overhead and provide selected operations with supercomputer-type speed (e.g., speeds equivalent to 100 Pentium Is or 20 Pentium IIs).
  • compression processor 204 may be a compact 2" x 3" PC-board or a fully-upgradable PCMCIA-card with minimum power consumption ( ⁇ 1W) despite 8 BOPS (eight billion-operations-per-second) processing.
  • Compression processor 204 primarily processes synchronized temporal data including video events and spatial data including high-resolution still imagery.
  • Processor 204 also processes other multimedia data (e.g., audio sensor and data). Due to the high correlation (or high redundancy) of the data, the potential compression is also high. Therefore, in spite of the fact that original data rates of video, audio, and data are drastically different, the compression ratios are also very different, thereby resulting in comparable data rates for all three media as follows.
  • compression processor 204 transmits and stores both the temporal sequence of events and the precise spatial structure of each important scene that has been recorded in order to preserve any unanticipated critical event for further analysis.
  • video camera 178 may record all individuals crossing the border from Mexico into California on a given day of the week.
  • Government agents (e.g., FBI, DEA, etc.) may then review this recorded video.
  • an agent is alerted to the fact that a particular individual crossed the border illegally on a certain date.
  • the agent accesses the particular video from the specific date corresponding to camera 178 to determine if there is any evidence of the individual illegally crossing the border.
  • a practical solution that is feasible using compression processor 204 is to use image compression ASIC 248 to balance redundancies in the space-time domain, thereby allowing a user to view a past image on display 224 via a separate channel.
  • this particular still image is compressed at a lower ratio (e.g., 40:1) than the I-frames in the MPEG stream, so that the image contains enough data to adequately reconstruct the scene that the user is interested in.
  • This still image is meaningful because image compression ASIC 248 compressed the MPEG video stream by modifying the standard MPEG compression algorithm and inserting an I-frame as needed to correspond with a scene change in the video.
  • scene changes can result from a variety of conditions including a change of a video clip (e.g., a movie), sudden camera movement, sudden object movement, sudden noise, etc.
  • motion estimation is important to compression because many frames in full-motion video are temporally correlated (e.g., a moving object on a solid background such as an image of a moving car will have high similarity frame to frame).
  • Efficient compression can be achieved if each component or block of the current frame to be encoded is represented by its difference with the most similar component, called the predictor, in the previous frame and by a vector expressing the relative position of the two blocks from the current frame to the predicted frame. The original block can be reconstructed from the difference, the motion vector and the previous frame.
  • the frame to be compressed can be partitioned into microblocks which are processed individually.
  • Microblocks of pixels, for example 8x8, are selected and the search for the closest match in the previous frame is performed.
  • the mean absolute error is typically used because of the trade-off between complexity and efficiency.
  • the search for a match in the previous frame is performed in a, for example, 16x16 pixel window for an 8x8 reference or microblock.
  • a total of, for example, 81 candidate blocks may be compared for the closest match. Larger search windows are possible using larger blocks 8x32 or 16x16, where the search window is 15 pixels larger in each direction, leading to 256 candidate blocks and as many motion vectors to be compared for the closest match.
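  • The block search just described (an 8x8 microblock matched against 81 candidates inside a 16x16 window using the mean absolute error) can be sketched as follows; the function is illustrative only, not the motion estimator of the invention:

        import numpy as np

        def best_match(current, previous, top, left, block=8, search=4):
            """Return (dy, dx, mae) of the closest 8x8 match around (top, left);
            search=4 gives a 16x16 window and (2*4 + 1)**2 = 81 candidates."""
            ref = current[top:top + block, left:left + block].astype(float)
            best = (0, 0, float("inf"))
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    y, x = top + dy, left + dx
                    if y < 0 or x < 0 or y + block > previous.shape[0] or x + block > previous.shape[1]:
                        continue
                    cand = previous[y:y + block, x:x + block].astype(float)
                    mae = np.mean(np.abs(ref - cand))        # mean absolute error
                    if mae < best[2]:
                        best = (dy, dx, mae)
            return best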
  • the standard methods provide that a microblock in the current frame and the corresponding microblock in the predicted frame are compared and the error or difference between them is determined. This is done on a microblock-by-microblock basis until all of the microblocks in the current frame are compared to all of the microblocks in the predicted frame. In the standard process, these differences are sent to the decoder in real time to be used by the decoder to reconstruct the original block from the difference, the motion vector and the previous frame. The error information is not used in any other way in the prior art.
  • the error or difference calculated between microblocks in the current frame and the predicted frame is accumulated or stored, and each time an error is calculated in real-time between a microblock in the current frame and the corresponding microblock in the predicted frame, that error is added to the existing error for the frame.
  • This method is MPEG-compatible and results in high-quality video images.
  • the accumulated error is used by comparing it to a threshold that is preset depending upon the content or type of the video, such as action, documentary or nature. If the threshold for a particular current frame is exceeded by the accumulated error, this means that there is a significant change in the scene that warrants sending an entire new I-frame.
  • an I-frame is encoded.
  • E. COMPRESSING SPATIAL I-FRAME The temporal data displayed on screen 234 is transmitted through a first communication channel with a data throughput identical to that of the spatial data displayed on screen 236, which is transmitted through a second digital communication channel.
  • the temporal data transmitted through the first communication channel includes a highly compressed MPEG stream (approximately 4000:1), while the spatial data transmitted through the second communication channel includes compressed I-frames only that have been compressed at a ratio significantly lower than the MPEG stream in order to ensure high image quality.
  • the spatial signals are compressed using a low compression ratio, (CI)o.
  • the value of (CI)o is significantly lower than the value of (CI) wherein CI is the compression ratio of an I-frame in the synchronized I-frame MPEG cycle.
  • the average I-frame cycle contains N-number of frames, including a single I-frame (mother frame) and N-1 frames (daughter frames), with an average compression ratio of CD.
  • Compression processor 204 is configured to determine the following:
  • Compression processor 204 computes the average compression ratio of the N-frame synchronized cycle based on:
  • Equation (1) then reduces to:
  • CD is the compression of the data frame and CI is the compression ratio of an I-frame in the synchronized I-frame MPEG cycle.
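  • The expression itself did not survive in this text; one natural reading of the definitions above (one I-frame compressed at CI:1 plus N-1 daughter frames at CD:1) gives the harmonic-style average sketched below, offered only as an assumption consistent with those definitions:

        def cycle_compression(N, CI, CD):
            """Assumed average compression ratio of an N-frame cycle:
            total raw bits divided by total compressed bits."""
            return N / (1.0 / CI + (N - 1) / CD)

        # Purely illustrative numbers: N = 15, CI = 200, CD = 6000
        # give roughly 2000:1 for the whole cycle.
        print(round(cycle_compression(15, 200, 6000)))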
  • the average compression ratio of the N-frame synchronized cycle is illustrated for an original frame 326, a compressed I-Frame corresponding to a significant event 328, a compressed I-Frame (MPEG) 330, and a series of daughter frames (MPEG) 332.
  • Compression processor 204 further computes the following ratio:
  • compression processor 204 calculates the following ratio:
  • the above table presents equivalent values for the I-frame still imagery, given the compression-ratio coefficient as a function of the k coefficient.
  • this coefficient is the ratio of the compression ratio of the I-frame in the MPEG encoding using meaningful I-frame insertion (CI) to that for still imagery (CI)o with the same channel bandwidth.
  • the compression ratio of the I-frame (CI) can be high due to high stationary JPEG compression, reduction of pixel resolution, and reduction of color contrast.
  • Network 10 further includes fuzzy logic automatic control of the frame global error evaluation.
  • compression processor 204 computes the normalized global error (GE) based on:
  • N is the number of pixels in the frame (e.g., 640 x 480 for VGA standard), and di and dio are RGB-pixel gray levels for a given I-frame and reference frame.
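  • The formula is likewise missing from this text; a natural reading of the definitions (a per-pixel mean absolute difference between the given I-frame and the reference frame) is sketched below as an assumption only:

        import numpy as np

        def normalized_global_error(frame, reference):
            """frame, reference: arrays of RGB pixel gray levels with the same shape.
            Returns the mean absolute per-pixel difference, which is compared
            against the threshold T to decide whether a new I-frame is needed."""
            diff = np.abs(frame.astype(float) - reference.astype(float))
            return float(diff.sum() / frame.size)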
  • compression processor 204 further calculates:
  • If (GE) is less than the threshold T, compression processor 204 continues the I-frame cycle, whereas in FIGURE 22 the standard I-frame cycle of FIGURE 21 is discontinued if (GE) exceeds T and a new I-frame 334 is inserted.
  • Compression processor 204 statistically evaluates the global error (the difference between an I-frame and any other frames in the I-frame synchronized frame subset) or any block error within the I-frame synchronized frame subset.
  • the I-frame synchronized frame subset is fixed in length so that an I-frame 336 is inserted every 15 frames regardless of frame content.
  • the I-frame synchronized frame subset is variable in length (e.g., not always 15 frames as in state-of-the-art MPEG-1, but less or more than 15 depending on the duration of the scene).
  • compression processor 204 evaluates in real-time the global error of a given frame. If the error is too high, new I-frame 334 is generated, thereby starting a new I-frame synchronized frame subset 338. New I-frame insertion and consequent new motion estimation can be triggered not only by overall motion within the entire frame, but also by motion in a selected region of the frame such as a particular target.
  • the quality of the highly-compressed (4000:1) image using intelligent I-frame insertion may be even better than the quality of a standard MPEG image compressed at a relatively low compression ratio because of the elimination of artifacts 340 using intelligent I-frame insertion.
  • FIGURE 23 illustrates the frame cycle associated with the transmission of spatial data on the second channel.
  • (CI)₀ is the compression ratio of an I-frame 342 that is compressed at a lower ratio than the compression ratio (CI) of I-frame 334 in the MPEG frame cycle of FIGURE 22, which synchronizes the insertion of the I-frames with the scene changes.
  • compression processor 204 implements the soft computing algorithm to insert a meaningful I-frame at a scene change in a video stream, and includes processing power on the order of 8-10 BOPS (billions of operations per second).
  • network system 10 supports original, high-quality, full-motion color video (NTSC/VGA, 221 Mbps) at a compression ratio of 3946:1 (a ten-fold increase over state-of-the-art compression schemes).
  • a bandwidth of 56 kbps is supported with time-multiplexed control data.
  • a bandwidth of 112 kbps is supported for 3-D.
  • a bandwidth of just a few kbps is likewise supported for extremely low bandwidth applications similar to "cartooning".
  • a 6-times factor is obtained directly from the real-time frame evaluation of compression processor 204 when the I-frame is generated only if needed (when the actual accumulated error exceeds a particular threshold, thereby indicating movement corresponding to a change of scene).
  • these software agents are preferably applied by compression processor 204 to reduce bandwidth requirements associated with the compression of (1) the meaningful I-frame MPEG stream; and (2) the meaningful I-frames extracted from the MPEG stream in (1) and compressed at a lower ratio to preserve image quality.
  • intelligent processes or software agents are a species of Artificial Intelligence (AI) and include applications that assist computer users in using the tools available on the computer system and that exhibit human-like intelligence and behavior, such as robots, expert systems, voice recognition, and natural and foreign language processing.
  • Digitization of video transmission opens up a new area of applications that not only apply to compression into low-bandwidth communication, but also can restore some mature pattern recognition techniques, such as template matching.
  • the state-of-the-art compression systems have abandoned template matching because it cannot be performed in real-time without the processing power found in compression processor 204.
  • An analog of Fourier processing includes a cross-correlation operation comparing two identical-format frames with some translation or rotation about the center of gravity. If the two frames are identical, then for a particular translation and rotation there is an exact match; otherwise a match will not exist.
  • aij is the vector component.
  • the M cross-correlation analyses are of the form:
  • the outlined area in the above-identified table defines the total times of practical applicability for purely electronic real-time template matching implemented in network 10, simulating the cross-correlation operation.
  • This template matching is an example of an alternative embodiment of the present invention as applied to object recognition and targeting instead of video compression.
  • VGA 640x480, 30 fps, 24 bpp
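  • A brute-force sketch of the cross-correlation template match described above, limited to translations for brevity (rotation about the center of gravity would add one more loop); the shift range and the normalization are illustrative assumptions:

    import numpy as np

    def cross_correlation_match(frame_a, frame_b, max_shift=8):
        # Returns (peak score, dx, dy) for the best translational alignment.
        a = frame_a.astype(np.float64)
        a = (a - a.mean()) / (a.std() + 1e-9)
        best = (-np.inf, 0, 0)
        for dy in range(-max_shift, max_shift + 1):
            for dx in range(-max_shift, max_shift + 1):
                b = np.roll(frame_b.astype(np.float64), shift=(dy, dx), axis=(0, 1))
                b = (b - b.mean()) / (b.std() + 1e-9)
                score = (a * b).mean()   # normalized correlation at this offset
                if score > best[0]:
                    best = (score, dx, dy)
        return best

    An exact match between identical frames produces a correlation peak of 1.0 at the corresponding (dx, dy) offset.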
  • some platforms provide a maximum bandwidth of 16 kbps.
  • An automatic selection surveillance process control for temporal and spatial events through a single low-bandwidth channel is disclosed in FIGURE 24.
  • a user is given a choice in a step 346 whether to operate in autonomous mode. If the user selects not to operate autonomously, control passes to a step 348 for the user to select:
  • a communication channel is established in a step 350. If, however, the user selects autonomous operation at step 346, the default values for the compression parameters are used in a step 352 and two-stream connection of the temporal and spatial data is requested at a step 354.
  • the quality of the transmission is compared against a threshold based on the bit error rate (BER). In general, a 10⁻⁴ BER is the maximum acceptable error level for minimum artifacts with a compression ratio of 4000:1, as described above in connection with meaningful I-frame compression. If the transmission quality is not sufficient, additional error correction is applied at a step 358.
  • bit selection error correction module 26 applies bit-selective (BSEL) error reduction based on the observation that in the meaningful I-frame compression there are some bits (e.g., I-frame bits and synchronization control bits) that are more important than other bits.
  • control bits are added to correct the error. In this case, if one bit changes, the parity sum becomes odd and the error, in addition to the location of the error, is detected.
  • bit selection error correction module 26 applies an additional level of error correction on the transport layer.
  • error correction is performed on the packets prior to the packets being transmitted over the network.
  • the error correction applied by module 26 on the transport layer is in addition to the error correction that is performed on the physical layer inherent in the IP protocol.
  • the bits arrive at module 26 already having been subjected to the IP error correction in the physical layer.
  • Module 26 applies additional error correction to the significant bits to reduce the significant bits to a level of 10⁻⁵ BER, while the other bits are left at a level of 10⁻³ BER or reduced only to 10⁻⁴ BER.
  • in other words, the BER of the significant bits is reduced by two orders of magnitude, while the BER of the other bits is reduced by one order of magnitude or not reduced at all.
  • Error correction methods that are applied by module 26 include, as a simple example, traditional Hamming codes developed by Richard Hamming, which are based on algebraic ideas originating in the number-theoretic research of Gauss in 1801.
  • the method can also be applied to more complex codes such as the Reed-Solomon (RS) code, which can correct multiple errors and bursts of errors based on finite fields and associated polynomial rings.
  • Module 26 uses the following table to determine how much bandwidth is required and an optimal size of the MTU-internal packets to significantly reduce the BER by applying, for example, Hamming codes (one error plus its position).
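  • A minimal sketch of single-error correction ("one error plus position") using a Hamming (7,4) code applied only to the significant bits; the actual code length and parameters used by module 26 are not specified here, so this is illustrative only:

    def hamming74_encode(d):
        # d = [d1, d2, d3, d4]; returns the 7-bit codeword [p1, p2, d1, p3, d2, d3, d4].
        d1, d2, d3, d4 = d
        p1 = d1 ^ d2 ^ d4
        p2 = d1 ^ d3 ^ d4
        p3 = d2 ^ d3 ^ d4
        return [p1, p2, d1, p3, d2, d3, d4]

    def hamming74_decode(c):
        # The syndrome is the 1-based position of any single flipped bit;
        # correct it and return the 4 data bits.
        c = list(c)
        s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
        s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
        s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
        syndrome = s1 + 2 * s2 + 4 * s3
        if syndrome:
            c[syndrome - 1] ^= 1
        return [c[2], c[4], c[5], c[6]]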
  • module 26 applies bit-selective error correction to the data.
  • the corrected data is retested at step 356 until the BER meets the threshold. Thereafter, at a step 360, data is continuously transmitted until a termination step 362.
  • communication bridge 30 between sensor network 12 and network 38 includes QoS manager 34 and intelligent and interactive buffer manager 36.
  • bridge 30 is responsible for ensuring the delivery of the highly-synchronized multimedia streams despite the inherently asynchronous, "on-demand" nature of networks.
  • the fundamental challenge of transmitting real-time multimedia over the Internet is to create such a bridge to dynamically adjust timing and buffering network parameters to guarantee that the synchronized data as sent is synchronously received despite network latency (e.g., router and switch delays) and prioritization problems.
  • Communication bridge 30 in network system 10 creates modifications in many of the standard layers in the stack including:
  • TCP/IP sockets subprotocol is extended by incorporating intelligent and interactive buffering.
  • bridge 30 relies on Microsoft™ QoS emulation and an Application Program Interface (API) configured to implement a circular buffer that intelligently and interactively buffers multimedia data based on network conditions.
  • buffer sizes are statistically predicted based on standard models.
  • the maximum allowable packet size is configurable by the operating system.
  • the MTU is set at 1500 bytes (e.g., sent from encoder 190 to the receiver at 1500 bytes and decoder 212 receives in 2k packets) to minimize network latency.
  • transmission between encoder 190 and decoder 212 is controlled by network interface cards (NIC) (e.g., a PCMCIA wireless LAN card, IEEE 802.11) having encoder buffer 206 and decoder buffer 210.
  • Standard transmission rates are 64 kbps to 1.5Mbps at 29.97fps (standard for NTSC) with image size SIF(320x240) (standard MPEG).
  • buffer manager 36 also includes variable rate allocation to manage network bandwidth based on priorities of different users and processes.
  • each user is allocated 1 Mbps of bandwidth, which in implementation is actually closer to 700kbps.
  • typically, only about 70% of the allocated bandwidth is useable given errors, collisions between packets, and buffering and waiting time. If one user needs 1.6 Mbps but is only allocated 700 kbps, while a different user is allocated 700 kbps but is only using 300 kbps, buffer manager 36 "negotiates" the process to allocate the excess bandwidth from the second user to the first user, as sketched below.
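  • A sketch of that negotiation under the stated example (all names and the simple surplus-pooling rule are assumptions, not the manager's actual policy):

    def negotiate_bandwidth(allocated, requested):
        # allocated/requested are dicts of kbps per user.
        surplus = {u: allocated[u] - requested[u] for u in allocated if requested[u] < allocated[u]}
        deficit = {u: requested[u] - allocated[u] for u in allocated if requested[u] > allocated[u]}
        pool = sum(surplus.values())
        new_alloc = dict(allocated)
        for u in surplus:
            new_alloc[u] = requested[u]          # release unused bandwidth
        for u, need in deficit.items():
            grant = min(need, pool)              # hand the surplus to users that need it
            new_alloc[u] += grant
            pool -= grant
        return new_alloc

    # The example above: user A needs 1600 kbps but holds 700, user B holds 700 but uses 300.
    # negotiate_bandwidth({"A": 700, "B": 700}, {"A": 1600, "B": 300}) -> {"A": 1100, "B": 300}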
  • Global smoothing is a time-based method to determine optimal network bandwidth operating parameters. Statistics are gathered over a particular period of time (e.g., 1 month) on bandwidth usage correlated to time, and statistical optimization methods provide manager 36 with optimal operating bandwidth parameters. Similarly, using local smoothing, manager 36 creates a window of statistics based on a minimal time delay.
  • a circular (ring) buffer 364 is implemented in RAM memory (encoder buffer 206 and decoder buffer 210).
  • Buffer 364 is a permanently allocated buffer with a read and a write position. The read position will never be greater than the write position, since if it were then the program would be reading data that had not been written to the buffer. Likewise, the write position will never wrap the read position, otherwise the data that had not been read would be overwritten. Since buffer 364 has no "ending" element, it is important during iteration to avoid infinite loops. In the present invention, these infinite loops are avoided by maintaining a count of the number of list entries and controlling the loop by measuring the number of values scanned.
  • An alternative approach includes using a special, recognizable node in the linked list as a sentinel. The sentinel is never removed from the list. Iteration will check at each increment, and halt when the sentinel node is encountered.
  • Another variation includes maintaining links at each node that point both forward and backward. This arrangement results in a doubly-linked circular list that requires maintenance of two links in order to discover both the predecessor and the successor of any node, thereby making insertions and removals possible with only a pointer to a single node instead of the two pointers required when defining operations using iterators.
  • FIGURE 26 illustrates a different view of circular buffer 364 residing in a memory 366.
  • Buffer manager 36 in bridge 30 compensates for the asynchronous network 10 over which synchronous data streams are sent by implementing circular buffer 364 to accommodate latency and timing complications.
  • buffer manager 36 "tunes" the timing between a write process 368 and a read process 370 in memory 366.
  • Memory 366 is divided into 2k increments 372.
  • FIGURE 27 illustrates the encoding process including the flow from an input video block to a video encoder block 376 to a video MPEG compression block 378, and then to a buffer in main memory (DRAM) 380.
  • a network module block 382 includes a software driver to manage a circular network buffer 384 connected to a network block 386.
  • FIGURE 28 illustrates the decoding process including a network interface block 388 connected to a circular network buffer (DRAM) 390 that is connected to a decoding block 392 and then to a display 394.
  • FIGURE 29 illustrates the write implementation of circular buffer 364 including data from a block 396 sent to a block 398 to test if circular buffer 364 is available. If buffer 364 is not available, the process waits. As soon as buffer 364 is available, data is written to buffer 364 in a step 400 and the total size count of the buffer is incremented at a block 402. Thereafter, control passes back to start block 396.
  • the read implementation of circular buffer 364 includes testing if buffer 364 is available in a step 404. If buffer 364 is not available, the process waits. As soon as buffer 364 is available, data is read from buffer 364 in a step 406 and the corresponding memory is unallocated (released) in a step 408.
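  • A minimal single-machine sketch of the write/read discipline of circular buffer 364 (the actual buffer lives in NIC/DRAM memory and is driven by the network module; the capacity, locking, and return conventions here are assumptions):

    import threading

    class CircularBuffer:
        def __init__(self, capacity):
            self.data = [None] * capacity
            self.capacity = capacity
            self.read_pos = 0
            self.write_pos = 0
            self.count = 0                      # number of unread entries
            self.lock = threading.Lock()

        def write(self, item):
            # Corresponds to blocks 398-402: wait while full, write, increment the count.
            with self.lock:
                if self.count == self.capacity:
                    return False                # buffer not available; caller waits and retries
                self.data[self.write_pos] = item
                self.write_pos = (self.write_pos + 1) % self.capacity
                self.count += 1
                return True

        def read(self):
            # Corresponds to steps 404-408: wait while empty, read, release the slot.
            with self.lock:
                if self.count == 0:
                    return None                 # buffer not available; caller waits and retries
                item = self.data[self.read_pos]
                self.data[self.read_pos] = None
                self.read_pos = (self.read_pos + 1) % self.capacity
                self.count -= 1
                return item

    Because the entry count is tracked explicitly, the read position never overtakes the write position and unread data is never overwritten, matching the invariants described for buffer 364.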
  • manager 36 determines the operating parameters for buffer 364 that are forced from the server on each of the nodes.
  • Video on the Internet is transmitted asynchronously, so it arrives chopped up, with frames arriving earlier or later with a variable delay.
  • Buffer 364 is configured to maintain a constant stream, thereby maintaining a consistent timing of delivery of the frames in the stream.
  • each user on the network may have a different priority and a different video rate.
  • the parameters controlling distributed transmitter and receiver buffers 206 (encoder) and 210 (decoder) are modified by manager 36 to force a different transmission rate or priority on particular sensors (e.g., a camera) and dynamically adjust the width of buffers 206, 210.
  • manager 36 dynamically adjusts buffers 364 and transmission rates for a set of "priority" cameras so that network system 10 does not overload.
  • some sensors (cameras) located in key locations are assigned a higher priority and more network resources (transmission bandwidth) than sensors (cameras) that are located in less strategic locations.
  • the intelligent buffering is combined by manager 36 with QoS manager 34
  • QoS manager 34 engages in a negotiated process to maintain thresholds (e.g., certain cameras will not drop below a certain bandwidth like 512kbps). For example, transmission rates on priority cameras may be in the range of 512kbps - 1Mbps, and the non-priority cameras may have transmission rates in the 64 kbps - 512kbps range. If the network 10 is not loaded, then the non-priority cameras will transmit at full capacity (512kbps), and manager 36 "tunes" or optimizes buffer 364 to this 512kbps bandwidth.
  • buffer manager 36 adjusts the variety of buffering parameters to accommodate the load while maintaining synchronous transmission of the multimedia data stream based on optimization methods.
  • QoS manager 34 negotiates each process through network 10 by dynamically adjusting bandwidth, latency, jitter and loss parameters depending on the operating load of the network and the priority of the data.
  • the bandwidth parameter is the rate at which an application's traffic must be carried by the network.
  • the latency parameter is the delay that an application can tolerate in delivering a packet of data.
  • the jitter parameter is the variation in the latency and the loss is the percentage of lost data.
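  • The four parameters can be captured in a small container for negotiation; this is an illustrative sketch only (field names and units are assumptions, not an interface defined by the system):

    from dataclasses import dataclass

    @dataclass
    class QoSContract:
        bandwidth_kbps: float   # rate the network must carry for the application
        latency_ms: float       # tolerable delay in delivering a packet
        jitter_ms: float        # tolerable variation in the latency
        loss_pct: float         # tolerable percentage of lost data

    def satisfies(measured: "QoSContract", required: "QoSContract") -> bool:
        # True when the measured network behavior meets the negotiated contract.
        return (measured.bandwidth_kbps >= required.bandwidth_kbps
                and measured.latency_ms <= required.latency_ms
                and measured.jitter_ms <= required.jitter_ms
                and measured.loss_pct <= required.loss_pct)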
  • the network interface cards in encoder 190 and decoder 212 are physical connections to network 10.
  • the interface cards receive data from the PC, process the data into an appropriate format, and transmit the data over a cable to another LAN interface card.
  • the other card receives the data, translates the data into a form the PC understands, and transmits the data to the PC.
  • The role of the NICs in encoder 190 and decoder 212 is broken into eight tasks: host-to-card communications, buffering, frame formation, parallel-to-serial conversion, encoding and decoding, cable access, handshaking, and transmission and reception.
  • the first step in transmission is the communication between the personal computer and the network interface card.
  • host-to-card communication is implemented through programmed I/O, direct memory access (DMA), or shared memory.
  • NICs 190 and 212 also form frames - the basic units of transmission that include a header, data, and a trailer.
  • the header includes an alert to signal that the frame is on its way, the frame's source address, destination address, and clock information to synchronize transmission. Headers also include preamble bits used for various purposes, including setting up parameters for transmission, a control field to direct the frame through the network, a byte count, and a message type field.
  • the trailer contains error-checking information (the cyclical redundancy check (CRC)).
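  • A sketch of frame formation with a CRC trailer (field widths and the CRC-32 choice are assumptions; the NIC's actual wire format is not reproduced here):

    import binascii
    import struct

    def build_frame(source, destination, payload, seq):
        # Header: 6-byte source, 6-byte destination, byte count, sequence number.
        header = struct.pack("!6s6sHI", source, destination, len(payload), seq)
        crc = binascii.crc32(header + payload) & 0xFFFFFFFF
        return header + payload + struct.pack("!I", crc)     # trailer carries the CRC

    def check_frame(frame):
        # Recompute the CRC over header+data and compare with the trailer.
        body, trailer = frame[:-4], frame[-4:]
        return struct.unpack("!I", trailer)[0] == (binascii.crc32(body) & 0xFFFFFFFF)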
  • the receiving card answers with its parameters. Parameters include the maximum frame size, how many frames should be sent before an answer, timer values, how long to wait for an answer, and buffer sizes.
  • the card with the slower, smaller, less complicated parameters always wins because more sophisticated cards can "lower” themselves while less sophisticated cards are not capable of "raising” themselves.
  • QoS manager 34 (FIGURE 5) works in parallel with buffer manager 36 to optimize and dynamically adjust network parameters based on network load and prioritization. As priorities and transmission rates vary, a variety of parameters including frame resolution, frame rate, color depth, frame drop out frequency, and buffer width are adjusted to optimize the new bandwidth rate.
  • Manager 36 intelligently monitors the load of the network and dynamically calculates optimal operating parameters for buffers 364. This is a multidimensional optimization problem because each parameter is a dimension in the optimization space and the parameters are nonlinearly linked (e.g., reducing the frame rate does not necessarily mean that you are going to reduce the data rate by the same factor).
  • QoS manager 34 applies a function (e.g., a genetic algorithm) to find the global minimum of the multidimensional function. If the continuous functions are not well-defined, then a fuzzy neural network teaches the network to recognize the various cases.
  • QoS implementation in an ATM network is fundamentally different than an implementation of QoS in the Internet because ATM is connection-based so there is a pipe from the client to the server to guarantee the sending of a certain bandwidth.
  • connections are emulated.
  • QoS manager negotiates bandwidth based on prioritization on the network using QoS schemes (e.g., Microsoft QoS emulation) to give priority to real-time video transfer so that the server emulates packets and guarantees bandwidth.
  • system network 10 including sensor network 12 is based on integrating fuzzy logic with neural networks to automate the tasks of sensor data fusion and determining alarms.
  • One application of using the sensor fusion in sensor network 12 described above is in security systems having a number of camera sensors monitoring activity and transmitting surveillance data to control module 56 (e.g., tracking objects along the U.S./Mexican border and determining whether individuals are illegally crossing the border).
  • sensor fusion in network 12 includes the capability to adapt dynamically to changing external conditions, perform real-time tracking of objects using camera sensors and recognize objects based on intelligent neurofuzzy sensor control.
  • the merged multisensor data is converted into knowledge and used to interpret whether an alarm condition exists (e.g., illegal border crossings).
  • system network 10 has a low false alarm rate due to intelligent multisensor data fusion and includes remote activation, manual control of each sensor 14, 16 or 18, real-time remote sensor information retrieval, and on-line programming of system 10 through a graphical interface.
  • Some prior art security systems are based on neural networks that cannot adapt to sensor failures within a sensor suite and require extended training, a large amount of memory and powerful processors, thereby significantly increasing the complexity of the system. If the number of sensors is large, the capability to dynamically adapt to changing external conditions is particularly important.
  • such a system requires a large number of neural networks N1, N2, N3, ..., Nn (one for each sensor-failure configuration).
  • the preferred embodiment of the present invention avoids the computational overhead associated with this implementation of a neural network by implementing a fuzzy logic-based functionality evaluator that detects sensor failures in real-time and a fuzzy logic generator that fuzzes the large number of neural networks into a manageable number of groups. Additionally, an intelligent decision aid based on a neurofuzzy network automates and assists in deciding whether an alarm is true or false.
  • Sensor network 12 includes a functionality evaluator to determine the functionality of each sensor 14, 16, and 18 in real-time so that any malfunctioning sensors are eliminated from multisensor network 12, ignoring worthless or wrong data.
  • the evaluator is based on fuzzy logic and includes the following steps:
  • tracking sensor network 12 may include numerous red-sensitive CCD cameras as sensors. Five membership functions are used for the input gray scales (very high, high, medium, low and very low). Three membership functions are used to evaluate the output credibility of each CCD camera (high, medium and low). Based on the following fuzzy rules, the output for the input universe of discourse is generated. Using fuzzy logic, the outputs for all possible inputs are generated in advance, and the outputs are saved as a lookup table for use in real-time operation. IF (gray level is very high), THEN (credibility is low). IF (gray level is high), THEN (credibility is low). IF (gray level is medium), THEN (credibility is medium). IF (gray level is low), THEN (credibility is high).
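  • A sketch of how such a credibility lookup table can be precomputed (the triangular membership functions, the centroid values for low/medium/high credibility, and the fifth rule for "very low" gray levels are all assumptions, since only four rules are listed above):

    import numpy as np

    def triangular(x, a, b, c):
        # Triangular membership function with peak at b and feet at a and c.
        return np.maximum(np.minimum((x - a) / (b - a + 1e-9), (c - x) / (c - b + 1e-9)), 0.0)

    GRAY = np.arange(256)                      # 8-bit input gray levels
    MF_IN = {
        "very_low":  triangular(GRAY, -1, 0, 64),
        "low":       triangular(GRAY, 0, 64, 128),
        "medium":    triangular(GRAY, 64, 128, 192),
        "high":      triangular(GRAY, 128, 192, 255),
        "very_high": triangular(GRAY, 192, 255, 256),
    }
    # Rule consequents on a 0..1 credibility scale (low=0.2, medium=0.5, high=0.9).
    RULES = {"very_high": 0.2, "high": 0.2, "medium": 0.5, "low": 0.9, "very_low": 0.2}

    weights = np.vstack([MF_IN[k] for k in RULES])
    outputs = np.array([RULES[k] for k in RULES])
    # Weighted-average defuzzification, precomputed once for real-time lookup.
    CREDIBILITY_LUT = (weights.T @ outputs) / (weights.sum(axis=0) + 1e-9)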
  • the fuzzy weight generator automatically groups the alternative neural networks into clusters based on the correlation among the neural networks. Therefore, all of the neural networks belonging to a given cluster are fuzzed into a single fuzzy neural network, replacing the large number of neural networks otherwise required.
  • the large number of neural networks that correspond to each sensor failure situation are grouped into clusters based on the correlation among the neural networks. Therefore, all of the neural networks belonging to a given cluster are fuzzed into a fuzzy neural network, replacing the large number of neural networks otherwise required.
  • Unsupervised competitive learning is preferably implemented by sensor network 12 because the desired class labels among the available neural networks that correspond to each sensor failure situation are unknown. Given some input sample data, the output neurons of a competitive network compete amongst themselves for activation, and only one output neuron can win the competition at any time (e.g., "winner take all").
  • each fuzzy neural network is trained with its sample data using modified learning vector quantization.
  • the objective of this quantization is to find the set of reproduction vectors that represents an information source with the minimum expected "distortion".
  • the learning vector quantization is especially powerful in pattern recognition and signal processing applications.
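  • A minimal LVQ1-style sketch of that training step (the learning rate, epoch count, and winner-take-all update are standard choices, not parameters taken from the text):

    import numpy as np

    def lvq_train(samples, labels, prototypes, proto_labels, lr=0.05, epochs=20):
        # Move the winning reproduction vector toward same-class samples and away
        # from other-class samples, reducing the expected distortion.
        protos = np.asarray(prototypes, dtype=np.float64).copy()
        for _ in range(epochs):
            for x, y in zip(samples, labels):
                x = np.asarray(x, dtype=np.float64)
                i = np.argmin(np.linalg.norm(protos - x, axis=1))   # winner take all
                step = lr * (x - protos[i])
                protos[i] += step if proto_labels[i] == y else -step
            lr *= 0.95                                              # decay the learning rate
        return protos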
  • sensor network 12 that is configured to track a moving object also includes an intelligent decision aid 412 to process and integrate all of the information from heterogeneous sensors 14, 16 and 18 in a reasonable amount of time.
  • intelligent decision aid 412 integrates a neural network-like structure that adaptively generates fuzzy logic rules.
  • initial drafts of fuzzy rules are prepared by experts and mapped to the location and curve of membership functions to build a fuzzy rule base 414.
  • a neural network 416 tunes the location of the membership functions to optimize the performance of the fuzzy rules.
  • a set of training data 418 is submitted to both rule base 414 and neural network 416.
  • Neural network 416 presents the initial fuzzy membership functions to fuzzy rule base 414. Fuzzy rule base 414 then generates an actual output, which is compared with the desired output contained in training data 418.
  • the training algorithm changes the neural network weights, thereby adjusting the membership functions.
  • the new functions are then presented to rule base 414, and the process repeats until the difference between the actual and desired outputs is minimized. Additional sets of training data are iteratively applied until the final membership function parameters, and therefore the membership function shapes, converge to their final values. With membership functions defined in this manner, the operation of fuzzy rule base 414 closely mimics the operation represented by training data 418.
  • Video motion detection is based on analog or activity detection. Analog detectors respond to changes in the video output level from a camera, but any slight change in the video level often triggers a false alarm (e.g., blowing leaves or rain). Activity detection is a form of video motion detection using limited digital technology. Instead of detecting the change in video levels, activity detection discerns changes in individual pixels or cells of a video image. In contrast, sensor network 12 is a true digital motion detector insofar as network 12 determines the size, direction and number of objects within a particular environment.
  • a tracking system 420 based on sensor network 12 includes a motion detection sensor module 422, a wide field of view (FOV) camera 424 coupled to a wide FOV channel 426, a narrow FOV camera 428 coupled to a narrow FOV channel 430, and a transmitter 432 coupled to control module 56.
  • cameras 424 and 428 are digital cameras.
  • cameras 424 and 428 are analog cameras and tracking system 420 further includes the additional interface elements corresponding to analog cameras 424 and 428 (e.g., A/D converters, etc.).
  • Tracking system 420 includes two stationary "fish eye" cameras 424, each having a field of view of 180°, coupled to two narrow FOV cameras 428 that are capable of tracking multiple objects using time division multiplexing (e.g., dividing the tracking time of a single camera between multiple objects).
  • the polar coordinates of objects based on "fish eye” cameras 424 are converted to Cartesian coordinates by preprocessing the data provided to channel 426.
  • Each narrow FOV camera 428 has tilt/pan capabilities that are controlled via a RS232 connection using the Video System Control Architecture (VISCA) protocol (up, down, left, right, zoom in, zoom out).
  • the VISCA message packet has a packet header in the first byte containing the source address and the destination address and a terminator in the last byte (FF).
  • the maximum message is 14 bytes.
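  • A sketch of assembling such a message (the 0x80 | source<<4 | destination header layout follows common VISCA practice, and the example payload bytes are hypothetical rather than commands verified against a particular camera):

    def visca_packet(source, destination, payload):
        # Header byte carries the source and destination addresses; FF terminates.
        if len(payload) + 2 > 14:
            raise ValueError("payload too long for a single VISCA message")
        header = 0x80 | ((source & 0x7) << 4) | (destination & 0x7)
        return bytes([header]) + bytes(payload) + bytes([0xFF])

    # e.g. a controller (address 0) addressing camera 1 with a hypothetical 3-byte payload
    pkt = visca_packet(0, 1, [0x01, 0x06, 0x01])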
  • Tracking system 420 is in "sleep mode" while wide FOV cameras 424 remain active to detect any potential objects that may be of interest to control module 56.
  • Each of the cameras 424 and 428 is IP-addressed in sensor network 12, thereby allowing a user to interrogate each camera over user network 38 (e.g., the Internet). Control commands are also taken from users over network 38 to control sensors (cameras) 424 and 428.
  • a series of motion detectors are placed along the border, thereby allowing wide FOV cameras 424 to remain in a "sleep mode" along with narrow FOV cameras 428 until one of the motion detectors indicates a need for system 420 to "wake up".
  • a "bug eye” sensor replaces the motion detectors to alert wide FOV cameras 424 to "wake up” based on activity along the border.
  • the "bug eye” sensor is a directional sensor having numerous small, nonimaging, optical elements placed on the hemisphere of the sensor with a specific separation between each element.
  • the corresponding element is activated and tracking system 420 receives directional data from sensor (camera) 424.
  • This sensor is based on a nonimaging optic multiaperture compound eye architecture satisfying the Liouville theorem.
  • a series of motion detectors with wireless transmitters are distributed among narrow FOV cameras 428.
  • tracking system 420 "wakes up" and moves narrow FOV cameras 428 to the coordinates of the activated motion detector, thereby eliminating the use of wide FOV cameras 424 to determine the coordinates of a particular object.
  • in motion detection sensor module 422, an object in an input scene 434 is detected in a sensor block 436 based on a motion detector detecting movement of the object or a "bug eye" sensor detecting a change in intensity. If sensor 436 is a "bug eye" sensor, an object detection sensor 438 is triggered to indicate an object has been detected in input scene 434.
  • a neurofuzzy processor 440 detects the direction of the detected object, and the corresponding coordinates of the object are calculated in a block 442 and provided to channel 426.
  • Wide FOV camera 424 may include a data formatter 444 and/or a data digitalization block 446 depending on whether camera 424 is an analog or digital camera.
  • camera 424 is a digital camera.
  • the video from camera 424 is transmitted to a single frame buffer 448 coupled to a camera platform motion compensator 450.
  • the video is also fed to a variable delay block 452 and fed to a reference frame buffer 454 coupled to a similar camera platform motion compensator 456.
  • the current video frame stored in buffer 448 is compared to the reference video frame stored in buffer 454 in a frame subtraction unit 458.
  • Frame subtraction unit 458 determines any movement of any objects in input scene 434 based on a pixel-by-pixel subtraction of two consecutive frames.
  • the data indicating which pixels have changed state between the current frame and a reference frame is transmitted to object unit 460, which is configured to determine the number of blobs in the image.
  • a blob is a region of connected pixels that show similar properties, such as color and light intensity, and dissimilar properties from neighboring pixels. If two blobs are considered "connected" by unit 460, the two blobs are merged into a larger blob.
  • the similarity between pixels is calculated against predefined constants. Changing the values of these constants in unit 460 restricts or loosens the connectivity between the blobs.
  • a wide FOV camera such as camera 424 results in proportionally small objects.
  • the threshold parameters defining the difference between noise and objects (blobs) are also adjustable within tracking system 420.
  • Unit 460 performs standard segmentation on the blobs to form object(s) so that various rectangles are formed around groups of similar pixels and a set of rectangles typically creates an "object".
  • Tracking system 420 calculates a point representing the local center of gravity corresponding to each of the rectangular boxes, and then a global center of gravity based on the local center of gravity points to form a global centroid coordinate of what is then defined as the "object". If a single object is defined, narrow FOV cameras 428 are moved to the global centroid coordinate of the object.
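  • A sketch of the frame-subtraction and blob/centroid computation described above (the difference threshold, the minimum blob size used to reject noise, and the use of scipy's connected-component labelling are implementation assumptions):

    import numpy as np
    from scipy import ndimage

    def detect_objects(current, reference, diff_threshold=25, min_pixels=30):
        # Pixels whose gray level changed by more than diff_threshold form blobs;
        # small blobs are discarded as noise, and each surviving blob is reduced
        # to its bounding rectangle and local center of gravity.
        changed = np.abs(current.astype(int) - reference.astype(int)) > diff_threshold
        labels, n = ndimage.label(changed)
        objects = []
        for i in range(1, n + 1):
            ys, xs = np.nonzero(labels == i)
            if xs.size < min_pixels:
                continue
            objects.append({
                "bbox": (xs.min(), ys.min(), xs.max(), ys.max()),
                "centroid": (xs.mean(), ys.mean()),
            })
        return objects

    A global centroid can then be taken as the mean of the local centroids (optionally weighted by blob size) to give the single coordinate sent to narrow FOV cameras 428.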
  • both cameras 428 may be moved to the global centroid coordinate or each of the cameras may be moved to a portion of the image by essentially dividing the single object in half.
  • Unit 460 calculates whether multiple objects exist in input scene 434 and the corresponding coordinates based on the interconnectivity between the blobs. In order to split input scene 434 to allow narrow FOV cameras 428 to track multiple blobs, unit 458 calculates the distance between the two farthest points included in the segmentation, (x1, y1) and (x2, y2). Unit 460 also calculates the coordinates corresponding to the center of the line, (xc, yc), formed between the two farthest points included in the segmentation.
  • unit 460 then calculates the midpoints (x*1, y*1) and (x*2, y*2) between the center of the line and each of the respective end points [(xc, yc) and (x1, y1)] and [(xc, yc) and (x2, y2)].
  • the coordinates (x*1, y*1) and (x*2, y*2) are sent to each of the two narrow FOV cameras 428 to track the two objects, as sketched below.
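  • A sketch of that line-division rule (the point selection and the brute-force farthest-pair search are illustrative choices):

    import itertools
    import math

    def split_targets(points):
        # Find the two farthest changed points (x1, y1) and (x2, y2), take the
        # center (xc, yc) of the line between them, and return the midpoints
        # (x*1, y*1) and (x*2, y*2) of each half as the two camera targets.
        (x1, y1), (x2, y2) = max(itertools.combinations(points, 2),
                                 key=lambda pair: math.dist(pair[0], pair[1]))
        xc, yc = (x1 + x2) / 2.0, (y1 + y2) / 2.0
        return ((xc + x1) / 2.0, (yc + y1) / 2.0), ((xc + x2) / 2.0, (yc + y2) / 2.0)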
  • Unit 460 further iterates the division of input scene 434 based on either the line division algorithm described above, determining multiple coordinates based entirely on centroid analysis or using convex hull theory in computational geometry to find the smallest polygon that covers all of the areas of the blobs.
  • the convex hull method includes finding either the smallest polygon or, alternatively, fitting a circle around the pixels in a blob. In this case, the largest distance between any two points on the circle is calculated. The third point that defines the minimum-area circle is calculated by unit 460. The largest angle of the triangle formed by these three points is then calculated (i.e., the angle opposite the largest side of the triangle). Finally, the midpoint of the segment defined by this largest angle approximates the separation point for the regions governed by the two narrow FOV cameras 428. Assuming "fish eye" lenses are used in both wide FOV cameras 424, the polar coordinates (r, θ) are preprocessed prior to unit 458 into rectangular coordinates using the relationships x = r·cos(θ) and y = r·sin(θ).
  • unit 460 may also include a first object recognition filter for applying various computational and pattern recognition techniques and condition tables to determine whether any objects of any significance have entered the view of wide FOV camera 424.
  • the feature set defining an object includes object size, object direction, average speed of object, path of object, and location of object in relation to specialized areas.
  • Unit 460 compares the feature set parameters associated with each of the objects to a condition table to determine whether the particular object(s) are of any interest to tracking system 420. Assuming at least some of the objects are of interest to system 420, the coordinates of each object are fed from a block 462 to a zoom control block 464.
  • zoom control block 464 moves narrow FOV cameras 428 to the coordinates received from object coordinate unit 462 and receives high-resolution video of the particular object of interest.
  • tracking system 420 automatically determines in a second object recognition unit 466 whether the objects of interest are significant objects that should be tracked, based on a comparison of the objects' feature sets to a condition table. If unit 466 determines there are significant objects to be tracked, the video signal is forwarded to a video multiplexer 468 that is coupled to other narrow FOV cameras in a block 470 to divide the tracking time of individual tracking cameras 428 between objects. In particular, if there are more significant objects that must be tracked than there are narrow FOV cameras 428, video multiplexer unit 468 divides the amount of camera time spent tracking significant objects between multiple objects depending on how many cameras are available.
  • FIGURE 33 illustrates the autonomous operation of tracking system 420 in connection with the tracking example mentioned above. After system 420 is set up at a start block 480, all sensors 14, 16 and 18 are in a sleep mode 482.
  • Every sensor including motion detector 436 is tested in a block 484 and a block 486 determines if motion detector 436 has been activated. As long as motion detector 436 is not activated, tracking system 420 remains in “sleep” mode. However, as soon as motion detector 436 is activated, tracking system 420 switches from “sleep” mode into a "YELLOW” alarm mode in a block 488.
  • in an object block 490, two frames of the video are subtracted from one another to determine the pixels that have changed, and the blobs are segmented to form objects based on the centroids of the segments.
  • Noise is filtered out of the picture in a block 492 and a first-level object recognition is applied in a block 494 to the objects identified in block 490.
  • an alarm condition is set in a block 496 conditioned upon whether the object is recognized as a significant object that should be tracked. If block 496 determines the object is not significant, control passes back up to block 484 to wait for additional objects to pass in front of wide FOV camera 424. On the other hand, if an object is determined in block 496 to be significant, tracking system 420 switches from a YELLOW alarm to a RED alarm in a block 498.
  • tracking system 420 As tracking system 420 is iterating through each object to determine whether a RED alarm condition should be set, tracking system 420 also passes control from block 494 to an object coordinate unit 500 to determine the object coordinates of each of the objects identified in block 490.
  • the coordinates from block 500 are fed to a block 502 to move narrow FOV cameras 428 to the coordinates of each of the objects.
  • tracking system 420 performs a second object recognition on each of the objects to determine whether a significant object is detected.
  • a block 504 tests whether each of the objects is a significant object. If the particular object is not significant, control passes back to object recognition block 494 to continue to iteratively filter though each of the objects until no more objects are left to test.
  • tracking system 420 switches to the "RED" alarm in block 498 and narrow FOV cameras 428 in a block 506 track each of the significant objects by sending video of the objects in a block 508 to control module 56.
  • the video of the significant objects can be sent real-time to display 224 (FIGURE 15) in a block 510 or the video can be stored for future reference in a clip 512.
  • the video of the significant event can be sent to decision-making unit 514 for the user to take over manual override control of system 420.
  • Block 516 tests whether the significant object is out of the field of view. If the object is not out of the field of view, the signal continues to be sent at block 508. On the other hand, as soon as the significant object moves out of the field of view, the RED alarm is canceled in a block 518, the YELLOW alarm is canceled in a block 520, and control passes to the next sensor in a box 522 and then to block 484 to continuously check all of the sensors.
  • FIGURE 34 summarizes the object tracking algorithm of tracking system 420. In order to determine relative object movement between frames, image subtraction is performed on a pixel-by-pixel basis in a step 524.
  • Objects are defined in a block 526 and a first level object recognition is performed on the blobs from step 526 in a step 528. If the first level object recognition determines the object is not significant, decision block 530 passes control back to frame subtraction block 524. If, however, a significant object is going to be tracked, control passes from block 530 to a block 532 to calculate the center of gravity for each object.
  • a narrow FOV camera 428 is assigned to each of the objects in a step 534 and a second-level object recognition is applied in a step 536 to determine whether narrow FOV cameras 428 should track the objects in a decision block 538. If the objects are significant and tracked, video is transmitted back to control module 56 in a step 540. On the other hand, if the object is not significant, control passes from decision block 538 back to frame subtraction block 524.
  • FIGURE 35 illustrates a pair of blobs 542 and 544 including rectangular segments 548 with each segment including a center of gravity point 550.
  • Noise artifacts 552 are eliminated by tracking system 420 and narrow FOV cameras 428 can track blobs 542 and 544 based on a global center of gravity points 554 and 556, respectively.
  • Narrow FOV cameras may also track blobs 542 and 544 using points 558 and 560, respectively, based on taking a center point 562 between a first end 564 and a second end 566, and then dividing the segments formed between end points 564 and 566 and center point 562, respectively, in half to obtain points 558 and 560.
  • Narrow FOV cameras 428 track objects 542 and 544 based on the coordinates corresponding to points 558 and 560. Alternatively, depending on the number of cameras and the number of objects, cameras 428 can multiplex between multiple objects.
  • heterogeneous sensors 14, 16, 18 connected to one another in multimedia sensor network 12 transmit sensed data to control module 56 in the form of homogeneous TCP/IP packets through bridge 30 to user network 38 (e.g., the Internet).
  • the sensed data is fused together by implementing a neurofuzzy network relying on gateway software agents 58 and host software agents 70 to form fuzzy sentence outputs that require significantly less bandwidth than transmitting the unprocessed, sensed data from each sensor 14, 16, 18 directly to control module 56.
  • Compression module 24 coupled to sensor network 12 relies on bit selective error correction module 26 and intelligent agent module 28 to compress the sensed data into packetized digital streams and integrate the data through intelligent agent module 28.
  • Sensor network 12 processes and transmits the sensed data nearly autonomously until compression module 24 is unable to fuse together the sensed data because the data from a particular set of sensors contradicts the data from a different set of sensors. In this case, sensor fusion defect module 20 requests user intervention to resolve the contradictory data.
  • the homogenized data is transmitted through bridge 30 that maintains the synchronization between the highly-synchronous, sensed data (e.g., multimedia streams) and the inherent asynchronous nature of user network 38 (e.g., the Internet).
  • the ability to maintain the synchronous nature of the data despite the inconsistent network resources is based on QoS manager 34 that guarantees bandwidth for particular processes based on variations in network resources and buffer manager 36 that implements circular buffer 364 in NIC encoder buffer 206 and NIC decoder buffer 210.
  • buffer manager 36 dynamically and interactively adjusts buffer parameters (timing of read/write processes, buffer width, etc.) based on prioritization and network conditions that maximize resources. While the buffer is waiting for network resources to transmit data, the write process continues to write data into the buffer so that when network resources permit, data is transmitted and the read process catches up to the write process, thereby maintaining the synchronous transfer of data over asynchronous user network 38.
  • One application of network 10 implements sensor network 12 as a network that tracks moving objects.
  • Motion detection sensor module 422 "wakes" up wide field-of-view camera 424 when motion is detected in a particular monitoring area.
  • Wide field-of-view camera 424 transmits data to control module 56 that, based on a first level of object recognition, determines whether any of the identified objects in the monitoring area are significant. If any significant objects are detected, narrow field-of-view camera 428 similarly moves to the object's coordinates determined by wide field-of-view camera 424 and transmits additional data to control module 56.
  • real-time video of the object is transmitted from control module 56 to user network 38 over bridge 30 if the object is determined to be significant.
  • Narrow field-of-view camera 428 continues to track the significant object until the object is outside of the particular monitoring area.
  • a user can access either the real-time video of the object over user network 38 (e.g., the Internet) or real-time high quality still images of particular scenes.
  • the temporal and spatial data can also be stored in flash memory and viewed on an as- needed basis.
  • the compression that enables the transmission of the still images that are interpreted from the video is based on the meaningful insertion of an I-frame at each scene change in the video. Therefore, the I-frames contain data representative of an entire scene in a video because the I-frames are not simply inserted at predetermined intervals regardless of the content of the video (which is standard practice in prior art compression methods).
  • sensor network 12 can be implemented to fulfill a variety of business-to-business (B2B) and/or business-to- consumer (B2C) models (e.g., visual shopping, distance learning, etc.).
  • a camera could be mounted on a remote robot and a consumer shopping on Internet 38 would receive streaming video over lines 32 and 40 from sensor network 12.
  • sensor network 12 could support interactive video with the consumer on Internet 38.
  • sensor network 12 could support interactive video of a professor in a classroom teaching students over Internet 38. Therefore, the configurations shown and described are not limited to the precise details and conditions disclosed. Furthermore, other substitutions, modifications, changes and omissions may be made in the design, operating conditions and arrangement of exemplary embodiments without departing from the spirit of the invention as expressed in the appended claims.

Abstract

The present invention relates to a multimedia network (10) comprising a sensor network (12), a communication bridge (30), and a user network (e.g., the Internet). The sensor network comprises a set of interconnected sensors coupled to a control module (56). The control module receives a set of sensed data from the sensors and generates a homogenized data stream from the sensed data. The communication bridge is coupled to the sensor network and buffers the homogenized data stream. The user network is coupled to the communication bridge and receives the homogenized data stream from the sensor network. The user network transmits data back to the control module via the communication bridge.
PCT/US2001/031799 2000-10-16 2001-10-11 Reseau de capteurs multimedia WO2002033558A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2002213121A AU2002213121A1 (en) 2000-10-16 2001-10-11 Multimedia sensor network

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US69014900A 2000-10-16 2000-10-16
US09/690,149 2000-10-16

Publications (1)

Publication Number Publication Date
WO2002033558A1 true WO2002033558A1 (fr) 2002-04-25

Family

ID=24771291

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2001/031799 WO2002033558A1 (fr) 2000-10-16 2001-10-11 Reseau de capteurs multimedia

Country Status (3)

Country Link
AU (1) AU2002213121A1 (fr)
TW (1) TW569570B (fr)
WO (1) WO2002033558A1 (fr)

Cited By (29)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE10301457A1 (de) * 2003-01-10 2004-07-29 Vcs Video Communication Systems Ag Aufzeichnungsverfahren für Video-/Audiodaten
WO2005036853A1 (fr) * 2003-10-13 2005-04-21 Koninklijke Philips Electronics N.V. Reseau et element de reseau, et procede d'exploitation associe
CN100342410C (zh) * 2005-06-06 2007-10-10 重庆大学 无线生理信息传感器网络的时间同步方法与装置
WO2008039653A2 (fr) * 2006-09-27 2008-04-03 Schlumberger Canada Limited communication d'enregistreur et de capteur
EP1981243A1 (fr) 2007-04-13 2008-10-15 E-Senza Technologies GmbH Système de réseau de communication de données pour communication de données sans fil bidirectionnelle multicanaux
EP2009876A2 (fr) * 2007-06-29 2008-12-31 Honeywell International Inc. Systèmes et procédés de publication de données de capteur altérées sélectivement en temps réel
US7783930B2 (en) 2003-01-10 2010-08-24 Robert Bosch Gmbh Recording method for video/audio data
CN102932217A (zh) * 2012-11-20 2013-02-13 无锡成电科大科技发展有限公司 家庭物联网系统
US8391563B2 (en) 2010-05-25 2013-03-05 Sony Corporation Using computer video camera to detect earthquake
WO2015187832A1 (fr) * 2014-06-04 2015-12-10 Life Technologies Corporation Procédés, systèmes et supports lisibles par ordinateur pour la compression de données de séquençage
CN106170022A (zh) * 2016-08-31 2016-11-30 上海交通大学 一种分布式多媒体传感器控制系统
US9756570B1 (en) 2016-06-28 2017-09-05 Wipro Limited Method and a system for optimizing battery usage of an electronic device
US9846747B2 (en) 2014-01-08 2017-12-19 Tata Consultancy Services Limited System and method of data compression
CN107613413A (zh) * 2012-03-21 2018-01-19 鲍尔卡斯特公司 具有开关和插座控制的无线传感器系统、方法和装置
US10108462B2 (en) 2016-02-12 2018-10-23 Microsoft Technology Licensing, Llc Virtualizing sensors
CN108789456A (zh) * 2017-05-02 2018-11-13 北京米文动力科技有限公司 一种远程视频传输方法及系统
CN110536372A (zh) * 2019-07-17 2019-12-03 长春工业大学 一种基于模糊控制的环形无线传感器网络非均匀分簇算法
CN110780590A (zh) * 2018-07-27 2020-02-11 菲尼克斯电气公司 用于为机器的多通道控制提供安全控制参数的技术
WO2020146036A1 (fr) * 2019-01-13 2020-07-16 Strong Force Iot Portfolio 2016, Llc Procédés, systèmes, kits et appareils pour surveiller et gérer des réglages industriels
US10778547B2 (en) 2018-04-26 2020-09-15 At&T Intellectual Property I, L.P. System for determining a predicted buffer condition based on flow metrics and classifier rules generated in response to the creation of training data sets
CN112783671A (zh) * 2021-01-20 2021-05-11 中国兵器工业集团第二一四研究所苏州研发中心 一种适用于图像语音及数据传输的融合系统
CN113141380A (zh) * 2016-12-14 2021-07-20 微软技术许可有限责任公司 用于经模糊处理的媒体的编码优化
WO2021178145A1 (fr) * 2020-03-06 2021-09-10 Butlr Technologies, Inc. Surveillance de l'emplacement, de la trajectoire et du comportement humains à l'aide de données thermiques
CN113489952A (zh) * 2021-06-30 2021-10-08 电子科技大学 一种面向室内三维场景的视频监控设施布设方法
CN113485274A (zh) * 2021-07-28 2021-10-08 燕山大学 面向工艺过程的数据感知与动态优先级传输联合调度方法
CN115150559A (zh) * 2022-09-06 2022-10-04 国网天津市电力公司高压分公司 具有采集自调整计算补偿的遥视系统及计算补偿方法
CN115796249A (zh) * 2022-11-22 2023-03-14 辉羲智能科技(上海)有限公司 面向chiplet互连的神经网络芯片层切换映射方法
US11694072B2 (en) 2017-05-19 2023-07-04 Nvidia Corporation Machine learning technique for automatic modeling of multiple-valued outputs
US11774292B2 (en) 2020-03-06 2023-10-03 Butlr Technologies, Inc. Determining an object based on a fixture

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7561877B2 (en) 2005-03-18 2009-07-14 Qualcomm Incorporated Apparatus and methods for managing malfunctions on a wireless device
US8595504B2 (en) 2008-08-12 2013-11-26 Industrial Technology Research Institute Light weight authentication and secret retrieval
US10516415B2 (en) * 2018-02-09 2019-12-24 Kneron, Inc. Method of compressing convolution parameters, convolution operation chip and system

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3604556A (en) * 1970-01-14 1971-09-14 Louis E Schwartz Tape cassette holder
US5982418A (en) * 1996-04-22 1999-11-09 Sensormatic Electronics Corporation Distributed video data storage in video surveillance system
US5987519A (en) * 1996-09-20 1999-11-16 Georgia Tech Research Corporation Telemedicine system using voice video and data encapsulation and de-encapsulation for communicating medical information between central monitoring stations and remote patient monitoring stations
US5861804A (en) * 1997-07-10 1999-01-19 Bakson, Inc. Computer controlled security and surveillance system
US6271752B1 (en) * 1998-10-02 2001-08-07 Lucent Technologies, Inc. Intelligent multi-access system
US6310548B1 (en) * 2000-05-30 2001-10-30 Rs Group, Inc. Method and system for door alert

Cited By (46)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7813221B2 (en) 2002-11-22 2010-10-12 Westerngeco L.L.C. Sensor and recorder communication
US7783930B2 (en) 2003-01-10 2010-08-24 Robert Bosch Gmbh Recording method for video/audio data
DE10301457A1 (de) * 2003-01-10 2004-07-29 Vcs Video Communication Systems Ag Aufzeichnungsverfahren für Video-/Audiodaten
US8051336B2 (en) 2003-01-10 2011-11-01 Robert Bosch Gmbh Recording method for video/audio data
WO2005036853A1 (fr) * 2003-10-13 2005-04-21 Koninklijke Philips Electronics N.V. Reseau et element de reseau, et procede d'exploitation associe
CN100342410C (zh) * 2005-06-06 2007-10-10 重庆大学 无线生理信息传感器网络的时间同步方法与装置
WO2008039653A3 (fr) * 2006-09-27 2008-07-24 Schlumberger Ca Ltd communication d'enregistreur et de capteur
EP2631677A1 (fr) * 2006-09-27 2013-08-28 Geco Technology B.V. Communication d'enregistreur et de capteur
WO2008039653A2 (fr) * 2006-09-27 2008-04-03 Schlumberger Canada Limited communication d'enregistreur et de capteur
EP1981243A1 (fr) 2007-04-13 2008-10-15 E-Senza Technologies GmbH Système de réseau de communication de données pour communication de données sans fil bidirectionnelle multicanaux
EP2009876A3 (fr) * 2007-06-29 2009-03-11 Honeywell International Inc. Systèmes et procédés de publication de données de capteur altérées sélectivement en temps réel
EP2009876A2 (fr) * 2007-06-29 2008-12-31 Honeywell International Inc. Systèmes et procédés de publication de données de capteur altérées sélectivement en temps réel
US8391563B2 (en) 2010-05-25 2013-03-05 Sony Corporation Using computer video camera to detect earthquake
CN107613413B (zh) * 2012-03-21 2021-05-04 鲍尔卡斯特公司 具有开关和插座控制的无线传感器系统、方法和装置
CN107613413A (zh) * 2012-03-21 2018-01-19 鲍尔卡斯特公司 具有开关和插座控制的无线传感器系统、方法和装置
CN102932217A (zh) * 2012-11-20 2013-02-13 无锡成电科大科技发展有限公司 家庭物联网系统
US9846747B2 (en) 2014-01-08 2017-12-19 Tata Consultancy Services Limited System and method of data compression
WO2015187832A1 (fr) * 2014-06-04 2015-12-10 Life Technologies Corporation Procédés, systèmes et supports lisibles par ordinateur pour la compression de données de séquençage
US10254242B2 (en) 2014-06-04 2019-04-09 Life Technologies Corporation Methods, systems, and computer-readable media for compression of sequencing data
US10108462B2 (en) 2016-02-12 2018-10-23 Microsoft Technology Licensing, Llc Virtualizing sensors
US9756570B1 (en) 2016-06-28 2017-09-05 Wipro Limited Method and a system for optimizing battery usage of an electronic device
CN106170022B (zh) * 2016-08-31 2024-03-29 上海交通大学 一种分布式多媒体传感器控制系统
CN106170022A (zh) * 2016-08-31 2016-11-30 上海交通大学 一种分布式多媒体传感器控制系统
CN113141380B (zh) * 2016-12-14 2024-04-30 微软技术许可有限责任公司 用于经模糊处理的媒体的编码优化
CN113141380A (zh) * 2016-12-14 2021-07-20 微软技术许可有限责任公司 用于经模糊处理的媒体的编码优化
CN108789456A (zh) * 2017-05-02 2018-11-13 北京米文动力科技有限公司 一种远程视频传输方法及系统
US11694072B2 (en) 2017-05-19 2023-07-04 Nvidia Corporation Machine learning technique for automatic modeling of multiple-valued outputs
US10778547B2 (en) 2018-04-26 2020-09-15 At&T Intellectual Property I, L.P. System for determining a predicted buffer condition based on flow metrics and classifier rules generated in response to the creation of training data sets
CN110780590B (zh) * 2018-07-27 2023-06-30 菲尼克斯电气公司 用于为机器的多通道控制提供安全控制参数的技术
CN110780590A (zh) * 2018-07-27 2020-02-11 菲尼克斯电气公司 用于为机器的多通道控制提供安全控制参数的技术
WO2020146036A1 (fr) * 2019-01-13 2020-07-16 Strong Force Iot Portfolio 2016, Llc Procédés, systèmes, kits et appareils pour surveiller et gérer des réglages industriels
CN110536372A (zh) * 2019-07-17 2019-12-03 长春工业大学 一种基于模糊控制的环形无线传感器网络非均匀分簇算法
CN110536372B (zh) * 2019-07-17 2022-05-31 长春工业大学 一种基于模糊控制的环形无线传感器网络非均匀分簇方法
US11774292B2 (en) 2020-03-06 2023-10-03 Butlr Technologies, Inc. Determining an object based on a fixture
US11644363B2 (en) 2020-03-06 2023-05-09 Butlr Technologies, Inc. Thermal data analysis for determining location, trajectory and behavior
GB2612680A (en) * 2020-03-06 2023-05-10 Butlr Tech Inc Monitoring human location, trajectory and behavior using thermal data
WO2021178145A1 (fr) * 2020-03-06 2021-09-10 Butlr Technologies, Inc. Surveillance de l'emplacement, de la trajectoire et du comportement humains à l'aide de données thermiques
US11959805B2 (en) 2020-03-06 2024-04-16 Butlr Technologies, Inc. Thermal data analysis for determining location, trajectory and behavior
CN112783671B (zh) * 2021-01-20 2024-01-26 Suzhou R&D Center of the No. 214 Research Institute of China North Industries Group Fusion system suitable for image, voice and data transmission
CN112783671A (zh) * 2021-01-20 2021-05-11 Suzhou R&D Center of the No. 214 Research Institute of China North Industries Group Fusion system suitable for image, voice and data transmission
CN113489952A (zh) * 2021-06-30 2021-10-08 University of Electronic Science and Technology of China Video surveillance facility placement method for indoor three-dimensional scenes
CN113489952B (zh) * 2021-06-30 2022-03-22 University of Electronic Science and Technology of China Video surveillance facility placement method for indoor three-dimensional scenes
CN113485274B (zh) * 2021-07-28 2022-07-29 Yanshan University Process-oriented joint scheduling method for data sensing and dynamic-priority transmission
CN113485274A (zh) * 2021-07-28 2021-10-08 Yanshan University Process-oriented joint scheduling method for data sensing and dynamic-priority transmission
CN115150559A (zh) * 2022-09-06 2022-10-04 High Voltage Branch of State Grid Tianjin Electric Power Company Remote viewing system with self-adjusting acquisition computation compensation, and computation compensation method
CN115796249A (zh) * 2022-11-22 2023-03-14 Huixi Intelligent Technology (Shanghai) Co., Ltd. Chiplet-interconnect-oriented neural network chip layer switching and mapping method

Also Published As

Publication number Publication date
TW569570B (en) 2004-01-01
AU2002213121A1 (en) 2002-04-29

Similar Documents

Publication Publication Date Title
WO2002033558A1 (fr) Multimedia sensor network
Radha et al. Scalable internet video using MPEG-4
US5140417A (en) Fast packet transmission system of video data
Kishino et al. Variable bit-rate coding of video signals for ATM networks
US6680976B1 (en) Robust, reliable compression and packetization scheme for transmitting video
USRE39955E1 (en) Multiple encoder output buffer apparatus for differential coding of video information
US5758194A (en) Communication apparatus for handling networks with different transmission protocols by stripping or adding data to the data stream in the application layer
Verscheure et al. User-oriented QoS in packet video delivery
FR2851397A1 (fr) Method and device for analysing video sequences in a communication network
Jacobs et al. Providing video services over networks without quality of service guarantees
Eleftheriadis et al. Dynamic rate shaping of compressed digital video
Wakeman Packetized video—options for interaction between the user, the network and the codec
Dasen et al. An error tolerant, scalable video stream encoding and compression for mobile computing
Parthasarathy et al. Design of a transport coding scheme for high-quality video over ATM networks
Morrison et al. Two-layer video coding for ATM networks
WO1999005602A1 (fr) Robust, reliable compression and packetization scheme useful for transmitting video
Fankhauser et al. The WaveVideo system and network architecture: Design and implementation
Khilenko et al. Improving the Quality of Automated Vehicle Control Systems Using Video Compression Technologies for Networks with Unstable Bandwidth
Sharon et al. Modeling and control of VBR H.261 video transmission over frame relay networks
Jacobs et al. Adaptive video applications for non-QoS networks
Esteve et al. A flexible video streaming system for urban traffic control
JP5159084B2 (ja) Open system architecture for a surveillance system using effective bandwidth management
Kassler et al. Classification and evaluation of filters for wavelet coded videostreams
Sharon et al. Accurate modeling of H.261 VBR video sources for packet transmission studies
Sharifinejad The quality of service improvement for multimedia over high-speed networks

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 EP: The EPO has been informed by WIPO that EP was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (PCT application filed before 20040101)
REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 EP: PCT application non-entry in European phase
NENP Non-entry into the national phase

Ref country code: JP