MXPA99004778A - System and method for scheduling and processing image and sound data - Google Patents

System and method for scheduling and processing image and sound data

Info

Publication number
MXPA99004778A
MXPA99004778A MXPA/A/1999/004778A MX9904778A
Authority
MX
Mexico
Prior art keywords
data
temporal
time
temporary
image
Prior art date
Application number
MXPA/A/1999/004778A
Other languages
Spanish (es)
Inventor
Brandon Dalbert
Original Assignee
America Online Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by America Online Inc filed Critical America Online Inc
Publication of MXPA99004778A publication Critical patent/MXPA99004778A/en


Abstract

A system and method for creating and processing a unified stream of data that includes both temporal and non-temporal data, preferably in a compressed format. A scheduler (602) takes temporal data (e.g., sound data) and non-temporal data (e.g., image data) and interleaves them together to form a unified data stream (610). A processor is included that decompresses compressed image data and produces an output image from both decompressed image data and any uncompressed image data. The processor also plays the temporal data (604) while concurrently decompressing the compressed temporal data. Temporal data (604) may be in any format, including voice data and MIDI files, as well as image data, including videos and still images. A computer slide show may be formed by scheduling and playing video and sound data (including MIDI and voice), in which the video and sound data are interleaved into a unified data stream (610).

Description

SYSTEM AND METHOD FOR SCHEDULING AND PROCESSING IMAGE AND SOUND DATA

Background of the Invention

1. Field of the Invention

The present invention defines a system for scheduling and processing images and sounds that allows images and sound to be interleaved into a unified data stream and allows the unified data stream to be processed.

2. Summary and Background of the Invention

Multimedia is the science of converting both images and sound into appropriate data indicative of the images and sound, and transmitting that data through a channel to a final destination. As the bandwidth of the channel becomes more limited, it becomes more desirable to compress the images and sound further. Images and sounds are commonly transmitted via wireline transmission, such as through a telephone line using a modem. Compressing the information increases the amount of data that can be sent over this limited bandwidth channel. Various techniques for compressing sound and image data are known in the art. For example, image data may be stored as "GIF" or "JPEG" images, as is known in the art. Images may also be compressed using the techniques described in pending United States patent applications Serial Nos. 08/636,170; 08/545,513; and 08/275,945, all of which have been assigned to the assignee of the present invention. Sound compression is also well known. A particularly preferred form of sound compression uses a vocoder to compress voice sounds. Vocoder technology is well established. Music can also be transmitted in MIDI format. MIDI sounds are transmitted as a series of notes versus times, all notes being reproduced together in order to form the final sound.

The success of many compression techniques is based on the similarity between information at two different temporal instances. Compression techniques are based on the model that real-world information does not change much in the short term. Therefore, over time, many images will have more similarities than differences. Also, many animation sequences have more similarities than differences between subsequent frames. Sounds can also be compressed, because the short-term changes are extremely small. Sound can be further compressed by modeling it using various well-known techniques. An example of a sound compression technique is disclosed in pending U.S. Patent Application Serial No. 08/545,487, assigned to the assignee of the present invention.

In general, sounds and images must be compressed using vastly different technologies. Although it is possible to compress both sounds and images using the same technology, the resulting compression is not optimized for either. Still, it would be desirable for sound and image information to be sent jointly over a common channel. Accordingly, it is an object of the present invention to define special ways of processing sounds and images in order to facilitate sending them over a common channel. The present invention includes special processing techniques that make the processing of sounds and images more efficient.

In a first embodiment, the present invention is a sound transmission system that operates over a channel of limited bandwidth. The system includes a first element configured to receive a sequence of sounds to be encoded. The system also includes an analyzer element, which reviews a portion of the sound sequence and analyzes the sequence of sounds to determine a quantity of data that can be delivered by the limited bandwidth channel.
Finally, the system includes a calculating element that calculates a quantity of data that can be transmitted and determines any shortfall of data that could lead to a possible stall of the system.
The calculating element includes a system forecaster that can detect an impending stall; if a stall is detected, it can determine whether there is a break point in the sound sequence and can stop the sequence of sounds at that break point.

In another embodiment, the present invention is a method for processing image data and sound data in a unified data stream. The method includes the following steps: (a) interleaving image data and sound data to form a unified data stream, at least some of the image data being in a compressed image format and at least some of the sound data being in a compressed sound format; and (b) processing the unified data stream. In step (b), the image data in the compressed image format is decompressed and an output image is produced from the decompressed image data and any uncompressed image data, the sound data being reproduced simultaneously while it is being decompressed.

The details of the preferred embodiment of the present invention are set forth in the accompanying drawings and the description that follows. Once the details of the invention are known, numerous changes and additional innovations will be obvious to one skilled in the art.

Brief Description of the Drawings

These and other aspects of the invention will now be described in detail with reference to the accompanying drawings, wherein: Figure 1 shows a data format according to the present invention. Figures 2A and 2B are block diagrams of two embodiments of a player for the data format of Figure 1. Figure 3 is a block diagram showing the player of Figure 2 in greater detail. Figure 4 is a flow chart showing the process by which the player operates. Figure 5 is a flow chart showing the process by which a MIDI file is played. Figure 6 shows a scheduler that interleaves sound and image information into a single data stream according to the present invention. Figure 7 is a flow diagram showing how interleaved sound and image data are scheduled. Figure 8 is a block diagram of an integrated player system for a progressive slide show player in accordance with the present invention. Figure 9 shows a scheduler that interleaves the constituent data of a progressive slide show into a single data stream according to the present invention.
Fig. 10 is a flow chart showing the process by which the scheduler of Fig. 9 minimizes stalls in the progressive slide show. Like reference numbers and designations in the various drawings indicate like elements.

Detailed Description of the Invention

Throughout this description, the preferred embodiment and the examples shown should be considered as exemplary, rather than as limitations on the present invention. The preferred embodiment processes data from various sources and collects this data together in the special ways described herein. In accordance with the present invention, the voice data may take any encoded voice form. The voice data is encoded as a series of packets. The first portion of each packet indicates the packet length, typically between 1 and 256 bytes. In this embodiment, each packet represents approximately 240 ms of sound. A more accurate playing time can be determined from the voice coding within the packet. The first portion of the packet also includes additional information about the nature of the packet. Natural voice often includes natural breaks, such as pauses between sentences, within sentences, or even pauses within a word. The first portion also describes these natural break points in the coded voice information. As will be described later in greater detail, these break points provide a special advantage that is used in the present invention.

The system of the present invention looks ahead to determine how much data can be delivered by the limited bandwidth channel. There may be a point where the forecast indicates a possible stall of the system, i.e., there is not enough data to guarantee uninterrupted reproduction. Such a stall can produce a very unnatural sound. In the system of the present invention, if a stall is detected by the system forecaster, that stall causes a break at a natural break point. This minimizes the unnatural feel of the sound.

A final goal of this invention is to combine temporal data with non-temporal data in a unified data stream. The term "temporal data" refers to information that must be reproduced in an uninterrupted manner and includes data that must be delivered at a particular time, although not necessarily executed temporally. Examples of temporal data include sound, MIDI, video and some commands. The term "non-temporal data" refers to information that does not need to be reproduced in an uninterrupted manner, including image data and certain command data, such as commands that affect the rendering of an image.
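Returning to the voice packets described above, the following minimal Python sketch shows one way such packets and their natural break-point flags could be modeled. It is an illustration only: the class and field names (VoicePacket, break_point) and the one-byte length/flag header layout are assumptions made for this example and are not taken from the patent.

from dataclasses import dataclass

@dataclass
class VoicePacket:
    """Illustrative framing for one coded-voice packet (names are not from the patent)."""
    payload: bytes                    # coded voice, 1-256 bytes
    duration_ms: int = 240            # approximate playing time represented by the packet
    break_point: bool = False         # header flag: packet ends at a natural pause

    def header(self) -> bytes:
        # "first portion": a length byte plus a flags byte carrying the break-point bit
        assert 1 <= len(self.payload) <= 256
        return bytes([len(self.payload) - 1, 0x01 if self.break_point else 0x00])

    def to_bytes(self) -> bytes:
        return self.header() + self.payload

def next_break_point(packets, start=0):
    """Look ahead from `start` for the next packet flagged as a natural break."""
    for i, p in enumerate(packets[start:], start):
        if p.break_point:
            return i
    return None

# Example: three packets, the second ending at a pause between sentences.
stream = [VoicePacket(b"\x10" * 30), VoicePacket(b"\x11" * 28, break_point=True), VoicePacket(b"\x12" * 31)]
print(next_break_point(stream))   # -> 1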
For convenience, in some parts of this description the temporal data will be described strictly as sound data, and the non-temporal data will be described strictly as image data. However, it will be understood that temporal and non-temporal data are not limited in this way and should be given their broadest possible meaning. The image information is preferably obtained in a tagged run-length form. The tag includes a header that describes the length and type of the information. For example, the information may be compressed in any of several different ways, including Huffman, table-based, VQ, DCT or the like. The initial packet can also provide additional description of the information.

The sound and image data are each formed into segments. Each segment can contain many packets, which, as described above, can be from 1 to 256 bytes in length. An image segment is typically much larger than a sound segment. An image segment can vary between about 1 and 32 kb. Sound packets can be collected into a segment that can also vary between about 1 and 32 kb. However, it should be understood that the segment lengths noted herein are only exemplary and that segments of other lengths may be employed. In this invention, temporal data packets (e.g., sounds) are interleaved between non-temporal data packets (e.g., image data), as described in more detail below. This packer of combined image and sound data is referred to in this description as the "scheduler". The scheduler is also described in more detail below.

The resulting data format 100 is shown in Figure 1, which shows temporal (e.g., sound) data packets 102 interleaved between non-temporal (e.g., image) data segments 104. Each temporal data packet 102 includes a header portion 110, which may include information such as the "first portion" described above and preferably provides information about the type of temporal data contained in the packet (e.g., sound, MIDI, control data). Each temporal data packet 102 also includes data sections, which include sound data 172 separated by run-length bytes 180. Each non-temporal or image segment 104 also includes a header portion 120 and image data 122. Markers 130, 132 are placed at various times within the temporal data packets 102. In addition, a start header 150 may be used to set various parameters for the data format 100. The markers 130, 132 and the start header 150 will be described later in detail. It will be understood that the temporal data packets 102 are not limited to packets encoded as voice, but may also include video and musical sounds, either compressed or uncompressed. Also, any known technique can be used to incorporate the sounds, including MIDI, FM synthesis or any other technique, and the sounds can be reproduced by any known technique.
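As a rough illustration of the interleaved format of Figure 1, the Python containers below model a unified stream built from a start header, temporal packets with markers, and non-temporal image segments. The field names, sizes and marker representation are assumptions of this sketch only; the "max_backlog" and "restart_backlog" fields stand for the maximum delay ("maxbacklog") and restart delay discussed later.

from dataclasses import dataclass
from typing import List, Union

@dataclass
class StartHeader:            # start header 150: initial playback parameters
    pre_initial_delay: int    # data required before playback may begin
    max_backlog: int          # maximum delay ("maxbacklog")
    restart_backlog: int      # restart delay used after a stall

@dataclass
class TemporalPacket:         # packet 102: sound, MIDI or control data
    kind: str                 # "voice" | "midi" | "control"
    payload: bytes
    marker: bool = False      # marker 130/132 used later for baud-rate measurement

@dataclass
class ImageSegment:           # segment 104: compressed image data, roughly 1-32 kB
    payload: bytes

UnifiedItem = Union[TemporalPacket, ImageSegment]

def build_stream(header: StartHeader, items: List[UnifiedItem]) -> list:
    """Assemble the unified data stream 100: start header first, then the
    interleaved temporal packets and non-temporal segments in schedule order."""
    return [header, *items]

stream = build_stream(
    StartHeader(pre_initial_delay=4096, max_backlog=8192, restart_backlog=16384),
    [TemporalPacket("voice", b"v" * 64, marker=True),
     ImageSegment(b"i" * 2048),
     TemporalPacket("midi", b"m" * 32)],
)
print(len(stream))  # -> 4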
The operation of the player is described with reference to Figure 2A, which shows a pass-through system 200. A particularly important aspect of the technique of the present invention takes advantage of the way in which image decoding software has often operated. Most decoders of this type will decode only those images whose format the decoder recognizes. The decoder will discard any data that does not fit the criteria for the preferred data. The combined image and sound information 100 of the form shown in Figure 1 is passed to an image player 202. The image player 202 produces the output image 204 and ignores the sound data 206. The image player 202, however, is a pass-through player. Accordingly, the sound data 206 is passed to a dedicated sound player module 210. The dedicated sound player module 210 operates as described herein to reproduce the sound information. It will be recognized that the system of the present invention can be formed from several hardware modules (i.e., as a "multi-threaded" system) using lookup tables and other hard coding. The system is most preferably implemented in software, in which case all the coding modules would be incorporated as APIs in dynamic link libraries ("DLLs") in a multithreaded operating system. The system can also be embodied as a "single-threaded" system. A second embodiment of the player is a non-pass-through (or single-thread) system 220, as shown in Figure 2B. In this embodiment, the application is constructed to know which parts of the data stream 100 are sound, and which parts are not. The sound portions are sent directly to the player 222, which operates as described herein.

The preferred sound player 300 is shown in Figure 3. The compressed sound data 301 is given as input to a series of input buffers 302, 304, and 306. Each input buffer 302, 304, 306 stores a certain amount of sound data 301. A player logic element 310 controls the operation of all the input buffers 302, 304, 306 by controlling a multiplexing protocol to select data from the next buffer that needs to be emptied. Compressed data from the input buffers 302, 304, 306 is output by the player logic element 310 to a decoder 312. The decoder 312 applies the inverse of the coding system used to encode the data, to produce a pulse-code modulated ("PCM") sound data stream 315. It should be understood that the PCM sound data may be any type of audio or voice data. The output PCM sound data 315 is sent from the decoder 312 to a plurality of output buffers 322, 324, 326. The output buffers 322, 324, 326 respectively store the output PCM sound data 315. The playback operation is commanded by a global command module 330, which produces a playback command 332. A playback controller 335 determines which of the output buffers 322, 324, 326 is the next buffer to be processed. The information in that next output buffer is obtained and output as output PCM information 340. An additional format translation system 350 (e.g., a PCM sound card) converts the PCM information 340 to the desired format, e.g., an FM-synthesis sound card format, operator synthesis, or MIDI.

An important aspect of the operation of the sound player shown in Figure 3 is the ability to decompress while continuing its operation; that is, the sound player 300 decompresses data on the fly. The multiple-buffer structure of the sound player 300 allows the decoder 312 to decode the content of one input buffer and output that content to an empty output buffer. That output buffer can then be processed appropriately. Importantly, there is no need to decompress the entire file before playing the sound, because the information is decompressed on the fly. As part of this ability to decompress on the fly, moreover, the sound player 300 must have some intelligence. The player logic element 310 must determine whether it can continue to play based on the amount of sound data in the input buffers 302, 304, 306. The sound in the input buffers is limited by the amount of sound data that can be transmitted over a limited bandwidth channel. The operation continues by determining a relationship (a graph) between the amount of data and its playing time.
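As a rough illustration of the multiple-buffer, decode-on-the-fly arrangement just described (input buffers feeding a decoder that fills output buffers while playback drains them), the following Python sketch models the pipeline. The function names and the fake decode step are assumptions for the example, not part of the patent.

from collections import deque

def decode(packet: bytes) -> bytes:
    # Stand-in for decoder 312: pretend each compressed byte expands to two PCM bytes.
    return bytes(b for byte in packet for b in (byte, byte))

def sound_player(compressed_packets, low_water=2):
    """Decode on the fly: keep decoding into output buffers while playback
    consumes them, without waiting for the whole file."""
    input_buffers = deque(compressed_packets)   # buffers 302/304/306
    output_buffers = deque()                    # buffers 322/324/326
    played = bytearray()
    while input_buffers or output_buffers:
        # Decoder side: top up the output buffers whenever they run low.
        while input_buffers and len(output_buffers) < low_water:
            output_buffers.append(decode(input_buffers.popleft()))
        # Playback side: consume the oldest decoded buffer.
        if output_buffers:
            played.extend(output_buffers.popleft())
    return bytes(played)

pcm = sound_player([b"\x01\x02", b"\x03", b"\x04\x05"])
print(len(pcm))  # -> 10 PCM bytes produced from 5 compressed bytes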
A flow chart of the operation of the sound player 300 is shown in Figure 4. In step 401, the player logic element 310 examines the input buffers 302, 304, 306 to determine whether there is a predetermined amount of information in the buffers. This predetermined amount of information is labeled the "pre-initial delay". The pre-initial delay is a quantity of data that is set as sufficient to allow safe operation of the sound player 300. In this description, various timings according to the present invention are considered with reference to the time after the pre-initial delay has been filled. This pre-initial delay is used to provide a buffer after which playback may occur. An end-of-file ("EOF") indication will always start the operation of the sound player 300, even if the amount of data stored in the input buffer is not equal to the pre-initial delay. If the pre-initial delay or EOF is determined in step 401, the operation of the player defines time zero (T = 0), shown as step 402.

Step 404 analyzes the sound sequence in the input buffers 302, 304, 306, looking ahead for a break point in the sound data. If a break point is found, sound reproduction begins at step 406, and the sound is reproduced until it reaches the break point. If, on the other hand, no break point is found in step 404, the operation proceeds to step 408, which determines whether the amount of data in the input buffers 302, 304, 306 is greater than a maximum allowable delay according to the present invention ("maxbacklog"). If not, control returns to step 404 to look ahead for a break point in the sound data. If a break point is found in step 404, or the maximum delay is reached in step 408, the stored sound is reproduced in step 406. The reproduction continues until a break point is found. Step 410 determines a stall condition. The stall condition is caused by running out of data without a break point at which to interrupt. This condition causes the reproduced sound to stop at an unnatural location. The detection of a stall in step 410 indicates that the character of the data is such that a stall is likely to occur. This is handled by replacing the maximum delay with a restart delay value in step 415. The restart delay increases the amount of data that needs to be in the input buffer backlog, to make another stall less likely.

An important aspect of the present invention is the ability to measure the actual baud rate at which sound data is being transmitted. Referring to Figure 1, the markers 130, 132 are placed at various times within the sound packets 102. The distance between the markers 130, 132 represents a specific amount of time of sound data. The time at which each marker 130, 132 is received is noted. The time between marker receptions is determined, and the actual reproduction time between the markers is divided by that reception time to determine the baud rate for the sound data, even though image data has been mixed in with the sound data. The start header 150 can be used to set the original parameters, such as the original delay, the maximum delay and the restart delay. This allows these parameters to be encoded separately for each stream in the system. It also allows different types of data to have different delays.
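To make the Figure 4 flow concrete, here is a small hedged Python sketch of the start/continue decision and the marker-based rate estimate. The function names, byte thresholds and the scaling by a nominal sound data rate are assumptions of this example rather than details taken from the patent.

def playback_gate(buffered_bytes: int, eof: bool, break_point_ahead: bool,
                  backlog_limit: int, pre_initial_delay: int,
                  started: bool) -> bool:
    """Decide whether playback may (re)start, following the Figure 4 logic:
    wait for the pre-initial delay (or EOF), then play up to a break point,
    or play anyway once the backlog limit ('maxbacklog') is exceeded."""
    if not started:
        return buffered_bytes >= pre_initial_delay or eof
    return break_point_ahead or buffered_bytes > backlog_limit

def measure_baud(play_time_between_markers: float,
                 receive_time_between_markers: float,
                 bytes_per_second_of_sound: float) -> float:
    """Estimate the channel rate from markers 130/132: the reproduction time the
    markers span, divided by the wall-clock time between their receptions,
    scaled by the sound data rate (the scaling constant is an assumption)."""
    return (play_time_between_markers / receive_time_between_markers) * bytes_per_second_of_sound

# After a stall, the backlog limit is raised from the maximum delay to the restart delay.
max_backlog, restart_backlog = 8_192, 16_384
limit = max_backlog
stall_detected = True
if stall_detected:
    limit = restart_backlog

print(playback_gate(2_000, eof=False, break_point_ahead=False,
                    backlog_limit=limit, pre_initial_delay=4_096, started=False))  # -> False
print(measure_baud(0.96, 1.2, 3_000))  # -> 2400.0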
MIDI represents a preferred type of sound that is used in accordance with the present invention. MIDI files are widely used in the computer field. A Windows MIDI file includes a number of pieces of information, including the sounds and timings for each instrument. Each track of the MIDI file represents an instrument and includes a plurality of times and other information messages. In the prior art, because the sound is a combination of several MIDI tracks, it was necessary to obtain all the tracks and join them together before any of them could be reproduced. Each track can include many messages, each including a time and data. All of these pieces of MIDI file information must be correlated with each other to form the overall sound of the instruments. It is therefore standard in the field to receive the entire MIDI file and correlate it before anything is reproduced.

The present invention describes MIDI processing techniques that allow MIDI playback on the fly. A standard-format MIDI file according to the present invention is pre-processed by a special translator which translates the standard MIDI format into a special MIDI format. This special format groups together all the messages for a given time. Once all the messages for a specific time have been read, that portion of the MIDI file can be played back. This allows part of the MIDI file to be played before the entire MIDI file is received. Figure 5 shows the process for playing MIDI files on the fly. Step 500 gets the entire MIDI file, including all the MIDI messages described above. Step 502 sorts the MIDI file by time, so that the messages for a specific time are kept together. This sorted file is somewhat longer than a normal MIDI file, but it is ordered according to time. Because the ordering is by time, a specific time can be played before the entire MIDI file is received. Step 504 creates a sequenced stream of MIDI data. Then, in step 506, the special MIDI data stream is compressed, preferably in a format that allows the stream to be sent to the compressor. Step 508 produces compressed packets of MIDI data, which are interleaved with other data packets in the unified data stream.
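A minimal Python sketch of the Figure 5 pre-processing idea (flattening per-track messages and regrouping them by timestamp so that each time slice is self-contained) is shown below. The tuple layout and function name are illustrative assumptions, and real MIDI delta-time handling is omitted.

from itertools import groupby

def group_by_time(tracks):
    """tracks: list of tracks, each a list of (time, message) tuples.
    Returns a time-ordered list of (time, [messages]) groups, so that all
    messages for a given time arrive together and can be played immediately."""
    merged = sorted((t, msg) for track in tracks for (t, msg) in track)
    return [(t, [m for _, m in grp]) for t, grp in groupby(merged, key=lambda x: x[0])]

piano = [(0, "piano C4 on"), (480, "piano C4 off")]
drums = [(0, "kick"), (240, "snare"), (480, "kick")]
for time, messages in group_by_time([piano, drums]):
    print(time, messages)
# 0   ['kick', 'piano C4 on']
# 240 ['snare']
# 480 ['kick', 'piano C4 off']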
An important question is how to schedule this interleaved information into a unified stream of sound and image data. This is done by the improved scheduler of the present invention. Figure 6 is a block diagram of the preferred scheduler 602. It should be understood that the scheduler 602 is preferably formed of software modules, although it may instead be formed of hardware devices that perform the same functions. The scheduler 602 receives temporal information 604, i.e., information that it is important to reproduce in an uninterrupted manner, such as sound. The scheduler 602 also receives non-temporal information 606, such as image data. The scheduler 602 interleaves all this information together to create the scheduled information stream 610, as shown in Figure 6.

The scheduling operation is carried out as follows. First, in this example, the scheduler 602 decides that track 0 will be the voice. The scheduler 602 places speech segment 1 612 at time zero (t0) and MIDI segment 1 614 at time t0 + V0, where V0 is the download time 616. Note that the speech segment 1 612 and the MIDI segment 1 614 have playback times that are longer than their download times. Thus, the playback time 618 for the MIDI segment 1 614 is greater than its download time 617. In a similar manner, the speech segment 1 612 has a playback time 620 that is greater than its download time 616. The playback time defines when the next voice/MIDI value will be needed - at time 622. The playback time for audio segment 1 630 ends at time 624. The scheduler 602 places a delay free space 626 between the end of the download for the next audio segment 640 and the end 624 of the playback times 618, 620 for the audio segment 630. Accordingly, the scheduler 602 establishes a minimum start time 622 for the next segment 640. It was recognized, however, that such scheduling leaves open spaces between the audio segments 630, 640. Moreover, the minimum reproduction times are the last possible times at which the audio segments 630, 640 may be produced. The audio segments 630, 640 may be downloaded at any time before the minimum reproduction times, as necessary. The free spaces between audio segments are used to store image segments, such as I1 650. If there is a scheduling overlap between an image segment and an audio segment, the audio segment is moved to an earlier time in order to leave more space for the image segment. In addition, if necessary, the audio segments can be split at break points.

The overall operation of the scheduler 602 follows the flow chart of Figure 7. Step 700 places the information of highest priority on a first track ("track 0"). Step 702 plots the download times and playback times to determine the latest start time ("st") for the next audio segment, according to the following equation:

(1) st_n = pt_(n-1) - max gap - dt_n - max backlog_n

In equation (1), "n" is the current block, "n-1" is the previous block, "dt" is the download time and "pt" is the playback time; "gap" is the free space and "backlog" is the delay. All temporal information, i.e., all the information that needs to be reproduced without interruption, is placed at its latest possible location in step 703 to form an initial map. In step 704, the image information is fitted into the free spaces between adjacent audio segments. Step 706 determines any overlaps between any of the information. If any overlap is found in step 706, the audio segments are moved back in step 708 to avoid the overlap. Otherwise, the process starts again at step 700. After all overlaps are removed, step 710 determines whether any adjacent audio segments can be placed next to one another without any free space between them. If so, such adjacent audio segments are placed together in step 710 to save the header space that would otherwise be necessary for two separate segments. Step 710 also removes all dead space between audio and image elements by sliding the elements back, that is, scheduling the elements for an earlier download. The start of the scheduler 602 is taken as time zero (t0) once the original delay has been captured in some buffer. Recall, as before, that the original delay is a value that has been established, but can be reset for any information, as desired. The place-and-slide algorithm described above with respect to step 710 operates by determining the last possible point for the information and sliding it back in time to accommodate it. Various refinements are also contemplated, such as splitting at break points and other techniques.
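The following Python sketch is one possible reading of equation (1) and the place-and-slide idea of Figure 7: temporal segments are first pinned to their latest feasible start, image segments are fitted into the gaps, and everything is then slid earlier to remove dead space. All names, the use of seconds as units, and the simplified single-track model are assumptions of this illustration, not the patent's algorithm as such.

from dataclasses import dataclass

@dataclass
class Segment:
    name: str
    download_time: float   # dt: seconds needed to deliver the segment
    play_time: float       # pt: seconds of playback it provides (0 for images)

def latest_start(prev_play_end: float, seg: Segment,
                 max_gap: float = 0.0, max_backlog: float = 0.0) -> float:
    # Equation (1), as reconstructed: st_n = pt_(n-1) - max_gap - dt_n - max_backlog_n,
    # with pt_(n-1) taken here as the time at which the previous segment's playback ends.
    return prev_play_end - max_gap - seg.download_time - max_backlog

def schedule(audio, images):
    """Place audio at its latest possible download start, then slide everything
    earlier (in download order) so no dead space remains on the channel."""
    timeline, play_end = [], 0.0
    for seg in audio:                                   # step 703: latest-possible map
        start = max(0.0, latest_start(play_end, seg))
        timeline.append([start, seg])
        play_end = start + seg.download_time + seg.play_time
    for seg in images:                                  # step 704: fill gaps (simplified: append)
        timeline.append([timeline[-1][0] + timeline[-1][1].download_time, seg])
    timeline.sort(key=lambda e: e[0])
    cursor = 0.0                                        # step 710: slide back, remove dead space
    for entry in timeline:
        entry[0] = max(cursor, 0.0)
        cursor = entry[0] + entry[1].download_time
    return [(round(t, 2), s.name) for t, s in timeline]

audio = [Segment("voice1", 1.0, 4.0), Segment("midi1", 0.5, 4.0)]
images = [Segment("image1", 2.0, 0.0)]
print(schedule(audio, images))
# [(0.0, 'voice1'), (1.0, 'midi1'), (1.5, 'image1')]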
An exemplary integrated slide show player system 800 is shown in Figure 8. A data stream 802 of slide show information, preferably in a proprietary format referred to herein as the "art format", is given as input to a demultiplexer ("demux") 804. The demux 804 analyzes the data stream 802 to determine its component parts, which, as in the example of Figure 8, may include image data 806, MIDI data 808 and speech/voice data 810. A set of commands 812 is also split out of the data stream 802 and output to a control element 814 for processing. The control element 814 preferably uses the previously described techniques, including determining the delay and the reproduction time, as part of the reproduction sequence. The control element 814 uses the commands 812 to form an event table 816. The event table 816 has a format that includes a specific time and a list of the events that occur at that time. The contents of the event table 816 are used to generate processing commands 818 that are sent by the control element 814 to other elements in the player 800, as will be described later. The event table 816 is also used to form connections between selected elements, as will also be described later. The control element 814 also includes a feedback line 820 which is used to send programming commands back to the demux 804. Although the demux 804 has some built-in intelligence, allowing it to discern between various types of data and commands, it may not be able to carry out other functions with that built-in intelligence. For example, the demux 804 may be unable to split instances within an output data stream. By adding the feedback line 820, the demux 804 can be given added intelligence. In particular, the feedback line 820 allows the author of a slide show to program the demux 804, as desired, by inserting commands into the data stream 802 that the control element 814 then sends over the feedback line 820 to the demux 804 to control certain functionality of the demux 804.

The demux 804 sends the image data 806, MIDI data 808 and speech/voice data 810 to respective data processing chains 830, 840, 850. As shown in Figure 8, each data processing chain 830, 840, 850 may include multiple instance chains (i.e., may be multithreaded). For example, the image data processing chain 830 and the voice data processing chain 850 may each include multiple "asset" instance chains (Ia, Ib, etc., for the image data 806, and Va, Vb, etc., for the voice data 810). Each of the image asset instances controls an asset of the image data 806, and each of the voice asset instances controls an asset of the voice data 810. Similarly, the MIDI chain 840 has multiple "clip" instance chains (Ma, Mb, etc.), each of which controls a clip of the MIDI file. The author of the slide show can define a first image asset (Ia) to be the background of the image, a second image asset (Ib) to be a foreground element of the global image, and so on.

The first element in each of the data processing chains 830, 840, 850 is a buffer 832, 842, 852, respectively, each of which is preferably a software buffer. (For simplicity, the remainder of the description will be limited to the image chain 830, unless otherwise indicated.) The asset buffer 832 stores a predetermined amount of image information that forms a block of image information. When the (preferably compressed) data is stored, the event table 816 sends a start command 833 that initiates a decoder instance element ("DI") 834, forming a connection between the asset buffer 832 and the DI 834. The DI 834 receives the compressed image data from the asset buffer 832 and decompresses the data to output pixel data. After the DI 834 decodes the pixel data, that data can be stored in a decoded image buffer 835 and/or further processed. Once the pixel data comes out of the DI 834, the DI 834 can be released for another use.
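Purely as an illustration of the event table 816 driving instance creation and connections, here is a hedged Python sketch. The command vocabulary ("start_decoder", "image_process"), the dictionary layout and the dispatch loop are inventions of this example, not the patent's actual command set.

from collections import defaultdict

# Event table 816: specific time -> list of events occurring at that time.
event_table = defaultdict(list)
event_table[0.0].append({"cmd": "start_decoder", "asset": "Ia"})    # command 833
event_table[0.0].append({"cmd": "start_decoder", "asset": "Ma"})
event_table[2.5].append({"cmd": "image_process", "asset": "Ib", "op": "scale", "factor": 0.5})

connections = []   # links formed between asset buffers and decoder/processor instances

def dispatch(now: float):
    """Control element 814: fire every event whose time has arrived."""
    for t in sorted(k for k in event_table if k <= now):
        for ev in event_table.pop(t):
            if ev["cmd"] == "start_decoder":
                connections.append((f"buffer:{ev['asset']}", f"DI:{ev['asset']}"))
            elif ev["cmd"] == "image_process":
                connections.append((f"DI:{ev['asset']}", f"IMI:{ev['asset']}:{ev['op']}"))

dispatch(0.0)
dispatch(3.0)
print(connections)
# [('buffer:Ia', 'DI:Ia'), ('buffer:Ma', 'DI:Ma'), ('DI:Ib', 'IMI:Ib:scale')]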
An important aspect of the player 800 of the present invention is its ability to operate in a progressive presentation environment. This is achieved, for example, by setting the DI 834 to decode on the fly, i.e., while the asset buffer 832 is being filled with image data. In addition, the art format according to the present invention can operate by initially sending a low resolution (or "splash") version of the video image, which is followed by additional detail about the image. The integrated slide show player 800 of the present invention and the other systems described herein make it possible to first send a splash image in the data stream 802, render the splash, and later send more information about the image in the data stream 802. This can be done while other information, including the voice data 810 and MIDI data 808, is being sent simultaneously. In addition, the progressive display capability of the player 800 allows an observer to skip forward, pause, etc., even though the player 800 has not received all of the data stream 802. In such a case, for example, if the observer has paused at time t = x within the slide show, the image will continue to be rendered as time progresses. This may occur at any given point within the slide show, regardless of the amount of data 802 received by the player 800.

An image processor instance ("IMI") 836 may also be initiated by the event table 816, which causes the control element 814 to send an IMI command 837 to the IMI 836. Multithreaded IMIs 836 are used to make various changes to the pixel data, such as moving the position of the pixel data, tilting the pixel data, or changing aspects (for example, color or size) of the pixel data. Each IMI processes its respective assets of the global image to be rendered. Depending on how the slide show is authored, different image assets can be rendered at different points in time in the slide show. Moreover, assets can be added on a piecemeal basis, creating a "montage" effect. Alternatively, or in addition, the resolution of the animated images can be improved upon the arrival of more image data. Thus, in accordance with the present invention, the author of the slide show may have complete control over the display of any portion of the animated images, including the timing at which the various image assets are rendered and the quality (or resolution) of their rendering. After the IMI 836 completes its operations, the IMI 836 can be removed from the processing chain to release its resources for use by other units. The image data output of all the IMIs 836 (i.e., Ia, Ib, Ic, etc.) is sent to a master renderer 838 and/or to another buffer (not shown), which sends the rendered and composited image to a screen. For the MIDI and voice data, the IMIs 846, 856 output the data to a master player 848, 858, which can receive commands 849, 859 from the control element 814 to control the playback of the sound and MIDI clips. The multithreaded environment means that many of these instances and processes may be running concurrently. The number of processes running simultaneously is limited only by system resources.
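As a toy illustration of IMIs transforming decoded pixel data and a master renderer compositing the assets, consider the sketch below. Representing pixels as nested lists and supporting only a couple of operations are simplifications chosen for this example; none of the function names come from the patent.

def scale_nearest(pixels, factor):
    """IMI-style operation: nearest-neighbour resize of a 2-D list of pixel values."""
    h, w = len(pixels), len(pixels[0])
    nh, nw = max(1, int(h * factor)), max(1, int(w * factor))
    return [[pixels[int(y / factor)][int(x / factor)] for x in range(nw)] for y in range(nh)]

def composite(canvas, asset, x, y):
    """Master renderer 838: place an asset's pixels onto the canvas at (x, y)."""
    for row_idx, row in enumerate(asset):
        for col_idx, value in enumerate(row):
            if 0 <= y + row_idx < len(canvas) and 0 <= x + col_idx < len(canvas[0]):
                canvas[y + row_idx][x + col_idx] = value
    return canvas

background = [[0] * 8 for _ in range(4)]          # asset Ia: decoded background
foreground = scale_nearest([[7, 7], [7, 7]], 2)   # asset Ib after an IMI scaling pass
screen = composite(background, foreground, x=2, y=0)
for row in screen:
    print(row)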
Importantly, the same image can remain in the decoded image buffer 835 even after the DI 834 and/or the IMI 836 have completed their operations. As a result, that same data can be re-processed by a new DI and image-processed in a different way. DIs 834 that are not being used, however, are removed to free system resources. As described earlier, the commands 833, 837 create links between the various processors. Although not shown, an asset buffer 832 may be coupled to any one or a plurality of non-temporal instances 834. Accordingly, for example, a command may create a link between the DI 834 in asset chain Ia and the asset buffer 832 in asset chain Ib. Another command 837 may then be sent to the IMI 836 in asset chain Ic, indicating that the IMI should carry out scaling or some other operation on the image data in asset chain Ic. The master renderer 838 determines the position of the image, for example that the image must be placed at position (x, y) on a screen (not shown). Once the image is on the screen, it can be left as is, freeing up the processing-intensive computer resources that were in use. The image can also be rotated, scaled, altered in color or the like by re-initiating the IMI 836. Moreover, as described above, a relatively low-level animation can be carried out by enhancing the data on the fly with more resolution. Because the decoded image buffer 835 can be maintained, additional information about the image can also be obtained.

Figure 9 shows a slide show scheduler 902 according to the present invention. Temporal data 904 and non-temporal data 906 are given as input to the scheduler 902, which schedules both types of data and interleaves them into a unified stream of slide show data 910. In the example of Figure 9, the scheduler 902 has placed six video animation data packets (I1-I6) 915-920 in the data stream 910 adjacent to one another. Prior to the video image packets, the scheduler 902 has placed a first voice data packet (V1) 912 and a first MIDI data packet (MIDI1) 914 in the data stream 910 before the image packets 915-920. Each sound-related packet, V1 912 and MIDI1 914, has a playback time of T = 4, and each of the image packets I1-I5 has a playback time of T = 5, while I6 has a playback time of T = 6. The method of scheduling will be described in detail below; however, it can be seen in Figure 9 that the download times 922, 924 for I4 and I5, respectively, end after the playback time T = 5 has already expired. This will cause unnatural breaks or stalls in the video portion of the slide show. Therefore, the present invention provides a mechanism by which such stalls can be minimized, or at least reduced.

Figure 10 is a flow diagram showing the method by which the scheduler 902 schedules the data packets within a slide show to minimize stalls. First, step 1002 assigns a playback time to each data packet within the slide show, depending, for example, on the desired resolution of an image at a particular time. As noted above with reference to Figure 9, multiple packets may have the same playback time, depending on the wishes of the author. For example, if a video image consists of 15 total packets, the first five image packets can be assigned one particular playback time and the other ten packets another. As a result, at the playback time for the first five packets, the image data will be displayed, and at the playback time for the next ten packets, the image will be completed. Step 1004 then sorts the packets by assigned playback time, grouping packets with the same playback time together. Then, in step 1006, for a given playback time, the packets are ordered sequentially; for example, five image packets belonging to a single image asset, all of which have the same playback time, will be ordered sequentially, i.e., from 1 to 5 in order. Step 1008 calculates a pre-delay value, which defines a minimum amount of data that must be downloaded to the player to ensure safe operation of the player. Step 1010 then calculates the download time for each data packet in the stream 910. Next, step 1012 locates any data packets that are downloaded after their playback time has expired and determines the largest delta between playback time and subsequent download time. An example of such a delta is shown in Figure 9, which shows the delta 926 between the playback time and the download time of the I5 data packet, a packet whose download is completed long after its playback time has expired. Step 1014 does a reverse calculation to determine the number of bytes in the largest delta at a given baud rate. Alternatively, the number of bits or some other data measure can be used. Step 1016 assigns the calculated number of bytes to be a "pre-load byte" value, which defines a minimum number of bytes that the player must have received before reproducing the data. If the baud rate remains constant at the baud rate used in step 1014, stalls will not occur.
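A small Python sketch of the Figure 10 pre-load computation follows: find the worst-case gap between a packet's download completion and its playback deadline, convert that gap to bytes at the measured baud rate, and require that many bytes to be buffered before playback starts. The packet fields and the simple cumulative-download model are assumptions of this example.

from dataclasses import dataclass

@dataclass
class Packet:
    name: str
    size_bytes: int
    play_time: float      # deadline (seconds) assigned in step 1002

def preload_bytes(packets, baud_rate_bytes_per_s: float) -> int:
    """Steps 1010-1016: compute download-completion times, take the largest
    amount by which a download finishes after its playback deadline, and
    convert that delta back into a byte count at the given rate."""
    downloaded, worst_delta = 0, 0.0
    for p in sorted(packets, key=lambda p: (p.play_time, p.name)):   # steps 1004/1006
        downloaded += p.size_bytes
        finish = downloaded / baud_rate_bytes_per_s                  # step 1010
        worst_delta = max(worst_delta, finish - p.play_time)         # step 1012
    return int(worst_delta * baud_rate_bytes_per_s)                  # steps 1014/1016

stream = [Packet("V1", 3000, 4), Packet("MIDI1", 1000, 4),
          *[Packet(f"I{i}", 2000, 5) for i in range(1, 6)], Packet("I6", 2000, 6)]
print(preload_bytes(stream, baud_rate_bytes_per_s=2000))   # -> 4000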
Various embodiments of the present invention have now been described. It will be understood, however, that various modifications can be made without departing from the spirit and scope of the invention. Accordingly, it will be understood that the invention should not be limited by the specific embodiments illustrated, but only by the scope of the appended claims.

Claims (40)

  1. A sound transmitting system that operates over a limited bandwidth channel, comprising: (a) a first element, which receives a sequence of temporal data to be encoded; (b) an analyzing element, which reviews a portion of the temporal data sequence and analyzes the sequence of temporal data to determine a quantity of data that can be delivered by the limited bandwidth channel; and (c) a calculating element, which calculates a quantity of data that can be transmitted and determines a shortfall of data that can lead to a possible stall of the system, the calculating element including a system forecaster that can detect a stall and, if a stall is detected, can find a break point in the sequence of temporal data and can stop the sequence of temporal data at the break point.
  2. A method for scheduling and processing temporal data and non-temporal data in a unified data stream, comprising: (a) interleaving temporal and non-temporal data to form a unified data stream, at least some of the temporal data being in a compressed temporal format and at least some of the non-temporal data being in a compressed non-temporal format; and (b) processing the unified data stream, including: (1) decompressing the non-temporal data in the compressed non-temporal format and producing an output image from at least some of the decompressed non-temporal data and any uncompressed non-temporal data, and (2) simultaneously reproducing the temporal data while the temporal data in the compressed temporal format is decompressed.

The method of claim 2, wherein the step of simultaneously reproducing the temporal data while decompressing the temporal data includes: (a) inputting at least some of the temporal data from the unified data stream to an input buffer; (b) decompressing the temporal data in the input buffer to obtain decompressed temporal data; (c) inputting the decompressed temporal data to an output buffer; and (d) reproducing the decompressed temporal data held in the output buffer while additional temporal data is input and decoded.

The method of claim 3, wherein the step of reproducing the decompressed temporal data includes: (a) determining whether the temporal data held in the output buffer is equal to or greater than a predetermined amount of data known as a pre-initial delay, or whether an end-of-file flag ("EOF") is held in the output buffer, the EOF flag indicating the end of the temporal data contained in the unified data stream; (b) if the pre-initial delay or the EOF indication is in the output buffer, looking for a break point in the temporal data contained in the output buffer; (c) if a break point is found, reproducing the temporal data in the output buffer until the break point is reached; (d) if no break point is found, determining whether the amount of temporal data in the output buffer is greater than a maximum delay, the maximum delay representing a minimum amount of temporal data required to be in the output buffer in order to reproduce when there is no break point; and (e) if the amount of temporal data in the output buffer is greater than the maximum delay, reproducing the temporal data in the output buffer.

The method of claim 4, wherein the step of reproducing the decompressed temporal data further includes: (a) detecting whether a stall occurred during the reproduction of the temporal data; and (b) if a stall condition has occurred, replacing the maximum delay with a restart delay, the restart delay increasing the amount of temporal data required to be in the output buffer in order to reproduce when no break point is found.

The method of claim 2, further comprising: (a) establishing a plurality of markers in the temporal data at various times in the unified data stream; (b) monitoring the unified data stream to determine a time when each marker of the plurality of markers is received by a player; and (c) dividing an actual reproduction time between selected markers by the time between receipt of the selected markers to determine a baud rate for the temporal data.
The method of claim 5, further comprising: (a) establishing the maximum delay, the restart delay, and the pre-initial delay; (b) inserting the established maximum delay, the established restart delay, and the established pre-initial delay into a start header; and (c) placing the start header at the beginning of the unified data stream.

The method of claim 2, wherein at least a portion of the temporal data is a MIDI file, the MIDI file including a plurality of tracks, each track having a plurality of messages, each message including time data, the method further comprising sorting the MIDI file using the time data to group messages together to form a plurality of MIDI file portions, each MIDI file portion representing a specific time.

The method of claim 8, further comprising: (a) placing each portion of the MIDI file in a respective audio segment in the unified data stream, each audio segment having a download time and a playback time; (b) placing at least one non-temporal data segment between a first audio segment and a second audio segment; (c) locating a delay data free space after the end of the playback time of the first audio segment and the end of the download time of the second audio segment; and (d) locating the at least one non-temporal data segment after the delay data free space.

The method of claim 9, further comprising: (a) dividing an audio segment at a break point in the audio segment to form a first divided audio segment and a second divided audio segment; and (b) placing at least one non-temporal segment between the first divided audio segment and the second divided audio segment.

The method of claim 2, wherein the interleaving step includes: (a) dividing the temporal data into a plurality of audio segments; (b) plotting a download time and a playback time of a selected audio segment and a next audio segment to determine a latest start time for the next audio segment; (c) placing all audio segments at the latest possible locations within the unified data stream to form an initial map that includes the temporal data; and (d) fitting at least one non-temporal data segment into any free spaces that exist between adjacent audio segments in the initial map and that are large enough to accommodate a non-temporal data segment.

The method of claim 11, wherein the interleaving step further includes: (a) detecting whether any overlaps exist between any adjacent audio segments, or between any audio segment and any non-temporal segment, or between any non-temporal segments; and (b) if any overlap is detected, moving at least one audio segment to eliminate the overlap.

The method of claim 12, wherein the interleaving step further includes: (a) after removing any overlaps, determining whether any adjacent audio segments can be placed next to each other without any free space between the adjacent audio segments; and (b) re-locating such audio segments without free space to eliminate the free space in order to conserve space in a header located in the unified data stream.

The method of claim 13, wherein the interleaving step further includes removing any free spaces between adjacent audio segments and non-temporal segments.
The method of claim 2, wherein at least some of the non-temporal data is image data, the method further comprising: (a) sending a low resolution version of the image data in the unified data stream before sending additional resolution of the image data; (b) rendering the low resolution version; and (c) sending the additional resolution of the image data after rendering the low resolution version.

The method of claim 15, further comprising sending temporal data while the low resolution version of the image data is being sent and displayed.

The method of claim 2, wherein the temporal data includes video data, and wherein the step of decompressing the non-temporal data and producing the output image includes: (a) separating the video data into a plurality of data blocks including a set of commands and a set of compressed data; (b) forming an event table from the set of commands, the event table having a plurality of contents, each content having a time and a list of events that occur at that time; (c) storing a compressed information block from the compressed data set; (d) decompressing the compressed information block in response to a command in the event table, thereby obtaining output pixel data; and (e) decoding the output pixel data.

18. The method of claim 17, wherein the step of decompressing the non-temporal data and producing the output image further includes changing the output pixel data in response to a command from the event table.

19. A system for scheduling and processing non-temporal data and temporal data in a unified data stream, comprising: (a) a scheduler, configured to interleave non-temporal data and temporal data to form a unified data stream, at least some of the non-temporal data being in a compressed image format and at least some of the temporal data being in a compressed temporal format; and (b) a processor that decompresses the non-temporal data in the compressed non-temporal format, produces an output image from at least some of the decompressed non-temporal data and any uncompressed non-temporal data, and simultaneously reproduces the temporal data while decompressing the temporal data in the compressed temporal format.

The system of claim 19, wherein the processor includes: (a) an input buffer that receives at least some of the temporal data from the unified data stream; (b) an element that decompresses the temporal data in the input buffer to obtain decompressed temporal data; (c) an output buffer, which receives the decompressed temporal data; (d) the input buffer receiving additional temporal data from the unified data stream; and (e) a player, which reproduces the decompressed temporal data held in the output buffer while the additional temporal data is decoded.
The system of claim 20, wherein the processor is configured to: (a) determine whether the temporal data held in the output buffer is equal to or greater than a predetermined amount of data known as a pre-initial delay, or whether an EOF flag is held in the output buffer, the EOF flag indicating the end of the temporal data contained in the unified data stream; (b) if the pre-initial delay or the EOF indication is in the output buffer, look for a break point in the temporal data contained in the output buffer; (c) if a break point is found, reproduce the temporal data in the output buffer until the break point is reached; (d) if no break point is found, determine whether the amount of temporal data in the output buffer is greater than a maximum delay, the maximum delay representing a minimum amount of temporal data required to be in the output buffer in order to reproduce when there is no break point; and (e) if the amount of temporal data in the output buffer is greater than the maximum delay, reproduce the temporal data in the output buffer.

22. The system of claim 21, wherein the processor is further configured to: (a) detect whether a stall occurred during playback of the temporal data; and (b) if a stall condition has occurred, replace the maximum delay with a restart delay, the restart delay increasing the amount of temporal data required to be in the output buffer in order to reproduce when no break point is found.

The system of claim 19, wherein the scheduler is configured to: (a) establish a plurality of markers in the temporal data at various times in the unified data stream; (b) monitor the unified data stream to determine a time when each marker of the plurality of markers is received by a player; and (c) divide an actual reproduction time between selected markers by the time between receipt of the selected markers to determine a baud rate for the temporal data.

The system of claim 22, wherein the scheduler is further configured to: (a) set the maximum delay, the restart delay, and the pre-initial delay; (b) insert the established maximum delay, the established restart delay, and the established pre-initial delay into a start header; and (c) place the start header at the beginning of the unified data stream.

The system of claim 19, wherein at least a portion of the temporal data is a MIDI file, the MIDI file including a plurality of tracks, each track having a plurality of messages, each message including time data, and wherein the scheduler is configured to sort the MIDI file using the time data to group messages together to form a plurality of MIDI file portions, each MIDI file portion representing a specific time.

The system of claim 25, wherein the scheduler is further configured to: (a) place each portion of the MIDI file in a respective audio segment in the unified data stream, each audio segment having a download time and a playback time; (b) place at least one segment of image data between a first audio segment and a second audio segment; (c) place a delay data free space after the end of the playback time of the first audio segment and the end of the download time of the second audio segment; and (d) place at least one segment of image data after the delay data free space.
The system of claim 26, wherein the scheduler is further configured to: (a) divide an audio segment at a break point in the audio segment to form a first divided audio segment and a second divided audio segment; and (b) place at least one non-temporal segment between the first divided audio segment and the second divided audio segment.

The system of claim 19, wherein the scheduler is configured to: (a) divide the temporal data into a plurality of audio segments; (b) plot a download time and a playback time of a selected audio segment and a next audio segment to determine a latest start time for the next audio segment; (c) place all audio segments at the latest possible locations within the unified data stream to form an initial map that includes the temporal data; and (d) fit at least one non-temporal data segment into any free spaces that exist between adjacent audio segments in the initial map and that are large enough to accommodate a non-temporal data segment.

The system of claim 27, wherein the scheduler is further configured to: (a) detect whether any overlaps exist between any adjacent audio segments, or between any audio segment and any non-temporal segment, or between any non-temporal segments; and (b) if any overlap is detected, move at least one audio segment to eliminate the overlap.

The system of claim 28, wherein the scheduler is further configured to: (a) after removing any overlaps, determine whether any adjacent audio segments can be placed next to each other without any free space between the adjacent audio segments; and (b) re-locate such audio segments without free space to eliminate the free space in order to conserve space in a header located in the unified data stream.

30. The system of claim 29, wherein the scheduler is further configured to eliminate any free spaces between adjacent audio segments and non-temporal segments.

31. The system of claim 19, wherein the non-temporal data includes image data, the system further comprising: (a) a transmission element that allows a low resolution version of the image data to be sent in the unified data stream before additional resolution of the image data is sent; (b) a screen, which renders the low resolution version; and (c) the transmission element sending the additional resolution of the image data after the low resolution version is rendered.

32. The system of claim 31, wherein the transmission element sends temporal data while the low resolution version of the image data is being sent and displayed.

The system of claim 19, wherein the temporal data includes video data, and wherein the processor is configured to: (a) separate the video data into a plurality of data blocks, including a set of commands and a set of compressed data; (b) form an event table from the set of commands, the event table having a plurality of contents, each content having a time and a list of events that occur at that time; (c) store a block of compressed information from the compressed data set; (d) decompress the compressed block of information in response to a command in the event table, thereby obtaining output pixel data; and (e) decode the output pixel data.

34. The system of claim 33, wherein the processor is further configured to change the output pixel data in response to an event table command.
35. A method for determining a baud rate for temporal data in a unified data stream including temporal and non-temporal data, the method comprising: (a) interleaving temporal and non-temporal data to form the unified data stream; (b) establishing a plurality of markers in the temporal data at various times in the unified data stream; (c) monitoring the unified data stream to determine a time when each marker of the plurality of markers is received by a player; and (d) dividing an actual playing time between selected markers by the time between receipt of the selected markers to determine a baud rate for the temporal data.

36. A method for minimizing stalls in images displayed in a computer-animated slide show, comprising: (a) interleaving temporal and non-temporal data to create a computer slide show incorporated in a unified data stream, the unified data stream including a plurality of image packets and a plurality of non-temporal data packets; (b) assigning playback times to each of the image packets; (c) sorting the image packets by playback time; (d) calculating a download time for each image packet; (e) locating a latest image packet having a download time that is completed the latest after the playback time for that image data; (f) calculating a quantity of data between the download time and the playback time of the latest image packet; and (g) reproducing the unified data stream only when at least that quantity of data has been received.

37. A method for scheduling and processing a computer slide show, the slide show including image data and sound data in a unified data stream, the method comprising: (a) interleaving image data, sound data, and command data to form a slide show incorporated in a unified data stream; and (b) processing the unified data stream, including: (1) separating the image data, sound data, and command data; (2) generating a plurality of time-event commands from the command data, each event including time and event information for the slide show; (3) separating the image data into a plurality of image elements, each element including at least one image data packet; (4) separating the sound data into a plurality of sound elements, each element including at least one sound data packet; (5) decoding each of the image data packets in response to a corresponding time-event command; and (6) decoding each of the sound data packets in response to a corresponding time-event command.

38. The method of claim 37, wherein the processing step further includes: (a) image-processing each of the image data packets in response to a corresponding time-event command, if any; and (b) sound-processing each of the sound data packets in response to a corresponding time-event command, if any.

39. The method of claim 38, wherein the processing step further includes: (a) storing at least one of the decoded image packets so that the stored image packet can be processed essentially at any time; and (b) storing at least one of the decoded sound packets so that the stored sound packet can be processed essentially at any time.

40. The method of claim 38, further comprising simultaneously carrying out steps (1) to (6) of the processing step.
MXPA/A/1999/004778A 1996-11-25 1999-05-24 System and method for scheduling and processing image and sound data MXPA99004778A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US08755586 1996-11-25

Publications (1)

Publication Number Publication Date
MXPA99004778A true MXPA99004778A (en) 2000-05-01

Family


Similar Documents

Publication Publication Date Title
US5951646A (en) System and method for scheduling and processing image and sound data
JP3739609B2 (en) Method and apparatus for adaptive synchronization of digital video and audio playback in multimedia playback systems
JP4429643B2 (en) Method and system for processing digital data rates and directed playback changes
US7076535B2 (en) Multi-level skimming of multimedia content using playlists
US7237254B1 (en) Seamless switching between different playback speeds of time-scale modified data streams
JP5266327B2 (en) Synchronization of haptic effect data in media transport streams
JP2008136204A (en) System and method for streaming, receiving and processing flex mux stream
US6414972B1 (en) Signal decoding method, signal decoding apparatus, signal multiplexing method, signal multiplexing apparatus, and recording medium
JPH08237650A (en) Synchronizing system for data buffer
JP2001086460A (en) Method and device for accelerating transcoding
EP1049984A4 (en) Method and system for client-server interaction in interactive communications
MXPA99004778A (en) System and method for scheduling and processing image and sound data
CN111866542B (en) Audio signal processing method, multimedia information processing device and electronic equipment
WO1998001970A1 (en) Data multiplexing method, data multiplexer using the multiplexing method, multiple data repeater, multiple data decoding method, multiple data decoding device using the decoding method, and recording medium on which the methods are recorded
KR100984915B1 (en) Method and apparatus for decoding a data stream in audio video streaming systems
GB2348069A (en) Representation of a slide-show as video
KR100336501B1 (en) A method and decoder for synchronizing textual data to MPEG-1 multimedia streams
US20210118273A1 (en) Systems, devices, and methods for encoding haptic tracks
JP2005159878A (en) Data processor and data processing method, program and storage medium
JP2005176094A (en) Data processor, data processing method, program and storage medium