GB2465772A - Analysing memory accessed by an application - Google Patents
- Publication number
- GB2465772A, GB0821735A
- Authority
- GB
- United Kingdom
- Prior art keywords
- memory
- application
- access attribute
- page
- read
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/36—Preventing errors by testing or debugging software
- G06F11/3604—Software analysis for verifying properties of programs
- G06F11/3612—Software analysis for verifying properties of programs by runtime analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/36—Preventing errors by testing or debugging software
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
- G06F11/1405—Saving, restoring, recovering or retrying at machine instruction level
- G06F11/141—Saving, restoring, recovering or retrying at machine instruction level for bus or memory accesses
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/14—Protection against unauthorised use of memory or access to memory
- G06F12/1416—Protection against unauthorised use of memory or access to memory by checking the object accessibility, e.g. type of access defined by the memory independently of subject rights
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Quality & Reliability (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computer Hardware Design (AREA)
- Software Systems (AREA)
- Debugging And Monitoring (AREA)
Abstract
In a method of analysing memory accessed by a first application, the access attribute of each page of memory under analysis is set to be read-only, so as to force an interrupt if the first application attempts to modify the contents of a memory page, said interrupt passing control from the first application to a second application. The second application duplicates the contents of the memory page to storage and then sets the access attribute of the memory page to allow the first application to modify its contents. In this way, only those pages of memory that have been changed are archived, thereby significantly reducing memory and computational load. The supervisory computational load is further reduced as it takes advantage of existing read/write checks. A record of the current and previous memory states of the memory system is thus obtained for a significantly smaller load on resources.
Description
APPARATUS AND METHOD OF ANALYSIS
The present invention relates to an apparatus and method of analysis.
In conventional computer software debugging and analysis tools, the state of a computer system and executable code under analysis is determined in one of two ways: either a step-by-step trace of the system state is performed, or a so-called 'core-dump' of the system memory is performed, either upon program failure or, where such failure might prevent the implementation of a core dump, periodically during execution of the code.
Such activities can be highly invasive on the operation of the code, requiring interruption by supervisory software, and can impose a significant computational and memory cost in the monitoring and/or duplication of system memory.
This cost becomes more relevant still in the case of applications that interact with hardware in a time-critical fashion. For example, a video game may be required to output a screen frame refresh at 60Hz in synchrony with a television screen, and may be utilising a high proportion of system resources to achieve this frame rate. The debugging and analysis tools described above may then impose an unacceptably high additional load on the system, making it difficult for the system to maintain the required frame rate.
The present invention seeks to mitigate or alleviate the above problem.
In a first aspect of the present invention, a method of memory analysis for memory accessible by a first application comprises the steps of setting the access attribute of a page of the memory under analysis to be read-only, so as to force an interrupt if the first application attempts to modify the contents of the memory page, the interrupt passing control from the first application to a second application, and the second application duplicating the contents of the memory page to storage and then setting the access attribute of the memory page to allow the first application to modify its contents.
In another aspect of the present invention, apparatus for analysing memory accessible by a first application comprises a memory comprising one or more memory pages, the or each memory page comprising an access attribute, access attribute setting means operable to set the access attribute of the or each page of a memory under analysis to be read-only, so as to force an interrupt if the first application attempts to modify the contents of such a memory page, program interrupt means operable to pass control from the first application to an archive means, the archive means being operable to store a duplicate of the contents of a memory page to which the first application attempted to write and operable to then instruct the access attribute setting means to set the access attribute of that memory page to allow the first application to modify its contents.
Advantageously, by only archiving those pages of memory that are about to be modified by an application, the computational load is reduced and is distributed over the course of operation of the application. Moreover, by appropriating an existing system of error interrupts through the non-intuitive technique of denying the application memory access by default, the ongoing load of the supervisory software is significantly reduced.
Further respective aspects and features of the invention are defined in the appended claims.
Embodiments of the present invention will now be described by way of example with reference to the accompanying drawings, in which: Figure 1 is a schematic diagram of an entertainment device; Figure 2 is a schematic diagram of a cell processor; Figure 3 is a schematic diagram of a video graphics processor; Figures 4A-D are a schematic diagram of current and archived memory pages; and Figure 5 is a flow diagram of a method of analysis.
An apparatus and method of analysis are disclosed. In the following description, a number of specific details are presented in order to provide a thorough understanding of the embodiments of the present invention. It will be apparent, however, to a person skilled in the art that these specific details need not be employed to practise the present invention.
Conversely, specific details known to the person skilled in the art are omitted for the purposes of clarity where appropriate.
In an example embodiment of the present invention, an entertainment device such as the Sony Playstation 3 entertainment device or a development kit substantially replicating the functionality of the Playstation 3 entertainment device comprises system memory that is divided into so-called pages of typically 4096 bytes. Each page has associated with it an attribute, such as read, write or execute, indicating the type of memory it is currently operating as. Periodically, for example after every screen frame update (e.g. when a render buffer is flipped), every page attribute is set to read-only. Consequently, when the program under analysis attempts to write to a page of the system memory this causes an error interrupt to the operating system. The operating system then copies that particular page to a store and changes the page attribute to 'write', allowing the original update to then proceed. In this way, only those pages of the memory that have been changed are archived, thereby significantly reducing memory and computational load, and the supervisory computational load is further reduced as it takes advantage of existing read/write access checks, using the non-intuitive method of making all the memory read-only. In this way, a record of the current and previous memory states of the system memory is obtained for a significantly smaller load on resources.
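By way of illustration only, the archive-on-first-write behaviour described above can be modelled in a few lines of Python. This is a minimal sketch and not the implementation of the embodiment: the class `PagedMemory` and its methods are hypothetical names, the 'interrupt' is an ordinary method call rather than a hardware fault, and the archive is held in a dictionary rather than in separate storage.

```python
PAGE_SIZE = 4096  # bytes per page, the typical size given in the description


class PagedMemory:
    """Toy model of paged memory with a per-page write-enable flag (hypothetical)."""

    def __init__(self, num_pages):
        self.pages = [bytearray(PAGE_SIZE) for _ in range(num_pages)]
        self.writable = [True] * num_pages
        self.archive = {}  # page index -> contents as they were before the first write

    def begin_period(self):
        """Start a new analysis period by marking every page read-only."""
        self.archive = {}
        self.writable = [False] * len(self.pages)

    def _on_write_fault(self, index):
        """Stand-in for the interrupt handler (the 'second application')."""
        self.archive[index] = bytes(self.pages[index])  # duplicate the page to storage
        self.writable[index] = True                     # then permit the pending write

    def write(self, address, value):
        """Write one byte on behalf of the application under analysis."""
        index, offset = divmod(address, PAGE_SIZE)
        if not self.writable[index]:
            self._on_write_fault(index)
        self.pages[index][offset] = value


mem = PagedMemory(num_pages=8)
mem.write(5, 0x11)            # state established before analysis starts
mem.begin_period()            # e.g. immediately after a render buffer flip
mem.write(5, 0x22)            # first write to page 0: faults, page 0 is archived
mem.write(6, 0x33)            # second write to page 0: no fault, no further copy
mem.write(3 * PAGE_SIZE, 1)   # first write to page 3: faults, page 3 is archived

archived_pages = sorted(mem.archive)  # only the pages that were modified
old_byte = mem.archive[0][5]          # value before the period began
new_byte = mem.pages[0][5]            # current value
```

In this sketch only two of the eight pages are copied, and each is copied once regardless of how many writes follow, which is the source of the reduced load referred to above.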
As noted above, in embodiments of the present invention the Sony Playstation 3 is a suitable entertainment device. Figure 1 schematically illustrates the overall system architecture of the Sony Playstation 3. A system unit 10 is provided, with various peripheral devices connectable to the system unit.
The system unit 10 comprises: a Cell processor 100; a Rambus dynamic random access memory (XDRAM) unit 500; a Reality Synthesiser graphics unit 200 with a dedicated video random access memory (VRAM) unit 250; and an I/O bridge 700.
The system unit 10 also comprises a Blu Ray Disk BD-ROM optical disk reader 430 for reading from a disk 440 and a removable slot-in hard disk drive (HDD) 400, accessible through the I/O bridge 700. Optionally the system unit also comprises a memory card reader 450 for reading compact flash memory cards, Memory Stick memory cards and the like, which is similarly accessible through the I/O bridge 700.
The I/O bridge 700 also connects to four Universal Serial Bus (USB) 2.0 ports 710; a gigabit Ethernet port 720; an IEEE 802.11b/g wireless network (Wi-Fi) port 730; and a Bluetooth wireless link port 740 capable of supporting up to seven Bluetooth connections.
In operation the I/O bridge 700 handles all wireless, USB and Ethernet data, including data from one or more game controllers 751. For example when a user is playing a game, the I/O bridge 700 receives data from the game controller 751 via a Bluetooth link and directs it to the Cell processor 100, which updates the current state of the game accordingly.
The wireless, USB and Ethernet ports also provide connectivity for other peripheral devices in addition to game controllers 751, such as: a remote control 752; a keyboard 753; a mouse 754; a portable entertainment device 755 such as a Sony Playstation Portable entertainment device; a video camera such as an EyeToy video camera 756; and a microphone headset 757. Such peripheral devices may therefore in principle be connected to the system unit 10 wirelessly; for example the portable entertainment device 755 may communicate via a Wi-Fi ad-hoc connection, whilst the microphone headset 757 may communicate via a Bluetooth link.
The provision of these interfaces means that the Playstation 3 device is also potentially compatible with other peripheral devices such as digital video recorders (DVRs), set-top boxes, digital cameras, portable media players, Voice over IP telephones, mobile telephones, printers and scanners.
In addition, a legacy memory card reader 410 may be connected to the system unit via a USB port 710, enabling the reading of memory cards 420 of the kind used by the Playstation or Playstation 2 devices.
In the present embodiment, the game controller 751 is operable to communicate wirelessly with the system unit 10 via the Bluetooth link. However, the game controller 751 can instead be connected to a USB port, thereby also providing power by which to charge the battery of the game controller 751. In addition to one or more analogue joysticks and conventional control buttons, the game controller is sensitive to motion in 6 degrees of freedom, corresponding to translation and rotation in each axis. Consequently gestures and movements by the user of the game controller may be translated as inputs to a game in addition to or instead of conventional button or joystick commands. Optionally, other wirelessly enabled peripheral devices such as the Playstation Portable device may be used as a controller. In the case of the Playstation Portable device, additional game or control information (for example, control instructions or number of lives) may be provided on the screen of the device. Other alternative or supplementary control devices may also be used, such as a dance mat (not shown), a light gun (not shown), a steering wheel and pedals (not shown) or bespoke controllers, such as a single or several large buttons for a rapid-response quiz game (also not shown).
The remote control 752 is also operable to communicate wirelessly with the system unit 10 via a Bluetooth link. The remote control 752 comprises controls suitable for the operation of the Blu Ray Disk BD-ROM reader 430 and for the navigation of disk content.
The Blu Ray Disk BD-ROM reader 430 is operable to read CD-ROMs compatible with the Playstation and PlayStation 2 devices, in addition to conventional pre-recorded and recordable CDs, and so-called Super Audio CDs. The reader 430 is also operable to read DVD-ROMs compatible with the Playstation 2 and PlayStation 3 devices, in addition to conventional pre-recorded and recordable DVDs. The reader 430 is further operable to read BD-ROMs compatible with the Playstation 3 device, as well as conventional pre-recorded and recordable Blu-Ray Disks.
The system unit 10 is operable to supply audio and video, either generated or decoded by the Playstation 3 device via the Reality Synthesiser graphics unit 200, through audio and video connectors to a display and sound output device 300 such as a monitor or television set having a display 305 and one or more loudspeakers 310. The audio connectors 210 may include conventional analogue and digital outputs whilst the video connectors 220 may variously include component video, S-video, composite video and one or more High Definition Multimedia Interface (HDMI) outputs. Consequently, video output may be in formats such as PAL or NTSC, or in 720p, 1080i or 1080p high definition.
Audio processing (generation, decoding and so on) is performed by the Cell processor 100. The Playstation 3 device's operating system supports Dolby 5.1 surround sound, Dolby Theatre Surround (DTS), and the decoding of 7.1 surround sound from Blu-Ray disks.
In the present embodiment, the video camera 756 comprises a single charge coupled device (CCD), an LED indicator, and hardware-based real-time data compression and encoding apparatus so that compressed video data may be transmitted in an appropriate format such as an intra-image based MPEG (motion picture expert group) standard for decoding by the system unit 10. The camera LED indicator is arranged to illuminate in response to appropriate control data from the system unit 10, for example to signify adverse lighting conditions. Embodiments of the video camera 756 may variously connect to the system unit 10 via a USB, Bluetooth or Wi-Fi communication port. Embodiments of the video camera may include one or more associated microphones and also be capable of transmitting audio data. In embodiments of the video camera, the CCD may have a resolution suitable for high-definition video capture. In use, images captured by the video camera may for example be incorporated within a game or interpreted as game control inputs.
In general, in order for successful data communication to occur with a peripheral device such as a video camera or remote control via one of the communication ports of the system unit 10, an appropriate piece of software such as a device driver should be provided.
Device driver technology is well-known and will not be described in detail here, except to say that the skilled man will be aware that a device driver or similar software interface may be required in the present embodiment described.
Referring now to Figure 2, the Cell processor 100 has an architecture comprising four basic components: external input and output structures comprising a memory controller 160 and a dual bus interface controller 170A,B; a main processor referred to as the Power Processing Element 150; eight co-processors referred to as Synergistic Processing Elements (SPEs) 110A-H; and a circular data bus connecting the above components referred to as the Element Interconnect Bus 180. The total floating point performance of the Cell processor is 218 GFLOPS, compared with the 6.2 GFLOPs of the Playstation 2 device's Emotion Engine.
The Power Processing Element (PPE) 150 is based upon a two-way simultaneous multithreading Power 970 compliant PowerPC core (PPU) 155 running with an internal clock of 3.2 GHz. It comprises a 512 kB level 2 (L2) cache and a 32 kB level 1 (L1) cache. The PPE 150 is capable of eight single precision operations per clock cycle, translating to 25.6 GFLOPs at 3.2 GHz. The primary role of the PPE 150 is to act as a controller for the Synergistic Processing Elements 110A-H, which handle most of the computational workload.
In operation the PPE 150 maintains a job queue, scheduling jobs for the Synergistic Processing Elements 110A-H and monitoring their progress. Consequently each Synergistic Processing Element 110A-H runs a kernel whose role is to fetch a job, execute it and synchronise with the PPE 150.
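The fetch, execute and synchronise cycle described above may be sketched in Python using threads in place of SPEs. This is an illustrative analogy only: the names `spe_kernel`, `run_jobs` and `NUM_WORKERS` are hypothetical, the worker count is an assumption, and Python threads do not reflect how work is actually dispatched on the Cell processor.

```python
import queue
import threading

NUM_WORKERS = 6  # assumption for illustration, not a value taken from this passage


def spe_kernel(jobs, results):
    """Worker loop: fetch a job, execute it, report the result back to the controller."""
    while True:
        job = jobs.get()
        if job is None:          # sentinel meaning no more work for this worker
            jobs.task_done()
            break
        job_id, func, arg = job
        results.put((job_id, func(arg)))
        jobs.task_done()         # synchronise: tell the controller this job is finished


def run_jobs(work_items):
    """Controller role: queue the jobs, wait for completion, gather results in order."""
    jobs = queue.Queue()
    results = queue.Queue()
    workers = [
        threading.Thread(target=spe_kernel, args=(jobs, results))
        for _ in range(NUM_WORKERS)
    ]
    for worker in workers:
        worker.start()
    for job_id, (func, arg) in enumerate(work_items):
        jobs.put((job_id, func, arg))
    for _ in workers:
        jobs.put(None)           # one sentinel per worker
    jobs.join()                  # monitor progress: block until every job is done
    for worker in workers:
        worker.join()
    collected = {}
    while not results.empty():
        job_id, value = results.get()
        collected[job_id] = value
    return [collected[i] for i in range(len(work_items))]


squares = run_jobs([(lambda x: x * x, n) for n in range(10)])
```

Results are keyed by job identifier because workers may finish in any order.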
Each Synergistic Processing Element (SPE) 110A-H comprises a respective Synergistic Processing Unit (SPU) 120A-H, and a respective Memory Flow Controller (MFC) 140A-H comprising in turn a respective Dynamic Memory Access Controller (DMAC) 142A-H, a respective Memory Management Unit (MMU) 144A-H and a bus interface (not shown).
Each SPU 120A-H is a RISC processor clocked at 3.2 GHz and comprising 256 kB local RAM 130A-H, expandable in principle to 4 GB. Each SPE gives a theoretical 25.6 GFLOPS of single precision performance. An SPU can operate on 4 single precision floating point numbers, 4 32-bit numbers, 8 16-bit integers, or 16 8-bit integers in a single clock cycle. In the same clock cycle it can also perform a memory operation. The SPU 120A-H does not directly access the system memory XDRAM 500; the 64-bit addresses formed by the SPU 120A-H are passed to the MFC 140A-H which instructs its DMA controller 142A-H to access memory via the Element Interconnect Bus 180 and the memory controller 160.
The Element Interconnect Bus (EIB) 180 is a logically circular communication bus internal to the Cell processor 100 which connects the above processor elements, namely the PPE 150, the memory controller 160, the dual bus interface 170A,B and the 8 SPEs 110A-H, totalling 12 participants. Participants can simultaneously read and write to the bus at a rate of 8 bytes per clock cycle. As noted previously, each SPE 110A-H comprises a DMAC 142A-H for scheduling longer read or write sequences. The EIB comprises four channels, two each in clockwise and anti-clockwise directions. Consequently for twelve participants, the longest step-wise data-flow between any two participants is six steps in the appropriate direction. The theoretical peak instantaneous EIB bandwidth for 12 slots is therefore 96B per clock, in the event of full utilisation through arbitration between participants. This equates to a theoretical peak bandwidth of 307.2 GB/s (gigabytes per second) at a clock rate of 3.2GHz.
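The figures quoted for the Element Interconnect Bus can be reproduced with a short calculation. The following Python sketch uses only the values stated above (12 participants, 8 bytes per clock, 3.2 GHz); the function `ring_steps` is a hypothetical helper that treats the participants as evenly spaced positions on a ring.

```python
PARTICIPANTS = 12
BYTES_PER_CLOCK_PER_PARTICIPANT = 8
CLOCK_HZ = 3.2e9


def ring_steps(a, b, n=PARTICIPANTS):
    """Fewest steps between positions a and b when data may travel in either direction."""
    clockwise = (b - a) % n
    anticlockwise = (a - b) % n
    return min(clockwise, anticlockwise)


# Longest shortest-path between any two participants on the ring.
longest_path = max(
    ring_steps(a, b) for a in range(PARTICIPANTS) for b in range(PARTICIPANTS)
)

# Peak bytes moved per clock if every participant transfers at once.
peak_bytes_per_clock = PARTICIPANTS * BYTES_PER_CLOCK_PER_PARTICIPANT

# Peak bandwidth in gigabytes per second (1 GB taken as 10**9 bytes).
peak_gb_per_s = peak_bytes_per_clock * CLOCK_HZ / 1e9
```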
The memory controller 160 comprises an XDRAM interface 162, developed by Rambus Incorporated. The memory controller interfaces with the Rambus XDRAM 500 with a theoretical peak bandwidth of 25.6 GB/s.
The dual bus interface 170A,B comprises a Rambus FlexIO system interface 172A,B. The interface is organised into 12 channels each being 8 bits wide, with five paths being inbound and seven outbound. This provides a theoretical peak bandwidth of 62.4 GB/s (36.4 GB/s outbound, 26 GB/s inbound) between the Cell processor and the I/O Bridge 700 via controller 170A and the Reality Synthesiser graphics unit 200 via controller 170B.
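As a consistency check on the figures above, the following Python sketch confirms that the inbound and outbound paths account for all 12 channels and that the two directional bandwidths sum to the stated total. The per-path rate is derived here by simple division and is not a figure given in the description.

```python
CHANNELS = 12
OUTBOUND_PATHS = 7
INBOUND_PATHS = 5
OUTBOUND_GB_PER_S = 36.4
INBOUND_GB_PER_S = 26.0

# Total theoretical peak bandwidth across both directions.
total_gb_per_s = OUTBOUND_GB_PER_S + INBOUND_GB_PER_S

# Derived values (not stated in the description): bandwidth carried by each path.
per_outbound_path = OUTBOUND_GB_PER_S / OUTBOUND_PATHS
per_inbound_path = INBOUND_GB_PER_S / INBOUND_PATHS
```

Both directions work out to the same per-path rate, so the asymmetry in bandwidth follows directly from the seven-to-five split of paths.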
Data sent by the Cell processor 100 to the Reality Synthesiser graphics unit 200 will typically comprise display lists, being a sequence of commands to draw vertices, apply textures to polygons, specify lighting conditions, and so on.
Referring now to Figure 3, the Reality Synthesiser graphics (RSX) unit 200 is a video accelerator based upon the NVidia G70/71 architecture that processes and renders lists of commands produced by the Cell processor 100. The RSX unit 200 comprises a host interface 202 operable to communicate with the bus interface controller 170B of the Cell processor 100; a vertex pipeline 204 (VP) comprising eight vertex shaders 205; a pixel pipeline 206 (PP) comprising 24 pixel shaders 207; a render pipeline 208 (RP) comprising eight render output units (ROPs) 209; a memory interface 210; and a video converter 212 for generating a video output. The RSX 200 is complemented by 256 MB double data rate (DDR) video RAM (VRAM) 250, clocked at 600MHz and operable to interface with the RSX 200 at a theoretical peak bandwidth of 25.6 GB/s. In operation, the VRAM 250 maintains a frame buffer 214 and a texture buffer 216. The texture buffer 216 provides textures to the pixel shaders 207, whilst the frame buffer 214 stores results of the processing pipelines. The RSX can also access the main memory 500 via the EIB 180, for example to load textures into the VRAM 250.
The vertex pipeline 204 primarily processes deformations and transformations of vertices defining polygons within the image to be rendered.
The pixel pipeline 206 primarily processes the application of colour, textures and lighting to these polygons, including any pixel transparency, generating red, green, blue and alpha (transparency) values for each processed pixel. Texture mapping may simply apply a graphic image to a surface, or may include bump-mapping (in which the notional direction of a surface is perturbed in accordance with texture values to create highlights and shade in the lighting model) or displacement mapping (in which the applied texture additionally perturbs vertex positions to generate a deformed surface consistent with the texture).
The render pipeline 208 performs depth comparisons between pixels to determine which should be rendered in the final image. Optionally, if the intervening pixel process will not affect depth values (for example in the absence of transparency or displacement mapping) then the render pipeline and vertex pipeline 204 can communicate depth information between them, thereby enabling the removal of occluded elements prior to pixel processing, and so improving overall rendering efficiency. In addition, the render pipeline 208 also applies subsequent effects such as full-screen anti-aliasing over the resulting image.
Both the vertex shaders 205 and pixel shaders 207 are based on the shader model 3.0 standard. Up to 136 shader operations can be performed per clock cycle, with the combined pipeline therefore capable of 74.8 billion shader operations per second, outputting up to 840 million vertices and 10 billion pixels per second. The total floating point performance of the RSX 200 is 1.8 TFLOPS.
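The two shader throughput figures above are related by the shader clock rate, which is not stated in the description. The following Python sketch derives the clock rate implied by those two figures; the result is an inference from the quoted numbers only and should not be read as a specification of the RSX 200.

```python
SHADER_OPS_PER_CLOCK = 136
SHADER_OPS_PER_SECOND = 74.8e9


def implied_clock_hz(ops_per_second, ops_per_clock):
    """Clock rate at which ops_per_clock operations per cycle yield ops_per_second."""
    return ops_per_second / ops_per_clock


implied_clock_mhz = implied_clock_hz(SHADER_OPS_PER_SECOND, SHADER_OPS_PER_CLOCK) / 1e6
```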
Typically, the RSX 200 operates in close collaboration with the Cell processor 100; for example, when displaying an explosion, or weather effects such as rain or snow, a large number of particles must be tracked, updated and rendered within the scene. In this case, the PPU 155 of the Cell processor may schedule one or more SPEs 110A-H to compute the trajectories of respective batches of particles. Meanwhile, the RSX 200 accesses any texture data (e.g. snowflakes) not currently held in the video RAM 250 from the main system memory 500 via the element interconnect bus 180, the memory controller 160 and a bus interface controller 170B. The or each SPE 110A-H outputs its computed particle properties (typically coordinates and normals, indicating position and attitude) directly to the video RAM 250; the DMA controller 142A-H of the or each SPE 110A-H addresses the video RAM 250 via the bus interface controller 170B. Thus in effect the assigned SPEs become part of the video processing pipeline for the duration of the task.
In general, the PPU 155 can assign tasks in this fashion to six of the eight SPEs available; one SPE is reserved for the operating system, whilst one SPE is effectively disabled. The disabling of one SPE provides a greater level of tolerance during fabrication of the Cell processor, as it allows for one SPE to fail the fabrication process. Alternatively if all eight SPEs are functional, then the eighth SPE provides scope for redundancy in the event of subsequent failure by one of the other SPEs during the life of the Cell processor.
The PPU 155 can assign tasks to SPEs in several ways. For example, SPEs may be chained together to handle each step in a complex operation, such as accessing a DVD, video and audio decoding, and error masking, with each step being assigned to a separate SPE.
Alternatively or in addition, two or more SPEs may be assigned to operate on input data in parallel, as in the particle animation example above.
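The two modes of assignment described above, chaining and parallel operation, may be contrasted with a short Python sketch. It is illustrative only: `run_chained` and `run_parallel` are hypothetical names, the stage functions are trivial stand-ins for steps such as decoding and error masking, and the chained version shows only the flow of data from stage to stage, executing the stages one after another rather than on separate processors.

```python
from concurrent.futures import ThreadPoolExecutor


def run_chained(stages, item):
    """Chained mode: the output of each stage becomes the input of the next."""
    for stage in stages:
        item = stage(item)
    return item


def run_parallel(task, batches, workers=2):
    """Parallel mode: several workers apply the same task to separate batches."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(task, batches))  # results keep the order of the batches


# Hypothetical stand-ins for successive steps in a complex operation.
stages = [
    lambda s: s.strip(),            # e.g. read
    lambda s: s.upper(),            # e.g. decode
    lambda s: s.replace("?", "_"),  # e.g. mask errors
]
chained_result = run_chained(stages, "  fr?me  ")


def advance(batch):
    """Hypothetical particle update: move each (x, y) by its velocity (vx, vy)."""
    return [(x + vx, y + vy) for (x, y, vx, vy) in batch]


batches = [[(0, 0, 1, 2)], [(5, 5, -1, 0), (1, 1, 1, 1)]]
parallel_result = run_parallel(advance, batches)
```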
Software instructions implemented by the Cell processor 100 and/or the RSX 200 may be supplied at manufacture and stored on the HDD 400, and/or may be supplied on a data carrier or storage medium such as an optical disk or solid state memory, or via a transmission medium such as a wired or wireless network or internet connection, or via combinations of these.
The software supplied at manufacture comprises system firmware and the Playstation 3 device's operating system (OS). In operation, the OS provides a user interface enabling a user to select from a variety of functions, including playing a game, listening to music, viewing photographs, or viewing a video. The interface takes the form of a so-called cross media-bar (XMB), with categories of function arranged horizontally. The user navigates by moving through the function icons (representing the functions) horizontally using the game controller 751, remote control 752 or other suitable control device so as to highlight a desired function icon, at which point options pertaining to that function appear as a vertically scrollable list of option icons centred on that function icon, which may be navigated in analogous fashion. However, if a game, audio or movie disk 440 is inserted into the BD-ROM optical disk reader 430, the Playstation 3 device may select appropriate options automatically (for example, by commencing the game), or may provide relevant options (for example, to select between playing an audio disk or compressing its content to the HDD 400).
In addition, the OS provides an on-line capability, including a web browser, an interface with an on-line store from which additional game content, demonstration games (demos) and other media may be downloaded, and a friends management capability, providing on-line communication with other Playstation 3 device users nominated by the user of the current device; for example, by text, audio or video depending on the peripheral devices available. The on-line capability also provides for on-line communication, content download and content purchase during play of a suitably configured game, and for updating the firmware and OS of the Playstation 3 device itself. It will be appreciated that the term "on-line" does not imply the physical presence of wires, as the term can also apply to wireless connections of various types.
A snapshot of the current memory state may be taken periodically or following a particular event, whether in hardware operation, data input or software execution. Often software execution itself is cyclic and so a periodic software event may be used.
For example, in a videogame application a periodic software event is the flipping of the render buffer. This is when a video image has been prepared in off-screen video memory, and then the displayed screen is updated by panning or flipping to that memory area. In a video game this typically indicates that the job of updating the game is complete for one cyclic execution, or period. Thus a snapshot immediately afterward may be appropriate in preparation for the next period.
Depending on the display format, this update typically occurs at a rate of 50 or 60 Hz.
Within the ensuing 1/50th or 1/60th of a second after such an update, the videogame evaluates the game state, reads inputs (such as user inputs or network inputs), updates the game state accordingly (e.g. various health statistics, the positions of various characters and of the virtual camera viewpoint, or the addition or removal of characters, objects and locations), and generates a display list for the video hardware (such as the RSX 200) to use in order to prepare a rendering of the updated scene, which is then flipped to at the next 1/50th or 1/60th of a second.
Any software bugs that occur in this process may not reveal themselves in the first period; for example they may be dependent upon a particular combination of circumstances such as player in-game health, in-game locations or events, use of a particular in-game resource, or certain interactions with specific computer controlled characters. In a videogame such circumstances may be difficult to create repeatably or predictably, and so an ongoing memory snapshot process is desirable. But as noted previously, a full memory dump every 1/50th or 1/60th of a second may impose an unacceptable load on the entertainment device while it is running a test version of the game.
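By way of rough illustration only, the length of one period and the copy rate that a full dump in every period would demand can be estimated as follows. The 256 MiB memory size is an assumed figure chosen for the example rather than one taken from this description, and the function names are hypothetical.

```python
def period_ms(refresh_hz):
    """Length of one update period in milliseconds."""
    return 1000.0 / refresh_hz


def full_dump_rate_mib_per_s(memory_mib, refresh_hz):
    """Sustained copy rate needed to dump the whole memory once per period."""
    return memory_mib * refresh_hz


ASSUMED_MEMORY_MIB = 256  # assumption for illustration only

for hz in (50, 60):
    print(f"{hz} Hz: {period_ms(hz):.2f} ms per period, "
          f"{full_dump_rate_mib_per_s(ASSUMED_MEMORY_MIB, hz)} MiB/s for a full dump")
```

Under these assumptions a period lasts 20 ms at 50 Hz, and a full dump in every period would require a sustained copy rate of several gigabytes per second.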
Referring now to Figures 4A-D, in an embodiment of the present invention the memory comprises pages 501-507 that are typically 4096 bytes in size, and optionally the hardware is designed to allow rapid transfer of such memory pages (as is the case with the PS3 entertainment device's hardware). Each page is logically distinct and has an associated attribute 510 that may indicate that the memory page is read-only memory (i.e. cannot be updated by the game application), writable memory (i.e. can be updated by the game application) or optionally execute memory (i.e. contains all or part of the game application).
The pages may be contiguous, separate, or for three or more pages a mix of the two.
In Figure 4A, a snapshot of the current memory state at the start of a new period N (for example, a snapshot of the state of the memory following the preceding cyclic processing period N-1 that ended with a flip of the render buffer, as described above) does not involve dumping a copy of the memory to storage as in the prior art. Instead, the snapshot involves setting all the existing memory page attributes to 'read only' (i.e. the respective attribute is set to 'R' in Figure 4A), using the memory controller 160, the processor, or similar, according to the system architecture. This preserves the current content of the memory at the moment of the snapshot, as explained below.
Referring to Figure 4B, assume that during the game state update of period N, at some point the game application attempts to write to a memory page 502 that would have had a write attribute but for the snapshot process setting it to read only.
This attempt to write to a read-only memory page causes an error interrupt to the operating system (OS). Upon such an interrupt, the OS is arranged to copy the current contents of the respective memory page (being effectively the state of the memory at the end of period N-1) to an archive (forming an archive page 502'), and then to set the respective memory page attribute to 'write' ('W' in Figure 4B) before returning control to the game application. The game application can then write to the memory page as intended, thereby creating a modified memory page 502A representing the state of the memory page during current period N. Figure 4C shows a further attempt to write to a second memory page 506 during the same period N, similarly resulting in the generation of a second archive page 506' and the setting of the respective memory page attribute to 'write'.
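The sequence of Figures 4A to 4C can be sketched as a small simulation. This is a toy model in Python, not the memory controller or OS mechanism itself: the class name SnapshotMemory, its methods, and the use of list indices 0 to 6 to stand for pages 501 to 507 are all assumptions made for illustration, and every page is assumed to be writable by the application before the snapshot.

```python
PAGE_SIZE = 4096  # typical page size given in the description


class SnapshotMemory:
    """Toy model of paged memory with one access attribute per page."""

    def __init__(self, num_pages):
        self.pages = [bytearray(PAGE_SIZE) for _ in range(num_pages)]
        self.attr = ["W"] * num_pages  # 'R' = read only, 'W' = writable
        self.archive = {}              # page index -> contents at snapshot time

    def snapshot(self):
        """Mark every page read only; no page contents are copied here."""
        self.archive = {}              # start a fresh archive for the new period
        self.attr = ["R"] * len(self.pages)

    def _on_write_fault(self, index):
        """Stand-in for the error interrupt handler run by the OS."""
        self.archive[index] = bytes(self.pages[index])  # archive the old contents
        self.attr[index] = "W"                          # then allow the write

    def write(self, index, offset, data):
        """Write as the application would, faulting on read-only pages."""
        if offset < 0 or offset + len(data) > PAGE_SIZE:
            raise ValueError("write falls outside the page")
        if self.attr[index] == "R":
            self._on_write_fault(index)
        self.pages[index][offset:offset + len(data)] = data


mem = SnapshotMemory(num_pages=7)  # indices 0-6 stand for pages 501-507
mem.snapshot()                     # Figure 4A: all attributes set to 'R'
mem.write(1, 0, b"new")            # Figure 4B: first write to page 502
mem.write(1, 8, b"more")           # same page again: no second archive copy
mem.write(5, 0, b"x")              # Figure 4C: first write to page 506
print(sorted(mem.archive))         # [1, 5]
print(mem.attr)                    # ['R', 'W', 'R', 'R', 'R', 'W', 'R']
```

In this sketch only the first write to a page in a given period incurs the copy; later writes to the same page proceed directly because its attribute has already been returned to 'W'.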
This therefore means that only those pages that are modified by the game application get archived, and that the archival process is distributed over the course of a game state update period. Moreover, the overhead due to supervision of the archival process is relatively low as it takes advantage of the existing error interrupt system by deliberately creating the circumstances that will later give rise to such an error at the moment that the snapshot is effectively taken.
This provides at least the following advantages: Firstly, the additional memory resources required to archive the pages will be less than those required for storing the entire memory.
Secondly, the time required to archive an individual page is much smaller than that required to copy the entire memory, reducing the chances of conflicts with other hardware events; and, due to the distributed nature of such archiving events, increasing the chance that any such conflict with hardware is recoverable during the remainder of the period.
Thirdly, the computational overhead imposed on a game application that may already be close to the resource limit of the entertainment device is significantly less than if an additional supervisory program was running in parallel with the game application and the OS.
Referring to Figure 4D, in an embodiment of the present invention, for the snapshot at the next period N+1 the attributes are again set back to read only, thereby preserving the state of the modified memory as it was at the end of period N. Meanwhile, the archived memory pages 502', 506' containing the state of the memory at the end of period N-1 are discarded (for example by removing file allocation table data or pointer data, which is faster than actual file deletion).
In this way, the system can record the memory states of the previous and current periods in an ongoing fashion until some event causes the application to crash, at which point the contents of the previous period's memory (comprising un-modified read-only memory pages and any archived memory pages) and the current period's memory (comprising un-modified read-only memory pages and any modified memory pages) can be inspected. For example, if the application crashed towards the end of period N, then memory pages 501, 502', 502A, 503-505, 506', 506A and 507 would be available for inspection.
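As an illustrative sketch only, the two views available for inspection after a crash can be assembled from the live pages and the archive as below. The function names, the dictionary representation and the one-letter page contents are assumptions for the example; the page numbers follow Figure 4.

```python
def previous_period_view(live_pages, archive):
    """Memory as it stood at the last snapshot.

    A page modified since the snapshot is taken from the archive;
    an unmodified (still read-only) page is taken from live memory.
    """
    return {page: archive.get(page, content)
            for page, content in live_pages.items()}


def current_period_view(live_pages):
    """Memory as it stood at the moment of the crash."""
    return dict(live_pages)


# Hypothetical contents; pages 502 and 506 were written during period N.
live_pages = {501: b"a", 502: b"B-modified", 503: b"c", 504: b"d",
              505: b"e", 506: b"F-modified", 507: b"g"}
archive = {502: b"b", 506: b"f"}  # archive pages 502' and 506'

before = previous_period_view(live_pages, archive)
after = current_period_view(live_pages)
changed = sorted(page for page in after if before[page] != after[page])
print(changed)  # [502, 506]
```

Comparing the two views in this way identifies which pages the application altered during the period in which it crashed.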
In another embodiment of the present invention, archived pages are not immediately discarded upon the next snapshot, but are associated with a period number and retained, typically for a set number of periods (for example between 2 and 10) but potentially for as many periods as an allocation of storage will allow.
In this way, a reconstruction of the memory for the last K periods (where K is the retention length) is possible. This may be useful, for example, where an error only becomes apparent when in some circumstance the application attempts to properly access data that has been corrupted at some point in the past by a bug.
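A minimal sketch of such retention is given below, assuming that each archived copy is keyed by the period in which it was made and by its page number. The class RetainedArchive and its methods are hypothetical names, and the pruning rule (keep the most recent K periods, counting the current one) is one possible reading of the retention length. The state of a page at the start of a given period is taken from the earliest retained copy made in that period or later, since that copy holds the contents the page had when it was first modified afterwards; if no such copy exists, the page has not changed and the live contents are used.

```python
class RetainedArchive:
    """Archive of page copies kept for the last K periods."""

    def __init__(self, retention):
        self.retention = retention  # K, the retention length
        self.period = 0             # current period number
        self.copies = {}            # (period, page) -> contents at that snapshot

    def start_period(self):
        """Called at each snapshot: advance the period, drop expired copies."""
        self.period += 1
        oldest = self.period - self.retention + 1
        self.copies = {key: value for key, value in self.copies.items()
                       if key[0] >= oldest}

    def archive_page(self, page, content):
        """Called on the first write to a page in the current period."""
        self.copies.setdefault((self.period, page), bytes(content))

    def state_at_start_of(self, period, live_pages):
        """Reconstruct memory as it stood at the snapshot starting `period`."""
        view = {}
        for page, live in live_pages.items():
            later = [p for (p, pg) in self.copies if pg == page and p >= period]
            view[page] = self.copies[(min(later), page)] if later else live
        return view


store = RetainedArchive(retention=3)
live = {502: b"v0"}

store.start_period()                # period 1
store.archive_page(502, live[502])  # first write to page 502 in period 1
live[502] = b"v1"

store.start_period()                # period 2: page 502 untouched

store.start_period()                # period 3
store.archive_page(502, live[502])
live[502] = b"v3"

print(store.state_at_start_of(2, live))  # {502: b'v1'}
```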
Additionally, in an optional embodiment of the present invention, one or more memory pages/areas can be marked as not being part of the preceding memory analysis scheme, for example where they are used by an application other than the one under analysis.
This can be achieved for example either by additional attribute bits (flag bits) or by a look-up table accessible by the OS and snapshot process.
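Both options can be sketched as follows. The bit values WRITABLE and EXCLUDED, the function names and the dictionary representation of the page attributes are assumptions made for illustration and do not correspond to any particular hardware attribute layout.

```python
WRITABLE = 0b01  # hypothetical attribute bit: page may be written
EXCLUDED = 0b10  # hypothetical flag bit: page is outside the analysis


def snapshot_with_flags(attrs):
    """Clear WRITABLE on every page whose EXCLUDED flag bit is not set."""
    return {page: bits if bits & EXCLUDED else bits & ~WRITABLE
            for page, bits in attrs.items()}


def snapshot_with_table(attrs, excluded_pages):
    """Same effect, with exclusion read from a separate look-up table."""
    return {page: bits if page in excluded_pages else bits & ~WRITABLE
            for page, bits in attrs.items()}


# Page 503 is assumed to belong to another application.
flagged = {501: WRITABLE, 502: WRITABLE, 503: WRITABLE | EXCLUDED}
print(snapshot_with_flags(flagged))         # {501: 0, 502: 0, 503: 3}

plain = {501: WRITABLE, 502: WRITABLE, 503: WRITABLE}
print(snapshot_with_table(plain, {503}))    # {501: 0, 502: 0, 503: 1}
```

In either case the excluded page keeps its write attribute through the snapshot, so writes to it raise no interrupt and it is never archived.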
Alternatively or in addition, in an embodiment of the present invention, when the snapshot occurs immediately after flipping the render buffer or some other predetermined point, the system registers are also archived (for example using a conventional technique, as they are relatively small) and the graphics hardware and/or other selected hardware components (e.g. audio hardware, network hardware) are placed in a predetermined state. In this way the full system state of the device can be reconstructed from the combination of the predetermined hardware state, the system registers and the memory archive. In a similar manner to that noted above in which archived pages are associated with a period number, optionally the system registers can also be respectively associated with and stored over K periods, allowing a system state reconstruction over the last K periods.
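As a sketch under stated assumptions, the register values for the last K periods can be held in a bounded queue alongside the page archive for each period. The record layout, the register name "pc" and its values are hypothetical and serve only to show the association between a period number and its stored state.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class PeriodRecord:
    """State stored for one period."""
    period: int
    registers: dict                                     # register name -> value
    archived_pages: dict = field(default_factory=dict)  # page -> contents


def take_snapshot(history, period, registers):
    """Store a copy of the register values taken at the snapshot for `period`."""
    record = PeriodRecord(period, dict(registers))
    history.append(record)  # a full deque drops its oldest record
    return record


K = 3
history = deque(maxlen=K)

for period in range(1, 6):
    take_snapshot(history, period, {"pc": 0x1000 + period})

print([record.period for record in history])  # [3, 4, 5]
```

A bounded queue is used here so that records older than K periods are dropped without any explicit deletion step.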
In addition, other aspects of the system could be recorded to improve the fidelity of the reconstruction, such as thread schedules, user and network input, etc. In principle the additional load from these activities will be offset by the reduced load from archiving the memory itself in the manner described herein.
In this way, an improved reconstruction of the events that lead up to a crash can potentially be analysed in detail from the start of a period up until the instruction that initiated the crash, and then optionally events can also be wound back from that moment or just prior to it, to determine the cause.
It will be appreciated that the above methods are applicable to any suitable paged memory, or any memory that is logically divisible based upon read and write attributes.
Potentially this includes memory in the SPE, PPU or main memory.
It will similarly be appreciated that pages can be archived to any suitable storage means, whether this is a memory area not currently under analysis, a hard drive or a flash drive, subject to any write speed constraints set by a user of the snapshot tool.
It will also be appreciated that whilst the above embodiments refer to a game application, they may be used as applicable with any analysed application.
Likewise it will also be appreciated that whilst the interrupt will typically pass control to the operating system, in principle it could be arranged to pass control to any second application arranged to archive the memory pages in the manner described herein.
Referring now to Figure 5, a method of analysis for memory accessible by a first application comprises: in a first step s10, setting the access attribute of a page of the memory under analysis to be read-only, so as to force an interrupt if the first application attempts to modify the contents of the memory page; in a second step s20, the interrupt passing control from the first application to a second application; in a third step s30, the second application duplicating the contents of the memory page to storage; and in a fourth step s40, setting the access attribute of the memory page to allow the first application to modify its contents.
It will be apparent to a person skilled in the art that variations in the above method corresponding to operation of the various embodiments of the apparatus described above are considered within the scope of the present invention, including but not limited to:
- performing the snapshot in response to a particular software or hardware event;
- performing the snapshot (i.e. setting the access attribute of the memory page to be read only) periodically;
- the period being a specific point in a cyclical software application, such as when a render buffer is flipped;
- associating the archived memory page with the period number in which it was archived;
- marking memory pages as out of bounds for the above method by use of a flag in or mirroring the page or by use of a lookup table;
- placing one or more items of hardware in a predetermined state immediately after the snapshot (typically before the game application resumes control);
- archiving the contents of registers during the snapshot; and
- recording thread schedules and other environmental factors that may influence operation of the game application.
Finally, it will be appreciated that the methods disclosed herein may be carried out on conventional hardware such as the Sony® PS3®, suitably adapted as applicable by software instruction or by the inclusion or substitution of dedicated hardware.
Thus the required adaptation to existing parts of a conventional equivalent device may be implemented in the form of a computer program product comprising processor implementable instructions stored on a data carrier such as a floppy disk, optical disk, hard disk, PROM, RAM, flash memory or any combination of these or other storage media, or transmitted via data signals on a network such as an Ethernet, a wireless network, the Internet, or any combination of these or other networks, or realised in hardware as an ASIC (application specific integrated circuit) or an FPGA (field programmable gate array) or other configurable circuit suitable for use in adapting the conventional equivalent device.
Claims (15)
- 1. A method of memory analysis for memory accessible by a first application, the method comprising the steps of: setting the access attribute of a page of the memory to be read-only, so as to force an interrupt if the first application attempts to modify the contents of the memory page; the interrupt passing control from the first application to a second application, and the second application duplicating the contents of the memory page to storage; and then setting the access attribute of the memory page to allow the first application to modify its contents.
- 2. A method of memory analysis according to claim 1, in which the access attribute of the memory page is set to be read-only periodically.
- 3. A method of memory analysis according to claim 2, in which the access attribute of the memory page is set to be read-only at a specific point in a cyclic execution of the first application.
- 4. A method of memory analysis according to claim 2 or claim 3, in which the duplicated content of the memory page is associated with a respective period associated in turn with a respective periodic setting of the access attribute of the memory page to be read-only.
- 5. A method of memory analysis according to any one of the preceding claims, comprising the step of marking one or more memory pages as being excluded from the memory analysis by one or more selected from the list consisting of: i. a flag associated with the memory page, and ii. a look-up table.
- 6. A method of memory analysis according to any one of the preceding claims, comprising the step of setting selected hardware to a predetermined state when the access attribute of the memory page is set to read-only.
- 7. A method of memory analysis according to any one of the preceding claims, comprising the step of duplicating system register values to storage when the access attribute of the memory page is set to read-only.
- 8. A method of memory analysis according to any one of the preceding claims, in which the access attribute of the memory page is set to read-only immediately after a render buffer is flipped to update a displayed state of the first application.
- 9. A method of memory analysis according to any one of the preceding claims, comprising the step of duplicating thread schedules to storage.
- 10. A computer program for implementing the method of any one of the preceding claims.
- 11. Apparatus for analysing memory accessible by a first application, comprising: a memory comprising one or more memory pages, the or each memory page comprising an access attribute; access attribute setting means operable to set the access attribute of the or each page of a memory under analysis to be read-only, so as to force an interrupt if the first application attempts to modify the contents of such a memory page; program interrupt means operable to pass control from the first application to an archive means; the archive means being operable to store a duplicate of the contents of a memory page to which the first application attempted to write, and operable to then instruct the access attribute setting means to set the access attribute of that memory page to allow the first application to modify its contents.
- 12. Apparatus according to claim 11, in which the access attribute setting means periodically sets the access attribute of the or each memory page to be read-only.
- 13. Apparatus according to claim 12 in which the duplicated content of a memory page to which the first application attempted to write is associated with a respective period.
- 14. Apparatus according to any one of claims 11 to 13, in which the archive means is operable to store a duplicate of the system register values at the time that the access attribute of the or each memory page is set to read only.
- 15. Apparatus according to any one of claims 11 to 14, in which access attribute setting means sets the access attribute of the or each memory page to read only immediately after a render buffer is flipped to update a displayed state of the first application.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
GB0821735A GB2465772A (en) | 2008-11-27 | 2008-11-27 | Analysing memory accessed by an application |
Publications (2)
Publication Number | Publication Date |
---|---|
GB0821735D0 GB0821735D0 (en) | 2008-12-31 |
GB2465772A true GB2465772A (en) | 2010-06-02 |
Family
ID=40230963
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
GB0821735A Withdrawn GB2465772A (en) | 2008-11-27 | 2008-11-27 | Analysing memory accessed by an application |
Country Status (1)
Country | Link |
---|---|
GB (1) | GB2465772A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6622263B1 (en) * | 1999-06-30 | 2003-09-16 | Jack Justin Stiffler | Method and apparatus for achieving system-directed checkpointing without specialized hardware assistance |
GB2434890A (en) * | 2006-02-01 | 2007-08-08 | Avaya Tech Llc | Software duplication |
CN101251822A (en) * | 2008-03-11 | 2008-08-27 | 中兴通讯股份有限公司 | Supervising method of internal memory being rewrited |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105204980A (en) * | 2014-05-26 | 2015-12-30 | 腾讯科技(深圳)有限公司 | Method for testing virtual engine software and testing equipment |
CN105204980B (en) * | 2014-05-26 | 2018-10-19 | 腾讯科技(深圳)有限公司 | A kind of test method and test equipment of illusory engine software |
Also Published As
Publication number | Publication date |
---|---|
GB0821735D0 (en) | 2008-12-31 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US8705845B2 (en) | Entertainment device and method of interaction | |
US9048859B2 (en) | Method and apparatus for compressing and decompressing data | |
JP5746916B2 (en) | Data processing | |
TWI469811B (en) | Method, apparatus, and computer for loading resource file of game engine | |
JP5149985B2 (en) | Graphics processing system with function expansion memory controller | |
EP2422319B1 (en) | Entertainment device, system, and method | |
EP2306398B1 (en) | Image processing method, apparatus and system | |
EP2306399B1 (en) | Image processing method, apparatus and system | |
EP2157545A1 (en) | Entertainment device, system and method | |
US8943130B2 (en) | Method and apparatus for transferring material | |
US10032257B2 (en) | Super resolution processing method, device, and program for single interaction multiple data-type super parallel computation processing device, and storage medium | |
US8360856B2 (en) | Entertainment apparatus and method | |
US20020122058A1 (en) | Information processing system, entertainment system, startup screen display method and information recording medium | |
US8269691B2 (en) | Networked computer graphics rendering system with multiple displays for displaying multiple viewing frustums | |
GB2486663A (en) | Audio data generation using parametric description of features of sounds | |
US20120106930A1 (en) | Shared surface hardware-sensitive composited video | |
GB2473263A (en) | Augmented reality virtual image degraded based on quality of camera image | |
GB2465772A (en) | Analysing memory accessed by an application | |
EP2169622A1 (en) | Apparatus and method of image analysis | |
WO2010139984A1 (en) | Device and method of display | |
WO2021152777A1 (en) | Image processing apparatus and image processing method | |
JP5020443B2 (en) | Method and apparatus for accessing shared resource | |
CN118301405A (en) | Recording method and related equipment | |
CA2398773A1 (en) | Information processing system, entertainment system, startup screen display method and information recording medium | |
AU2001236016A1 (en) | Information processing system, entertainment system,startup screen display method and information recording medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
WAP | Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1) |