GB2477431A - Audiotactile vision system

Info

Publication number
GB2477431A
GB2477431A GB1101732A GB201101732A
Authority
GB
United Kingdom
Prior art keywords
effects
features
lineal
moving
apparently
Prior art date
Legal status
Granted
Application number
GB1101732A
Other versions
GB201101732D0 (en)
GB2477431B (en)
Inventor
David Charles Dewhurst
Current Assignee
Individual
Original Assignee
Individual
Priority date
Filing date
Publication date
Priority claimed from GBGB1001610.3A external-priority patent/GB201001610D0/en
Priority claimed from GBGB1005674.5A external-priority patent/GB201005674D0/en
Priority claimed from GBGB1007778.2A external-priority patent/GB201007778D0/en
Priority claimed from GBGB1018487.7A external-priority patent/GB201018487D0/en
Application filed by Individual
Publication of GB201101732D0
Publication of GB2477431A
Application granted
Publication of GB2477431B
Expired - Fee Related
Anticipated expiration

Classifications

    • G - PHYSICS
    • G09 - EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B - EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B21/00 - Teaching, or communicating with, the blind, deaf or mute
    • G09B21/001 - Teaching or communicating with blind persons
    • G09B21/007 - Teaching or communicating with blind persons using both tactile and audible presentation of the information

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Educational Administration (AREA)
  • Educational Technology (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Processing Or Creating Images (AREA)

Abstract

A vision substitution system for communicating audio and tactile representations of features of visual representations 10 includes processing lineal features into the form of apparently-moving audiotactile effects 6 of particular timbre (for example buzzing effects), and outputting said effects to audio and/or tactile displays 24 and 28; with additional effects 16 clarifying shape, slope, and location; and further distinct effects highlighting point-like features 8 and separate effects providing vision data 32 and 34; optionally providing facilities for blind people to create and access audiotactile images, data, graphs and waveforms; and to perceive features produced via computer vision processing, for example the shapes of areas of common properties, the nature of areas of movement, paths followed by moving entities, and symbolic paths representing identified entities such as people's faces.

Description

TITLE: IMPROVED AUDIOTACTILE VISION SYSTEM
BACKGROUND - FIELD OF THE INVENTION
This invention relates to improvements to a vision system method and device for communicating audio and tactile representations of features of visual representations.
BACKGROUND - DISCUSSION OF PRIOR ART
Devices have previously been invented that substitute for aspects of vision with another sense, particularly hearing and touch, and can be useful to blind and partially sighted people. Fournier d'Albe's 1914 Reading Optophone presented the shapes of letters by scanning lines of type with a column of five spots of light, with each spot controlling the volume of a different musical note, producing characteristic sounds for each letter. The invention described in U.S. Pat. No. 5,097,326 and "The vOICe" vision substitution system present live images via sound, and U.S. Pat. No. 6,963,656 describes using musical sequences to convey particular features of images. Similar approaches have been used to "sonify" the lines on a two-dimensional line graph. Typically height is mapped to pitch and intensity to volume (either dark- or light-sounding), with a left-to-right column scan normally used. Demonstrations of "3D sound" environments can include the simulated sounds of a "buzzy" insect in flight, and the motion of the insect can be visualised.
The applicant's UK Pat. No. GB2441434, entitled "Audiotactile Vision Substitution System", discloses a system for presenting apparently-moving speech-like sounds and corresponding tactile effects (referred to as "tracers") that trace out the shapes of lineal features (e.g. shapes, medial-lines etc.) present in visual representations (e.g. images, abstract shapes, maps etc.), at the same time as presenting information related to those shapes. The main inventive step over prior art was the addition of distinct audiotactile indicator effects (referred to as "corner indicium effects") to highlight corners within shapes and other lineal features. These corner indicator effects included pauses in the movement of the "tracers", and distinct audio and tactile effects such as beeping noises or tactile jolts, to represent corners. Highlighting corners clarified the perception of shapes, particularly for shapes for which corners are essential features but are not sharp angles (e.g. octagons).
The present invention is an improvement of the invention disclosed in UK Pat. No. GB2441434.
The description and drawings of UK Pat. No. GB2441434 are incorporated by reference, and copies are readily obtainable from the Internet and elsewhere.
Certain features covered in this application have previously been disclosed before the priority date in published papers, notably: object tracer paths (including symbolic tracer paths); object-related layouts creating pre-defined guides; processing simple images; and aspects of "polytracers".
One weakness of the system disclosed in UK Pat. No. GB2441434 was that the shape perceived by users was not always clearly defined if presented via moving speech-like sounds alone. Highlighting corners greatly improves matters (particularly in the tactile modality), but extra cues are needed in the audio modality. Another weakness was that the speech-like sounds were also presenting additional information via volume changes - slow changes to present the size (e.g. width) and other quantities, and a more rapid "flutter" to convey the "texture" of an area - and these distortions could make the speech more difficult to comprehend.
Other "optophone"-like systems typically use a systematic left-to-right "scanning" action (i.e. mapping horizontal location to time), which gives "time-after-start" cues to the horizontal location of material within images. However such cues are not always present in the invention disclosed in UK Pat. No. GB244 1434, as the audiotactile "tracer" can move in any direction when presenting the path of a lineal feature. As a result, users had to rely on stereophonic binaural effects to understand the horizontal location of the tracer, and these effects can be weak. (Audio vertical positioning is better defined, as the pitch of the sound clearly indicates the height within the image.) The weakness in the perception of horizontal location was less of an issue with moving tactile effects, as, for example, a moving force-feedback joystick handle gave clear proprioceptive cues about the location. While the tactile modality is effective, it has the disadvantage that it typically requires the user to hold or touch a tactile display of some kind, and such devices are not always conveniently available.
(The audio modality also has disadvantages, notably interfering with other sounds in the users' environment. Hence the audiotactile nature of the system, which allows the user to choose their preferred mode of use.) The prior art systems had limited user interaction, and were primarily intended to present existing images etc. to blind people (albeit with them able to control aspects of the presented visual information).
Blind users could not use them to create their own images (except possibly to "monitor" images that they have created using conventional means, such as via a pen and paper).
SUMMARY
In accordance with preferred embodiments a vision system for communicating audio and tactile representations of features of visual representations includes processing lineal features into the form of apparently-moving audiotactile effects of particular timbre (for example buzzy effects), and outputting said effects to audio and/or tactile displays; with additional effects clarifying shape, direction and location, and further distinct effects highlighting point-like features, and separate effects providing vision data; optionally providing facilities for blind people to create and access audiotactile images, data, graphs and waveforms, and to perceive features derived from computer vision processing, such as the paths followed by moving entities.
(Note that the names "Microsoft", "Windows", "Visual Basic", "DirectInput", "DirectSound", "Kinect", "Logitech", "Novint", "VirtualBox", "VMware", and "Photoshop" used in this specification may be Registered Trade Marks.)
BRIEF DESCRIPTION OF THE DRAWINGS
Example embodiments of the invention will be described with reference to the accompanying drawings, in which:
Fig 1 illustrates the system conceptually.
Fig 2 illustrates how image features can be presented via other senses.
Fig 3 shows a low-cost embodiment, which uses only standard hardware components.
Fig 4 shows an example main graphical user interface (GUI) for an embodiment.
Fig 5 shows two shapes that may be confused using audio effects.
Fig 6 illustrates volume profiling that may be applied to produce "pillar" and "layer" effects.
Fig 7 shows further GUI controls used for the "buzz track" feature.
Fig 8 illustrates the paths that region tracers can follow when presenting region layouts.
Fig 9 shows examples of the different types of object tracer paths, including outlines, frame, medial and symbolic object tracer paths.
Fig 10 shows several examples of symbolic object tracer paths.
Fig 11 shows several examples of "Object-related Layouts", and how the "segments" and "sub-panels" of an irregularly shaped object are shaped to convey equal areas.
Fig 12 illustrates examples of how "Region Layouts" can be configured from "panels" of "segments".
Fig 13 shows how an 8 by 4 layout can be presented via real or coded words (or braille).
Fig 14 illustrates "contoured polytracers".
Fig 15 illustrates "parallel polytracers".
Fig 16 illustrates polytracers based on "circuit medials".
Fig 17 illustrates "rectangular polytracers".
Fig 18 illustrates "branching medial" tracers and polytracers.
Fig 19 shows the GUI controls used for the "polytracer" feature.
Fig 20 illustrates the processing of live images into a "guide" prior to presentation.
Fig 21 shows how an image can be marked with "objects" for presenting.
Fig 22 shows an example section of a text file that specifies the objects and markup colours etc. for a pre-defined guide.
Fig 23 illustrates how the text file and bitmaps are combined to produce a guide, which can be bound to a media file.
Fig 24 shows how several shades can be processed into a single "blob".
Fig 25 shows a detected area of motion.
Fig 26 shows how flow lines can be used to detect direction of movement.
Fig 27 illustrates "CamShift" tracking.
Fig 28 shows the GUI used for marking up images and providing a "drawing/markup" facility.
Fig 29 shows example paths and indicator effect points of audiotactile graphs and charts.
Fig 30 shows a spreadsheet for controlling presented audiotactile graphs and charts.
Fig 31 shows example GUI controls for presenting graphs, charts and waveforms.
Fig 32 shows a spreadsheet for controlling a presented audiotactile waveform that illustrates a Fourier series.
Fig 33 shows a "viewfinder" facility which is used to snap sections of a computer desktop.
DETAILED DESCRIPTION
The embodiments of the present invention address the highlighted prior art issues by adding a separate non-speech soundtrack of particular timbre, that is easier to mentally "position" in stereo "soundspace" than speech-like sounds alone, and allows more accurate perception of shape. One effective timbre was a "buzzy" sound, with a clearly defined pitch. The extra non-speech sound track will be referred to as a "buzz track", but the sound timbre can also be of other waveforms such as square or "sawtooth", although buzzy sounds tend to be easier to mentally position in "soundspace". Location and direction cues, and timbre-conveyed and volume-conveyed information, can be added to the buzz track, which can be presented at the same time as speech-like sounds and other information-conveying effects.
Corresponding tactile effects can also be presented. Volume effects can be applied to the buzz track in order to avoid distorting the speech sounds (as was previously the case).
This application discloses an accessible drawing facility that allows blind people to create or add to images, by using a standard computer mouse (or joystick) to "draw" shapes etc., and while they are doing so receive feedback from the system that uses similar conventions to those used by the system to present images, so that users may know the location of a "virtual pen" at any moment. Facilities for blind people to create and access audiotactile data, charts, graphs and waveforms, and for defining and capturing material visible on a computer screen, are disclosed, as well as a facility for blind people to perceive the visual features derived from computer vision processing, such as the paths followed by moving entities.
Note that this description does not repeat all of the detail contained in UK Pat. No. GB2441434, which describes construction methods in detail. This description should be read in conjunction with the description and drawings of UK Pat. No. GB2441434 (with appropriate modifications made where necessary). Copies of UK Pat. No. GB2441434 are readily obtainable from the Internet and elsewhere.
(Note that UK Pat. No. GB2441434 and the priority applications are of different titles; and use the term "pre-determined" or "pre-processed" to refer to what is herein referred to as "pre-defined"; and use the term "indicium effects" to refer to what are herein referred to as "indicator effects".) This description will summarise the system, then describe in detail the improvements that constitute the present invention.
Overview and definitions
The system aims to simulate the way that sighted people perceive visual features, and the approach is conceptually illustrated in Figs. 1 & 2, with a GUI for controlling the system shown in Fig 4.
-A "viewzone" 2 is the (usually rectangular) part (or whole) of the image being conveyed. The viewzone can be moved to particular areas of an image and resized, and the spatial resolution can be changed. Viewzones are usually square or rectangular, but could be circular, elliptical, or other shapes.
-Entities 4 within the viewzone are presented by the system via audiotactile effects.
-Entities conveyed can be "Objects" 4 or "Regions" 46.
-"Objects" are particular entities, such as "blobs", identified objects, or movements, that are identified within the viewzone.
-"Regions" (previously called "areas" or sometimes "layouts") are regular rectangular regions for which the system conveys systematic descriptions of the properties, and arrangement of the properties, of the viewzone.
-The audiotactile effects presented by the system are apparently-moving effects and other effects.
-The apparently-moving effects 6 follow paths and are known as "tracers". The corners 8 and other point-like features (such as data points on a line graph) within tracers can be emphasised via distinct effects. The shape can be clarified by using a separate tracer of "buzzy" effects of particular timbre, which can include distinct location- and direction-conveying effects. This is known as a "buzz track".
-For Objects, the tracers follow paths that relate to the paths of lineal features of the objects, for example their perimeter, medial line 12 or framing rectangle. Alternatively a "symbolic" tracer path can present lines and corners that symbolise the classification of particular objects.
-For Regions, the tracers follow paths that systematically cover the area of the viewzone, and are arranged so that they are moving over the part of the viewzone whose properties are being presented at any moment.
-Separate effects can present visual properties of the entities (whether objects or regions), such as colours or other properties.
-Corners and other point-like ("dimensionless") features are represented by short audiotactile indicator effects.
-The volume of the moving effects (e.g. buzz track) can present additional information via volume changes - a "flutter" to convey the "texture" of an area, and slower changes to present the size - for example the width of an elongated object can be conveyed by altering the volume to correspond to the width at any point along its medial path.
-"Layouts" are effects which convey the spatial arrangement of properties for entities (whether objects or regions). Layouts can be "Object-related Layouts", or "Region Layouts".
-Object-related layouts can convey the arrangements of properties within the object, or the location of the object within the viewzone, and other arrangements.
-Properties and layouts are presented via speech, braille, tactile impulses, and other audiotactile effects. They can be displayed as a part of an apparently-moving tracer (e.g. moving speech sounds) or be presented separately (e.g. on a braille display). The speech sounds can be words of a natural language, or coded, for example the shortened "English" format (e.g. "boo-wuy" or "b-uy" for "blue and white"), or the "International" format that uses "CV" (Consonant-Vowel) format syllables that are found in most languages.
-The "Playtime" 102 Fig 4 is a user-controlled period of time that gives the time allowed for conveying the contents of the viewzone before the next image is processed, and this may have an impact on how much visual information can be presented.
-Apparently-moving audiotactile tracers are either effects which physically move (e.g. the handle of a force-feedback joystick 14), or effects which appear to move, for example a sequentially-presented group of effects, each presented in a slightly different location, so that they collectively produce the impression of a single moving effect (a minimal sketch of this approach appears after this list).
-Multiple tracers can be used to present the detail of an area in an intuitive manner. They are referred to as "polytracers" and are described in section 4.4 below.
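The sketch below illustrates one way the "apparently-moving" audio form of a tracer could be generated, as described in the definitions above: a path of points within the viewzone is rendered as a sequence of short stereo segments whose pitch follows the vertical position and whose left/right balance follows the horizontal position. The sample rate, mapping ranges and helper names are illustrative assumptions, not details taken from the patent.

```python
import numpy as np

SAMPLE_RATE = 44100

def densify(corners, points_per_edge=50):
    """Linearly interpolate between corner points so the motion appears smooth."""
    pts = []
    for (x0, y0), (x1, y1) in zip(corners, corners[1:]):
        for s in np.linspace(0.0, 1.0, points_per_edge, endpoint=False):
            pts.append((x0 + s * (x1 - x0), y0 + s * (y1 - y0)))
    pts.append(corners[-1])
    return pts

def tracer_audio(path, duration=2.0, f_low=200.0, f_high=2000.0):
    """path: list of (x, y) with x, y in 0..1; y = 0 is the bottom of the viewzone."""
    seg_len = max(1, int(SAMPLE_RATE * duration / len(path)))
    t = np.arange(seg_len) / SAMPLE_RATE
    out, phase = [], 0.0
    for x, y in path:
        freq = f_low * (f_high / f_low) ** y                # height -> pitch (log scale)
        seg = np.sin(2 * np.pi * freq * t + phase)          # plain tone stands in for the buzz
        phase += 2 * np.pi * freq * seg_len / SAMPLE_RATE   # keep the waveform continuous
        left, right = np.sqrt(1.0 - x), np.sqrt(x)          # horizontal -> stereo pan
        out.append(np.stack([seg * left, seg * right], axis=1))
    return np.concatenate(out)                              # (n_samples, 2) stereo buffer

# Example: trace a square outline, starting at the bottom-left corner.
square = densify([(0.2, 0.2), (0.2, 0.8), (0.8, 0.8), (0.8, 0.2), (0.2, 0.2)])
stereo = tracer_audio(square)
```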
Discussion
For any image, or section of an image, the property content (colour, texture etc.) of regular "Regions" (e.g. regular rectangular regions covering all or part of the image) can be presented; or the properties of identified "Objects". (The term "object" is used to refer to a specific entity that is being presented, for example a person, a person's face, part of a diagram, a found coloured "blob" etc., whether found by the system, or highlighted in material prepared by a human designer.) As described above, for both "Objects" and "Regions", the information can be presented via apparently-moving audiotactile effects referred to as "Tracers". By smoothly changing the binaural positioning of the sounds and pitching them according to height, they can be made to appear to move in "sound space", whether following a systematic path within a Region, or following the path of a specific shape or feature of an item in an image. In the tactile modality, tracer location and movement can be presented via force-feedback devices such as joysticks (or special apparatus) that move/pull the user's hand and arm. For Objects the path presents the shape, size, location and (if known) the identity of the objects, and for Regions, the tracer path conveys the location and extent of the Regions. "Layouts" present the arrangement of (usually two) properties within an Object or Region, and normally use a regular grid-like format (Fig. 12).
In both modalities, the tracers can describe the shape, size and location (and possibly identity) of the Objects or Regions. As the system outputs both audio and tactile effects, users can choose which modality to use; or both modalities can be used simultaneously.
The properties (colours, textures, types etc.) of the Objects or Regions are either presented within the audiotactile tracers, or separately. In the audio modality, speech-like sounds generally present categorical properties (e.g. "boo-wuy" or "b-uy" for "blue and white"). In the tactile modality, Morse code-like "taps" can be presented on a force-feedback device, or alternatively a separate braille display can be used (Fig. 1). The "layout" (i.e. arrangement) of properties is best presented on a braille display, though there are practical ways of presenting certain object layouts via speech or taps. (Possible codings/mappings for speech etc. are described in UK Pat. No. GB2441434.)
A key feature of the system is the highlighting of corners and other point-like features within shapes, which tests show to be very important in conveying the shape of an object. Corners are highlighted via audiotactile effects that are included at appropriate points in the shape-conveying tracers.
A key feature of the present invention is the use of apparently-moving effects of particular timbre, especially (but not only) "buzzy" audiotactile effects, known as "buzz tracks", which improve the perception of shapes and other lineal features.
Although one possible tracer path for presenting an object's shape is the object's outline (Fig. 1), other paths such as medial lines and frames can be used (Fig. 9). "Symbolic Object Paths" are found to be effective, as they present the location, size, orientation and type of object via a single tracer path.
As the system outputs both audio and tactile effects, users can spread the information load to suit their abilities and circumstances: they can choose which modality to use; or both modalities can be used simultaneously, allowing more information to be presented during a certain period of time.
The embodiments can be used by partially-sighted, blind, deafblind and colour-blind people. They may be used as vision substitution systems, as mobility aids, or to find out particular pieces of visual information such as colours, or shapes and corners. They can be used to present shapes to sighted people in various applications, for example as part of a training aid, game, toy or puzzle. The embodiments can convey a prepared programme of pre-defined material, and the sounds and tactile effects produced can be used for artistic purposes, and can be recorded or broadcast. Several special applications will be described later in this description.
Several preferred embodiments will be described. Preferred embodiments can be constructed using bespoke hardware and software, or can be created using existing components with bespoke software. The embodiments use several methods to substitute for aspects of vision, and there is some interaction between the methods. Hence there will be some overlap between topics, and this description contains some repetition and cross-references. Numerous similar methods can be devised, and the scope of the invention is not limited to the examples described herein.
This description includes the following sections, which are numbered so that they can be cross-referenced:
1. SUMMARY OF METHOD AND DEVICE, AND THEIR OPERATION
2. DESCRIPTION OF PHYSICAL COMPONENTS
2.1 EXAMPLE PROCESSING PLATFORMS
2.2 FORCE-FEEDBACK DEVICES
3 DESCRIPTION OF SOFTWARE
4 KEY FEATURES
4.1 IMPROVING THE PERCEPTION OF SOUND TRACERS ("BUZZ TRACKS")
4.2 TRACER PATHS
4.3 IMPROVING THE PERCEPTION OF IMAGE LAYOUT
4.4 "POLYTRACERS"
4.5 IMPLEMENTING "BUZZ TRACKS" AND "POLYTRACERS"
5 APPLICATIONS
5.1 PRE-DEFINED AND FOUND FEATURES
5.2 CREATING AND USING A PRE-DEFINED "GUIDE"
5.3 USING COMPUTER VISION
5.4 CREATING AND ACCESSING AUDIOTACTILE IMAGES
5.5 CREATING AND ACCESSING DATA, GRAPHS, CHARTS AND WAVEFORMS
5.6 INTERFACING WITH EXTERNAL SYSTEMS
5.7 USING A "VIEWFINDER" TO CAPTURE IMAGES
6 OTHER FEATURES

1. SUMMARY OF METHOD AND DEVICE, AND THEIR OPERATION
With reference to Fig 1, which conceptually illustrates the system, the method and apparatus for communicating aspects of visual representations comprises:
a) Acquiring (or acquiring means for acquiring) lineal features and other data related to said visual representations. The lineal features are line-like features such as lines (of whatever shape; curves; zigzags; etc.); edges/perimeters of shapes 6; lines of symmetry; "medial-lines" 12; etc., whose shapes are to be presented to the user. The data can be features such as colours, and arrangements of colour; corners 8 and other point-like ("dimensionless") entities (such as the data points in a line graph), and other basic visual components; and details of recognised entities such as text, objects etc. The visual features can be acquired by performing optical processing of the visual representations; or by acquiring pre-defined features that have been previously decided, for example by a sighted designer selecting key features in images for presentation, or by acquiring features provided by an external application.
Additionally, the area represented by a visual representation, or parts of a visual representation, can be divided into a matrix of elements, for example in the form of columns 18 and/or rows (referred to as "Pillars" and "Layers" respectively), and the points at which the paths of the lineal features cross the borders between the matrix elements (e.g. "Pillars" and/or "Layers") (i.e. the intersections) can be acquired as data.
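As an illustration of this step, the sketch below computes "Pillar" and "Layer" border crossing points for a lineal feature supplied as a list of path points. It assumes pixel coordinates with y increasing downwards and records an approximate crossing location and direction for each boundary crossed; the function and parameter names are illustrative choices, not terms from the patent.

```python
# Sketch of acquiring matrix-crossing data for a lineal feature: the viewzone
# is split into n_pillars columns and n_layers rows, and each consecutive
# pair of path points is checked for a border crossing.
def matrix_crossings(path, width, height, n_pillars=8, n_layers=4):
    pillar_w, layer_h = width / n_pillars, height / n_layers
    crossings = []
    for (x0, y0), (x1, y1) in zip(path, path[1:]):
        if int(x0 // pillar_w) != int(x1 // pillar_w):
            direction = "left-to-right" if x1 > x0 else "right-to-left"
            crossings.append(("pillar", (x0 + x1) / 2, (y0 + y1) / 2, direction))
        if int(y0 // layer_h) != int(y1 // layer_h):
            direction = "downward" if y1 > y0 else "upward"
            crossings.append(("layer", (x0 + x1) / 2, (y0 + y1) / 2, direction))
    return crossings

# Example: a diagonal line across a 320 x 240 viewzone.
line = [(i, int(i * 0.75)) for i in range(0, 321, 4)]
print(len(matrix_crossings(line, 320, 240)))
```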
The visual representations will typically be images of some kind, though they could be for example the descriptions of a shape and its corners, e.g. as provided by a set of coordinates, without taking the form of an image. Other visual representations might be used, for example data that can be presented visually, graphs, charts, maps, paths that moving entities follow, etc. Any type of visual representation is generally suitable. If the visual representations are in the form of images 10, they might be provided by: a live video signal (for example from a camera 20, broadcast, Internet transmission etc.); filed images (e.g. held on a computer, storage media or the Internet), for example images in one of the many formats used on computers (e.g. bitmap or JPEG format); frames of a movie read from a movie file (e.g. in .AVI or MPEG format) or from a DVD; etc. The visual representations can be "snaps" of parts of a computer "desktop", or provided by a media player. "Zooming" and "panning" facilities can be provided 104 Fig 4 to allow the user to select areas of the image prior to processing.
The lineal features and data of visual representations can alternatively be acquired from an external application, or be in the form of a description of a lineal feature and data. For example if a standard shape is to be presented (e.g. the circular paths used to simulate a pie chart XX Fig XX) the lineal features and point-like features can be acquired in the form of data that describes, or gives the parameters of, the sections of the circular path and/or data points.
The lineal features can be the paths followed by moving entities within visual representations, so that the movement of the entities can be communicated.
b) Processing (or processing means for processing) the lineal features into the form of at least one apparently-moving effect of particular timbre (e.g. buzzy), and other distinct indicator effects, and other separate property effects. The processing is typically performed by a processor such as a computer 22 (e.g. portable or desktop) or embedded processor, and is described further in UK Pat. No. GB2441434 and throughout this description.
c) Display means (or providing display means) comprising audio and/or tactile displays. The provided output apparatus can be audio and/or tactile display apparatus suitable for presenting audio and/or tactile effects. In the audio modality, standard audio equipment can be used, for example an amplifier and stereophonic headphones 24 or loudspeakers (and associated software). These can present, for example, buzzy apparently-moving shape effects to convey the shapes presented 6 (by continuously changing the frequency and binaural positioning of the buzzy sounds); short distinct audio effects to represent corners 8 etc., and separate encoded categorical sounds such as speech 26 to convey other visual properties.
In the tactile modality, a force-feedback type joystick 28 can also be used as a flexible tactile display, being suitable for presenting shapes and corners (by moving in the path of the required shapes and triggering distinct effects to represent corners), and categorical features encoded as Morse code-like impulse effects 32, as well as allowing the user to indicate and select parts of the images and communicate with the embodiment. It can also present "buzzy" effects similar to the audio buzzy effects, e.g. of frequency relating to the pitch of the audio buzzy effects. Standard or bespoke force-feedback devices can be used. (An "Optacon"TM or similar device can also present shapes.) A braille display 34 can present features, such as colours and entity descriptions, and "layouts".
d) Outputting (or outputting means for outputting) the effects of particular timbre (e.g. buzzy effects), the distinct indicator effects, and the separate property effects, to the audio and/or tactile displays. The effects are output to the audiotactile displays. The "outputting means" can be the hardware/software combination that causes the effects to appear on the displays, for example a computer sound card and the software that controls it, or the software that causes tactile displays to exhibit the required effects.
The shapes, and "Pillar" 18 and "Layer" border crossing points within the shapes, can be presented by continuously moving the position of the effects, and presenting indicator effects at points where border crossings occur (i.e. the intersection of the tracer path and the Pillar or Layer border).
A simple non-coded "novice" mode can be provided whereby colours and "layouts" are spoken directly ("direct description"). This was found to be effective for beginners to use. For colours, standard colour terms can be used; but for layouts short descriptive words can be assigned to small groups of "blobs".
As an option, the sound volume of the words or phonemes used to present colours etc. can be altered to correspond to the amount of each colour present. For example if a blue and red object is being presented and there is more blue than red present, then the sound volume of the words or phonemes representing
When recognised objects are presented, for examples people's faces, then the actual object description can optionally be presented, rather than the coded object classification.
When a tracer is following a path, the colours (and other features) presented can optionally correspond to the colours sampled adjacent to the tracer as it moves along the route of the path.
(Improved effects and new features are described later in this description.) See UK Pat. No. GB2441434 for additional details of graphical user interfaces (GUIs); speech, braille and coded impulse encoding; activity-related processing; coded impulses; optical processing; communicating lineal features and corners; pre-defined features; communicating texture; timing effect output; "viewzones", zooming and moving; speech synthesis; communicating change and movement; presenting entities; presenting objects and structures; and miscellaneous other features.
An Example of Feature Encoding and Presentation
Categorically-described visual properties are normally presented to the user via groups of "CV" (Consonant-Vowel) syllables; via Morse code-like impulses; and via braille.
With reference to Fig 2 the image 40 is reduced to 8 by 8 "segments" 41. The segments in each square of 4 by 4 segments (known as a "panel") are each set to one of the two shades that the system calculates best represent the panel 42. Then the image is presented via audio 43 and two tactile methods 44 & 45.
For each panel, one CV syllable conveys the two selected shades; and two CV syllables convey the arrangement ("Layout") of those two shades, to the level of detail shown in the segmented image 42.
For the top right "panel" 46 in the segmented image 42, the coded CV syllable "WWAE" conveys the two colour shades "white and black", and the two CV syllables "LLXR-RROR" present the "layout" of the two colour shades as 4 by 4 segments. The whole image is conveyed by the four spoken "words" shown 43, and by the corresponding 12 braille cells 44, both of which fully describe the 8 by 8 segments shown in 42. The coded Morse code-like impulses 45 exhibited on the force-feedback joystick 49 present the colour shades.
In this description an area of a viewzone whose colours are usually described by a single pair of colours that the system calculates best represent it is referred to as a "Panel". The pixels within a panel are arranged into equal-sized areas referred to as "Segments", and the arrangement of the colours (selected from the pair of colours) that best represents the segments is referred to as a "Layout".
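A minimal sketch of this panel reduction is given below, under simplifying assumptions: it works on a greyscale panel, picks the two representative shades as the mean values of the pixels on either side of the panel's overall mean, and assigns each of the 4 x 4 segments to whichever shade is closer. The patent leaves the exact selection method open, so the thresholding used here is only illustrative.

```python
import numpy as np

def panel_shades_and_layout(panel, segs=4):
    """panel: 2-D numpy array of greyscale pixel values (e.g. 0-255)."""
    threshold = panel.mean()
    below, above = panel[panel <= threshold], panel[panel > threshold]
    dark = below.mean() if below.size else threshold
    light = above.mean() if above.size else dark
    seg_h, seg_w = panel.shape[0] // segs, panel.shape[1] // segs
    layout = np.zeros((segs, segs), dtype=int)    # 0 = dark shade, 1 = light shade
    for r in range(segs):
        for c in range(segs):
            seg = panel[r * seg_h:(r + 1) * seg_h, c * seg_w:(c + 1) * seg_w]
            layout[r, c] = int(abs(seg.mean() - light) < abs(seg.mean() - dark))
    return dark, light, layout

# Example: a panel that is dark on its left half and light on its right half.
demo = np.hstack([np.full((16, 8), 30), np.full((16, 8), 220)])
print(panel_shades_and_layout(demo))
```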
The shape 47 Fig 2 has been identified by the system as significant, and its main corners 48 have been located. It can be presented to the user by moving the audio effects and the joystick 49 in the path of the shape 47 and outputting corner-representing effects at the appropriate times, whilst the categorical effects relating to the area concerned are being presented (via speech, braille or impulses).
The categorical effects can be for the whole of the panel as shown, or they can describe the properties of the shape (e.g. its colour).
To represent corners or other point-like features (such as the data points of a line graph, or the intersection of the path and a Pillar border), distinct indicator effects are output while the shape tracer is being presented, the indicator effects being presented at the moment when the tracer is passing the location of the corner or other point-like feature. Extra corner effects can be output if they produce a better impression of an entity. For example if an "X"-shape is being presented as two sloping lines, then the point of intersection of the lines can also have an indicator effect added, even though each tracer is not changing direction at that point in its path. (The point at which a tracer crosses the boundary between two "pillars" or "layers" can also be considered a "point-like feature".) The simplest such indicator effect is to momentarily stop the apparent movement of the audio and/or tactile tracer (or change its apparent speed), or jolts or other short impulses can be applied to the force-feedback device. Alternatively a short characteristic vibration or similar effect can be used. Audio indicator effects can include brief changes in volume and/or frequency; short, sharp noises; distinctive noises; etc. Indicator effects that comprise speed reductions to tracers other than completely stopping can be provided. Furthermore the speed at which a tracer stops or slows down, and restarts, can be tapered.
The tracer could speed up for the indicator effect. The indicator effects can be symbolised by a momentary lack of audiotactile tracer effects; or by a change in the nature of the tracer effects being used to present lineal features, so that the change in nature highlights the existence of a point-like feature at the point where it occurs.
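One simple way to schedule such indicator effects is sketched below: the tracer is given a timetable in which it travels at a constant apparent speed along its path but dwells briefly at each corner (the "momentary stop" effect described above). The timing values and names are assumptions for illustration only.

```python
import math

def tracer_timetable(path, corner_indices, playtime=2.0, corner_pause=0.1):
    """Return (time, point, event) tuples: constant-speed travel with a dwell at corners."""
    lengths = [math.dist(p, q) for p, q in zip(path, path[1:])]
    total = sum(lengths) or 1.0
    moving_time = max(0.0, playtime - corner_pause * len(corner_indices))
    events, t = [], 0.0
    for i, point in enumerate(path):
        events.append((round(t, 3), point, "corner" if i in corner_indices else "move"))
        if i in corner_indices:
            t += corner_pause                       # momentary stop marks the corner
        if i < len(lengths):
            t += moving_time * lengths[i] / total   # otherwise keep an even apparent speed
    return events

# Example: a triangle whose three vertices (and the closing point) are corners.
triangle = [(0, 0), (1, 0), (0.5, 1), (0, 0)]
print(tracer_timetable(triangle, corner_indices={0, 1, 2, 3}))
```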
The indicator effects can be presented on other devices and in different modalities to the tracer.
The corner etc. indicator effects described above can additionally be applied to linear features presented using standard optophone-style image mapping; or when "polytracers" are being presented.
Continuous "buzzy" effects can improve the perception of the tracer shape. This is described further in section 4.1 below.
The textures of an area or entity can be conveyed via small fluctuations in the volume of the tracer sounds. These volume effects combine the effects of changes in brightness, colour etc., to give a single volume-conveyed "texture" effect. Similar effects can be induced on the force-feedback devices.
(Volume is also used to convey other properties, such as the width of an entity, and to convey larger general areas of "change".) Volume effects are best applied to the buzz track rather than speech sounds, so that the speech sounds are not distorted.
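The sketch below shows one possible way of deriving such a volume "flutter" for the buzz track: the local roughness around each path point (here taken as the standard deviation of brightness in a small window) scales a small oscillation in the gain, so that more textured areas flutter more strongly. The window size, flutter depth and rate are illustrative assumptions.

```python
import numpy as np

def flutter_gains(image, path, window=5, depth=0.3, rate_hz=20.0, step_time=0.02):
    """image: 2-D greyscale array; path: list of integer (x, y) pixel positions."""
    gains = []
    for i, (x, y) in enumerate(path):
        patch = image[max(0, y - window):y + window + 1,
                      max(0, x - window):x + window + 1]
        roughness = patch.std() / 255.0                        # 0 = smooth, ~1 = very rough
        wobble = np.sin(2 * np.pi * rate_hz * i * step_time)   # the rapid "flutter"
        gains.append(1.0 - depth * roughness * (0.5 + 0.5 * wobble))
    return gains   # per-point gains to multiply into the buzz-track amplitude

# Example: flutter along a horizontal path across a noisy test image.
rng = np.random.default_rng(0)
test_image = rng.integers(0, 256, (100, 100)).astype(float)
print(flutter_gains(test_image, [(x, 50) for x in range(0, 100, 10)])[:3])
```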
Tactile Effects and User Interaction
The system's audio effects have tactile equivalents, which can be presented by using standard force-feedback devices to convey location and shape; and braille or coded impulse methods to convey categorical properties.
If 16 consonants and 16 vowel sounds are used, 256 (i.e. 16 x 16) combinations of CV syllables are available. This is the number of different dot-patterns that can be displayed on a programmable 8-dot braille cell. 44 Fig 2 shows one way in which the information conveyed by the coded spoken sounds could also be displayed on 12 braille cells.
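By way of illustration only, one possible mapping between the 256 CV syllables and the 256 patterns of an 8-dot braille cell is sketched below: the consonant index sets the first four dots and the vowel index the last four. The consonant and vowel inventories and the dot assignment are assumptions; the patent does not prescribe this particular coding.

```python
# Illustrative sketch only: the 16 x 16 CV combinations map one-to-one onto
# the 256 possible raised-dot patterns of a programmable 8-dot braille cell.
CONSONANTS = list("BDFGHKLMNPRSTVWZ")                       # 16 consonants (assumed set)
VOWELS = ["A", "E", "I", "O", "U", "AR", "AW", "AY",
          "EE", "ER", "EW", "OO", "OW", "OY", "UH", "Y"]    # 16 vowel sounds (assumed set)

def syllable_to_dots(consonant, vowel):
    c = CONSONANTS.index(consonant)        # 0..15 -> dots 1-4 as a bit pattern
    v = VOWELS.index(vowel)                # 0..15 -> dots 5-8 as a bit pattern
    pattern = c | (v << 4)                 # 8-bit value, one bit per braille dot
    return [dot + 1 for dot in range(8) if pattern & (1 << dot)]

print(syllable_to_dots("B", "OO"))         # which of dots 1-8 are raised for "BOO"
```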
A low cost alternative method of conveying categorical information can be implemented by inducing patterns of Morse code-like impulses 45 on a standard force-feedback device 49. It will not add to the hardware costs of a system that in any case uses a force-feedback device.
Sections of an image can be selected by the user via the pointer / joystick, so that only those parts are presented by the audiotactile effects (such areas are known as "viewzones"). The user can instruct the system to "zoom in" to present a smaller area, but in more detail, as well as to "zoom out" to present features of the whole image. The viewzone can be positioned as required.
A force-feedback joystick can also be moved by the system, pushing and pulling the user's hand and arm, both to convey any shapes that are to be presented (by tracing them out), and to indicate the area within an image that is currently being described via the audiotactile effects. (The user can override the joystick forces at any time, for example if they wish to change the section of the image that is being presented.)
2. DESCRIPTION OF PHYSICAL COMPONENTS
2.1 EXAMPLE PROCESSING PLATFORMS
An embodiment can be created by installing standard image-gathering, sound-generating and speech-synthesising software (and any necessary hardware) on a non-portable computer or portable computer or "wearable" computer, developing appropriate software and installing it on the computer; providing force-feedback devices; and providing standard blindness aids, such as a braille display.
Fig 3 shows the physical appearance of a low-cost preferred embodiment, which uses only standard hardware components. The laptop computer 50 uses Microsoft'sTM "Windows"TM operating system, on which the bespoke application software for the embodiment is shown running 51. The visual representations are provided by the webcam 53; the video tuner 54; the inbuilt laptop DVD player 55; from files held on the laptop computer 50; or from demonstration shapes and corners etc. provided by the bespoke software 710 Fig 12. The force-feedback joystick 56, for example Microsoft's "Sidewinder Force Feedback 2"TM, is used by the user to indicate areas of the image and control the software, and by the system to present the Morse code-like impulses that can be felt and heard. The force-feedback mouse 57, for example Logitech'sTM "Wingman Force Feedback Mouse"TM, is being used by the system to present shapes. If one-handed use is preferred then the tactile effects and user control functions can be combined on one of the force-feedback devices. The laptop's internal loudspeakers output the audio effects. Alternatively separate loudspeakers or headphones can be used (not shown). All of the peripheral devices shown in Fig 3 can be connected to the laptop 50 via its USB ports (connecting cables not shown).
Alternatively a portable preferred embodiment can be used (not shown), in a similar manner to that described and illustrated in UK Pat. No. GB2441434. This is suitable if only the audio effects are required, as the only hardware required is a standard portable computer with headphones.
2.2 FORCE-FEEDBACK DEVICES
A standard force-feedback joystick or force-feedback mouse can be used as a tactile display and to control the system, i.e. the user can indicate areas of the image with it, but it can also move independently, controlled by the system, moving the user's hand and arm both to indicate a position in space, and to convey shapes (and corners etc.). UK Pat. No. GB2441434 describes suitable force-feedback devices.
The devices found to be effective for use in connection with the example embodiments include Microsoft'sTM "Sidewinder Force Feedback Pro"TM; "Sidewinder Force Feedback 2"TM; and Logitech'sTM "Wingman Force Feedback Mouse"TM (not shown). All of these devices have been found to work effectively on the Windows 98TM and Windows XPTM operating systems. However Windows VistaTM and Windows 7 do not support gameports and so do not support Microsoft'sTM "Sidewinder Force Feedback Pro"TM; and, on Vista and Windows 7, Logitech'sTM "Wingman Force Feedback Mouse"TM has been successfully used to output forces after adjusting the standard parameter ranges, but input from the mouse has not yet been successfully read. However Microsoft'sTM "Sidewinder Force Feedback 2"TM works well on Windows Vista, and on Windows 7. Novint'sTM low-cost "Falcon" haptic device may also be suitable, but at the time of writing has not been assessed by the applicant.
Two (or more) force-feedback devices can be used as shown in Fig 3. Microsoft's DirectInput allows such devices to be "enumerated", so that signals can be sent to particular devices. For example, the main joystick 56 could be used as a pointer by the user, and by the system to indicate the location and size of an entity (which may be very small); the other device, for example a force-feedback mouse 57 (or second joystick), could be used to convey the shape of the entity, the tracer being expanded in size to better convey the detail of the shape.
3 DESCRIPTION OF SOFTWARE
UK Pat. No. GB2441434 describes one approach to developing the software, and a similar approach can be used for this invention. This description outlines the processes that are followed when the system is operating, and, when combined with UK Pat. No. GB2441434, can be regarded as an outline functional specification of the software, i.e. the software specification takes the form of a description of its function. The software functionality description is spread throughout this description. The precise software design will depend on the processing hardware used and the preferred programming methods of the constructor. Software development is a large subject and well documented elsewhere, but the data and processing required will be described in sufficient detail (when combined with UK Pat. No. GB2441434) to enable software to be developed by people who are skilled in the art of software development, including its application to areas such as image processing, sound processing, speech synthesis, communication protocols and man-machine interfacing.
A high-level programming language such as "Visual Basic"TM or "C++" will allow linkage to the operating system's multimedia facilities.
The application software should be designed to be accessible to blind and deafblind people. Methods
As with most standard commercial software intended for general use, the user should be able to alter the parameters that control the software. These should be optionally linked to particular Activities so that the options and parameters appropriate for a particular Activity can be rapidly selected.
"Virtualisation" software can be used to run the system from a "guest' operating system run on a "host" operating system. For example the Wingman Force Feedback mouse is difficult to operate within Windows Vista or Windows 7. In tests, the system was installed on a Windows 2000 guest operating system run on Sun Microsystem'sTM "VirtualBox"TM and on VMwareTM Inc.'s Workstation" and "Player" software, run on a Windows 7 host computer, and the Wingman Force Feedback Mouse could then be used. A similar approach might be used to allow the system to mn on other operating systems, for example Linux, SolarisTM or AppleTM's Mac Q5TM* Image acquisition can be performed within the guest system, or alternatively the "clipboard" facility can be used when the system is presenting clipboard contents -the images being acquired in the host computer.
4 KEY FEATURES
4.1 IMPROVING THE PERCEPTION OF SOUND TRACERS ("BUZZ TRACKS")
As described in the prior art discussion above, the perception of horizontal location is weak if only binaural cues are provided; and fluctuations in the volume of the speech sounds can make them less easy to understand. These weaknesses are addressed by using a separate audio (or tactile) "buzz track", as described below.
Adding a "buzz track"
A second audio tracer 6 Fig 1 (known as a "buzz track") can be used that is easier to "mentally position" in "soundspace" than speech-like sounds.
For example in presenting a particular entity (e.g. object or abstract shape), a speech tracer can present categorically-perceived properties of the entity, for example colour and object type; while the buzz track tracer, optionally presented at the same time as the speech track tracer, can present properties of the same entity, for example volume-conveyed properties, as well as presenting the shape and position more clearly than the speech tracer alone.
The sounds of several different waveform "timbres" were tested and are provided.
One of the most effective sounds was a "buzzy" sound, but with a clearly defined pitch (i.e. a "voiced hiss" resembling the sound of a flying insect or bee). (Similar "moving" sounds are often used to demonstrate "3D sound" environments, indicating that such sounds are effective for conveying location in "soundspace".) Buzzy sounds can be generated by using "random" level "hissy" "white noise" to produce a sample of sound waveform of time length equal to that of one cycle of the required pitch, said sample being repeated for as many times as is necessary to produce the required length of sound. This approach produces a sound that might be described as "buzzy", but with a clearly defined pitch. The buzzy sound is usually (but optionally) played at the same time as the corresponding speech tracer, and pitched and positioned in soundspace in the same way as is used for the speech sounds. Said extra non-speech sound track will be referred to as a "buzz track", but the sound timbre need not necessarily be buzzy-sounding, although such sounds tend to be easier to mentally position in "soundspace". Sounds that are of a particular timbre and that are used for buzz track-like purposes (including "polytracers"), but which are not necessarily buzzy sounding, will be referred to as "humming" sounds. When such sounds, and corresponding tactile effects (e.g. continuous particular smooth, buzzy, square or sawtooth "rumble" effects) are referred to collectively they will be referred to as "humming effects".
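This repeated-noise-cycle construction can be sketched in a few lines. The sample rate, seeding and scaling below are illustrative assumptions, and a real implementation would go on to pan and pitch the result in the same way as the speech tracer.

```python
# Minimal sketch of the buzz-sound generation described above: one cycle of
# random ("hissy") noise, of length equal to one period of the required
# pitch, is repeated to fill the required duration, giving a "buzzy" sound
# with a clearly defined pitch.
import numpy as np

SAMPLE_RATE = 44100

def buzz(pitch_hz, duration_s, seed=0):
    rng = np.random.default_rng(seed)
    cycle = rng.uniform(-1.0, 1.0, max(2, int(round(SAMPLE_RATE / pitch_hz))))
    n_samples = int(SAMPLE_RATE * duration_s)
    repeats = n_samples // len(cycle) + 1
    return np.tile(cycle, repeats)[:n_samples]   # repeated noise cycle = pitched buzz

# Example: a one-second buzz at 440 Hz, ready to be panned, pitched and
# written out with a WAV library for listening.
samples = buzz(440.0, 1.0)
```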
The sound of the buzz track can be generated by the system -for example, "buzzy" sounds as already described, or square, or sawtooth waves. Alternatively recorded sound samples can be used, for example recordings of continuous notes produced by musical instruments; natural sounds; sounds produced by machines; humming sounds; etc. Such different sounds are useful if timbre is being used to convey additional information, as described below.
The relative volumes of the apparently-moving speech sounds and the buzz track can be controlled by the user, for example via a slider 105 Fig 4.
The "buzz track" is optionally played when the speech sounds are played, giving a clearer impression of the shape being presented. Any volume-altering effects (conveying information such as size, texture, width, change etc.; and sawtooth-profile volume effects, as described below) can be applied to the buzz track rather than distorting the speech sounds (distorting the speech can make it more difficult to comprehend).
For example if a medial-line tracer is used to present a shaped entity, then the entity's "width" at any point can be conveyed via the volume of the tracer. If a "buzz track" is used, then as well as more clearly giving the shape of the medial line, the width of the entity at any point can be conveyed by the volume of the buzz track, leaving the speech unaffected.
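A small sketch of this width-to-volume mapping follows, under the assumption that the widths along the medial path have already been measured elsewhere (for example as the distance from the medial line to the object's edge); the gain range is illustrative.

```python
# Sketch: widths measured along the medial path are normalised into per-point
# gains for the buzz track, so the track is loud where the object is wide and
# quiet where it is narrow, while the speech track is left unmodified.
def width_gains(widths, min_gain=0.2, max_gain=1.0):
    w_max = max(widths) or 1.0
    return [min_gain + (max_gain - min_gain) * (w / w_max) for w in widths]

# Example: an object that tapers from wide to narrow along its medial line.
gains = width_gains([40, 36, 30, 22, 12, 6])
print(gains)
```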
Both the speech tracer and buzz track can optionally follow the same apparent path at the same time.
However, if small objects are being enlarged to better convey their shape, then as an option either one of the buzz track tracer or speech tracer can be enlarged 546 & 547 Fig 5, rather than both. Doing this will allow one of the tracers to present the shape more effectively, while the other tracer gives the location of the small shape within the image. (If a force-feedback device is being used to present shapes, then it can optionally follow the path of either such tracer.) Optionally the two tracks can be played sequentially.
As an option, the speech can be presented at a constant pitch and volume, with the buzz track tracer having the information-conveying distortions applied to it (e.g. altering the pitch and binaural stereo location to convey shape, and altering volume to convey other quantities). This leaves the speech unmodified and easier to understand. Other information-conveying effects, whether audio or tactile, can optionally be "stationary" or be made to appear to move to convey shape.
Effects corresponding to the buzz tracks can also be applied to the tactile display, for example by
Numerous other embodiments of the invention are possible. For example non-speech effects can convey data concerning the image while the buzz tracks are playing, for example via braille, tactile tap codes, or other special effects.
Varying the "timbre" of the buzz track to convey additional information
If a "buzz track" is being presented, changes to its timbre can be made in order to convey additional information in a non-linguistic manner. For example, the left-right positioning can be further enhanced by gradually changing from a "buzzy" to a "square wave" sound as the apparently-moving tracer sounds move from left to right. The vertical positioning can be emphasised in a similar manner. It is sometimes useful to be able to emphasise the centre of the image area, by changing to a different timbre when the tracer is approaching the centre of the image. In other systems, timbre is often used to convey colour, although this does not emulate the categorical manner in which people perceive colour. Although timbre could be used for this purpose, instead it can give the colour temperature of the area being presented.
If timbre-altering facilities are provided, other properties or qualities can also be conveyed to the user via the buzz track, for example the features (e.g. roughness or smoothness) of the perimeter/edge of the object or the straightness or otherwise of the line being presented. "Pseudo-timbres" can be provided as options, for example the "quietness" and "loudness" (i.e. the volume).
Fig 7 shows a GUI for controlling the buzz track feature. One approach is to give the timbre to be presented for each end of a spectrum of properties: for example when horizontal position is being conveyed via timbre, the timbre for "leftward" 582 and for "rightward" is given. The system will then vary the timbre as the motion changes.
The system inspects the selected properties that are to be conveyed via timbre, and starts a sound file playing for each such timbre. While they are playing, the system calculates the effect that each component should contribute to the buzz track tracer at each point along the tracer. The relative volume of each timbre track is then altered as the presented quantity changes, so that the overall timbre appears to change as the quantity in question changes. Although the required timbre waveform can be created "on-the-fly" (by the system plotting the required waveform), it was found to be effective to simply use the DirectSound facilities to play two sound files (consisting of the two timbre sounds), and to alter the volume of each to give the required intermediate timbre (see section 4.5 below).
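The sketch below shows the same cross-fading idea in an offline form, with simple buffer mixing standing in for the DirectSound playback described above: two continuously generated timbre tracks are mixed, and the share of each at any instant follows the quantity being conveyed (here the tracer's horizontal position). The stand-in waveforms and parameter names are assumptions for illustration.

```python
import numpy as np

def crossfade_timbres(track_a, track_b, mix_per_sample):
    """track_a, track_b: equal-length mono buffers; mix 0.0 -> all A, 1.0 -> all B."""
    mix = np.clip(mix_per_sample, 0.0, 1.0)
    return (1.0 - mix) * track_a + mix * track_b

# Example: the timbre drifts from A to B as the tracer moves left to right.
n = 44100
track_a = np.random.uniform(-1, 1, n)                               # stand-in "buzzy" track
track_b = np.sign(np.sin(2 * np.pi * 440 * np.arange(n) / 44100))   # square-wave track
horizontal = np.linspace(0.0, 1.0, n)                               # tracer x position over time
mixed = crossfade_timbres(track_a, track_b, horizontal)
```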
A tactile version of the buzz track timbres could also be provided, by using continuously-altering tactile effects 584 Fig 7.
"Pillar" and "layer" matrix effects If buzz tracks and timbre effects are used, or just buzz tracks, it is still sometimes difficult to interpret the shape of the line described by a moving tracer from the audio effects alone. Furthermore for a tracer moving in a mainly upwards direction, it is difficult to determine the direction of the slope (i.e. whether Page 18 to the left or right) from the slowly-changing timbre.
Consider for example the two shapes 546 & 548 Fig 5. Although the two shapes can be distinguished from the buzz track alone with some practice, it is not always clear whether the edges are straight or curved.
In order to improve the perception of shapes, and slopes within shapes, and clarify them, additional effects can be added.
One approach is to divide the image to be presented into a matrix of equal-width columns and/or several equal-depth rows (which may or may not match the grid formed by the segments of the current Layout, if any). Then indicator effects can be triggered whenever the tracer moves from one such column to another (referred to as "pillar effects"), and/or from one such row to another (referred to as "layer effects"). (Pillar and/or layer effects will be collectively referred to as "matrix" effects.) Where appropriate, row/"layer" effects are triggered when the tracer path crosses a boundary between two rows. This effect may be considered less essential, as the frequency generally gives a clear indication of height.
However layer effects may be felt to be more intuitive as they reflect the stratification of content found in many scenes. One suitable layer effect may be to change effect pitch/frequency on change of layer, so that anywhere within a particular layer is presented at one pitch. The pitches can be set to be musical pitches, producing a musical effect. If volume-based pillar effects are applied, such effects will sound like musical beats. However using this approach will reduce vertical resolution. (Similar effects could be applied directly to the speech effects, producing a "singing" effect.) Using pillar and/or layer effects allows the shape of lines to be perceived more clearly if (as is usually the case) the tracer travels at a constant speed, as then the rate at which the effects are presented will change to reflect the angle of slope. For example if pillar effects are presented then the diamond shape 548 Fig 5 will produce an even rate of effects, while the "concave diamond" shape 546 will produce a changing rate of effects, with the rate increasing as the slope becomes more horizontal and decreasing as the slope becomes more vertical (the reverse will occur if layer effects are used).
Note that when pillar effects are presented, different effects can be presented when the tracer moves from left to right, and when it moves from right to left; and likewise different effects can be used to distinguish between upwards and downwards movement when layer effects are used (so that the direction of travel is clear). One effective indicator effect for this purpose is to apply a sawtooth-shaped volume profile Fig 6 to the sounds as they move horizontally (for pillar effects) and/or vertically (for layer effects). If such volume alterations are applied as pillar effects on the buzz track then as the presented location moves horizontally, the volume of the buzz track is adjusted according to the profile shown in Fig 6. The effect of the illustrated profile is that the buzz track presents an effect sounding like "bing-bing-bing" as the tracer moves left to right, characteristic of the "attack-decay" effect heard when a percussion instrument is struck, wherein the volume rises rapidly, then decays relatively slowly; and presents an effect sounding like "nyib-nyib-nyib" as the tracer moves right to left, characteristic of some of the sounds heard when a soundtrack is played backwards. The rate at which such effects are heard indicates the slope of the line described by the tracer. (These volume-profiling effects could alternatively be used to highlight corners etc.) Directional effects other than volume can also be used - for example any distinct sound and/or tactile effect can be made on crossing a pillar and/or layer boundary, in a similar way to some of those effects used to highlight corners. A tactile buzz effect having a similar intensity profile to that shown in Fig 6 can be used in the tactile modality.
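As an illustration only, a Fig 6-style direction-dependent "sawtooth" profile might be approximated as in the sketch below; the attack fraction and decay depth are assumed values for this example, not figures taken from the system.

    def sawtooth_volume(frac_across_pillar, moving_right, attack=0.1, decay_depth=0.8):
        """Relative volume (0.0-1.0) of the buzz track at a fractional position
        across the current pillar.  Left-to-right travel gives a rapid rise then a
        slow decay ("bing-bing-bing"); right-to-left travel mirrors the profile
        ("nyib-nyib-nyib")."""
        t = frac_across_pillar if moving_right else 1.0 - frac_across_pillar
        if t < attack:
            return t / attack                                      # fast attack
        return 1.0 - decay_depth * (t - attack) / (1.0 - attack)   # slow decay

    # Sample the profile at ten points while the tracer moves rightwards.
    print([round(sawtooth_volume(i / 10, True), 2) for i in range(10)])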
Whatever effect is used, it is important that the user can distinguish between left-to-right, and right-to-left change, and similarly between directions of vertical change, as a tracer can move in any direction.
The effects applied to each direction can be different in nature, for example using a tactile effect for left-to-right pillar boundary crossing, and an audio effect for a right-to-left pillar boundary crossing; or using more subtle differences.
When pillar and/or layer effects are presented it is helpful if the time used to present the effects is maximised so that the rate of change of effect presentation is clear. Given a certain output "Playtime", it may be advantageous not to use "stopping" or "speed reduction" effects for e.g. corners (or else to use short stopping times), as these will reduce the time left over during which pillar and/or layer effects can be presented - if the average rate of matrix effect presentation is rapid then the differences in rate (conveying shape) are more difficult to follow. Instead, audio or tactile effects that do not take time away from the tracer's even-speed travel can be used, for example by using distinct sounds or tactile effects to represent corners, which are played concurrently with the moving tracer sounds.
One convenient way of handling corner, pillar and layer effects is to provide GUI facilities for defining the heard and/or felt effects (including the volume-profiling effects as described above), temporary changes to the apparent speed (including becoming completely stationary for a short period of time) and any other effect that is found to be effective in highlighting point-like features and/or pillars and/or layers. (The effects assigned to each of corners, pillars, and layers should be distinctive so that there is no confusion between them.) Users may prefer to use either pillar or layer effects (i.e. not both) in order to simplify the presented effects.
Further clarity can be given to the horizontal definition of shapes by starting the tracer at, say, the leftmost point of the shape, so that the user knows that any initial horizontal movement will be rightwards. (Sometimes other considerations will override this approach.) When presenting any particular entity, the tracers should normally travel at a constant speed, so that the pillar and layer effects give useful slope cues (though the speed will change, for example, when corners etc. are signalled by changing the speed of the tracer). However the speed can vary from entity to entity - for example, the tracer may move more slowly when presenting smaller entities in a scene so that there is time to clearly present speech-conveyed information. As an option, the pillar and layer spacing can vary dynamically from entity to entity, so that a consistent effect frequency is presented for a particular angle of slope.
A comprehensive mapping facility can be provided to allow a user to map many available visual properties to many available audio and tactile properties. Fig 7 shows a GUI for allowing users to control such a facility.
Buzz track audio properties such as timbre, volume, pitch & frequency etc. can be set to operate "Boolean"-style (i.e. "on" or "off") or continuously-changing Fig 7. For example the "Boolean" visual properties IsLeft, IsRight, IsHigh, IsLow, IsHighLeft, IsHighRight, IsLowRight, IsLowLeft, IsCenter, IsEquator, IsMeridian, IsLeftward, IsRightward, IsUpward, IsDownward, IsRed, IsBlue etc. can be used as visual properties to which to map audio and tactile properties Fig 7, which will change suddenly in a Boolean manner. Alternatively effects can change gradually, for example when mapped to deviations from particular locations, for example from the central vertical and horizontal lines ("meridian" and "equator"), or from the centre of the image.
The visual properties of Speed of travel, Line of slope, Direction of movement (Rightward, Upward, Downward, Leftward), Size, Texture, Colour temperature, Leftness, Rightness, Highness, Lowness, Bigness, Straightness, Curvyness, Roughness, Smoothness, ColorWarmth, ColorCoolness and Distance (and many other possible visual properties) can be mapped to the audio properties of Volume, Pitch, audio pulse frequency (e.g. "beeping" frequency) 586 Fig 7; and to the tactile properties of Intensity, Frequency, and pulse rate 588 Fig 56; and to other audiotactile properties/features. As well as mapping to audio beep frequency or tactile pulse frequency, patterns of audio pulses and tactile pulses can be mapped to particular visual features; or the volume levels of the beeps can be mapped to visual features.
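By way of illustration only, one possible way of holding such user-defined mappings (as might be set via the GUI of Fig 7) is sketched below; the property names echo those listed above, but the data structure and function are assumptions made for this example rather than details of the system as built.

    # Each entry maps a visual property to (modality, output property, mode).
    profile = {
        "Leftness":  ("audio",   "volume",      "continuous"),
        "Highness":  ("audio",   "pitch",       "continuous"),
        "Roughness": ("audio",   "pulse_rate",  "continuous"),
        "IsRed":     ("audio",   "timbre_red",  "boolean"),
        "Distance":  ("tactile", "intensity",   "continuous"),
    }

    def apply_profile(visual_values, profile):
        """Convert measured visual property values (0.0-1.0, or True/False for
        "Boolean"-style properties) into settings for the output displays."""
        settings = {}
        for prop, value in visual_values.items():
            if prop in profile:
                modality, out_prop, mode = profile[prop]
                if mode == "boolean":
                    settings[(modality, out_prop)] = 1.0 if value else 0.0
                else:
                    settings[(modality, out_prop)] = float(value)
        return settings

    print(apply_profile({"Leftness": 0.3, "IsRed": True}, profile))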
Categorically-perceived timbres could be mapped to particular categorically-perceived colours (or other categorically-perceived properties) for example "Red" or "Blue".
Furthermore "Distance from pillar or layer", or e.g. "distance from leftward pillar" can be a visual property -if mapped to volume, it will produce effects similar to the pillar/layer effects described elsewhere.
Optionally each pillar/layer can be assigned a distinct effect.
The effect produced on passing a column can change gradually from pillar to pillar or row to row, and be dependent on direction of travel. For example different pitch effects can be triggered on leftward movement than on rightward movement.
Mapped properties can be combined where appropriate. The resultant mappings can be saved as "profiles", so that they can be rapidly selected, and applied for use as buzz tracks when the system is presenting shapes etc., or when the user is performing markup/drawing (see section 5.4). They can also be used to give the sounds to be used for "polytracers" (see section 4.4) i.e. the polytracer sounds can refer to a "buzz track" profile to obtain the sound effects to be used (for example the timbre of the buzz track can change to reflect the left-right positioning).
Attack/decay-type effects can be generated on-the-fly and applied to audio and/or tactile effects.
As an option, and where applicable, the audio effects described for buzz tracks can be applied to the apparently-moving speech-like sounds directly, i.e. distorting the speech sounds, without a separate "buzz track" being used.
Optionally buzz tracks can be presented alone, without speech-like sounds.
For pillar and/or layer effects, directional effects other than the "sawtooth-profile" volume Fig 6 can be used - for example a distinct effect can be output on crossing a pillar boundary in a similar way to some of those effects used to highlight corners. The system could provide a GUI facility for assigning indicator effects to corners; or to columns; or rows; etc. Each pillar/layer can be presented with a distinct timbre.
Alternatively, the location can be directly described via speech, or via coded speech-like sounds.
As well as pillar and/or layer effects, other matrix arrangements can be used.
4.2 TRACER PATHS
Having described the improvement of using a buzz track to clarify the shapes of the features of visual representations, consideration can now be given to the various paths that the audiotactile tracers can follow.
Region tracer paths
When regular rectangular "Regions" are being presented 106 Fig 8, the audiotactile tracer's path shape only conveys the extent / area covered by the tracer, and not useful detailed shape information. For example the sound pitch conveys the vertical location, and the range of the sound pitch conveys the vertical extent of the region. As a result two-dimensional paths are generally presented when Region Layout effects are being presented, for example a circle or "stepping" around a rectangular path within or around the edges of the region being presented. Such an approach has the additional advantage that the current tracer location at any moment can indicate which part of a Layout (e.g. "top left") is currently being conveyed.
However when pillar and/or layer effects are being output, a one-dimensional tracer path across or up/down the region respectively will exhibit the pillar and layer effects more clearly.
Hence one possible set of region tracer path options would be to provide tracer paths named "Steps" 60 Fig 8, "Middle" 62, "Circle" 64, "Frame" 66, "Across" 68, and "Down" 70 ("Middle" being used to select an unmoving tracer located at the centre of the region being conveyed).
Buzz track effects help to clarify the presented region tracer path shapes.
Object tracer paths When "Objects" are being presented via audiotactile tracers 108 Fig 4, the audiotactile tracer's path can follow one of several routes, as listed below:-a) Object Outlines. The outline/perimeter of the object can be presented, and/or other "keylines" if the Page 22 optical processing component is able to identify such lines 410 Fig 9.
b) Object Centres. The audiotactile path tracer can be stationary for the period presenting the object, being "located' at the centre of the object (not shown).
c) Object Frames. The audiotactile path tracer can follow a path that "frames" the extent of the object.
The frame can be rectangular 412, or be rounded at the corners 411 Fig 9 or ellipse-shaped. The frame will generally be orthogonal/vertical, but can slope to indicate the orientation of an elongated object at an angle to the vertical (not shown). The angle can be decided by the system fitting frames to the object with the frame aligned at varying angles to the vertical e.g. every 10 degrees. For each such angle, the system can measure the area and/or perimeter of the frame, and deem the "best" angle to use as being the one resulting in the smallest area or perimeter. (The angle so measured can be one of the selection criteria used for deciding which objects to present.) The "frame" object tracer path can alternatively be non-rectangular, for example a hexagon or other polygon (including irregular polygons) that is effective in presenting the object shape, for example by presenting an enclosing "convex hull" (not shown). The corners of frames are optionally not emphasised with indicator effects, as such corners do not generally convey useful extra information to users. (However if non-rectangular frames are used then, as an option, corners could be highlighted so that the shape of the polygon is clearly presented to the user.)
d) Object Medial Paths. The tracer can follow the "centre-line" of an identified object. This is most effective for elongated objects where the path travels in the general direction of the longest edge, but is positioned at the middle of the content at any point along its route 414 Fig 9. This type of medial path is referred to as a "linear medial". It is not as effective for objects with no clear elongation: for them, a "circuit medial" can be used, wherein the path travels in a loop centred on the centre of the object, and is positioned at any point along its route at the middle of the content (or at the middle of the distance to the edge of the content) found between the centre and the edge of the object 415. Optionally the system can be instructed to automatically switch between "linear medials" and "circuit medials" depending on the "aspect ratio" of the object. (The volume of the sounds presented when the medial paths are presented can vary according to the amount of object material at any point.) (Object medial paths can be calculated by using some of the several optical processing methods available that are documented elsewhere.)
e) Symbolic Object Paths. For identified objects, the system can present a series of lines and corners that symbolise the classification of particular objects, rather than attempting to present the shape that the object currently forms in the scene. Human figures and people's faces are examples of entities that can be effectively presented via symbolic object paths 416 Fig 9. Symbolic object paths should contain features that make them clearly identifiable to the user as symbolising the object classification being presented - they do not necessarily need to resemble the visual appearance that an instance of the class of object being represented makes in reality. The symbolic paths should include features that distinguish them from shapes that happen to resemble the symbols. For example symbolic object paths could contain distinctive corners, loops, curves etc., and should include features that would never be found in the presentation of automatically-detected "blobs", for example movement in the direction opposite to that being followed for automatically-detected blobs, e.g. as anti-clockwise movement.
Image processing software can at the present state of development perform some object identification, for example by using face detection methods such as the well-known Viola and Jones algorithm (see section 5.3). In such cases a standard symbolic shape (for example representing a human figure 420 Fig 10) is presented when the corresponding item is being output. One symbolic shape should be used to represent "unknown" (for example an "X"-shape 422 Fig 10). If symbolic shapes representing "unknown" are used 422, then these can be sub-divided so that unknown entities are classified into sub-groups, for example based on shape and/or aspect ratio, and unique symbolic shapes can be used to present each of these sub-groups. Partially-identified shapes can have their own symbolic shapes assigned to them. For example "spiky/jagged objects" or "smooth objects" or "polygons", can be presented as symbolic shapes. An "automatic" mode can be provided, wherein a symbolic object path mode is presented if an object is identified and a different type of object path is presented if the object is not identified. Alternatively the system can revert to showing the outline or other path when an unrecognised object is presented, with the direction of travel of the tracer clearly distinguishing it from the symbolic shapes.
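A brief sketch of how such detection might feed the symbolic-path selection is given below, using the OpenCV implementation of the Viola and Jones detector; the OpenCV dependency, the cascade file chosen, and the symbolic-shape label are assumptions made for illustration, not details of the system as built.

    import cv2

    def symbolic_paths_for_faces(image_bgr):
        """Detect faces with a Viola-Jones Haar cascade and return the symbolic
        path to present for each one; undetected regions would instead fall back
        to an outline path or the "unknown" X-shape described above."""
        cascade = cv2.CascadeClassifier(
            cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
        gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
        faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
        return [("face_symbol", (int(x), int(y), int(w), int(h)))
                for (x, y, w, h) in faces]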
It was found to be useful to have sub-categories of symbolic shapes that show parts of an object. For example it is useful to provide a shape for the top half of a human figure, head & shoulders, etc., as these are often what is presented in a visual image 426.
Symbolic object paths are generally angled and stretched to match the angle and aspect ratio of the object being presented (not shown). The processing used to determine the best framing (described above) can also be used to provide the angle and dimensions into which the symbolic object path should be fitted.
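An illustrative sketch of that frame-angle search follows, testing candidate angles every 10 degrees and keeping the one giving the smallest enclosing area; the 10-degree step and area criterion follow the description above, while the helper function itself and the point-list representation are assumptions for this example.

    import math

    def best_frame_angle(points, step_degrees=10):
        """Rotate the object's points through a range of angles and return the
        angle whose axis-aligned bounding frame has the smallest area."""
        best_area, best_angle = float("inf"), 0
        for deg in range(0, 180, step_degrees):
            a = math.radians(deg)
            xs = [x * math.cos(a) - y * math.sin(a) for x, y in points]
            ys = [x * math.sin(a) + y * math.cos(a) for x, y in points]
            area = (max(xs) - min(xs)) * (max(ys) - min(ys))
            if area < best_area:
                best_area, best_angle = area, deg
        return best_angle

    # Example: an elongated diagonal blob is best framed at an angle close to 45 degrees.
    print(best_frame_angle([(i, i + (i % 3)) for i in range(50)]))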
Several symbolic shapes can be assigned to one entity to sub-classify it -for example the symbolic face can have versions for "head on" 418, "left-profile" 427, and "right-profile" 428.
Basic symbolic shapes can be assigned to particular classifications/types, and embellishments can be added to represent sub-classifications. For example a shape representing a face 418 can be embellished to include features representing a hat, pair of glasses, moustache, etc. by having additional effects added, for example additional small loops, zigzags, corners etc. (not shown). By using this approach, basic symbolic shapes of common object classifications can be easily recognised by beginners, with sub-classifications recognised by more experienced users.
For symbolic object paths which have a clear "direction", for example an object presented in profile, the system can when appropriate present a mirror image of the symbolic object path (not shown) without causing confusion to experienced users.
It could be argued that symbolic object paths (i.e. special shapes which symbolise recognised objects) are unnecessary, as the nature of the object could instead be presented directly to the user, for example via speech synthesis or braille. However it may be that using a non-semantic format is less tiring or distracting in certain circumstances, and more closely resembles the experience of visual object recognition, though this issue has not yet been investigated.
Tracer start point and direction of travel
The direction of travel of an object tracer path can give information, for example:-
- if anti-clockwise, or it has anti-clockwise elements (for example in a "figure-of-eight"), it is in some way identified, for example designer-marked-up; a demonstration shape; or a symbolic object path detected via face-detection methods; and
- if clockwise then it is an automatically-detected area.
The starting point of the tracer can give additional information, e.g. if it starts at the top or base or left or right of the object. For example the particular object-related layout method (see section 4.3) currently being used can be indicated by using a special start point -e.g. start at the top of the object to indicate that object content is being presented by the layouts, start at the right to indicate figure/ground layouts methods, start at the base to indicate symbolic object layouts, etc. Alternatively other information can be conveyed by the starting location.
Buzz track effects help to clarify the presented object tracer path shapes.
4.3 IMPROVING THE PERCEPTION OF IMAGE LAYOUT
Object-related Layouts
When presenting objects, a "layout" related to the object can be presented at the same time, for example by using a braille/tactile display, or by using speech codes. The Object-related Layout content can comprise material selected from the following options:-
a) Object content. Because the shape of the object is known, the image content in only the area covered by the object can be presented. The system should spread the content over the area of the Layout, for example by recursively dividing the content into equal areas along axes in alternating directions 430 Fig 11 until the number of areas matches the number of segments in the Layout, and then presenting the content of those areas in their corresponding segments in the Layout.
b) Framed content. The content of the rectangular frame enclosing the object can be presented in the Layout with the content "stretched" if necessary to use the full height and width of the frame (not shown). Alternatively a square frame can be used, wherein the object content is centrally positioned and not stretched (not shown).
c) Figure/Ground format. The content of the frame can be presented using an approach which incorporates the perceptual concept of "figure/ground" i.e. the effect whereby objects are perceived as being figures on a background. If one object is being presented 434 then the system can present the layouts as showing:-
i) The area covered by the object within the "frame" enclosing the object 434, the object being stretched in one direction, so that it extends to the edges of the Layout. If the object is sloping then the enclosing frame can also be sloping, or alternatively kept orthogonal. Alternatively a square frame can be used, wherein the object content is centrally positioned and not stretched (not shown).
ii) The location of the object ("Figure") within the whole scene or viewzone ("Ground") can be presented 436. The layout presents the area of the entire scene or viewzone, for example as "blackness", and the figure is shown in the area that it occupies. If the system is presenting colours, and it is presenting two standard colours, then:-
- the system can present a colour e.g. black for the background, and the overall colour of the object; or
- the system can present the two colours that predominate in the object; or
- the system can present the overall background shade (e.g. the average shade of the background) and the overall object shade (e.g. the average shade of the object).
The first of these methods gives the least information to the user. All of these approaches can be applied to all of the Figure/Ground object layout methods described above. Other similar conventions can be devised.
If the system is "stepping" round the scene presenting the selected objects (see section 5.2), the object-related layouts will appear and disappear as the corresponding objects are presented, giving the user information about their location, size and colour.
iii) All of the objects being presented within the whole scene or viewzone can be presented 438. The objects are in the same location as for ii), however they are all presented at the same time. This method works best when not too many objects are being presented, so that separate objects can be clearly
perceived against the background.
As an option, if small objects are being presented then the system can be programmed to always present at least one segment of the Layout to represent the object, even if the object would otherwise not be large enough to be represented by a segment of the Layout. Using such an approach ensures that every object being presented is represented by at least one segment in the Layout.
The "figure/ground" object layout methods are generally most effective when a few foreground objects are being presented.
d) Viewzone content. The standard viewzone Region layout as previously described can be presented by the layout-presentation method (e.g. braille or speech codes) at the same time as the object path is presented via the audiotactile tracer.
e) Symbolic layout format. If the object has been identified, then symbolic layouts (using a similar concept to the symbolic object tracer paths described previously) can be presented, wherein the arrangement of dots is constant for particular object types (not shown). As with symbolic object tracer paths, a basic layout can be assigned to particular object types, and small embellishments can be added to represent sub-classifications. By using this approach, common symbolic layouts can be easily recognised by beginners, with sub-classifications available for recognition by more experienced users.
Compact object layouts
Layouts that are output as speech or morse (whether audio or tap codes) (i.e. not braille) tend to be long-winded. If Object-related Layouts are being presented, particularly if using figure/ground formats, as an option a compact format can be used: only the location of the centre of the object is presented, via a single "CV", the C & V giving the vertical and horizontal "coordinates" of the centre of the object being presented within the viewzone. For example if "International" coding format speech phonemes (see UK Pat No. GB2441434) are being used then the CV can give the location of the centre of the object within a 5 segment by 5 segment grid, while if "English" coding phonemes are being used then a 15 segment by 15 segment grid can be used. The two systems can be made compatible so that the speech codes for positions 2, 5, 8, 11, and 14 for the English range of phonemes are the same as those used for positions 1, 2, 3, 4 and 5 for the International range of phonemes. Optionally a second CV can give the approximate size and/or shape of the object, for example the "C" can present the area occupied by the object e.g. areas of "less than 1 segment", "1 to 4 segments", "5 to 9 segments", "10 to 16 segments" or "17 to 25 segments" for a 5 segment by 5 segment grid (and a similar approach can be used for a 15 segment by 15 segment grid). The "V" can give the approximate shape, for example "blobby", angular, spiky, elongated, "detected face", etc.
Widening the object-related layout frame
As an option, when layouts are presenting an area covered by a rectangular frame enclosing the object, the framing described above can be set wider than the exact extent of the frame enclosing the object being presented, so that the layouts are presenting a larger area. This is effective when particular colours are being sought and presented, as otherwise the typical effect would be for the layout to show mainly the colour found, so not conveying useful information, whereas if the framing of the layout is set wider, then the context in which the found colour was located can be presented (not shown).
Improved layout coding
Several new viewzone Layout configurations have been devised wherein the centre of the viewzone presents higher-resolution information than at the edges. Examples of such viewzones are shown in Fig 12. Each Layout format can have a separate tracer path defined for it - for example more complex formats 80 can have "spiralling" tracer paths defined.
A very simple 8-segment wide by 4-segment deep Layout format 82 Fig 12 is found to be effective when braille is used as output, as it can be easily presented via a single line of a standard refreshable braille display, and so it is easy for the user to "read". It can be configured as two panels, either as two 4 by 4 segment panels, or as two 2 by 8 segment panels.
Earlier coded phonetic methods used by the system for presenting the arrangement of properties in a panel used somewhat arbitrary sounds, but a single syllable could present the arrangement of 4 or 8 "blobs" of content. The colours (or other properties) of the areas were also presented in a coded but less arbitrary manner, for example "boo-yow" or "bow" for "blue and yellow". However when tested in a small trial, real-name (non-coded) colours were greatly preferred by participants, and it made the system more accessible to untrained people. The real-name colours could be spoken more quickly by the system, as the user was expecting a colour name, and could "fill in" parts of the speech that they heard less clearly, as occurs in everyday speech - this effect is not available with the theoretically more efficient coded words. Even long colour names such as "DarkPurple" could be spoken rapidly (in about a third of a second) and still be understood.
Given the positive response to using real colour names for colours, the use of non-coded words for layouts has been investigated. Unlike for colours, there are no standard terms for particular arrangements of blobs. However it was straightforward to give reasonably sensible (and easily distinguishable) "real-word" names to 16 layout arrangements, allowing a 4-by-8 layout matrix 90 Fig 13 to be presented to beginners via 8 "real" words 94 in a "column-by-column" arrangement. (Such an arrangement also maps well to a standard 4-dot-high braille display 92.) For example the terms "None", "First", "Next", "Third", "Fourth", "Two", "Mid", "Pair", "Split", "Dots", "Blobs", "Three", "Wide", "Gap", "Most" and "All" could be used to describe the 16 possible arrangements of a column or row of four segments.
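A sketch of how such a word list might be applied is shown below; the word list is the one given above, but the assignment of particular segment patterns to particular words, and the 0/1 column representation, are assumptions made purely for illustration.

    NAMES = ["None", "First", "Next", "Third", "Fourth", "Two", "Mid", "Pair",
             "Split", "Dots", "Blobs", "Three", "Wide", "Gap", "Most", "All"]

    def column_word(segments):
        """segments: four 0/1 values, top to bottom, for one column of the layout."""
        index = int("".join("1" if s else "0" for s in segments), 2)
        return NAMES[index]

    # A 4-by-8 layout spoken "column-by-column" as eight real words:
    layout = [[1, 0, 0, 1], [1, 1, 1, 1], [0, 0, 0, 0], [0, 1, 1, 0],
              [1, 1, 0, 0], [0, 0, 1, 1], [1, 0, 1, 0], [0, 1, 0, 1]]
    print([column_word(col) for col in layout])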
A comfortable limit of about 4-6 short words per second is practical. This gives a limit to how much layout information can practically be presented via words. Furthermore, well-known experiments show that about 6 to 8 unrelated "chunks" of information can be comfortably handled in people's short term memory, giving a limit of about 4 to 6 "words" being used to present layout information for any particular area, if colour information is also given.
A modification made to the coded "CV" syllables was to strictly match the consonant to the first half of the layout, and the vowel to the second half 96. This approach was much easier to use than the earlier mappings, which attempted to match the overall amount of darkness, symmetry etc. to similar-sounding phonemes. Of the two coded methods (see UK Pat No. GB2441434), "International" coding, which uses "consonant-vowel" ("CV") syllables selected from only 4 consonant and 4 vowel sounds, was found to be much easier to use than the theoretically more efficient "English" coding, which uses CV syllables selected from 16 consonant and 16 vowel sounds. The simpler "International" coding could be "spoken" by the system more quickly yet still be understood, so that approximately the same amount of layout information was presented in the same time as when "English" coding was used. However each "CV" syllable of "International" coding only allows 16 different combinations. It may be practical to use "CV" syllables selected from 8 consonant and 8 vowel sounds, allowing a coloured 6-by-8 48-pixel matrix to be presented as two areas of 6x4 pixels, in a total of about two seconds.
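The "consonant matches the first half, vowel matches the second half" coding might be sketched as below for the 16-combination "International" case; the particular consonant and vowel sounds shown are placeholders for illustration, not the phoneme sets actually used by the system.

    CONSONANTS = ["k", "t", "s", "m"]    # 4 consonants (illustrative)
    VOWELS = ["ah", "ay", "ee", "oh"]    # 4 vowels (illustrative)

    def cv_syllable(segments):
        """segments: four 0/1 values, top to bottom.  The top two segments select
        the consonant and the bottom two select the vowel, giving 16 syllables."""
        first_half = segments[0] * 2 + segments[1]
        second_half = segments[2] * 2 + segments[3]
        return CONSONANTS[first_half] + VOWELS[second_half]

    print(cv_syllable([1, 0, 1, 1]))    # prints "soh" for this arrangement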
"Column-by-column" or "row-by-row" (e.g. I -by-4, or I -by-6 etc. blobs) coded (or non-coded/real-word) arrangements may be more straightforward for users to follow than two-dimensional codings (e.g. mapping to 2-by-2 or 2-by-4 arrangements).
Users may decide whether coded 96 or real-word 94 colour and layout presentation is used: using real words may be more distracting from ambient sounds and more difficult for users to temporarily ignore, whereas the coded sounds may be easily ignored when required, without having to mute the system sounds. Furthermore, the codings are not difficult to learn.
With practice, users may become familiar with groups of sounds representing several columns, so that, say, a 4-blobs-by-4 arrangement is immediately understood as a single entity "chunk", rather than having to be mentally "assembled" from the component sounds. (This has not yet been tested.)
4.4 "POLYTRACERS"
Improving the perception of layouts with multiple sound tracers ("polytracers")
If volume alone is used to convey width, for example if a medial-line tracer is used to present a shaped entity, then the entity's "width" at any point can be conveyed via the volume of the tracer, either applied to the speech or to a corresponding "buzz track". However in such cases the detail of the shape of the entity is not conveyed, nor is the "surface detail" of the entity. Only a vague impression of the shape of the perimeter can be obtained. This can be rectified by using several shape tracers, which are output simultaneously, and are referred to as "polytracers".
Multiple tracer speech or non-speech polytracers can produce optophone-like effects, which may allow more accurate perception of the distribution of material within entities, and can support Region Layouts and Object-related Layouts.
The use of multiple speech tracers is briefly covered in UK Pat. No. GB2441434, which describes how a medial tracer can be used with several other simultaneously-presented tracers to convey the shape, detail, and give an impression of the texture of the entity (they were referred to as "Spined Audiotactile Graphics with Multiple Tracers").
Sounds that are of a particular timbre and that are used for buzz track and polytracer purposes, but which are not necessarily buzzy sounding, will be referred to as "humming" sounds.
If several non-speech audio "humming" tracers are used (instead of, or in conjunction with, several speech-like tracers), then the shape and content of an area can be presented in a more intuitive manner.
The multiple "humming" non-speech sounds may give a more-accurate indication of location in "soundspace", in a similar manner to single buzz track tracers. For example "buzzy" sounds allow the user to mentally-position sounds with accuracy in "soundspace". The selection of sounds for non-speech polytracers can be optimised for positioning accuracy. "Buzzy", "Sawtooth" and "Square" waveforms appear to work well for this purpose, but the extra tracers can alternatively present non-speech "pure" Page 29 tones such as Sine waves in a similar manner to existing optophone-like systems.
Alternatively the extra tracers can also be speech-like, presenting the same speech phonemes as the main tracer, but moving in "soundspace" so that their pitch and binaural location at any moment corresponds to the location of the image matter that they are representing. The latter approach produces a "choir" of voices that "chant" the words (this effect is referred to as a "chorus"). The "humming" non-speech multiple tracers may allow better positioning accuracy, but may be more distracting and less mellifluous than the "chorus" approach. Both options are provided 110 Fig 4 and Fig 19.
The paths that the polytracers follow can either be straight parallel lines, as used in previous optophone-like systems, or if a shaped entity is being presented then the tracers can follow paths that help to convey the overall shape of the entity.
Contoured polytracers
One way to include shape detail and to give an impression of the texture of the entity, is for the system to present a coded speech-like medial-tracer with several simultaneously-conveyed tracers that travel in approximately the same direction as the medial-tracer, but vary in the width that they represent Fig 14, so that the shape of the entity is conveyed quickly, and more of the detail and texture is also conveyed 597 Fig 14.
In a simple version, a straight medial line 307 Fig 14 is used and just two extra tracers 303 & 304 convey the outside edge of the entity, but then little detail of the interior of the shape will be presented.
The extra tracers can be non-speech (e.g. buzzy or tone sounds). Alternatively they can present the same speech-like sounds as the main medial tracer, so that a "chorus" effect is produced, with the pitch and binaural positioning of the tracers corresponding to the area of the image being presented at any point.
In a more complex example, a contoured polytracer is shown with a curved-medial-tracer 331 and six further tracers 332, which are conveyed in the directions indicated by the arrows. The path of the medial-tracer can be decided in the same way as for standard curved-medial-tracers, and the paths of the two half-outline tracers 333 follow the outer edge of the entity. The half-outline tracers and the simultaneously-conveyed medial-tracers are conveyed at different speeds set so that each is conveyed at an even rate but completes in the same amount of time. If several "rib" lines 334 are plotted from points along the medial-tracer to points along the outer tracers which would be conveyed at the same time, each "rib" line can contain a certain number of equally-spaced points (two in the case in Fig 14) and the path of the additional tracers can be determined by joining the corresponding points in all of the rib lines. All of the tracers should share the same start point 335 and end point 336.
The coded speech sounds (if presented) should reflect the average content of the area swept out by all of the tracers. The medial-tracer may be set louder than the individual non-medial-tracers, whose varying volumes can be set to reflect the content of the area being conveyed by each tracer at any time.
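An illustrative sketch of deriving the intermediate tracer paths from the "rib" lines is given below, assuming the medial path and one half-outline path have already been resampled to the same number of points (so that corresponding points are conveyed at the same time); the function and its parameters are assumptions made for this example.

    def contoured_tracer_paths(medial, half_outline, n_extra=3):
        """Return the paths of n_extra intermediate tracers, formed by joining
        equally-spaced points along each "rib" line drawn from the medial-tracer
        to the half-outline tracer."""
        paths = [[] for _ in range(n_extra)]
        for (mx, my), (ox, oy) in zip(medial, half_outline):
            for k in range(1, n_extra + 1):
                f = k / (n_extra + 1)        # fraction of the way along the rib line
                paths[k - 1].append((mx + f * (ox - mx), my + f * (oy - my)))
        return paths

    # Example: a straight medial and a wavy half-outline, three intermediate tracers.
    medial = [(float(t), 0.0) for t in range(10)]
    outline = [(float(t), 5.0 + (t % 2)) for t in range(10)]
    print([[(round(x, 1), round(y, 1)) for x, y in p[:3]]
           for p in contoured_tracer_paths(medial, outline)])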
The "contoured" polytracer method works best when the general direction of movement of the tracers is fairly horizontal, as the spread of frequencies helps to convey the width or height of the entity. For Page 30 polytracers orientated with tracers moving in an approximately vertical direction, the width of the entity is not so well conveyed by the varying-width ("contoured") polytracers method, as the binaural spread of sounds becomes the major indicator of width, and this is generally harder to interpret than the spread of frequencies that occurs with horizontally-travelling tracers. As there are a particular number of tracers active at all times, certain parts of the entity will be overemphasised (e.g. where a constriction occurs).
Parallel polytracers
To overcome the limitations of "contoured" polytracers, equal-width tracers ("parallel polytracers") Fig 15 travel quasi-parallel to the main (i.e. medial) tracer and are output simultaneously, and each tracer can convey the same width within the entity. The number of tracers actively presented at any moment will vary according to the width of the entity. "Parallel" polytracers are effective for presenting fragmented and convoluted objects.
The advantages of this method include:-
a) All parts of the entity are conveyed, even if the border swings back from the general direction of flow 338 Fig 15, or parts of the entity are separate from the main body of the entity 337. Because of this the method is effective for conveying fragmented entities, and those without clearly defined borders.
b) While each tracer can vary in volume to reflect the content of the area being conveyed, the number of tracers and hence the overall volume will tend to increase as the width of the entity increases.
The effect will be of a group of equal-width, quasi-parallel tracers travelling in line with the medial-tracer, with the outer-edge tracers being activated and de-activated according to the width of the entity at any point 598 Fig 15. Hence the changing number of tracers active at any time gives an indication of the shape and width of the entity at different points. The individual tracers can be activated and de-activated instantaneously, or smoothly (to reflect the amount of object content they are representing at any point).
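The activation logic might be sketched as follows, assuming the entity is supplied as a binary pixel mask and the polytracers travel horizontally, with one tracer assigned to each equal-height band; the mask representation and band height are assumptions made for illustration only.

    def parallel_tracer_activity(mask, band_height=4):
        """mask: 2-D list of 0/1 rows giving the entity's pixels.  For each column
        (i.e. each step of a horizontally-travelling polytracer), report which
        equal-height bands contain entity material, so that the tracer assigned to
        each band can be activated or de-activated at that step."""
        n_rows = len(mask)
        n_bands = (n_rows + band_height - 1) // band_height
        activity = []
        for col in range(len(mask[0])):
            active = []
            for band in range(n_bands):
                rows = range(band * band_height, min((band + 1) * band_height, n_rows))
                active.append(any(mask[r][col] for r in rows))
            activity.append(active)
        return activity

    # Example: a blob that widens then narrows; the number of active tracers follows it.
    mask = [[1 if abs(r - 6) <= min(c, 11 - c) else 0 for c in range(12)] for r in range(12)]
    print(parallel_tracer_activity(mask))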
As an option, for both "contoured" and "parallel" polytracers, if a "chorus" effect is being used, and if coded speech sounds are presenting the layout, the codes spoken can reflect the average content of the area swept out by all of the tracers, but the individual tracer's varying volumes are set to reflect the content of the area being conveyed by each tracer at any time.
The medial line tracer 331 Fig 14 is an effective main tracer on which to base the polytracers, for both "contoured" and "parallel" polytracers. However a "circuit" medial path Fig 16 can also be used, the polytracers presenting the content of the entity along the route of the circuit medial. Other similar arrangements can be devised.
Rectangular polytracers
As an option, instead of shaping the tracers' paths, an optophone-like "rectangular" polytracer arrangement can be used, wherein the tracers are straight, parallel, and of equal length, forming a rectangular area Fig 17. This approach is particularly effective when the polytracers are supporting certain Region Layouts and Object-related Layouts. For example, Object-related Layouts have previously been described showing the locations of entities within an image, so that a perceptual Figure/Ground effect is produced 434, 436 & 438 Fig 11, either emphasising the shape of the object 434 or the location of the object within the scene 438. Such "silhouette"-like images are particularly effective when presented via optophone-like polytracers, either as additional humming sounds, or as a "chorus" effect. Symbolic layouts also work well with this approach.
Rectangular polytracer arrangements can effectively present the information presented by the braille display area Fig 13, or they can use their own resolution.
Rectangular polytracer arrangements can be sloped so that they are aligned with the "straight line medial" of an entity 596 Fig 17.
Branching medial tracers and polytracers
For certain entities, the system (or a human designer) can determine that neither a single medial line (straight or curved), nor a "circuit medial" Fig 16 is appropriate for presenting the entity. This might be the case for clearly "branching" items Fig 18 or a "Y"-shape. In such cases it may be advantageous for the audio main tracer to split into two main tracers at the branch point 343 Fig 18, the two tracers being simultaneously-conveyed, but each being pitched and binaurally-located to reflect its path. (Such an approach cannot be presented via a moving force-feedback device.)
If "branching-medial-tracers' are used as a basis for area-conveying polytracers, it will be necessary to define which parts of the entity are conveyed by each branch. One approach is to define the border between the areas conveyed by each branch as being the path of the line that runs midway between the medial-tracers for each branch. The dividing line 343 to 346 Fig 18 can be regarded as a dummy "border" when calculating the path and speed of the tracers for each branch. In the example shown in Fig 18 the two outlining tracers will first convey the unbranched-section edges 340 to 341 and 340 to 342, with the central medial-tracer travelling along 340 to 343. At 343 the medial-tracer splits into two branches, 343 to 344 and 343 to 345.
If branching-medials are presenting speech sounds, then the same synchronised sounds should be output by each branch. The outline tracers (whether speech, buzzy sounds etc.) should be timed so that they reach points 341 and 342 at the same time as the medial-tracer reaches 343, from which point on each branch has two outlining tracers i.e. one of the existing outlining tracers and a tracer that travels from the medial branch point 343, along the border between the branches 343 to 346, then along the outside edge of the entity to 344 for one branch and to 345 for the other. The speed of the tracers should be set so that the tracers for each branch complete at the same time as the corresponding medial-tracers, using a similar approach as that described for unbranched entities.
More complex branching situations can arise 348 Fig 18 and similar approaches can be used to convey them.
Polytracer options
For all of the polytracer effect arrangements just described, the tracers can be either non-speech "humming" tracers, or "chorus"-like tracers; or both types can be presented together. They can be replayed at the same time as the main tracer, or can be presented separately.
Many options can be set to control the polytracer effects, for example via GUI controls Fig 19.
If a force-feedback device is being used to present shapes, then it will normally follow the path of the main/medial tracer.
The volumes presented can reflect the actual brightness of the content of a layout, or "false" volumes 550 can be used so that there is a clear change in volume when a change of content occurs, even if, say, two different colour shades happen to be of similar intensity. As the system already has facilities to process the image to a limited number of colour shade levels (usually 2-levels in any panel etc.), these levels can be used to determine which clearly-different volume levels are presented. Most of the Region Layout and Object-related Layout arrangements described elsewhere can be presented by the polytracers 552, for example those producing silhouette-like effects, which work effectively.
The pitch range used for the polytracer effects can match the pitching conventions used elsewhere by the system 554. Alternatively a polytracer-specific pitch range can be used, for example a musical pitch range.
The polytracers can be set to be "light-sounding", "dark-sounding", or "least-sounding" 556, the latter setting being used to emphasise either dark or light effects, whichever is least present, in order to minimise the confusion of sounds.
The number of tone-like tracers or speech-like voices 558 can be set, as well as the relative overall volume 560. For "rectangular" polytracer arrangements the output tracers can be made to "pulse" as separate sounds on change of columns/pillars or rows/layers 562 to give a "beat" effect which will help convey the pillar and layer "matrix". Many other options could be included.
The "humming' polytracer sounds can refer to a "buzz track" profile to obtain the options to be used, whereupon the conventions that are associated with that profile are used by the polytracers when appropriate. The non-speech sounds can vary in a similar manner to those used for "buzz tracks". For example the timbre of the tracers can change to reflect the left-right positioning as previously described.
When small entities are being presented as polytracers then they can be enlarged to better convey their shape, in a similar manner to that described for other shape-conveying tracers.
Optionally, a minimum volume for active polytracer effects can be defined so that active tracers do not "disappear" completely when the volume reduces.
Many effective combinations of settings can be implemented for polytracers. For example the medial-line tracer can be speech-like and the polytracers can be non-speech humming sounds, or vice versa; just two humming tracers can accompany a speech-like main tracer; humming outer-edge tracers can accompany an otherwise speech-like choir of tracers; etc. Many of the layout arrangements previously described can be presented via polytracers (including symbolic layouts), either as "rectangular" polytracers, or shaped to match the entity being presented. The layout levels can be presented as found, or false volume levels can be used. Alternatively the raw image can be used to determine the volume levels. The polytracers can present "symbolic shapes", using similar principles as are used for symbolic tracers. As an option the system can be set to act as a form of optophone.
To summarise, polytracers are often used to support the Layout effects previously described; and to give greater clarity to the shapes being presented, and to the distribution of material within those shapes.
Any combination of humming sounds; or speech sounds; can be applied to any of the tracers that comprise a polytracer, and the user can use an appropriate GUI for selecting the combination, for example by using checkboxes to allow assignment of non-categorical and/or categorical sounds to the outer tracers 590 Fig 14, the medial tracer 592 and the remaining tracers 594.
The user should be able to control which buzz track-like effects (e.g. pillar and corner effects etc.) are applied to which of the tracers. The user should be able to control which tracers exhibit corner effects (applying them to too many of the tracers may produce confusing results).
A Main tracer can still be designated, especially if a moving Force Feedback device is being used to present tactile representations of shape. The Main tracer need not necessarily follow the same path type as the polytracer -for example it could present the perimeter of an object when the polytracers are presenting contoured tracers along the medial line of the object Fig 14.
Where appropriate any of the polytracer types described above can be configured around the curving linear medial line Fig 57; a "circuit medial" Fig 50; "branching" medials Fig 18; or a "straight medial" line 596 Fig 17. The "straight medial" is the straight line that divides the entity being presented so that the same amount of entity material is to each side of the line (or a "centre of mass" approach can be used). If no clear alignment exists then the "medial" approach can still be used, but with the medial aligned in an arbitrary direction. The system (or a human designer) can determine which medial-line type is most appropriate to use (for example using a lineal medial path for elongated entities and a "circuit" medial for other entities).
Most of the techniques described in this section for parallel polytracers with curved medial tracers can also be applied to those with straight medials 596 Fig 17 where appropriate.
Polytracers' overall sound volume can vary according to the overall width of the entity being presented at any moment. Doing this will typically produce a smoother "faded" start and finish to the polytracer, which can produce a more pleasing effect.
Tactile equivalents to polytracers can be provided by presenting them on a tactile pad, such as the ones described in UK Pat. No. GB2441434.
4.5 IMPLEMENTING "BUZZ TRACKS" AND "POLYTRACERS" One straightforward way of implementing additional sound tracks for "buzz tracks", timbre, "pillar" Page 34 and "layer" effects, and "polytracers", is to use Microsoft's "DirectSound" facilities: if additional sound buffers are opened, samples of the required track sounds (generated or recorded to present a particular pitch) can be replayed continuously in looping mode (e.g. by using the DirectSound ".Play" method with the DSBPLAY LOOPING flag set). Then the DirectSound ".SetVolume", ".SetPosition", and ".SetFrequency" methods can be used to set the volume, 3D sound position, and height-conveying pitch respectively of the samples.
If a changing sound timbre is required, then a straightforward way of implementing this is to have additional sound buffers playing in a continuous loop for each of the timbre types, then adjusting the volume of each to the appropriate level for the point in the path of the tracer, for the property they are presenting (e.g. the left-to-right location). The DirectSound ".SetVolume" method can be used to set the appropriate volume level, and decibel-based calculations used to make the volume change match human audio perception (e.g. by using S.S. Stevens' Law to change volumes logarithmically, in a similar way to that used by some volume controls). In theory the system could continuously alter the shape of the sound waves being produced in order to generate intermediate sound timbres, but in practice it is found to be effective to use the simpler technique of playing several sounds and altering their respective volumes.
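A sketch of the kind of calculation involved is given below; the conversion to decibels and the -60 dB floor are illustrative assumptions (with DirectSound, each result would then be scaled to the hundredths-of-a-decibel units expected by the ".SetVolume" method).

    import math

    def blend_attenuations(blend, floor_db=-60.0):
        """Given a blend value 0.0-1.0 between two looping timbre tracks (e.g.
        derived from the tracer's left-right position), return the attenuation in
        decibels to apply to each track so that the perceived mix changes smoothly
        as the presented quantity changes."""
        def to_db(level):
            if level <= 0.0:
                return floor_db
            return max(20.0 * math.log10(level), floor_db)
        return to_db(1.0 - blend), to_db(blend)

    # Example: as the tracer moves from left (0.0) to right (1.0), track A fades
    # down and track B fades up.
    for b in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(b, [round(v, 1) for v in blend_attenuations(b)])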
Although timbres for buzz tracks and polytracers can be generated "on the fly", greater flexibility is generally provided by using recorded or generated sound samples, for example held in the popular ".WAV" format.
Several buzz track profiles can be created and stored for quick and easy access. Each profile can contain several effects, each effect being triggered under different conditions. Each effect can have one or several timbres assigned: for example if a smoothly-changing timbre indicates left-right positioning, then that effect will have two component parts, one causing an increase in one timbre as the tracer moves right, and one causing an increase in the other timbre as the tracer moves left. This can be set up as two "effects", and will require two sound buffers.
If several timbres are used, and if the conditions that cause them to be sounded are mutually exclusive, then they can share the same sound buffer. For example if different timbres are assigned to each quadrant of the image then all four such timbres can share the same sound buffer, with the replayed track being changed when appropriate.
For non-speech tracers, there is normally no need to "stretch" the sound samples being replayed, as they are normally continuous sounds (not speech), and so can simply be repeated.
For speech sound polytracers, the stretching process described in UK Pat No. GB2441434 for a single tracer should be performed separately on the waveform for each tracer, and reflect the path followed by each tracer, so that the "chorus" of tracers is synchronised. As facilities already exist in the system for producing one tracer, it is straightforward to produce the additional tracers to produce the polytracer effects, by using the same routines as previously described for a single tracer, but with the input paths (and other parameters) set to those of the individual tracers.
(There are many other techniques that could be used for implementing buzz tracks and polytracers. For example a facility could be provided to stretch sounds "on-the-fly" so that sounds that have a distinct tempo could be used for buzz tracks and could maintain their original tempo, whatever pitch they are replayed at.)
APPLICATIONS
5.1 PRE-DEFINED AND FOUND FEATURES
The system can present both entities found in images "on the fly" by using optical processing / "computer vision" methods; and pre-defined entities from prepared media identified and marked up by a human designer. (The system can also present entities submitted by an external system - see section 5.6).
Fig. 20 summarises the process:- for non-prepared media (e.g. "live" images) the system attempts to Find (a) objects according to the user's requirements, and builds a "Guide" (b) of the found objects.
Alternatively a previously-prepared Guide (b) can be used to give the objects and features that are present. Finally, the corresponding Effects (c) are presented to the user.
For example the system can use pre-defined Guide information if available, otherwise it can attempt to find appropriate objects, and if none are found, it can output Region Layouts.
Once objects have been detected, they can be classed as identified (for example identified by a sighted designer; or by face detection software etc.), or unidentified.
Section 5.2 below describes one technique for pre-defining entities within images, in which entities are marked onto image bitmaps and a corresponding table is produced and stored on a text file which describes the nature and properties of the entities.
For non-prepared images ("live" images, etc.) pre-defined features will not be available. Section 5.3 below describes various techniques for identifying and extracting entities from images etc. The selected blobs and other entities can then be "painted" onto the corresponding image bitmap by the system, with their data bits set as appropriate, as if they had been marked up by a sighted designer; and a table can be produced by the system, describing the selected entities, in the same manner as described in section 5.2 for pre-defined images.
Said bitmap and text file can then be presented to the same routine as described in section 5.2 for pre-defined images, and processed by the system as if they were pre-defined effects. In other words, the processing for "Found"/automatically-detected objects can be handled by the same routine as is used to process the pre-defined features.
The processing can be summarised as being a) Find Objects if requested; then b) Process Guide (whether created by the Find Objects routines or pre-defined); then c) Output Effects.
The selected "objects" ("blobs" and other entities) can then be presented by stepping round the objects in order of importance or via whatever method the user has selected.
Automatic selection of output method
It is useful to have a mode of output available whereby, as an option, the system can have a hierarchy of methods for presenting objects. For example the order can be 1) symbolic object path tracers if an object is identified (e.g. via face detection methods such as the Viola and Jones algorithm); then 2) unidentified objects, for example object outline path tracers if a qualifying unidentified object has been found (e.g. if an object of the requested colour has been found); then 3) the Region Layout tracer of the viewzone (if no qualifying object has been found) 112 Fig 4.
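A compact sketch of that hierarchy is given below; the object record fields ("identified", "qualifies") are invented for this example and are not field names used by the system.

    def choose_output_method(objects):
        """Select the output method in the order described above: symbolic object
        paths for identified objects; outline paths for qualifying but unidentified
        objects; otherwise the Region Layout tracer of the viewzone."""
        for obj in objects:
            if obj.get("identified"):          # e.g. found by face detection
                return ("symbolic_path", obj)
        for obj in objects:
            if obj.get("qualifies"):           # e.g. matches the requested colour
                return ("outline_path", obj)
        return ("region_layout", None)

    print(choose_output_method([{"qualifies": True}, {"identified": True}]))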
(The controls 114 Fig 4 specify whether the Guide is used (if available), or whether the scene is always automatically processed in order to discover objects etc., even if a pre-defined Guide is available i.e. the Guide is ignored/overridden.)
5.2 CREATING AND USING A PRE-DEFINED "GUIDE"
For prepared media, a sighted designer can highlight the entities present in images, and identify what they are, their importance etc. (Alternatively "eye-tracking" technologies can be used to determine objects that sighted people look at when they view a picture or movie, in order to decide the most appropriate objects to present.) Such pre-defined entity information can be held on a separate file, or embedded in files of common multimedia formats (e.g. MP3 or AVI). The combined files are produced via a straightforward procedure, and they can also be played on standard media players (without audiotactile effects).
The pre-defined sequences are presented as a series of one or more "Views" that describe the scene being presented. These can then be linked to standard media image formats (e.g. bitmap, GIF or JPEG still images; or DVDs or AVI movie files). While the approach of embedding information in the least significant pixel data using "steganographic" methods is effective for uncompressed formats such as bitmaps, for compressed files it was found to be more straightforward to simply attach the pre-defined feature information to the end of the file. This allows the file to be viewed (and/or heard if an audio file) by sighted people using standard media players (which do not normally detect such additions to the file), but when the file is processed by the system, the system can detect if pre-defined feature information is present, by searching for a distinct set of identification data at the end of the file. If this is found then the pre-defined feature information can be separated and processed as a separate file. If the system is being developed using Visual Basic, then the "Put" statement may be found to be useful for rapidly adding data from, for example, a numerical array to a file; and the "Get" statement may be found to be useful for rapidly extracting sections of file to a numerical array. (Other programming languages have similar facilities.) The data can then be manipulated as required.
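As an illustration of the "attach to the end of the file" approach, the following sketch appends Guide data and an identification marker to a copy of a media file, and later detects and separates it; the marker bytes, length field, and file handling shown are assumptions made for this example, not the format actually used by the system.

    # Illustrative marker; standard media players normally ignore trailing bytes.
    MARKER = b"AUDIOTACTILE-GUIDE-V1"

    def bind_guide(media_path, guide_bytes, out_path):
        """Append the guide data, its length, and an identification marker to a
        copy of the media file."""
        with open(media_path, "rb") as f:
            data = f.read()
        with open(out_path, "wb") as f:
            f.write(data + guide_bytes + len(guide_bytes).to_bytes(4, "big") + MARKER)

    def extract_guide(path):
        """Return the guide data if the identification marker is present at the
        end of the file, otherwise None."""
        with open(path, "rb") as f:
            data = f.read()
        if not data.endswith(MARKER):
            return None
        size = int.from_bytes(data[-len(MARKER) - 4:-len(MARKER)], "big")
        return data[-len(MARKER) - 4 - size:-len(MARKER) - 4]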
One or more "Views" can be used in pre-defined sequences (known as "Guides") to present a scene.
Movies can be conveyed by presenting several images or "Views", each lasting about 10 seconds (though the length will vary considerably depending on the material being presented), and approximately equivalent to a "shot" in a movie "storyboard". Usually one View will be used for still images and several Views will be used for motion picture sequences, but this is not necessarily the case; for example a simple "actuality" presented as a single movie shot, for example an AVI or MPEG file, may be presented using a single View presenting the items in the scene; whereas a detailed complex still image may best be presented by a series of Views, for example one View covering the whole image, then one or more further Views which cover smaller cropped parts of the whole image.
For each View, one or more "objects" can be presented. It is found to be effective to "step" around each object in a View, showing the most important objects and features in order of importance.
Alternatively the user can select the items that are to be presented.
Binding Guides to audio files
As described above, Guide files can be bound with a visual media file, for example a bitmap ("BMP") or "JPEG" file, or played accompanying a corresponding DVD. However the media to which a Guide is bound need not be a visual media file: it can instead be bound to an audio file, for example an "MP3" audio file soundtrack. The bound Guide can have images embedded in it to present for sighted users to understand, or else it can just contain the features to be presented. For example a compact "MP3" or standard "WAV" wave audio file can be bound with a "multi-view" Guide of a movie. In this way a compact file can contain a) the soundtrack; and b) the guide information (containing the corresponding pre-defined features); for a movie. This may be a more suitable way of providing the audiotactile effects for a movie to a totally blind person than using a DVD / MPEG file etc. with associated Guide, because high-quality visuals are not required. In a test, a movie sequence of approximately 150 seconds was presented via a Guide file (with embedded JPEG images) bound to a corresponding MP3 file of acceptable sound quality. The resultant combined file was about 500 kilobytes in size, corresponding to a typical feature movie size of about 20 megabytes. Such a file could be added as an "extra" file on a conventional movie DVD (or other medium), allowing blind people to access certain visual features of a movie. Experiments with MP3, "WAV" and other sound-format files showed that such files with attached Guide files could also be played normally (as audio only) via standard computer media players, for the examples tested. However certain media players could potentially reject such files, so it is useful to include means of identifying files with bound audiotactile effects, for example by including certain groups of characters in the file name to indicate that audiotactile effects data has been added to the file.
Creating Pre-defined Guide sequences
The following approach was found to be effective for creating pre-defined sequences:
1) Plan the Views to be presented by the Guide. Movies can be conveyed by presenting several images or "Views", each approximately equivalent to a shot in a movie "storyboard". Sometimes they can be timed to fit the dialogue. For each View a "best" image can be selected for mark-up and presentation.
Normally still images will have one View and movies will have several, but not necessarily, as described above.
2) For each View, determine the number of non-overlapping groups of objects and/or features occurring in the scene. Generally one group of objects will present the background, and one or more further groups of objects present the foreground and details Fig 21. "Non-overlapping object groups" (or just "Groups") are similar to "layers" used by computer image processing software such as "Photoshop", but the key feature is that objects within them must not overlap, but can be at different distances.
3) For each Group, create a bitmap file of the image but with the least significant bits masked out/cleared. These can then be marked with non-overlapping objects Fig 21 using standard image processing software such as "Photoshop" or "GIMP". Each such marking should use a colour with the least significant bits (which were cleared when initially preparing the bitmap file) set to a distinctive value. The marked objects must use the exact colour shade applicable to them: "anti-aliasing" or other smoothing features must be turned off so that non-precise colours are not included. Images and movies can be marked up to indicate the extent, importance etc. of the entities within an image to produce a "Guided" image or movie. The images with marked objects and features can contain just Objects 440 Fig 21 and/or the Paths to be followed when presenting the features 441, and "Nodes" specifying the location of start points of paths and/or corners etc. The marked "Objects" can be fragmented and spread over several areas and have limited opacity e.g. "a flock of seagulls".
"Paths" can be included to illustrate (a) the shape of objects and/or (b) the paths that objects move along in the scene being portrayed. For example for a bouncing ball, the shape of the ball, and the path that it follows, can be presented. Every path can have a start point defined (sometimes automatically).
Paths generally refer to a particular entity, usually an object, though sometimes they refer to the whole View, for example when indicating the path of a panning or zoom shot.
"Node points" can be defined within a path, giving the "Start", "Corners" and "Arc ends" within a path. A single node can combine several such properties, flagged via different bits. "Arc ends" allow path shapes to be marked/constructed via sets of circular arcs and/or straight lines. Nodes usually refer to a particular path. Nodes could be extended to include other properties, for example to include other effects.
The "magic scissors" or "magnetic lasso" feature of image processing software is found to be useful for marking-up the edge of objects in images. Although the perimeter of an object is often the best path to present, this is not always the case, and it is useful to be able to include "diversion" paths 400 which take priority when the system is presenting shape tracers representing the objects. (See section 4.2 "Object Tracer Paths" for details of other object path options.) 4) Create a text table listing the objects in the Views, and the exact colour shades applicable to them.
Standard word processing or spreadsheet software can be used for this purpose. The table can also specify the Importance, Distance, Opacity etc. of items. An example section of such a file is shown in Fig 22, which includes information applying to the whole Guide, as well as the data describing the mark-up colours etc. of the marked-up View shown Fig 21. Once completed, the table, or sections of the table, can be saved in text-only format using the "Convert table to text" and "Save as" features commonly found in word processors. (Section 5.4 below describes a facility for automatically linking marked areas to the table of descriptors.)
5) After the text file and Markup bitmaps have been created, the system should process/merge them, creating a Guide file containing details of the objects and features in the View(s), and with the bitmap Markup information included (for example as run-length-encoded ("RLE") format information giving the extent of each object), in a structured manner so that the information can be extracted later by the system.
The resultant Guide file should contain details of the objects in the View(s), their importance, mark-up bits etc., as well as the extent of said objects. Run-length-encoding is an efficient and compact method of storing the parts of the image occupied by the objects. Figure 23 illustrates this process.
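As an illustration of the run-length-encoding step, a minimal Python sketch is given below; it encodes one row of object-identifying mark-up values as (value, run length) pairs. The exact record layout of the Guide file (Fig 23) is not reproduced here, and the function names are assumptions.

    # Hedged sketch: run-length-encode one row of mark-up values
    # (0 = background, other numbers = object identifiers).

    def rle_encode_row(row):
        """Encode a list of object codes as (code, run_length) pairs."""
        runs = []
        for code in row:
            if runs and runs[-1][0] == code:
                runs[-1][1] += 1               # extend the current run
            else:
                runs.append([code, 1])         # start a new run
        return [tuple(r) for r in runs]

    def rle_decode_row(runs):
        out = []
        for code, length in runs:
            out.extend([code] * length)
        return out

    # Example: a row crossing object number 3.
    print(rle_encode_row([0, 0, 0, 3, 3, 3, 3, 0, 0]))   # [(0, 3), (3, 4), (0, 2)]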
6) The resultant Guide file can be presented standalone, or attached to a standard media file, as described above.
7) The system can present the features in Views to the user in the required manner, for example by presenting the objects/features in a Group; the most important objects/features, or objects/features whose importance has been determined to be greater than a certain amount; or a selected object/feature (for example by the user positioning a pointer and pressing a button-sequence, or "clicking" on an object with a mouse); etc. (If the user has e.g. clicked on a point in an image that does not contain a defined object, then, as an option, the system can look for the object to present, for example by inspecting successively larger circles around the clicked point until an object is encountered i.e. to find the closest object that satisfies the current search criteria. ) Once selected, the system presents the effects for those object(s).
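A minimal sketch of the "successively larger circles" search just mentioned is given below, in Python, assuming the marked-up objects are available as a 2-dimensional array of object numbers (0 meaning "no object"); the details are illustrative assumptions rather than the system's actual implementation.

    import math

    def nearest_object(obj_map, cx, cy, max_radius, qualifies=lambda n: n != 0):
        """Search circles of growing radius around the clicked point (cx, cy)."""
        height, width = len(obj_map), len(obj_map[0])
        for r in range(max_radius + 1):
            steps = max(8, int(2 * math.pi * r))          # points around the circle
            for i in range(steps):
                a = 2 * math.pi * i / steps
                x = int(round(cx + r * math.cos(a)))
                y = int(round(cy + r * math.sin(a)))
                if 0 <= x < width and 0 <= y < height and qualifies(obj_map[y][x]):
                    return obj_map[y][x], (x, y)          # closest qualifying object
        return None, None                                 # nothing found within range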
"Clicking" on an object with a mouse or similar on its own may appear to be an unsuitable approach for use by a blind person, but a method of feeding back standard mouse location, and mapping standard mouse location to an image area is described in section 5.4 below, and that can be used in conjunction with this feature. Alternatively a coordinate-based system could be used whereby the user indicates the required point by typing coordinates or alternatively a joystick or other pointer can be used to indicate a point.
Alternatively the user can specify keyword(s) included in the Descriptions of the objects/features, so that only those items whose description contains the keyword(s) are presented 116 Fig 4 (more complex object databases can also be devised for embedding in the Guide file, and classifying the objects in various ways for selection etc.). This can act as a "search"/"find" facility, and can be extended so that for a multi-View Guide, more than one View is inspected by the system when searching for specified item(s). For each object/feature the system can position the viewzone to cover the object/feature, and can then move the tracer to describe the shape or other path for the object/feature, as well as presenting related categorical information in the manner previously described.
For movies, accessible "VCR"-style controls 118 Fig 4 can control the presentation of the entities in successive Views of a multi-View Guide.
The approach of allowing the system to "step" round a sequence of selected items has been found to be very effective.
The selected entities can be presented using any appropriate audiotactile technique as described elsewhere. The tracers can be sized to correspond to the item size and shape; or be expanded; or expanded 546 only when an item is smaller than a certain size 547 Fig 5.
As the nature of the objects is known for pre-defined Guides, an "audio description" feature can be provided to speak a description of the item being presented at any time.
(Note that Region Layouts can also be fully pre-defined and stored on a Guide file, in a similar manner to that used for objects.)

5.3 USING COMPUTER VISION
The technology known as "Computer Vision" allows many of the techniques described above to be applied to specific features of live images such as those provided by a video camera; or to existing images that have not been prepared for special presentation.
There are several software libraries available for performing computer vision techniques. The application of four standard computer vision techniques will now be described using the facilities of the OpenCV package, namely blob extraction; object identification; motion detection; and object tracking.
However many other similar techniques can be applied to producing shapes and paths etc. for the audiotactile tracers to present.
A GUI can be provided for precisely controlling what is presented (not shown) wherein the user can specify the settings for any particular Activity, so that they can be rapidly implemented when particular tasks are undertaken, and the user can finely control the parameters of the computer vision package functions. Alternatively the simple controls 120 Fig 4 can be used whereby the user can select the colour(s) that the system is to search for, by ticking "checkboxes" 122 (which can also be "checked" or "unchecked" via a key sequence) i.e. several colours can be sought simultaneously. The system then performs standard blob extraction techniques, and selects those "blobs" which match the selected colours, and assigns an "importance" value based on the area occupied by the chosen "blobs".
Checkboxes are also provided for requesting searching for people's faces 124, human figures 126, and areas of motion 128, as described below.
Blob extraction
Blob extraction / image segmentation techniques are useful, as the perimeter, or medial line, etc. of the extracted blobs can be the features that are presented via tracers. Filtering methods such as "moving average" can be used to reduce the detail in an image, and pixels falling into particular colour bands can be grouped together to form blobs 702, which can then be presented to the user. Blob resolution can be improved by performing standard optical processing techniques such as eroding and dilating the blobs.
Several different colours can be grouped together 706; for example the pixels of the colours Red, Orange and Yellow can be handled as a single shade, so avoiding the fragmentation of objects that contain several such colours.
Users can specify the maximum number of such blobs that should be presented, and the selection criteria.
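A minimal sketch of this kind of colour-band blob extraction is given below, using the OpenCV Python bindings (the description above refers to the earlier C-style OpenCV API). The HSV ranges that group red, orange and yellow into a single "warm" shade, and the minimum-area threshold, are illustrative assumptions.

    import cv2
    import numpy as np

    def warm_blobs(bgr_image, min_area=500):
        """Return contours of blobs whose colour falls in a red/orange/yellow band."""
        hsv = cv2.cvtColor(cv2.blur(bgr_image, (5, 5)), cv2.COLOR_BGR2HSV)
        # Hue 0-35 roughly covers red, orange and yellow; treat them as one shade.
        mask = cv2.inRange(hsv, (0, 80, 80), (35, 255, 255))
        # Erode then dilate to clean up the blobs, as described above.
        kernel = np.ones((5, 5), np.uint8)
        mask = cv2.dilate(cv2.erode(mask, kernel), kernel)
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        blobs = [c for c in contours if cv2.contourArea(c) >= min_area]
        # Larger blobs are treated as more "important".
        return sorted(blobs, key=cv2.contourArea, reverse=True)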
If no "blobs"/obj ects of the required colours are detected, it is found to be effective to automatically adjust the filter controls, for example image brightness and contrast, to several settings, and for each combination of settings attempting to detect matching blobs again. Some experimentation will be required to find the best adjustments to make and the best "trade-off' between being able to find matching blobs, and spending excessive time searching for them.
Object detection
As previously mentioned, there are methods available for detecting the presence of particular objects within an image, and the system can provide this facility. The OpenCV package provides the "HaarDetectObjects" function, which allows faces and other "objects" to be identified. The function implements object-detection via "Haar classifiers" using the "Viola-Jones detector" technique. The user can specify which object types are sought, and how many are presented and the selection criteria. For example faces can be presented to the user as easily-recognised Symbolic Object Paths Fig 10.
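A minimal sketch of Haar-classifier (Viola-Jones) face detection is given below; in the modern OpenCV Python bindings the HaarDetectObjects functionality is exposed as CascadeClassifier.detectMultiScale, and the cascade file and parameters used here are assumptions for illustration.

    import cv2

    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

    def detect_faces(bgr_image):
        grey = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2GRAY)
        faces = cascade.detectMultiScale(grey, scaleFactor=1.1, minNeighbors=5)
        # Each face is an (x, y, w, h) rectangle; it could then be presented as a
        # symbolic object path (Fig 10) positioned and scaled to that rectangle.
        return faces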
Motion detection
The presence of motion within a sequence of images (for example live video images) can be detected via a variety of methods. For example the OpenCV function "CalcOpticalFlowPyrLK" allows points to be tracked between successive images, either by picking particular points in the image to follow (for example by using the OpenCV function "GoodFeaturesToTrack"), or by specifying a regular grid of points. If the latter approach is used, then an effective technique is to present two images to CalcOpticalFlowPyrLK in "reverse" order: normally CalcOpticalFlowPyrLK is presented with two images, and it tracks the movement of the points in, say, a regular grid arrangement in one image to where they appear in the second image (which will not normally form a regular grid). By presenting the images to CalcOpticalFlowPyrLK in the reverse order, the locations to which the points appear to move form a regular grid, rather than the locations where they start from. The grid of motion end points can be consolidated to form "blobs" 710 Fig 24 whose shape and location can be presented to the user by using the techniques described previously. The direction of the "tails" of the arrows showing the direction of motion around the perimeter of the blob 712 can be used to determine the overall direction of movement.
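A minimal sketch of the "reverse order" technique just described is given below, using the OpenCV Python bindings; the grid spacing is an illustrative assumption.

    import cv2
    import numpy as np

    def grid_motion(prev_grey, curr_grey, step=16):
        """Return (start points, end points, motion vectors) for a regular grid."""
        h, w = curr_grey.shape
        ys, xs = np.mgrid[step // 2:h:step, step // 2:w:step]
        grid = np.float32(np.dstack([xs, ys]).reshape(-1, 1, 2))
        # Track from the CURRENT frame back to the PREVIOUS frame, so that the
        # motion end points (the grid) stay regular and only the start points vary.
        starts, status, _ = cv2.calcOpticalFlowPyrLK(curr_grey, prev_grey, grid, None)
        ok = status.reshape(-1) == 1
        start_pts = starts.reshape(-1, 2)[ok]
        end_pts = grid.reshape(-1, 2)[ok]
        return start_pts, end_pts, end_pts - start_pts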
In Fig 24 a lateral move left is occurring. Fig 26 shows the motions that may occur if "zooming" and rotation are occurring. Notice how the moves are most pronounced at the perimeter 714.
By analysing the direction of motion flow, motions such as rotation; tilt; "zoom" (growth or shrinkage); and lateral movement (up, down, left, right etc.) can be determined. Lateral moves are straightforward to determine, as all points tend to move in the same direction. However by sampling the angles 716 Fig 26 that the moves make with the radius lines 718 that run from the middle of the motion area to the perimeter, zoom and rotation movements can be estimated. (Further information can be gleaned by inspecting other moves within the area of movement.) The derived information can then be presented to the user either directly (e.g. via speech synthesis), or more intuitively via buzz track effects such as altering the timbre of a buzz track that is conveying the shape and location of the area of motion 710 Fig 65.
Motion tracking
Once an area of interest within an image has been determined, for example via face detection or motion detection as just described, the system can track (i.e. follow) the entity concerned. For example the OpenCV function "CamShift" returns an ellipse 720 Fig 27 giving a probable location and extent of a particular entity whose initial location is given to OpenCV. An effective way of doing this is to allow the user to trigger tracking when a particular entity is being presented 130 Fig 4. If Motion 128 or Faces 124 are being presented, then when, for example, a particular face is being output, the user can select tracking 130, whereupon the normal image processing is interrupted, and the location of the current object (or motion blob etc.) is determined and that area is then tracked. (Alternatively an arbitrary area of the image can be selected for tracking, for example via a mouse-drag action, or via more accessible methods such as speech recognition.) The CamShift function is relatively speedy in updating the tracking ellipse 720 Fig 27, so that the system can call it several times per second, and so update the location of the audiotactile tracers that can convey the location, size and orientation of the tracking ellipse 720 as it moves around.
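A minimal sketch of CamShift tracking with the OpenCV Python bindings is given below; the initial tracking window and the hue-histogram back-projection set-up are assumptions for illustration. The rotated rectangle returned by cv2.CamShift corresponds to the tracking ellipse 720 described above.

    import cv2

    def make_tracker(first_bgr, window):               # window = (x, y, w, h)
        x, y, w, h = window
        roi = cv2.cvtColor(first_bgr[y:y + h, x:x + w], cv2.COLOR_BGR2HSV)
        hist = cv2.calcHist([roi], [0], None, [16], [0, 180])
        cv2.normalize(hist, hist, 0, 255, cv2.NORM_MINMAX)
        term = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)

        def update(bgr):
            nonlocal window
            hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
            backproj = cv2.calcBackProject([hsv], [0], hist, [0, 180], 1)
            rotated_rect, window = cv2.CamShift(backproj, window, term)
            (cx, cy), (rw, rh), angle = rotated_rect    # centre, size, tilt
            return (cx, cy), (rw, rh), angle            # drive the tracers from these

        return update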
The motion of the tracked area can be presented by the main tracer following the centre of the tracked area 722 Fig 27. This tracer can present speech information, and have buzz track effects applied to it, in a similar manner to standard tracers, except that it will be reporting a continuously-changing entity (namely the centre of the tracking ellipse 720) rather than presenting individual shapes etc. within successive image "snapshots". Additional tracers can present polytracer-like effects, for example by separate tracers following the paths of the corners of the ellipse-enclosing rectangle 724 Fig 27, and/or the paths of the ends of the ellipse cross-hairs 726. In this way an intuitive impression of the speed, location and area of motion can be conveyed.
Similar types of change to those that can be detected for motion can be directly reported for the tracking ellipse, for example zoom (shrink/grow); tilt; rotation; and lateral motion.
It is also useful to report the type of entity being tracked, for example if it is a face, a motion area, or a specifically-selected area -for example by the system speaking the words "Tracking Face", "Tracking Motion" or "Tracking Selection" respectively.
The OpenCV function "CamShift" provides a robust tracking method that does not use excessive processing resources, but other methods can be used: for example the previously-described OpenCV Page 43 function CalcOpticalFlowPyrLK can be used to track points within an area of interest.
The effect perceived by the user is similar to that produced by the previously-described polytracers, but whereas the effects produced by polytracers mainly convey areas within a particular image, with motion tracking the rates of movement of the tracers convey the speed and nature of the movement of the entity.
The tracers that convey tracking motion can use some of the techniques previously described for buzz tracks, for example conveying direction of movement, speed of movement etc. via qualities of the audiotactile effects, for example the timbre or pulse rate. The tracking effects can be similar in nature to those described for buzz tracks and polytracers (for example alterations to the timbre, volume, pitch and "beeps", or corresponding tactile effects) but can alternatively convey different quantities, for example the tilt of the tracking ellipse.
Several items can be tracked simultaneously using the methods described.
In the tactile modality, the main tracer path will by default also follow the path of the tracked area (e.g. a force-feedback joystick can follow a path related to the path of the centre of the tracking ellipse). Buzz track and other effects can be applied to the tactile display (e.g. force-feedback joystick) in order to clarify the path and convey other information.
The motion tracking feature provides a facility for blind people to perceive the paths followed by moving entities.
Processing simple images
There is one type of image which is straightforward to handle and very effective to present, namely simple images or visual materials containing a limited number of colour shades, and with clearly defined areas 132 Fig 4. Examples of such materials include certain maps, diagrams, cartoons etc., and these are often encountered, particularly in environments where a computer might be being used (e.g. office or educational environments). It is important that the invention handles such materials effectively. Though they can be handled via the standard/general routines that handle any type of image, it was found to be effective to have special processing for simple images.
The system could be programmed to use the following processing approach, which was found to be effective (a code sketch of steps a) to c) is given after this list):
a) Before doing general image processing, sample the image pixels, or inspect all image pixels if the image is not too large, and if the number of different colour shades exceeds a particular number, for example 15, handle the image as a standard (non-simple) image. Otherwise perform simple-image processing as follows.
b) Process the image perimeter pixels, and define the background as being the most popular colour shade found along the perimeter of the image. Often the background will be white (or black), but not necessarily, and it is useful to be able to exclude such areas of background when presenting objects in a simple image. Inspecting the perimeter in this way was found to be a simple but effective way of automatically determining the background colour shade. (The background colour so determined can also be used to control "figure/ground" effects, for example as used for object-related layouts described above. It can also be used to control the dot "down" colour for layouts presented on a braille display etc., so that non-background colours are emphasised as dot "up".)
c) Process every pixel of the image, and for each pixel blank the lower bits and set them to a value that corresponds to the number of the limited range of colour shades detected (for example 1-15). The system can either handle non-contiguous same-coloured areas as being a single fragmented object, or as several objects of the same colour; for the latter case, different identifying bits should be assigned. If many separate areas of colour are present then blob selection will be necessary, using a similar technique to that described elsewhere.
d) Create a text file describing the selected blobs, in the same manner as described in section 5.2 above.
Blobs can be assigned an "importance" value based on the area that they occupy i.e. assign a higher importance to larger blobs (but do not assign a high or any importance to the background).
e) The processed bitmap and text file can then be processed by the standard output routines as previously described, and the appropriate effects presented to the user (outlines, corners, colours etc. as required). No image filtering such as moving average smoothing etc. will normally be required.
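A minimal sketch of steps a) to c) is given below, in Python using the Pillow imaging library; the colour-count threshold and the use of the lower bits follow the description above, and the remaining details are assumptions.

    from collections import Counter
    from PIL import Image

    def analyse_simple_image(path, max_shades=15):
        img = Image.open(path).convert("RGB")
        w, h = img.size
        pixels = list(img.getdata())

        # a) Count the distinct colour shades.
        shades = sorted(set(pixels))
        if len(shades) > max_shades:
            return None                    # handle as a standard (non-simple) image

        # b) Background = most common colour along the image perimeter.
        perimeter = ([img.getpixel((x, 0)) for x in range(w)] +
                     [img.getpixel((x, h - 1)) for x in range(w)] +
                     [img.getpixel((0, y)) for y in range(h)] +
                     [img.getpixel((w - 1, y)) for y in range(h)])
        background = Counter(perimeter).most_common(1)[0][0]

        # c) Clear the lower bits of each pixel and store the shade number there.
        shade_number = {shade: n + 1 for n, shade in enumerate(shades)}
        marked = [tuple((c & 0xF0) | shade_number[p] for c in p) for p in pixels]
        out = Image.new("RGB", (w, h))
        out.putdata(marked)
        return out, background, shade_number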
This approach is straightforward to implement for simple images held on "lossless" file/image formats that define a precise shade to each pixel, for example ".GIF" and ".BMP" images. Formats primarily designed for general images and/or photographs, for example "JPEG" images, often use compression techniques that result in varying pixel values for material which was originally all of one value. For the latter cases, the approach can be adapted so that minor smoothing and simplification is performed prior to the image being analysed for "simplicity". For example the image can be sampled, and if most pixel colour values cluster around a limited number of colours then each pixel can be set to the nearest of those colours. Mild moving-average smoothing can optionally be applied. For "live" simple images, de-speckling techniques can be used to eliminate "noise" pixels that are often found in such images, for example by ignoring individual pixels that differ greatly from surrounding pixels.
When using the technique described above, if the images need to be resized then anti-aliasing effects must be avoided, as these will tend to produce pixels of intermediate values that do not correspond to any of the limited number of colour shades used in the simple image, i.e. the system should "decimate" or repeat pixels (i.e. not anti-alias the image) if an image needs to be stretched or resized.
Various techniques can be used to refine the given approach for simple images, for example by using a Guide file (attached or standalone) giving the range of colour shades used for identifying the objects in the image; and tailoring the presented categorical shades to match the distinct shades; etc. The checkboxes used to specify colours to find 122 Fig 4 can optionally temporarily change to match the found colours.
5.4 CREATING AND ACCESSING AUDIOTACTILE IMAGES
A facility can be provided whereby a user can paint/draw simple shapes on a background, in a similar manner to that followed when using standard computer painting programs. The facility can be used by a blind person to create images and present corresponding audiotactile effects. It can also be used to facilitate the process of creating pre-defined Guides (see section 5.2 above).
The resultant images can be immediately replayed; edited; and saved. A blind person can check the created image. Audio feedback can use sounds similar to those used for buzz tracks in order to clarify locations and shapes.
The facility can include features tailored to the marking-up process for pre-defined Guides (see section 5.2 above), for example by filling in closed shapes with the same colour; or by allowing the least significant bits of an image (e.g. a photograph) to be cleared, then automatically incrementing the marker bits of selected mark-up colours and producing the corresponding text file listing the colours used, in a similar manner to that described in section 5.2.
The "drawing" facility can be implemented as a general facility for marking up images with features to be presented as audiotactile effects. Fig 28 shows an example GUI for such a facility. In the figure, an image 500 is being "marked up" with objects 502. The incorporated table facility 504 allows the user to edit a Guide table and bitmaps in a more convenient manner than if using a separate word processor or spreadsheet and image editing program as described in section 5.2. For example the system can automatically adjust selected colours 506 so that they are precisely the correct colour to align with the Guide, as is required for the system to link objects marked-up in the image 500 with the objects contained in the Guide table 504.
The simple editing controls 520 allow the user to draw lines (for example via a computer mouse) which can then be presented as tracer paths; or "closed" lines can be "filled" with colours assigned to particular objects 502, which are then presented using the current system settings as described above. Corners can be marked by the user clicking up and down with the mouse on the same point; or by pressing a certain key on the keyboard; or by holding the mouse still for a period of time, after which the system can interpret this action as a request for a corner to be included at the point where the mouse is located.
Controls 508 allow the user to clear the image to a plain background, so that simple lines and shapes can be drawn and immediately presented (alternatively the current image can be selected as a background by using a control 510, allowing the user to trace round items etc. for markup purposes). When a new background is selected in this manner, the system clears the special "markup" bits of the image (e.g. the lower bits of each RGB colour component for the example shown in Fig 28), so that subsequent markup drawing performed by the user using the precise colours shown in the Guide 504 (with the lower bits set as per the Guide) can be identified by the system as markup information.
Paths drawn onto the image by the user can be presented directly (by activating a control 512), or the system can present the "blobs" drawn on the image using the current system settings 514 (for example using the current settings for object tracer-type, e.g. perimeter/outline, medial line etc.). If the path is presented directly 512, then the relative timing used when drawing the path can optionally be used to adjust the relative speed at which sections of the tracer path are presented (this option can be selected via the checkbox-style toggle-button control 516).
The facility shown in Fig 28 can be made accessible to blind people by allowing them to use a joystick 522, or a mouse with constrained movement such as the Logitech Force Feedback Mouse, to draw lines and indicate points in the image 500. An "unconstrained" mouse (i.e. standard mouse) can also be used, as described below. For blind users, stereophonic humming sounds using similar conventions to those used for "buzz tracks" can be used to give continuous feedback to the user about the location of the mouse pointer (i.e. software drawing "pen") at any time. When users move a mouse (or joystick) in a certain path, the feedback sounds that they hear will be similar to those produced when a tracer moves in the same path, and they will hear similar sounds when the "buzz track" of the same shape is replayed.
The same effects as are described above for implementing "buzz tracks" (and appropriate timbre and Pillar/Layer effects) can be used to provide feedback to the user as they move the mouse (or joystick).
A "dwell" mouse action can be used to mark specific corners, so that the effects associated with corners can be triggered. The audio feedback can optionally be made louder when the "mouse button down" action is being used, for example to mark the route of a path. Corresponding tactile effects can also be provided if a force-feedback device is used.
The system can "speak" location coordinates via speech synthesis, when appropriate, to indicate a particular area of an image, or present them in a tactile manner, for example via tap codes.
A facility can be provided whereby if a force-feedback device or similar device is used, then the device movement is constrained to vertical, horizontal or diagonal movements when instructed by the user.
Other constraints can also be applied, for example to move in a smooth arc. When such constraints are applied, the action of the force-feedback device will feel similar to moving a stylus within a groove on a surface. If such a facility is provided, then it is useful to also provide a facility to quickly activate or deactivate such constrained movement (for example via a key or button sequence), so that the user can rapidly and easily move the device position to a new location.
In the tactile modality, column and row positions can be emphasised by implementing a force-based "grid" so that a force-feedback mouse or joystick tends to follow particular row or column positions. A tactile effect can be triggered when row and column positions are passed, with different effects used for each direction of movement, in a similar manner to that used for audio "Pillar" and "Layer" effects.
Buttons on the joystick can act as mouse buttons.
Alternatively a keyboard can be used to draw lines etc., by using a system of coordinates (as described in point (h) of section 6 below) to specify the start of lines and arcs, and the presence of explicit corners.
Such points can give the beginnings and ends of straight lines and/or curved arcs that are being used to "draw" or "markup" the image. The arrow keys (or numeric keypad) can be used to progress the path of a line.
An important feature of the draw/markup facility is allowing users to add specific common shapes to an image 526 Fig 28, for example simple component shapes such as circles, squares etc., or specific standard objects such as faces, trees, cars etc. The latter can resemble their corresponding "symbolic object paths" Fig 10. The user only needs to specify the object type (e.g. via keyword or object number), and the start and end point of the object (using similar techniques to those used to specify the start and end of a line or arc). The object is then interpolated between the start and end point given (for example via a mouse or via speech input), making object drawing very straightforward for blind users. The object can be "stretched" or squeezed 528 Fig 28 by the user optionally supplying a percentage or fraction. The resultant shape can be "filled" if required, and feature as part of a Guide (and be included in Object-related Layouts etc.).
Many similar methods can be devised. For example a joystick or mouse (or similar device) can be used to define the end of a line or arc (from the current start position). When such a facility for drawing a straight line or curved arc is being used, as an option the "buzz track" sounds corresponding to the current line or arc as defined by the current joystick or mouse location can be output repeatedly, so that a blind user can assess the shape as they position it (the current line or shape can also be output as a tracer path on a force-feedback device, to allow tactile assessment of the shape).
When the facility shown in Fig 28 is requested, if no Guide is active, then a simple default Guide can be loaded, for example a Guide containing one View, one Group and one Object 229 Fig 28. This can be added to by using the control buttons 518, as the user marks up the image.
Numerous other image editing facilities that are found to be useful can be included in the drawing/markup facility. For example an "auto-complete"/"auto-fill" facility can be provided whereby a line is automatically joined up to its start point (not shown), and the resultant closed shape optionally filled with the appropriate colour to represent an object.
Using a computer mouse to draw images
An unconstrained computer mouse is normally considered to be of little use to a totally blind person, as they are unable to visually follow the mouse pointer on the screen. However if the location-conveying buzz track audio feedback method described above is implemented, then the user can be aware of the mouse location, and the shape of the path in which it is moving.
However a problem with this method is that for a drawing application such as that shown in Fig 28, the user has to locate the mouse pointer in the drawing area/"canvas" 500 Fig 28, which is difficult to do even with audio feedback. A solution is to allow the mouse to be moved anywhere on the computer's screen/"desktop" area (i.e. use the full dimensions of the screen), but with the location being processed to map to the canvas area e.g. if the user moves the mouse half-way across the screen then this maps to half-way across the canvas. In this way the user does not need to be concerned about staying within the canvas area 500, and gets audio feedback on where the mouse pointer is currently located. The mouse pointer is free to move over the "desktop" or other applications (as long as they are not set to act on the mouse merely moving or "dwelling" over them, which is generally the case).
As the mouse may move over other applications, the standard main mouse button cannot be used in this mode. To do a "mouse button down" action, e.g. to "click down", or hold the mouse button down to draw a line etc., users can use a particular keyboard key, such as "M"(ouse). An alternative is to use the middle button of a 3-button mouse, as it is not used by many applications, and usually produces no change when clicked over most applications (though this is not always the case). (Modern "scroll-wheel" mice often have a third/middle button combined with the scroll-wheel, so that if the scroll button is pressed downwards, then a middle-button action is triggered.) The action of mapping "full screen" mouse movement to the canvas (as described above), and use of the middle button to act as "mouse button down", provide a relatively intuitive way of drawing shapes, and work well as long as other applications that might be open are not set to use the middle button (if they are, then the keyboard-simulated "mouse button down" action can be used instead).
To implement this action under Windows, the system can use a timer or loop to call the GetCursorPos API function to find the mouse location, and call the GetKeyState API function to detect the middle button state.
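A minimal Windows-only sketch of this polling approach is given below, in Python using ctypes; it reads the cursor position with GetCursorPos, reads the middle-button state with GetKeyState, and maps the full-screen position onto the drawing canvas as described above. The canvas dimensions are assumptions.

    import ctypes
    from ctypes import wintypes

    user32 = ctypes.windll.user32
    VK_MBUTTON = 0x04                                 # middle mouse button

    def poll_mouse(canvas_w, canvas_h):
        pt = wintypes.POINT()
        user32.GetCursorPos(ctypes.byref(pt))
        screen_w = user32.GetSystemMetrics(0)         # SM_CXSCREEN
        screen_h = user32.GetSystemMetrics(1)         # SM_CYSCREEN
        # "Half-way across the screen" maps to "half-way across the canvas".
        canvas_x = pt.x * canvas_w // max(screen_w, 1)
        canvas_y = pt.y * canvas_h // max(screen_h, 1)
        middle_down = bool(user32.GetKeyState(VK_MBUTTON) & 0x8000)
        return canvas_x, canvas_y, middle_down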
An incidental use of this special mouse action can be to move focus to the system's application window via a middle click. It can also be used to specify the required Viewzone location within the image, when it is zoomed in. Additionally it can be used to give location when the user is clicking to select "objects" as described in section 5.2 above.
A modified computer mouse could be used for this special mouse action, with no active buttons, or alternatively just a single button acting as the middle button of a conventional mouse. (If the latter course is used then an embedded scroll wheel facility is useful for controlling zooming actions etc.) The facilities that enable blind people to create "audiotactile images" (e.g. to draw lines and objects as described above) can be accessed by methods that include Computer Mouse, Keyboard, Joystick, and also by Voice recognition. Voice recognition tends to work best if the "Command and Control" approach is used, i.e. only allowing a limited number of command words to be recognised by the system.
In addition to using a conventional mouse, "alternative" input devices that simulate the action of a mouse may be used. For example a "graphics pad" can be used -some people may prefer the fixed active area, which can be clearly felt, and the stylus input. An interactive "whiteboard", or a computer touch-screen, can be used in a similar manner. Such devices sometimes allow their switches to be mapped to the mouse middle button action. Other new methods that are devised can be used as an input device if appropriate, for example using gesture-based input, Gyration's "Air Mouse", or Microsoft's "Kinect" system. Numerous other input methods can be used.
5.5 CREATING AND ACCESSING DATA, GRAPHS, AND WAVEFORMS
In UK Pat. No. GB2441434 mention is made of data that can be presented visually. The visual representations presented by the system can be for example a 2-dimensional line graph, and the lineal features can be lines on the graph, with corners highlighted on change of line direction. This concept has been developed with the present invention, and new features are provided for presenting data, graphs and charts, and waveforms.
Example approaches will be described, but the techniques can be extended in many ways.
Data (for example as held in a computer spreadsheet) can be presented in the form of audiotactile shapes, in a similar manner to the methods described for other shapes, the shapes being similar in form to certain standard visual graphs and charts. The system (or an external program) can read the data and process it into the form of path shapes that are reminiscent of standard charts and graphs Fig 29. The path followed by the tracer in presenting non-visual effects to convey an "audiotactile graph" resembles the corresponding conventional visual graphs and charts of the same type, for example line graphs 600 Fig 29, column charts 602, pie/doughnut charts 606 etc. Such "audiotactile graphs" are particularly effective when presented with "buzz track" effects applied, as they clarify the shape being presented.
One key difference from general image presentation is that the distinct indicator effects, which normally convey the presence of corners, can be used to represent data points. Such effects would sometimes be generated by the system anyway if it was presenting shapes such as 600 Fig 29 when the slope of the line graph changed suddenly, but the system can explicitly generate similar indicator effects to represent data points 601 even if no corner is present, such indicator effects being presented at the moment when the tracer is passing the locations at which the data points occur within the line graph, in a similar manner to the way corners in shapes are presented. Such effects are output even if the angle of slope does not change so as to form a clear corner 603.
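A minimal sketch of turning a row of data values into such a line-graph tracer path is given below; coordinates are normalised to the range 0 to 1, the tracer is assumed to travel at constant speed, and an indicator event is attached to every data point. The structure of the returned path records is an assumption for illustration.

    import math

    def line_graph_path(values):
        lo, hi = min(values), max(values)
        span = (hi - lo) or 1.0
        points = [(i / max(len(values) - 1, 1), (v - lo) / span)   # (x, y) in 0..1
                  for i, v in enumerate(values)]
        # Total path length, used to convert distance into presentation time.
        total = sum(math.dist(points[i], points[i + 1])
                    for i in range(len(points) - 1)) or 1.0
        path, travelled = [], 0.0
        for i, p in enumerate(points):
            if i > 0:
                travelled += math.dist(points[i - 1], p)
            path.append({"point": p,
                         "time_fraction": travelled / total,   # when the tracer arrives
                         "indicator": True})                   # data-point indicator effect
        return path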
As the apparently-moving tracer presenting such shapes usually "travels" at a constant speed, the time spent presenting each section gives cues as to its size. For example for the path corresponding to a pie chart 606, the time spent on each section is proportional to the size of the pie chart "slice" represented.
A computer spreadsheet Fig 30 includes values to be presented 620, and special identification elements 622, which can be created by a user, or generated by another system. The identification elements 622 contain a distinct prefix which is unlikely to occur otherwise, followed by an identifier. These are followed by data and other values 620. These can, by a convention, describe the type of graph to be presented (for example "LIN" = Line graph 600 Fig 29, "COL" = Column chart 602, "BAR" = Bar chart 604, "PIE" = Pie/doughnut 606, "NET" = "Net"/"radar" 608, or "WAV" = Wave file, described later). As shown in Fig 29, the path followed in presenting non-visual effects to convey the graph resembles the corresponding conventional visual graphs and charts of the same type.
The graph type can be overridden by the user. The range of the graph, and the location of Pillar effects etc. can be given on an information row 624 Fig 30. Other information, such as categorical information to be presented etc. can also be included. Other rows of the spreadsheet contain the data points to be presented 626.
The system presents the data in the form of a shape, preferably with buzz track effects applied, which conveys the data. For example 600 Fig 29 shows a linear path which resembles a line graph. Similar graphs have previously been presented by using optophone-style audio mapping. However by using audiotactile "tracers" that can move in any direction, shapes that resemble other graph and chart styles can also be presented via non-visual effects Fig 29.
If special audio effects (or short tactile buzz effects) are being used for indicator effects (rather than, for example, momentarily stopping the tracer), then their frequency can relate to the frequency assigned to the location (e.g. height) at which they occur, so conveying and emphasising the data point value.
The following features can be provided, for example controlled by a GUI Fig 31, or specified via the input file:
a) Presenting data points as indicator effects 630 Fig 31, as described above.
b) Equal time legs 632. The tracer will normally travel at an even speed, but optionally the speed can alter so that an equal time is assigned to each data point, the tracer travelling faster between data points that are located further apart spatially. This approach may be preferred by some users. Other effects such as timbre can be used to distinguish separate legs.
c) Switch sound between points 634. The tracer buzz track sound timbre can change on change of leg. This approach is effective when Pie charts are presented 606 Fig 29, as the several sections are emphasised in this way. Particular timbres can convey the nature of the item represented by the corresponding leg. (Tactile effects can also change on change of leg, in a similar manner.)
d) "Spiky" charts 636. The line representing the graph can return to a zero point 603 (or other value) between data points, in order to emphasise the height/location of the data point. This will produce a "bouncing" effect between points. This feature can be used to produce effects that simulate visual column and bar charts 602 & 604 Fig 29.
Several "rows" of data can be presented, for example in sequence 606 & 610 Fig2 9.
One minor problem with presenting graphs in this way is that for certain types of graphs there is a discontinuity between one graphical "line" and the next; or, if the graph is being repeated, between presentations of the same line. For audio-only effects this may be the desired effect, the discontinuity highlighting the start of a new line. However if the path is being presented on a tactile device, for example a force-feedback device, then the sudden movement of the device back to the start position can be distracting. One mitigation is, as an option, to alternate the direction in which the line is presented, so that, for example, the graph shown 600 Fig 29 would be presented as effects moving left-to-right, then right-to-left, the system alternating between the two directions.
Presenting wave shapes
As well as providing facilities for presenting data in the form of graphs and charts as described above, the system can also present waveform shapes, for example for educational purposes.
The system can be controlled via a spreadsheet Fig 32 in a similar manner to that used for graphs, but with parameters for waveform shapes provided; or via bespoke GUI controls 640 Fig 31 that can be used to trigger the output of shapes of particular waveforms.
For example the waveform types presented can be sinusoidal waves 650 Fig 32, "rectangular" waves (including square waves), sawtooth-shaped waves (including triangular waves), and other wave shapes.
The parameters of the waveform can be controlled in a similar manner (i.e. via a spreadsheet or GUI), and can include wave shape 650; the number of waves to be presented 652; the wave minimum to maximum range 654; the start phase 656; etc. Several waves can be added/combined, and the resultant waveform presented. The several waves can be specified via a spreadsheet Fig 32; or via bespoke GUI controls 638 Fig 31. For example a "Fourier Series" of several sine waves can be combined to demonstrate how a periodic signal can be represented as a sum of sinusoids Fig 32 & 638 Fig 31.
The approaches just described have the advantage that they can be used by blind people to specify/create graphs, charts and waveforms, as well as access them via audiotactile effects, because certain spreadsheet programs have comprehensive accessibility features that allow blind people to use them. A user could typically set the system to repeatedly present the graph or waveform content of a spreadsheet (highlighted via the special text 622 Fig 30), then change the content in the spreadsheet in order to experience the effect of changing shapes, parameters, etc. A convenient method of interfacing standard spreadsheet programs with the system is to save the spreadsheet data in the popular text-only "comma separated variable" format (".CSV" format), so that it can easily be read by the system.
With all of the graph, chart and waveform, and other techniques described in this section (and elsewhere), the "buzz track" techniques described previously can be used, where appropriate, to clarify the shapes presented.
5.6 INTERFACING WITH EXTERNAL SYSTEMS
As well as providing bespoke facilities for the content of particular spreadsheet formats to be read and presented as shapes, the system can provide a more general facility for presenting shapes generated by external applications. A useful feature is to allow external applications to provide parameters for shapes and corners or other point-like features (to be represented by the distinct indicator effects) in order to allow particular shapes to be presented.
One approach is for a standard-format text-format file to be produced by the external application, with a file name of a particular format, which is saved by the external application to an area accessible to the system. The system can then search for files of the standard format, and allow the user to select them for presentation by the system.
Such an approach allows the system to be used for particular specialised applications. For example the graph, chart and waveform facility shown in Fig 61 could be created as an external application. Many other applications can be devised which generate shapes, for example games, map applications, educational software etc., and these can be focused on particular niche applications, whereas the main system may be designed for more general use.
The standard-format text files generated by external applications to provide shapes for the system to present can contain the coordinates of lines, arcs, and indicator effects, as well as sound and tactile effects to be presented, categorical information to be presented, etc. Alternative effects can be included, for example alternative sounds (by giving the file name and location of a sound file).
Both standard-format text files containing shapes and other details, and more standard visual representations such as image files, can be presented to the system via an indirect "pointer file" 642 Fig 31, wherein the name of a text file or similar is presented to the system, said text file containing the name and location of the actual file to be processed. Such an approach can have the advantage that the external application can change the image, shape, etc. to be presented by simply changing the contents of a small text file, without the user having to intervene.
A facility can be provided wherein the system can output bespoke sounds (and other effects) to be presented (e.g. in the form of a sound file), or alternatively text that is to be converted to speech sounds, and the tracer path that the sounds are to follow. The text can be converted to speech sounds using, for example, Microsoft's "Text-to-speech" (TTS) facility. The sound files might, for example, contain a verbal commentary, e.g. describing additional information about the visual representation, such as the data points on a graph. Sound and text files of this type can be provided by an external application.
"Buzz track" effects can be added to the presented tracers that are generated via external applications, in order to clarify the shapes presented.
5.7 USING A "VIEWFINDER" TO CAPTURE IMAGES The presented images can be gathered from various sources, such as files, DVDs, or live video images, and these may be specifically handled by the system. However a more general facility can be provided by defining an area of the computer screen contents as being the image to be presented Fig 55. This area can be controlled via a sizeable and moveable "viewfinder" frame that can "hover" over any part of the screen. The screen content framed by the viewfinder is then captured and the image replayed in the same maimer as other images e.g. it can be stretched to fill the square image area, and presented in whatever maimer is currently selected.
The "viewfinder" Fig 33 can be sized and moved via the keyboard, or via an mouse used with audio or tactile feedback (e.g. in a similar way to that described in section 5.4 above for drawing paths). One effective way of using the mouse in such cases is for the user to use the mouse to define/mark the area to be presented, either via an approximate rectangle or loop, or simply a diagonal. On a signal from the user, the viewfinder is positioned by the system over the area of the computer screen enclosed by the rectangle that "frames" the defined/marked area (with straight edges parallel to the screen edges), and the Page 53 content of that area is presented. (A similar approach can be used by blind users to specify rectangular areas for other purposes, for example for specifying a "viewzone" size and location.) The "viewfinder' facility can "snap" parts of the desktop, or parts of any application that is not handled by the bespoke image gathering facilities. (For some video applications it may be necessary to set options so that "overlay" mode is not used, as images displayed using overlays may not be snapped correctly.) It is useful to provide facilities to lock the viewfinder 564 Fig 33 (to an area, or to a particular application 566 -so that it "follows" it if the application is moved); to "see through" the system application 568; and to store and retrieve the settings for later use 570.
To implement such a "viewfinder" facility under Windows, the SetLayeredWindowAttributes API function can make a form's background transparent, producing a "viewfinder" effect 572 Fig 33. The system can trigger a screen snap action to get the screen contents into the operating system's "clipboard", then call the GetSystemMetrics API function to obtain the viewfinder size and location within the screen area, in order to determine the required area of the screen snap.
The scroll wheel on a mouse can be used to zoom the window in and out, with appropriate audiotactile feedback given. Alternatively a keyboard-sequence can be used for this purpose.
6 OTHER FEATURES
The invention is not intended to be restricted to the embodiments described herein and may include the following features:
a) The system can output a characteristic effect when it is not presenting particular information, but is waiting to present new effects, or, in the case of a physical device such as a force-feedback device, moving to a new location. For example, when presenting objects, if no suitable objects can be found to present, the system can instead output a characteristic sound and/or tactile effect. Alternatively the system can be silent at such times.
b) Standard software features such as "drag and drop" processing, and accepting images and/or Guides "pasted" from other software, can be provided.
c) The moving tracers described herein may be able to be presented via retina stimulation systems, for example using the methods recently used by Moorfield's eye hospital.
d) The optical processing technique known as "Adaptive Thresholding" can be effective for controlling the brightness and/or contrast etc. of sections of the image being processed at any time.
e) Voice and other sound samples can be of one frequency, and then be pitch-shifted through the required range, as described in UK Pat. No. GB2441434. Alternatively a "multisampling" approach can be used wherein samples are recorded at several points within the required pitch range, in order to produce more realistic sounds.
f) It is effective to have a feature whereby, as an option, the user can instruct the system to evaluate the colours in Panels etc. as described in UK Pat. No. GB2441434, but use monochrome processing when deciding the segment levels. This will have the effect of informing the user of the categorical colour shades present within each Panel, but provide the user with a monochrome representation of the layout of the segments in the Panels, which may be simpler to comprehend.
g) It was found to be effective in certain circumstances to use short straight legs instead of curved arcs as described in UK Pat. No. GB2441434 when processing e.g. outline shapes, as these can more easily be distorted by the system, e.g. stretched when required. Either method can be used as appropriate.
Automatic corner detection can be implemented by calculating the angle of slant of tracer legs (described above) and noting when the change in slant angle between successive legs exceeds a certain angle, such occurrences being deemed a corner. Very short legs should be combined (by obtaining the beginning and end of a group of such short lines) and the slope of the combined legs used, so that false corners are not detected. (A code sketch of this approach is given after this list.)
h) When the user wishes to indicate a point within an image (for example to request the presentation of objects at or near such a point), they can input coordinates of a point (e.g. via keyboard or speech recognition), for example by giving two decimal values (to whatever precision they require), indicating the horizontal and vertical coordinates of the point in question. For example ".723,.288" can indicate a precise point, while ".7,.3" can indicate a point in the same area/environ/neighbourhood, but given with less precision (but being quicker to type). Or a very simple "letter and numeric" convention can be used, for example using the nine letters A to I and the numbers 1 to 9 to indicate the horizontal and vertical coordinates, e.g. "A1" for the top left, and "A9" for bottom left etc. Alternatively the standard approach of specifying successively-smaller squares numbered 1 to 9 within a 3 by 3 grid can be followed. Such techniques can be used to make some of the features of the system more accessible. For example the image in the "drawing" facility shown in Fig 45 could be accessed in this way. The system can "speak" such coordinates, when appropriate, to indicate a particular area of an image. (Alternatively a joystick or other pointing device with a constrained range of movement, or special mouse action, can be used by blind people to indicate a point in an image, as described above.)
i) As an option, only the indicator effects that represent point-like features (e.g. corners, data points etc.) of shapes can be output (i.e. no intermediate moving tracer presented), for example for shapes with several corners; the user can then mentally "join up the dots" in order to get an impression of the presented shape. For some users this may produce an "audiotactile optical illusion" of the full shape.
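A minimal sketch of the corner-detection approach of point g) is given below, in Python; the leg-length and angle thresholds are illustrative assumptions.

    import math

    def detect_corners(points, min_leg=3.0, corner_angle=45.0):
        # Combine successive points into legs at least min_leg long, so that very
        # short legs do not produce false corners.
        legs, start = [], points[0]
        for p in points[1:]:
            if math.dist(start, p) >= min_leg:
                legs.append((start, p))
                start = p
        corners = []
        for (a1, b1), (a2, b2) in zip(legs, legs[1:]):
            slant1 = math.degrees(math.atan2(b1[1] - a1[1], b1[0] - a1[0]))
            slant2 = math.degrees(math.atan2(b2[1] - a2[1], b2[0] - a2[0]))
            change = abs((slant2 - slant1 + 180) % 360 - 180)   # smallest angle change
            if change > corner_angle:
                corners.append(b1)           # the shared end point is deemed a corner
        return corners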
ADVANTAGES
From the description above, a number of advantages of some embodiments of my improved audiotactile vision system become evident:
(a) The "buzz track" feature gives clearer perception of shape.
(b) The "pillar" and "layer" effects allow clearer perception of line slope.
(c) Several minor improvements are included which make the system more useful and easier to use.
(d) It can be used for numerous applications.
CONCLUSIONS, RAMIFICATIONS AND SCOPE
Accordingly, the reader will see that the system addresses several of the shortcomings of previous inventions in the field. Furthermore, the system has the additional advantage that one preferred embodiment can be implemented using low-cost standard computer hardware.
Although the description above contains many specificities, these should not be construed as limiting the scope of the embodiments but as merely providing illustrations of some of the presently preferred embodiments. Numerous modifications may be made to the arrangements that have been described without departing from the true spirit and scope of the invention.
Thus the scope of the embodiments should be determined by the appended claims and their legal equivalents, rather than by the examples given.

Claims (30)

CLAIMS
1. A method of communicating visual representations comprising: acquiring at least one lineal feature related to said visual representations, said lineal features having paths; processing said paths of said lineal features into the form of at least one apparently-moving effect of particular timbre, wherein said apparently-moving effects follow paths that relate to said paths of said lineal features; providing at least one display means, wherein said display means include display means selected from the group consisting of: at least one audio display means, at least one tactile display means, and combinations thereof; outputting at least one of said apparently-moving effects of particular timbre to at least one of said display means; whereby said lineal features within said visual representations can be perceived via said apparently-moving effects of particular timbre.
2. A method as claimed in Claim 1 wherein said apparently-moving effects of particular timbre are apparently-moving buzzing effects.
3. A method as claimed in Claim 1 or Claim 2, wherein said apparently-moving effects are stereophonic sounds.
4. A method as claimed in any preceding claim, further comprising: second acquiring at least one non-lineal feature related to said visual representations; second processing said non-lineal features into the form of at least one separate effect; second outputting at least one of said separate effects to at least one of said display means; whereby said lineal features can be perceived via said apparently-moving effects, and said non-lineal features can be perceived via said separate effects.
5. A method as claimed in Claim 4 wherein: said non-lineal features include features selected from the group consisting of: point-like features, categorically-perceived properties, continuously-changing properties, and combinations thereof; said second processing said non-lineal features into the form of separate effects includes processing selected from the group consisting of: processing said point-like features into the form of short distinct effects, processing said categorically-perceived properties into the form of categorically-perceived effects, and processing said continuously-changing properties into the form of continuously-changing effects; whereby said separate effects have qualities that relate to the qualities of said non-lineal features from which said separate effects are processed.
6. A method according to claim 5 wherein said categorically-perceived properties include categorically-perceived properties selected from the group consisting of: categories of visual properties, spatial arrangements of said categories of visual properties, classified entities, and combinations thereof.
7. A method as claimed in any preceding claim, further comprising: dividing said visual representations, or portions of said visual representations, into at least one matrix of elements, said elements having borders between adjacent matrix elements; third acquiring at least one border crossing point within said paths of said lineal features, said border crossing points having locations substantially at the intersections of said paths of said lineal features and said borders between adjacent matrix elements; third processing said border crossing points into the form of at least one border indicator effect, said border indicator effects optionally conveying the direction of crossing; third outputting at least one of said border indicator effects to at least one of said display means; wherein said border indicator effects are output substantially when said apparently-moving effects reach locations that relate to said border crossing points; whereby the shape and slope of said lineal features, and locations within said paths of said lineal features, can be more accurately perceived.
8. A method according to any preceding claim wherein said lineal features include lineal features selected from the group consisting of: perimeters of entities, medial-lines of entities, lines of symmetry of entities, paths that symbolise the classifications of entities, a plurality of lines each related to sections of entities, and combinations thereof.
9. A method as claimed in any preceding claim, wherein a plurality of said apparently-moving effects are output simultaneously.
10. A method as claimed in any preceding claim wherein said visual representations include visual representations selected from the group consisting of: live images, recorded still or moving images, created still or moving images, filtered still or moving images, still or moving images prepared by a person, maps, abstract shapes, descriptions of shapes and point-like features and other visual properties, visual representations produced by computer vision processing, data that can be presented visually, graphs and charts, waveform shapes, parts of computer desktops, visual representations provided by external systems, and combinations thereof.
11. A method as claimed in Claim 10 wherein said visual representations produced by computer vision processing include visual representations selected from the group consisting of: areas of common properties, areas of movement, paths followed by moving entities, identified objects, and combinations thereof.
12. A method as claimed in Claim 10 wherein said graphs and charts are processed into the form of apparently-moving effects and short distinct effects that relate to the lines and points respectively of said graphs and charts.
13. A method as claimed in any claim 1 to 9, wherein said lineal features and said non-lineal features are created using a drawing facility.
14. A method as claimed in any preceding claim which is accessible to blind people.
15. A device enabling a person to perceive visual representations comprising: acquiring means for acquiring at least one lineal feature related to said visual representations, said lineal features having paths; processing means for processing said paths of said lineal features into the form of at least one apparently-moving effect of particular timbre, wherein said apparently-moving effects follow paths that relate to said paths of said lineal features; at least one display means, wherein said display means include display means selected from the group consisting of: at least one audio display means, at least one tactile display means, and combinations thereof; outputting means for outputting at least one of said apparently-moving effects of particular timbre to at least one of said display means; whereby said lineal features within said visual representations can be perceived via said apparently-moving effects of particular timbre.
16. A device as claimed in Claim 15 wherein said apparently-moving effects of particular timbre are apparently-moving buzzing effects.
17. A device as claimed in Claim 15 or Claim 16, wherein said apparently-moving effects are stereophonic sounds.
18. A device as claimed in any claim 15 to 17, further comprising: second acquiring means for acquiring at least one non-lineal feature related to said visual representations; second processing means for processing said non-lineal features into the form of at least one separate effect; second outputting means for outputting at least one of said separate effects to at least one of said display means; whereby said lineal features can be perceived via said apparently-moving effects, and said non-lineal features can be perceived via said separate effects.
19. A device as claimed in Claim 18 wherein: said non-lineal features include features selected from the group consisting of: point-like features, categorically-perceived properties, continuously-changing properties, and combinations thereof; said second processing said non-lineal features into the form of separate effects includes processing selected from the group consisting of: processing said point-like features into the form of short distinct effects, processing said categorically-perceived properties into the form of categorically-perceived effects, and processing said continuously-changing properties into the form of continuously-changing effects; whereby said separate effects have qualities that relate to the qualities of said non-lineal features from which said separate effects are processed.
20. A device according to claim 19 wherein said categorically-perceived properties include categorically-perceived properties selected from the group consisting of: categories of visual properties, spatial arrangements of said categories of visual properties, classified entities, and combinations thereof.
21. A device as claimed in any claim 15 to 20, further comprising: dividing means for dividing said visual representations, or portions of said visual representations, into at least one matrix of elements, said elements having borders between adjacent matrix elements; third acquiring means for acquiring at least one border crossing point within said paths of said lineal features, said border crossing points having locations substantially at the intersections of said paths of said lineal features and said borders between adjacent matrix elements; third processing means for processing said border crossing points into the form of at least one border indicator effect, said border indicator effects optionally conveying the direction of crossing; third outputting means for outputting at least one of said border indicator effects to at least one of said display means; wherein said border indicator effects are output substantially when said apparently-moving effects reach locations that relate to said border crossing points; whereby the shape and slope of said lineal features, and locations within said paths of said lineal features, can be more accurately perceived.
22. A device according to any claim 15 to 21, wherein said lineal features include lineal features selected from the group consisting of: perimeters of entities, medial-lines of entities, lines of symmetry of entities, paths that symbolise the classifications of entities, a plurality of lines each related to sections of entities, and combinations thereof.
23. A device as claimed in any claim 15 to 22, wherein a plurality of said apparently-moving effects are output simultaneously.
24. A device as claimed in any claim 15 to 23 wherein said visual representations include visual representations selected from the group consisting of: live images, recorded still or moving images, created still or moving images, filtered still or moving images, still or moving images prepared by a person, maps, abstract shapes, descriptions of shapes and point-like features and other visual properties, visual representations produced by computer vision processing, data that can be presented visually, graphs and charts, waveform shapes, parts of computer desktops, visual representations provided by external systems, and combinations thereof.
25. A device as claimed in Claim 24 wherein said visual representations produced by computer vision processing include visual representations selected from the group consisting of: areas of common properties, areas of movement, paths followed by moving entities, identified objects, and combinations thereof.
26. A device as claimed in Claim 24 wherein said graphs and charts are processed into the form of apparently-moving effects and short distinct effects that relate to the lines and points respectively of said graphs and charts.
27. A device as claimed in any claim 15 to 23, wherein said lineal features and said non-lineal features are created using a drawing facility.
28. A device as claimed in any claim 15 to 27 which is accessible to blind people.
29. A device according to any claim 15 to 28 which can be arranged on a substantially horizontal surface, whereby said device can be used on a desktop or similar surface.
30. A device according to any claim 15 to 28 which is portable.
Amendments to the claims have been filed as follows
CLAIMS
1. A method of communicating visual representations comprising: acquiring at least one lineal feature related to said visual representations, said lineal features having paths; processing said paths of said lineal features into the form of at least one apparently-moving effect of substantially humming timbre, wherein said apparently-moving effects follow paths that relate to said paths of said lineal features; providing at least one display means, wherein said display means include display means selected from the group consisting of: at least one audio display means, at least one tactile display means, and combinations thereof; outputting at least one of said apparently-moving effects of substantially humming timbre to at least one of said display means; whereby said lineal features within said visual representations can be perceived via said apparently-moving effects of substantially humming timbre.
2. A method as claimed in Claim 1 wherein said apparently-moving effects of substantially humming timbre are apparently-moving substantially buzzing effects.
3. A method as claimed in Claim 1 or Claim 2, wherein said apparently-moving effects are stereophonic sounds.
4. A method as claimed in any preceding claim, further comprising: second acquiring at least one non-lineal feature related to said visual representations; second processing said non-lineal features into the form of at least one separate effect; second outputting at least one of said separate effects to at least one of said display means; whereby said lineal features can be perceived via said apparently-moving effects, and said non-lineal features can be perceived via said separate effects.
5. A method as claimed in Claim 4 wherein: said non-lineal features include features selected from the group consisting of: point-like features, categorically-perceived properties, continuously-changing properties, and combinations thereof; said second processing said non-lineal features into the form of separate effects includes processing selected from the group consisting of: processing said point-like features into the form of short distinct effects, processing said categorically-perceived properties into the form of categorically-perceived effects, and processing said continuously-changing properties into the form of continuously-changing effects; whereby said separate effects have qualities that relate to the qualities of said non-lineal features from which said separate effects are processed.
6. A method according to claim 5 wherein said categorically-perceived properties include categorically-perceived properties selected from the group consisting of: categories of visual properties, spatial arrangements of said categories of visual properties, classified entities, and combinations thereof.
7. A method as claimed in any preceding claim, further comprising: dividing said visual representations, or portions of said visual representations, into at least one matrix of elements, said elements having borders between adjacent matrix elements; third acquiring at least one border crossing point within said paths of said lineal features, said border crossing points having locations substantially at the intersections of said paths of said lineal features and said borders between adjacent matrix elements; third processing said border crossing points into the form of at least one border indicator effect, said border indicator effects optionally conveying the direction of crossing; third outputting at least one of said border indicator effects to at least one of said display means; wherein said border indicator effects are output substantially when said apparently-moving effects reach locations that relate to said border crossing points; whereby the shape and slope of said lineal features, and locations within said paths of said lineal features, can be more accurately perceived.
8. A method according to any preceding claim wherein said lineal features include lineal features selected from the group consisting of: perimeters of entities, medial-lines of entities, lines of symmetry of entities, paths that symbolise the classifications of entities, a plurality of lines each related to sections of entities, and combinations thereof.
9. A method as claimed in any preceding claim, wherein a plurality of said apparently-moving effects are output simultaneously.
10. A method as claimed in any preceding claim wherein said visual representations include visual representations selected from the group consisting of: live images, recorded still or moving images, created still or moving images, filtered still or moving images, still or moving images prepared by a person, maps, abstract shapes, descriptions of shapes and point-like features and other visual properties, visual representations produced by computer vision processing, data that can be presented visually, graphs and charts, waveform shapes, parts of computer desktops, visual representations provided by external systems, and combinations thereof.
11. A method as claimed in Claim 10 wherein said visual representations produced by computer vision processing include visual representations selected from the group consisting of: areas of common properties, areas of movement, paths followed by moving entities, identified objects, and combinations thereof.
12. A method as claimed in Claim 10 wherein said graphs and charts are processed into the form of apparently-moving effects and short distinct effects that relate to the lines and points respectively of said graphs and charts.
13. A method as claimed in any claim 1 to 9, wherein said lineal features and said non-lineal features are created using a drawing facility.
14. A method as claimed in any preceding claim which is accessible to blind people.
15. Apparatus enabling a person to perceive visual representations comprising: acquiring means for acquiring at least one lineal feature related to said visual representations, said lineal features having paths; processing means for processing said paths of said lineal features into the form of at least one apparently-moving effect of substantially humming timbre, wherein said apparently-moving effects follow paths that relate to said paths of said lineal features; at least one display means, wherein said display means include display means selected from the group consisting of: at least one audio display means, at least one tactile display means, and combinations thereof; outputting means for outputting at least one of said apparently-moving effects of substantially humming timbre to at least one of said display means; whereby said lineal features within said visual representations can be perceived via said apparently-moving effects of substantially humming timbre.
16. Apparatus as claimed in Claim 15 wherein said apparently-moving effects of substantially humming timbre are apparently-moving substantially buzzing effects.
17. Apparatus as claimed in Claim 15 or Claim 16, wherein said apparently-moving effects are stereophonic sounds.
18. Apparatus as claimed in any claim 15 to 17, further comprising: second acquiring means for acquiring at least one non-lineal feature related to said visual representations; second processing means for processing said non-lineal features into the form of at least one separate effect; second outputting means for outputting at least one of said separate effects to at least one of said display means; whereby said lineal features can be perceived via said apparently-moving effects, and said non-lineal features can be perceived via said separate effects.
19. Apparatus as claimed in Claim 18 wherein: said non-lineal features include features selected from the group consisting of: point-like features, categorically-perceived properties, continuously-changing properties, and combinations thereof; said second processing said non-lineal features into the form of separate effects includes processing selected from the group consisting of: processing said point-like features into the form of short distinct effects, processing said categorically-perceived properties into the form of categorically-perceived effects, and processing said continuously-changing properties into the form of continuously-changing effects; whereby said separate effects have qualities that relate to the qualities of said non-lineal features from which said separate effects are processed.
20. Apparatus according to claim 19 wherein said categorically-perceived properties include categorically-perceived properties selected from the group consisting of: categories of visual properties, spatial arrangements of said categories of visual properties, classified entities, and combinations thereof.
21. Apparatus as claimed in any claim 15 to 20, further comprising: dividing means for dividing said visual representations, or portions of said visual representations, into at least one matrix of elements, said elements having borders between adjacent matrix elements; third acquiring means for acquiring at least one border crossing point within said paths of said lineal features, said border crossing points having locations substantially at the intersections of said paths of said lineal features and said borders between adjacent matrix elements; third processing means for processing said border crossing points into the form of at least one border indicator effect, said border indicator effects optionally conveying the direction of crossing; third outputting means for outputting at least one of said border indicator effects to at least one of said display means; wherein said border indicator effects are output substantially when said apparently-moving effects reach locations that relate to said border crossing points; whereby the shape and slope of said lineal features, and locations within said paths of said lineal features, can be more accurately perceived.
22. Apparatus according to any claim 15 to 21, wherein said lineal features include lineal features selected from the group consisting of: perimeters of entities, medial-lines of entities, lines of symmetry of entities, paths that symbolise the classifications of entities, a plurality of lines each related to sections of entities, and combinations thereof.
23. Apparatus as claimed in any claim 15 to 22, wherein a plurality of said apparently-moving effects are output simultaneously.
24. Apparatus as claimed in any claim 15 to 23 wherein said visual representations include visual representations selected from the group consisting of: live images, recorded still or moving images, created still or moving images, filtered still or moving images, still or moving images prepared by a person, maps, abstract shapes, descriptions of shapes and point-like features and other visual properties, visual representations produced by computer vision processing, data that can be presented visually, graphs and charts, waveform shapes, parts of computer desktops, visual representations provided by external systems, and combinations thereof.
25. Apparatus as claimed in Claim 24 wherein said visual representations produced by computer vision processing include visual representations selected from the group consisting of: areas of common properties, areas of movement, paths followed by moving entities, identified objects, and combinations thereof.
26. Apparatus as claimed in Claim 24 wherein said graphs and charts are processed into the form of apparently-moving effects and short distinct effects that relate to the lines and points respectively of said graphs and charts.
27. Apparatus as claimed in any claim 15 to 23, wherein said lineal features and said non-lineal features are created using a drawing facility.
28. Apparatus as claimed in any claim 15 to 27 which is accessible to blind people.
29. Apparatus according to any claim 15 to 28 which can be arranged on a substantially horizontal surface, whereby said apparatus can be used on a desktop or similar surface.
30. Apparatus according to any claim 15 to 28 which is portable.
GB201101732A 2010-02-01 2011-02-01 Improved audiotactile vision system Expired - Fee Related GB2477431B (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
GBGB1001610.3A GB201001610D0 (en) 2010-02-01 2010-02-01 Improved audiotactile vision substitution system
GBGB1005674.5A GB201005674D0 (en) 2010-04-06 2010-04-06 Improved audiotactile vision substitution system
GBGB1007778.2A GB201007778D0 (en) 2010-05-11 2010-05-11 Improved audiotactile vision substitution system
GBGB1018487.7A GB201018487D0 (en) 2010-11-03 2010-11-03 Improved vision information system

Publications (3)

Publication Number Publication Date
GB201101732D0 GB201101732D0 (en) 2011-03-16
GB2477431A true GB2477431A (en) 2011-08-03
GB2477431B GB2477431B (en) 2014-02-12

Family

ID=43824945

Family Applications (1)

Application Number Title Priority Date Filing Date
GB201101732A Expired - Fee Related GB2477431B (en) 2010-02-01 2011-02-01 Improved audiotactile vision system

Country Status (1)

Country Link
GB (1) GB2477431B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2751775A4 (en) * 2011-08-30 2015-06-24 Univ Monash System and method for processing sensor data for the visually impaired
EP3102997A4 (en) * 2014-02-04 2017-08-02 Parcels In Sport AS Device, system and method for improved visually impaired spectator experience
CN111329736A (en) * 2020-02-25 2020-06-26 何兴 System for sensing environmental image by means of vibration feedback

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1999058087A2 (en) * 1998-05-12 1999-11-18 University Of Manchester Institute Of Science And Technology Visualising images
GB2441434A (en) * 2006-08-29 2008-03-05 David Charles Dewhurst AUDIOTACTILE VISION SUBSTITUTION SYSTEM e.g. FOR THE BLIND

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1999058087A2 (en) * 1998-05-12 1999-11-18 University Of Manchester Institute Of Science And Technology Visualising images
GB2441434A (en) * 2006-08-29 2008-03-05 David Charles Dewhurst AUDIOTACTILE VISION SUBSTITUTION SYSTEM e.g. FOR THE BLIND

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2751775A4 (en) * 2011-08-30 2015-06-24 Univ Monash System and method for processing sensor data for the visually impaired
AU2012304260B2 (en) * 2011-08-30 2017-06-08 Monash University System and method for processing sensor data for the visually impaired
EP3102997A4 (en) * 2014-02-04 2017-08-02 Parcels In Sport AS Device, system and method for improved visually impaired spectator experience
CN111329736A (en) * 2020-02-25 2020-06-26 何兴 System for sensing environmental image by means of vibration feedback
WO2021169050A1 (en) * 2020-02-25 2021-09-02 何兴 System for perceiving environmental image by means of vibration feedback

Also Published As

Publication number Publication date
GB201101732D0 (en) 2011-03-16
GB2477431B (en) 2014-02-12

Similar Documents

Publication Publication Date Title
US9430954B1 (en) System for presenting visual items
US8239032B2 (en) Audiotactile vision substitution system
US8358320B2 (en) Interactive transcription system and method
JP5616325B2 (en) How to change the display based on user instructions
Kamel et al. Sketching images eyes-free: a grid-based dynamic drawing tool for the blind
JP2014029712A (en) Touch tool system and program for the same
KR101264874B1 (en) Learning apparatus and learning method using augmented reality
KR101483054B1 (en) Mobile -based augmented reality authoring system and method for interaction
JPH03505137A (en) A process for transmitting information through a series of actions
Maynes-Aminzade et al. Eyepatch: prototyping camera-based interaction through examples
GB2477431A (en) Audiotactile vision system
Harada et al. On the audio representation of radial direction
JP5920858B1 (en) Program, information processing apparatus, depth definition method, and recording medium
US20030016222A1 (en) Process for utilizing a pressure and motion sensitive pad to create computer generated animation
US10152133B2 (en) Apparatus and method for providing image
KR20140078083A (en) Method of manufacturing cartoon contents for augemented reality and apparatus performing the same
Dewhurst Creating and accessing audio-tactile images with “HFVE” vision substitution software
Dewhurst Accessing audiotactile images with HFVE silooet
US10565898B2 (en) System for presenting items
US20230334790A1 (en) Interactive reality computing experience using optical lenticular multi-perspective simulation
US20230334792A1 (en) Interactive reality computing experience using optical lenticular multi-perspective simulation
US20230334791A1 (en) Interactive reality computing experience using multi-layer projections to create an illusion of depth
CN114185431B (en) Intelligent media interaction method based on MR technology
US20230326095A1 (en) Overlaying displayed digital content with regional transparency and regional lossless compression transmitted over a communication network via processing circuitry
Michael Animating with Flash 8: creative animation techniques

Legal Events

Date Code Title Description
PCNP Patent ceased through non-payment of renewal fee

Effective date: 20170201