WebbIE, a web browser for visually impaired people

A. King, G. Evans and P. Blenkhorn. Poster presented at the 2nd Cambridge Workshop on UNIVERSAL ACCESS and ASSISTIVE TECHNOLOGY (CWUAAT). 22-24 March 2004. Cambridge, UK.

1 Introduction

Webpages are documents written in Hypertext Mark-up Language (HTML), defined by the international web standards body, the World-Wide-Web Consortium (W3C) (HTML 2003). HTML combines text information with meaningful semantic mark-up of the text, for example "this is a header" or "this is a list of information". An HTML client has responsibility for rendering the content to the user in accordance with the mark-up. For example, text indicated as header text in HTML might be rendered in a large, bold font in a visual client, spoken at a louder volume in an audio client or explicitly labelled with "Headline" in a text-only client. Having written a document in HTML, the author can provide formatting information for a particular medium using the related Cascading Style Sheets (CSS) (CSS 2003). These contain information on how the author wishes the information to be presented that may be used by the client, so a visual stylesheet might instruct the client to use a particular font or colour combination, and an audio stylesheet to use a particular voice or volume. This approach separates content (HTML) from presentation (CSS). The end user should thus be able to view the document as intended by the author (using their CSS) or to access the content in a manner more desirable or practical for the user (using the HTML).

In practice, however, the web is primarily a visual medium, and the principle of separating content and presentation has not been honoured. Presentation information has been included in HTML mark-up. This has repercussions for access by users of non-visual clients. For example, HTML tables, designed to contain information best presented in a tabular format (for example, arrays of numbers), are often used as layout containers to position content on the screen for the visual user. This can break up the normal flow of a document when viewed in a linear manner, as with a screen reader, and frustrate resizing and reformatting of content for the user. For example, enlarging text trapped in a fixed-size area makes some of the text impossible to read. Dedicated mechanisms for navigating real tables are also frustrated.

The example above illustrates a more general problem of the reliance on visually-meaningful formatting rather than correct HTML markup to communicate semantic content. For example, instead of using the HTML headline markup to indicate the page's main headline, the author makes the headline text bold, centred, and a larger font size. This is a perfectly recognisable convention for sighted users, but is not useful for non-sighted users, who are forced to choose between losing this vital piece of semantic information or trying to identify titles by guesswork. The corollary of this problem is using semantic markup for purely visual effect, for example using HTML headline markup simply to create the visual effects normally associated with headlines for text that is not headline text. This lack of semantic information can cause severe problems for blind webpage users. Sighted users can, on first seeing a webpage, quickly identify the salient features - headline, navigation bars, main text content, advertising - and therefore the meaningful content of a webpage and how to access it. Blind users can be forced to move laboriously through the text of a webpage, perhaps starting with a navigation bar with fifteen hypertext links, then some advertising copy, until they encounter the content of the page which may or may not be useful to them. This is a slow and frustrating process.

A second obstacle to the use of webpages by blind people is the embedding of non-text content into HTML documents. The most obvious example of this is the use of images, used not only to present pictures and diagram graphics, but as a way to provide absolute layout of other components; headings and text with the size and font desired by the author; and decorative features such as borders and bullet points. Bitmap graphics contain no useful information for visually-impaired people: at best, the meaning of the image can be inferred from the filename (e.g. border.gif or cat.jpg). HTML does provide mechanisms for annotating this embedded content: the specification states that an author can set the ALT attribute of an embedded content element to describe the image as text or indicate that the content is of no value to a non-sighted user (e.g. an image used as a background spacing element). However, use of this attribute is not mandated by the HTML specification, and even if ALT text is provided, there is no guarantee that its information content is equivalent to the embedded content. There are other types of embedded content, for example Java applets or Macromedia Flash animations, that are not rendered natively by the client browser but rely upon the action of a supporting application, termed a plug-in, provided by the developer of the content format. The accessibility of such content depends upon the supporting application and the nature of the content. For example, Java applets prior to Java version 1.2 displayed content using native operating system controls, such as buttons or text fields, that are normally accessible to a screen reader. As a result these applets are often usable. Java applets from Java 1.2 onward use lightweight controls implemented purely within Java (Swing components) and these are supported only by screen readers such as JAWS which have been written to take advantage of the new Java Accessibility API (Sun 2003). In either case, if the Java applet is used to display non-accessible content, such as animated images, the accessibility of the content interface is irrelevant.
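The three ALT situations described above (no ALT attribute, an empty ALT marking an image as decorative, and descriptive ALT text) can be told apart mechanically. The following is a minimal sketch using Python's standard html.parser module; the class name AltAudit and the function audit are hypothetical names chosen for illustration, and the check says nothing about whether the ALT text is actually equivalent to the image.

```python
from html.parser import HTMLParser


class AltAudit(HTMLParser):
    """Sort <img> elements by how their ALT attribute is set."""

    def __init__(self):
        super().__init__()
        self.missing = []      # src of images with no ALT attribute at all
        self.empty = []        # src of images with ALT="" (marked as decorative)
        self.described = []    # (src, alt text) pairs

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)   # the parser lower-cases attribute names
        src = attributes.get("src", "")
        if "alt" not in attributes:
            self.missing.append(src)
        elif not (attributes["alt"] or "").strip():
            self.empty.append(src)
        else:
            self.described.append((src, attributes["alt"]))


def audit(html):
    """Parse an HTML string and return the populated AltAudit object."""
    parser = AltAudit()
    parser.feed(html)
    parser.close()
    return parser


sample = (
    '<img src="border.gif" alt="">'
    '<img src="cat.jpg" alt="A tabby cat asleep on a chair">'
    '<img src="chart.png">'
)
result = audit(sample)
print("No ALT:", result.missing)
print("Empty ALT:", result.empty)
print("Described:", result.described)
```

For an image in the first group, a non-visual client has only the filename to fall back on, which is the guesswork described in the text.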

A third problem is the widespread adoption of what is termed Dynamic HTML (DHTML), the inclusion of event-driven code, typically JavaScript, into HTML documents to provide functionality to HTML's hypertext document system (Netscape 2003). JavaScript code is interpreted by a compliant client when the hosting HTML page is loaded, and allows the web author to create websites with features familiar to users of Graphical User Interfaces (GUIs), such as drop-down menus, mouse events, and the dynamic display of content within an HTML document. This poses two problems: first, it hugely increases the complexity of webpages, which now require the user to comprehend a non-standard user environment to access the content they desire; second, content may be created by the web author that is inaccessible to users who cannot use a mouse-driven visually-orientated interface. For example, navigation of a website may be accomplished by drop-down menus implemented in JavaScript that cannot be activated without positioning the mouse pointer over the menu, or JavaScript may be used to test that the user's client conforms to a standard set by the web author and deny access to the website to users of non-standard accessible browsers.

A final accessibility problem originates with web browsing clients themselves. Designed to display content to sighted users, they typically paint content onto a canvas intended purely for viewing and lacking features like a caret, the ability to focus on text content, or access to ALT text content. Screen reader users of Microsoft Internet Explorer, for example, have no simple way to access the text content of any page: the application only allows focus to lie upon form elements and links, so the user cannot focus the screen reader upon the text content and have it read out to them. Inaccessible clients can be overcome by screen readers designed to address this problem directly, and the predominance of Internet Explorer has encouraged this, but the result can be a very complex user interface.

Most of the problems detailed so far can be ameliorated to a great degree by the efforts of web authors to produce web pages that are accessible to non-sighted users. The means of doing this are codified in a number of standards, notably the Web Content Accessibility Guidelines from the W3C (WCAG 2003), which provide a checklist of recommendations for use by web authors such as "Don't rely on color alone". While disability legislation and lobbying by pressure groups and individuals have made accessibility a key factor in web design, it does not necessarily follow that compliance with these standards results in an accessible website. If accessibility is considered to be a matter of ticking the appropriate boxes rather than addressing the likely needs of visually-impaired website users, then accessible webpages are unlikely to result.

2 Existing solutions for blind people

Solutions to the problem of web accessibility fall into one of four categories: reliance on a conventional web browser and a screen reader; utilising the accessibility features of HTML and existing web clients; using transcoding proxy servers to convert webpage HTML into a more accessible format; and using a dedicated web browser.

2.1 Conventional web browser and screen reader

The web browser market is dominated by Microsoft's Internet Explorer (MSIE), which holds a 95% share (CNET 2003). It is therefore the de facto standard for web clients, and web authors frequently write HTML and DHTML code targeted at MSIE. Using MSIE and a screen reader or magnifier guarantees that the maximum number of websites will work for the user, in the sense that the functionality intended by the author will be available to the sighted user, and that the user interface will be common to sighted people - such as those providing technical support - with the obvious exception of the use of the assistive technology. The problems with this approach are the inaccessibility of content displayed by the browser and the complexity of the user interface already described. However, progress has been made by screen reader developers, notably Freedom Scientific's JAWS (JAWS 2003), in supporting MSIE and by extension the vast majority of web users. The resulting control interface can be, as noted, very complex.

2.2 Utilising HTML accessibility

The second approach takes advantage of the principle of HTML, separating content and presentation, and the native abilities of clients to present content in a way desirable to the user. Web clients permit the user to define their own presentation preferences, for example using a particular mix of colours (yellow on black is preferred by many visually-impaired people), fonts (Tiresias (Tiresias 2003) is designed to be very legible) and font sizes. Clients can also choose to ignore presentation dictates from web pages, stripping out decorative and confusing background images or preventing text from blinking (harmful to users with epilepsy (WCAG 2003)). These are all helpful approaches for visually-impaired people. The Mozilla browser allows the user to turn on a caret, overcoming the normal web browser canvas problem by providing a means to indicate to a screen reader the current content of interest. The problems with these approaches are that they fail to address a range of problems related to overly-complex interfaces (tables and page layout are generally still preserved, so the user must still search over the page for content of interest) and the needs of users without any degree of functional vision. The other practical problem is that users are required to specify their user preferences within the client, which is not common user behaviour and may not be possible in the user's environment, for example where the user is on a different computer or where user preferences are locked by their network policy.

2.3 Using a transcoding proxy server

The third approach places the solution between the author and the client by running requested HTML pages through a transcoding proxy server. Requests for webpages from servers are made not to the servers themselves but to an intermediate server, a proxy, which fetches the page, converts it according to a set of rules, and returns the converted page to the requesting client. This process is employed for users of limited browsing devices, such as mobile telephones, which cannot handle fully-featured webpages and rely on proxy gateways to reduce the standard webpages into a limited format supported by the telephone (Kennel 1996, Brown 2001). Visually-impaired users can use the same approach: the proxy can be configured to alter the HTML document to provide the font, font size, colour and other settings desired by the user in much the same way as the use of the accessibility features of a client. The advantage is that these can be set remotely, so the client itself need not be amended by the user. The disadvantages relate to the second-hand nature of the HTML document transmission. Page features, such as client redirects, may not be supported by the proxy, and many websites assume the use of a client directly and provide functionality based on this assumption (for example, the use of cookies to track users and provide password-authenticated services to them). Finally, the processing performed by the proxy server requires the server to have full access to the content of the HTML document, which means that secure transmission protocols used in Internet commerce such as HTTPS are unusable.
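The conversion step performed by such a proxy can be illustrated with a minimal sketch in Python. It shows only one possible rule, injecting a user stylesheet into the fetched page, and leaves out the fetching and forwarding of requests. The function names transcode and build_style and the preference values are assumptions made for illustration; they are not taken from any of the cited systems.

```python
import re


def build_style(prefs):
    """Turn user preferences into one CSS rule applied to every element."""
    declarations = "; ".join(
        f"{prop}: {value} !important" for prop, value in prefs.items()
    )
    return f"<style>* {{ {declarations} }}</style>"


def transcode(html, prefs):
    """Return the page with the user's stylesheet injected.

    The rule is placed just before </head> so that it follows the author's
    own styles; if the page has no head section it is prepended instead.
    """
    style = build_style(prefs)
    head_close = re.compile(r"</head\s*>", re.IGNORECASE)
    if head_close.search(html):
        # A function is used as the replacement so that backslashes in the
        # stylesheet are never interpreted as regular-expression escapes.
        return head_close.sub(lambda m: style + m.group(0), html, count=1)
    return style + html


# Hypothetical preferences: yellow text on black at an enlarged size.
prefs = {"color": "yellow", "background-color": "black", "font-size": "150%"}
page = "<html><head><title>News</title></head><body><p>Hello</p></body></html>"
print(transcode(page, prefs))
```

Because the rule is applied by the proxy, the same preferences follow the user to any machine, which is the advantage noted above. The sketch also shows why encrypted pages are a problem: the proxy must be able to read and rewrite the document text.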

2.4 Using a dedicated web browser

The final approach is to use a dedicated web browser designed for visually-impaired or blind people. There are two tactics employed: the first, exemplified by the Home Page Reader from IBM, is a self-voicing application that provides a complete audio interface to web pages. The second is to render the content of a webpage as a text-only flat document and permit the user to access this accessible content using their normal assistive technology, typically a screen reader. This tactic is demonstrated by Webwizard from Baum and WebFormator from Frank Audiodata. Developing a dedicated web browser affords the maximum flexibility in approach, but requires the developer to take more responsibility for the presentation of web content. Although in theory a non-visual web browser is just as standard as a visual one presenting marked-up HTML, in practice the visual bias of the web means that alternative applications have to focus on providing access to resources designed for the sighted. The greater flexibility in approach has led to a number of different products which are worthy of examination.

IBM's Home Page Reader (HPR) (IBM 2003) is a standalone product that breaks down a web page into a linear array of items which can be moved through by the user and are voiced as they are encountered. The user can select the granularity of the array, from letters upwards. Links are presented in a different voice (female rather than male) to distinguish them: the ability to present information like this is an advantage of developing a self-voicing application. The default setting presents the page as an array of structural mark-up elements: list items, headers, and paragraphs. This is a good level of resolution for well-constructed web pages, since it allows the user to immediately access the document via a reasonable number of segments which reflect the semantic meaning known to the document author. Less well-designed web pages where mark-up is used for visual presentation are presented less successfully, since there is less scope for inferring the semantic meaning of particular items of content from the mark-up.

BrookesTalk (Zajicek 1998) is another self-voicing web browser that employs a similar approach to HPR. In addition, it attempts to address the problem of communicating to the end user the semantic content of a page by providing summaries and keywords obtained by analysing the structure of the web page. Zajicek reports that blind users did not find the summary information of use because it was regarded as inaccurate: certainly, interpreting a page for its important semantic meaning is a very difficult computing problem.

Asakawa et al's talking web browser (Asakawa 2002) focused on the problem of communicating semantic information about content to the user. It utilised a number of different auditory and tactile interfaces to communicate structural information derived from analysis of the HTML of a web page. Assuming that visual users used grouping of similar elements as a vital part of understanding web page structure (e.g. "those link buttons make up a navigation bar"), Asakawa's system attempted to group HTML elements by colour, area and border, identify items of emphasis, and communicate the resulting groups using background music and tactile output. Individual components of the page, such as text or buttons, were communicated with auditory icons and earcons. Emphasis was communicated through bell-like sounds. Results indicated that the indication of emphasis was well received: it may be that this is because it successfully communicated important semantic information about the page.

Webwizard from Baum (Webwizard 2003) and WebFormator from Frank Audiodata (WebFormator 2003) use the second tactic, running simultaneously with MSIE and re-presenting the contents in a text field that can be accessed by a screen reader. This text can be navigated with a caret as a normal text field, and like the other two applications users can bring up lists of links, frames and other features that can be of use in understanding the content of the web page. WebFormator and Webwizard also provide different navigation modes for exploring HTML tables, navigating from cell to cell within the table: while tables are typically used for layout, rather than structuring data, if a real data table is encountered this may be of use.

WebbIE, developed at UMIST, uses the same tactic as WebFormator and Webwizard, presenting the web page content as accessible text rather than self-voicing an entirely novel interface. It goes a step further in creating a freestanding independent application providing web access, and is described fully in the next section.

3 WebbIE

WebbIE was developed to fulfil our design philosophy of allowing users to access standard applications, in this case Windows Internet Explorer, through an interface that simplifies and re-presents the content without losing information or being too complicated for non-expert users. It is not self-voicing, but rather provides support for partially-sighted people and allows screen reader users to continue to use their familiar environment.

Internally WebbIE uses the MSIE control object (WebBrowser), and this handles the acquisition of webpages and parsing the HTML into the W3C standard Document Object Model (DOM) (DOM 2003) (Figure 1). Using MSIE guarantees maximum compatibility with websites, although another control that handles fetching webpages and parsing them into the DOM could be used with a minimum of alteration (the Mozilla control has been tested and works successfully). The DOM provides a rich API for manipulating and querying the webpage.

Diagram. The WebBrowser control object accesses the World-Wide-Web for WebbIE. It returns an HTML document converted into the HTML DOM. This interacts with the WebbIE user interface. When the user wants to go somewhere else, a navigation request is fed back to the WebBrowser control object.
Figure 1: The WebbIE architecture

WebbIE navigates the DOM, collecting active content components such as hypertext links and form components, and building up a plain-text representation of the content. This plain text is presented to the user. Components are presented on new lines with distinguishing titles, like LINK for a hypertext link. Functionality is accessed through pressing the return key on a line with a presented component. Figure 2 shows WebbIE in action.  WebbIE supports existing MSIE bookmarks, frames, the great majority of HTML 4, forms, tables, and display of embedded multimedia.

WebbIE looks much like Internet Explorer, with a row of navigation buttons above a panel showing the web page. However, the web page is pure linear text, like a Word document, not laid out all over the page.
Figure 2: WebbIE in action
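The linearisation described above can be illustrated with a minimal sketch. WebbIE itself walks the DOM supplied by the MSIE control; the sketch below instead uses Python's standard html.parser module as a stand-in, handles only text and hypertext links, and does not filter out script or style content. The class name Lineariser, the function linearise and the exact "LINK: " label format are assumptions made for illustration.

```python
from html.parser import HTMLParser

# Elements that start a new line in the text output (an illustrative subset).
BLOCK_TAGS = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}


class Lineariser(HTMLParser):
    """Flatten HTML into lines of text, with each link on a line of its own."""

    def __init__(self):
        super().__init__()
        self.lines = []        # finished output lines
        self.buffer = []       # text fragments of the line being built
        self.in_link = False

    def flush(self, prefix=""):
        """Finish the current line, collapsing runs of whitespace."""
        text = " ".join(" ".join(self.buffer).split())
        self.buffer = []
        if text:
            self.lines.append(prefix + text)

    def handle_starttag(self, tag, attrs):
        if tag == "a" and dict(attrs).get("href"):
            self.flush()               # end the text before the link
            self.in_link = True
        elif tag in BLOCK_TAGS:
            self.flush()

    def handle_endtag(self, tag):
        if tag == "a" and self.in_link:
            self.flush("LINK: ")       # label the component on its own line
            self.in_link = False
        elif tag in BLOCK_TAGS:
            self.flush()

    def handle_data(self, data):
        self.buffer.append(data)


def linearise(html):
    """Return the page as a list of plain-text lines."""
    parser = Lineariser()
    parser.feed(html)
    parser.close()
    parser.flush()                     # emit any trailing text
    return parser.lines


page = '<h1>News</h1><p>Read the <a href="story.htm">full story</a> today.</p>'
for line in linearise(page):
    print(line)
```

Each output line is either ordinary text or a labelled component, so a caret-based text field and a screen reader are enough to read the page and to find the lines on which pressing return would activate something.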

As a dedicated web browser, WebbIE attempts to address the accessibility issues associated with web pages already described:

Complex web pages - WebbIE presents the whole web page in a linear text form, so it can be explored as a standard familiar text document, which is much simpler than puzzling out the potentially complex interface of a web page. The disadvantage of this is that any information inherent in the spatial layout of the web page is unavailable to the user, such as separation of content into body and navigation parts.

Summarising web page content - WebbIE highlights marked-up headlines and enables the user to access them directly. It allows the user to skip links to non-link text (this works especially well when skipping the navigation bars commonly found at the top of pages). It also makes an attempt to identify the section of the page containing the main content text, and the section of the page containing navigation links. It can either work directly on the processed content, checking for successive lines with text or links, or use a more sophisticated approach by scoring the component parts of a webpage - frames, table cells, and HTML division elements - for text content and link content and identifying the two winning sections to the user.

Images - WebbIE presents the ALT text or ignores the image if this is not available, unless it is also a hypertext link, in which case it gives the destination as the most meaningful possible information. This is sometimes very useful (home.htm) and sometimes not (cgi-bin/serve.pl?p3).

JavaScript - WebbIE allows access to the most common JavaScript triggers through the DOM and MSIE control object. If a page relies heavily on JavaScript and mouse-related events then WebbIE has problems supporting the functionality. The user can switch to a view of the Internet Explorer browser displaying the page, but this may not itself be accessible. JavaScript is a problem that can be insurmountable for an accessible client.

Flash/Java/multimedia embedded content - WebbIE can present this content separately in a pop-up window that can be accessed by the user's screen reader, so if the content is accessible the user should be able to access it.

Forms - WebbIE allows forms to be handled in the page using simple text components. For example, input boxes are presented as "INPUT BOX: (content)" on a line. If the user presses the return key, WebbIE pops up an input box to receive the user's input text, and then updates the page with the text input for review. The same simple approach is taken with select buttons and other form elements (see Figure 2).

Frames - WebbIE runs frames together to present them as a single, linear text page, so the user does not have to navigate different panes of content. It does allow users to navigate within the page as though the different areas were still operating as frames, so frame navigation is supported, but it is assumed to be simpler for users to have the same consistent interface for frame and non-frame pages.
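The two heuristics described under "Summarising web page content" above, skipping over links to reach non-link text and scoring sections of the page for text and link content, can be sketched as follows. This is an illustration under stated assumptions rather than WebbIE's actual implementation: the scoring measures (number of words of non-link text, number of links), the Section type, and the function names skip_links and pick_sections are all hypothetical, and link lines are assumed to begin with the label LINK.

```python
from dataclasses import dataclass


@dataclass
class Section:
    """One candidate part of a page: a frame, table cell or division element."""
    name: str
    plain_text: str      # text that is not inside a link
    link_count: int      # number of hypertext links in the section


def skip_links(lines, start):
    """Index of the first line at or after `start` that is not a link line.

    Returns None when only link lines remain.
    """
    for index in range(start, len(lines)):
        if not lines[index].startswith("LINK"):
            return index
    return None


def pick_sections(sections):
    """Return (main_content, navigation): the winning section for each role.

    Assumed measures: the main content is the section with the most words of
    non-link text, and the navigation section is the one with the most links.
    """
    main_content = max(sections, key=lambda s: len(s.plain_text.split()))
    navigation = max(sections, key=lambda s: s.link_count)
    return main_content, navigation


lines = ["LINK: Home", "LINK: News", "LINK: Sport", "Welcome to the site."]
print(skip_links(lines, 0))

page = [
    Section("top cell", "Home News Sport", 15),
    Section("middle cell", "A long article about accessible web browsing.", 2),
]
main_content, navigation = pick_sections(page)
print(main_content.name, "/", navigation.name)
```

The first function works on the already linearised text, which corresponds to checking successive lines for text or links; the second works on the structural parts of the page before linearisation.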

4 Evaluation

WebbIE was evaluated with nine users by means of a questionnaire. The users ranged in experience and levels of visual impairment, and the sample size was small, so the data acquired is anecdotal but has the benefit of being from actual users. The users were all associated with a company that distributes WebbIE and performs training, so the results reflect some common background of training and preference.

The users were all screen reader users. The six users that had used the web before used MSIE in conjunction with their screen reader, although their level of success varied: after using WebbIE, three intended to use it.

The users cited a variety of favourite sites and most users browsed for new pages of interest. This suggests that able visually-impaired people do successfully overcome browsing problems to an extent that allows them to gain advantage from the exploration of unknown sites, although all expressed some confusion over or ignorance of non-HTML embedded content, confirming that HTML is the most accessible format for web content.

All the users that expressed a preference preferred Google as a search engine, suggesting that a tailored Google interface within WebbIE might be a good next development: WebbIE already allows users to query Google from WebbIE directly, but doesn't perform any special processing on the result, for example to prioritise the search results over the page navigation content. Other popular sites included the BBC Radio sites to obtain radio program recordings and banking and grocery shopping sites. These commercial sites permit visually-impaired people access to services that usually require either customised information (e.g. bank statements in Braille) or intervention by a sighted person (e.g. to shop in a supermarket). Using a web site puts blind people on a more equal footing and allows providers to make their services more accessible at relatively little expense.

Aside from specific issues with the WebbIE interface, general complaints were made about the many links often encountered at the top of a web page before the content of interest. These links are typically navigation bars, very useful for sighted people but a distraction for visually-impaired people. As a consequence, the WebbIE function that skips links and moves the cursor to content (mentioned above) proved popular.

The main benefits of WebbIE were perceived to be the ability to cut and paste text from the simple text interface, allowing users to prepare content in other formats, and the handling of forms through a simple text interface. Users did not report any general problems with accessibility to websites, but as one user reported, "if an inaccessible site is encountered there is lots of choice so I leave them alone". This may explain why the sites that were singled out as being inaccessible were service providers where the user has a strong reason to wish to gain access to that service and not another generic one, for example financial or supermarket sites.

5 Status

WebbIE is available for download from www.screenreader.co.uk. It is freely available for use and distribution. For more information, contact webbie@co.umist.ac.uk.

6 References

Chieko Asakawa, Hironobu Takagi, Shuichi Ino and Tohru Ifukube (2002) "Auditory and Tactile Interfaces for Representing the Visual Effects on the Web", ACM ASSETS 2002, Edinburgh, Scotland, UK, 8-10 July 2002.

Silas S Brown and Peter Robinson (2001) "A World Wide Web Mediator for Users with Low Vision", presented at CHI2001 Workshop 14, Seattle, Washington, US, 31 March - 5 April 2001.

CSS (2003) Cascading Style Sheets, http://www.w3.org/Style/CSS/, accessed August 2003.

CNET (2003) Internet Explorer 95.3, Mozilla 0.4, http://news.com.com/2100-1023-938784.html, accessed August 2003.

DOM (2003) W3C Document Object Model, http://www.w3.org/DOM/, accessed August 2003.

IBM (2003) IBM Accessibility Center: IBM Home Page Reader 3.0, http://www-3.ibm.com/able/solution_offerings/hpr.html, accessed August 2003.

JAWS (2003) JAWS for Windows, http://www.freedomscientific.com/fs_products/software_jaws.asp, accessed August 2003.

Kennel, A., Perrochon, L. and Darvishi, A (1996) Wab: World-wide-web access for blind and visually-impaired computer users, ACM SIGCAPH Bulletin.

Netscape (2003) JavaScript Central, http://devedge.netscape.com/central/javascript/, accessed August 2003.

Sun (2003) Accessibility,  http://java.sun.com/j2se/1.3/docs/guide/access/

Tiresias (2003) Tiresias Fonts Website, http://www.tiresias.org/fonts/index.htm, accessed September 2003.

WebFormator (2003) Official WebFormator Site, http://www.webformator.com/, accessed August 2003.

Webwizard (2003) http://www.baum.de/webwizard.htm, accessed September 2003.

HTML (2003) World-Wide-Web Consortium HTML Standard, http://www.w3c.org/html/, accessed August 2003.

WCAG (2003) Web Content Accessibility Guidelines 1.0, http://www.w3.org/TR/WCAG10/, accessed August 2003.

Zajicek, Mary, Powell, Chris and Reeves, Chris (1998) A Web Navigation Tool for the Blind, ASSETS 1998, New York.

Alasdair King, 10 August 2004. Last updated 13 August 2004.