CHAPTER 2
Communication Systems for Non-Speaking and Hearing-Impaired People
The development of a voice-operated typewriter for non-speaking, physically disabled people described in Chapter 1 led to the development of the Talking Brooch. This was one of the first truly portable communication aids for non-speaking people. Demonstrating this to a chance visitor to the Department introduced the challenge of providing a communication aid for a profoundly deaf Member of Parliament and led to the development of a system based on the automatic transcription of machine shorthand.
2.1 A VOICE-OPERATED TYPEWRITER FOR PHYSICALLY DISABLED PEOPLE
In my readings related to speech recognition research, I had come across a paper that was trying to automatically recognise hand-sent Morse Code. This had not been particularly successful as the timing of hand-sent Morse Code is not accurate. Fast operators can send Morse that can be understood by a human being, but differences in the lengths of the dots and dashes and the spaces within and between characters defeated automatic recognition methods.
It seemed to me, however, that, if speed was not the overarching objective, an operator could be trained to send Morse Code which could be automatically decoded. The system would also provide excellent feedback from errors: if an “i” (dot-dot) was recognised as an “m” (dash-dash), the operator would know that s/he had to reduce the length of the dots. Thus, spoken Morse code was a possible way in which people who were paralysed from the neck down could type. I simulated VOTEM (Voice Operated Typewriter Employing Morse-code) on the PDP 8 to prove that this was possible and subsequently designed and built an electronic version [Newell and Nabavi, 1969, Newell, A., 1970].
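The decoding principle is simple enough to sketch: classify each spoken tone burst as a dot or a dash against a single duration threshold, then look the resulting pattern up in a table. The following is an illustrative reconstruction only; the threshold value, table, and function names are assumptions, not details of the original VOTEM hardware.

```python
# Illustrative sketch of timing-threshold Morse decoding.
MORSE_TABLE = {
    ".-": "a", "-...": "b", "-.-.": "c", "-..": "d", ".": "e",
    "..-.": "f", "--.": "g", "....": "h", "..": "i", ".---": "j",
    "-.-": "k", ".-..": "l", "--": "m", "-.": "n", "---": "o",
    ".--.": "p", "--.-": "q", ".-.": "r", "...": "s", "-": "t",
    "..-": "u", "...-": "v", ".--": "w", "-..-": "x", "-.--": "y",
    "--..": "z",
}

DOT_DASH_THRESHOLD_MS = 300  # bursts shorter than this count as dots (assumed value)

def decode_bursts(burst_lengths_ms):
    """Decode one character from the durations (ms) of its tone bursts."""
    pattern = "".join(
        "." if length < DOT_DASH_THRESHOLD_MS else "-"
        for length in burst_lengths_ms
    )
    return MORSE_TABLE.get(pattern, "?")  # "?" flags an unrecognised pattern
```

Note how the error feedback described above falls out of the scheme: if an operator's “dots” drift past the threshold, two of them decode as “m” rather than “i”, telling the operator to shorten them.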
Clearly, disabled people would have preferred to talk to a typewriter—as some 30-odd years later they would be able to do—but it was not possible at the time. VOTEM was an example of reducing the requirements to match what was possible. Clearly, spoken Morse code was not a viable input method for someone who could use a typewriter keyboard, but was a candidate for someone who could not. This situation is still the case: speech recognition is only really viable in situations where it is not possible or inconvenient to use a keyboard.
VOTEM—the Voice Operated Typewriter Employing Morse Code—sparked my interest in developing technology to assist people with disabilities. It also introduced me to technologically assisted human-human communication. The knowledge and background I had gained in my research into Automatic Speech Recognition showed me that speech communication is very much more than the words which are spoken. Human communication is the very basis of our humanity and is a very complex and subtle process. Our communication with other human beings is not just a set of messages that we relay to other people: it is, in a very real way, our personality. If we are to develop artificial means to replace speech, we must be as concerned about the form of the communication as about its efficiency as a message carrier.
I thus embarked on background reading in the area of what was to become known as Augmentative and Alternative Communication (AAC)—technology to support people with impaired speech and language. I also became familiar with a range of (relatively unrelated) research topics that would prove to be very useful in my future research.
At this time, Possum Controls was one of the leading developers of systems for severely paralysed people; these were based essentially on scanning a matrix by sucking and blowing down a tube. Figure 2.1 shows an early version of such a system. This technology had been developed by Reg Mailing, who was a “visitor” at Stoke Mandeville Hospital. He observed patients using a whistle to communicate with people. At that time even simple electronics was too expensive for this application, but he realized that the Strowger equipment (a two-dimensional mechanical selector mechanism used in telephone exchanges at the time) was inexpensive, and could be modified to provide a scanning matrix which could control domestic equipment or an electric typewriter via a pneumatic tube. He formed the POSSUM Company [Mailing and Clarkson, 1963, Mailing, R., 1968], which in the early 21st Century is still marketing communication aids for disabled people.
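The row-column selection behind such scanning systems can be sketched as follows: a highlight steps through the rows, a puff of breath selects a row, the highlight then steps along that row, and a second puff selects the item. The matrix contents and names below are invented for illustration and say nothing about the Strowger mechanism itself.

```python
# Illustrative sketch of two-stage row/column scanning selection.
MATRIX = [
    ["a", "b", "c", "d"],
    ["e", "f", "g", "h"],
    ["i", "j", "k", "l"],
]

def scan_select(matrix, row_puff_step, col_puff_step):
    """Return the item reached if the first puff arrives after
    `row_puff_step` row steps and the second after `col_puff_step`
    column steps (the scan wraps around, like a rotary selector)."""
    row = row_puff_step % len(matrix)
    col = col_puff_step % len(matrix[row])
    return matrix[row][col]
```

The cost of this simplicity is speed: reaching an item takes time proportional to its scan position, which is why later AAC work paid so much attention to selection techniques.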
By the early 1970s, a number of similar systems had begun to appear [Copeland, K., 1974, Foulds et al., 1975, Ridgeway and Mears, 1985, Vanderheiden, G., 2002]. Some examples of these are shown in Figure 2.2. There were no portable systems, and they all required the disabled person and the conversational partner to look at a remote display or printout. This meant that eye contact, and the ability to notice facial expression, which I believe is very important in face-to-face communication, were not possible. In addition, I thought that an AAC system should be instantly available—so that users did not feel that they had to wait until they had something important to say before switching their system on—and that, like speech, it should provide a transitory communication—not a “permanent” written one. An example of the dangers of a printed output was related to me by Arlene Kraat. A non-speaking patient had printed out the message “you did not brush my hair properly” to a nurse, but, instead of this being interpreted as a relatively unimportant comment, it was taken as a formal complaint. This example highlights the difference between the impact of spoken and written messages, which also became a concern in our research into television sub-titling.
Figure 2.1: An early Possum system.
2.2 THE TALKING BROOCH—A COMMUNICATION AID FOR NON-SPEAKING PEOPLE
A challenge for AAC systems was to create a portable device that was mounted near to the face. Conventional displays were not appropriate as visual displays at that time were heavy, large, and expensive. My “eureka” moment occurred whilst I was travelling through King’s Cross station, where there was a rolling newscaster display, and I remembered Taenzer’s [1970] work aimed at improving the “Optacon”. This was a reading aid for the blind, in which the operator scanned a printed page via an array of small vibrators on their finger.
Taenzer had shown that a rolling display of only one character width was readable. In a preliminary experiment we showed that the reading speed increased as the number of characters displayed increased [Newell et al., 1975]. We thus conducted formal reading experiments using a simulated display between 1 and 12 characters long. We also compared rolling and walking displays (where the locations of the character matrices were fixed and the letters jumped from one matrix to the next). Users performed significantly better with the rolling display and, with a 5-character display, 99% of sentences could be read at a (fast typing) rate of 60 wpm [Newell and Brumfitt, 1979b].
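The difference between the two display styles in these experiments can be sketched as follows: a “walking” display writes each new letter into the next fixed character position, whereas a “rolling” display shifts the whole line one place left for each new letter. The 5-character width matches the prototype described below; the function names and exact frame behaviour are illustrative assumptions.

```python
WIDTH = 5  # characters visible at once, as in the prototype

def walking_frames(text):
    """Walking display: letters appear in successive fixed positions,
    wrapping back to the first position when the line is full."""
    frames, cells = [], [" "] * WIDTH
    for i, ch in enumerate(text):
        cells[i % WIDTH] = ch
        frames.append("".join(cells))
    return frames

def rolling_frames(text):
    """Rolling display: each new letter enters at the right-hand end
    and the whole line shifts one place to the left."""
    frames, line = [], " " * WIDTH
    for ch in text:
        line = line[1:] + ch
        frames.append(line)
    return frames
```

In the rolling style every letter traverses the full window, so a word remains visible (and re-readable) for several frames; in the walking style each letter is overwritten in place, which is plausibly why readers found it harder at speed.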
Figure 2.2: Early AAC Devices. (a) Portaprinter - commercially available; (b) TIC - developed by Rick Foulds, Tufts University, Boston; (c) AutoCom - developed by Greg Vanderheiden, University of Wisconsin-Madison.
Although a rolling display would be more expensive to produce, we decided that this was essential, and, as a compromise between cost and readability, we built a 5-character prototype using individual light-emitting diodes. This can be seen in Figure 2.3(a). A later version, shown in Figure 2.3(b), used a light-emitting diode array. The whole system consisted of the display mounted in a breast pocket, a battery pack and a keyboard. This fulfilled the requirements I had laid down, and an indication of the success of the Talking Brooch idea was given by a child’s parent who said that the very first time the child had told a joke was via his Talking Brooch.
Figure 2.3: The Talking Brooch. (a) A prototype Talking Brooch; (b) the commercially available Talking Brooch.
A portable device, the ELKOMI 2, marketed by Diode (Amsterdam), had a 9-letter walking display but, even though the display was longer, our results would indicate that it would be less easy to read at normal typing speeds.
At much the same time Toby Churchill of Toby Churchill Ltd had developed the Lightwriter, which consisted of a much longer single-line display integrated with a keyboard [Lowe et al., 1974], and is shown in Figure 2.4(a). In later versions of the Lightwriter, such as that shown in Figure 2.4(b), there is a two-sided display—one side facing the communication partner, and another identical display facing the operator—again with an integrated keyboard. Although the Lightwriter was not as good at promoting eye contact, it did facilitate an appropriate body language for face-to-face communication. In addition, the integrated nature of the system meant that there was only one “box”, and no external wiring. The only other portable device available at that time was the Canon Communicator, which was essentially similar in style to a pocket calculator, but with an alphanumeric keyboard and a strip printer.
Figure 2.4: The Lightwriter. (a) Earliest version; (b) 2010 version.
In 1976, Vanderheiden [1976] reviewed the literature in this field, addressing the issues of accessing communication aids and the relative merits of direct selection (as employed in the Talking Brooch) and scanning and encoding techniques. He also surveyed the range of AAC devices that were available at that time. There were very few portable devices but, in addition to the ones mentioned above, he described the MCM device marketed by Micon Industries (California) that had been primarily designed as a communication aid for deaf people. He also cited the Versicom and Autocom, developed by the Trace Centre at the University of Wisconsin-Madison, as examples of wheelchair portable systems.
The Lightwriter and the Talking Brooch had made slightly different design compromises. The Talking Brooch majored on eye contact and immediacy, whereas the Lightwriter allowed the disabled user to see what they were typing, and had no external wiring or sockets, with their associated fragility. The Canon Communicator had the advantage of a single box, but did not facilitate appropriate body language. The Talking Brooch [Newell, A., 1974a] was marketed by the University of Southampton, and had modest sales. The Lightwriter is still selling well in the early 21st Century. This shows how important it is to really examine the use of any system in real contexts and, where necessary, to compromise on the “purity” of the goal for pragmatic reasons.
A full appreciation of the potential uses of systems in real contexts is essential.
The experience of developing the Talking Brooch led to a range of projects all designed to improve the efficacy of communication aids for people with speech and language dysfunction. It also led to my award of a Winston Churchill Travel Fellowship to investigate communication aids for non-speaking people in the U.S. This formed the basis of much of my future work in this field. I met Arlene Kraat, who subsequently became my mentor from the field of Speech Therapy, and President of the International Society of Augmentative and Alternative Communication. Other very important friends and colleagues from that Fellowship included Greg Vanderheiden, from the University of Wisconsin-Madison—who has made a major contribution to technological development for disabled people at research, development, and political levels—and Rick Foulds, who led very exciting research in this area for many years at the Universities of Tufts and Delaware.
2.3 SPEECH TRANSCRIPTION FOR DEAF PEOPLE
I had noted that the Talking Brooch could also be used by deaf people, but the catalyst for my next research projects was a visit to the Department by Lewis Carter Jones, MP. He was a colleague of Jack (now Lord) Ashley [Ashley, J., 1973], who had become deaf and was struggling to continue his parliamentary career. It is impossible to lip read in the Chamber, and he was surviving by relying on a fellow MP, sitting next to him in the House, writing notes for him on what was said. I arranged to meet Jack and his wife Pauline in the House to demonstrate the Talking Brooch. His (accurate) assessment was that it would be no better than written notes—what he required was a verbatim transcript of what was being said. A good typist can type at 60-80 words per minute, but speech can reach over 200 words per minute. In the British Parliament, particularly at Prime Minister’s Questions, it often happened that an innocent aside (which would not be deemed worth writing down for Jack) would be picked up a couple of speeches later—often as a joke. If he did not have a verbatim transcript, Jack was likely to miss the point of these references [Ashley, J., 1992]. This was similar to the reported complaints of deaf students who were offered a real-time version of lectures on a visual system using an operator who listened to the lecture and dictated a synopsis to a typist [Hales, G., 1976].
Fortune favors the prepared mind. (Pasteur 1854) Therefore be a research “butterfly” and read widely.
It was clear to me that automatic speech recognition would not work within this environment: a couple of years previously I had written that “we must put firmly out of our minds any thoughts of, or hopes for, a ‘mechanical typist’. If we do this we will be in a better position to specify the sort of machine that can be built and may be useful in helping the deaf” [Newell, A., 1974b]. (The limitations of speech recognition are discussed in more detail in Newell [1992c].) This was my opportunity to put these comments into effect. During my ASR research I had come across attempts, some ten years previously, to transcribe the British Palantype machine shorthand [Price, W., 1971], and the American machine shorthand system, Stenograph [Newitt and Odarchenko, 1970]. I thus knew that it was possible to input Palantype data into a computer, but also that current systems required large computers and were not accurate enough to make them a commercial possibility for Court Reporting. Jack Ashley, however, did not need a correct transcription, just one which was readable; but he did need a portable system which had to work in real time. Thus research which had been a commercial failure at that time led my team to develop a system that was appropriate for people with disabilities.
The excellent is an enemy of the good.
Palantype, Stenograph, and the French Grand Jean system work in similar ways (Figure 2.5(a) shows a Palantype Machine). All these systems have chord keyboards, where a number of keys are pressed at the same time, and they work in a syllabic mode, which means that each syllable is encoded in one stroke in a pseudo-phonetic form. The left-hand keys encode the initial phonemes, the right-hand keys the final phonemes, and the center keys the vowels. Word boundaries are not encoded. The output from these machines is a roll of paper on which the coded speech is printed. An example output from a Palantype Machine is shown in Figure 2.5(b).
Palantype follows relatively strict phonetic rules, but Stenograph uses more complex, less phonetic coding. Grand Jean, being weak on final consonants, is not appropriate for English. These machines provide a record of verbatim speech in the form of printed strips of paper that require significant skills to read. They are translated into orthography by trained operators. Automatic translation would clearly be valuable and was being investigated in the UK and the U.S. A major challenge with transcription of machine shorthand is to determine word boundaries, and this requires considerable (in the 1970s) computing power, and large amounts of storage for dictionaries. Research into automatic transcription of Palantype at the National Physical Laboratories (NPL) in the UK [Price, W., 1971] had not been taken up commercially due mainly to technological constraints, and to a (misguided) belief, prevalent in the UK, that tape recording would be cheaper and more effective [HMSO, 1977]. In the U.S., there was a much greater pool of Steno-typists, and this made Stenograph transcription a more commercially attractive proposition. C.A.T. (Computer Aided Transcription) systems, based on large and expensive (often time-shared) mini or mainframe computer systems which could not operate in real time, were beginning to be available [National Shorthand Reporter, 1974].
Figure 2.5: (a) The original Palantype Shorthand Machine; (b) paper output from a Palantype Machine.
Changing context can turn a failure into a success.
Neither the U.S. nor the UK systems were appropriate for the situation I was investigating. In contrast to commercial use of machine shorthand transcription, the requirements for an aid for the deaf were a portable system that produced a readable—not necessarily correct—output in real time. I hypothesized that converting the phonemic codes to a readable form within the keyboarded syllabic structure could produce a readable output. The UK Palantype system uses a purer form of coding for the phonetic representation than Stenograph, Palantypists use very few abbreviations, and the operator training encourages more operator standardization than does the Stenograph system. Thus, a code conversion approach was feasible for Palantype, but any Stenograph transcription system was likely to require complex software and large dictionaries, and thus—in those days—a large computer system [Newell and Downton, 1979c].
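The code-conversion approach can be sketched as a simple table lookup: each stroke is split into its left-hand (initial), centre (vowel), and right-hand (final) key groups, each group is mapped to letters, and the resulting syllables are concatenated with no word boundaries, leaving the reader to supply them. The key codes and tables below are invented for illustration and do not follow the real Palantype keyboard layout.

```python
# Illustrative tables mapping invented key-group codes to letters.
INITIALS = {"S": "s", "P": "p", "SP": "sp", "H": "h", "L": "l", "": ""}
VOWELS = {"A": "a", "E": "e", "O": "o", "EA": "ee", "": ""}
FINALS = {"-N": "n", "-L": "l", "-LS": "ls", "": ""}

def convert_stroke(initial, vowel, final):
    """Convert one chord stroke to readable quasi-phonetic text."""
    return INITIALS[initial] + VOWELS[vowel] + FINALS[final]

def convert_strokes(strokes):
    """Concatenate syllables; as in the prototype, word boundaries
    are not recovered - the reader supplies them."""
    return "".join(convert_stroke(*stroke) for stroke in strokes)
```

Because each stroke is converted independently, the method needs no dictionary and almost no storage, which is what made a portable, real-time device feasible with 1970s electronics; the price is a quasi-phonetic output that a reader has to learn.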
2.4 DEVELOPING A FIRST PROTOTYPE WITH NO EXTERNAL FUNDING
The major advantage of a code conversion approach was that it could be done by relatively simple electronic circuits within a portable system. Joe King, an excellent undergraduate student, produced the first prototype [Newell and King, 1977b] with assistance and loan of equipment from NPL. We had invaluable and enormous help throughout all our Palantype transcription projects from Miss Isla Beard from the Palantype Organisation. She acted as an expert consultant and demonstration operator throughout our research, and was also Jack Ashley’s personal Palantypist for many years.
Following demonstrations of King’s system, we obtained a commission to develop a system for the House of Commons and also won research grants to develop the ideas further. A prototype was demonstrated to Jack Ashley and the Chief Whip, and the House agreed to purchase a system [Ashley, J., 1992]. A second system was designed and built by my colleagues, Andrew Downton and John Arnott, and subjected to a six-month trial in the House of Commons. This prototype, shown in Figure 2.6, used a plasma panel display mounted in a specially designed brief case. As can be seen in Figure 2.6(c), the output from this device is a simple code conversion and is syllabic and quasi-phonetic. Nevertheless, Jack Ashley was able to read this style of text after only a few hours training.
One of the technical challenges for a display of verbatim speech is what to do when the text fills the whole screen. The normal approach would be to move all the text up one line and write new data into the bottom line. This sudden jerky change, however, can disorientate the reader. Smooth scrolling was a possibility, but the speed of motion would be variable, and commercially available display systems did not offer such a facility. In addition, if the text moved up vertically, a reader who looked away from the screen could find it difficult to return to where they were reading. Leaving the text on the screen and writing over it from the top also proved confusing in practice. It was thus decided to modify the display by providing a “moving blank” of two lines situated immediately in front of any new data. This, together with a cursor, gave an unambiguous and clear indication of how to read the display at any moment in time.
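The moving-blank scheme can be sketched as follows: new text overwrites the display from the top, wrapping around, and the two lines immediately ahead of the writing position are kept blank, so the boundary between newest and oldest text is always visible. The display dimensions and names are illustrative.

```python
ROWS, COLS = 6, 20   # illustrative display size
BLANK_LINES = 2      # the two-line "moving blank" ahead of new text

class MovingBlankDisplay:
    def __init__(self):
        self.lines = [" " * COLS for _ in range(ROWS)]
        self.row = 0  # the line currently being written

    def write_line(self, text):
        """Write one line of new text and advance the moving blank."""
        self.lines[self.row] = text[:COLS].ljust(COLS)
        # Blank the lines just ahead of the writing position,
        # wrapping around the bottom of the screen.
        for offset in range(1, BLANK_LINES + 1):
            self.lines[(self.row + offset) % ROWS] = " " * COLS
        self.row = (self.row + 1) % ROWS
```

Old text stays put until it is overwritten, so a reader who glances away can find their place again; the blank band (plus a cursor, in the real system) marks where reading should resume.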
2.5 NON-TECHNOLOGICAL CHALLENGES TO IMPLEMENTATION
The use of such a system within the Chamber of the House of Commons presented many political challenges. Objections raised included that:
• “Ashley would be at an advantage, therefore all MPs should have one”,
• “He would have to have a seat assigned to him which was against the rules of the House” (Although woe betide any new member who took an established member’s favorite seat. I also found that seats could be booked by inserting a card in them, before “prayers”),
Figure 2.6: (a) & (b) First version of Palantype Transcription System; (c) output screen from Transcription System.
• “There would need to be modifications to the oak bench in the foreign press gallery” (where the Palantypist was to be situated).
As a non-MP, I had to obtain special permission from the Serjeant at Arms himself to sit on one of the “green benches” to try the system out, even though the House was not sitting at the time. At the Press Conference to launch the trial, it was commented that this was an historic day as “it was the first time in history that a member had had a specific seat assigned to him!”.
We finally overcame all the objections and, following a training period of approximately 20 hours, Ashley was able to follow all but the fastest speakers. The trial was a success and the service, with gradually improved systems, was continued for the rest of Jack’s career as an MP: in the Chamber, in Committee, and at one-to-one meetings. Ashley [1992] claimed that “It was a turning point in my life as an MP”. A later system, shown in Figure 2.7(a), had a microprocessor and 20 kilobytes of storage [Newell and Downton, 1979c]. The output of this machine, shown in Figure 2.7(b) (which includes the effects of operator keying errors), was adequate for deaf people, but the commercial court reporting field had to wait until portable technology could support systems with large dictionaries.
Figure 2.7: (a) The Rt. Hon. Jack (now Lord) Ashley using a computerized Palantype Transcription System; (b) the output screen of the Transcription System.
2.6 TECHNOLOGY TRANSFER
POSSUM Controls were licensed to produce the systems. The transfer of this technology was substantially assisted by Colin Brooks, a research student/assistant at the University, transferring to POSSUM Controls to manage their developments. The commercialization provided many challenges, as machine shorthand was not popular in the UK. Palantype machines had not been produced for many years, and there were no training schemes. Thus, not only did POSSUM have to re-design the Palantype machine itself, but they also had to develop and market training courses for Palantypists. In the U.S., machine shorthand is very popular and thus all that was required was to develop electrical output for machines and transcription software. POSSUM systems were used in the UK in a variety of situations, including by a deaf businessman, at many conferences, in a telephone translation service for deaf people, and for live TV subtitling. The transcription software was improved and became adequate for commercial use. Commercial transcription requires a high-quality output, because, if the recognition rate is less than 95%, it takes less time to re-type the script than to edit it.
Even with this improved system, POSSUM found it difficult to break into the Court Market. The Lord Chancellor’s Office (which is in charge of Court Reporting in the UK) did not support this development. Officials believed that Tape Recordings and, eventually, Automatic Speech Recognition were the solution [Baker, 1966]. Tape recordings were introduced, but were found to be more costly, less reliable, and not to capture important visual information (e.g., the witness pointed at the person in the dock). Their shortcomings were fully documented by Osmond’s [1972] report of the Lord Chancellor’s working party. POSSUM also found it difficult to persuade the Lord Chancellor’s Office that a simultaneous transcript would be valuable (e.g., so that court officials could read it overnight). This was solved after a demonstration of the system to judges, who immediately saw the benefits. In later years speech recognition has been used in the U.S., but it requires a trained speaker to re-speak the words uttered in Court.
Following fashion—even in research—may produce incremental advances. Swimming against the tide can lead to major advances.
With funding from the National Research and Development Corporation (which became the British Technology Group), Palantype transcription systems were licensed to, and made commercially available by, POSSUM Controls Ltd. They also organized operator training. They offered transcription as a service to deaf people in a range of situations, including meetings, conferences, in a pilot telephone translation service and by a deaf businessman [Hayward, G., 1979]. As technology improved, these systems incorporated very large dictionaries, and similar portable systems were developed in the U.S. for Stenograph machines.
Rapid reading of orthographic text requires good literacy, and thus can be difficult for pre-lingually profoundly deaf people, for whom a sign language translator is more effective, but both require a trained operator. The advantages of an orthographic output are:
• the words remain on the screen for a short time after they have been spoken, and thus the reader can briefly look away from the screen without missing words;
• orthographic output can be used for other purposes (e.g., retaining a record of the speech);
• a stenographer requires fewer and shorter breaks than a shorthand translator; and
• orthography can be used for “closed captioning” of television, whereas technology does not (yet) exist for transmitting sign language other than in the picture.
A French student at Southampton University showed that a Grandjean transcription system for the deaf was feasible [Sayi et al., 1981]. This was not taken forward, but CAT software (IBM-TASF) is now available which is compatible with the Grandjean shorthand machine. For his Ph.D. Colin Brooks [Brooks and Newell, 1985] investigated the potential of handwritten shorthand, but concluded that it was unlikely to be a viable alternative to machine shorthand.
In the U.S., Computer Aided Transcription (CAT) systems were developed for commercial applications, such as the law courts, and were not used for supporting deaf people until the technology could be made portable. In the UK the first commercially available CAT systems were small dictionary systems designed to support deaf people, and these became useful in traditional court reporting situations when large dictionaries could be included in such systems.
The major disadvantage with this research and development in the UK was that hand-written shorthand was much more popular than machine shorthand, and there was a shortage of operators: until POSSUM re-introduced it, there had been no training available for many years. In contrast Stenography is very popular in the U.S.—there are a number of companies producing training courses, machines, and CAT systems. Thus, as the availability of cheap portable computers which could host a full dictionary based CAT system became available, POSSUM’s marketing edge in the field of supporting deaf people was removed and stenographic transcription, supported by a large technical base in the U.S., began to become available in the UK.
Support from potential customers is not a pre-requisite for successful research.
At the time of applying for the grant, we did not have support from potential users (other than Jack Ashley), but the Research Councils at that time did not demand proof of commercial viability, and we were awarded a Science Research Council Grant “Simultaneous translation of machine shorthand for the deaf” (1977/79). With this funding we produced a prototype which worked in realistic environments, and also spent much time and effort in selling the idea to potential user groups; neither of these activities is “academic”, and both tend to be squeezed out when University funding is reduced.
2.7 THE NEED FOR LUCK, FAITH, TIME, AND EFFORT
The Palantype Transcription story is the story of success being based on the results of what could be considered to be failures [Newell, A., 1988a], and had the following challenges:
• Although originally a great success and a very well designed system, Palantype Machine Shorthand never became very popular in the UK, and from the 1960s had begun to decline.
• The NPL work on CAT for Palantype had not led to a commercial product.
However, although automatic speech recognition systems only became commercially viable some 30 years after the research reported above, my knowledge of this area prompted me to investigate machine shorthand.
Success can be produced from commercial failures.
The project required:
• a great deal of luck: a chance meeting with an MP, and an excellent undergraduate student who produced the first prototype;
• a great deal of faith;
• financial support for untried ideas which had little support from potential users;
• time available for “academically” non-productive work including: marketing the idea, liaising with potential users, investigating companies and negotiating licenses;
• technical developments to produce a “pre-production” system that worked in real environments, rather than research leading to a proof of concept laboratory prototype; and
• acceptance of restrictions on publication due to commercial confidentiality.
It is interesting to speculate how difficult it would be to achieve this within the context of Universities in the 21st Century. Would there be time in an academic’s diary to do these essential, but academically unproductive, aspects of a project of this nature? Would it be possible to obtain funding for an idea which had such a narrow focus, and no support (from the Lord Chancellor’s Office) for the wider ramifications of the idea? An “Impact Statement” (as is now required by UK research councils) which reflected this reality would likely be seen as rather weak.
Another unanticipated spin-off from this research was a project, supported by an Engineering and Physical Sciences Research Council grant, into Automatic Speech Recognition. We used Palantype Machine Shorthand Transcription in a “Wizard of Oz” simulation to examine human factors aspects of a “listening typewriter” [Newell et al., 1991b].
2.8 COMMERCIAL AVAILABILITY
A range of portable communication aids for speech-impaired people are now available, including the Lightwriter, and are best seen at the commercial exhibitions associated with the biennial Conferences of the International Society of Augmentative and Alternative Communication. Machine shorthand transcription for hearing-impaired people is now routinely available in the UK using both Palantype and Stenograph systems.