There does not seem to be much agreement in the earlier literature about how vocabulary knowledge influences the reading process. Intuitively, vocabulary would seem to be of great importance, and many researchers have cited it as being of prime importance in both L1 and L2 studies (Davis, 1971; Kruse, 1979; Chall, 1958; Loban, 1970; Yorio, 1971; and Phillips, 1974, cited by Adams, 1982), but others disagree. Duffy and Kabance found that "Simplifying vocabulary and sentences has little, if any, effect on performance even though the readability, according to formula is greatly improved" (1982:738). They found that their data "add[ed] substance to the hypothesis that word and sentence difficulty are correlative but not causative factors in comprehension" (1982:744).
Freebody and Anderson found that "performance was lower when the passages contained difficult vocabulary, and in half of these cases the effect was significant" (1983:291). However, they caution that "it takes a surprisingly high proportion of difficult vocabulary to produce reliable decrements in comprehension measures" (1983:291).
Davison and Kantor argue that "Readability formulas ... fail to give any adequate characterization of readability, except in a purely statistical sense from which no particular valid conclusions can be drawn for creating readable texts" (1982:207).
So perhaps there are many factors related to vocabulary difficulty to which traditional readability formulae are not sensitive and which may be very complex to investigate. Readability formulae have been criticised for omitting many factors, such as syntactic complexity and rhetorical organisation. To these we could add the factors which make a word hard to process - factors which go well beyond the length of the word or the number of syllables it contains. Bernhardt (1984) is also sceptical about the presumed relationship between word length and difficulty, pointing out that the graphemic uniqueness of a word may make it much more accessible than shorter words "such as the, them, they, their, there, this, that, and those [which for L2 readers] are extremely difficult words despite their length." She points out:
"eye movement protocols generated by native readers of German and by highly experienced readers of German indicate that many short function words such as articles and prepositions are processed for duration in excess of 1000 milliseconds - in other words, for durations approximately three times longer than average processing time for longer words in connected discourse. These data indicate that the relationship between word length and word difficulty positively correlated by the concept of readability appears to be tenuous." (Bernhardt 1984:323)
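To see why such criticisms bite, it helps to recall what a traditional readability formula actually computes. The sketch below implements the well-known Flesch Reading Ease formula; the syllable counter is a crude vowel-group heuristic added purely for illustration. Note that the only inputs are sentence length and word length in syllables - exactly the variables the authors above argue are correlative rather than causative:

```python
import re

def count_syllables(word: str) -> int:
    # Crude heuristic (an assumption for illustration): count runs of vowels.
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_reading_ease(text: str) -> float:
    # Flesch Reading Ease = 206.835 - 1.015*(words/sentence) - 84.6*(syllables/word).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))

print(flesch_reading_ease("The cat sat on the mat. It was warm."))
```

Frequency, graphemic uniqueness, function-word processing load, and lexical phrases - the factors discussed in this section - appear nowhere in the calculation.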
So far we have considered only single words, but it may be that readability indices also fail to account for lexical phrases. Lexical "chunks" (Nattinger & DeCarrico, 1992; Moon, 1997) may account for a large proportion of vocabulary. In fact:
"Research on French ... has shown that there are more complex units than simple ones. For instance, there are 6,000 adverbial expressions compared with 2,000 adverbs, 300,000-400,000 compound nouns versus 80,000 simple nouns ..." (Arnaud & Savignon 1997:160)
L2 readers' lack of awareness that a combination of words may constitute a chunk may affect their reading ability in a variety of ways:
"When chunking is impeded, less information can be stored at one time in short-term memory. Such a reduction in storage capacity means that less linguistic data can be analyzed simultaneously, which results in inefficient use of redundancy and contextual clues. Because of limitations in human attention and memory processing capacity, these additional cognitive demands may account for the observation that good L1 readers are often not able to apply their reading skills to L2 texts." (Nattinger & DeCarrico, 1992:159-160)
It seems to be this inability to "get going" and process larger stretches of text which slows readers down. Laufer makes a similar point:
"Since the amount of information that can be cognitively manipulated at one point of time by controlled processing is limited, focussing on slightly or completely unfamiliar words will take up some cognitive capacity that would otherwise be used for higher level processing of the text. Automatic recognition of a large vocabulary, or a large sight vocabulary, on the other hand, will free one's cognitive resources for (1) making sense of the unfamiliar or slightly familiar vocabulary and (2) interpreting the global meaning of the text." (Laufer, 1997a:22-23)
So it seems that beyond a certain percentage of unknown words (or chunks), processing becomes quite laborious and strategies which might otherwise help (inferencing, etc.) become useless.
"Drawing on a variety of studies, including her own, Laufer claims that by far the greatest lexical factor in good reading is the number of words in the learner's lexicon. A vocabulary of 3,000 word families or 5,000 lexical items is needed for general reading comprehension, as this would cover 90-95% of any text. Below this threshold, reading strategies become ineffective." (Coady & Huckin, 1997:2)
In fact, inferencing may have been overemphasised as a useful reading strategy (Bensoussan & Laufer, 1984; Kelly, 1990).
"Haynes and Baker (1993) too came to the conclusion that the most significant handicap for L2 readers is not a lack of reading strategies but insufficient vocabulary in English." (Laufer, 1997a:21)
So we have to accept that whether we are talking about words or lexical phrases, vocabulary is a fundamental consideration in assessing difficulty (and in fact has long been used for grading EFL readers - see Nuttall (1982) for a table of British EFL readers and their vocabulary levels).
This is the first and most important point.
It should not be too difficult to arrive at a rough estimate of the percentage of unknown words. One can simply ask the readers to scan the text and underline the words they do not know; one can use a cloze test; or, given an estimate of the reader's vocabulary size, one can simply eliminate the words the readers are expected to know and count up the rest. There are various tests available for estimating vocabulary size (see Read, 1997 for a review). But these are rough and ready methods, because a word may not simply be known or unknown.
Difficulty from the reader's point of view is not just a question of knowing or not knowing a word (a view which leads to the simplistic notion that a count of unfamiliar words will give an index of difficulty). There is a cline of word knowledge, from having seen a word before to knowing it and being able to use it in all its forms and collocations. This is less easily measured (but see Read, 1997:317).
The second point is to identify which words or chunks are likely to cause difficulty for, or be unknown to, specific readers.
Williams and Dallas (1984) examined vocabulary difficulties in content-area textbooks and identified the following problems: (a) difficult words used in definitions (e.g. too many abstract words, definitions which are too broad rather than narrowly related to the meaning in context, few examples); (b) idiomatic expressions (whose meaning is difficult to infer from the constituent vocabulary); (c) homonyms (especially problematic where they occur in high density); and (d) specialised vocabulary from 'imported text'. Their approach was not to predict vocabulary difficulty but rather to give the texts they were investigating to the readers for whom they were intended and to analyse certain aspects of vocabulary by a multiple-choice test. Readability formulae are an attempt at a shortcut, but evaluating texts through testing (or other procedures such as think-aloud protocols) with their target readership is the only way of ascertaining whether they are suitable and the only way of investigating specific causal variables of text difficulty.