     UK English Wordlist With Frequency Classification

This wordlist is intended primarily for checking spelling.
Editorial policy is conservative.

Principal omissions:

   -   words requiring a capital letter
   -   abbreviations
   -   slang

Colloquialisms and archaisms are generally excluded. A rare
word similar to a common word may be excluded. Both -ise and
-ize spellings are included.

The character set is: lowercase letters, hyphen, apostrophe.
Words which can be spelt with accents occur here in their
plain form.
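
The character-set rule above can be checked mechanically. The
following is a minimal Python sketch, not part of the original
distribution; the names to_plain_form and is_valid_entry are
hypothetical. It assumes that "plain form" means dropping
combining accents after Unicode decomposition, which covers
letters such as e-acute but not ligatures such as "ae".

```python
import re
import unicodedata

# Allowed characters per the description above:
# lowercase a-z, hyphen, apostrophe.
_ALLOWED = re.compile(r"[a-z'-]+")


def to_plain_form(word: str) -> str:
    """Strip accents by decomposing and dropping combining marks.

    Assumption: this is one plausible reading of "plain form";
    the source does not say how accents were removed.
    """
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def is_valid_entry(word: str) -> bool:
    """Return True if the word uses only the documented character set."""
    return _ALLOWED.fullmatch(word) is not None
```

For example, is_valid_entry("mother-in-law") and
is_valid_entry("o'clock") both hold, while a capitalised word
such as "London" is rejected, consistent with the omissions
listed earlier.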

If this wordlist is to be used with ispell, the following
lines may be appropriate for the affix file:

   boundarychars [---]
   boundarychars '
   wordchars [a-z] [A-Z]

Frequency labels run from 16 (the commonest words) down to 0
(the least common).

Coverage of common words should be good, but note the
excluded categories listed above.

                        Brian Kelk bck22@bckelk.uklinux.net
                        April 2002


Here are excerpts from a brief conversation I had with the author:

From: Brian Kelk <Brian.Kelk@cl.cam.ac.uk>
Date: Sat, 08 Jul 2000 20:27:21 +0100

> I was wondering what the copyright status of your "UK English Wordlist
> With Frequency Classification" word list is, as it seems to be lacking
> any copyright notice.  Also, how did you arrive at the "Frequency
> Classification"?

There were many many sources in total, but any text marked
"copyright" was avoided. Locally-written documentation was one
source. An earlier version of the list resided in a filespace
called PUBLIC on the University mainframe, because it was
considered public domain.

Briefly about frequency: rather than counting occurrences of
a word this classification is more along the lines of counting
the number of texts in which the word occurs. That way you
get some noise immunity, which you very much need. It's based
on maybe 5-10 million words of text on the Cambridge mainframe
in the 1980s. I had in mind that it might be useful for ranking
possible corrections ...

Date: Tue, 11 Jul 2000 19:31:34 +0100

> So are you saying your word list is also in the public domain?

That is the intention.
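
The correspondence above describes the frequency classification
as counting the number of texts in which a word occurs, rather
than its total occurrences, and suggests using it to rank
possible corrections. The following Python sketch illustrates
that idea only; it is not the author's actual procedure. The
function names and the tokenising pattern are hypothetical, and
the mapping from counts to the labels 0 to 16 is not described
in the source, so it is not attempted here.

```python
import re
from collections import Counter

# Hypothetical tokeniser: runs of lowercase letters, optionally
# joined by a single hyphen or apostrophe.
_WORD = re.compile(r"[a-z]+(?:['-][a-z]+)*")


def document_frequency(texts):
    """Count, for each word, how many texts contain it at least once."""
    df = Counter()
    for text in texts:
        # A set ensures each text contributes at most 1 per word,
        # so one text repeating a word many times adds no extra weight.
        df.update(set(_WORD.findall(text.lower())))
    return df


def rank_corrections(candidates, df):
    """Order candidate corrections from most to least widely used.

    Ties are broken alphabetically; unseen words count as 0.
    """
    return sorted(candidates, key=lambda w: (-df[w], w))
```

With the texts "the cat sat", "the the the dog" and "a cat and
the dog", the word "the" scores 3 rather than 5, which is the
noise immunity the author mentions.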




