Latest version word count

simonesmith Member
edited December 2017 in Ask the Community: Other

The v.1.10.0 release notes mentioned that it "includes a big update of new words". I was curious how big, so I checked using the wordList function (e.g. an offset of 300K returns an error response).
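
For anyone wanting to repeat the check, here is a minimal sketch of counting a paginated list by probing offsets. It does not call the real API: `page_exists` is a hypothetical stand-in for "request the wordList at this offset and see whether it succeeds", and the sketch assumes every offset below the total succeeds while every offset at or above it fails.

```python
from typing import Callable


def count_by_offset_probe(page_exists: Callable[[int], bool], upper: int) -> int:
    """Return the item count by binary-searching for the first failing offset.

    page_exists is a hypothetical callable standing in for one API request.
    upper must be an offset already known to fail (e.g. 300_000).
    """
    lo, hi = 0, upper  # invariant: offsets below lo succeed, offsets from hi up fail
    while lo < hi:
        mid = (lo + hi) // 2
        if page_exists(mid):
            lo = mid + 1  # mid is valid, so the count is at least mid + 1
        else:
            hi = mid      # mid fails, so the count is at most mid
    return lo


# Simulated endpoint holding 227,495 words; a real check would issue a request here.
def fake_page_exists(offset: int) -> bool:
    return offset < 227_495


total = count_by_offset_probe(fake_page_exists, upper=300_000)
print(total)
```

With a 300K upper bound this needs fewer than 20 probes instead of paging through the whole list.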

It doesn't show much difference:
GB wordList before : 226077, now 227495
US wordList before : 214112, now 215619

It seems to be roughly 1,400 to 1,500 extra words in each dictionary, which is only about 0.6-0.7%. I wouldn't call that a big update :smile:
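
The arithmetic behind that comparison can be checked with a short sketch. The counts are the ones quoted above; the `growth` helper is just an illustrative name.

```python
from typing import Tuple

# Word counts quoted in the post: (before, now) for each wordList.
counts = {
    "GB": (226_077, 227_495),
    "US": (214_112, 215_619),
}


def growth(before: int, after: int) -> Tuple[int, float]:
    """Return (absolute increase, percentage increase relative to before)."""
    added = after - before
    return added, 100 * added / before


for name, (before, after) in counts.items():
    added, pct = growth(before, after)
    print(f"{name}: +{added} words ({pct:.2f}%)")
# GB: +1418 words (0.63%)
# US: +1507 words (0.70%)
```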

Just dropping by from time to time
Simone

Comments

  • AmosDuveen Member, Administrator, Moderator

    Hi @simonesmith,

    It's a matter of perspective. 1,000-odd is a fairly sizeable update for a dictionary, especially one with such broad coverage already. For comparison, the annual updates we get from our partners for non-English datasets usually include new entries numbering in the low hundreds.

  • simonesmith Member

    Hi @AmosDuveen

    Agreed, it could be considered a lot for a mature and professional dictionary; it can't be compared with crowd-sourced dictionaries (e.g. Wiktionary, Urban Dictionary), which have a lot of junk content.

    The website mentions that ODE has over 600K words, but the API word list returns only about 220K. I was wondering whether there are words missing from the API, or whether the 600K actually counts word senses.

    OED is stated to have 280K words. Is the 600K a combination of ODE + OED? In that case there would be a lot of overlap rather than unique words.

    Regards
    Simone

  • AmosDuveen Member, Administrator, Moderator

    Hi @simonesmith,

    Can you please point me to the page where that claim is made so I can check it out?

    My gut feeling is that you have probably made the very common mistake of confusing the Oxford English Dictionary (OED, the massive 24-volume historical record of the English language) with the Oxford Dictionary of English (ODE, the single volume dictionary of current UK and World English usage) which is the data behind ODAPI. The figure of 600K sounds about right for the OED, which obviously includes many obsolete and rare words that wouldn't be considered for inclusion in ODE.

    If you are interested, we do actually have a demo API of OED content here, but it is not being made publicly available yet so your ODAPI credentials won't work. You are welcome to explore the demo via the web page, however.

  • simonesmith Member
    edited March 16

    Hi @AmosDuveen

    www.oed.com
    https://en.wikipedia.org/wiki/Oxford_English_Dictionary

    On https://developer.oxforddictionaries.com/our-data 600K is mentioned, but later on the same page 280K entries is stated for the OED. If it were 280K words and 600K senses, that would make sense (pun intended), but the numbers are reversed, which is a bit confusing. What does "entries" mean: words or senses?

    Will the future OED API contain a 600K word list or a 280K one?

  • AmosDuveen Member, Administrator, Moderator

    Hi @simonesmith,

    I'll be honest, I don't recognise the 280K figure; I think it's possibly a mistake and will make enquiries with the relevant people. "Over 600,000 words" is the usual marketing spiel for the OED, as noted in the section immediately preceding the blurb with the incorrect number:

    Oxford English Dictionary
    The unsurpassed guide to the meaning, history, and pronunciation of more than 600,000 words – past and present – from across the English-speaking world.

    It is also possible to look for answers in the differences between the precise meanings of each word on the basis that one lemma (entry) may take multiple forms (words), but I will withhold judgement until I've asked a few more questions.

  • AmosDuveen Member, Administrator, Moderator
    edited March 16

    Hi @simonesmith,

    OK, the 280K number is the number of actual entries in the OED but it isn't relevant to the API because the subentries have been de-nested (i.e. where the in-house data might show 1 entry containing 5 subentries, the API data will show 6 entries). The number of lemmas in the OED is 600K, as I suspected.

    NB: "words" is not a particularly apt description either as many of the lemmas are actually multi-word phrases.

  • AmosDuveen Member, Administrator, Moderator

    Hi @simonesmith,

    FYI, the OED descriptions on the "our data" page have now been updated.
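
To illustrate the de-nesting described earlier in the thread (1 entry containing 5 subentries becomes 6 entries), here is a minimal sketch. The field names `lemma`, `subentries` and `parent` are hypothetical and not taken from the actual OED data or API; the sketch also assumes only one level of nesting.

```python
from typing import Dict, List


def denest(entries: List[Dict]) -> List[Dict]:
    """Flatten entries so that each subentry becomes its own top-level entry."""
    flat: List[Dict] = []
    for entry in entries:
        # Keep the parent entry itself, minus its nested list.
        flat.append({k: v for k, v in entry.items() if k != "subentries"})
        # Promote each subentry, remembering which entry it came from.
        for sub in entry.get("subentries", []):
            flat.append({**sub, "parent": entry["lemma"]})
    return flat


# One in-house style entry with five subentries (made-up sample data).
nested = [
    {
        "lemma": "run",
        "subentries": [
            {"lemma": "run away"},
            {"lemma": "run down"},
            {"lemma": "run in"},
            {"lemma": "run out"},
            {"lemma": "run up"},
        ],
    }
]

flat = denest(nested)
print(len(nested), "nested entry ->", len(flat), "de-nested entries")
```

Counting the nested form gives the smaller "entries" figure, while counting the flattened form gives the larger one, which is how the same dictionary can be described by two quite different numbers.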
