What is the significance of multiple results for some LexiStats requests?

When I use the API to make a LexiStats request for frequency data, I get an r.text object. Under results there is usually a single frequency result, but for certain words (e.g. 'the', 'today', 'New Zealand') there are multiple results in exactly the same format. Typically the first frequency is very low (between 2 and 15) and looks obviously incorrect, while the second looks correct, in the thousands or millions. Is this an error in the data, or is there some significance to the multiple results? Thanks


  • Simone (Administrator)

    Hi @efleming582
    Let me check with my colleagues, and I'll get back to you - bear with me, please!

  • Simone (Administrator)

    Hi @efleming582

    One of my colleagues has just got back to me about your question.
    Here is what he sent me:

    The API returns the frequencies of the different results, grouped by lexical category and grammatical features.

    • ‘frequency’ gives the number of occurrences in the corpus
    • ‘normalizedFrequency’ gives how often that specific word occurs, on average, per 1 million words.

    For instance:

    Lexical Category: Adverb
    { "wordform": "today", "lemma": "today", "normalizedLemma": "today", "trueCase": "today", "frequency": 1664, "normalizedFrequency": 0.182546247200115, "lexicalCategory": "adverb", "grammaticalFeatures": {}, "firstMention": "2012-05-01T00:00:00", "components": "N/A", "type": "word" },
    Lexical Category: Noun + Plural
    { "wordform": "today", "lemma": "today", "normalizedLemma": "today", "trueCase": "today", "frequency": 1, "normalizedFrequency": 0.000109703273558, "lexicalCategory": "noun", "grammaticalFeatures": { "numberType": "plural" }, "firstMention": "2013-02-01T00:00:00", "components": "N/A", "type": "word" },
    Lexical Category: Noun + Singular
    { "wordform": "today", "lemma": "today", "normalizedLemma": "today", "trueCase": "today", "frequency": 554237, "normalizedFrequency": 60.80161322683295, "lexicalCategory": "noun", "grammaticalFeatures": { "numberType": "singular" }, "firstMention": "2008-09-01T00:00:00", "components": "N/A", "type": "word" }, 
    Lexical Category: Adverb + Temporal
    { "wordform": "today", "lemma": "today", "normalizedLemma": "today", "trueCase": "today", "frequency": 2316151, "normalizedFrequency": 254.08934675408236, "lexicalCategory": "adverb", "grammaticalFeatures": { "thematicRoleType": "temporal" }, "firstMention": "1960-12-01T00:00:00", "components": "N/A", "type": "word" }
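
    One way to work with these per-category entries is to sum or group them, roughly as sketched below in Python. This is only a sketch: it assumes the result entries have already been parsed from the response into a list of dicts, the helper names are hypothetical, and the entries are abridged copies of the four "today" results shown above.

```python
from collections import defaultdict

# Abridged copies of the four "today" entries quoted above (assumed to be
# already parsed from the response into Python dicts).
results = [
    {"wordform": "today", "frequency": 1664,
     "lexicalCategory": "adverb", "grammaticalFeatures": {}},
    {"wordform": "today", "frequency": 1,
     "lexicalCategory": "noun", "grammaticalFeatures": {"numberType": "plural"}},
    {"wordform": "today", "frequency": 554237,
     "lexicalCategory": "noun", "grammaticalFeatures": {"numberType": "singular"}},
    {"wordform": "today", "frequency": 2316151,
     "lexicalCategory": "adverb",
     "grammaticalFeatures": {"thematicRoleType": "temporal"}},
]


def total_frequency(entries):
    """Sum the raw counts across every category/feature combination."""
    return sum(entry["frequency"] for entry in entries)


def frequency_by_category(entries):
    """Sum the raw counts per lexical category, ignoring grammatical features."""
    totals = defaultdict(int)
    for entry in entries:
        totals[entry["lexicalCategory"]] += entry["frequency"]
    return dict(totals)


def dominant_entry(entries):
    """Return the single entry with the highest raw count."""
    return max(entries, key=lambda entry: entry["frequency"])


print(total_frequency(results))        # 2872053
print(frequency_by_category(results))  # {'adverb': 2317815, 'noun': 554238}
print(dominant_entry(results)["grammaticalFeatures"])  # {'thematicRoleType': 'temporal'}
```

    Whether to sum the entries or keep only the largest one depends on what you need: the sum gives an overall count for the word form, while the per-entry values keep the distinction between, for example, noun and adverb uses.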

    I hope that helps!

  • Simone (Administrator)

    Hi @efleming582
    Just a quick note to say we haven't forgotten about this question - my colleagues are still looking into it, so bear with us, please!
