• Syllables of a word (Word97SR2+)

    #399725

    I wanted some simple code to compute the Gunning-Fog index (readability) of text, and for that I thought I needed to determine the number of syllables in an English word.
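For reference, the Gunning Fog index is usually defined as 0.4 × (average words per sentence + percentage of words with three or more syllables). Below is a minimal sketch of that arithmetic. It is written in Python rather than VBA only to keep it short, and it assumes the word, sentence, and complex-word counts are already available. The name `gunning_fog` is hypothetical. The full definition also leaves proper nouns and some suffixed forms out of the complex-word count, and this sketch does not attempt that.

```python
def gunning_fog(words: int, sentences: int, complex_words: int) -> float:
    """Gunning Fog index = 0.4 * (words per sentence + percentage of complex words).

    complex_words: number of words with three or more syllables.
    """
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    avg_sentence_length = words / sentences          # words per sentence
    percent_complex = 100.0 * complex_words / words  # % of words with 3+ syllables
    return 0.4 * (avg_sentence_length + percent_complex)


# Hypothetical counts: 100 words, 5 sentences, 10 words of 3+ syllables.
print(round(gunning_fog(100, 5, 10), 2))  # 0.4 * (20 + 10) -> 12.0
```

The syllable counter only matters for the `complex_words` argument. Everything else comes from word and sentence counts.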

    Here’s the complete template.

    The syllable-counting code is not complete. It needs a lot of adjustment and fine-tuning for special cases.

    Examine the Module MAIN for the two test macros.

    • #774416

      BTW, have you tried Tools > Options > Spelling & Grammar, “Show readability statistics”?
      Rates text on a 100-point scale; the higher the score, the easier it is to understand the document. For most standard documents, aim for a score of approximately 60 to 70.

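      The formula for the Flesch Reading Ease score is:

      206.835 – (1.015 x ASL) – (84.6 x ASW)

      where ASL is the average sentence length (words ÷ sentences) and ASW is the average number of syllables per word (syllables ÷ words).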
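The standard Flesch Reading Ease formula is 206.835 - 1.015 × (words ÷ sentences) - 84.6 × (syllables ÷ words). Here is a minimal Python sketch of it, assuming the caller supplies the three counts. `flesch_reading_ease` is a hypothetical name, not a Word API.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease = 206.835 - 1.015*ASL - 84.6*ASW."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    asl = words / sentences   # average sentence length (words per sentence)
    asw = syllables / words   # average syllables per word
    return 206.835 - 1.015 * asl - 84.6 * asw


# Hypothetical counts: 100 words, 5 sentences, 150 syllables.
print(round(flesch_reading_ease(100, 5, 150), 1))  # 59.6
```

Like the Fog index, this score depends directly on the syllable count. A counter that overcounts silent vowels will pull the score down.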

      • #774828

        > Tools > Options > Spelling & Grammar (ooops!)

        Thank you, and yes, I have. However, the motivation for coding the FOG index was its appearance in the textbook I’m using for a Business English course.

        I’d love to hear your feedback on the syllable-counting portion of the code.

        Most algorithms for readability seem to focus on length of sentence and count of “complex” words. What else might one use?

    • #972816

      Here is “better” code using a crude regular expression; I’m not happy with the definition of a “word”, but it’s a start ….

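      Sub FOG()
          If Len(Selection.Text) …
          ' [Part of this listing was lost when the thread was archived;
          '  the fragment resumes inside lngCountWordsSyllables.]
          … lngCount Then
                  lngResult = lngResult + 1
              End If
          Next wd
          lngCountWordsSyllables = lngResult
      End Function

      ' Early binding to RegExp requires a reference to
      ' "Microsoft VBScript Regular Expressions" (Tools > References in the VBA editor).
      Public Function lngSyllables(strText As String) As Long
          Dim re As New RegExp
          Dim matches As MatchCollection
          With re
              .Global = True   ' return every match, not just the first
              ' Each match is an optional vowel run followed by one or more non-vowels.
              ' Matching is case-sensitive unless .IgnoreCase = True is set.
              .Pattern = "[aeiou]*[^aeiou]+"
              Set matches = .Execute(strText)
              lngSyllables = matches.Count
          End With
      End Function

      'Sub TESTlngSyllables()
      '    MsgBox lngSyllables("establish")
      'End Sub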
      • #972895

        Hi Chris,

        I think the line:
        .Pattern = "[aeiou]*[^aeiou]+"
        in the lngSyllables function should be changed to:
        .Pattern = "[^bcdfghjklmnpqrstvwxz]+"

        The reasons are twofold. First, the leading
        [aeiou]*
        does nothing in this context. Second,
        .Pattern = "[aeiou]*[^aeiou]+"
        with or without "[aeiou]*" doesn’t pick up syllables that form word endings, such as in “baby”, but interprets “Dad” as having two syllables.

        Both approaches fail, though, when the word ends in a silent vowel (e.g., “babe”) or uses “es” to form the plural without sounding the “e”. Still, counting these as adding to the text’s complexity might be reasonable, on the premise that they take extra effort to apply correctly (kinda like saying they’re silent syllables).
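This small Python sketch shows how the two patterns behave on a few words. Python's `re.findall` stands in for VBScript's `RegExp.Execute(...).Count` with `.Global = True`, and the helper name `count_matches` is hypothetical. The sketch lowercases each word first, whereas the VBA code matches case-sensitively by default.

```python
import re

# Original pattern: optional vowel run followed by a run of non-vowels.
ORIGINAL = re.compile(r"[aeiou]*[^aeiou]+")
# Suggested pattern: any run of non-consonants ('y' is not in the list,
# so it is treated like a vowel).
SUGGESTED = re.compile(r"[^bcdfghjklmnpqrstvwxz]+")


def count_matches(pattern: re.Pattern, word: str) -> int:
    """Number of non-overlapping matches in the lowercased word."""
    return len(pattern.findall(word.lower()))


for w in ["dad", "baby", "babe", "idea", "establish"]:
    print(w, count_matches(ORIGINAL, w), count_matches(SUGGESTED, w))
```

The loop prints `dad 2 1`, `baby 2 2`, `babe 2 2`, `idea 1 2` and `establish 3 3`. The original pattern effectively counts consonant runs. As a result, a word-final vowel group is missed ("idea" gives 1) and a leading consonant counts as a syllable ("dad" gives 2). "Baby" comes out at 2 only because its leading "b" is counted. Both patterns count the silent "e" in "babe".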

        Later:
        Regarding the word count in the sngFogIndex function, I think you should use:
        sngWords = rng.ComputeStatistics(wdStatisticWords)
        instead of
        sngWords = rng.Words.Count
        That’s because .Words.Count also counts punctuation marks and paragraph marks as “words”, while .ComputeStatistics(wdStatisticWords) returns a count of words only. You can see the effect of this if you run the code on a simple document and add or delete some empty paragraphs. This may address your concerns over the definition of a word.
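This is a rough Python analogy for why the two counts differ, and it does not reproduce Word's own rules. It assumes a "word" is a run of letters, digits, or apostrophes. It approximates the Words collection by also counting each punctuation mark and each line break (standing in for a paragraph mark) as an item. Both function names are hypothetical.

```python
import re


def word_count(text: str) -> int:
    """Count only word-like tokens (letters, digits, apostrophes)."""
    return len(re.findall(r"[A-Za-z0-9']+", text))


def token_count(text: str) -> int:
    """Also count each punctuation mark and each line break as a token
    (a loose approximation of Range.Words, not Word's actual rules)."""
    return len(re.findall(r"[A-Za-z0-9']+|[^\w\s]|\n", text))


sample = "Hello, Mr. Chips.\n\n"
print(word_count(sample), token_count(sample))  # 3 8
```

The comma, the two periods and the two line breaks are what inflate the second count. An empty paragraph adds to it in the same way.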

        There is also a problem with the Sentence count, in that Word doesn’t differentiate between an abbreviation followed by a period and a sentence. For example, “Hello, Mr. Chips.” counts as two sentences. I imagine one could code around that with a great deal of effort …
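One common workaround is to skip tokens that appear in a list of known abbreviations before treating a period as a sentence end. This minimal Python sketch assumes that approach. The abbreviation list and the function name are hypothetical, and a practical list would be much longer.

```python
# Hypothetical, deliberately tiny abbreviation list.
ABBREVIATIONS = {"mr.", "mrs.", "ms.", "dr.", "e.g.", "i.e.", "etc."}


def count_sentences(text: str) -> int:
    """Count whitespace-separated tokens that end a sentence,
    ignoring tokens found in ABBREVIATIONS."""
    count = 0
    for token in text.split():
        if token.lower() in ABBREVIATIONS:
            continue  # "Mr." and the like do not end a sentence
        if token.endswith((".", "!", "?")):
            count += 1
    return count


print(count_sentences("Hello, Mr. Chips."))  # 1
```

The sketch still has gaps. A sentence that really does end in "etc." is missed, and so is one whose final period sits inside a closing quotation mark.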


        Cheers,
        Paul Edstein
        [Fmr MS MVP - Word]

        • #972988

          Macropod, thanks for the constructive criticism.

          I agree with all your points. The code is somewhat slipshod in its components, but serves me well for gaining a rough value of the FOG index. In particular, given a chunk of text, such as a one-page memo, I needed a means to observe if the FOG index had improved between draft and final version, so almost any method would suffice – as long as the tool was consistent.

          > .Pattern = "[^bcdfghjklmnpqrstvwxz]+"
          Quite so. My RegExp skills are still nascent; I’d toyed with a more complex string, but settled for what I thought was “a series of vowel strings followed by consonant strings”. Your pattern says, I think, “each run of non-consonants” (in effect, each vowel group), which is more accurate.

          >ends in a silent vowel (eg “babe”)
          Quite so, and here too I was aware that my string wasn’t perfect, but figured (again) that as long as the defect was applied consistently, I’d be OK. In this case I figured that my syllable count would tend to be elevated, so that the FOG index would be elevated, and I had to bear in mind that if I got a value of 10.7 it was probably, really, a little less. Nonetheless, a comparison of before/after should indicate if any change had taken place.

          > sngWords = rng.ComputeStatistics(wdStatisticWords)
          Again, agreed. I think I once found four different ways of counting words, each providing a different result. In debugging this code I’d noticed “.” as a word, and shrugged again, on my grounds that consistent defects provided a consistent comparison.

          > problem with the Sentence count
          Right, and yes, effort, which I didn’t want to spend for this purpose. In class we might devote ten minutes to the FOG index, and my goal is to provide a handy tool that gives a general idea (“Here’s a little template on a floppy; if you are worried about viruses, the code is short and you can see it all …”) without offering a major opus involving complex iteration.

          In particular, I don’t like the loop through the .Words in the range; I feel that some of those loops take an absurd amount of time. I’d be inclined to use a fast method to extract strings and process only those. (Your homework assignment (grin!) is to devise a RegExp that will quickly extract only strings of five or more letters from a string of text; if nothing else, this ought to isolate the candidates for three syllables (three vowels and two consonants, or longer) and reduce the iterations.)
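One possible answer to the homework, sketched in Python rather than VBA. A word needs at least five letters (three vowel groups separated by two consonants) before either syllable pattern can report three syllables. So the sketch first pulls out runs of five or more letters and only then counts syllables with the suggested `[^bcdfghjklmnpqrstvwxz]+` pattern. The function names are hypothetical, and treating only ASCII letters as word characters is an assumption.

```python
import re
from typing import List

# Runs of 5+ ASCII letters bounded by word boundaries.
LONG_WORD = re.compile(r"\b[A-Za-z]{5,}\b")
# Suggested syllable pattern: each run of non-consonants.
SYLLABLE = re.compile(r"[^bcdfghjklmnpqrstvwxz]+")


def complex_word_candidates(text: str) -> List[str]:
    """Words long enough to possibly have three or more syllables."""
    return LONG_WORD.findall(text)


def count_complex_words(text: str) -> int:
    """Count 3+ syllable words, checking only the 5+ letter candidates."""
    return sum(
        1
        for w in complex_word_candidates(text)
        if len(SYLLABLE.findall(w.lower())) >= 3
    )


text = "The committee will establish a new procedure for memos."
print(complex_word_candidates(text))
print(count_complex_words(text))  # 3
```

Here only four candidates reach the syllable check, and "memos" (two vowel groups) drops out. "Procedure" is counted with four syllables because of its silent final "e", which does not change the result.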

          Thanks again for the input. I will incorporate your suggestions and compare the two styles.

          And now on to your next message … (later: I noticed that you combined the two into one post. Oh well …)

          >I noticed that you haven’t counted ‘y’

          Right. Same reasons as above. I went for a quick-and-dirty that gave results.

        • #972992

          Results of a quick test:
          I tested a single one-page memo (letter), using both the entire document and just the body – the paragraphs between “Dear Sir” and “Yours sincerely”.

          Method                                  All (whole memo)   Body only
          My original code                        11.16308           14.0771
          Your syllable pattern                   12.49641           15.42976
          Your pattern + ComputeStatistics words  12.7947            15.4073
          

          A better pattern (“[^bcdfghjklmnpqrstvwxz]+”) gave me a higher FOG index, which suggests that your pattern found more words of three or more syllables than mine did. I’d go with that, because I’m not even sure that my pattern detected trailing syllables. (“Babe” is two syllables in most of the pop songs of my age group!)
          A better word count raised the FOG index above my original method, BUT in combination with your better syllable count, the FOG index dropped (15.42976 to 15.4073). Presumably a more accurate (lower) value for AverageWordsPerSentence outweighed a more accurate and higher value for PercentageLongWords.

          Regardless, applying your two corrections shows me that my FOG index for this single memo is significantly higher than I had thought, and since my purpose is to get an alert whenever the wording gets too complex, this is good. (The rule of thumb is that a FOG index between 8 and 12 is OK; below that the text is too simple, and above it, too complex.)
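This small Python sketch turns the 8-to-12 rule of thumb into the kind of alert described here. The function name and its default thresholds are hypothetical, and the two scores are the whole-memo and body-only results from the last row of the test table.

```python
def fog_verdict(score: float, low: float = 8.0, high: float = 12.0) -> str:
    """Classify a FOG score against the 8-12 rule of thumb."""
    if score < low:
        return "too simple"
    if score > high:
        return "too complex"
    return "OK"


for label, score in [("whole memo", 12.7947), ("body only", 15.4073)]:
    print(label, fog_verdict(score))
```

Both scores land in "too complex". A small systematic bias in the syllable count can move a borderline score across the 12 threshold. That is why comparing the draft and final versions with the same tool matters more than the absolute value.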

          So, your fixes are very much IN! Thanks again.
