Topic: Iterating Word objects efficiently (Word VBA) @ AskWoody

- This topic has 16 replies, 7 voices, and was last updated 20 years, 1 month ago.
WSAndrew77
AskWoody Lounger

March 3, 2005 at 7:49 pm #416597

Some recent threads have had to do with deleting objects from Word documents. As the posts show, when deleting objects, you’re usually confined to using a For…Next loop rather than the much faster For…Each loop.

But in some cases, there’s a third alternative that I haven’t seen discussed before (though I admittedly didn’t look very hard) that offers nearly the same speed as a For…Each loop, but with the flexibility to delete objects along the way found in a For…Next loop.

One such case is with Paragraphs, obviously something that you might often need to iterate and occasionally delete.

Paragraphs, along with a handful of other objects (fields are another), include a “Next” property, which returns the next object in the series. By using the Next property, you can quickly move along a collection, while still being able to delete items along the way if needed.

The following three examples don’t actually delete paragraphs (I thought I’d keep it simple for illustration purposes), but they do nicely illustrate the three different techniques for iterating Paragraphs in a Word document. I ran all three on the same 252-page, 4,030-paragraph Word document.

The first uses a For…Next loop, and was quite slow, as you might expect. 2 minutes, 20 seconds.

The second uses a For…Each loop, and was a bit speedier, at 2 seconds (yup!)

The third, which starts at the first paragraph and uses the Next property to move along, also took … 2 seconds

Sub IterateParasTheSlowWay()
    Dim doc As Document
    Dim para As Paragraph
    Dim k As Integer

    Set doc = ActiveDocument
    For k = doc.Paragraphs.count To 1 Step -1
        Set para = doc.Paragraphs(k)
        If para.Style = doc.Styles(wdStyleHeading1) Then
            para.Range.HighlightColorIndex = wdBrightGreen
        End If
    Next k
End Sub
'----------------------------------------------------------
Sub IterateParasTheFastestWay()
    Dim doc As Document
    Dim para As Paragraph

    Set doc = ActiveDocument
    For Each para In doc.Paragraphs
        If para.Style = doc.Styles(wdStyleHeading1) Then
            para.Range.HighlightColorIndex = wdBrightGreen
        End If
    Next para
End Sub
'------------------------------------------------------------
Sub IterateParasTheFastAndFlexibleWay()
    Dim doc As Document
    Dim para As Paragraph
    Dim paraNext As Paragraph

    Set doc = ActiveDocument
    Set para = doc.Paragraphs.First
    Do While Not para Is Nothing
        Set paraNext = para.Next
        If para.Style = doc.Styles(wdStyleHeading1) Then
            para.Range.HighlightColorIndex = wdBrightGreen
        End If
        Set para = paraNext
    Loop
End Sub
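
For anyone who wants to reproduce the timings on their own document, here is a minimal sketch of a timing harness. The name TimeParagraphLoops is made up for illustration; it assumes the three subroutines above are in the same module and uses VBA's built-in Timer function (seconds since midnight, so a run that spans midnight would give a nonsense result). Results appear in the Immediate window.

```vba
' Hypothetical helper: times the three routines above with VBA's Timer function.
Sub TimeParagraphLoops()
    Dim startTime As Single

    startTime = Timer
    IterateParasTheSlowWay
    Debug.Print "For...Next:    " & Format(Timer - startTime, "0.00") & " s"

    startTime = Timer
    IterateParasTheFastestWay
    Debug.Print "For Each:      " & Format(Timer - startTime, "0.00") & " s"

    startTime = Timer
    IterateParasTheFastAndFlexibleWay
    Debug.Print "Next property: " & Format(Timer - startTime, "0.00") & " s"
End Sub
```

Expect the absolute numbers to vary with the document and the machine; only the relative differences are meaningful.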

In the case of this last subroutine, instead of applying highlighting, I could just as easily have deleted those Heading 1 paragraphs, and still been able to move along the collection correctly, since I’ve already got my hands on the following paragraph, which becomes the current paragraph on the next trip through the loop.
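
As a rough sketch of that deleting variant (the subroutine name is hypothetical, and it is best tried on a copy of a document first), the only change is that the highlighting line becomes a call to the paragraph's Range.Delete:

```vba
' Hypothetical variant of IterateParasTheFastAndFlexibleWay that deletes
' Heading 1 paragraphs instead of highlighting them.
Sub DeleteHeading1ParasWithNext()
    Dim doc As Document
    Dim para As Paragraph
    Dim paraNext As Paragraph

    Set doc = ActiveDocument
    Set para = doc.Paragraphs.First
    Do While Not para Is Nothing
        ' Grab the following paragraph BEFORE touching the current one
        Set paraNext = para.Next
        If para.Style = doc.Styles(wdStyleHeading1) Then
            para.Range.Delete
        End If
        ' Move on using the reference saved above
        Set para = paraNext
    Loop
End Sub
```

One caveat to keep in mind: Word won't remove the final paragraph mark of a document, so if the last paragraph is a Heading 1, its text is deleted but an empty paragraph remains.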

For…Each loops are still my weapon of choice when doing standard iterations, but using the Next property technique (the “linked-list method” formally) has proved a valuable addition to my Word macro toolbox.

Cheers!

- WSHansV
  AskWoody Lounger
  
  March 3, 2005 at 7:59 pm #932384
  
  Thanks for sharing! This will come in handy, I’m sure.
  
  For others reading this, Next and Previous are properties of the following objects:
  Cell
  Column
  Field
  FormField
  MailMergeField
  Pane
  Row
  TabStop
  TextFrame
  Window
  
- WSAndrew77
  AskWoody Lounger
  
  March 4, 2005 at 3:25 am #932487
  
  Thanks for putting that list together, Hans!
  
- WSAlanMiller
  AskWoody Lounger
  
  March 5, 2005 at 12:16 am #932731
  
  Thanks Andrew. Neat trick and neat code samples too.
  
  Alan
  
- WSGary Frieder
  AskWoody Lounger
  
  March 5, 2005 at 8:31 am #932753
  
  Andrew,
  
  That is neat, thanks for sharing it. There have been threads here before with regard to using .Next to iterate quickly, but that “Set obj = objNext” is a really nice trick.
  
  Another object that needs to be added to the list of objects that support First/Next is the Range object; in particular, this allows you to iterate through the Characters collection (you can also do it with For Each, but that is minus the benefit of your method). In this example, all characters that are upper-case get highlighted (don’t try this on a 200-page document!):
  
  Sub IterateCharactersNext()
      Dim doc As Document
      Dim char As Range
      Dim charNext As Range

      Set doc = ActiveDocument
      Set char = doc.Characters.First
      Do While Not char Is Nothing
          Set charNext = char.Next
          If char.Case = wdUpperCase Then
              char.HighlightColorIndex = wdBrightGreen
          End If
          Set char = charNext
      Loop

      Set doc = Nothing
      Set char = Nothing
      Set charNext = Nothing
  End Sub
  
  Gary
  
- WSAndrew77
  AskWoody Lounger
  
  March 5, 2005 at 2:49 pm #932786
  
  Hi Gary,
  
  Thanks for the info on the Range object — that also means you can go by word as well as by character (and sentences, but Word’s definition of a sentence is a bit sketchy).
  
  Another recurring topic on the board is iterating over each character, just like you’ve described. The standard objection to doing so is that it’s slow (which is why you’ve warned against running your macro on a long document).
  
  But sometimes iterating each character is the best or only way to tackle a problem, so the question moves to how to optimize the iteration, so that you (well, not you specifically, Gary) only iterate characters when you absolutely have to.
  
  For example, if you wanted to work on any characters in a document whose formatting was different from that defined by its paragraph or character style (such as direct bold or italic applied), one fairly efficient approach is the following.
  
  This macro uses two supporting functions to isolate only those words in the document that contain some degree of direct formatting (for illustration purposes, I’ve confined ‘direct formatting’ to mean bold, italic, size or font name change — in practice, that’s usually sufficient).
  
  '=============================
  Sub IterateCharactersSelectively()
      Dim doc As Document
      Dim wrd As Range
      Dim char As Range
      Dim para As Paragraph

      Set doc = ActiveDocument
      For Each para In doc.Paragraphs
          If AnyDiffFontsInPara(para) = True Then
              For Each wrd In para.Range.Words
                  If AnyDiffFontsInWord(wrd) = True Then
                      wrd.Select
                      MsgBox "This word has character formatting " & _
                          "that is inconsistent with its style"
                      ' now you only have to iterate each character
                      ' in a word, rather than a whole paragraph
                      ' or a whole document. Put your character
                      ' iterating/modifying code here
                  End If
              Next wrd
          End If
      Next para
  End Sub
  
  Basically, there’s no point iterating all the characters in a particular paragraph if none of them are any different from the paragraph style properties. So by checking that first, we can move quickly past a lot of text. If and when we do find a paragraph that contains differing formatting, then we go word by word to isolate the problem, only then iterating each character. Depending on the amount of direct formatting in a document, and the average number of characters per word in your document, this technique can be several orders of magnitude faster than iterating each character in the document. Your mileage may vary.
  
  Here are the two supporting functions used by the main macro. These could be adjusted as needed to look for things like highlighting or superscripting.
  
  '===============================================
  Function AnyDiffFontsInPara(para As Paragraph) As Boolean
      Dim lDiffBold As Long
      Dim lDiffItal As Long
      Dim lDiffSize As Long
      Dim sDiffName As String

      AnyDiffFontsInPara = False
      With para.Range.Font
          lDiffBold = .Bold
          lDiffItal = .Italic
          lDiffSize = .Size
          sDiffName = .Name
      End With

      Select Case wdUndefined
          Case lDiffBold
              AnyDiffFontsInPara = True
              Exit Function
          Case lDiffItal
              AnyDiffFontsInPara = True
              Exit Function
          Case lDiffSize
              AnyDiffFontsInPara = True
              Exit Function
      End Select

      If Len(sDiffName) = 0 Then
          AnyDiffFontsInPara = True
          Exit Function
      End If
  End Function
  '==========================================
  Function AnyDiffFontsInWord(wrd As Range) As Boolean
      Dim docstyles As Styles
      Dim wrdstyle As String

      wrdstyle = wrd.Style
      Set docstyles = wrd.Parent.Styles
      Select Case True
          Case (Not wrd.Font.Bold = docstyles(wrdstyle).Font.Bold)
              AnyDiffFontsInWord = True
          Case (Not wrd.Font.Italic = docstyles(wrdstyle).Font.Italic)
              AnyDiffFontsInWord = True
          Case (Not wrd.Font.Name = docstyles(wrdstyle).Font.Name)
              AnyDiffFontsInWord = True
          Case (Not wrd.Font.Size = docstyles(wrdstyle).Font.Size)
              AnyDiffFontsInWord = True
      End Select
  End Function
  
  Cheers!
  
  WSGary Frieder
  AskWoody Lounger
  
  March 5, 2005 at 4:44 pm #932800
  
  Andrew,
  
  Thanks for posting this as well – this is great stuff, and deserves a star of its own. I may have missed some related threads in the past year or so, but I recall a long one from 2001 or so on this same topic (will post a link later if I can track it down). If I recall right, Klaus Linke suggested a similar approach to optimizing by filtering what gets searched, but it’s safe to say that nothing posted back then approached this for elegance.
  
  Thanks also for demonstrating some unusual ways to use Select Case structures:
  
  Select Case wdUndefined
      Case lDiffBold

  Select Case True
      Case (Not wrd.Font.Bold = docstyles(wrdstyle).Font.Bold)
  
  Who knew?
  
  Gary
  
  WSchrisgreaves
  AskWoody Lounger
  
  March 7, 2005 at 1:27 am #933001
  
  Andrew, thanks for this. I found that it wasn’t isolating individual words within a range, and so I modified it slightly (attached) to collect a “fnt” object from the first character of the paragraph; I also made it a function that accepts a Range as parameter, so I’m not restricted to a document.
  
  PS I should add that I purchased a copy of WordHacks two weeks ago, and love it.
  
  WSAndrew77
  AskWoody Lounger
  
  March 7, 2005 at 1:54 am #933004
  
  Hi Chris,
  
  Glad to hear you like the book; it’s very gratifying to hear that people have found it useful.
  
  I’m a little unclear on what you mean by “wasn’t isolating individual words within a range”; could you describe the problem (or post a sample document)? I wasn’t able to get it to not isolate on each word. Your revised macro and my original produced the same results for me.
  
  WSchrisgreaves
  AskWoody Lounger
  
  March 7, 2005 at 2:03 pm #933089
  
  > “wasn’t isolating individual words within a range”
  
  Andrew, I have attached a Sample.doc containing two paragraphs, each of which contains text formatted in a user-defined character style (MacroCharacters). The VBA module has a copy of your code. If I extend the formatting to include the second part of the word preceding the original formatting, it is well-detected.
  
  Your code is timely as I am currently analysing 6,000+ documents with a client’s request to isolate all non-standard formatting, and had been using an abbreviated “font” object, much as you suggest, for matching:
  
  With .Font
      strresult = strresult & .Bold & strdelim & .Italic & strdelim & _
          .Underline & strdelim & .Size & strdelim & .StrikeThrough & strdelim
      .Bold = wdUndefined
  
  >people have found it useful
  I wouldn’t have described it as “useful” (grin!)
  
  WSAndrew77
  AskWoody Lounger
  
  March 7, 2005 at 2:38 pm #933112
  
  Ah, perhaps I wasn’t clear on the macro’s evaluation criteria. The goal is to isolate direct formatting not associated with a style — a paragraph style or a character style. In the case of your sample document, those words that use the “MacroCharacter” style are perfectly acceptable — the user has correctly applied a character style to differentiate a portion of a paragraph. If, however, you apply additional formatting on top of the MacroCharacter style, like italics, the text will get flagged.
  
  If you want to detect any deviation from the paragraph style (including the use of character styles), you could probably just change:
  
  Dim wrdstyle As String
  wrdstyle = wrd.Style
  
  to
  
  Dim parastyle As String
  parastyle = wrd.Paragraphs.First.Style
  
  I have not tested that, by the way.
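
  Spelled out as a complete function, that suggestion might look something like the sketch below. It is equally untested. The function name AnyDiffFromParaStyle is made up for illustration, and it assumes every later reference to the word's style is switched to the paragraph style as well (not just the two lines shown above).

  ```vba
  ' Hypothetical variant of AnyDiffFontsInWord: compares a word's font with
  ' the font of its PARAGRAPH style rather than the word's own style, so text
  ' formatted with a character style is flagged too. Untested sketch.
  Function AnyDiffFromParaStyle(wrd As Range) As Boolean
      Dim parastyle As String
      Dim styleFont As Word.Font

      ' Name of the style applied to the paragraph containing the word
      parastyle = wrd.Paragraphs.First.Style
      Set styleFont = wrd.Document.Styles(parastyle).Font

      With wrd.Font
          AnyDiffFromParaStyle = (.Bold <> styleFont.Bold) _
              Or (.Italic <> styleFont.Italic) _
              Or (.Name <> styleFont.Name) _
              Or (.Size <> styleFont.Size)
      End With
  End Function
  ```

  To try it, the main macro would call AnyDiffFromParaStyle(wrd) in place of AnyDiffFontsInWord(wrd).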
  
  Hope this makes sense. Cheers!
  
  WSchrisgreaves
  AskWoody Lounger
  
  March 7, 2005 at 4:23 pm #933141
  
  > detect any deviation from the paragraph style
  Right, thanks, and yes, it does make sense.
  I realised this morning that an essential part of any detection like this will be establishing the basic criteria.
  My mind was on “different from the first character of the paragraph”, but it could have easily been “different from the first word of the paragraph” (including FWIW “undefined” as an allowable basis for comparison). Your example was, then “different from the style of the word”. I think I got that right. I’m pretty sure, though, that my problem was caused by my not reading your definition, and trying to make the code do what I wanted without first priming it in the correct manner.
  
- WSfburg
  AskWoody Lounger
  
  March 16, 2005 at 5:38 am #935066
  
  Hi Andrew,
  
  I just read the entire thread to date since I haven’t had a chance to read much of anything on the Lounge lately. This looked interesting.
  
  Let me hypothesize that the first loop is doing something different than the 2nd and 3rd. Practically speaking they result in the same output. I don’t know if this is really true, so this could be all smoke.
  
  The For…Next goes thru the paras in reverse order. Is it possible that Word or VBA has to step thru a link list each time to find the i-th para? Although I know why you go in reverse order, would the 1st approach be a little better than it is now if you went forward? Maybe not much.
  
  The 2nd approach lets Word/VBA do the driving and keeps track of pointers as you go thru the loop of paras. It probably takes advantage of the For…Each construct to step thru the collection of paras using pointer mechanisms. The 3rd approach seems like it could be the same with you doing the work instead of Word. In fact, I wouldn’t be surprised if the actual implementation of the 2nd approach looked like your 3rd approach.
  
  I would also suspect that there could be a difference in how the loop conditions were handled in the first 2 cases. For example, do you know if the looping statement
  
  For k = doc.Paragraphs.count To 1 Step -1
  
  has to retrieve the doc.Paragraphs.count in each iteration? Although this would be bad programming in terms of compiling the source code (or interpreting it), I’ve seen worse. If this makes a diff, then I always stored this kind of var in a local var. That is:
  
  paraCount = doc.Paragraphs.count
  For k = paraCount to 1 Step -1
  
  Also, another key diff between your 1st and 2nd approaches is the need for the Set stmt in the 1st approach. This probably adds overhead in that the code has to set a pointer after retrieving yet other info (doc.Paragraphs(k)). The 2nd approach is letting the loop mechanism take care of this so you’re cutting out figuring out what doc.Paragraphs(k) is.
  
  Even tho the 3rd approach does a Set, you’re setting paraNext, itself a “pointer”, to a “pointer” in the current para, which you already have access to. So, as I mentioned above, the 2nd and 3rd approaches should be the same.
  
  I’d also wonder if the size of the doc may have something to do with the big diff. For example, in getting to a 200+ page doc, I doubt that you entered the paras sequentially. Type a few paras, go back and insert a para before another para, copy and paste a few paras from another document in between 2 existing paras. I’m going to guess not and that Word probably relinks the para collection when you insert. So the link list would look the same when all’s said and done regardless of whether you entered them “right the first time” or went back and forth as mentioned just above.
  
  Or this could be way off base.
  
  Fred
  
- WSAndrew77
  AskWoody Lounger
  
  March 16, 2005 at 3:45 pm #935185
  
  Hi Fred,
  
  > Let me hypothesize that the first loop is doing something different than the 2nd and 3rd. Practically speaking they result in the same output. I don’t know if this is really true, so this could be all smoke.
  
  For … Each loops are an optimized shortcut for iterating a collection of objects, or an array of Variants. It’s functionally (though not performance-wise) equivalent to use either of the following two loops:
  
  For Each para In ActiveDocument.Paragraphs
      ' Do something here
  Next para
  ' ------
  For i = 1 To ActiveDocument.Paragraphs.Count
      Set para = ActiveDocument.Paragraphs(i)
      ' do something here
  Next i
  
  In the case of the For Each loop, the set statement is implicit, and there’s no need for an iterator variable, since the 1 to .count is also implicit. The For Each loop is faster because VBA can, in effect, pre-load the objects your loop needs. The For..Next loop on the other hand, can’t do that, because VBA has no way of knowing whether or how much the value of i might change between iterations.
  
  > The For…Next goes thru the paras in reverse order. Is it possible that Word or VBA has to step thru a link list each time to find the i-th para? Although I know why you go in reverse order, would the 1st approach be a little better than it is now if you went forward? Maybe not much.
  
  The order in which you iterate doesn’t matter for speed, but is important if you want to delete any items. Deleting items while moving forward will result in skipped items, which is the same thing that can happen if you try deleting while using a For..Each loop. Consider this example:
  A document with 4 paragraphs, in this order: Heading 1, Heading 2, Heading 2, Normal.
  
  For k = 1 To ActiveDocument.Paragraphs.Count
      ' The Paragraph object has no Delete method of its own; delete its Range
      If ActiveDocument.Paragraphs(k).Style = "Heading 2" Then ActiveDocument.Paragraphs(k).Range.Delete
  Next k
  
  In this case, the third paragraph won’t get deleted (and you’ll get an error when k gets to 4).
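
  For comparison, here is a minimal sketch of the same deletion done backwards, which avoids both the skipped paragraph and the error. The subroutine name is hypothetical, and the deletion goes through the paragraph's Range.

  ```vba
  ' Sketch: delete "Heading 2" paragraphs while walking the collection
  ' from the last item to the first, so deletions never shift an index
  ' that has not been visited yet.
  Sub DeleteHeading2ParasBackwards()
      Dim k As Long

      With ActiveDocument
          For k = .Paragraphs.Count To 1 Step -1
              If .Paragraphs(k).Style = "Heading 2" Then
                  .Paragraphs(k).Range.Delete
              End If
          Next k
      End With
  End Sub
  ```

  Because the loop counts down, removing item k only renumbers the items after it, all of which have already been checked.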
  
  
  > The 2nd approach lets Word/VBA do the driving and keeps track of pointers as you go thru the loop of paras. It probably takes advantage of the For…Each construct to step thru the collection of paras using pointer mechanisms. The 3rd approach seems like it could be the same with you doing the work instead of Word. In fact, I wouldn’t be surprised if the actual implementation of the 2nd approach looked like your 3rd approach.
  
  I think you’re probably pretty close on that.
  
  
  > I would also suspect that there could be a difference in how the loop conditions were handled in the first 2 cases. For example, do you know if the looping statement
  > For k = doc.Paragraphs.count To 1 Step -1
  > has to retrieve the doc.Paragraphs.count in each iteration?
  
  No, the value is computed once at the start of the loop, not during each iteration.
  
  
  > I’d also wonder if the size of the doc may have something to do with the big diff. For example, in getting to a 200+ page doc, I doubt that you entered the paras sequentially. Type a few paras, go back and insert a para before another para, copy and paste a few paras from another document in between 2 existing paras. I’m going to guess not and that Word probably relinks the para collection when you insert. So the link list would look the same when all’s said and done regardless of whether you entered them “right the first time” or went back and forth as mentioned just above.
  
  I actually did insert them sequentially, using the rand() trick. I wouldn’t think the order in which the paragraphs were entered would have much impact on the efficiency of the iteration, but I could be wrong.
  
  Thanks for the insightful comments!
  
  byteme
  AskWoody Plus
  
  March 17, 2005 at 5:42 am #935324
  
  Based on my experience, it looks to me like if you use For Each…Next to iterate through a document’s paragraphs, deleting some of the paragraphs as you go, no paragraphs get skipped (assuming the code in the loop isn’t using some kind of index reference that hasn’t been adjusted to account for the deletions). To take your 4-paragraph document example, this works (without any skipping):
  
  For Each parX In docX.Paragraphs
      If parX.Style = "Heading 2" Then
          parX.Range.Delete
      End If
  Next parX
  
  I’ve found the same to be true of iterating through a document’s styles, and I expect it’s true of most of the VBA object collections.
  
  WSAndrew77
  AskWoody Lounger
  
  March 17, 2005 at 2:45 pm #935422
  
  Hi Steve,
  
  I’ve definitely run into problems when deleting while using a For Each loop, particularly with Hyperlinks and Fields. For example, take a look at the attached document. It’s got a macro that tries to delete all the hyperlinks in a document with a For..Each loop:
  
  Sub DeleteWithForEach()
      Dim h As Hyperlink
      For Each h In ActiveDocument.Hyperlinks
          h.Delete
      Next h
  End Sub
  
  Just double click the macrobutton field in the first paragraph to run it.
  
  You’ll see it definitely does not delete all the hyperlinks.
  
  Rather than experiment with different collections to see which ones might work with a For Each, when deleting I always use a For Next and work backwards (or use the linked-list method described above).
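
  Here is a short sketch of that backwards For Next approach applied to the hyperlinks case (the subroutine name is just for illustration). Note that Hyperlink.Delete removes the link itself and leaves the display text in place.

  ```vba
  ' Sketch: remove every hyperlink by counting down from the last one,
  ' so the remaining indexes stay valid after each deletion.
  Sub DeleteHyperlinksBackwards()
      Dim i As Long

      With ActiveDocument
          For i = .Hyperlinks.Count To 1 Step -1
              .Hyperlinks(i).Delete
          Next i
      End With
  End Sub
  ```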
  
  byteme
  AskWoody Plus
  
  March 17, 2005 at 7:32 pm #935485
  
  I see what you mean about Hyperlinks, but it looks like that’s a consistent, replicable “design flaw” having to do with the way For Each works with Hyperlinks. What happens in both your sample document and in a separate document I created is that the odd-numbered Hyperlinks (i.e., every other Hyperlink) get deleted, suggesting that Word is effectively using some kind of non-updated index counter to work through the document’s Hyperlinks.
  
  In a sense, the consistency is encouraging (to me, anyway) because it suggests (to me, anyway) that built-in object collections that don’t display the every-other-item behavior will probably consistently not be subject to the every-other-item flaw. Those “non-flawed” collections seem to include Paragraphs, Styles, date Fields (the only kind I’ve tried) and Bookmarks.
  
  I’d be interested to hear if anyone else has encountered “skipped items” behavior using For Each with Paragraphs or Styles where (1) the loop deleted items, and (2) the loop didn’t refer to items using an index that wasn’t adjusted to account for the deletions. Given that For Each..Next is supposedly much more efficient than For..Next when dealing with collections, I’d hope it would turn out the list of “flawed” collections is a narrow list and we can confidently use For Each with the rest.
  