• Document Corruption

    #1767796

    I have a user copying and pasting info from several documents for a monthly report. The people who send material to my user build it from several other documents themselves. All parties involved are using Word 97, at least SR1. The problem is that some of these supporting documents corrupt the master document. There is no apparent problem with the document; the text and formatting are all intact, but you get a “This document may be corrupt” error. I have tracked the corruption down to certain users, but there is no apparent reason why this is occurring. Does anyone have any idea, or has anyone seen this before?

    • #1776200

      My theory (it is only a theory) is that the defined (not actual) numbered lists in the document have grown to a huge number, and there is no way in VBA to clear this metadata. The count appears to be incremented every time a user visits the list gallery, and copying and pasting large blocks of text adds the two files' lists together. Over the life of the text the number grows larger than you could imagine. I have seen around 1,960 in documents that give that error message. You will discover that you can delete the entire content of the file and still have a huge number of lists in the metadata.

      The way I get rid of them is to save the document down into Word 95 format, then open it in Word 95 and save it there. Now you can bring it back without the extra metadata. Note: you must open the file with Word 95, as that is the crucial step.

      You can find out how many defined numbered lists are in the file by running the following code:

      Sub ListCount()
        ' Report how many list templates are defined in the attached
        ' template and in the active document itself.
        Dim sList As String
        sList = "Template Lists = " & ActiveDocument.AttachedTemplate.ListTemplates.Count & vbCr
        sList = sList & "Document Lists = " & ActiveDocument.ListTemplates.Count
        MsgBox sList
      End Sub

      If you can’t lay your hands on a copy of Word95 let me know and I will describe a method for cleaning within Word97/2000.

      • #1776285

        Can you post your method for cleaning with Word 97/2000 anyway? I would be curious to see if it is what we are already doing.

        • #1776322

          It's a method, not a macro – although it could be made into one. It works because of an oddity I have not yet discovered the key to. If I copy a large chunk of the problem file into a new file, the metadata lists come along too. If I copy the document across in small enough chunks, the metadata lists get left behind. The problem is that “small enough” defies definition: a chunk can be as large as 10+ pages or as small as 2-3 pages. The only way to know is to paste a chunk and do a list count by running the earlier macro.

          In Word97/2000 to remove the spurious lists I have:
          1. Open the problem file (call it F1) and a new blank file based on the same template (call it F2)
          2. Check the list count in F2 and make sure it is low or zero. If not you have to start with a new blank file and attach the template and update the styles.
          3. Copy pages 1-5 from F1 into F2
          4. Run the macro and check the list count hasn’t shot up.
          5A. If the list count is low – Save F2 and repeat step 3 on the next chunk.
          5B. If the list count is high – Close F2 without saving, reopen F2 and try step 3 with a smaller chunk.

          When you get impatient and start taking bigger and bigger chunks you will fall into step 5B. I couldn’t be bothered making a macro to do this, although it probably wouldn’t be too hard. I run both Word95 and Word97 for problems exactly like this and that is easier for me.
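
          For anyone who does want to automate it, the steps above might be sketched as a macro along these lines. This is an untested sketch, not something posted in this thread: the file names F1.doc and F2.doc, the 50-paragraph starting chunk and the threshold of 50 lists are all assumptions, and it works in paragraphs rather than pages. It assumes both files are already open and that F2 has been saved at least once.

          ```vba
          Sub CopyInChunks()
              ' Assumptions (hypothetical values, adjust to suit):
              Const SRC_NAME As String = "F1.doc"   ' problem file, already open
              Const DST_NAME As String = "F2.doc"   ' clean file, open and saved once
              Const MAX_LISTS As Long = 50          ' what counts as a "low" list count
              Dim src As Document, dst As Document
              Dim dstPath As String
              Dim firstPara As Long, lastPara As Long, chunk As Long
              Dim rng As Range, target As Range

              Set src = Documents(SRC_NAME)
              Set dst = Documents(DST_NAME)
              dstPath = dst.FullName
              chunk = 50
              firstPara = 1

              Do While firstPara <= src.Paragraphs.Count
                  lastPara = firstPara + chunk - 1
                  If lastPara > src.Paragraphs.Count Then lastPara = src.Paragraphs.Count

                  ' Step 3: copy the next chunk to the end of F2
                  Set rng = src.Range(src.Paragraphs(firstPara).Range.Start, _
                                      src.Paragraphs(lastPara).Range.End)
                  rng.Copy
                  Set target = dst.Range(dst.Content.End - 1, dst.Content.End - 1)
                  target.Paste

                  ' Step 4: check the list count
                  If dst.ListTemplates.Count <= MAX_LISTS Then
                      ' Step 5A: keep the chunk and move on
                      dst.Save
                      firstPara = lastPara + 1
                  Else
                      ' Step 5B: discard the paste and retry with a smaller chunk
                      dst.Close SaveChanges:=wdDoNotSaveChanges
                      Set dst = Documents.Open(FileName:=dstPath)
                      If chunk = 1 Then
                          MsgBox "Paragraph " & firstPara & " brings the lists with it; stopping."
                          Exit Do
                      End If
                      chunk = chunk \ 2
                  End If
              Loop
          End Sub
          ```

          Halving the chunk on each failure is just one way to pick "a smaller chunk"; the macro stops when even a single paragraph pushes the count over the threshold.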

          • #1776357

            Hi Andrew:
            While I don’t know the key either, I have noticed that Word responds differently to large chunks of text as opposed to small chunks. For example, if you’re trying to delete unused standard styles from a document, you can do so by copying small chunks of text to a new document. Copying large chunks copies the unused styles also. See KB article http://support.microsoft.com/support/kb/ar…s/Q193/5/36.ASP

            • #1776431

              Thanks Phil

              Interestinger and interestinger. Microsoft come through with a biological answer to a digital question (APPROXIMATELY 50 paragraphs). They hire anyone these days don’t they.

              A similar methodology to Method 2 could be applied to remove the extra lists too I suspect. Definitely worth further investigation.

            • #1776462

              One way to help avoid trouble is to skip the last paragraph when copying the whole content from a dubious document to a fresh one. Clearing or reducing the metadata also seems a good idea to me, although I did not see an MSKB article relating it to file corruption.

              WD97: How to Minimize Metadata in Microsoft Word Documents

              http://support.microsoft.com/support/kb/ar…s/Q223/7/90.ASP

              WD2000: How to Minimize Metadata in Microsoft Word Documents
              http://support.microsoft.com/support/kb/ar…s/Q237/3/61.ASP

            • #1776526

              Hi

              I’ve noticed this thread a bit late in the piece, so my apologies if this has been covered before.

              I’ve noticed similar things when copying-and-pasting into a new document:

            • #1784803

              Yes, all the above are problems with WD97 documents and corruption may be avoided by following the above solutions.

              At my firm (legal industry) the solution is to use styles for EVERY paragraph, then when copying and pasting to a new document, ALWAYS paste special unformatted from the Edit menu and then apply the necessary styles to the pasted data. This does (intentionally) strip the formatting, but after losing many many many documents to corruption using regular copy and paste, use of this method has never corrupted a file!

              If numbering is properly linked to Heading Styles and use of Normal style is avoided, then reformatting the pasted information is quick and easy.
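
              The menu steps (Edit, Paste Special, Unformatted Text) can be put on a toolbar button or shortcut key with a one-line macro. A minimal sketch; the macro name is just an example:

              ```vba
              Sub PasteUnformatted()
                  ' Paste the clipboard contents at the insertion point as plain text,
                  ' leaving the source document's styles and list definitions behind.
                  Selection.PasteSpecial DataType:=wdPasteText
              End Sub
              ```

              After running it, apply the destination document's styles to the pasted text as usual.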

              V.

            • #1784843

              Hi V

              Why do you avoid the Normal style?  Does it cause corruption, or is there some other reason?

              Thanks
              Dale

            • #1784869

              Normal style is the “foundation” of the structure. Always avoid leaving a paragraph in normal style. Look at the document, decide how many styles are needed, create them, and apply as you go. When a document is properly formatted the “foundation”, in this case Normal, is not visible.

              See this Microsoft article for correct style usage: http://officeupdate.microsoft.com/legal/Styles.asp

            • #1784897

              Thanks V

              I understand that the Normal style is the foundation, in that all other styles are (directly or indirectly) defined in relation to it.

            • #1784898

              The reason for not using Normal style is that it is never necessary: text should always be in a Body style, formatted heading paragraphs should always be in a Heading style, etc. The main reason for this is that one should never use direct formatting; always put your formatting in a style, and not in Normal unless you want to change every paragraph. This does not add another level of complexity to formatting; simply regard Normal as the base. Then, if you want to make a quick change to EVERY paragraph in your document, you would modify the Normal style. Normal style does not cause corruption.

              Please read the Seven Laws of Styles at http://www.woodyswatch.com/office/archtemplate.asp?v4-n20.

              V.

            • #1784915

              You can show me other references all day but I still don’t agree that you can’t use Normal (yes I know it’s heresy to disagree with any of Bob’s Seven Laws). Using ‘Normal’ is the logical thing to do and I agree with Dale that using BodyText instead just adds another level of complexity.

              Yes, modifying Normal will impact any styles based on it, but that’s the beauty of it. Have you ever realised that any styles based on Body Text will also be modified if you modify the Body Text style? It takes time to create a template full of styles and with sensible effectivity in the style families but doing so should be a once-off exercise, and then you can use ‘Normal’ to your heart’s content.

              Pasting as Text only is too destructive on large files as you lose all Tables and Graphics. If your document doesn’t have any then it might be OK, but it’s still a lot of work to rebuild/reformat and should be a last resort rather than the first thing to try.

              Sorry to disagree with yourself and Bob but I have created plenty of files using Normal and I have no problems. It can be done and I believe it is more logical than using ‘Body Text’.

            • #1784917

              Yes, Mr. Lockton, you are free to disagree, but I am responsible for the integrity of over a million heavily formatted Word 97 and 2000 documents containing many levels of numbering and tables of contents, complicated financial tables, automatic cross-referencing and tables of authorities which are shared by hundreds of users and we don’t have corruption problems, which is what started this whole e-mail thread. My users and I don’t have time to dig in VBA and find “lists of metadata”, we need a solution to the problem, which we have.

              If you understand how Word is designed and use Normal style with direct formatting for everything, then you are simply being stubborn. I myself was dragged kicking and screaming away from WordPerfect and would love to tell you that you can do whatever you like without any trouble.

              Of course if you use Body, and then modify it, every occurrence of Body changes, that’s the whole point. If I want to replace a few shingles on the roof of my house, I don’t start by going down to the basement and chipping away at the foundation do I? That’s what you are doing if you use Normal style everywhere. Styles are layered. I teach my users to change only the layer(s) that need(s) to be changed. Look at your document before you start, plan how many styles you need, use the provided styles or create your own either on-the-fly or in a template, then format quickly, easily, reliably, and durably.

              Go ahead and Paste styled text all you like, but don’t come crying to me when your documents are corrupted. Mine won’t be because my users and I are going to Paste Special Unformatted and quickly and simply apply the necessary styles in the destination document. You shouldn’t usually need to copy large documents and paste them into other large documents unless you are quite disorganized. Tables and graphics are easy and safe to copy and paste individually. Yes, it is work to reformat but that’s why they call it work and I want to do it efficiently. Unless and until Microsoft redesigns Word from the ground up, “Pasting” styled, formatted text will cause you corruption grief.

              I suggest you go back and read the Microsoft document from my previous post and follow that with Leonhard’s “Word97 Annoyances” before you start disparaging references to work done by authors who have fully investigated the problems about which they speak.

              Van Swearingen
              Applications Development and Training Support Manager
              Curtis, Mallet-Prevost, Colt & Mosle LLP
              New York, NY

            • #1784932

              Now that’s the spirit Van. I stand corrected, my life’s work in tatters and my dignity crushed.

              I would like to challenge a couple of your statements:
              – I don’t think I espoused that direct formatting had to be used for everything. All I said was that you didn’t need another style called Body Text to do the same thing as Normal. These are two quite separate issues.
              – If you think your hundreds of users only ever paste special as text only (without disabling their Word environment) then you have great faith in people’s lack of ability to find shortcuts.
              – Do you really think that pasting tables is not pasting ‘styled, formatted text’?
              – I happen to have already read Woody’s book and loved it. I also posted him a note on the couple of errors I did find there and got a return note to say he would correct them in the next book. Not that it has happened yet but maybe we’ll see one for XP soon.
              – You may also have noticed I haven’t come crying to you on fixing a corrupted document. I realise now that would be pointless as you have never seen one.

              Did I mention that I can quite happily use format painter to apply styles, as it can be a whole lot quicker than finding the style in the abominable style drop-down? Whoops, that’s another law broken.

            • #1784939

              Whatever works for you, but I’m glad you’re not working on my network.

            • #1784919

              Although at our firm we have it closer to what Van describes (we use “Body” as an effective base, which sits on top of Normal), if you’re talking about logically designed templates that make intelligent use of based-on styles, then I’m not convinced that one side or the other of this argument is necessarily better.

              But in most cases, the issue we see with use of Normal style in documents, is that it is the only style used in the document! – which is direct formatting hell, and I think all parties here agree that that’s bad news.

              I have to come down more on Van’s side on the issue of pasting formatted text: I’m in a continuous spar with our document center, trying to get them to take the time to paste unformatted text and restyle – but neither they nor the attorney who’s in a hurry want to take the time.

              Here’s the result of this practice – actual case in point, occurred just a couple of hours ago:

              Attorney contacts me; is in a panic, has a 200 page financial document, in which a substantial tract of autonumbered text has lost its numbering, and this needs to go out as final tonight.

              I say I’d be happy to take a quick look (since I built the template, I know how to fix these).
              It turns out that this tract of autonumbered text came from another firm, and was pasted into our document. There are now dozens of extra styles in the document, that are not from our template. Worse, the document now has 1368 list templates (our template only contributed 19 of these), and worse yet, the autonumbered sections are chockful of hidden, empty autonumbered paragraphs, with ‘hard-coded’ number restarts on the paragraphs that are visible. The numbering styles are not the ones from our template, but rather the ones from the other firm. When I start to make changes to this section, half of the document text becomes bold and centered!

              I close the document without saving.
              Tell the attorney he’d better tell the clients they won’t have the document until the morning.
              I also tell the attorney this document is damaged and is likely to cause further problems (thank god I no longer run the document center, so I’m not directly responsible for these things!).
              And I hand it over to an ace operator in the document center, with advice to cut the offending portions of the document, paste them as unformatted text into a fresh document, reapply styles, and then paste back into the main document (which will still be a messed up document).

              All of the above, because the attorney and the document center were not willing to take the time to paste the text in as unformatted, and reformat it, in the first place.

              True story; just happened.
              Word simply isn’t robust enough to handle this mixing and matching.

              Gary

            • #1785031

              What happens if, instead of running through all these checks, you save the incoming document as RTF format, close it, re-open it and paste from there?
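
              For anyone who wants to try it, the round trip could be scripted roughly like this. It is only a sketch: it assumes the incoming document is the active one and has already been saved to disk, and the ".rtf" naming is an arbitrary choice.

              ```vba
              Sub RoundTripThroughRtf()
                  Dim doc As Document
                  Dim rtfPath As String

                  Set doc = ActiveDocument
                  ' Hypothetical naming: same folder, original name plus ".rtf"
                  rtfPath = doc.Path & Application.PathSeparator & doc.Name & ".rtf"

                  ' Save as RTF, close, and reopen the RTF copy
                  doc.SaveAs FileName:=rtfPath, FileFormat:=wdFormatRTF
                  doc.Close SaveChanges:=wdDoNotSaveChanges
                  Set doc = Documents.Open(FileName:=rtfPath)

                  ' The reopened copy is now the one to copy and paste from
                  doc.Activate
              End Sub
              ```

              Running the list count macro from earlier in the thread before and after would show whether the round trip actually sheds anything.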

            • #1785038

              Hi Chris

              I understand it can help, but I’ve never had much success with the technique myself.

              However, this could be because I use specialist templates with many macros and many styles (including bullets and numbering styles), which all seem to add to the inherent instability.  (You’ve sure got a lot to answer for, Mr Gates.)

              Dale

            • #1785030

              > copying small chunks of text to a new document

              Really works?

              I’m tempted to think of this as an urban myth, although I have no experience whatever in doing this trick.

              As I was reading through, I thought of “unused styles”, which is probably “references to unused styles”, and metadata and all the other sorts of junk that Word leaves lying around.

              I figure that the junk is attached to paragraph marks, and that copying a paragraph mark with attached junk brings along the attached junk, whereas avoiding that paragraph mark results in no junk.

              The Urban Myth would arise if someone had copied an entire document (Ctrl-A; Ctrl-C), seen the junk prior to deleting unwanted text, and then tried again by copying only the desired text and NOT bringing across the junky paragraphs.

      • #1776402

        Aside from the unavoidable loss of formatting (tables mostly) this has been an excellent solution! Thanks for your help!!!!

      • #1784765

        Hi Andrew,

        I have a query about your macro. I just ran this on a one page document that doesn’t contain any numbering.

        There are heading styles that have had numbering applied and three user-defined numbering styles but none of these styles have been used in the document.

        The result of your macro says:
        Template Lists=0
        Document Lists=5

        When I look at the original template it shows:
        Template Lists=5
        Document Lists=5

        I am not sure what the result is actually showing me. This file does cause problems when printing.

        I would really appreciate some clarification.

        Thank you.

        • #1784776

          I’m with Gary, I don’t think it is the lists. I only see the corruption message when the lists are up around 1800. I would have expected the macro to go toes up inside a template as there is no template attached to a template so it is interesting that it actually does something there without crashing.

          I would try disabling your gee-whiz print macro before casting nasturtiums at the metadata lists. Can you slow the steps in the macro to allow time for repagination to take place before the print happens?

          • #1784780

            <>

            – now there’s a turn of phrase you’re not likely to encounter anywhere but here! grin

    • #1776274

      I have to agree with Andrew. The only other time I have seen that error message was when I was copying multiple documents generated by Engineering into one composite document. The problem arose when I started copying numbered lists. The only workaround I found was down-saving like Andrew suggested. The drawback was that I lost a lot of formatting that had to be redone. YUCK!!! After enough nights recovering the document, I figured out that by switching the numbered lists to Normal before copying them to the composite document I avoided the error.

      Since I had a lot of copying to do, I created a little macro that changed the style to normal of the selected text (the numbered list), added a unique string at the end of each paragraph that told me if it was a roman or alpha list and what level. Then when I copied it, I ran another macro that set the correct attributes based on the string. The string was then deleted.
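
      The macros themselves are not posted here. As a rough idea of what the first half (tag and flatten) could look like, here is a sketch under my own assumptions: the tag format "[[number style code,level]]" is invented for illustration, and the numbering is removed explicitly as well as switching the style to Normal.

      ```vba
      Sub TagAndFlattenLists()
          ' Hypothetical tag format: [[<number style code>,<list level>]]
          Dim para As Paragraph
          Dim lf As ListFormat
          Dim r As Range
          Dim tag As String

          For Each para In Selection.Paragraphs
              Set lf = para.Range.ListFormat
              If lf.ListType <> wdListNoNumbering And Not lf.ListTemplate Is Nothing Then
                  ' Record the numbering style (roman, alpha, ...) and the level
                  tag = "[[" & lf.ListTemplate.ListLevels(lf.ListLevelNumber).NumberStyle & _
                        "," & lf.ListLevelNumber & "]]"

                  ' Put the tag just before the paragraph mark
                  Set r = para.Range
                  r.MoveEnd Unit:=wdCharacter, Count:=-1
                  r.InsertAfter tag

                  ' Drop the numbering and fall back to Normal
                  para.Range.ListFormat.RemoveNumbers
                  para.Style = wdStyleNormal
              End If
          Next para
      End Sub
      ```

      A second macro in the composite document would then search for the tags, reapply that document's own numbering based on the recorded style and level, and delete the tag text.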

      The end result was that only the numbered lists in the composite document were ever used.

      Hope this helps
      Jay

    • #1776611

      We’ve found through very painful experiences that one thing that causes a lot of files to become (or appear to be) corrupt is the existence of lots of temp files. This too only happens with certain users (and certain users’ PCs). Just recently I found one user who had almost 2,000 tmp files in their windows directory and all documents were corrupting. Once those were deleted, everything was okay. I have put a line in the autoexec.bat that reads “del c:\windows\temp\*.tmp”
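
      To check whether a given PC has this problem before deleting anything, a quick count from inside Word could look like the sketch below. The folder is an assumption taken from the autoexec.bat line above; change it if the temp directory lives elsewhere.

      ```vba
      Sub CountTempFiles()
          ' Assumed location of the temp files (adjust as needed)
          Const TEMP_DIR As String = "c:\windows\temp\"
          Dim fileName As String
          Dim n As Long

          ' Dir returns the first match, then the next one on each bare call
          fileName = Dir(TEMP_DIR & "*.tmp")
          Do While fileName <> ""
              n = n + 1
              fileName = Dir
          Loop

          MsgBox n & " .tmp files found in " & TEMP_DIR
      End Sub
      ```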

      Just my two cents… HTH

      • #1776623

        This was one of the first things I looked at, thanks. Andrew’s method of saving back and then saving it forward did the trick {)

    • #1777109

      “There are two kinds of Master Documents. Those that are corrupt and those that soon will be Corrupt.”
      http://www.mvps.org/word/FAQs/General/WhyM…DocsCorrupt.htm
      http://www.mvps.org/word/FAQs/General/RecoverMasterDocs.htm

      You may also want to look at the following for links to work-arounds:
      http://www.addbalance.com/word/masterdocuments.htm

      Hope this helps.

      • #1777110

        Oops! I left out the following link which may address your document corruption attempts if they are caused by something other than use of Master Documents:
        http://www.addbalance.com/usersguide/document corruption.htm

        • #1777145

          Charles,

          It’s a good site. Thanks for the work.

          On this link, you give a tip “Convert the File to Another Format, then Convert it Back to Native Format”.

          You mentioned RTF- I just wanted to add to that.

          I recently had a corrupt document in Excel- so in Excel 2000 (this does not work for 97) I saved it as html, then saved it back as XLS. All my formatting and macros were preserved, and the corruption was gone.

          • #1777150

            Thank you for your kind comments. I’m adding your information to that page. However, while I take pride in the site and the work that I’ve put into it, the basic Legal Users’ Guide is a production of Microsoft and its Legal Advisory Board. I’ve tweaked it and added a couple of chapters, but the bulk of the work was done by others.

      • #1777134

        I found this very useful.. thanks for the help!

    • #1784766

      Hi Karen,

      Not sure why you get the different numbers for the two tests, but it is clear that whatever problem you’re seeing with your document, it’s not caused by having too many list templates – that number would probably have to get past 200 before you’d be likely to see problems.

      So there’s probably some other cause for the problems in your document.

      Gary

      • #1784773

        Thanks Gary. I was just clutching at straws there! I think the problem is really with a print macro that inserts an autotext entry that is a text box at the top of the document, prints the doc and then removes the text box. Sometimes causes Word to crash or lock up with no error message.

    • #1784797

      I picked up this thread from Woody’s Office watch. I think the problem could be solved if the people who are doing the pasting used “paste special” instead of paste. This brings over the contents only with no metadata. You need to use the “unformatted text” option. I know, it will also lose formatting, but you lose that anyway. Not a terrific solution, but maybe a beginning for others. Karla
