• Importing HTML Into Word

    #502369

    1)

    I need advice about how to make my files available: there are two, of 3 MB and 8 MB.

    2)

    It’s a long story … but I have to produce a hard copy of the content (only) from a very long blog.

    I exported it, and massaged it into an HTML file (details below).

    The HTML file looks fine.

    When I import it into Word, it looks fine at first glance. (I need it in Word for pagination, formatting, and so on).

    But, with a deeper look, it has dropped out at least one whole section of HTML, probably running to dozens of pages in Word. There may be more than one of these events.

    I am figuring that there is something in my HTML code that is confusing Word, but I’m not sure what it can be. I am getting no error messages.

    I’m looking for help troubleshooting this to get the entire HTML file to import into Word.
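
One way to start narrowing down what in the HTML might be confusing Word is to scan the file for unclosed or mismatched tags. The sketch below uses Python's standard `html.parser` module. It assumes, without any confirmation, that unbalanced markup is the culprit; the names `TagBalanceChecker` and `check_html` and the file name `blog.html` are made up for illustration.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag in HTML.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TagBalanceChecker(HTMLParser):
    """Collects a list of tags that look unclosed or unopened."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.stack = []      # (tag, line) for each element currently open
        self.problems = []   # human-readable findings

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append((tag, self.getpos()[0]))

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if tag not in [t for t, _ in self.stack]:
            line = self.getpos()[0]
            self.problems.append(f"line {line}: </{tag}> has no matching open tag")
            return
        # Pop back to the matching open tag, reporting anything left unclosed.
        while self.stack:
            open_tag, line = self.stack.pop()
            if open_tag == tag:
                break
            self.problems.append(f"line {line}: <{open_tag}> was never closed")


def check_html(text):
    """Return a list of findings for the given HTML text (empty if balanced)."""
    checker = TagBalanceChecker()
    checker.feed(text)
    checker.close()
    for open_tag, line in checker.stack:
        checker.problems.append(f"line {line}: <{open_tag}> still open at end of file")
    return checker.problems


# Hypothetical usage on the exported file:
# with open("blog.html", encoding="utf-8") as f:
#     for finding in check_html(f.read()):
#         print(finding)
```

This check is stricter than a browser: tags such as `<p>` and `<li>`, whose closing tags are optional in HTML, will be reported as unclosed. The line numbers near where the missing section begins are the ones worth looking at first.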

    * I’m not sure whether how I got the content out of the blog is relevant. The blog is in Blogger, with several thousand posts and comments over a 10-year period. I used Blogger’s export tool to obtain an XML file of the blog and brought that into Excel. There I isolated the posts and comments from most of the formatting and boilerplate. The XML exports all posts chronologically, followed by all comments chronologically, so I used Excel to match up the posts with their comments and sort the whole thing: the most recent post, followed by all its comments, followed by the second most recent post and all its comments, and so on. Then I just copied the raw HTML code into a bare-bones HTML file. That file looks plain but fine in a browser, and appears to be complete. But when I open that file in Word (which automatically detects it as an HTML file), at least one whole chunk goes missing.
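
The matching-and-sorting step described above can be sketched in a few lines of Python. This is only an illustration of the ordering, not what was actually done in Excel. The record layout (`id`, `post_id`, `published`, `html`) and the function name `interleave` are hypothetical, the timestamps are assumed to be strings in one consistent format that sorts chronologically, and listing each post's comments oldest first is also an assumption.

```python
from collections import defaultdict


def interleave(posts, comments):
    """Return HTML fragments: newest post first, each followed by its comments.

    posts:    list of dicts with keys "id", "published", "html"
    comments: list of dicts with keys "post_id", "published", "html"
    (Both layouts are hypothetical, for illustration only.)
    """
    # Group comments under the post they belong to.
    by_post = defaultdict(list)
    for comment in comments:
        by_post[comment["post_id"]].append(comment)

    fragments = []
    # Most recent post first.
    for post in sorted(posts, key=lambda p: p["published"], reverse=True):
        fragments.append(post["html"])
        # Assumption: comments under a post run oldest to newest.
        for comment in sorted(by_post[post["id"]], key=lambda c: c["published"]):
            fragments.append(comment["html"])
    return fragments
```

Joining the returned fragments with newlines and wrapping them in a minimal `<html><body>` shell would give a bare-bones file of the kind described.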

    • #1529647

      Does it have to be in Word? If not, I would recommend you try Adobe Acrobat instead. It has a feature that produces a PDF from HTML and can include the linked pages to multiple levels. If you provide the blog address, someone here could possibly do this for you if you don’t own Acrobat.

      It might be possible to do it the way you are exploring, but it is a whole lot of work and, as you have found, could fail to give you what you want.

    • #1529732

      There is a level of ridiculousness associated with this job that is coming from the people I report to.

      I could probably do it in Acrobat, but I think they’d just come back and say “Fine … now could you put it into Word”. So I’d only be interested in this option if it made the import into Word work better.

      As to the linked content, I don’t really need that. Just the links will be fine.

      My deliverable needs to be just the posts and comments, suitable for inclusion in a master document. The actual file, with links working as well as possible, will probably never get looked at. They may just take the hard copy and slip it into binders to stick on a shelf.

      Like I said, it’s kind of ridiculous for a blog with thousands of posts and comments, but there it is.

      I am only worried about solving this particular problem because the stuff that gets dropped by Word is actually rather close to the front, and someone paging through from the front might see it.
