• Understanding Office document formats


    • This topic has 10 replies, 6 voices, and was last updated 7 months ago.
    #2665257

    OFFICE
    By Mary Branscombe

    Inside every Office file is a hierarchy of formats and XML markup. If you understand these structures, you can use that know…
    [See the full post at: Understanding Office document formats]

    • #2665431

      Interesting article. No mention of OneNote. Is OneNote not part of the “most Office apps”?

      • #2665485

        No. OneNote doesn’t use the same XML file structure. Although .ONE files are containers for both the content and the structure of a OneNote notebook, you can’t change the file extension to .ZIP and crack them open in the same way.
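        The difference is easy to test for. Word, Excel, and PowerPoint files are ZIP archives laid out according to the Open Packaging Conventions (OPC), and every OPC package carries a [Content_Types].xml part at its root. The sketch below is a rough check under that assumption: `is_opc_package` is a hypothetical helper name, and passing the check only means the file is shaped like an OPC package, not that it is a valid Office document. A .ONE file should fail it.

```python
import zipfile


def is_opc_package(source) -> bool:
    """Rough check for an Open Packaging Conventions (OPC) container.

    `source` may be a file path or a binary file-like object. We only check
    that it is a ZIP archive and that it has the [Content_Types].xml part
    every OPC package keeps at its root.
    """
    if not zipfile.is_zipfile(source):
        return False
    # ZipFile does not close a file object that was passed in to it.
    with zipfile.ZipFile(source) as package:
        return "[Content_Types].xml" in package.namelist()
```

        Calling `is_opc_package("report.docx")` on a normal Word file should return True, while a OneNote .ONE file or a legacy binary .doc should return False.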

    • #2665435

      I knew how to get to the zip format for Excel, but never looked for the other formats. I must say you did a great job explaining the file storage structure. Thanks!!!

    • #2665437

      Mary,

      Great article! This is very useful information.

      RG

      May the Forces of good computing be with you!

      RG

      PowerShell & VBA Rule!
    • #2665571

      Mary, thank you for all the handy information. This leads me to a question:

      I’m an Office 365 subscriber and occasionally password-protect Word .docx files, which also encrypts them with SHA-256 military encryption. It certainly passes the hex editor test, but given how complex MS Word has become, I sometimes wonder whether this could create vulnerabilities in the SHA-256 structure.

      What is your take on this?

      Desktop mobo Asus TUF X299 Mark 1, CPU: Intel Core i7-7820X Skylake-X 8-Core 3.6 GHz, RAM: 32GB, GPU: Nvidia GTX 1050 Ti 4GB. Display: Four 27" 1080p screens 2 over 2 quad.
      • #2672143

        TechTango, the complexity of what you’re encrypting doesn’t affect the protection you get from an encryption scheme. Although SHA-256 collisions are possible in principle, the numbers involved are astronomically large, and finding one would take an infeasible amount of computation with current hardware, assuming the cryptography has been implemented well (and as Mike points out, Microsoft’s implementation has been audited and widely tested). There are tools that can crack password hashes, but they work best when people have used weak passwords, so you can protect yourself by picking a strong one. In fact, Microsoft had to build a tool called DocRecrypt that lets IT admins switch Office document encryption to certificate-based management, so the IT team can unlock documents when people forget their password; that alone suggests the password protection is pretty secure. The main threat is people guessing your password, so again, make sure it’s a good one!

    • #2666589

      IIRC, Microsoft began using OPC (Open Packaging Conventions) containers to store Office files circa 2006/2007? I think it debuted publicly in Office 2007.


      @TechTango
      AES-256 (not SHA-256) would be used for the encryption. I have never personally validated its implementation as used for Office files, but I recall writing decryptors for such files in the past, and the data decrypted successfully.
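      To make the AES-versus-SHA distinction concrete: in Office’s “agile” encryption, as we read Microsoft’s [MS-OFFCRYPTO] specification, a SHA-family hash stretches the password into a key, and AES then does the actual encrypting. The sketch below shows only the stretching step. The function name and parameters are our own, the hash and spin count are assumptions (SHA-512 with 100,000 iterations is commonly reported for recent Office versions), the block key is passed in rather than hard-coded, and the AES step is left out because it needs a third-party crypto library.

```python
import hashlib


def derive_agile_key(password: str, salt: bytes, spin_count: int,
                     block_key: bytes, key_bytes: int = 32,
                     hash_name: str = "sha512") -> bytes:
    """Sketch of password-to-key stretching in the style of Office agile encryption.

    Hypothetical helper based on our reading of MS-OFFCRYPTO; not a
    drop-in decryptor.
    """
    # H0 = H(salt + password), with the password encoded as UTF-16LE.
    h = hashlib.new(hash_name, salt + password.encode("utf-16-le")).digest()
    # Hn = H(iterator + H(n-1)), iterator as a 32-bit little-endian integer.
    # Each guess costs an attacker spin_count hash operations.
    for i in range(spin_count):
        h = hashlib.new(hash_name, i.to_bytes(4, "little") + h).digest()
    # Hfinal = H(Hn + blockKey); different block keys yield different keys.
    h = hashlib.new(hash_name, h + block_key).digest()
    # Truncate to the AES key length, or pad with 0x36 bytes if the hash is shorter.
    if len(h) >= key_bytes:
        return h[:key_bytes]
    return h + b"\x36" * (key_bytes - len(h))
```

      The long iteration loop is the point: it makes every password guess expensive. Even so, it can’t rescue a password that appears near the top of a cracking dictionary.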

      Encryption done correctly (initialization vectors, random padding, feedback, etc.) theoretically mitigates many of the publicly known vulnerabilities. It’s highly likely many have analyzed MSFT’s implementation to confirm it was done correctly. That doesn’t stop MSFT from screwing the code up again later. A larger threat is the push to move to ECC, FIDO/2 and other “just trust us, we deleted the seed values so it’s secure” crypto systems.

      My advice: for anything that must be kept absolutely secure, don’t store it on a computer. If you must store it on a computer, don’t interface with it using Windows.

      Good luck!

      • #2672148

        Yes, Office Open XML has been around for a while. Office 2000 and 2003 let you create documents programmatically using .NET and XML, but the new file format came in with Office 2007 and was then standardised through ECMA. But it turned out we’d never written up how you could use it for more than just saving files!
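        As one small example of using the format for more than saving files, the text of a Word document can be read straight out of the package without Word. The sketch assumes the main document part is at word/document.xml (where Word puts it, although strictly the package relationships say where it lives) and uses the WordprocessingML namespace. `docx_paragraphs` is a hypothetical helper. It ignores tabs, line breaks, and table structure, and simply joins the `<w:t>` text runs inside each `<w:p>` paragraph.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the elements in word/document.xml.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs(source) -> list[str]:
    """Return the plain text of each <w:p> paragraph in a .docx body."""
    with zipfile.ZipFile(source) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(W_NS + "p"):
        # Word splits a paragraph's text across runs; join every <w:t> in it.
        paragraphs.append("".join(t.text or "" for t in p.iter(W_NS + "t")))
    return paragraphs
```

        Running `docx_paragraphs("report.docx")` on an ordinary document should return one string per paragraph, with empty strings for blank paragraphs.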

    • #2672273

      The main threat is people guessing your password, so again, make sure it’s a good one!

      Thanks for your detailed response. VERY helpful, and yes, my password is a super-solid assortment of numbers, special characters, and upper- and lowercase letters: 20 of them, which means brute force would take over 19qn years.

      [Image: password-cracking chart, 2024]
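      The arithmetic behind charts like this comes down to the size of the search space. The sketch below assumes the password is chosen uniformly at random from an alphabet (95 printable ASCII characters covers numbers, symbols, and both cases). `guesses_per_second` is a hypothetical attacker speed, not a figure from the chart. The function names are our own.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def keyspace(alphabet_size: int, length: int) -> int:
    """Number of possible passwords of exactly `length` characters."""
    return alphabet_size ** length


def entropy_bits(alphabet_size: int, length: int) -> float:
    """Entropy of a uniformly random password: length * log2(alphabet size)."""
    return length * math.log2(alphabet_size)


def worst_case_years(alphabet_size: int, length: int,
                     guesses_per_second: float) -> float:
    """Years needed to try every candidate at the assumed guess rate."""
    return keyspace(alphabet_size, length) / guesses_per_second / SECONDS_PER_YEAR
```

      For a random 20-character password over 95 characters, `entropy_bits(95, 20)` comes to about 131 bits. The years figure depends entirely on the guess rate you plug in, which is why published charts differ, and on average an attacker would succeed in half the worst-case time.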

      • #2711183

        Returning to this to note that NIST guidance on secure passwords is changing, moving away from hard-to-remember mixes of special characters and cases toward longer strings made up of a multiword phrase. PickAMemorablePhrase (or the classic CorrectHorseBatteryStaple) should be easier to remember but just as hard to crack. Making security simple enough for people to use properly improves security.
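        The same arithmetic applies to passphrases, with words in place of characters. The sketch assumes each word is picked uniformly at random from a fixed word list. `passphrase_bits` and `words_needed` are hypothetical helper names.

```python
import math


def passphrase_bits(word_list_size: int, words: int) -> float:
    """Entropy of a passphrase of randomly chosen words."""
    # Each random word adds log2(word_list_size) bits.
    return words * math.log2(word_list_size)


def words_needed(word_list_size: int, target_bits: float) -> int:
    """Smallest number of random words whose entropy reaches target_bits."""
    return math.ceil(target_bits / math.log2(word_list_size))
```

        For example, six words drawn at random from a 7,776-word list (the size of the common Diceware list) give about 77.5 bits. That figure holds only if the words really are picked at random. A phrase that a person makes up is far more predictable.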
