Understanding Office document formats
OFFICE
By Mary Branscombe
Inside every Office file is a hierarchy of formats and XML markup.
If you understand these structures, you can use that knowledge to extract information directly from most Office app files.
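As a rough illustration of what that extraction can look like, here is a minimal Python sketch. It assumes a modern Word file (.docx), which is a ZIP package whose main text lives in a part named word/document.xml. The function names extract_docx_text and make_sample_docx are hypothetical, chosen only for this example, and the sample package is a bare stand-in rather than a complete document that Word would open.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def extract_docx_text(source):
    """Return the text of each paragraph in a .docx (path or file-like object)."""
    # A .docx file is a ZIP archive; the body text sits in word/document.xml.
    with zipfile.ZipFile(source) as package:
        xml_bytes = package.read("word/document.xml")
    root = ET.fromstring(xml_bytes)
    paragraphs = []
    # Each <w:p> is a paragraph; its visible text is split across <w:t> elements.
    for para in root.iter(f"{{{W_NS}}}p"):
        runs = [t.text or "" for t in para.iter(f"{{{W_NS}}}t")]
        paragraphs.append("".join(runs))
    return paragraphs


def make_sample_docx():
    """Build a tiny stand-in package in memory (hypothetical test data)."""
    body = (
        f'<w:document xmlns:w="{W_NS}"><w:body>'
        "<w:p><w:r><w:t>Hello</w:t></w:r><w:r><w:t> world</w:t></w:r></w:p>"
        "<w:p><w:r><w:t>Second paragraph</w:t></w:r></w:p>"
        "</w:body></w:document>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("word/document.xml", body)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    for line in extract_docx_text(make_sample_docx()):
        print(line)
```

To try it on a real document, pass a file path instead of the in-memory sample, for example extract_docx_text("report.docx"). Excel and PowerPoint files follow the same package idea but use different part names and XML vocabularies, so this sketch would need adjusting for them.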
When Word, Excel, and PowerPoint first came out, they stored documents in proprietary binary file formats, with text, styles, page layout, and multimedia all encoded in the same file. That was fairly efficient: the binary file was compact, and there was only one file to copy per document when you wanted to move it around or share it with someone.
Read the full story in our Plus Newsletter (21.18.0, 2024-04-29).