In this issue OFFICE: How to dig into Excel files using the Office XML file format Additional articles in the PLUS issue PUBLIC DEFENDER: Crooks can take over your video doorbell by pushing a button ON SECURITY: Debug your browser OFFICE How to dig into Excel files using the Office XML file format
By Mary Branscombe • Comment about this article The tools built into Excel assume colleagues might make honest mistakes. If you suspect something more nefarious, look in the XML of the file for clues. Last time, I looked at what’s inside an Office document. In essence, it is a package of different files that contain both the content and the formatting of your document, kept together in what is effectively a ZIP file. For Excel, this file collection also includes a lot of information about how a spreadsheet was put together. If you need to audit a critical spreadsheet to see whether you can rely on it, being able to see things such as the order in which the data was entered can tell you a lot about whether it has been edited normally or might have been deliberately manipulated in ways that mean you can’t trust the figures. Change tracking tools in Excel
The usual way to see how a document has changed is to turn on Track Changes, but you can do that only for a shared spreadsheet that multiple people edit simultaneously. If you are using an older version of Excel, Track Changes is in the Review tab of the ribbon; after Excel 2016, it’s been hidden in favor of encouraging users to save the document to OneDrive for co-authoring.
Someone who’s planning to improperly manipulate data in a spreadsheet isn’t likely to help you spot those changes by turning on Track Changes in the first place, but someone who has made an honest mistake entering data might have done so. It’s worth checking that first. If everyone is using Excel as part of Microsoft 365, then you will see Show Changes on the Review tab, which will show you who made what changes to cell values or formulas and when. That can be a big help in keeping track of what’s going on in a spreadsheet that different people are working on. However, it covers only the previous 60 days, so it doesn’t help for older files that you need to examine. (The change history will be stripped if the file author uses Save As. You can also remove the change history using File | Info | Reset Changes Pane in the Web version of Excel.)
If the file is saved on OneDrive or SharePoint, you can click on the file name in the title bar and choose Version History to see previous versions of the files and what changed in each one. You can use Show Changes to navigate those, but again, you must have access to the original file in its original location.
If you just need to understand how the formulas in a spreadsheet work, look in the Formula Auditing group on the Formulas tab and pick Trace Precedents or Trace Dependents. Excel will draw arrows that show you which cells are used for which calculations. Use Evaluate Formula to step through a complex formula, one calculation at a time.
If you’re writing a complex formula and want to leave a reminder about how it works or why it’s there, use the N function to add a “comment” inside the formula itself. N is intended to turn text into numbers. However, because any text that isn’t actually a number is evaluated as zero, adding +N (“this is how the formula works”) to the formula won’t change the calculation. Excavating Excel XML
When you can’t use Excel’ built-in tools to find out how a spreadsheet has been edited, it’s time to try cracking open the XML file it’s stored in to see what that can tell you. Follow the instructions in my previous article, Understanding Office document formats, to rename and open the file you’re interested in. You’ll find several XML files that can give you more clues about how the workbook has been edited. Inside the xl folder is a file called calcChain.xml, which tracks the order in which formulas in the spreadsheet need to be calculated in — that is, the “calculation chain.” That’s important for performance because one formula in a spreadsheet will often depend on the result of another formula. If Excel knows which formula to work out first, it doesn’t waste time by working out a sum and then having to update it after it calculates the next sum. It keeps track of the dependencies for each formula and the cells they rely on, which is where it gets the data for putting the arrows on screen, as shown in Figure 4. The details of the file may appear impenetrable, but they act like a log of how formulas have been added to the worksheet, starting at the bottom of the file. You can use these details to work out whether the cell or a row that contains a formula has been moved — and where it came from, because while calcChain.xml will show the new cell location, the calculation order won’t change. So if you see a long sequence of cells in the calculation chain, with one or two that are out of order (for example, a list such as D10, D9, D8, D7, D6, D5, D4, D11, D2), then it’s a fair bet that the data now in D11 used to be in D3. Accidentally or purposely, something that was close to the top of the list is now at the bottom. You can see this in action in a study about honesty (of all things) that’s still the subject of an on-going lawsuit. Here, the calculation chain suggests that data in the study moved from one cell to another in a way that would affect the outcome of the study. Changes won’t always be that obvious and neat, but if you see a list of cells that jump from D to C to B to E and back to B the way they do in the image, you can tell that someone has been shuffling formulas around on the worksheet. Maybe they decided it made more sense to have the figures in a different order, and you just need to check they’re all still the right figures. Or maybe you need to go back and look at the original sources.
Also in the xl folder, the sharedStrings.xml file has a list of all values in the cells of your spreadsheet, in the order in which they were entered. They’re saved here because, if you have values that get used over and over again (such as the names of the months, business departments, or expense categories), they need to be stored only once, thus saving space. It’s not a complete list of all the data that’s ever been in the spreadsheet. If a cell is deleted or the content is changed, the entry in sharedStrings will be deleted or changed as well. But what it does do is give you a timeline for how and when the spreadsheet was created and edited. You can tell whether cells that are now in alphabetical order were entered more haphazardly, which means they’ve been sorted or moved. Either process could introduce an error, such as leaving a number next to the wrong label if someone forgot to select all the cells in a row.
And if you’re looking for who made a mistake (or deliberately fudged some figures), seeing the order in which cells were entered or edited might let you spot who was entering data just before or after the change you’re interested in. It’s much more likely that mistakes in a spreadsheet are just mistakes, as opposed to deliberate attempts to fool you. But if the figures just don’t seem to add up, a few minutes spelunking around inside the Excel XML might let you work out what’s happened and how to get the right figures back. And perhaps tell you which of your coworkers might need a bit more training in how to work with data.
Mary Branscombe has been a technology journalist for nearly three decades, writing for a wide range of publications. She’s been using OneNote since the very first beta was announced — when, in her enthusiasm, she trapped the creator of the software in a corner.
The AskWoody Newsletters are published by AskWoody Tech LLC, Fresno, CA USA.
Your subscription:
Microsoft and Windows are registered trademarks of Microsoft Corporation. AskWoody, AskWoody.com, Windows Secrets Newsletter, WindowsSecrets.com, WinFind, Windows Gizmos, Security Baseline, Perimeter Scan, Wacky Web Week, the Windows Secrets Logo Design (W, S or road, and Star), and the slogan Everything Microsoft Forgot to Mention all are trademarks and service marks of AskWoody Tech LLC. All other marks are the trademarks or service marks of their respective owners. Copyright ©2024 AskWoody Tech LLC. All rights reserved. |