How to Salvage Text Data from Damaged DOCX Files

Even though there are now many alternatives to Microsoft Office, such as LibreOffice, it remains the most popular office suite in the world. The reasons are obvious: it offers an excellent set of applications that run on several platforms, comes with a wealth of practical features, and can be used both offline and online. There are also plenty of books, certification courses, digital media and other materials for learning Microsoft Office.

If you use Microsoft Office and some of your Microsoft Word DOCX documents have become corrupt, there is an easy way to extract the text data from them. There are two methods:

1. Manual Method:

  1. Install 7-Zip on your system. You can download 7-Zip from https://www.7-zip.org/.
  2. Right-click the corrupt DOCX file and select 7-Zip > Extract to “folder”. In place of “folder”, the menu shows the name of the file, which becomes the name of the new folder. For example, “sample.docx” is extracted to a folder named “sample”.
  3. In the folder the DOCX file was extracted to, open the word sub-folder, find document.xml and rename it so that it has an .html extension.
  4. Now you can open this HTML file in any web browser to see the text data.
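The manual steps above can also be scripted. The following is a minimal Python sketch, assuming the corrupt file can still be opened as a ZIP archive (a DOCX file is a ZIP package) and that the main body text is in word/document.xml as usual. Instead of renaming the file and relying on a browser, it collects the contents of the <w:t> text runs and starts a new line at each </w:p> paragraph end; headers, footers and footnotes, which live in other parts of the package, are not covered. The function names read_document_xml, text_from_document_xml and salvage_docx_text are hypothetical, chosen for this example.

```python
import html
import re
import zipfile
import zlib

# Hypothetical helpers for illustration; not part of any existing tool.
# Matches text runs (<w:t>...</w:t>) and paragraph ends (</w:p>) in WordprocessingML.
# A regex is used instead of an XML parser so that a document.xml cut off
# by corruption still yields the text that precedes the damage.
TOKEN = re.compile(r"<w:t(?:\s[^>]*)?>([^<]*)</w:t>|</w:p>")


def read_document_xml(docx, member="word/document.xml", chunk_size=4096):
    """Read a DOCX member in chunks, keeping whatever decompresses before an error."""
    chunks = []
    with zipfile.ZipFile(docx) as archive:  # docx: file path or binary file object
        with archive.open(member) as stream:
            try:
                while chunk := stream.read(chunk_size):
                    chunks.append(chunk)
            except (zipfile.BadZipFile, zlib.error, EOFError):
                pass  # bad CRC, damaged or truncated data: keep the earlier chunks
    return b"".join(chunks).decode("utf-8", errors="replace")


def text_from_document_xml(xml):
    """Join text runs into paragraphs, one paragraph per output line."""
    paragraphs, current = [], []
    for match in TOKEN.finditer(xml):
        if match.group(1) is not None:  # a <w:t> text run
            current.append(html.unescape(match.group(1)))
        else:  # a </w:p> paragraph end
            paragraphs.append("".join(current))
            current = []
    if current:  # last paragraph whose closing tag was lost to the damage
        paragraphs.append("".join(current))
    return "\n".join(paragraphs)


def salvage_docx_text(docx):
    """Return the recoverable body text of a DOCX file as plain text."""
    return text_from_document_xml(read_document_xml(docx))
```

For example, `print(salvage_docx_text("sample.docx"))` prints the salvaged text, which can then be written to a plain text file. Reading in small chunks means that if a CRC check or the decompressor fails, only the chunk being read at that moment is lost.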

2. Using the Corrupt DOCX Salvager application:

  1. Download the Corrupt DOCX Salvager application from https://sourceforge.net/projects/damageddocx2txt/.
  2. Install Corrupt DOCX Salvager on your system and then launch it.
  3. In the Corrupt DOCX Salvager window, select File > Open and choose the corrupt DOCX file.
  4. The application takes some time to analyze the file and then displays the recovered text in its window. You can save this text to a plain text file.

Corrupt DOCX Salvager uses the same approach as the manual method to salvage whatever text remains in a corrupt DOCX file. Its installation package includes a command-line version of 7-Zip, which the application uses internally. Use whichever method you find easier.
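Sometimes a DOCX file is damaged badly enough that ordinary ZIP tools refuse to open it at all; for instance, Python's zipfile raises BadZipFile when the directory at the end of the archive is missing. Because every entry in a ZIP file is also preceded by its own local file header, document.xml can sometimes still be recovered by reading the entries directly. The sketch below is a different, swapped-in technique, not a description of how Corrupt DOCX Salvager or 7-Zip work internally: it scans the raw bytes for the local-header signature PK\x03\x04, looks for the word/document.xml entry, and inflates its data, keeping whatever decompresses before a corrupted region. The names scan_for_member and inflate_leniently are hypothetical.

```python
import struct
import zlib

# Hypothetical helpers for illustration; not part of any existing tool.
LOCAL_HEADER = b"PK\x03\x04"
# Fixed 30-byte part of a ZIP local file header: signature, version, flags,
# method, mod time, mod date, CRC-32, compressed size, size, name length, extra length.
HEADER_FORMAT = "<4sHHHHHIIIHH"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)


def inflate_leniently(raw, chunk_size=4096):
    """Inflate raw DEFLATE data, keeping the output produced before any error."""
    decomp = zlib.decompressobj(-15)  # negative wbits: raw DEFLATE, as used in ZIP entries
    out = []
    for i in range(0, len(raw), chunk_size):
        try:
            out.append(decomp.decompress(raw[i:i + chunk_size]))
        except zlib.error:
            break  # corrupted data: keep what was inflated so far
        if decomp.eof:
            break  # reached the end of this entry's compressed stream
    return b"".join(out)


def scan_for_member(data, member=b"word/document.xml"):
    """Find a member by scanning local headers, ignoring the central directory."""
    pos = data.find(LOCAL_HEADER)
    while pos != -1 and pos + HEADER_SIZE <= len(data):
        (_sig, _ver, _flags, method, _time, _date, _crc,
         comp_size, _size, name_len, extra_len) = struct.unpack_from(HEADER_FORMAT, data, pos)
        name_start = pos + HEADER_SIZE
        body = name_start + name_len + extra_len
        # A false match inside compressed data yields a garbage name and is skipped.
        if data[name_start:name_start + name_len] == member:
            if method == 0:  # stored; comp_size is 0 if sizes were deferred to a data descriptor
                return data[body:body + comp_size]
            if method == 8:  # deflated; the stream marks its own end, so comp_size is not needed
                return inflate_leniently(data[body:])
        pos = data.find(LOCAL_HEADER, pos + 4)
    return b""
```

For example, reading "sample.docx" in binary mode and passing its bytes to scan_for_member returns the raw XML of document.xml (or an empty result if the entry cannot be found). Those bytes could be saved to a file with an .html extension and opened in a browser, as in the manual method.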