
Redacting PDFs The Right Way
in Litigation Discovery Productions

PDF (Portable Document Format) files have become very popular in litigation management. A PDF file accurately represents a scanned page, but it can also be created by converting other file types, such as Word, WordPerfect or Excel. A PDF file can also include text that makes the file searchable. PDF files appear deceptively simple, yet they include complicated functionality hidden from the casual user.

But the complexity of the PDF has also created a trap for the unwary with regard to redacting documents for litigation productions.  There are correct and incorrect ways to redact PDFs.  Proper PDF redaction is possible with a variety of tools, but has become easier with the  newest version of Acrobat (Professional Version 8) and some third party programs.  This article discusses the problem of incorrect redaction and how to redact PDFs properly.

Many Advantages of PDFs
PDFs have many advantages over native files (e.g., Word, Excel, etc.), TIFFs ('tagged image file format' files, used by some litigation support software), and other file formats. A PDF file can accurately represent a document originally created in another file format (like Word), but does not require the original application for viewing. Instead, the person viewing need only have Adobe Acrobat Reader installed, which is free and ubiquitous.

PDF files can also be created from scanned documents, and made searchable with optical character recognition (OCR) software. It's also possible to include usage restrictions on PDFs, so that a password is required to open or modify a file, or printing can be prohibited. Digital signatures can also be applied. Because of these many advantages, PDFs have become very popular in litigation. Many courts now accept or even require that filings be made electronically as PDFs.

Background on Raster Images and TIFFs
Many older litigation software systems were originally designed before PDFs became popular and use an older image file type called TIFF (tagged image file format). Adobe developed PDF as a more functional replacement for the TIFF and Adobe has been mostly successful in that regard. However, TIFFs are still the workhorses of many popular, older litigation software systems. These legacy systems usually support PDFs to a degree, but work best with TIFFs.

TIFFs are known as a type of 'raster image', and can best be understood as a 'snapshot' of a document. Like a snapshot, the image is made up of many small dots, and the dark dots form letters against the white background. You can see this by enlarging an image of a letter. Raster images are used in litigation productions because they create an accurate representation of a scanned page.

Because a TIFF is a 'snapshot' only, it is not computer searchable. Older litigation support software creates a separate text file (which is searchable) and associates that text file with the image in the software through what is known in the industry as a 'load file'.

PDF files improved on the searchability of TIFFs by including text and a raster image all in one file, known as a 'text under image' file or a 'searchable PDF'. It is very convenient to be able to include text and image in one file. The PDF looks like the original scanned document (because it is a 'snapshot' raster image), but is also searchable (because text is embedded in the file).
 

Redaction the Wrong Way: A 'Trap for the Unwary'
But before Acrobat 8 Professional and other third party tools, this functionality came at a price, as redaction of PDFs could be a trap for the unwary. When you look at a PDF, you do not immediately know if it is an 'image only PDF' (created from a scan without applying OCR), a 'text under image PDF' (created from a scan with OCR applied), or a text PDF created by converting from a native file format like Word or Excel.

This led to a potential problem in which a user used the graphical tools in Acrobat to mark up a document so that it appeared to have been redacted, when in fact it had not. While the text to be redacted appeared to have been removed, in fact it remained in the PDF file and was available to anyone receiving the file.
 
Acrobat Professional (not the free version) has long included a number of tools to allow users to annotate documents. For example, you can use a highlighter to mark over text or photos, or a drawing tool that allows you to place a rectangle over text or images in a document. It is tempting to think that highlighting text in black, or placing a black rectangle over text, is an effective way to redact. Unfortunately this is not the case. Highlighting in black or overlaying a black text box removes the text to be redacted from easy view, but does not remove it from the file. The un-redacted text is still in the file and can be found in a number of ways. Someone who thinks they are properly redacting with these tools is actually providing an adversary with the entire document.

This has probably happened often. In one documented case in 2002, the Washington Post released a handwritten note that had been scanned and portions redacted with Acrobat's black marker.1 In another, the New York Times released a CIA document and tried to redact the identities of agents with a black rectangle from Acrobat's markup tools.2 Unfortunately, in both cases the redaction was done incorrectly and the text information remained in the file, even though visually it appeared to be gone. Anyone could find the text by searching, copying and pasting, or other text extraction processes.
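The failure mode described above can be illustrated with a toy model. This is only a sketch: the Page class and both functions are hypothetical names invented for illustration, not Acrobat's API or the real PDF file structure. It assumes a page is simply a text layer plus a list of drawn markups, which is enough to show why covering text does not remove it.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    """Hypothetical stand-in for one PDF page: a text layer plus drawn markups."""
    text_layer: str
    annotations: list = field(default_factory=list)


def cover_with_black_box(page: Page, phrase: str) -> None:
    """Mimic a markup tool: record a rectangle over the phrase and nothing more."""
    page.annotations.append(("black_rectangle", phrase))


def extract_text(page: Page) -> str:
    """Mimic search or copy and paste: read the text layer, ignoring markups."""
    return page.text_layer


page = Page(text_layer="Agent name: JANE ROE. Meeting at noon.")
cover_with_black_box(page, "JANE ROE")

# The rectangle hides the name on screen, but extraction still returns it.
leaked = extract_text(page)
print("JANE ROE" in leaked)
```

The markup only adds something on top of the page; the text layer is never touched, so any tool that reads the text layer sees the full document.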

Redacting the Right Way 1: Low Tech
So, how should you properly redact PDFs? First is the low tech method. Print out the PDF document you want to redact, mark through the areas you wish to redact with a black marker, and then rescan the redacted version to PDF. This is pretty idiot-proof and works just fine if there are only a few documents to be redacted. And in most cases, out of the entire relevant document set in a discovery production, only a small percentage of documents must be redacted.

Redacting the Right Way 2: Converting to TIFF and Back Again to PDF.
For users of Acrobat prior to Version 8 Professional, Adobe has published a recommended redaction procedure, included in a technical note, to address the misuse of its drawing tools to create incorrect redactions.3 This procedure involves the following steps:

  • Use the black rectangle tool or the black text highlighting tool to mark out the text that one wants to redact.
  • Save the marked PDF to TIFF by using the File>Save As menu. This will create individual TIFF files for each page in the marked PDF.
  • Reassemble and create a new PDF file from the individual TIFF files.

This method of redaction removes all hidden text, metadata, etc., and creates a cleanly redacted document. The newly created document will not be searchable, as it will be an image-only PDF (based on the TIFF raster images). If searchability is desired, then the document can be made searchable in Acrobat Professional (Document>Recognize Text Using OCR), or with other OCR programs. This method of redaction is cumbersome and time-consuming, but simple enough in concept.
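Why the round trip through TIFF works can be sketched with a toy model. Everything here is an assumption made for illustration: MarkedPage, ImageOnlyPage and flatten_to_image are hypothetical names, and each character is treated as one 'pixel cell', which is far cruder than a real raster image. The point is only that flattening paints the covered cells and discards the text layer.

```python
from dataclasses import dataclass


@dataclass
class MarkedPage:
    """Hypothetical page with a text layer and black boxes drawn over it."""
    text_layer: str
    boxes: list  # (start, end) character ranges covered by a black box


@dataclass
class ImageOnlyPage:
    """Hypothetical flattened page: 'pixels' only, with no text layer."""
    pixels: str
    text_layer: str = ""


def flatten_to_image(page: MarkedPage) -> ImageOnlyPage:
    """Paint covered cells black and keep only the picture, as a raster export would."""
    cells = list(page.text_layer)
    for start, end in page.boxes:
        for i in range(max(0, start), min(end, len(cells))):
            cells[i] = "#"
    return ImageOnlyPage(pixels="".join(cells))


text = "Agent name: JANE ROE. Meeting at noon."
start = text.index("JANE ROE")
marked = MarkedPage(text_layer=text, boxes=[(start, start + len("JANE ROE"))])

flat = flatten_to_image(marked)
print(flat.pixels)
print(repr(flat.text_layer))
```

Once the page is flattened, the covered content exists only as black cells and there is no text layer left to search or copy, which is also why the result must be run through OCR again if searchability is wanted.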

Redacting the Right Way 3: With Acrobat 8 Professional
Adobe added new specific redaction tools with Acrobat Professional 8.  With this newest version, a user can redact a PDF safely with a specialized tool palette within Acrobat, and not need to go through the PDF>TIFF>PDF procedure discussed above for versions 7 and earlier.

Acrobat Professional 8 adds a new redaction toolbar. To begin a redaction, you choose 'redaction' from the shortcut menu to open the redaction toolbar. Next you select the 'Mark for Redaction' tool. To select text to redact, you move the pointer over the page and drag to select content for redaction. You can select as little as a letter or word and as much as a page. When you are finished selecting items, you click "Apply Redactions." Adobe Acrobat Professional 8 removes the underlying text from the document that you have open. You should save the redacted version as a new document so you'll have two files -- the original and a redacted copy.

Acrobat Professional 8 also allows you to search the document for words or phrases to redact, and the search includes metadata, file attachments, hidden text, bookmarks and embedded search indexes.

When blocking out text for redaction, you can merely mark it black, or place a text message in the redacted text's place, which is helpful in litigation to show specifically where redaction has occurred.
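The difference between covering text and applying a redaction can be sketched on plain text. This is a minimal illustration, not how Acrobat works internally: apply_redactions is a hypothetical helper, and it assumes the document content is available as a simple string. It searches for each phrase, removes it, and leaves a label in its place.

```python
import re


def apply_redactions(text, phrases, label="[REDACTED]"):
    """Replace every occurrence of each phrase with a label.

    Returns the redacted text and the number of replacements made.
    Matching is case-insensitive; longer phrases are tried first so that
    a short phrase does not split a longer one that contains it.
    """
    phrases = [p for p in phrases if p]
    if not phrases:
        return text, 0
    ordered = sorted(phrases, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(p) for p in ordered), re.IGNORECASE)
    return pattern.subn(label, text)


original = "Call John Smith. john smith agreed to the terms."
redacted, count = apply_redactions(original, ["John Smith"])
print(redacted)
print(count)
```

The original string is left unchanged and a new redacted string is returned, mirroring the advice to keep both the original file and a redacted copy.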

Redacting the Right Way 4: Other Third Party Tools
Acrobat 8 Professional is not the only game in town when it comes to redacting PDFs. For example, Appligent offers a program called Redax that works with Acrobat Standard and Professional versions 6, 7 and 8.  A user can redact without upgrading to the latest Acrobat 8.

Have a Quality Control Procedure in Place
A good redaction procedure should also include quality control checks. The errors discussed above might have been avoided if a quality control check had been in place. The appropriate quality control procedure depends upon the type of redaction procedure used and the number of redacted documents to be reviewed. One approach is to take the final redacted document, make it text searchable through OCR software, and then search for the redacted text to make sure it is not still included as hidden text in the file. Another approach is to save the redacted document as a text file and then look through the resulting text for anything that should have been removed from the PDF. Of course, a manual visual review may also be done to make sure redacted text is not visibly apparent. For a large number of documents this may need to be done on a sample rather than on every document.
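The text-based check described above can be sketched as follows. This is a minimal sketch under stated assumptions: the text has already been extracted from the redacted PDF by some other tool (for example by saving it as a text file), and find_leaks, normalize and pick_sample are hypothetical helper names, not part of any product.

```python
import random
import re


def normalize(text):
    """Collapse whitespace and ignore case so line breaks do not hide a match."""
    return re.sub(r"\s+", " ", text).casefold()


def find_leaks(extracted_text, redacted_terms):
    """Return the redacted terms that still appear in the extracted text."""
    haystack = normalize(extracted_text)
    return [term for term in redacted_terms if normalize(term) in haystack]


def pick_sample(doc_ids, k, seed=0):
    """Choose up to k documents to review when checking every file is impractical."""
    rng = random.Random(seed)
    return sorted(rng.sample(list(doc_ids), min(k, len(doc_ids))))


# Text as it might come out of a badly redacted file, with a line break in the name.
extracted = "Agent  JANE\nROE met the source at noon."
leaks = find_leaks(extracted, ["Jane Roe", "Langley"])
print(leaks)

sample = pick_sample(["doc1", "doc2", "doc3", "doc4"], 2)
print(len(sample))
```

An empty result from find_leaks only shows that the listed terms are absent from the extracted text; it does not replace the visual review, since it cannot tell whether the blacked-out areas look right on the page.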

Conclusion
Redaction of PDFs has always been a tricky procedure.  Mistakes can be made if the user does not understand that the PDF file format is complex and often stores information even when not visible.  Fortunately, there are several right ways to redact a PDF file.  One way is the trusty black marker, but this is decidedly old school.  Users of Acrobat 7 or earlier can mark out or cover text using Acrobat's drawing tools, extract the document to TIFF and then re-convert the TIFF pages to PDF.  Users of Acrobat 8 Professional have a more sophisticated toolkit that allows PDFs to be properly redacted within the document, without conversion to TIFF.  And there are third party tools available as well.  Finally, any redaction procedure should include quality control of the redacted files to assure that the end product is as expected.   Use of a proper redaction technique will assure you that your confidential or privileged information remains in your control.



Footnotes
1. "Washington Post's scanned-to-PDF Sniper Letter More Revealing Than Intended: Posted version Allows Easy Removal of Blacked-Out Details", http://www.planetpdf.com/mainpage.asp?webpageid=2434 (Oct. 26, 2002).

2. "PDF Secrets Revealed: PDF File Redaction Snafu Exposes Agents' Identities", http://www.planetpdf.com/mainpage.asp?webpageid=808 (2000).

3. "Technical Note: Redaction of Confidential Information in Electronic Documents, How to Safely Remove Sensitive Information from Microsoft Word Documents and PDF Documents Using Adobe Acrobat", http://www.adobe.com/devnet/acrobat/pdfs/Redaction.pdf