Redacting PDFs The Right Way
in Litigation Discovery Productions
PDF (portable document format) files have become very popular in litigation
management. A PDF file can accurately represent a scanned page, but it can
also be created by converting other file types, such as Word, WordPerfect or
Excel. A PDF file can also include text to make the file searchable. PDF
files appear deceptively simple, while including complicated functionality
hidden from the casual user.
But the complexity of the PDF has also created a trap for the unwary when
redacting documents for litigation productions. There are correct and
incorrect ways to redact PDFs. Proper PDF redaction is possible with a
variety of tools, but has become easier with the newest version of Acrobat
(Professional Version 8) and some third-party programs. This article
discusses the problem of incorrect redaction and how to redact PDFs properly.
Many Advantages of PDFs
PDFs have many advantages over native files (e.g., Word, Excel, etc.), TIFFs
('tagged image file format' files, used by some litigation support software),
and other file formats. A PDF file can accurately represent a document
originally created in another file format (like Word), but does not require
the original application for viewing. Instead, the person viewing it needs
only to have Adobe Acrobat Reader installed, which is free and ubiquitous.
PDF files can also be created from scanned documents, and made searchable with
optical character recognition (OCR) software. It's also possible to include
usage restrictions on PDFs, so that a password is required to open or modify a
file, or printing can be prohibited. Digital signatures can also be
applied.
Because of these many advantages, PDFs have become very popular in
litigation. Many courts now accept or even require that filings be
made electronically as PDFs.
Background on Raster Images and TIFFs
Many older litigation software systems were originally designed before PDFs
became popular and use an older image file type called TIFF (tagged image
file format). Adobe developed PDF as a more functional replacement for the
TIFF and Adobe has been mostly successful in that regard. However, TIFFs
are still the workhorses of many popular, older litigation software
systems. These legacy systems usually support PDFs to a degree, but work best with TIFFs.
TIFFs are a type of 'raster image', and can best be understood as a
'snapshot' of a document. As in a snapshot, each letter is made up of many
small dots that form its shape against the white background.
You can see this by enlarging an image of a letter.
Raster images are used in litigation productions because
they create an accurate representation of a scanned page.
Because a TIFF is a 'snapshot' only, it is not computer searchable.
Older litigation support software creates a separate, searchable text file
and associates it with the image through what is known in the industry as a
'load file'.
PDF files improved on the searchability of TIFFs by including text and a
raster image all in one file,
known as a 'text under image' file or a 'searchable PDF'. It is very
convenient to be able to include text and image in one file. The PDF
looks like the original scanned document (because it is a 'snapshot'
raster image), but also is searchable (because text is embedded in the
file).
Redaction the Wrong Way: A 'Trap for the Unwary'
But before Acrobat 8 Professional and other third-party tools, this
functionality came at a price, as redaction of PDFs could be a trap for the
unwary. When you look at a PDF, you do not immediately know if it is an
'image only PDF' (created from a scan without applying OCR), a 'text under
image PDF' (created from a scan with OCR applied), or a text PDF created by
converting from a native file format like Word or Excel.
This led to a potential problem in which a user used the graphical tools in
Acrobat to mark up a document so that it appeared to have been redacted when
in fact it had not. While the text to be redacted appeared to have been
removed, it actually remained in the PDF file and was available to anyone
receiving the file.
Acrobat Professional (not the free version) has long included a number of
tools that allow users to annotate documents. For example, you can use a
highlighter to mark over text or photos, or a drawing tool to place a
rectangle over text or images in a document. It is tempting to think that
highlighting text in black, or placing a black rectangle over text, is an
effective way to redact. Unfortunately, this is not the case. Highlighting
in black or overlaying a black rectangle hides the text from easy view, but
does not remove it from the file. The unredacted text is still in the file
and can be found in a number of ways. Someone who believes they are properly
redacting with these tools is actually providing an adversary with the entire
document.
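The point can be illustrated with a minimal Python sketch. The page content
stream below is a simplified, hypothetical example (real PDF streams are
usually compressed and may encode text in other ways), and the helper names
extract_shown_text and has_black_rectangle are illustrative only, not part of
Acrobat or any PDF library. A black rectangle is painted over the text, yet a
naive text extraction still recovers it.

```python
import re

# Hypothetical, uncompressed page content stream (simplified for illustration).
# The BT ... ET section draws the text; the last three lines set the fill
# color to black, define a rectangle over the text, and fill it.
CONTENT_STREAM = b"""
BT
/F1 12 Tf
72 720 Td
(Agent name: Jane Doe) Tj
ET
0 0 0 rg
70 715 160 16 re
f
"""


def extract_shown_text(stream: bytes) -> list[str]:
    """Return the literal strings passed to the Tj (show text) operator.

    Deliberately naive: it ignores escapes, hex strings, TJ arrays and font
    encodings, all of which a real PDF parser would have to handle.
    """
    return [m.decode("latin-1") for m in re.findall(rb"\((.*?)\)\s*Tj", stream)]


def has_black_rectangle(stream: bytes) -> bool:
    """Detect the 'black fill, rectangle, fill' sequence used in the stream above."""
    return re.search(rb"0 0 0 rg\s+[\d.\s]+re\s+f\b", stream) is not None


# The page looks redacted because a black box covers the text...
print(has_black_rectangle(CONTENT_STREAM))
# ...but the covered text is still stored in the file and can be extracted.
print(extract_shown_text(CONTENT_STREAM))
```

The rectangle is simply one more drawing instruction added after the text;
nothing in the stream deletes the text instruction that precedes it.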
This mistake has probably been made many times. In one documented case in
2002, the Washington Post released a handwritten note that had been scanned
and had portions redacted with Acrobat's black marker.1 In another, the New
York Times released a CIA document and tried to redact the identities of
agents with a black rectangle from Acrobat's markup tools.2 Unfortunately, in
both cases the redaction was done incorrectly and the text information
remained in the file, even though visually it appeared to be gone. Anyone
could find the text by searching, copying and pasting, or other text
extraction processes.
Redacting the Right Way 1: Low Tech
So, how should you properly redact PDFs? First is the low-tech method.
Print out the PDF document you want to redact, mark through the areas you
wish to redact with a black marker, and then rescan the redacted version to
PDF. This is pretty idiot-proof and works just fine if there are only a few
documents to be redacted. And in most discovery productions, only a small
percentage of the entire relevant document set must be redacted.
Redacting the Right Way 2: Converting to TIFF and Back Again to PDF
For users of Acrobat prior to Version 8 Professional, Adobe has published a
recommended redaction procedure in a technical note that addresses the misuse
of its drawing tools to create incorrect redactions.3 This procedure involves
the following steps:
- Use the black rectangle tool or the black text highlighting tool to
mark out the text that you want to redact.
- Save the marked PDF to TIFF by using the File>Save As menu. This will
create an individual TIFF file for each page in the marked PDF.
- Reassemble and create a new PDF file from the individual TIFF files.
This method of redaction removes all hidden text, metadata, etc., and creates
a cleanly redacted document. The newly created document will not be
searchable, as it will be an image-only PDF (based on the TIFF raster
images). If searchability is desired, the document can be made searchable in
Acrobat Professional (Document>Recognize Text Using OCR), or with other OCR
programs. This method of redaction is cumbersome and time-consuming, but
simple enough in concept.
Redacting the Right Way 3: With Acrobat 8 Professional
Adobe added dedicated redaction tools in Acrobat Professional 8.
With this newest version, a user can redact a PDF safely with a specialized
tool palette within Acrobat, and does not need to go through the
PDF>TIFF>PDF procedure discussed above for versions 7 and earlier.
Acrobat Professional 8 adds a new redaction toolbar. To begin a redaction,
you choose 'redaction' from the shortcut menu to open the redaction toolbar.
Next, you select the 'Mark for Redaction' tool. To select text to redact, you
move the pointer over the page and drag to select content for redaction. You
can select as little as a letter or word and as much as a page. When you are
finished selecting items, you click 'Apply Redactions'. Acrobat Professional
8 then removes the underlying text from the document that you have open. You
should save the redacted version as a new document so you'll have two files:
the original and a redacted copy.

Acrobat Professional 8 also allows you to search the document for
words or phrases to redact, and the search includes metadata,
file attachments, hidden text, bookmarks and embedded search
indexes.
When blocking out text for redaction, you can merely mark it
black, or place a text message in the redacted text's place,
which is helpful in litigation to show specifically where
redaction has occurred.
Redacting the Right Way 4: Other Third Party Tools
Acrobat 8 Professional is not the only game in town when it comes to
redacting PDFs. For example, Appligent offers a program called
Redax that works with Acrobat Standard and Professional
versions 6, 7 and 8. A user of an earlier version can therefore redact
without upgrading to Acrobat 8.
Have a Quality Control Procedure in Place
A good redaction procedure should also include quality control
checks. The errors discussed above might have been avoided if a
quality control check had been in place. The appropriate quality
control procedure depends on the redaction method used and the number
of redacted documents to be reviewed. One approach is to take the final
redacted document, make it text searchable through OCR software, and
then search for the redacted text to make sure it is not still included
as hidden text in the file. Another approach is to save the redacted
document as a text file and then look through the resulting text for
redacted material remaining in the PDF. Of course, a manual visual
review may also be done to make sure redacted text is not visibly
apparent. For a large number of documents, these checks may need to be
done on a sample rather than on every document.
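The text-search check and the sampling approach can be sketched in Python as
follows. This is a minimal illustration, assuming the text of each redacted
document has already been extracted (for example, by saving it as a text
file, as described above). The function names find_leaked_terms and
pick_sample, the sample text, and the file names are all hypothetical.

```python
import random
import re


def find_leaked_terms(extracted_text: str, redacted_terms: list[str]) -> list[str]:
    """Return the redacted terms that still appear in the extracted text.

    Matching ignores case and treats any run of whitespace as one space,
    because line breaks in extracted text often split a phrase.
    """
    normalized = re.sub(r"\s+", " ", extracted_text).casefold()
    leaked = []
    for term in redacted_terms:
        needle = re.sub(r"\s+", " ", term).strip().casefold()
        if needle and needle in normalized:
            leaked.append(term)
    return leaked


def pick_sample(documents: list[str], fraction: float, seed: int = 0) -> list[str]:
    """Choose a reproducible random sample of documents for manual review."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be greater than 0 and at most 1")
    count = max(1, round(len(documents) * fraction)) if documents else 0
    return random.Random(seed).sample(documents, count)


# Text as it might come out of a file that was NOT properly redacted.
text = "Memo to file.\nThe source is Jane\nDoe, who works in Accounting."
print(find_leaked_terms(text, ["Jane Doe", "Project Falcon"]))

# Pick 5 percent of a 200-document production for a closer manual look.
docs = [f"DOC{n:04d}.pdf" for n in range(1, 201)]
print(pick_sample(docs, 0.05))
```

An empty result from the search does not prove that a redaction is sound,
since the check only finds terms someone thought to search for. It
complements the visual review rather than replacing it.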
Conclusion
Redaction of PDFs has always been a tricky procedure.
Mistakes can be made if the user does not understand that the PDF file
format is complex and often stores information even when not visible. Fortunately,
there are several right ways to redact a PDF file. One way is the
trusty black marker, but this is decidedly old school. Users of
Acrobat 7 or earlier can mark out or cover text using Acrobat's drawing
tools, extract the document to TIFF and then re-convert the TIFF pages to
PDF. Users of Acrobat 8 Professional have a more sophisticated
toolkit that allows PDFs to be properly redacted within the document,
without conversion to TIFF. And there are third-party tools available
as well. Finally, any redaction procedure should include quality
control of the redacted files to ensure that the end product is as
expected. Use of a proper redaction technique will help ensure that your
confidential or privileged information remains in your control.
Footnotes
1. "Washington Post's scanned-to-PDF Sniper Letter More Revealing Than Intended:
Posted version Allows Easy Removal of Blacked-Out Details", http://www.planetpdf.com/mainpage.asp?webpageid=2434 (Oct. 26, 2002).
2. "PDF Secrets Revealed: PDF File Redaction Snafu Exposes Agents' Identities", http://www.planetpdf.com/mainpage.asp?webpageid=808 (2000).
3. "Technical Note: Redaction of Confidential Information in Electronic Documents, How to
Safely Remove Sensitive Information from Microsoft Word Documents and PDF Documents Using Adobe Acrobat", http://www.adobe.com/devnet/acrobat/pdfs/Redaction.pdf