e-Discovery Document Production Formats:
Native, TIFF and PDF
How should attorneys address electronic files now as
part of litigation discovery?
The recent e-discovery rule amendments to the Federal Rules of Civil Procedure are altering
how attorneys conduct litigation, now requiring that parties routinely
retain and, if requested, exchange relevant electronic documents and files.
Similar rules have been enacted by many states. These new rules reflect the reality that most
documents created in the last several years exist in electronic form (some estimate 95% or
more). e-Discovery will increasingly be discovery.
An 'electronic search' approach to discovery
requires that all documents be converted to an electronically
searchable form and that a method of searching across all files
is available. For electronic documents delivered in
native file format, search is usually possible in some form or another. This is
particularly true for standard Microsoft Office documents. Email
presents more difficulties, as attachments may need to be extracted
from the electronic file holding the email before they can be searched.
Paper-based documents must be scanned and OCRed
to make them searchable as electronic files. The OCR process
inevitably introduces errors, which diminish the effectiveness
of the electronic search, as compared with the search of native
files or electronic documents based on native files.
An 'electronic search' approach also requires that all documents
be addressable as a collection from a single search query. A number of
systems are in use today by litigators. At the high end,
litigation document repositories may be established to make all
documents accessible and searchable, often among multiple parties
in different locations. These systems may be
comprehensive and expensive. Alternatively, a law firm may make documents
searchable from a file server on its local area network, or run
LAN-based case management software, which may allow for indexing and searching
of litigation files. For a small
case, all documents might be stored on a single CD or DVD, or kept
on a portable hard drive, and searched from the Windows operating
system.
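The idea of addressing a whole collection from a single query can be illustrated with a small word index. The following is a minimal sketch only, assuming each document's text has already been extracted to a plain-text file; the function names and the one-text-file-per-document layout are hypothetical and do not describe any particular litigation support product. The demonstration at the end also shows how a single OCR error in a scanned copy causes that copy to be missed.

```python
import re
from collections import defaultdict
from pathlib import Path


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each word to the set of document names that contain it.

    `documents` is a dict of {document name: extracted text}.
    """
    index = defaultdict(set)
    for name, text in documents.items():
        for word in tokenize(text):
            index[word].add(name)
    return index


def search(index, query):
    """Return the sorted names of documents containing every query word."""
    words = tokenize(query)
    if not words:
        return []
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)


def load_text_files(folder):
    """Read every .txt file under `folder` (assumed layout: one text file per document)."""
    return {
        str(path): path.read_text(encoding="utf-8", errors="replace")
        for path in Path(folder).rglob("*.txt")
    }


if __name__ == "__main__":
    # Illustrative data: the scanned copy contains an OCR error ("Mergcr").
    docs = {
        "native_memo.txt": "Merger agreement draft",
        "scanned_memo.txt": "Mergcr agreement draft",
    }
    index = build_index(docs)
    print(search(index, "merger agreement"))  # only the native copy matches
```

Exact-word matching is used here for simplicity; review platforms typically add stemming, proximity operators and fuzzy matching, which this sketch does not attempt.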
Past Practices in Question
First, several practices that may have
worked in the past, but may now be called into question, should be
addressed.
-
Paper Only Production. In the
past, electronic documents were printed and delivered in paper form.
Printed discovery alone, however, does not produce the metadata associated
with a document. Also, paper-based files can be electronically searched only if
scanned and OCRed. This adds
to the cost of reviewing electronic documents received in discovery and reduces
search accuracy because of OCR errors. This procedure
should no longer be acceptable unless
the parties agree. As litigators learn the importance of metadata,
this approach will become increasingly unsatisfactory.
-
Print and Rescan to PDF or TIFF.
PDF is a modern format developed by Adobe Systems and is potentially a good
choice for discovery of electronic files.
If a PDF is created directly from the native file, the original underlying
text is retained, making the file searchable in PDF without needing to
perform OCR. However, some litigants in the past have delivered PDFs
or TIFFs of electronic files, not as a result of a straight file conversion
(which retains the underlying text and perhaps some of the metadata), but instead
by printing all the documents to paper, and then
rescanning the documents and saving them as electronic documents in TIFF
or PDF file format. While this does yield electronic documents, this method
of production strips out the text and the metadata of the original
electronic file.
Current Practices of How Documents Should Be Handled
Attorneys are now taking several approaches to
e-Discovery when searchability or metadata are important. Each
approach has its own advantages and disadvantages.
-
TIFFs. A common practice has been to take electronic files and save them as image-based electronic files known as
TIFFs or TIFs. TIFFs are electronic files, but as raster images
they are like a picture of the electronic document, and no text or metadata
is retained as part of the file.
Advantages of TIFF Format Production
-
Ease of Bates Numbering.
Bates stamping is used to identify which documents have been produced, to reference
particular documents and pages in connection with witness examinations, and to
identify which documents have been withheld for privilege. TIFFs can be single-page or multi-page.
Historically, litigation support vendors have often scanned paper documents,
or converted electronic documents, into single-page or multi-page TIFFs, with each file name being the Bates Number
or Bates Number Range. Each individual page in a production would have
its own Bates Number.
-
Ease of Redaction. Documents sometimes need to be partially
redacted to remove references to privileged information, work product or trade secret
information. As a raster image, a TIFF
file is relatively easy to redact, as compared with native files or PDF
files. However, the recent release of Acrobat Professional 8, with its
built-in PDF redaction tool, has lessened this advantage of TIFF files.
-
Requirements of Legacy Litigation Support Systems. Several legacy litigation
support management systems work best or exclusively with TIFF files because
these systems were designed when TIFF files were the only viable option.
These systems predate the development and popularity of PDF and native file
review tools.
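The TIFF naming practice described above, in which each page receives its own Bates Number and each file is named for its number or number range, can be sketched as follows. This is an illustration under stated assumptions: the prefix "ABC", the seven-digit width and the ".tif" extension are hypothetical choices, and actual conventions vary by vendor and by agreement between the parties.

```python
def bates_number(prefix, page_number, width=7):
    """Format one Bates Number, e.g. ABC0000001 (prefix and width are assumptions)."""
    return f"{prefix}{page_number:0{width}d}"


def tiff_file_names(page_counts, prefix, start=1, width=7):
    """Name each document's TIFF by its Bates Number or Bates Number Range.

    `page_counts` lists the number of pages in each document, in production
    order. Every page consumes one Bates Number, so a multi-page TIFF is
    named for the range it covers.
    """
    names = []
    next_page = start
    for count in page_counts:
        if count < 1:
            raise ValueError("each document must have at least one page")
        first = bates_number(prefix, next_page, width)
        last = bates_number(prefix, next_page + count - 1, width)
        names.append(f"{first}.tif" if count == 1 else f"{first}-{last}.tif")
        next_page += count
    return names


if __name__ == "__main__":
    # Three documents of 1, 3 and 2 pages.
    for name in tiff_file_names([1, 3, 2], "ABC"):
        print(name)
```

Because numbers are consumed page by page, the next production can simply continue from the last number used.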
Disadvantages of TIFF File Productions
Although TIFF-based productions are still very popular, they have a
number of disadvantages compared with PDF and native productions.
-
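Complex Load Files. Because TIFF files
are raster images, they do not retain computer-readable text as part of
the file. To make a TIFF production searchable, the text must instead be
supplied in a separate, associated load file.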
-
Not Very Usable Outside of Legacy Systems.
Because of the complexities of the TIFF load file, these files are not
very accessible or usable outside of the legacy litigation management
systems for which they were designed.
-
Metadata Not Retained in TIFFs.
Metadata is not retained as part of a TIFF conversion. To address
this shortcoming, many e-Discovery providers now separately save file
metadata in a database prior to a TIFF conversion.
-
Cost of TIFF Conversion and Load File Creation.
Because of the shortcomings above, a TIFF production requires that the
producing party pay to convert electronic files to TIFF images and
create the associated text load file so that TIFF-based litigation
management systems can read it. This can be very expensive in
large productions.
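The practice noted above of saving file metadata separately before a TIFF conversion can be sketched in a few lines. The example below is a minimal illustration, not a description of any provider's process: it records only file-system details and a content hash in a CSV log, and the field names and log layout are assumptions. Metadata embedded by the authoring application (author, last print date, edit time and the like) would require format-specific tools that this sketch does not cover.

```python
import csv
import hashlib
import os
from datetime import datetime, timezone

FIELDS = ["file_name", "size_bytes", "modified_utc", "sha256"]


def capture_file_metadata(path):
    """Collect basic file-system metadata and a content hash for one file."""
    info = os.stat(path)
    # Reads the whole file into memory; acceptable for a sketch, but very
    # large files would be hashed in chunks.
    with open(path, "rb") as handle:
        digest = hashlib.sha256(handle.read()).hexdigest()
    modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
    return {
        "file_name": os.path.basename(path),
        "size_bytes": info.st_size,
        "modified_utc": modified.isoformat(),
        "sha256": digest,
    }


def write_metadata_log(paths, log_path):
    """Write one metadata row per source file to a CSV log, before conversion."""
    rows = [capture_file_metadata(path) for path in paths]
    with open(log_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

The log should be written before conversion, since copying or converting files can alter file-system dates. The hash gives a way to show later that the source file was not changed after the log was made.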
-
Production in Converted PDF files. A
more modern approach is to convert electronic files to searchable PDF files for a discovery production. PDF files overcome many of the limitations of working with native files.
Adobe, which developed PDF, also controls the TIFF specification (TIFF was originally
created by Aldus, which Adobe acquired in 1994), and PDF is a considerably more
functional format than TIFF. PDFs have become ubiquitous in
business and in law.
Advantages of Converted PDF File Production
-
Viewable in Adobe Acrobat. Files are searchable and easy to work with.
Anyone with Adobe Acrobat can view a file without the need to worry about having the right
application program or viewer installed.
-
Bates Stamping.
Documents can be Bates-stamped and pages specifically identified using a
variety of software tools.
-
Redaction. Pages or
specific passages can be redacted with Acrobat Professional 8, the latest version of Adobe's program.
-
Some Metadata Retained.
A PDF conversion can be set up to retain some of the metadata, which
can then be viewed by reviewing certain properties in the PDF file.
Retention of metadata in a PDF file is not automatic, and is dependent on
the conversion software and the settings used in the conversion process.
Disadvantages of Converted PDF File Production
-
Conversion Cost.
As with TIFF files, conversion of electronic files to PDF requires expenditures, as compared with simply
delivering files in native format.
-
Not all Metadata Available.
A standard PDF conversion only captures some of the available metadata.
Information such as the document author and title typically may be captured.
The document creation date may be changed to the date the PDF is created.
Other key metadata, such as last save, last print, edit time, deletions,
comments and hidden text usually are not captured in the PDF copy.
-
Production in Native File Format. Some practitioners
pursue discovery in native file format,
the original file format in which the electronic file was created, such as
Word, Excel or Outlook. This has become more popular since the new
federal e-Discovery Amendments, as they provide the requesting party greater
leeway in requesting files in native format.
Advantages of Native File Production
-
No Conversion Expense.
Unlike TIFF or PDF productions, there is no conversion expense in delivering
files in native format.
-
All Metadata Retained.
All file metadata can be retained in a native production.
-
Text Searchable. Text
is usually most searchable in native format. There is no chance of
text being lost or corrupted in a conversion to PDF or in the creation of a TIFF load
file, and no OCR errors are introduced.
-
Some Documents Don't Display Well
in other Formats. Native may be the only practicable format for
some file formats, such as spreadsheets. Excel and other spreadsheet
files are notorious for converting poorly to TIFF or PDF, often becoming
unintelligible. Plus, spreadsheet formulas, hidden cells and hidden
text usually do not make the conversion to other formats.
Disadvantages of Native File Format Production
-
Difficulty of Pre-Release Review of Metadata.
Metadata, by design, are not easy to review in native file format. Some
metadata in Office files can be found by clicking through various property
screens, but this is time-consuming, requires a consistent methodology to
view all viewable metadata, and in the end does not reach all of the metadata
available in the file. Newer litigation management systems will
display metadata of native files.
-
Difficulty in Bates Stamping at the Page Level.
Documents in native file format cannot be easily Bates-stamped, and any Bates
stamping will change the metadata. Often Bates stamping of native
files is handled instead through a file naming convention, in which the file
name is modified to include a Bates designation. This can work well,
but does not allow for page-level identification.
-
Inability to Easily Redact. Documents produced in
native file format cannot be easily redacted. For this reason, in a
native production, documents that need to be redacted are often handled in a
different manner, such as by converting them to another format
that can be redacted, such as PDF.
-
Difficulty of Pre-Release Review.
Attorneys for the party producing electronic files must review the files to
see if they are responsive to the discovery request or include privileged
information or trade secrets. This can be
difficult as electronic files may have been created in multiple
applications. Modern litigation support applications allow most native
file formats to be reviewed without installing the applications that created
the file. Plus, modern litigation support applications allow metadata
of native files to be reviewed in an easy fashion.
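The file naming convention mentioned above for Bates-designating native files can be sketched as a simple mapping from original file names to Bates-style names. This is a hypothetical illustration: the prefix, the numbering width and the choice to keep the original extension are assumptions, not a standard. As the text notes, one number is assigned per file, so the result identifies documents rather than pages.

```python
from pathlib import PurePath


def bates_file_names(file_names, prefix, start=1, width=7):
    """Pair each original file name with a Bates-style name, keeping the extension.

    Returns a list of (original name, Bates name) tuples so the mapping can
    be kept as a cross-reference; no files are renamed here.
    """
    mapping = []
    for offset, name in enumerate(file_names):
        suffix = PurePath(name).suffix  # e.g. ".xlsx"
        bates_id = f"{prefix}{start + offset:0{width}d}"
        mapping.append((name, f"{bates_id}{suffix}"))
    return mapping


if __name__ == "__main__":
    for original, stamped in bates_file_names(["budget.xlsx", "memo.doc"], "ABC"):
        print(original, "->", stamped)
```

Renaming copies rather than the originals, and keeping the mapping alongside the production, avoids altering the source files and their metadata.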
Conclusion
Advances in technology are reshaping how litigation discovery is handled.
The use and availability of electronic documents are changing
how discovery is done, with an increasing emphasis on search.
Additionally, metadata availability in electronic files requires that litigators
find effective tools to review and analyze this new source of information. New
discovery rules reflect the reality of available technology. Prior
paper-based approaches are ineffective and becoming outmoded.
The best file format for discovery will usually turn on
how the attorneys and litigation team staff plan to review the files.
Document management systems usually are optimal for files in certain
formats. Plus, consideration should be given to how Bates numbering
and redaction will be handled.