View ‘A Lawyer’s Guide to eDiscovery Processing’ OnDemand
A Lawyer’s Guide to eDiscovery Processing: What You Should Know to Competently Handle Your Case
With the explosion of electronic documents and electronically stored information (ESI) in modern complex litigation, the how and why of eDiscovery processing has become increasingly important. When done effectively, processing can defensibly reduce large document collections, make them fully searchable, and normalize them to an easily reviewable format that speeds and eases document review.
The nuts and bolts of how this is done can make a difference, so litigators should know at least the basics of processing terminology and concepts. Arcane terms like deNISTing, hashing, culling, email family associations, metadata extraction and fielding, control numbering, and dual-indexing can be confusing until the underlying concepts and processes are explained. The quality of eDiscovery processing can be critical in assessing the adequacy of discovery and productions by either side. This educational webinar will explain what to expect when you order or evaluate eDiscovery processing and how best to use it in your complex litigation cases.
This webinar is part of our eDiscovery Webinar Series, with new monthly installments covering a variety of eDiscovery concepts and best practices. You can view any of our past webinars OnDemand.
Follow Lexbe on LinkedIn for notice of future installments of the eDiscovery Webinar Series and access to our library of past presentations (video, MP3 and PDF).
6. Agenda
• Growth of Volume of ESI
• Goals of eDiscovery Processing
• Processing in Overall EDRM Workflow
• Scalable Processing
• Processing Steps Overview
• Processing Steps Details
• QC and Security
• Review
7. Exponential ESI Growth
Sources of ESI include: VoIP, Email, iPhones, Peer-to-Peer, Online Storage, Digital Cameras, Facebook, LinkedIn, Dropbox, Backup Devices, Elastic Storage, SaaS, Google Streets, Personal Blogs, Skype, World Satellite Images, Personal Scanners, Customer Service Recordings, Public Webcams, Google Goggles, Netbooks, Cloud Instance Servers, PaaS
[Chart: Digital information created, captured, and replicated worldwide, in zettabytes*, 2005 to 2015. Source: IDC Digital Universe Study (2012)]
ESI = Electronically Stored Information
*1 zettabyte = 1 trillion gigabytes
8. Average Case Size & Collection Rising
[Chart: GBs of ESI in a typical commercial case, low and high estimates, 1995 to 2015]
• Enron Criminal Trial (2005)
  ○ Source ESI: 100M pages (~4 TBs)
  ○ Brought to trial: 1M pages (~40 GBs)
  ○ Extraordinary at the time; not now
• Microsoft (2011)
  ○ Microsoft collects from an average of 45 custodians per matter
  ○ Almost 1 TB per matter, on average
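The Enron figures are internally consistent: both the source set and the trial set imply roughly 40 KB per page, and only about 1% of the source pages were brought to trial. A quick check in Python, assuming decimal units (1 TB = 10^12 bytes, 1 GB = 10^9 bytes):

```python
# Decimal units assumed: 1 TB = 10**12 bytes, 1 GB = 10**9 bytes.
TB = 10**12
GB = 10**9

# Enron criminal trial figures from the slide.
source_pages, source_bytes = 100_000_000, 4 * TB  # ~100M pages, ~4 TBs
trial_pages, trial_bytes = 1_000_000, 40 * GB     # ~1M pages, ~40 GBs


def bytes_per_page(total_bytes: int, pages: int) -> float:
    """Average size of one page implied by a volume estimate."""
    return total_bytes / pages


source_density = bytes_per_page(source_bytes, source_pages)
trial_density = bytes_per_page(trial_bytes, trial_pages)
share_to_trial = trial_pages / source_pages

print(f"Source ESI: {source_density / 1000:.0f} KB per page")
print(f"Trial set:  {trial_density / 1000:.0f} KB per page")
print(f"Share of source pages brought to trial: {share_to_trial:.0%}")
```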
9. eDiscovery Processing Goals
• Extract metadata & support data reduction (deNIST, culling, dedup)
• Standardize all documents into a review format & allow loading to a review platform
• Comply with production requirements or other guidelines for standard production formats and loadfiles
• Create high-quality search indexes & support data analysis
• Legally defensible processes
• Fast & cost-effective
10. Processing in the EDRM
11. Processing in the EDRM
• Along the EDRM model, data volume shrinks and relevance rises as irrelevant data is removed
• Processing is needed during the review and analysis phase to enable more precise data reduction, speed review, and support legally defensible practices
• Review can be done in native, near-native, HTML, PDF or TIFF format; the choice is driven by review platform capabilities
• Processing facilitates analysis of the data and formulation of a review strategy
• Information gleaned from processing and analyzing the data allows a faster, more efficient review, which saves time and controls cost
12. Manage the ESI Data Funnel
[Diagram: ESI ID & Collection → EDA & Culling/Filtering → Review & Production → Use (Depos, Motions, Trial), moving over time from high volume and low relevance to low volume and high relevance]
Early efforts in collection and culling result in improved quality and reduced costs in review, production and use.
13. Processing Requirements Overview
• Quality: Processing must deliver usable and complete outputs to support accuracy and efficiency in data reduction and review, and to support the subsequent stages of discovery.
• Speed and Scalability: Available capacity needs to meet demand. The sooner collections begin and finish processing, the sooner review can begin.
• Budget: eDiscovery processing expenditure should be predictable and within budget, and should yield data reduction and review efficiency that cut overall project costs.
• Integration: Output data should move smoothly into ECA and litigation review platforms to avoid additional delays and expenses.
14. Scalable Processing Engine
[Diagram: Incoming ESI → Scalable Processing Engine → Reviewable Documents]
• Available on demand, as needed, with no costly setup or wait times
• ESI collections can be broken into smaller pieces and processed simultaneously in parallel server environments
• Scalable, proprietary architecture allows instant access to nearly unlimited computing power
• This means faster processing: hours and days vs. weeks
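As a minimal sketch of the batch-and-parallelize idea, the Python below splits a collection into fixed-size batches and processes them concurrently with the standard library's `ThreadPoolExecutor`. The `process_batch` work (an MD5 fingerprint per document), the batch size, and the worker count are illustrative assumptions; a production engine would distribute batches across servers rather than across threads in one process.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def split_into_batches(items, batch_size):
    """Break a collection into fixed-size batches (the last may be smaller)."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def process_batch(batch):
    """Hypothetical per-batch work: fingerprint each document with MD5.

    A real engine would also extract text and metadata, OCR images, etc.
    """
    return [(doc_id, hashlib.md5(data).hexdigest()) for doc_id, data in batch]


def process_collection(documents, batch_size=2, workers=4):
    """Process batches concurrently and merge results in original order."""
    batches = split_into_batches(documents, batch_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in submission order, not completion order.
        return [row for result in pool.map(process_batch, batches) for row in result]


docs = [(f"DOC{i:03d}", f"contents of document {i}".encode()) for i in range(5)]
results = process_collection(docs)
for doc_id, digest in results:
    print(doc_id, digest)
```

Because `map` returns results in submission order, the merged output keeps the original document order even though batches may finish at different times.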
15. Processing Key Steps / Features
• Archive/Container Expansion
• File Repair
• Metadata extraction & fielding
• MD5 hash code generation
• System file identification & DeNIST
• Deduplication
• Email attachment extraction & parent email association
• Time Zone Offset for Emails
• Custodian Assignment
• Native text extraction
• OCR of images
• Indexing of extracted & OCRed text
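Of the steps listed above, the time zone offset for emails is easy to overlook. Email `Date` headers carry their own UTC offset, so converting every message to a single review time zone keeps sent and received times comparable across custodians. A minimal sketch using Python's standard `email.utils.parsedate_to_datetime`; treating headers without an offset as UTC is a hypothetical policy choice, and a fixed offset such as UTC-6 ignores daylight saving time:

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def normalize_sent_time(date_header: str, review_tz: timezone = timezone.utc) -> datetime:
    """Convert an email Date header to one review time zone."""
    sent = parsedate_to_datetime(date_header)
    if sent.tzinfo is None:
        # Header carried no usable offset (e.g. "-0000"): assume UTC.
        # This is a hypothetical policy choice; document whatever you pick.
        sent = sent.replace(tzinfo=timezone.utc)
    return sent.astimezone(review_tz)


# The same moment, sent from New York (-0500) and Paris (+0100).
ny = normalize_sent_time("Tue, 01 Mar 2011 09:30:00 -0500")
paris = normalize_sent_time("Tue, 01 Mar 2011 15:30:00 +0100")
print(ny.isoformat(), paris.isoformat())

# A fixed UTC-6 review zone (note: fixed offsets ignore daylight saving time).
central = timezone(timedelta(hours=-6), "UTC-6")
print(normalize_sent_time("Tue, 01 Mar 2011 09:30:00 -0500", central).isoformat())
```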
16. Hash Values Used in eDiscovery
• What are hash values?
  ○ An MD5 hash is a 128-bit number, usually written as 32 hexadecimal characters, that (like a fingerprint) practically uniquely identifies an electronic file. SHA hashes (e.g., 160-bit SHA-1 or 256-bit SHA-256) are also used.
  ○ MD5 example (the hash of an empty file): D41D8CD98F00B204E9800998ECF8427E
• How are hash values used?
  ○ Chain of custody and authentication
  ○ DeNIST identification and removal
  ○ Exact-duplicate identification for deduplication
• Special issues with emails
  ○ Individual MSG files separated from MS Outlook email container files (PSTs) differ every time they are separated.
  ○ So standard file hashing does not work (the same email yields different hash values).
  ○ Instead, hash values are created from metadata strings, which are more stable through transmission.
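The two hashing approaches can be sketched with Python's standard `hashlib` module: a file-level MD5 computed over the raw bytes, and an email-level hash computed over a metadata string. The field list, the separator, and the whitespace normalization in `email_metadata_hash` are assumptions for illustration; each processing tool uses its own recipe.

```python
import hashlib
import os
import tempfile


def file_md5(path: str, chunk_size: int = 1 << 20) -> str:
    """MD5 of a file's raw bytes, read in chunks so large files never load whole."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Illustrative field list: an assumption, since each tool picks its own.
EMAIL_HASH_FIELDS = ("from", "to", "cc", "bcc", "sent", "subject", "body")


def email_metadata_hash(msg: dict) -> str:
    """Hash a metadata string instead of the (unstable) MSG file bytes."""
    parts = []
    for name in EMAIL_HASH_FIELDS:
        # Collapse runs of whitespace so trivial formatting changes do not
        # alter the hash (another assumption; tools normalize differently).
        parts.append(" ".join(str(msg.get(name, "")).split()))
    # A separator character keeps field boundaries unambiguous.
    return hashlib.md5("\x1f".join(parts).encode("utf-8")).hexdigest()


# File-level hash: RFC 1321 test vector for the bytes "abc".
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abc")
abc_hash = file_md5(tmp.name)
os.remove(tmp.name)
print(abc_hash)

# Email-level hash: two exports of the same email that differ only in spacing.
copy_a = {"from": "jane@example.com", "to": "john@example.com",
          "sent": "2011-03-01T14:30:00Z", "subject": "Q1 forecast",
          "body": "See attached."}
copy_b = dict(copy_a, subject="Q1  forecast ")
print(email_metadata_hash(copy_a) == email_metadata_hash(copy_b))
```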
17. Key Metadata Fields & Use
Metadata is field-level file information used in review and often delivered with a production as part of a load file (Concordance/Relativity DAT, Summation DII and Lexbe XLSX).
Metadata Field Name | Type | Use in Review
Date/Time Sent, Date/Time Received | Email | Show when emails were sent and received
Sender, Recipients, CC and BCC | Email | Show who sent and received emails
Doc Source, File Path, Custodian | Email & Native | Custodian and chain of custody
Date Last Modified | Native | Usually the best date field for files collected normally (without a forensic collection)
File Extension / File Type | Email & Native | Show type and quantity of ESI produced
For a full listing of standard loadfiles: http://lexbe.com/support/technical-resources/
18. Fielding Metadata
Email metadata (sender, receiver, date, time, subject, etc.) is extracted and then fielded to the litigation database for review.
[Figure: email metadata in an Outlook header, shown next to the same values fielded to the litigation database]
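A minimal sketch of fielding, assuming the email is available as a standard RFC 5322 message (Outlook PST and MSG files would first need a separate extraction step, not shown). It uses Python's standard `email` package; the output field names (`From`, `DateSent`, etc.) are hypothetical database fields, not a specific load file schema:

```python
from email import policy
from email.parser import BytesParser

# A sample RFC 5322 message (hypothetical content).
RAW = b"""From: Jane Doe <jane@example.com>
To: John Roe <john@example.com>
Cc: legal@example.com
Date: Tue, 01 Mar 2011 09:30:00 -0500
Subject: Q1 forecast

Please see the attached forecast.
"""


def field_email(raw: bytes) -> dict:
    """Extract header metadata into flat fields for a review database."""
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    date = msg["Date"]
    return {
        "From": str(msg["From"] or ""),
        "To": str(msg["To"] or ""),
        "CC": str(msg["Cc"] or ""),
        "BCC": str(msg["Bcc"] or ""),  # absent in this sample, so fielded as ""
        # With policy.default, the Date header exposes a parsed .datetime.
        "DateSent": date.datetime.isoformat() if date else "",
        "Subject": str(msg["Subject"] or ""),
    }


for name, value in field_email(RAW).items():
    print(f"{name}: {value}")
```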
19. Benefits of Dual Indexing
• With a dual-index approach, the search engine indexes both text extracted from native files (email, attachments, spreadsheets, etc.) and OCR text from imaged files (TIFF, JPG or PDF).
• This most comprehensive approach minimizes the potential for lost and unsearchable data, finds more privileged documents and more PII, and improves the accuracy and quality of culling.
Benefits of Dual Index Approach:
Index Method | Captures Embedded Text | Captures Text Excluded From Print | Captures Hidden Text
Imaged/OCR | Yes | No | No
Native Extraction | No | Yes | Yes
Dual Index | Yes | Yes | Yes
20. Benefits of Dual Indexing
[Figure: HTML text and OCR text of the same document, both searchable]
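The dual-index idea can be sketched as one inverted index fed from two text sources per document, recording whether each hit came from native-extracted text, OCR text, or both. This is a toy in-memory index with a simple tokenizer, assumed for illustration; real review platforms use full search engines.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list:
    """Very simple tokenizer: lowercase alphanumeric runs."""
    return re.findall(r"[a-z0-9]+", text.lower())


class DualIndex:
    """Inverted index over two text sources per document.

    postings[term][doc_id] holds {"native"}, {"ocr"}, or both.
    """

    def __init__(self):
        self.postings = defaultdict(lambda: defaultdict(set))

    def add(self, doc_id, native_text="", ocr_text=""):
        for source, text in (("native", native_text), ("ocr", ocr_text)):
            for term in tokenize(text):
                self.postings[term][doc_id].add(source)

    def search(self, term):
        """Return {doc_id: sorted sources} for a single-term query."""
        hits = self.postings.get(term.lower(), {})
        return {doc_id: sorted(sources) for doc_id, sources in hits.items()}


index = DualIndex()
index.add(
    "DOC001",
    # Native extraction also sees hidden text, e.g. a hidden spreadsheet note.
    native_text="Budget draft. Hidden note: privileged",
    # OCR also sees text embedded in images, e.g. a scanned signature page.
    ocr_text="Budget draft. Settlement terms",
)
print(index.search("privileged"))
print(index.search("settlement"))
print(index.search("budget"))
```

A term found only in hidden native text (`privileged`) or only in an image (`settlement`) remains searchable, which mirrors the comparison in the dual-indexing table.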
21. Reduce Docs with Culling
• Purpose: Defensibly remove from the process files that are unlikely to lead to responsive documents
• Culling processes:
  ○ DeNIST, deduplication
  ○ Filter by file type & date
  ○ Keyword filtering
  ○ Linear vs. dynamic culling
• Issues: Keyword selection & testing, concept searching, process documentation, repeatability, culled file retention
• Reduction: ESI volume may be reduced by 95% from raw data size at this stage
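The culling filters can be sketched as an ordered series of checks that also logs why each file was removed, which supports process documentation and repeatability. The `Doc` record, the sample hash set standing in for the NIST list, and the plain substring keyword match are all hypothetical simplifications:

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Doc:
    """Hypothetical processed-document record."""
    doc_id: str
    md5: str
    file_type: str
    doc_date: date
    text: str


def cull(docs, nist_hashes, allowed_types, start, end, keywords):
    """Apply culling filters in order; return (kept docs, removal log)."""
    keywords = [k.lower() for k in keywords]
    kept, log = [], []
    for doc in docs:
        if doc.md5 in nist_hashes:
            reason = "deNIST"
        elif doc.file_type not in allowed_types:
            reason = "file type"
        elif not start <= doc.doc_date <= end:
            reason = "date range"
        elif not any(k in doc.text.lower() for k in keywords):
            # Plain substring match: a simplification of real keyword search.
            reason = "no keyword hit"
        else:
            kept.append(doc)
            continue
        log.append((doc.doc_id, reason))  # retained for process documentation
    return kept, log


# Stand-in for a NIST list of known system-file hashes (hypothetical sample).
NIST_SAMPLE = {"d41d8cd98f00b204e9800998ecf8427e"}

docs = [
    Doc("DOC1", "d41d8cd98f00b204e9800998ecf8427e", "exe", date(2011, 1, 5), ""),
    Doc("DOC2", "hash-2", "msg", date(2011, 3, 1), "Q1 forecast attached"),
    Doc("DOC3", "hash-3", "jpg", date(2011, 3, 2), ""),
    Doc("DOC4", "hash-4", "docx", date(2009, 1, 1), "Old forecast"),
    Doc("DOC5", "hash-5", "docx", date(2011, 5, 1), "Lunch plans"),
]
kept, log = cull(docs, NIST_SAMPLE, {"msg", "docx", "xlsx", "pdf"},
                 date(2010, 1, 1), date(2012, 12, 31), ["Forecast"])
print([d.doc_id for d in kept])
print(log)
```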
22. Deduplication
• Why duplicates exist
  ○ Collection from multiple stores (e.g., Outlook and Gmail)
  ○ Duplicates between custodians
  ○ Email attachments
  ○ Email chains (near duplicates)
• Types of duplicates
  ○ Exact duplicates
  ○ Near duplicates
• Role of hash values
  ○ Separates exact dups from near dups
• Vertical vs. horizontal dedup
  ○ Within custodian vs. across custodians
23. Deduplication
[Diagram: Custodian1 and Custodian2 collections under horizontal and vertical deduplication]
• Horizontal dedupe
  ○ Dedupes across all custodians
  ○ May result in more data reduction
  ○ Can cause data holes and production gaps
• Vertical dedupe
  ○ Dedupes only within each custodian
  ○ May result in less data reduction
  ○ Unlikely to cause data holes or production gaps
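Vertical and horizontal deduplication differ only in the key used to detect a repeat: the hash value alone (horizontal) or the custodian plus the hash value (vertical). A minimal Python sketch, assuming documents arrive as hypothetical `(custodian, doc_id, hash_value)` tuples in processing order; it also records every custodian who held each kept document, a common way to limit the data holes that horizontal dedupe can cause:

```python
from collections import defaultdict


def dedupe(docs, mode="horizontal"):
    """Return (kept ids, duplicate id -> kept id, kept id -> custodians)."""
    if mode not in ("horizontal", "vertical"):
        raise ValueError("mode must be 'horizontal' or 'vertical'")
    seen = {}
    kept, duplicates = [], {}
    all_custodians = defaultdict(set)
    for custodian, doc_id, hash_value in docs:
        # Horizontal: one copy per hash overall. Vertical: one per custodian.
        key = hash_value if mode == "horizontal" else (custodian, hash_value)
        if key in seen:
            duplicates[doc_id] = seen[key]
        else:
            seen[key] = doc_id
            kept.append(doc_id)
        # Remember who held the kept copy so no custodian "loses" the document.
        all_custodians[seen[key]].add(custodian)
    return kept, duplicates, {k: sorted(v) for k, v in all_custodians.items()}


docs = [
    ("Custodian1", "C1-001", "h1"),
    ("Custodian1", "C1-002", "h2"),
    ("Custodian1", "C1-003", "h1"),  # same email saved twice by Custodian1
    ("Custodian2", "C2-001", "h1"),  # same email also held by Custodian2
    ("Custodian2", "C2-002", "h3"),
]
h_kept, h_dups, h_custodians = dedupe(docs, "horizontal")
v_kept, v_dups, _ = dedupe(docs, "vertical")
print("horizontal:", h_kept, h_dups, h_custodians["C1-001"])
print("vertical:  ", v_kept, v_dups)
```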
24. Processing Key Steps / Features
• Control numbers/Bates stamping
• PDF & TIFF creation
• Placeholder creation
• Native extracted, PDF and TIFF loadfile generation in multiple formats:
  ○ XLSX (Lexbe)
  ○ DAT/OPT (Case Logistix, Concordance, iPro Allegro, Ringtail, Relativity)
  ○ DII (Summation)
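Control numbering and image loadfiles can be sketched together: each TIFF page gets a sequential Bates number, and an Opticon-style OPT line ties that number to its image file. The prefix `ABC`, the volume label, and the `IMAGES\` path layout are hypothetical, and the seven-field layout below (image key, volume, path, document break, folder break, box break, page count) reflects the commonly used OPT convention; confirm it against the receiving platform's load specification.

```python
def bates_numbers(prefix, start, count, width=6):
    """Sequential, zero-padded control numbers such as ABC000001."""
    return [f"{prefix}{n:0{width}d}" for n in range(start, start + count)]


def opt_lines(page_counts, prefix="ABC", start=1, volume="VOL001"):
    """Build Opticon-style OPT lines for documents given their page counts.

    Returns (lines, next_number) so numbering can continue in a later volume.
    """
    lines = []
    number = start
    for page_count in page_counts:
        for i, bates in enumerate(bates_numbers(prefix, number, page_count)):
            first_page = i == 0
            lines.append(",".join([
                bates,                                  # image key (Bates number)
                volume,                                 # volume label
                f"IMAGES\\{bates}.TIF",                 # hypothetical image path
                "Y" if first_page else "",              # document break
                "",                                     # folder break (unused)
                "",                                     # box break (unused)
                str(page_count) if first_page else "",  # pages in document
            ]))
        number += page_count
    return lines, number


lines, next_number = opt_lines([2, 1])  # a 2-page document, then a 1-page one
print("\n".join(lines))
print("next control number:", next_number)
```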
25. Security
Latest security technologies and best practices:
• Encryption: Data encrypted (256-bit or above) at rest and in transit.
• Data center certifications: U.S. data centers are certified and follow industry best practices.
• Clear ownership rights: Service agreements clearly acknowledge client data ownership.
• Redundant backups and recovery: Robust and redundant backup & recovery protocols.
26. Processing Workflow
27. Post-Processing Document Review
Web-based litigation document management software features:
• Self-administration
• Native (Office, etc.) processing
• Automatic OCR
• Early case analysis
• Dual-index search
• Exact & near-dup ID
• Doc review & issue tagging
• Blended productions
• Transcript management
• Timelining, depo prep
• Dispositive motions
• Trial document management
28.
• Processing is an essential element of discovery and the foundation of high-quality reviews
• Processing procedures must be legally defensible and produce outputs that comply with the requirements of your case
• Scalable processing solutions are crucial for meeting deadlines and controlling costs
• Processing done right should include a number of basic steps, such as metadata extraction and fielding, MD5 hashing, and dual-indexing, to facilitate fast, accurate, and defensible data reduction and review
• Processing done right saves time and money in the overall eDiscovery process by facilitating data reduction and review efficiency