Industry Leading
eDiscovery Insight

Learn from renowned eDiscovery thought leaders



Understanding your eDiscovery Index and how it finds (or misses) evidence

How your eDiscovery platform parses and organizes your electronically stored evidence can be the difference between finding and missing that smoking gun. Or worse, unwittingly handing a smoking gun to opposing counsel. Pulling back the curtain on how an eDiscovery platform ingests electronically stored documents and makes their text searchable reveals the places where evidence may be hiding. This article explains indexing, breaks down the types of search indexes used in eDiscovery software platforms, discusses the pros and cons of each, and offers solutions to ensure that you never miss crucial evidence.

Indexing occurs when you upload documents to your eDiscovery review platform. A number of processes run that separate and organize your data. The text, in particular, is extracted from your documents and filtered into a database, or index. When you enter a search query, your software does not read through each document looking for the word; that could take hours or days. Rather, your software consults the index (just as you would use the index at the back of a textbook) to quickly pull the relevant documents for your review. The process by which text is extracted from your documents and placed into that index is therefore critical to the quality of your search results.
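To make the textbook analogy concrete, here is a minimal sketch of an inverted index in Python, the general data structure behind most keyword search. It is an illustration under simplifying assumptions (lowercase alphanumeric tokens, no stemming, phrases, or metadata), not how any particular eDiscovery platform stores its index; the document IDs and text are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each term to the set of document IDs that contain it (done once, at ingestion)."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return dict(index)


def search(index: dict[str, set[str]], term: str) -> set[str]:
    """Answer a one-term query with a lookup instead of rescanning every document."""
    return index.get(term.lower(), set())


# Hypothetical documents keyed by hypothetical control numbers.
docs = {
    "DOC-001": "Quarterly load growth forecast",
    "DOC-002": "Board meeting minutes",
    "DOC-003": "Revised load forecast for November",
}
index = build_index(docs)
print(sorted(search(index, "load")))  # ['DOC-001', 'DOC-003']
```

Building the index costs one pass over the text at ingestion; afterwards each query term is a single dictionary lookup, which is why search stays fast as the collection grows.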

There are two basic types of index used in eDiscovery software platforms: an OCR index and a text-based (also called native extraction) index.

OCR stands for Optical Character Recognition. In this process, each document is handled as an image: either it was originally scanned, or it is rendered from the native file through a virtual print driver. Specialty OCR software then recognizes alphanumeric character patterns in that image. For example, an uploaded Word document would be “printed” within the software engine, and the text that appears on that virtual print would be lifted off the page and indexed.

Text-based indexing is also called native extraction indexing because, instead of processing the document as a printed page, it looks at the underlying code and data within the file. Where OCR sees the document as a print, text-based indexing lifts the hood and extracts all of the computer-embedded text in a file, including data that you do not see on the page, such as comments.

The pros of one indexing approach are the cons of the other, and vice versa. Specifically, an OCR-based index may miss hidden fields, such as hidden columns in an Excel spreadsheet, while a text-based index would not. Conversely, a native extraction-based index will not read (index) the text in an image, including scanned documents and image-only PDFs, whereas an OCR index will.

This is an example of a native PowerPoint document. When you receive this document as a .ppt file, an OCR-based index would create a virtual print of each slide and lift any text that appears on that print for indexing. Text inside embedded images, like this chart titled “Load Growth Model”, would be indexed because it appears on the print. Speaker notes, however, like this one regarding “November Data”, could be missed, because notes do not appear on a printed slide by default.

Conversely, a native extraction-based index would only recognize the file name of the chart image (a .jpg) and index that name as text. It cannot “read” an image (as OCR can), so none of the text appearing on the chart would be indexed. It would, however, pick up the speaker notes regarding November Data. When you search for the company name “CAISO”, an OCR-based index would retrieve this document but a native extraction-based index would not. When you search for “November Data”, the native index would retrieve this document, but an OCR index would miss it. If you were to perform a Boolean search for “CAISO AND November Data”, neither index alone would return this document as responsive, because each index contains only one of the two terms.

Some modern eDiscovery software providers offer both indexes; however, the indexes are siloed, so you would have to run your entire search twice, once through each index. This not only doubles your search time but still leaves you vulnerable to missing evidence when you use Boolean searches to narrow results, because terms found in different indexes are never combined. Some eDiscovery vendors will instruct you to write additional language into your ESI order in an attempt to mitigate the loss of potential evidence. Unfortunately, the more complex an ESI request, the more likely it is that mistakes will be made and evidence missed.
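The PowerPoint scenario above can be simulated with two separate indexes. In this sketch, the extracted strings are hypothetical stand-ins for what an OCR pass (text on the printed slide, including the chart) and a native extraction pass (the chart's file name plus the speaker notes) might produce, and the phrase “November Data” is simplified to two terms joined by AND.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search_all(index, query):
    """Boolean AND: return the documents that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical text each process might extract from the same slide deck.
ocr_text = {"DECK-01": "Load Growth Model CAISO peak demand"}               # printed slide, incl. chart text
native_text = {"DECK-01": "load_growth_chart.jpg November Data follow up"}  # image file name + speaker notes

ocr_index = build_index(ocr_text)
native_index = build_index(native_text)

query = "CAISO November Data"
print(search_all(ocr_index, query))     # set()
print(search_all(native_index, query))  # set()
# Running the search once per index and combining the hits still finds nothing.
print(search_all(ocr_index, query) | search_all(native_index, query))  # set()
```

Each index can answer its own half of the query, but no single index holds all three terms, so taking the union of the two result sets, which is what running the search twice amounts to, still returns nothing.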

Lexbe has solved this false ‘index dilemma’ by creating the first concatenated eDiscovery search index, our Uber-Index℠. At ingestion, documents are run through both OCR and Native extraction indexing simultaneously. Then the OCR and Native-Extracted indices are compiled into one single, searchable database. All text is captured by these two complementary processes, and all evidence is searchable.
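One way to picture a concatenated index is to join every text stream extracted from a document under that document's single ID before indexing. The sketch below reuses the same hypothetical slide-deck text; it illustrates the general idea only and is not Lexbe's implementation, and `concatenate_sources` is a hypothetical helper.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def concatenate_sources(*sources):
    """Join every text stream extracted for a document under that document's single ID."""
    parts = defaultdict(list)
    for source in sources:
        for doc_id, text in source.items():
            parts[doc_id].append(text)
    return {doc_id: "\n".join(texts) for doc_id, texts in parts.items()}


def build_index(docs):
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search_all(index, query):
    """Boolean AND: return the documents that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Same hypothetical extractions as in the two-index simulation.
ocr_text = {"DECK-01": "Load Growth Model CAISO peak demand"}
native_text = {"DECK-01": "load_growth_chart.jpg November Data follow up"}

combined_index = build_index(concatenate_sources(ocr_text, native_text))
print(search_all(combined_index, "CAISO November Data"))  # {'DECK-01'}
```

Because both streams now belong to one document entry, a Boolean AND can combine a term found only by OCR with terms found only by native extraction. A translation stream could be passed to `concatenate_sources` as a further argument in the same way.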

Additionally, Lexbe offers an integrated translation feature, and translated text is also included in our Uber Index for seamless search in either language. Whether you opt for Lexbe to perform your document translation or upload your own translated documents, our software will link each original document to its English translation for integrated search and document review.

Finally, Lexbe performs advanced metadata extraction at ingestion to support precise searches. Details such as a document's author are extracted and made searchable.

Features | OCR Index | Text-Based Index | Lexbe Uber Index
Embedded Text
Charts
Budgets
Scanned Docs
Hidden Cells/Sheets
Comments
Tracked Changes
BCC Field
Metadata Extraction
Translated Text

With the Lexbe eDiscovery platform, your search is faster and more complete than with any other index on the market. For more information on how indexing works, watch our webinar Best Practices to Avoid Missing Evidence in Large Document Reviews, part of the Lexbe eDiscovery Webinar Series.

Exploring FRCP Rule 37(e) and Avoiding Spoliation Sanctions

In a recent eDiscovery webinar, Avoiding Spoliation Sanctions in 2017 Under New FRCP Amendments, the Honorable Xavier Rodriguez spoke with Lexbe CEO Gene Albert about the intricacies of Rule 37(e). Judge Rodriguez offered insight into how courts are interpreting Rule 37(e) and how the amendment has changed the landscape for attorneys with regard to data loss and sanctions.

Prior to the 2015 amendments to FRCP Rule 37(e), courts were inconsistent in how they imposed sanctions for lost data. As a result, many attorneys, fearing consequences, opted for a “save it all” approach when a preservation letter landed in their laps. With the growth of data and the high cost of storage, the “save it all” approach became prohibitively expensive.

Given the expense of managing preservation, combined with the inconsistent application of sanctions, the 2015 amendments to Rule 37(e) aimed to address the “excessive effort and money” spent on preservation and to provide a framework for evaluating actual damages resulting from lost data.

In the webinar discussion, Judge Rodriguez explained that the FRCP advisory committee suggests that courts consider “proportionality” across the entire spectrum of Rule 37(e). The committee notes for the rule suggest that courts look at the parties’ technical sophistication, their resources, and the weight of the ESI to the claim or defense when considering the appropriate and proportionate remedy.

With proportionality in mind, Judge Rodriguez suggests that a court needs to ask three questions before determining whether the data loss has caused prejudice (see infographic).

In making this determination, a court will assess how relevant the lost data is to the case and what remedy would be proportional for the party suffering prejudice. Remedies could include requiring additional depositions at the spoliating party’s expense or precluding evidence (preventing the spoliating party from introducing certain evidence).

In the webinar, Judge Rodriguez emphasized that intent to deprive must be deliberate. Negligence, even gross negligence, does not necessarily meet the strict requirement of actual intent. If, however, intent is found, the court has three options from which to choose: (A) “presume that the lost information was unfavorable to the party;” (B) “instruct the jury that it may or must presume the information was unfavorable to the party;” or (C) “dismiss the action or enter a default judgment.”

For more information, watch the recorded webinar on-demand: Avoiding Spoliation Sanctions in 2017 Under New FRCP Amendments.

Litigation Finance: Best Practices and Emerging Opportunities

Legal finance plays an increasingly prominent role in high-stakes commercial litigation. No longer merely a tool of necessity for cash-strapped claimants, financing is used by sophisticated lawyers and their clients to manage the cost and risk of both plaintiff and defense matters more efficiently, while also unlocking working capital for the firm or business.

This presentation draws on Burford Capital’s experience as the world’s largest provider of commercial litigation finance and its work with nearly 75% of the AmLaw 100. It provides a broad overview of key concepts, as well as emerging trends and opportunities for firms and clients to leverage legal finance in ways that enhance success and fuel growth.

Key Points

  • Defining litigation finance
  • What’s driving the growth of litigation finance?
  • Benefits of litigation finance
  • How it works
  • Case studies and trends

About the Speaker

Sarah Lieber is a Vice President of Burford Capital’s underwriting and investment arm with a particular focus on originating investments and providing strategic guidance to leading law firms and global corporations on deploying Burford’s capital in ways that address their litigation and financial needs. Prior to joining Burford, Ms. Lieber served as Deputy General Counsel for CIFG Services, a monoline bond insurer, and practiced law at Jones Day. She received her JD from Fordham University School of Law.


Protecting eDiscovery Privilege: The Case Against File Sharing Sites

File sharing services, such as Dropbox, are increasingly used as eDiscovery repositories for incoming data and outgoing productions. Because files can be shared via a simple URL link, it’s understandable why these tools appear to offer an optimal solution for sending and receiving the massive amounts of data involved in eDiscovery litigation. Unfortunately, this “solution” can become a massive liability, and we caution clients against using these services because it is simply too easy to accidentally share privileged documents. In fact, in several cases information has been inadvertently shared, with disastrous results for the offending party.

What’s the problem?

It is not that file sharing can’t be done correctly; rather, an open platform like this invites problems. With default settings in place, the “owner” of a file relinquishes control of the data within it once the file is shared with other users. Once shared, the data can be copied, changed and reshared without the owner’s permission. New users can be added to view the data and, with seemingly unlimited “cooks in the kitchen,” it becomes very difficult to maintain chain of custody and ensure responsible sharing. A few specific issues with file sharing services include:

  1. Shared files and folders are not static. This is not the equivalent of sending a document as an email attachment. The shared file or folder remains “live”, so any future additions or changes can be seen indefinitely by anyone with the link.
  2. On many platforms, user groups are created and can be duplicated to other folders with a simple click. For example, if several users have access to the “Case X Final Production” folder, another attorney could grant all users in that group access to “Case X Notes”, not realizing that opposing counsel was part of the original group.
  3. Links are not automatically password protected, so anyone with the link can view the file unless proper authentication measures are enabled manually. Without a password, anyone on the internet who obtains the link could potentially access your file.

What have the courts said?

In Harleysville Ins. Co. v. Holding Funeral Home, Inc., Case No. 1:15cv00057 (W.D. Va. Feb. 9, 2017), an insurance company denied a funeral home’s fire damage claim after determining that the fire was caused by arson. An investigator for the insurance company uploaded video taken at the scene to a file sharing site, box.com. The investigator sent the link to the insurance company’s attorney, who then shared it with the funeral home’s attorney to substantiate the arson claim. Later, however, the insurance investigator uploaded additional files to that same folder, which the funeral home’s attorneys could still access. The court found that because the link and the files within it were not properly password protected, the insurance company had, in essence, “left the files on the park bench” in a virtual sense, and thus waived privilege.

From the court:

Whether a company chooses to use a new technology is a decision within that company’s control. If it chooses to use a new technology, however, it should be responsible for ensuring that its employees and agents understand how the technology works, and, more importantly, whether the technology allows unwanted access by others to its confidential information.

What does Lexbe recommend?

We have developed the Lexbe eDiscovery Platform to include a number of checks against inadvertent disclosure of privileged docs. We create a secure encrypted link specific to each production that can then be safely shared. By insulating exports with secure production links, we help prevent user error that could result in sharing documents not meant for opposing counsel or outside parties.
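For readers curious about what scoping a link to a single production can look like, the following is a minimal sketch of an expiring, HMAC-signed link token using only the Python standard library. It is a generic technique swapped in for illustration, not Lexbe’s implementation; HMAC signing authenticates a token rather than encrypting it, and the key handling, the `PROD-0001` identifiers, and the helper names are hypothetical. A real system would also require the recipient to sign in.

```python
import base64
import hashlib
import hmac
import secrets
import time
from typing import Optional

# Hypothetical server-side signing key; a real deployment would load this from a key store.
SECRET_KEY = secrets.token_bytes(32)


def _sign(payload: str) -> str:
    """Return a hex HMAC-SHA256 signature of the payload."""
    return hmac.new(SECRET_KEY, payload.encode(), hashlib.sha256).hexdigest()


def make_link_token(production_id: str, ttl_seconds: int, now: Optional[float] = None) -> str:
    """Issue a URL-safe token bound to one production that expires after ttl_seconds."""
    issued = int(time.time() if now is None else now)
    payload = f"{production_id}:{issued + ttl_seconds}"
    return base64.urlsafe_b64encode(f"{payload}:{_sign(payload)}".encode()).decode()


def verify_link_token(token: str, production_id: str, now: Optional[float] = None) -> bool:
    """Accept the token only for its own production, unaltered, and before it expires."""
    try:
        decoded = base64.urlsafe_b64decode(token.encode()).decode()
        token_production, expires_str, signature = decoded.rsplit(":", 2)
        expires = int(expires_str)
    except ValueError:  # also covers bad base64 padding and invalid UTF-8
        return False
    expected = _sign(f"{token_production}:{expires_str}")
    current = time.time() if now is None else now
    return (
        hmac.compare_digest(signature.encode(), expected.encode())
        and token_production == production_id
        and current < expires
    )


week = 7 * 24 * 3600
token = make_link_token("PROD-0001", ttl_seconds=week, now=1_000_000)
print(verify_link_token(token, "PROD-0001", now=1_000_100))         # True
print(verify_link_token(token, "PROD-0002", now=1_000_100))         # False: wrong production
print(verify_link_token(token, "PROD-0001", now=1_000_000 + week))  # False: expired
```

Binding the production ID and an expiry time into the signed payload addresses two of the risks listed above: the link stops working after a set period instead of staying “live” indefinitely, and it cannot be reused to open a different production.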

Best Practices to Avoid Missing Key Evidence in Large Doc Review (Uber Index)

Nothing can be more disastrous than showing up at a depo and finding that your team missed a key piece of evidence in document review that your opposition found. Missing that “smoking gun” admission, or allowing critical privileged information to slip into a production, can be avoided, but only if your search tools are functioning fully. Many attorneys assume that all eDiscovery processing approaches, search tools, and techniques are basically the same, but nothing could be further from the truth. In this webinar, learn the crucial search functions and critical indexing differentiators that will protect you from inadvertently missing important evidence during eDiscovery.

Key Points

  • Overview of Modern Search and eDiscovery Indexing Technologies
  • Basic and Advanced Search Options in Use Today
  • Search Indexing ‘Gotchas’
  • Pitfalls of Relying Solely on Image-Based OCR Indexing
  • Why Native Extraction Alone is Ineffective
  • Complexities of Working with Foreign Language and Translated Text
  • Optimal Search with a Concatenated Indexing Approach
  • Takeaways

About the Speaker

Erin Derby is a Certified eDiscovery Specialist (ACEDS) and a member of the Technical Services team with Lexbe LC. She specializes in working with clients handling eDiscovery in complex litigation and provides a high level of precision and expertise.

She provides guidance on technical discovery issues and procedures and ensures compliance with all court-ordered ESI guidelines. Prior to joining Lexbe, Ms. Derby worked as a litigation paralegal for 10 years for both plaintiff and defense law firms.

