Concept Search in Litigation Support
What is it?
'Concept search' in litigation refers to searching electronic documents
on the basis of the ideas they contain, rather than only on specific keywords.
Concept searching is usually implemented by broadening a keyword-based search
to include synonyms, or by using a thesaurus to include results related to the
ideas behind the search keywords, even when those results contain no direct
derivative of the search term.
Concept searching can be helpful in analyzing documents in a legal
proceeding because a search based on 'concepts' may include results from
relevant documents that might otherwise be missed in a standard keyword
search. Standard keyword searches will return a positive result only
if the exact keyword or a close derivative is specified. Search derivatives
returned by litigation support search engines commonly include 'stemming' and
'fuzzy searches'. Stemming includes grammatical variations on a word, such
that a search for "applied" would also return "applying", "applies", and
"apply". Fuzzy searches return results even if the text to be searched is
slightly misspelled. They are helpful in returning a result even if the
original text has been corrupted through an optical character recognition
(OCR) error, which is common in scanned documents.
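The stemming and fuzzy matching described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `toy_stem`, `fuzzy_match`, `keyword_hit`, the suffix rules, and the 0.85 similarity threshold are all hypothetical and are not taken from any litigation support product. Production engines use far more complete stemmers and matching algorithms.

```python
from difflib import SequenceMatcher

# Toy suffix rules, checked in order. A real stemmer has many more rules.
SUFFIX_RULES = [("ying", "y"), ("ied", "y"), ("ies", "y"),
                ("ing", ""), ("ed", ""), ("s", "")]


def toy_stem(word: str) -> str:
    """Reduce a word to a crude stem by rewriting one common suffix."""
    word = word.lower()
    for suffix, replacement in SUFFIX_RULES:
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: len(word) - len(suffix)] + replacement
    return word


def fuzzy_match(term: str, token: str, threshold: float = 0.85) -> bool:
    """True if the two strings are nearly identical (e.g. one OCR error)."""
    ratio = SequenceMatcher(None, term.lower(), token.lower()).ratio()
    return ratio >= threshold


def keyword_hit(term: str, text: str) -> bool:
    """True if any word in text shares the term's stem or is a near match."""
    stem = toy_stem(term)
    for token in text.lower().split():
        token = token.strip(".,;:\"'()")
        if toy_stem(token) == stem or fuzzy_match(term, token):
            return True
    return False


print(keyword_hit("applied", "The rule applies here."))      # stemming
print(keyword_hit("contractual", "A contractua1 dispute."))  # OCR-style error
```

A lower threshold catches more corrupted text but also admits more unrelated words, so the value is a tuning choice rather than a fixed rule.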
Why is Concept Searching Important?
Research has shown that simple keyword searches may fail to return
many potentially relevant documents. This is because lawyers and litigation
teams may be unable, despite their efforts and best intentions, to think of
all the search terms that appear in relevant documents.
A study conducted in the 1980s casts doubt on the ability of litigation
teams to accurately determine a set of comprehensive search terms that
will return all or even most potentially relevant documents. Attorneys
and paralegals involved in a subway accident case used a keyword
methodology to search a discovery database consisting of 350,000 pages
in 40,000 documents. The litigation team believed that they had located
at least 75% of the relevant documents. A separate manual review of the
documents found that the litigation team had identified only 20% of
potentially discoverable documents through keyword searching alone.
Blair & Maron, An Evaluation of Retrieval Effectiveness for a
Full-Text Document Retrieval System, 28 Com. A.C.M. 289 (1985)
(publication of the Association for Computing Machinery). Many of the missed
documents included terms that the litigation team had not anticipated. In
internal communications, participants working for the subway referred to
“the unfortunate incident,” while victims referred to it as the “disaster.”
Relevant documents also included oblique references to the “event,”
“incident,” “situation,” “problem,” and “difficulty.” Many of the missed
documents might have been identified with a thorough manual search or with
more inclusive searching approaches, like concept searching.
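The gap the study describes is a gap in what information retrieval calls recall: the share of all relevant documents that a search actually returns. The short sketch below works through the arithmetic. The total of 1,000 relevant documents is an illustrative assumption (the figures above are percentages only), and the `recall` helper is a hypothetical name.

```python
def recall(relevant_retrieved: int, relevant_total: int) -> float:
    """Fraction of all relevant documents that the search returned."""
    if relevant_total <= 0:
        raise ValueError("relevant_total must be positive")
    return relevant_retrieved / relevant_total


# Illustrative count only: the study figures quoted above are percentages.
relevant_total = 1_000
believed_found = 750  # what "at least 75%" would correspond to
actually_found = 200  # what 20% corresponds to

print(f"believed recall: {recall(believed_found, relevant_total):.0%}")
print(f"measured recall: {recall(actually_found, relevant_total):.0%}")
print(f"relevant documents missed: {relevant_total - actually_found}")
```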
Manual Review is No Panacea
The influential Sedona Conference has discussed these issues in its
recently released Best Practices Commentary on the Use of Search and
Information Retrieval Methods in E-Discovery (August 2007). The Sedona
Conference authors note that although most lawyers consider "manual review
[to be] the gold standard by which all searches should be measured," manual
search is in fact no panacea: a manual review methodology inevitably
results in its own errors of
missed documents. "Human review of documents in discovery is
expensive, time consuming, and error-prone. There is growing consensus
that the application of linguistic and mathematic-based content
analysis, embodied in new forms of search and retrieval technologies,
tools, techniques and process in support of the review function can
effectively reduce litigation cost, time, and error rates." Keyword
searching alone can miss many documents, but so can manual review.
A mixed approach is often best: build better search keywords iteratively,
running computer searches and then manually reviewing the returned documents
to develop an expanded list of keywords.
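The iterative approach just described can be sketched as a loop that alternates searching and reviewing. This is a simplified illustration, not a description of any product's workflow: the sample documents are invented, and `simulated_review` is a hypothetical stand-in for a human reviewer who notes new vocabulary in each returned document.

```python
from typing import Callable


def search(documents: dict[str, str], keywords: set[str]) -> set[str]:
    """Return ids of documents containing any keyword (substring match)."""
    return {doc_id for doc_id, text in documents.items()
            if any(k in text.lower() for k in keywords)}


def iterative_search(documents: dict[str, str], seed_keywords: set[str],
                     review: Callable[[str], set[str]],
                     max_rounds: int = 5) -> tuple[set[str], set[str]]:
    """Alternate searching and reviewing until no new documents turn up."""
    keywords = {k.lower() for k in seed_keywords}
    reviewed: set[str] = set()
    for _ in range(max_rounds):
        new_hits = search(documents, keywords) - reviewed
        if not new_hits:
            break
        for doc_id in sorted(new_hits):
            # The reviewer may notice new terms worth adding to the list.
            keywords |= {t.lower() for t in review(documents[doc_id])}
        reviewed |= new_hits
    return keywords, reviewed


# Invented sample documents for illustration only.
DOCS = {
    "d1": "Report on the subway accident and the resulting disaster.",
    "d2": "Internal memo regarding the unfortunate incident last week.",
    "d3": "The disaster, which management calls the incident, delayed service.",
    "d4": "Follow-up on the incident and the ongoing situation.",
    "d5": "Cafeteria menu for Friday.",
}
CANDIDATE_TERMS = {"disaster", "incident", "situation"}


def simulated_review(text: str) -> set[str]:
    """Stand-in for manual review: report candidate terms seen in the text."""
    return {t for t in CANDIDATE_TERMS if t in text.lower()}


final_keywords, found = iterative_search(DOCS, {"accident"}, simulated_review)
print(sorted(final_keywords))
print(sorted(found))
```

Starting from the single seed term "accident", each round of review surfaces vocabulary that the next search can use, which is how documents that never mention the seed term are eventually reached.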
The Sedona Conference and Courts Support Concept Searching
The authors of the Sedona commentary also maintain that simple keyword searches can be substantially
improved by "using conceptual searching [emphasis added], which makes use
of taxonomies and ontologies assembled by linguists; and using other machine learning
and text mining tools that employ mathematical probabilities."
Concept searching has gained greater importance as the bench and
attorneys have begun to recognize that regular keyword or Boolean
searches may miss much relevant evidence.
Federal Judge Facciola noted in a recent case that "concept searching,
as opposed to keyword searching, is more efficient and more likely to produce the
most comprehensive results." Disability
Rights Council of Greater Wash. v. Wash. Metro. Area Transit Auth., 2007
WL 1585452 (D.D.C. June 1, 2007).
How Concept Searching is Implemented in Lexbe
Lexbe Online includes concept searching as an option in its web-based litigation
support application. If concept searching is not selected, a search is
conducted as a normal keyword search, extended to include results obtained
through stemming and fuzzy matching to compensate for possible OCR scan
errors.
A user can apply concept searching by checking the 'Concept Search' box on the left
side of the search screen. (It is left unchecked by default.)
Once the box is checked, the search is expanded through reference to an extensive
lexical and semantic network of the English language. This includes reference to
synonyms, search term definitions, and links between words other than synonym
relationships (e.g., antonyms, hyponyms).
For example, a search of the term 'contractual' without the 'Concept Search' box
checked would return documents including the word 'contractual' and derivatives:
'contract', 'contracting', 'contracted', etc. (Minor misspellings would also be
returned: e.g. contractua1.) If the 'Concept Search' box were checked, the search
would be expanded to include terms like 'agreement', 'arrangement', 'compact',
'covenant', 'obligation', 'pledge', 'promise', 'understanding', as well as
derivatives of these words.
Concept searching can find documents that might otherwise be missed, but
using it involves a tradeoff, as it can also lead to a number of false
positive results. Knowing the tools and tradeoffs
will help lawyers and litigation teams decide when and where concept
search can benefit the review process.
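The expansion in the 'contractual' example, and the false positive tradeoff, can be illustrated with a minimal sketch. It assumes a tiny hand-built thesaurus containing only the related terms listed above. It is not Lexbe's implementation, it omits stemming and fuzzy matching for brevity, and the names `THESAURUS`, `expand_query`, and `search`, along with the sample documents, are hypothetical.

```python
import re

# Hypothetical thesaurus entry built from the related terms listed above.
THESAURUS = {
    "contractual": ["agreement", "arrangement", "compact", "covenant",
                    "obligation", "pledge", "promise", "understanding"],
}


def expand_query(term: str) -> list[str]:
    """Return the term plus any related terms found in the thesaurus."""
    term = term.lower()
    return [term] + THESAURUS.get(term, [])


def search(documents: dict[str, str], terms: list[str]) -> dict[str, list[str]]:
    """Map each matching document id to the query terms it contains."""
    hits = {}
    for doc_id, text in documents.items():
        words = set(re.findall(r"[a-z0-9]+", text.lower()))
        matched = sorted(t for t in terms if t in words)
        if matched:
            hits[doc_id] = matched
    return hits


# Invented sample documents for illustration only.
DOCS = {
    "letter": "This contractual dispute must be resolved.",
    "memo": "The parties reached an agreement on delivery dates.",
    "email": "He gave his promise that the covenant would be honored.",
    "invoice": "Please ship one compact disc player.",
}

keyword_hits = search(DOCS, ["contractual"])
concept_hits = search(DOCS, expand_query("contractual"))
print("keyword search:", keyword_hits)
print("concept search:", concept_hits)
```

In this sketch the concept search reaches the memo and the email, which never use the word 'contractual', but it also returns the invoice, because 'compact' matches an unrelated sense of the word. That is the kind of false positive a review team would need to weigh.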