What is the Future of E-Discovery?
Article, Barrister Magazine – January 2009
James Stanbury, Partner, poses the question: what is the future for litigation technology?
2050; a plain room in a Nottinghamshire farmhouse overlooking the sea. A middle-aged man walks in, sits at the UltraWood™ desk and waves his hand through a sensor field above the desk. The field detects and authenticates the e-Chip™ embedded in his hand and verbally identifies and welcomes – J. to his iChambers™. Simultaneously, multiple beadlike projectors mounted within the SynthStone™ walls beam into life a three-dimensional scene and the room becomes a wood-panelled court room. The window shows a view of the long-since-demolished Royal Courts of Justice, the shining sun still its normal shape. The Judge dons his iWig™, which immediately starts communicating with the central nervous system terminals in his neck, thus stimulating his corneal implants. The court room now becomes populated as the AI system displays the avatars of the jury, barristers and other attendees. A level-toned but disembodied voice speaks, “All rise…”
A fanciful view of the future of court proceedings perhaps but, as computer technology advances, what are the implications for a problem apparently both caused by and being solved by technology: that of e-discovery?
The rise of e-discovery and electronically stored information
The rise of e-discovery as a jargon term on the lips of almost all litigation lawyers has its roots in the expression ‘paperless office’. The phrase, which came to prominence around 30 years ago, was the dream of office managers and technologists, who could envisage the new computer technology completely replacing paper within years. However, any litigator will tell you it was naught but a pipe-dream, in time-scale at least, as they still have to review rooms full of archive boxes to discover the relevant parts. Sometimes, the only recourse is to throw staff and hundreds of chargeable hours at the problem.
However, the eventual proliferation of the microprocessor and electronic storage media has led to a gentle shift in the balance of power between paper and electronically stored information (ESI). The concept of the paperless office is still not wholly based in reality: just about any established company that proudly boasts paperless systems will still have large numbers of printed documents in a deep warehouse archive. Even so, the focus of the litigator’s problem has largely shifted to the new virtual world of such things as email archives, CRM systems, transactional databases, shared network folders, backup tapes, backup disks and even a thing that is actually called “virtualised storage”.
It appears that litigation support teams are in a difficult period at present. Technologically speaking, paper-based records are relatively easy to deal with: most lawyers will be familiar with the process of scanning paper into electronic format and being able to review it in an online system. The age-old problem remains of how to search through it all. This problem has been exacerbated hugely by now having to add in the exponentially increasing amounts of ESI. It seems paradoxical that the very systems put in place to make our lives easier and more “productive” are making things harder for us, and, currently, there are no “easy” (or cheap) ways of taking these virtual vats of data and making sense of them.
Let us look at some of the particular difficulties presented to the various involved parties: lawyers, forensic technology teams and forensic accountants.
The problem for the lawyer, as mentioned above, has always been getting the best, most relevant information in the least time. There is a trade-off between applying expensive technological solutions, which may or may not get the most relevant information (see below), and doing a page-by-page review, which takes huge resources and experience. Additionally, at a recent seminar on this subject, the point was raised that junior staff performing the initial review may be cheaper than a senior, more experienced lawyer but also less efficient.
For the forensic technologist, who is tasked with collecting the information to begin with, the problems tend to be more logistical. It follows that the more data there is, the longer it will take to collect and collate and, correspondingly, the longer it will take to process into a reviewable form. Obviously, when the term longer is used, this means more expensive. The collection guys also have to race to keep up with technological advances. The old days of performing disk images onto a tape drive, when a 6GB hard disk would take all day, are long gone. Now hard disks in new laptops routinely top 200GB in size and there is a struggle to keep up to date with expensive forensic hardware to make acquiring such volumes of data possible within the deadlines and also economically viable for the litigator client.
When looking at the task of retrieving relevant data from a large collection of documents, or data set, the words precision and recall are often used. The precision of a search is the proportion of the documents returned that are actually relevant; it does not take into account whether all relevant documents have been retrieved. Conversely, recall is the proportion of all the relevant documents in the data set that the search managed to retrieve; it does not take into account how many irrelevant documents came back with them. This principle is amply demonstrated when using the traditional method of interrogating a large data set without doing a page-by-page review, that of key word searching. A client of ours would regularly ask us to do word searches across computer evidence data and they would invariably ask for the word “Spain” to be included. “Spain” always returned hundreds (or occasionally thousands) of positive hits – a high recall factor but with low precision. However, by clarifying the search term to, say, “villa and Spain” we might hope to get a higher precision score.
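The two measures can be illustrated with a minimal sketch in Python. Everything in it is hypothetical and invented for illustration: the five documents, the set judged relevant by a reviewer, and the helper names keyword_search and precision_recall. A real review platform would use a proper search index rather than simple substring matching.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved document ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant documents the search actually found
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


def keyword_search(documents, terms):
    """Return ids of documents containing every term (case-insensitive)."""
    return {
        doc_id
        for doc_id, text in documents.items()
        if all(term.lower() in text.lower() for term in terms)
    }


# Hypothetical data set: document ids mapped to their text.
documents = {
    1: "Deposit paid on the villa in Spain",
    2: "Holiday snaps from Spain",
    3: "Spain won the football last night",
    4: "Transfer of funds for the villa purchase",
    5: "Quarterly sales figures",
}
# Assumed reviewer judgement of which documents matter to the case.
relevant = {1, 4}

broad = keyword_search(documents, ["Spain"])
narrow = keyword_search(documents, ["villa", "Spain"])

broad_scores = precision_recall(broad, relevant)
narrow_scores = precision_recall(narrow, relevant)

print("Spain:          precision %.2f, recall %.2f" % broad_scores)
print("villa and Spain: precision %.2f, recall %.2f" % narrow_scores)
```

In this toy data set the broad search returns three documents, of which only one is relevant. Narrowing the search lifts precision, while recall stays the same because the relevant document that never mentions Spain is missed by both searches.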
Thus, the struggle faced by litigators is to balance these twin factors whilst applying the most technologically efficient way of actually performing document retrieval. However, compared to some of the emerging technologies, it must be said that searching for key words is rather a poor relation. The industry now abounds with terms like contextualisation, conceptualisation, categorisation, threading and near duplication. Apart from the latter, these are all technologies designed to make the litigator’s life easier and are broadly aimed at grouping similar documents in a data set together, sometimes via a visual interface. The theory goes that, if all emails suggesting a quick sojourn to the local hostelry are clustered together, then that grouping can be safely ignored. Unless, of course, the whole issue is looking into low staff productivity, in which case the reviewer has struck gold! These methods are great but, because there is usually rather a large cost associated with their application, they are usually reserved for those cases where the data set is truly enormous.
Back to the Future
So, what of the future for litigation technology? It is fairly safe to assume that the needs of the litigator will remain fairly constant: learning, distilling and processing the facts into a coherent, logical and persuasive presentation that retains integrity in court or other litigation. This holds true for the expert as well.
Experience will still count, not only in deciding what the best course of action is but also in analysing and getting to the nub of a case. For the Forensic Accounting Expert, technology can forge highly pertinent links and equations between data, but it has yet to replace their intuition in finding and pursuing a line of enquiry that may support or undermine an argument, or indeed a whole litigated case.
That's not to say that someone won't eventually manage to capture the knowledge and skills of a litigation professional in an automatic system - this is actually the subject of some current research, in which decision-making in a document review is reduced to algorithms. The bloodhound is yet to be replaced by K9!
Where we can hope to see light at the end of the tunnel is in the routine handling of information within organisations. At the moment most ways of automatically classifying or tagging documents lie in retrospectively applied technologies, which are only brought in when needed. If we assume that data is not going to get more structured, indeed the current pattern indicates a move to less overall structure in data storage – gone are the days when each department would have its own file server – it is imperative that systems which automatically make sense of the unstructured become commonplace. We can see a forerunner of this in a small utility program called Calais. This is a ‘fact extractor’, which will analyse a document and record such information as names, events and places which could then be fed into an analytical engine. Document repositories with this built in will be able to organise and present a structured view of the information at a single command.
Finally, developments in artificial intelligence mean that computers are already nearing the capability of passing the Turing Test, which assesses whether a computer can fool a human observer into thinking they are interacting with another human. Additionally, robotics technologies are becoming more and more sophisticated and cheap. Maybe our online judge from the opening paragraph will be replaced with a U.S. Robotics NS-6 (the “I, Judge”™), and the multiple Terabytes of data will have been subjected to a page-by-page review by a huge bank of RoboParalegals™.
This is a full version of the article first published in The Barrister magazine in January 2009.