Jack Seward's "Avoid Being Duped"
- George Socha Responds

The points Jack Seward raises in the opening and closing paragraphs of his article, "Protecting Yourself Against E-illiteracy: Avoid Being Duped," are laudable ones. By now it should be axiomatic that litigation attorneys ought to educate themselves about what discovery information may be available in electronic form and that they should make use of available and emerging strategies to collect and evaluate that information. Mr. Seward's thesis that deduplication equals duplicity, however, is a whole different kettle of fish.

Mr. Seward appears to work from several key assumptions for which he offers no support. The tone of his comments suggests that he views the world of electronic discovery as divided into two camps, the good guys and the bad guys. The good guys are the parties who seek electronic discovery. All these folks want is the other side's relevant materials. They know little, however, about electronic discovery and even less about deduplication. In the second camp are the bad guys: producing parties and EDD vendors. In the world Mr. Seward depicts, they invariably turn to deduplication to avoid producing responsive materials. Of course, there are situations where producing parties abuse the discovery process; this is, after all, a problem that predates electronic discovery by many decades. As far as I know, however, there is no evidence to suggest that this behavior is the norm. If it is, we have much greater problems than deduplication to worry about.

If I understand Mr. Seward correctly, deduplication is anathema to him. His advice to requesting parties is that they should never accept a production of electronic materials unless the other side has produced all copies of every item. This advice ignores some basic issues. Requesting and producing parties alike are wrestling with a rapidly increasing volume of material that needs to be considered for production. A few years ago, a gigabyte was an unwieldy amount of data. In 1996, for example, I seemed to make a regular habit of crashing computers by feeding them between 5 and 10 gigabytes of electronic files. Today there are cases where tens of terabytes of electronic information have been collected for potential processing and review. In addition, with reports of duplicate materials reaching 40, 60 and even 80 percent of some populations of electronic materials, it should come as no surprise that everyone involved is seeking ways to reduce the volume. Deduplication is an obvious approach in those circumstances, and if properly done it can save everyone involved many millions of dollars.

Mr. Seward seems not to recognize that deduplication means different things in different settings. What constitutes a duplicate depends on the circumstances and needs of a particular lawsuit. In some situations, files might be considered duplicates only if they had identical MD5 hash values: in essence, only if they were identical in virtually all respects and just happened to have been found in different locations. In other situations, files might be deemed duplicates if they had a limited number of factors, such as email message IDs, in common. Neither approach is inherently better than the other; what matters is what works best given the requirements of a particular lawsuit.
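To make the distinction concrete, here is a minimal Python sketch of the two approaches. It assumes each collected item is available as raw bytes keyed by the location where it was found; the function names and sample messages are hypothetical. The first function groups items only when their MD5 hash values match exactly; the second treats email messages as duplicates whenever they share a Message-ID header, even if their bytes differ.

```python
import hashlib
from collections import defaultdict
from email import message_from_bytes


def dedupe_by_md5(items):
    """Group locations whose raw bytes have identical MD5 hash values."""
    groups = defaultdict(list)
    for location, raw in items.items():
        groups[hashlib.md5(raw).hexdigest()].append(location)
    return list(groups.values())


def dedupe_by_message_id(items):
    """Group locations whose email Message-ID headers match (hypothetical criterion)."""
    groups = defaultdict(list)
    for location, raw in items.items():
        msg_id = message_from_bytes(raw)["Message-ID"]
        # Items lacking a Message-ID are never merged with anything else.
        key = msg_id if msg_id else ("no-id", location)
        groups[key].append(location)
    return list(groups.values())


# Hypothetical sample collection: the same email found in three places.
sent = b"Message-ID: <abc123@example.com>\nSubject: Q3 plan\n\nSee attached.\n"
received = b"Received: from mail.example.com\n" + sent  # header added in transit
items = {
    "alice/sent/1.eml": sent,
    "bob/inbox/7.eml": received,
    "backup/alice/1.eml": sent,  # byte-for-byte copy of Alice's message
}

print(dedupe_by_md5(items))
print(dedupe_by_message_id(items))
```

In this sample, the sender's copy and its backup share an MD5 value, but the recipient's copy does not, because a mail server added a Received header in transit. Hash-based grouping therefore leaves two items, while message-ID grouping collapses all three into one. Which result is appropriate depends on the needs of the lawsuit.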

Mr. Seward also appears to believe that all EDD vendors handle duplicates in the same fashion, one that ultimately means the requesting party loses all chance of gaining access to them later on. In actuality, EDD vendors offer a wide range of approaches. Some do not handle deduplication well. Others attempt to preserve all duplicates and produce one copy of each duplicated file to the other side, while also making available information about the copies that were not produced; should an issue arise later about a duplicate that was not produced, the duplicate and the information about it are still there.
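The second kind of workflow can be sketched in a few lines of Python. This is a minimal illustration under assumed conditions, not any vendor's actual method: it treats files as duplicates when their MD5 hash values match, produces the first copy it encounters, and records every withheld copy in a log tied to the copy that was produced. The names `Production`, `build_production`, `copies_of`, and the sample files are hypothetical. Nothing is deleted from the collection itself, so a withheld duplicate can still be retrieved later.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Production:
    # MD5 value -> location of the single copy produced to the other side
    produced: dict = field(default_factory=dict)
    # (MD5 value, withheld location, produced location) for every duplicate not produced
    duplicate_log: list = field(default_factory=list)


def build_production(items):
    """Produce one copy per MD5 value and log the rest; `items` is left untouched."""
    prod = Production()
    for location, raw in items.items():
        digest = hashlib.md5(raw).hexdigest()
        if digest not in prod.produced:
            prod.produced[digest] = location
        else:
            prod.duplicate_log.append((digest, location, prod.produced[digest]))
    return prod


def copies_of(prod, produced_location):
    """List the withheld duplicates of a produced file, e.g. if a dispute arises later."""
    return [withheld for _, withheld, kept in prod.duplicate_log
            if kept == produced_location]


# Hypothetical collection: one memo saved in two places, plus an earlier draft.
items = {
    "alice/memo.doc": b"draft v2",
    "shared/memo.doc": b"draft v2",
    "bob/memo.doc": b"draft v1",
}
prod = build_production(items)
print(sorted(prod.produced.values()))
print(copies_of(prod, "alice/memo.doc"))
```

Because the log records where each withheld copy was found, a requesting party that later needs a particular copy can ask for it by location.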

Finally, Mr. Seward does not seem to consider the possibility that the parties could work out deduplication issues themselves in a way that meets the needs of all involved. A broad range of mechanisms exists to facilitate this, from informal correspondence and meetings, to initial conferences with the judge, to requests for production of documents, to meet-and-confer sessions, to the use of electronic discovery special masters. If parties approach this issue openly and early, they stand a good chance of making it a non-issue.

In the end, deduplication is neither good nor bad, right nor wrong. It is just one more tool in the discovery process. It can be used to reduce costs, streamline discovery, and help parties focus on the information that matters most for the lawsuit. Or it can be used to obfuscate and obstruct. Just like the rest of discovery.

George J. Socha Jr., Esq.
Socha Consulting LLC

"Informing digital discovery decisions"

1374 Lincoln Avenue
St. Paul MN 55105
Tel 651.690.1739
Cell 651.336.3940
Fax 651.846.5920
george@sochaconsulting.com
www.sochaconsulting.com

September, 2004