When tackling electronic discovery, many attorneys and their teams know they need to focus on collecting, reviewing and producing potentially responsive electronic data. However, the scope of discovery is not limited to emails, user files and similar materials. Organizations create, maintain and store two types of data: unstructured and structured. Understanding what structured data is, how it is created, where it exists and how to collect it is critical to ensuring a defensible and thorough discovery review.
Structured and unstructured data exist in two extremely different formats. According to the Sedona Conference's “Database Principles: Addressing the Preservation & Production of Databases & Database Information in Civil Litigation”, “Information stored in databases differs fundamentally from discrete unstructured data, because unstructured data files tend to be static and self-contained.”
Structured data, on the other hand, has three common characteristics, according to the Sedona Conference: it consists of many pieces of discrete information; it is subdivided into fields or records; and it is saved in a common format such as a database.
The average organization may have information in many structured data formats. Examples of structured databases include:
In fact, email systems such as Microsoft Exchange are themselves databases, and each individual email is a record in the structured system. If you have ever had to pull data out of Microsoft SharePoint, which is a database on the back end, then you have dealt with a structured system.
Structured data often exists either in a flat-file format (such as an Excel spreadsheet) or in a relational database (such as an SQL database). Each table in a relational database is itself a flat file, but when the tables are connected to each other via a common element and that “relationship” is maintained through the software, the files become relational. Consider a CRM database that includes contact information for an individual in one table and information about the company that individual works for in another table. When the data is presented to the user through the application, it looks like these elements are all part of one record, but behind the scenes, the data is divided across many tables.
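The CRM example can be sketched in a few lines of Python using the built-in sqlite3 module. This is only an illustration: the table names, column names and sample rows are hypothetical, and a real CRM would have many more tables. It shows contacts and companies stored separately, connected by a shared company_id, and joined into the single “record” a user would see on screen.

```python
import sqlite3


def build_crm(conn):
    """Create two hypothetical CRM tables linked by company_id."""
    conn.executescript(
        """
        CREATE TABLE companies (
            company_id INTEGER PRIMARY KEY,
            name       TEXT NOT NULL,
            city       TEXT
        );
        CREATE TABLE contacts (
            contact_id INTEGER PRIMARY KEY,
            full_name  TEXT NOT NULL,
            email      TEXT,
            company_id INTEGER REFERENCES companies(company_id)
        );
        """
    )
    conn.executemany(
        "INSERT INTO companies VALUES (?, ?, ?)",
        [(1, "Acme Corp", "Chicago"), (2, "Globex", "Denver")],
    )
    conn.executemany(
        "INSERT INTO contacts VALUES (?, ?, ?, ?)",
        [
            (1, "Ann Lee", "ann@example.com", 1),
            (2, "Raj Patel", "raj@example.com", 1),
            (3, "Mia Cruz", "mia@example.com", 2),
        ],
    )
    conn.commit()


def contact_records(conn):
    """Join the two tables into the combined record the application displays."""
    cursor = conn.execute(
        """
        SELECT c.full_name, c.email, co.name, co.city
        FROM contacts AS c
        JOIN companies AS co ON co.company_id = c.company_id
        ORDER BY c.contact_id
        """
    )
    return cursor.fetchall()


conn = sqlite3.connect(":memory:")  # throwaway in-memory database
build_crm(conn)
for record in contact_records(conn):
    print(record)
```

Note that the company's name and city are stored once, in the companies table, even though they appear in every contact record that refers to that company. That is why exporting a single table rarely tells the whole story.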
Potentially responsive data can reside in a number of different structured systems. For example, information relevant to an alleged price-fixing case may exist in a combination of the sales CRM and financial ERP databases. Data in the HRIS may be useful in defending against age-discrimination claims.
In order to find the information, though, the legal team needs to know where to look and how to get the data.
Considering the breadth and depth of the information contained in structured databases, how can the legal team conduct e-discovery in a way that is cost-effective, timely and defensible? There are several different approaches, each with its own set of advantages and drawbacks.
While backing up and producing the entire database represents a thorough approach, most legal teams rightfully avoid it at all costs. Even with a small database, the legal team will be handing over tremendous amounts of information, some of which may be privileged or confidential and much of which will be irrelevant.
Opposing counsel will only be able to review the content of the backup if they have the ability to restore it, a server and software framework to restore it to, and a copy of the software that was used to get data into the system. Reviewing the data requires the same software application and server configuration from which the backup was created. Even if the opposing side has the same hardware and software, organizations often customize their database systems over time, so licensing and other compatibility issues are likely to arise.
A second approach, extracting data directly from the back end of the database, requires some technical finesse and assumes that critical documentation about the database structure and the fielded data within it exists.
A database schema defines the structure of a relational database, identifying what tables exist and how those tables relate to each other; it serves as a map of the database and is often depicted graphically. A data dictionary, also known as a metadata repository, is the critical document that provides a nontechnical definition for each column, helping the legal team understand what information actually exists. It contains information about the data: how it relates to other data, when it was created, who created it and where it resides.
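To make the two documents concrete, here is a minimal sketch that reads a schema out of a small SQLite database and compares it with a data dictionary. Everything in it is an assumption for illustration: the invoices and customers tables, the legacy_flag column and the dictionary entries are invented, and other database products expose their schema through different catalog views than SQLite's sqlite_master table and PRAGMA statements.

```python
import sqlite3

# Hypothetical data dictionary: plain-language meaning of each (table, column).
DATA_DICTIONARY = {
    ("customers", "customer_id"): "Unique number assigned to each customer",
    ("customers", "name"): "Customer's legal name",
    ("invoices", "invoice_id"): "Unique number assigned to each invoice",
    ("invoices", "customer_id"): "Customer billed; links to the customers table",
    ("invoices", "amount"): "Invoice total in US dollars",
}


def create_sample(conn):
    """Build a tiny hypothetical billing database."""
    conn.executescript(
        """
        CREATE TABLE customers (
            customer_id INTEGER PRIMARY KEY,
            name        TEXT
        );
        CREATE TABLE invoices (
            invoice_id  INTEGER PRIMARY KEY,
            customer_id INTEGER REFERENCES customers(customer_id),
            amount      REAL,
            legacy_flag TEXT
        );
        """
    )


def describe_schema(conn):
    """Return each table's columns and its links to other tables."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    schema = {}
    for table in tables:
        # table_info rows: (cid, name, type, notnull, default, pk)
        columns = [
            (row[1], row[2])
            for row in conn.execute(f'PRAGMA table_info("{table}")')
        ]
        # foreign_key_list rows: (id, seq, ref_table, from_col, to_col, ...)
        links = [
            (row[3], row[2], row[4])
            for row in conn.execute(f'PRAGMA foreign_key_list("{table}")')
        ]
        schema[table] = {"columns": columns, "links": links}
    return schema


def undocumented_columns(schema, dictionary):
    """List columns that exist in the database but not in the dictionary."""
    return [
        (table, name)
        for table, info in schema.items()
        for name, _type in info["columns"]
        if (table, name) not in dictionary
    ]


conn = sqlite3.connect(":memory:")
create_sample(conn)
schema = describe_schema(conn)
print(schema["invoices"]["links"])
print(undocumented_columns(schema, DATA_DICTIONARY))
```

A comparison like this gives the legal team an early warning when the dictionary has fallen out of date with the database it is supposed to describe.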
Through this approach, the legal team works with a data dictionary and schema to identify and mine the data that counsel is seeking and then extract the information via the back end through an SQL or similar query.
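A back-end extraction of this kind might look like the following sketch. It assumes a hypothetical sales database with orders and customers tables, dates stored as ISO-formatted text, and a negotiated date range; none of these details come from a real system. The query pulls only the agreed fields for the agreed period and writes them out as CSV, a format either side can open without the original application.

```python
import csv
import io
import sqlite3


def build_sample(conn):
    """Create a small hypothetical sales database."""
    conn.executescript(
        """
        CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (
            order_id    INTEGER PRIMARY KEY,
            order_date  TEXT,   -- ISO dates (YYYY-MM-DD) sort correctly as text
            customer_id INTEGER REFERENCES customers(customer_id),
            unit_price  REAL
        );
        """
    )
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)", [(1, "Acme Corp"), (2, "Globex")]
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?, ?)",
        [
            (1, "2021-12-15", 1, 9.5),
            (2, "2022-01-10", 1, 10.0),
            (3, "2022-02-20", 2, 10.0),
            (4, "2022-04-01", 2, 11.25),
        ],
    )
    conn.commit()


def extract_orders(conn, start_date, end_date):
    """Return the column header and rows for orders in the date range (inclusive)."""
    cursor = conn.execute(
        """
        SELECT o.order_id   AS order_id,
               o.order_date AS order_date,
               c.name       AS customer,
               o.unit_price AS unit_price
        FROM orders AS o
        JOIN customers AS c ON c.customer_id = o.customer_id
        WHERE o.order_date BETWEEN ? AND ?
        ORDER BY o.order_date, o.order_id
        """,
        (start_date, end_date),  # bound parameters, never string-pasted values
    )
    header = [column[0] for column in cursor.description]
    return header, cursor.fetchall()


def to_csv(header, rows):
    """Render the extract as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


conn = sqlite3.connect(":memory:")
build_sample(conn)
header, rows = extract_orders(conn, "2022-01-01", "2022-03-31")
print(to_csv(header, rows))
```

Keeping the exact query text and parameters alongside the output makes it easier to explain later how the production set was generated.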
While it seems straightforward, this approach is generally more complicated than it first appears. It requires that organizations compile and regularly update both a “data dictionary” and “database schema,” so that the data can be thoroughly identified and tracked. Few organizations have both the dictionary and the schema.
It is also very time-consuming, since both sides may need to meet and confer in an effort to negotiate what data should be extracted, the parameters of that data (e.g., a date range), production methods and other considerations.
Legal teams often find that many of the fields in the database, while they exist in the schema and data dictionary, are associated with software functionality that the organization does not use. That adds to the complexity of the approach. In-house resources can be leveraged for this approach, but doing so may expose those employees to being called to testify as subject matter experts.
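One way to spot fields that exist on paper but are never used is to count how many rows actually hold a value in each column. The sketch below does this for a hypothetical employees table; the table, its columns and the sample rows are invented for illustration, and the quoting of table and column names assumes they come from the database's own catalog rather than from user input.

```python
import sqlite3


def build_sample(conn):
    """Create a hypothetical HR table with two fields nobody fills in."""
    conn.execute(
        """
        CREATE TABLE employees (
            employee_id INTEGER PRIMARY KEY,
            name        TEXT,
            hire_date   TEXT,
            fax_number  TEXT,
            badge_color TEXT
        )
        """
    )
    conn.executemany(
        "INSERT INTO employees VALUES (?, ?, ?, ?, ?)",
        [
            (1, "Ann Lee", "2015-03-01", None, ""),
            (2, "Raj Patel", "2019-07-15", None, None),
            (3, "Mia Cruz", None, None, "  "),
        ],
    )
    conn.commit()


def column_fill_counts(conn, table):
    """Return the row count and, per column, how many rows hold a real value."""
    columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
    total = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
    counts = {}
    for column in columns:
        # A value counts as filled if it is neither NULL nor blank text.
        query = (
            f'SELECT COUNT(*) FROM "{table}" '
            f'WHERE "{column}" IS NOT NULL AND TRIM("{column}") <> ?'
        )
        counts[column] = conn.execute(query, ("",)).fetchone()[0]
    return total, counts


def unused_columns(conn, table):
    """List columns in which no row holds a value."""
    _total, counts = column_fill_counts(conn, table)
    return [column for column, filled in counts.items() if filled == 0]


conn = sqlite3.connect(":memory:")
build_sample(conn)
print(column_fill_counts(conn, "employees"))
print(unused_columns(conn, "employees"))
```

A column that is empty in every row is a reasonable candidate to exclude from negotiation, although an empty field can itself be relevant in some matters, so the result is a starting point for discussion rather than a decision.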
Another approach is to use the functionality built into most software applications to extract the potentially relevant data in an organized and meaningful format. Many of today's databases have fairly robust reporting functionality, and using it is often the most intuitive and painless way to get the data. The legal team can often leverage internal resources that are familiar with the data and the software. Since this approach is much simpler than the others and relies on the application itself, there will be limited need for a technical expert.
This method focuses more on the reports that are generated from the system than the data itself. After all, most legal professionals take the position that reports from the system are what drive an organization's activities. When was the last time you heard of a CEO querying a database directly to glean information, rather than relying on the trends indicated by the reports coming out of it?
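For readers who want to see what a report is in database terms, the sketch below builds one by hand: a monthly sales summary by region. In practice the application's own reporting module would produce this, so the code is only an assumption-laden stand-in. The sales table, its columns and the figures are all hypothetical.

```python
import sqlite3


def build_sample(conn):
    """Create a hypothetical table of individual sales transactions."""
    conn.execute(
        "CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, sale_date TEXT, "
        "region TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?, ?)",
        [
            (1, "2022-01-05", "East", 100.0),
            (2, "2022-01-20", "East", 250.5),
            (3, "2022-01-22", "West", 75.0),
            (4, "2022-02-03", "West", 300.0),
        ],
    )
    conn.commit()


def monthly_sales_report(conn):
    """Summarize transactions into one line per month and region."""
    cursor = conn.execute(
        """
        SELECT substr(sale_date, 1, 7) AS month,   -- YYYY-MM from an ISO date
               region,
               COUNT(*)                AS sales,
               ROUND(SUM(amount), 2)   AS total
        FROM sales
        GROUP BY month, region
        ORDER BY month, region
        """
    )
    return cursor.fetchall()


conn = sqlite3.connect(":memory:")
build_sample(conn)
for line in monthly_sales_report(conn):
    print(line)
```

The four underlying transactions collapse into three summary lines, which is the trade-off of this method: the report is easier to read and closer to what decision-makers actually saw, but it does not expose every underlying record.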
Structured data is an emerging issue, and this article only scratches the surface of its complexity. In the ever-evolving landscape of electronic discovery, attorneys and their legal teams cannot ignore databases and other structured systems when conducting discovery. It is critical that counsel know when structured data may be relevant and how to leverage the appropriate resources to get it out.
Copyright © 2023 Legal IT Professionals. All Rights Reserved.