Legal funders interested in using quantitative valuation methods or quantitative analysis for other business decisions currently face many hurdles. In the last post in this series, we talked about how funders can overcome the data collection challenges by using a variety of sources and tools to aggregate information.
After data is obtained using one of the three techniques cited in the post above, litigation funders must then take that data and construct a useful database that can actually furnish insights to drive business decisions around portfolio construction, pricing, lead sources, team performance, or just about any other aspect of their business.
3 Steps to Build Meaningful Databases for Your Analytics
The process of constructing a database involves three primary steps:
- Cleaning the gathered data
- Structuring data to elucidate meaningful variables
- Merging disparate sets of data into a unified database
Each of these three steps has separate and specific challenges, which are crucial to your database's efficacy.
When executed properly, you arm yourself and your business with information that gives you an edge over your competitors. But if any one of these steps is done incorrectly, the output from your database can be erroneous and give you a false sense of confidence in your decision making.
Essentially, the three steps above are intended to ensure the underlying data you gathered is error-free, that the variables you choose to modify and combine have strong predictive power, and that the end result is a coherent database you can draw from over and over again to answer a number of different questions for you.
Clean the Accumulated Data
I like to joke with clients that quantitative techniques are like kids: both often get dirty. The reality is that almost any gathered dataset has issues. Sometimes the problems are blatant, like outright errors or transposed values, and fixing these discrete problems is a rote but necessary task.
In other cases, your judgment is needed to decide whether some of the data is erroneous, such as when the data contains outliers. An outlier is a data point that is correct, yet highly abnormal compared to the rest of the data. For example, a personal injury case might usually settle for a sum between $50K and $500K. One time in a thousand, however, it might settle for $5M. As a legal funder, you would probably want to downplay or completely remove this aberrant value in your analysis.
Two related techniques help to identify and handle such values. A simple screen flags (or removes) any data point that is three or more standard deviations from the mean; this would catch the foregoing $5M personal injury settlement if the mean settlement amount is $100K. A gentler alternative is winsorizing, which caps extreme values rather than removing them: everything above a chosen percentile (say, the 95th) is replaced with the value at that percentile. Either way, the purpose is to keep unusual or potentially erroneous data from distorting the average settlement amount.
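As a minimal sketch of both approaches in Python with pandas (the settlement figures, the three-standard-deviation threshold, and the 5th/95th-percentile caps are all hypothetical choices for illustration):

```python
import pandas as pd

# Hypothetical settlement amounts in dollars; one extreme $5M case.
settlements = pd.Series([
    75_000, 120_000, 90_000, 250_000, 110_000, 60_000, 180_000,
    300_000, 95_000, 130_000, 220_000, 85_000, 160_000, 400_000,
    70_000, 140_000, 95_000, 210_000, 120_000, 5_000_000,
])

# Screen 1: flag values three or more standard deviations from the mean.
z_scores = (settlements - settlements.mean()) / settlements.std()
outliers = settlements[z_scores.abs() >= 3]

# Screen 2: winsorize by capping at the 5th and 95th percentiles,
# so extreme values are pulled in rather than dropped.
lower, upper = settlements.quantile(0.05), settlements.quantile(0.95)
winsorized = settlements.clip(lower=lower, upper=upper)
```

Note that the first screen only works well once the sample is reasonably large; in a tiny sample, a single extreme value inflates the standard deviation so much that it may not exceed the threshold itself.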
It's important to apply these checks to every variable in a dataset to ensure you really understand the data's technicalities and fine points. Even so, cleaning data involves much more than handling outliers.
Cleaning also takes care of typographical problems that may be plaguing your data. Microsoft actually provides a really good rundown of the typographical issues that nearly always need attention: 10 ways to clean your Excel data.
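A rough sketch of this kind of typographical cleanup in Python with pandas (the column names and raw values are invented for illustration, not taken from any real export):

```python
import pandas as pd

# Hypothetical raw export with common typographical problems:
# stray whitespace, currency symbols, inconsistent capitalization, blanks.
raw = pd.DataFrame({
    "case_type": ["  Personal Injury", "personal injury ", "CONTRACT", ""],
    "settlement": ["$120,000", " 95000 ", "250,000", "N/A"],
})

clean = raw.copy()
# Normalize text: trim whitespace, standardize case, blank -> missing.
clean["case_type"] = (clean["case_type"].str.strip()
                                        .str.title()
                                        .replace("", pd.NA))
# Normalize numbers: strip "$" and commas, coerce junk like "N/A" to NaN.
clean["settlement"] = pd.to_numeric(
    clean["settlement"].str.strip().str.replace(r"[$,]", "", regex=True),
    errors="coerce")
```

The payoff is that "Personal Injury" and "personal injury " now count as the same category, and the settlement column can actually be averaged.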
Structure Your Data for Meaning
Even with a cleaned dataset, litigation fund managers may still find it hard to make accurate predictions about case outcomes because of the form the collected data takes. Obviously, relevant data needs to be used, but in many cases raw data itself is not useful for making predictions.
One basic example is predicting bankruptcy risk from debt. A firm's probability of going bankrupt is related to its debt, but if one were to gather data on many different firms, the raw amount of debt would not be directly correlated with bankruptcy risk.
Why? The answer is that larger firms (unsurprisingly) carry more debt (and more assets) than smaller firms do. Instead, we need to use the ratio of debt to assets as a predictor of bankruptcy.
The same logic applies in litigation finance. Expenditures viewed in isolation are not all that useful for forecasting settlement propensity, but the ratio of expenditures to the number of months spent on a case is.
There are countless modifications of this sort that should be made when dealing with litigation finance data in order to improve its predictive efficacy.
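A minimal sketch of constructing ratio variables like these in Python with pandas (all firm and case figures are hypothetical):

```python
import pandas as pd

# Hypothetical firms: A carries far more raw debt, but B is more leveraged.
firms = pd.DataFrame({
    "firm": ["A", "B"],
    "debt": [900_000, 50_000],
    "assets": [10_000_000, 60_000],
})
# Ratio variable: debt relative to assets, comparable across firm sizes.
firms["debt_to_assets"] = firms["debt"] / firms["assets"]

# Hypothetical cases: raw spend alone is hard to compare across cases.
cases = pd.DataFrame({
    "case_id": [1, 2],
    "expenditures": [60_000, 90_000],
    "months_active": [12, 6],
})
# Ratio variable: spend per month on the case.
cases["burn_rate"] = cases["expenditures"] / cases["months_active"]
```

In this toy example, firm A has 18 times the debt of firm B, yet firm B's debt-to-assets ratio is roughly nine times higher, which is exactly the distinction the raw level misses.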
Merge Into a Unified Database
Finally, once data has been gathered, cleaned and structured, funders often need to combine the different sets of data. This sounds easy, but it is not.
The key to merging disparate sets of data is to find a unique identifying variable that is present in each data set. For example, a database about lawsuit outcomes could be merged with a dataset on economic conditions. In this case, we'd want to merge based on time period - month, quarter, or year most likely.
Similarly, a data set on lawsuit outcomes could be merged with a dataset on law firm characteristics based on the lawyers involved. Here we'd need to use both the law firm in question and a time variable for the purposes of merging because law firm characteristics will change over time.
Once funders identify the unique variable to merge on, they can combine different datasets into a larger whole. This can be as simple as using a VLOOKUP in Excel, or it can involve more specialized software packages like SAS or Stata.
Whatever techniques we use, we need to be careful to ensure that the database merges properly. After the merge is done, it usually makes sense to go through and spot-check the records in the dataset to ensure that the records from different sources have combined properly.
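A rough sketch of such a merge and spot-check in Python with pandas (the firm names, quarters, and figures are hypothetical); the composite key is law firm plus time period, as described above, and pandas's `indicator` option records which source each merged row came from:

```python
import pandas as pd

# Hypothetical lawsuit-outcome records, keyed by firm and quarter.
outcomes = pd.DataFrame({
    "firm": ["Smith LLP", "Jones LLC", "Smith LLP"],
    "quarter": ["2023Q1", "2023Q1", "2023Q2"],
    "settlement": [120_000, 250_000, 90_000],
})
# Hypothetical firm characteristics for the same firm-quarter pairs.
firm_data = pd.DataFrame({
    "firm": ["Smith LLP", "Jones LLC", "Smith LLP"],
    "quarter": ["2023Q1", "2023Q1", "2023Q2"],
    "headcount": [40, 15, 42],
})

# Merge on the composite key; validate= raises if the key is not unique,
# and indicator=True adds a _merge column for spot-checking the join.
merged = outcomes.merge(firm_data, on=["firm", "quarter"],
                        how="left", validate="one_to_one", indicator=True)

# Spot-check: any row not marked "both" failed to find a firm record.
unmatched = merged[merged["_merge"] != "both"]
```

If `unmatched` were non-empty, that would be the cue to go back and inspect those records by hand before trusting any analysis built on the merged database.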
Next time, we'll look at the nuts and bolts of analysis of data.
About the reviewer