Loan Pricing

Preparing for a Fair Lending Examination Statistical Analysis

At the Independent Bankers Association of Texas (IBAT) Lending Compliance Summit in April 2014 and at the Southwest Graduate School of Banking (SWGSB) Alumni program in May, there was much discussion about the regulatory focus on Fair Lending in general and the statistical analysis that is being done to identify disparate treatment. The article that follows is the first in a series of three that discuss how banks can prepare for an examination and minimize the likelihood of problems, how a bank might proceed with an in-house study to identify and fix any disparate treatment problems, and finally, some statistical examples that help answer several questions that came up at the IBAT and SWGSB gatherings. For additional reading, you may wish to look at How a Bank Can Get in Trouble with Fair Lending Statistical Analysis and Doing Your Own Fair Lending Statistical Analysis.

The discussion of preparing for a disparate treatment statistical analysis is divided into the following sections:

Fix Data Quality Problems

When I worked in IBM’s Global Business Intelligence Systems datamining group, we had a saying:

There are customers that know they have a data quality problem, and there are customers that don’t know that they have a data quality problem.

A dataset can be pristine and balance to the penny from an accounting perspective, and yet be a nightmare from the viewpoint of performing any statistical analysis. If a regulatory statistical analyst receives a poorly prepared dataset, the analyst will spend so much time cleaning up data that little time will be available to distinguish between unusual datapoints that can be discarded as mistakes and others that contain important information and must be included.

The FDIC Compliance Manual -- January 2014 describes risk factors for discrimination to be used in planning an examination on page IV-1.6:

C2. Prohibited basis monitoring information required by applicable laws and regulations is nonexistent or incomplete.
C3. Data and/or recordkeeping problems compromised reliability of previous examination reviews.

Don’t send a poorly prepared dataset for statistical analysis. As a banker, you are much better off if the analyst has more time and spends more time looking for data elements to explain racial/ethnic/gender patterns in your dataset. If the analyst spends hours cleaning up a poorly prepared dataset, expect to have examination problems.

All of these data quality analysis steps can be performed in Excel, though the corrections should be done on the source system so that you don’t have to repeat the clean-up process every year. Most IT personnel would probably choose to use a programming or scripting language that allows regular expressions and other features that make data manipulation easier.

Catch up on Returned Mail Address Clean-up

All returned mail identifies an address problem--either an old address, an incorrect one, or one that is entered so badly that even the U.S. Post Office can’t figure out what it is--and I am amazed at what the Post Office can deliver correctly. Before you do a data pull for any type of statistical analysis, make absolutely sure that you are caught up on fixing returned mail.

Standardize Addresses

The statement mailing firm that you use probably does address standardization as part of the service that they provide, but the standardized addresses probably don't make it back to your core system. Investigate ways to get the standardized addresses into your core system.

If you don’t use address standardization software to identify and correct spelling, format and abbreviation problems in addresses, at least do a pull and get a count of addresses by city and state. Sort the list by the cities with only one account--these are probably misspellings. If you don’t have address standardization software, you will be amazed at how many ways people can spell "Dallas" and "Houston." The Post Office correctly delivers a lot of mail that is badly misspelled. Make sure that all of the state abbreviations are valid.
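As a rough sketch of the city count described above, for IT staff who prefer a script to Excel: the following Python counts accounts by city and state, lists the combinations that appear only once, and checks state codes against a list. The sample rows and the shortened `VALID_STATES` set are made up for illustration; a real check would load your own extract and the full list of USPS state abbreviations.

```python
from collections import Counter

# Hypothetical extract: (city, state) pairs pulled from the core system.
addresses = [
    ("Dallas", "TX"), ("Dallas", "TX"), ("Dalas", "TX"),
    ("Houston", "TX"), ("Houston", "TX"), ("Huoston", "TX"),
    ("Austin", "TX"), ("Austin", "TX"), ("Tulsa", "OX"),
]

# Shortened list for illustration only; use the full USPS list in practice.
VALID_STATES = {"TX", "OK", "LA", "NM", "AR"}


def singleton_cities(rows):
    """Return (city, state) pairs that occur exactly once, sorted by name."""
    counts = Counter(
        (city.strip().title(), state.strip().upper()) for city, state in rows
    )
    return sorted(key for key, n in counts.items() if n == 1)


def invalid_states(rows, valid=VALID_STATES):
    """Return the set of state codes that are not in the valid list."""
    return {state.strip().upper() for _, state in rows} - set(valid)


suspect_cities = singleton_cities(addresses)  # likely misspellings to review
bad_states = invalid_states(addresses)        # codes that are not real states
```

A city with a single account is only a candidate for review, not proof of a misspelling; a small town with one customer will show up on the same list.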

If you don't have standardization software, you can use a geocoder to attempt to find the latitude and longitude of the address; if the geocoder can't figure out the latitude and longitude, it is either a Post Office Box, a Military address, or an invalid address. The next article in this series, Doing Your Own Fair Lending Statistical Analysis, has a significant discussion about geocoders and geocoding.

Verify Date Formats and Content

Most core systems do a very good job of preventing bogus dates from being entered, but you should check to make sure--especially for ancillary systems and datasets provided from third party vendors. At a minimum, check the following:

  • Verify that all dates are valid dates. For example, 2/30/2014 is clearly an invalid date, but could get into a poorly designed software system, or be part of an incorrectly generated data extract from a third party system.
  • Verify that all dates are in the right order. For example, the loan payoff date should always be after the loan opening date. There are a variety of other date relationships that should be maintained, but which sometimes aren’t.
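Both checks are easy to script. The sketch below assumes dates arrive as month/day/year text and that the extract has fields named `open_date` and `payoff_date`; those names and the sample loans are hypothetical, so adjust them to your own layout.

```python
from datetime import datetime


def parse_date(text, fmt="%m/%d/%Y"):
    """Return a date if text is a real calendar date in fmt, otherwise None."""
    try:
        return datetime.strptime(text, fmt).date()
    except (TypeError, ValueError):
        return None


def date_problems(loans):
    """Return (loan_id, problem) pairs for bad dates and out-of-order dates."""
    problems = []
    for loan in loans:
        opened = parse_date(loan["open_date"])
        if opened is None:
            problems.append((loan["loan_id"], "invalid open_date"))
        payoff_text = loan.get("payoff_date")
        if not payoff_text:  # loan still open, nothing more to check
            continue
        paid = parse_date(payoff_text)
        if paid is None:
            problems.append((loan["loan_id"], "invalid payoff_date"))
        elif opened is not None and paid < opened:
            problems.append((loan["loan_id"], "payoff before open"))
    return problems


# Hypothetical sample rows.
loans = [
    {"loan_id": "1001", "open_date": "1/15/2013", "payoff_date": "6/30/2014"},
    {"loan_id": "1002", "open_date": "2/30/2014", "payoff_date": ""},
    {"loan_id": "1003", "open_date": "5/1/2014", "payoff_date": "4/1/2014"},
]
problems = date_problems(loans)
```

This version flags only a payoff date that falls before the opening date. Whether a same-day payoff should also be flagged is a judgment call for your own data.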

Include and Standardize Indirect Dealer/Originator Names

If you do indirect lending, make sure to include the name of the dealer or originator of the indirect loans, and that the loan type and originator are coded correctly and consistently.

Verify Interest Rates Against Rate Sheet

Take an extract of your historical rate sheets, merge the rate sheets with your loan data by time of loan origination, calculate the difference between each loan's rate and the rate sheet rate in effect when the loan was originated, and then rank the loans by the absolute value of the difference. Look at the extreme values--these are probably mistakes. Investigate the reason for the largest differences and add a code or comment to explain why these particular loans have unusual deviations from the rate sheet. If they are mistakes, work with the borrower to correct the loan.
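One way to do the merge and ranking outside of Excel is sketched below. It assumes each rate sheet row carries an effective date and stays in force until the next sheet for the same product; the field names, products, and rates are invented for the example.

```python
from bisect import bisect_right
from datetime import date
from decimal import Decimal

# Hypothetical rate sheet history: (effective_date, product, rate).
rate_sheets = [
    (date(2014, 1, 1), "auto", Decimal("4.25")),
    (date(2014, 3, 1), "auto", Decimal("4.50")),
]

# Hypothetical loan extract.
loans = [
    {"loan_id": "A1", "product": "auto", "orig_date": date(2014, 1, 15), "rate": Decimal("4.25")},
    {"loan_id": "A2", "product": "auto", "orig_date": date(2014, 3, 10), "rate": Decimal("7.50")},
    {"loan_id": "A3", "product": "auto", "orig_date": date(2014, 2, 20), "rate": Decimal("4.00")},
]


def sheet_rate(product, orig_date, sheets):
    """Return the sheet rate in effect on orig_date, or None if none applies."""
    rows = sorted((eff, rate) for eff, prod, rate in sheets if prod == product)
    idx = bisect_right([eff for eff, _ in rows], orig_date) - 1
    return rows[idx][1] if idx >= 0 else None


def rank_deviations(loans, sheets):
    """Return (loan_id, loan rate minus sheet rate), largest absolute gap first."""
    gaps = []
    for loan in loans:
        expected = sheet_rate(loan["product"], loan["orig_date"], sheets)
        if expected is None:  # no sheet on file for that date; review separately
            continue
        gaps.append((loan["loan_id"], loan["rate"] - expected))
    return sorted(gaps, key=lambda row: abs(row[1]), reverse=True)


ranked = rank_deviations(loans, rate_sheets)
```

The loans at the top of `ranked` are the ones to investigate and annotate first.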

Code Collection and Other Loan Modifications Correctly

Make sure that all loan modifications and rework of loans that were messed up somewhere along the line are coded in a way that they can be easily identified and understood. It should be easy for an analyst to figure out that a goofed up loan entry that was corrected and re-issued under another number can be legitimately excluded as an outlier.

Handle Significant Digits Properly When Exporting--Don’t Truncate or Round

In the core systems, numbers can be stored in a variety of ways--some quantities are stored as floating point, some as decimal, some as integers, and occasionally as characters. Each of these data types works differently for rounding and in some cases may just truncate everything to the right of the decimal point. If you extract using a data type that truncates or take a number with 5 decimal places and round it to 2 decimal places, you can introduce some unusual patterns in your dataset.

Always export in the data type that is used to store an element, and always export the number of digits that are stored without rounding wherever possible.
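A small illustration of the problem, using invented rates stored with five decimal places: truncating or rounding on export collapses values that were distinct in the source system, which is exactly the kind of artificial clustering an analyst will then have to explain.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP

# Hypothetical rates as stored in the core system (5 decimal places).
stored = [
    Decimal("4.37500"),
    Decimal("4.37625"),
    Decimal("4.38125"),
    Decimal("4.99900"),
]

# Exporting through an integer type drops everything after the decimal point.
truncated = [r.quantize(Decimal("1"), rounding=ROUND_DOWN) for r in stored]

# Rounding to 2 places merges rates that were different in the source.
rounded = [r.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP) for r in stored]

# Exporting the stored value as text keeps every digit.
exported = [str(r) for r in stored]

distinct_counts = {
    "stored": len(set(stored)),
    "rounded": len(set(rounded)),
    "truncated": len(set(truncated)),
}
```

Here four different stored rates become two after rounding and a single value after truncation.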

Include Calculated Items from Credit Report

Perhaps the biggest problem that you may encounter in a Fair Lending statistical analysis will be loan decisions that are based upon information that is present on a text-based credit report. If you calculate loan to value, debt to equity, or medical bill charge-offs to total charge-offs from a credit report, but don’t include that in the extract, you will almost certainly have problems during an examination. If these ratios have a strong statistical relationship with race/ethnicity/gender (likely, since income has a strong relationship), race/ethnicity/gender will show as statistically significant, and you will have to spend a lot of time and money providing a corrected extract, plus the aggravation of dealing with examiners over Fair Lending disparate treatment issues.

If you include the additional credit worthiness-related variables that you used in the underwriting process, race/ethnicity/gender will probably not show up as statistically significant, and your Fair Lending examination will probably go as smoothly as Fair Lending examinations can go.

If your origination system does not calculate all of the ratios that you use, pressure the vendor to add the additional ratios so that it is easy to extract them. This isn’t so much to make Fair Lending examinations easier as it is to make fraud and abuse analysis easier for you to do. You should use the Fair Lending dataset for a fraud and abuse analysis; you will probably quickly recover the cost of preparing the dataset and will start using your fraud and abuse dataset as the one you submit for Fair Lending analysis.

Perform Analysis of Indirect Loans by Dealer/Originator

If you have an indirect auto loan program, this is an area where race/ethnic/gender discrimination may be occurring without your knowledge or control. It is also an area where there is significant opportunity for fraud and abuse by an auto dealer, or specific employees at an auto dealer. The analysis that you do for indirect lending should be at least quarterly, as salespeople move from one dealership to another frequently--a dealer that has demonstrated exemplary performance for years can go south quickly when a new salesperson comes onto the floor.

The discussion that follows is really oriented toward dealer-level fraud and abuse problems rather than Fair Lending, but if a dealer or an employee at a dealer is willing to commit fraud or abuse, discrimination based upon race/ethnicity/gender would not be a far stretch and vice versa. To get to this point, you will have put in a fair amount of work; you should reap the benefit of that labor, and a simple fraud and abuse analysis is the way to do it. For regulatory purposes, this analysis may or may not constitute a review of Fair Lending practices that would require you to correct any problems found; that is a question for your attorney.

Look at Fraud and Abuse Metrics

For a simple fraud and abuse analysis that can be done in Excel, calculate and rank dealers by the following quantities:

  • First payment defaults
  • Defaults immediately after end of recourse period
  • Defaults and delinquency by age

For a dealer that ranks at the top of each list, investigate individual loans that have defaulted or are delinquent. It is likely that this work will be financially rewarding to the bank.
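For those who would rather script the ranking than build it in Excel, here is a minimal sketch. It ranks dealers by the share of their loans that hit each flag (using rates rather than raw counts is my assumption; counts work just as well) and then picks out dealers that land near the top of every list. The dealer names, flags, and loans are invented.

```python
from collections import defaultdict

# Hypothetical extract: (dealer, first_payment_default, default_after_recourse).
loans = [
    ("Dealer A", True, False), ("Dealer A", True, True),
    ("Dealer A", False, False), ("Dealer A", False, False),
    ("Dealer B", False, False), ("Dealer B", False, False),
    ("Dealer B", False, False), ("Dealer B", False, False),
    ("Dealer C", False, True), ("Dealer C", False, False),
]


def rank_dealers(rows, column):
    """Return (dealer, rate) pairs for one flag column, highest rate first."""
    totals, hits = defaultdict(int), defaultdict(int)
    for row in rows:
        totals[row[0]] += 1
        hits[row[0]] += int(row[column])
    rates = [(dealer, hits[dealer] / totals[dealer]) for dealer in totals]
    # Ties are broken alphabetically so the ordering is repeatable.
    return sorted(rates, key=lambda r: (-r[1], r[0]))


def on_every_list(rankings, n):
    """Return dealers that appear in the top n of every ranking."""
    tops = [{dealer for dealer, _ in ranking[:n]} for ranking in rankings]
    return set.intersection(*tops)


fpd_rank = rank_dealers(loans, 1)        # first payment defaults
recourse_rank = rank_dealers(loans, 2)   # defaults just after recourse ends
to_investigate = on_every_list([fpd_rank, recourse_rank], n=2)
```

Rates for dealers with only a handful of loans jump around a lot, so look at the loan count alongside the rate before drawing conclusions.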

Rank by Dealer Participation Fee

Rank the loans by dealer participation for each dealer, and for all dealers. For the highest participations, are there any patterns? A high dealer participation could be an indicator for negative equity rolled into a deal for benign reasons, it could be negative equity rolled into a deal in anticipation of bankruptcy, it could be good negotiating on the part of the dealer, or it could be the result of discrimination based upon race/ethnicity/gender.
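The two rankings can be produced with a few lines of Python. In this sketch the participation amounts are hypothetical dollar figures, and the loan IDs and dealer names are made up.

```python
from collections import defaultdict

# Hypothetical extract: (loan_id, dealer, dealer participation in dollars).
loans = [
    ("L1", "Dealer A", 1000),
    ("L2", "Dealer A", 2500),
    ("L3", "Dealer A", 500),
    ("L4", "Dealer B", 2000),
    ("L5", "Dealer B", 750),
]


def top_by_dealer(rows, n=2):
    """Return each dealer's n loans with the highest participation."""
    by_dealer = defaultdict(list)
    for loan_id, dealer, participation in rows:
        by_dealer[dealer].append((participation, loan_id))
    return {
        dealer: [loan_id for _, loan_id in sorted(items, reverse=True)[:n]]
        for dealer, items in by_dealer.items()
    }


def top_overall(rows, n=3):
    """Return the n loans with the highest participation across all dealers."""
    ordered = sorted(rows, key=lambda r: r[2], reverse=True)
    return [loan_id for loan_id, _, _ in ordered[:n]]


per_dealer = top_by_dealer(loans)
overall = top_overall(loans)
```

The script only produces the lists; looking for patterns among the loans at the top is still a manual review.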

Estimate Negative Equity for Indirect Loans

If you have an indirect lending program, negative equity rolled into a deal is a strong predictor of a lot of interesting behavior. Estimating negative equity is painful if not impossible, as vehicles rarely sell for the Manufacturer’s Suggested Retail Price (MSRP) and there really isn’t a good way to capture the "value" of the vehicle. If you do capture MSRP and Kelley Blue Book (KBB) or a similar metric, it is worth calculating the difference between the purchase price and the MSRP/KBB as a proxy for negative equity.

Try to figure out a way to estimate the negative equity rolled into a loan. The dealer knows this exactly, but most lending systems don’t really have a way to record it. If high dealer participations are due to negative equity, you have a credit risk problem to monitor; if high dealer participations are not due to negative equity rolled into a deal, you absolutely have a customer satisfaction problem (the painfully high loan rate that gives the dealer the room to roll in negative equity or overcharge has your name on it each month--not the auto dealer’s name) and you may have a Fair Lending problem.
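If you do capture a book value, the proxy and the split described above can be sketched as follows. The field names, the sample loans, and both cutoffs are hypothetical; you would need to pick thresholds that make sense for your own portfolio.

```python
def negative_equity_proxy(purchase_price, reference_value):
    """Purchase price minus MSRP/KBB; a large positive gap hints at negative equity."""
    return purchase_price - reference_value


def triage(loans, participation_cutoff, equity_cutoff):
    """Split high-participation loans by whether the proxy explains them."""
    result = {"credit_risk": [], "review": []}
    for loan in loans:
        if loan["participation"] < participation_cutoff:
            continue  # only high participations are of interest here
        proxy = negative_equity_proxy(loan["purchase_price"], loan["book_value"])
        bucket = "credit_risk" if proxy >= equity_cutoff else "review"
        result[bucket].append(loan["loan_id"])
    return result


# Hypothetical sample loans.
loans = [
    {"loan_id": "L1", "participation": 2500, "purchase_price": 30000, "book_value": 25000},
    {"loan_id": "L2", "participation": 2400, "purchase_price": 22000, "book_value": 22500},
    {"loan_id": "L3", "participation": 300, "purchase_price": 18000, "book_value": 18000},
]
buckets = triage(loans, participation_cutoff=2000, equity_cutoff=2000)
```

Loans in the "credit_risk" bucket are candidates for credit monitoring; loans in the "review" bucket have a high participation that the proxy does not explain and deserve a closer look.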



Although this article is about preparing for a Fair Lending examination statistical analysis, there is little in the steps to this point that is directly related to Fair Lending--most of this preparation is related to general data quality and to simple fraud and abuse analysis. Everything in this article can be done using Excel, though there are other tools that your IS staff may have that are better suited to the task.

