Doing Your Own Fair Lending Statistical Analysis
At the IBAT Lending Compliance Summit in April 2014 and at the SWGSB Alumni program in May, there was much discussion about the regulatory focus on Fair Lending in general and the statistical analysis that is being done to identify disparate treatment. This is the second in a series of articles that discuss statistical analysis as it can be used for Fair Lending analysis. The first article in the series, Preparing for a Fair Lending Examination Statistical Analysis, discusses how to collect and prepare a dataset that a regulatory agency will use for Fair Lending analysis. The article that follows describes the steps involved in doing your own Fair Lending compliance analysis so that you can anticipate problems that might come up during an examination. Fortunately, there are now open source statistical tools to do very sophisticated analysis, though these tools may require skills that the bank may not have in-house.
The article assumes that you have already cleaned up and prepared your dataset as described in Preparing for a Fair Lending Examination Statistical Analysis. The article is divided into the following sections:
- Geocode Addresses
- Estimate the Race, Ethnicity and Gender of Loan Applicants
- Calculate Manhattan Distance or Drive Time from Borrower to Nearest Branch
- Join Loan Data with Rate Sheet Historical Data
- Create Variables for Analysis and Create Training and Test Data Sets
- Testing for Disparate Treatment
The first step in almost any customer-oriented analysis is to geocode the customer's address. Geocoding is the process of converting a street address into latitude and longitude coordinates that can be plotted on a map, used to merge address data with census data, used to calculate a drive time between two locations or used in calculating the Manhattan or the straight-line distances between two points. It is very useful in a variety of banking analysis problems, not the least of which is address clean-up; if an address won’t geocode and isn’t a P.O. Box, it probably has some problems that need to be fixed. Ten years ago, geocoding was difficult and expensive. Today, there are a variety of applications to do this in volumes that are reasonable for small banks:
- Most Master Customer Information File (MCIF) marketing system vendors provide services to add demographic data and frequently geocode addresses as part of this service. This is probably the easiest way to geocode a set of loans.
- If you have a commercial address standardization package or your statement mailing vendor does address standardization, it may have geocoding available by default or as an added feature; it is worth investigating.
- For in-house geocoding without a commercial package, the most convenient geocoder is probably Google Maps. Make sure to review the Google Maps API Terms of Service and potential privacy issues with your bank's attorney before choosing this option. Most institutions would want to get a Google Maps API key to use in their geocoding application and set up payment; otherwise, the geocoding application would need to be throttled in order to meet Google's Terms of Service. The API is accessible from a variety of programming languages including Perl, Python (one or both may be known by IT Systems Administrators) and R (a statistical language). There are other open source packages available for geocoding.
- The Perl programming language has open source geocoding packages available for download and installation. See the Comprehensive Perl Archive Network (CPAN) and search on “geocode”.
- The Python package geopy offers several open source geocoders using several different cloud APIs.
- The R ggmap package offers geocoding via the Google Maps API.
- FFIEC offers a geocoding service, but it would require screen scraping and isn't really suited to doing a large volume.
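The throttling requirement mentioned above can be sketched in a few lines. This is an illustrative sketch, not a production geocoder: the `lookup` callable is a stand-in for whichever provider you choose (a geopy geocoder, a Google Maps API client, etc.), and the one-second delay is an assumed rate limit, not any provider's actual limit.

```python
import time

def geocode_batch(addresses, lookup, delay=1.0):
    """Geocode a list of addresses, pausing between calls to respect
    the provider's rate limits.  `lookup` is any callable that takes an
    address string and returns (lat, lon), or None on failure."""
    results = {}
    for addr in addresses:
        results[addr] = lookup(addr)   # one provider call per address
        time.sleep(delay)              # throttle to stay within the terms of service
    return results

def failed(results):
    """Addresses that would not geocode; flag these for address clean-up."""
    return [addr for addr, coords in results.items() if coords is None]
```

As the article notes, the failures are themselves useful: an address that won't geocode and isn't a P.O. Box probably needs to be fixed.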
Estimate the Race, Ethnicity and Gender of Loan Applicants
Since race and ethnicity are generally not collected on loan applications (HMDA-reportable mortgage loans being the notable exception), the regulatory agencies must come up with some way to estimate the race, ethnicity and gender of a borrower when doing their Fair Lending analysis. All of the ways to do this are error-prone to one degree or another, but that discussion is beyond the scope of this article. Use one of the alternatives below to come up with an estimate for the race, gender and ethnicity of each borrower, and then create an array of variables to use in the analysis.
The Hard Way--Do It Yourself
- Merge Loan Data with Census Data
Once you have geocoded all of your loans, join the loan data with Census data, adding census variables for the race and ethnicity of the surrounding block group to your dataset. You'll end up with a number of new columns describing the racial and ethnic composition of each borrower's neighborhood.
- Join Loan Data Set with Census Ethnic Surname Database
An on-line research publication, Using the Census Bureau’s surname list to improve estimates of race/ethnicity and associated disparities provides a description of a methodology for estimating the race/ethnicity for a person using Census surname data and the geocoded block group; the method described has a correlation of 0.76 when compared to self-reported race/ethnicity at a health insurance provider. The surname frequency and race/ethnicity probability can be downloaded from http://www.census.gov/genealogy/www/data/2000surnames/index.html.
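The combination of surname probabilities and block-group composition can be sketched as a simple Bayesian update. This is a simplified illustration of the kind of calculation the paper describes, not the exact published algorithm, and the input probabilities below are invented for the example.

```python
def combine_surname_geography(surname_probs, blockgroup_props, national_props):
    """Combine P(race | surname) with the racial composition of the
    borrower's census block group.  The posterior is proportional to
    P(race | surname) * P(race | block group) / P(race), a simplified
    form of the surname-plus-geography (BISG-style) update."""
    posterior = {}
    for race, p_surname in surname_probs.items():
        posterior[race] = p_surname * blockgroup_props[race] / national_props[race]
    total = sum(posterior.values())
    return {race: p / total for race, p in posterior.items()}

# Invented illustrative inputs: a surname that is 90% Hispanic nationally,
# in a block group that is 80% Hispanic.
estimate = combine_surname_geography(
    {"hispanic": 0.9, "non_hispanic_white": 0.1},
    {"hispanic": 0.8, "non_hispanic_white": 0.2},
    {"hispanic": 0.16, "non_hispanic_white": 0.62},
)
```

When the surname and the neighborhood point the same way, the posterior probability is sharper than either source alone, which is the point of the method.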
The Easy Way--Merge Loan Data with MCIF Vendor Race and Ethnicity Data
Acxiom and a number of data brokers routinely provide estimated race and ethnicity data in demographic data sets. If this is available to you, merge this data with your loan data. Although you may not do business directly with Acxiom or the other major data brokers, many MCIF vendors offer demographic data enhancement services that resell Acxiom’s services. You can probably get this service through your MCIF vendor.
Generate Array of Variables for Estimated Race, Ethnicity and Gender
For each race, ethnicity and gender, create a variable with a probability that the person fits in to each category. You should develop two sets; one with all of the census detail, and a second where all of the unprotected groups (talk to your Compliance Officer on this) are merged. For each set create a variable that is the best estimate of race and ethnicity. You will use the second set of variables to determine whether or not you have a Fair Lending compliance problem, and the first set to diagnose and refine your understanding should you identify a Fair Lending compliance problem.
Calculate Manhattan Distance or Drive Time from Borrower to Nearest Branch
Since you have the latitude and longitude for each borrower, go ahead and calculate the Manhattan distance (distance North/South + distance East/West) between the borrower and your nearest branch and between the borrower and the nearest competitor’s branch. Distance to a branch is a strong predictor for a variety of consumer financial behaviors, so it is worth having it available for analysis. The Manhattan distance is easy to compute and doesn't require expensive software.
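The calculation above can be sketched directly from the latitude/longitude pairs. The 69-miles-per-degree constant is an approximation, the cosine scaling accounts for east/west degrees shrinking with latitude, and the coordinates in the test are invented.

```python
import math

MILES_PER_DEG_LAT = 69.0  # approximate; varies slightly with latitude

def manhattan_miles(lat1, lon1, lat2, lon2):
    """Manhattan distance (north/south distance + east/west distance)
    between two points, in miles."""
    ns = abs(lat1 - lat2) * MILES_PER_DEG_LAT
    ew = abs(lon1 - lon2) * MILES_PER_DEG_LAT * math.cos(
        math.radians((lat1 + lat2) / 2.0))
    return ns + ew

def nearest_branch(borrower, branches):
    """Return (distance, branch) for the branch closest to the borrower;
    run once against your own branches and once against competitors'."""
    return min((manhattan_miles(*borrower, *b), b) for b in branches)
```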
The time it takes to drive from the customer's address to the branch is a much better predictor than distance, but it is more difficult to calculate. The easiest route is a commercial package called ArcView Business Analyst and the related ArcView Network Analyst products from ESRI. They aren't cheap, so you might want to contract with ESRI to do this part of the work. There may be other alternatives within the Google Maps API or other navigation web services.
Join Loan Data with Rate Sheet Historical Data
One of the most important tests is to look at deviations from published rate sheets (See FDIC Compliance Manual--January 2014 page IV-1.8 section P2). To do this you will need to join your historical rate sheets with the loan data for corresponding dates and then calculate the deviation from the rate sheet.
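One way to sketch this join is with a sorted list of rate sheet effective dates and a binary search: each loan picks up the latest rate sheet on or before its origination date. This sketch assumes a single published rate per sheet for simplicity; a real rate sheet would be keyed by product, term and credit tier as well as date.

```python
import bisect
from datetime import date

def join_rate_sheets(loans, rate_sheets):
    """Attach the rate sheet in effect on each loan's origination date
    and compute the deviation from the published rate.

    `rate_sheets`: list of (effective_date, published_rate), sorted by date.
    `loans`: list of (origination_date, actual_rate).
    Returns (orig_date, actual, published, deviation) per loan."""
    dates = [d for d, _ in rate_sheets]
    joined = []
    for orig_date, actual in loans:
        i = bisect.bisect_right(dates, orig_date) - 1  # latest sheet <= orig_date
        if i < 0:
            joined.append((orig_date, actual, None, None))  # no sheet in effect yet
            continue
        published = rate_sheets[i][1]
        joined.append((orig_date, actual, published, actual - published))
    return joined
```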
Create Variables for Analysis and Create Training and Test Data Sets
Now that you have cleaned up all of your data as described in Preparing for a Fair Lending Examination Statistical Analysis, and done all of the estimates for the borrower’s race/ethnicity/gender, combine everything into a single dataset to be used for analysis. For codes that are numbers, make sure to identify them as factor or ordered factor data types rather than real valued numbers. Now is the time to start duplicating items with common data transformations to normalize values on a 0 to 1 scale, or take the log of a variable where the values are orders of magnitude different. This step is the first that must be done in a statistical tool, as databases and flat files don't support the concepts of "factor" and "ordered factor."
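In Python's pandas (an alternative to R for this step), the same ideas look like the sketch below. The column names are invented for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "branch_code": [101, 102, 101, 103],      # a numeric code, not a quantity
    "credit_score": [620, 700, 660, 740],
    "loan_amount": [1000.0, 25000.0, 3500.0, 150000.0],
})

# Numeric codes are categories ("factors"), not numbers: say so explicitly.
df["branch_code"] = df["branch_code"].astype("category")

# Normalize a variable onto a 0-to-1 scale.
score = df["credit_score"]
df["credit_score_01"] = (score - score.min()) / (score.max() - score.min())

# Log-transform a variable whose values span orders of magnitude.
df["log_loan_amount"] = np.log10(df["loan_amount"])
```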
You should also create training and test datasets to determine whether or not the models that you generate are over-fitted. If protected groups (or unprotected groups) are very infrequent in your dataset, you should consider repeating some of these low-frequency observations in the training data set. If they are very infrequent, the model fitting may simply ignore them; a pattern could be present but go unrecognized, in which case you could get a rude awakening during the examination.
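The split-and-oversample step might be sketched as below. The 30-observation floor is an arbitrary assumption for illustration, not a regulatory number, and note that only the training set is oversampled; the test set must stay untouched for honest validation.

```python
import random

def split_with_oversampling(rows, group_key, test_frac=0.3, min_count=30, seed=42):
    """Random train/test split, then duplicate training rows from
    low-frequency groups so rare protected classes are not ignored
    during model fitting."""
    rng = random.Random(seed)
    rows = rows[:]
    rng.shuffle(rows)
    n_test = int(len(rows) * test_frac)
    test, train = rows[:n_test], rows[n_test:]

    counts = {}
    for row in train:
        counts[group_key(row)] = counts.get(group_key(row), 0) + 1
    for group, n in counts.items():
        if n < min_count:  # oversample rare groups by duplicating members
            members = [r for r in train if group_key(r) == group]
            train.extend(rng.choices(members, k=min_count - n))
    return train, test
```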
Testing for Disparate Treatment
There are several approaches to looking for disparate treatment under the Fair Lending regulations. Since we are interested in screening for problems rather than proving a problem, it is appropriate to use an approach that casts a wide net and identifies issues that might not rise to the levels of statistical significance, financial materiality, frequency, or causation that cause problems during an examination. These terms are mine and don't appear in any regulation or compliance manual that I've seen. I use them because most of the discussions that I've heard combine all of these concepts into the term “significant” and aren't all that precise.
The statistical approach that follows is hopefully not the procedure used by regulators. The predictive modeling approach that I discuss below will probably indicate patterns where race/ethnicity/gender are useful in predicting price where hypothesis testing approaches might not identify race/ethnicity/gender as statistically significant. Remember that in this analysis, we want to cast a broad net to find anything that might be remotely problematic.
Once all of the data preparation is done, you can begin to look at the data and identify any race/ethnicity/gender patterns that exist. Broadly speaking, you will need to look at interest rates on loans that were approved, including both loans that closed and loans that did not close. You will also need to look at loan approvals.
Because we are screening for problems, we don't want to spend a lot of time if we can help it. The approaches below are by no means exhaustive, but instead are intended to be a labor-efficient approach to screening for problems. The section is divided into the following steps:
- Disparate Treatment in Pricing--Test Approved Loans Including Loans that Did Not Close
- Disparate Treatment in Underwriting--Test Approved vs. Denied Loan Applications
- Disparate Treatment in Product Steering--Test Qualified vs. Sold
- Test Models for Over-Fitting
Before doing any statistical tests on the dataset, it is usually helpful to look at some simple visualizations. A few possibilities are listed below:
- Generate visualizations for all of the variables that you have in the dataset. The best way to start out is to plot the deviation from the rate sheet vs. each variable; in the R statistical program, you can do this easily with the basic plotting functions.
- Plot the geocodes for all of the loans on a map, along with branch locations. This won't shed any light on the disparate treatment directly, but it is an easy plot to do and may help you to understand sales patterns better.
- For each racial/ethnic group, plot the deviation from the rate sheet as a time series to see if there are any seasonal patterns; you may find significant deviations immediately before and after rate sheet changes. If this is the case, you should look at these by race and ethnicity to see if there are patterns in who got the old rate after the rate sheet change in a rising environment and who got the new rate early in a falling environment.
Disparate Treatment in Pricing--Test Approved Loans Including Loans that Did Not Close
There are many ways that you can look for disparate treatment in pricing in approved loans, but the fastest way to get an understanding of the data would be to do a stepwise regression to predict the interest rate on the loan using all of the credit-worthiness metrics available plus race/ethnicity/gender. If the race/ethnicity/gender variable shows up as significant in the stepwise regression model, you either have disparate treatment in pricing, you have been making credit decisions on creditworthiness variables that are not included in your dataset, or you need to do further analysis to find a model that better explains the patterns present in the data. Stepwise regression gives good models quickly, but there may well be a model that better explains the patterns present in the data that the stepwise automation didn’t find.
At a minimum, you should do the following:
- Perform a step-wise linear regression to predict interest rate
- Perform a step-wise linear regression to predict interest rate deviation from rate sheet
- Repeat analysis for each indirect dealer/originator
In all cases, make sure to check the residual plots for the various regression models.
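Stepwise selection is built into R (`step`) and most statistics packages; purely as an illustration of what that automation does, here is a minimal forward-selection sketch using AIC as the criterion. The synthetic variable names are invented, and a real analysis would use the full set of creditworthiness and race/ethnicity/gender variables.

```python
import numpy as np

def aic(y, X):
    """AIC of an OLS fit of y on X (X includes the intercept column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n, k = X.shape
    rss = float(resid @ resid)
    return n * np.log(rss / n) + 2 * (k + 1)

def forward_stepwise(y, candidates):
    """Greedy forward selection: repeatedly add the candidate variable
    that lowers AIC most; stop when no addition helps.
    `candidates` maps variable name -> data column."""
    n = len(y)
    chosen, X = [], np.ones((n, 1))  # start with intercept only
    best = aic(y, X)
    improved = True
    while improved:
        improved = False
        for name, col in candidates.items():
            if name in chosen:
                continue
            trial = np.column_stack([X, col])
            score = aic(y, trial)
            if score < best - 1e-9:
                best, best_name, best_X = score, name, trial
                improved = True
        if improved:
            chosen.append(best_name)
            X = best_X
    return chosen
```

If a race/ethnicity/gender variable survives this kind of selection alongside all of your creditworthiness variables, that is the screening signal the text describes.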
Although it won't tell you anything directly, looking at the closing rates for each racial and ethnic group as shown in Table 1 can point you to further investigation; if there are statistically significant differences, you will probably want to expend more effort in the later steps.
| Ethnic Group | Approved | Closed | Close Rate | P-value That Group Has Same Average as Non-Hispanic White |
|---|---|---|---|---|
| Ethnic Group 1 | 100 | 40 | 0.40 | |
| Ethnic Group 2 | 100 | 60 | 0.60 | |
| Ethnic Group 3 | 100 | 45 | 0.45 | |
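The p-value column can be filled in with a two-proportion z-test. This is a normal-approximation sketch (for small counts an exact test such as Fisher's would be preferable), and the counts in the test are generic, not tied to any particular table row.

```python
import math

def close_rate_pvalue(approved_a, closed_a, approved_b, closed_b):
    """Two-sided two-proportion z-test (normal approximation) for
    whether two groups have the same close rate."""
    p_a, p_b = closed_a / approved_a, closed_b / approved_b
    pooled = (closed_a + closed_b) / (approved_a + approved_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / approved_a + 1 / approved_b))
    z = (p_a - p_b) / se
    return math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
```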
Disparate Treatment in Underwriting--Test Approved vs. Denied Loan Applications
To look for disparate treatment in underwriting, you will need to look at both approvals and denials. To get a quick understanding, do a stepwise logistic regression to predict loan approval.
- Perform stepwise logistic regression to predict loan approval
- Repeat analysis for each indirect dealer/originator
In all cases, make sure to check the residual plots for the various regression models.
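Logistic regression is available in any statistics package (`glm` in R, for example); purely as an illustration of the underlying fit, here is a minimal Newton-Raphson sketch. The synthetic approval data in the test is invented.

```python
import numpy as np

def logistic_fit(X, y, iters=25):
    """Logistic regression fit by Newton-Raphson.  X should already
    include an intercept column; y is 0/1 (denied/approved)."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))   # predicted approval probability
        W = p * (1 - p)                        # IRLS weights
        grad = X.T @ (y - p)
        hess = X.T @ (X * W[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    return beta
```

In the screening analysis, the question is whether coefficients on race/ethnicity/gender variables remain meaningfully non-zero once all creditworthiness variables are in the model.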
Disparate Treatment in Product Steering--Test Qualified vs. Sold
Finally, we need to look at product selection to make sure that qualified borrowers aren't steered into more expensive sub-prime products when they qualify for a prime product. For this analysis, we will generate a matrix like the one shown in Table 2 below for each racial/ethnic group:
In this table, everyone should be on the northwest to southeast diagonal. If you have non-zero entries on the southwest to northeast diagonal for any of the racial or ethnic groups, you will need to perform a chi-squared (χ²) test to determine if the groups are treated differently from a steering perspective.
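For a 2x2 comparison (say, prime vs. sub-prime product sold, for prime-qualified borrowers in two groups), the chi-squared test with one degree of freedom has a closed-form p-value. A sketch with invented counts:

```python
import math

def chi2_2x2(table):
    """Pearson chi-squared test for a 2x2 contingency table
    [[a, b], [c, d]].  With one degree of freedom the p-value
    reduces to erfc(sqrt(statistic / 2))."""
    (a, b), (c, d) = table
    n = a + b + c + d
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = sum(table[i]) * (table[0][j] + table[1][j]) / n
            stat += (obs - expected) ** 2 / expected
    return stat, math.erfc(math.sqrt(stat / 2))

# Invented counts: among prime-qualified borrowers, one group was sold
# 95 prime / 5 sub-prime loans, the other 80 prime / 20 sub-prime.
stat, p = chi2_2x2([[95, 5], [80, 20]])
```

For larger product matrices, the same Pearson statistic applies with more degrees of freedom, at which point a library routine such as scipy.stats.chi2_contingency is more convenient.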
Test Models for Over-Fitting
If you come up with models that include race/ethnicity/gender as significant predictors even when all other creditworthiness variables are available to the stepwise regression, make sure to run them against the test data set. If the model continues to predict well, then either you are missing a creditworthiness variable with strong race/ethnicity/gender patterns, you have a lot of work ahead to find a manually constructed model that performs better, or you have a disparate treatment problem that needs to be addressed.
The analysis described above will help you to identify whether you have disparate treatment patterns that could appear in a Fair Lending examination statistical analysis. Since the exact procedures that the regulatory agencies use are not public, it is not a guarantee that issues won’t come up.
- Written by Bruce Moore
How a Bank Can Get in Trouble with Fair Lending Statistical Analysis
At the Independent Bankers Association of Texas (IBAT) Lending Compliance Summit in April 2014 and at a lecture on compliance at the Southwest Graduate School of Banking (SWGSB) alumni program in May, there was a great deal of discussion about the statistical analysis that the FDIC is doing to test for disparate treatment under the Fair Lending laws (the Equal Credit Opportunity Act/Regulation B and the Fair Housing Act). One participant said incredulously, "we do 80% of our loans to Hispanic borrowers, and they [FDIC] are telling us that we're discriminating against Hispanic borrowers." She assumed that because her bank does a large number of loans to Hispanic borrowers, it is impossible that the bank could have disparate treatment. I'll use an example to show that the participant's assumption is wrong--a bank can have disparate treatment even though the bank does a majority of its lending to Hispanic borrowers. It can actually be easier to have disparate treatment problems when most of a bank's loans are to a protected group and a minority of the bank's loans are to non-Hispanic whites.
Another person told of an encounter with a regulator where the regulator stated that if the bank were engaging in disparate treatment that the regulators would find and punish it. I would not have that much confidence; disparate treatment can exist but not rise to a level where it will be identified by statistical methods; this can be easily demonstrated.
Another person described a bank officer who was so appalled and stressed that his bank was being cited for disparate treatment, that he had a heart attack--he felt that he was being accused of being racist. It is possible--even probable--that a bank can have a very real disparate treatment problem without anyone on the staff being remotely racist. I'll describe a scenario where the cause is not remotely racist, but where the pricing is undeniably disparate and unfair.
A fourth discussion cited an instance where a bank was cited for disparate treatment, but the penalty was approximately $1,000 to cover a large number of Hispanic borrowers. It is possible to have statistically significant disparate treatment that is not financially material. The examples in the article that follows will illustrate financial materiality and illustrate different ways to determine whether or not a pricing difference is material.
Finally, it struck me as unusual that all of the disparate treatment questions discussed involved Hispanic borrowers and that none involved blacks (the Census term). The method that is presumed to be used by the FDIC for identifying a borrower’s race/ethnicity is more accurate at identifying Hispanic borrowers than non-Hispanic whites or blacks; the variability that this introduces makes it less likely that disparate treatment would be identified for a black population. This will be illustrated with an example.
The same inaccuracy in racial/ethnic identification that makes a finding of disparate treatment for blacks unlikely also makes it possible that a bank could be subject to a finding of disparate treatment of Hispanic borrowers when perfect racial/ethnic identification would not trigger such a finding. This will be illustrated with an example.
The article is a grossly simplified and contrived series of examples that are intended to illustrate both the power and weakness of statistical analysis for disparate treatment in fair lending. It will hopefully allow bankers and regulators--neither of whom commonly have statistical training--to have more useful discussions about the statistics involved in a finding of disparate treatment.
The article touches on many controversial subjects; if you don't like controversy, stop reading now. To make the discussion easier to follow and more precise in conjunction with other publications, I identify racial groups by the terms used in the Census, rather than terms which may be more common in current journalism style sheets. The examples are contrived but hopefully illustrate what may be a common scenario in banking and society that can be understood and discussed. The examples are constructed to illustrate how statistics can and cannot be applied to the analysis of disparate treatment. There is some discussion of the statistical use of the term significant and the popular use of the term “significant” which more commonly is used with the meaning of the accounting term material or the common word “frequent.” A bank can have disparate treatment that is statistically significant while at the same time such treatment is neither financially material nor frequent.
This article is the second in a series on Fair Lending disparate treatment analysis. You should also read Preparing for a Fair Lending Examination Statistical Analysis. The article is divided into the following sections:
- Example Loan Portfolio
- Example 1: How Many Preferential Loans are Needed When Loan Pricing is Variable and Preferential Rate is 1.5% Better?
- Example 2: How Many Preferential Loans are Needed When Loan Pricing is Variable and Preferential Rate is 0.65% Better?
- Example 3: How Many Preferential Loans are Needed When Loan Pricing is Perfect and Preferential Rate is 1.5% Better?
- Example 4: How Many Preferential Loans are Needed When Loan Pricing is Perfect and Preferential Rate is 0.05% Better?
- Example 5: The Danger Zone for Likely Enforcement
- Example 6: How Does Inaccuracy in Race/Ethnicity Estimation Alter Results?
Example Loan Portfolio
The loan portfolio for these examples has 1000 borrowers, of whom 80 percent are Hispanic borrowers and 20 percent are non-Hispanic white borrowers. To take out all of the complications of differences in credit scores and other credit-worthiness metrics, we assume that all borrowers’ credit-worthiness is absolutely the same in all respects. The only difference is that some are Hispanic and some are non-Hispanic white. The interest rates are assigned to the whole population using a random number generator with a normal distribution of specified mean and standard deviation that differs from example to example.
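The example portfolio can be generated in a few lines. This is a sketch of the setup described above; the standard deviation is a placeholder (the examples vary it), and the seed is arbitrary.

```python
import random

def simulate_portfolio(n=1000, hispanic_frac=0.8, mean_rate=15.0, sd=1.2, seed=7):
    """Generate the example portfolio: identical creditworthiness for
    everyone, rates drawn from one normal distribution for all borrowers."""
    rng = random.Random(seed)
    portfolio = []
    for i in range(n):
        group = "hispanic" if i < n * hispanic_frac else "non_hispanic_white"
        portfolio.append({"group": group, "rate": rng.gauss(mean_rate, sd)})
    return portfolio
```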
Example 1: How Many Preferential Loans are Needed When Loan Pricing is Variable and Preferential Rate is 1.5% Better?
For the first example, we start out with a loan portfolio where the average interest rate is 15.00% for both Hispanic borrowers and non-Hispanic whites. In this bank, there is a lot of negotiation on loan rates, and thus a lot of variability. We will assume that one loan officer is non-Hispanic white and goes to a church that is overwhelmingly non-Hispanic white. Since many churches are very closely tied to ethnic groups--Greek and Russian Orthodox congregants are predominantly of Greek and Russian descent, Lutherans are predominantly of German or Scandinavian descent, Presbyterians are predominantly of Scottish or Scots-Irish descent, Episcopalians are predominantly of British descent, African Methodist Episcopal church members are predominantly African American--it is reasonable to assume that the vast majority of the congregants at the loan officer's church are similarly non-Hispanic white.
The loan officer knows several congregants well and knows far more about how they spend (or don't spend) money than one could ever tell from a loan application. He feels that they are unusually good credit risks. He knows that some of them struggled for several years but managed to pay off substantial medical bills. If this loan officer were to give a preferential interest rate to these congregants--1.50% better--how many preferential loans would he have to make before the bank has a disparate treatment problem that is identifiable through statistics, or in the language of statistics, is “statistically significant”?
The first figure below shows the distribution of interest rates from the loan portfolio before we apply the preferential interest rates, with the Hispanic distribution on the left and non-Hispanic white distribution on the right. The second figure shows the distribution after the loan officer gives enough preferential loans to detect it using a statistical test at the 0.01 level. This means that the probability is less than 0.01 (1%) that the Hispanic and non-Hispanic white groups are getting the same interest rate. It doesn't look a lot different, but you can find the differences if you study it for a moment.
Statistics by itself only gives the probability that the two groups are getting the same interest rate; I've arbitrarily chosen to say that 0.01 (1%) is improbable enough that I will accept that the two groups are being treated differently. 0.01 (1%) is a stringent threshold for accepting or rejecting that two groups are different. For some types of analysis where the stakes are low--like a marketing direct mail response model--statisticians might use 0.10 (10%) probability of random occurrence (p-value) as the threshold while for medical drug efficacy it might be a much more stringent threshold of 0.02 (2%) or 0.01 (1%).
The third figure shows how the probability drops as the number of preferential loans increases. 57 preferential loans would be statistically significant at the 0.01 level. This is useful in understanding that although findings of disparate treatment are pass/fail, regulators will almost certainly notice when a bank’s loan portfolio comes back with a p-value of 0.10 or 0.05. From a regulatory point of view, it would be efficient to use this information to allocate more resources to fair lending examinations for banks that scored a p-value that was near the level of enforcement on previous examinations.
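The falling-p-value curve can be reproduced approximately by simulation: discount one more non-Hispanic white loan at a time and re-test the group means. This sketch uses a two-sample z-test (a normal approximation, not necessarily the test behind the figures) and an assumed standard deviation of 1.2 percentage points, so the loan count it finds will differ somewhat from the 57 quoted in the text and will vary with the random draw.

```python
import math
import random

def pvalue_same_mean(a, b):
    """Two-sided z-test (normal approximation) that samples a and b
    share the same mean."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    z = (ma - mb) / math.sqrt(va / len(a) + vb / len(b))
    return math.erfc(abs(z) / math.sqrt(2))

def loans_until_significant(discount=1.5, alpha=0.01, seed=3):
    """Discount one more non-Hispanic white loan at a time until the
    group difference is significant at `alpha`; return the loan count."""
    rng = random.Random(seed)
    hispanic = [rng.gauss(15.0, 1.2) for _ in range(800)]
    white = [rng.gauss(15.0, 1.2) for _ in range(200)]
    for k in range(1, 201):
        white[k - 1] -= discount  # give one more preferential loan
        if pvalue_same_mean(hispanic, white) < alpha:
            return k
    return None
```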
The fourth figure shows how the average interest rate for non-Hispanic whites drops as the number of preferential loans increases. In this example, it drops roughly 25 basis points, to about 14.75%, by the time the preferential treatment becomes statistically significant at the 0.01 level.
In this example, we know that all of the preferential loans indicate disparate treatment, but the threshold that I've chosen doesn't kick in until I've given preferential rates to 57 of the 200 non-Hispanic whites in the loan portfolio. With 0.01 as the standard of significance, a bank with only 55 preferential loans would get a pass, while one with 57 preferential loans would get a fail.
This example illustrates two points:
- A bank can have statistically identifiable disparate pricing even when the majority of the bank's loans are to Hispanic borrowers.
- A large number of preferential loans can occur without rising to a level where the disparate treatment can be identified statistically. With the variability in this example, 28% of the non-Hispanic whites could receive preferential rates without rising to the level of detection at the 0.01 level.
Statistically Significant vs. Financially Material
In the scenario that I've described, the bank is clearly giving preferential treatment to the non-Hispanic white group that is statistically significant, but I don't think that anyone would describe the cause of preferential rates to long-known friends as intentionally racist--just grossly unfair to people who don't attend his church. Statistically significant has special meaning in statistical terminology--but is this case also “significant” as we use the term “significant” in everyday speech? More precisely, is this financially material?
Let's assume that these are unsecured term loans for $1000.00 with a term of 12 months, and calculate the difference in interest paid between the normal and the preferential rate in a simplified estimate using just the average interest rate of the Hispanic borrowers and the average interest rate of the preferential loans as shown in Table 1. To simplify the example analysis, the time value of money will be ignored in estimating the value of the preferential pricing.
| Measure | Value |
|---|---|
| Loan Term | 12.00 months |
| Average non-Hispanic white Int Rate | 14.73% |
| Normal Cumulative Interest | $83.10 |
| Preferential Cumulative Interest | $74.62 |
| Individual Preference Value | $8.48 |
| Total Preference Value Given | $483.10 |
| Average Preference Value for non-Hispanic white Population | $2.42 |
| Average Preference Value as a Percent of Cumulative Interest | 2.91% |
| Total Preference Spread Across Hispanic Population | $0.60 |
| Total Preference Value if Preferential Rate Extended to All Hispanic Borrowers | $6780.34 |
| Number of Preferential Loans | 57 |
As shown in Table 1, the preference value is $8.48 per loan, for a total value of the preference given of $483.10. If this $483.10 were apportioned to all 800 Hispanic borrowers, they would each receive $0.60. If the preferential rate were extended to all 800 Hispanic borrowers, the value would be $6780.34. Since all 1000 people--including 800 Hispanic borrowers--have exactly the same creditworthiness, if any of the non-Hispanic white borrowers were eligible for the 13.50% rate, then all borrowers should be eligible for the 13.50% rate.
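The cumulative interest figures in Table 1 are consistent with a fully amortizing 12-month, $1000 loan at 15.00% versus the preferential 13.50%. A sketch of that calculation (ignoring the time value of money, as the text does):

```python
def cumulative_interest(principal, annual_rate, months):
    """Total interest paid over the life of a fully amortizing loan
    with level monthly payments."""
    r = annual_rate / 12.0
    payment = principal * r / (1.0 - (1.0 + r) ** -months)
    return payment * months - principal

normal = cumulative_interest(1000.0, 0.15, 12)         # about $83.10
preferential = cumulative_interest(1000.0, 0.135, 12)  # about $74.62
per_loan_value = normal - preferential                 # about $8.48 per loan
```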
What is Financially Material?
At what point does the individual preference value become material? This is a subject for much discussion and debate. To provide context, most financial statements are in units of $1,000, so $1,000 will appear as a change on a financial statement. At most institutions, the staff would work for a considerable time before closing a teller workstation with a $1,000 discrepancy. If we take this as an arbitrary standard for the lowest amount that is financially material for a bank, how would this translate to a household? A $500 million bank generates $5,000,000 in income for a 1% ROA--what would this look like scaled to a household income? Figure 5 shows how a $1,000 discrepancy at a $500 million institution would scale to modest household incomes. By this measure, $10 would be material for a household with an income of $50,000 per year, and $5 would be material for a household income of $25,000 per year.
A colleague who is black reviewed this article and said, "it's not just that [bank loan prices]. It's everything." Out of this conversation came another approach to materiality: look at the value of the preference as a percentage of the transaction, which is 2.91% in this case. If all purchases by a protected group were at a 2% higher price, what would be the aggregate effect--would 2% be material in the aggregate? Figure 6 shows 2% of income--an amount that is clearly material at all income levels.
In any case, anecdotes suggest that the FDIC has chosen an average rate difference of 0.25% as the threshold for enforcement.
Financially Material vs. Frequent
Example 1 is clearly statistically significant, but with an average preference value of $2.42 for non-Hispanic whites that is below the range of being financially material even after scaling for a low-income household as estimated in Figure 5, it is hard to describe this case as material either from the perspective of the bank or the borrower. Anecdotal evidence indicates that the FDIC could well consider this case to be disparate treatment that rises to the level of action. This suggests that frequency of occurrence is as important as or more important than whether the preference is financially material.
In a population where everyone is exactly equally creditworthy, 57 preferential loans out of 200 loans to non-Hispanic whites should reasonably be considered frequent enough to justify action.
Example 2: How Many Preferential Loans are Needed When Loan Pricing is Variable and Preferential Rate is 0.65% Better?
Anecdotal evidence suggests that the FDIC is using a 0.25% unexplained difference in interest rates between Hispanic and non-Hispanic whites as a threshold for action. The variability and size of preference in Example 1 were carefully chosen so that the probability of random occurrence reached 0.01 at the same time that the average interest rate difference reached 0.25%. 57 non-Hispanic whites received the preferential rate in this case. What happens when the amount of preference is lowered? Example 2 keeps the same variability but changes the preference from 1.5% to 0.65%. Figures 7, 8, 9 and 10 illustrate what happens when the amount of the preference is lowered.
As shown in Figure 8, this random portfolio requires 131 preferential loans before reaching the threshold of 0.01 probability of random occurrence, compared to 57 preferential loans when the preference was larger. Lowering the preference further would require an even larger number of preferential loans. Table 2 shows that the materiality calculations don't really change from Example 1, but more than 65% of the non-Hispanic whites received the preferential rate--a number that is certainly frequent.
This example is just as statistically significant, and still not individually financially material, but it is clearly much more frequent.
This example illustrates the point that the frequency of the preference given is perhaps a more useful measure than the amount of the financial preference.
| Measure | Value |
| --- | --- |
| Loan Term | 12.00 months |
| Average non-Hispanic white Int Rate | 14.74% |
| Normal Cumulative Interest | $83.10 |
| Preferential Cumulative Interest | $79.42 |
| Individual Preference Value | $3.68 |
| Total Preference Value Given | $481.71 |
| Average Preference Value for non-Hispanic white Population | $2.41 |
| Average Preference Value as a Percent of Cumulative Interest | 2.90% |
| Total Preference Spread Across Hispanic Population | $0.60 |
| Total Preference Value if preferential rate extended to all Hispanic borrowers | $2941.73 |
| Number of Preferential Loans | 131 |
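The cumulative-interest figures in these tables are consistent with a $1,000, 12-month fully amortizing loan; the $1,000 principal is my inference from the numbers, and the article's own calculations were done in R. A short Python sketch reproduces them:

```python
def cumulative_interest(principal, annual_rate, months):
    """Total interest paid on a fully amortizing loan with level monthly payments."""
    r = annual_rate / 12.0  # monthly rate
    payment = principal * r / (1 - (1 + r) ** -months)
    return payment * months - principal

normal = cumulative_interest(1000, 0.15, 12)          # the "Normal Cumulative Interest"
preferential = cumulative_interest(1000, 0.1435, 12)  # Example 2's rate, 0.65% better
print(f"normal: ${normal:.2f}  preferential: ${preferential:.2f}  "
      f"individual preference: ${normal - preferential:.2f}")
```

The remaining rows follow from the individual preference value: the total given multiplies it by the number of preferential loans, the population averages divide that total by the 200 non-Hispanic white or 800 Hispanic borrowers, and the "if extended to all Hispanic borrowers" row multiplies the per-loan value by 800.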
Example 3: How Many Preferential Loans are Needed When Loan Pricing is Perfect and Preferential Rate is 1.5% Better?
Examples 1 and 2 show that if there is a lot of variability in the interest rates that borrowers negotiate, it can take a relatively large number of preferential loans before there is statistically significant disparate treatment at the 0.01 level: 28.50% and 65.50% of the non-Hispanic white borrowers received preferential loans in Examples 1 and 2, respectively. How are things different when the bank tightens up its pricing so that there is no variability in pricing?
For this example, we will start with the same portfolio of 800 loans to Hispanic borrowers, and 200 loans to non-Hispanic whites. All of the borrowers have exactly the same creditworthiness. All loans will have exactly the same interest rate. How many preferentially priced loans (interest rate is 1.50% better) would be necessary to have disparate treatment that is significant at the 0.01 level?
Figure 11 shows the distribution of interest rates before giving preferential rates, while Figure 12 shows the distributions when the probability that the preferential loans occurred randomly is less than 0.01 (1%). Figure 13 shows the probability that they occurred randomly as the number of preferential loans increases. Instead of 57 preferential loans, it now takes only 2 preferential loans to cross the threshold of statistical significance. The second bar at 13.50% on the non-Hispanic white side of Figure 12 is so small that it is very hard to see, and it may not show up if you print this article.
Statistically Significant vs. Financially Material
Although this case is just as statistically significant as the case in Example 1, the calculations in Table 3 show that the value given for the preference is substantially less than in Example 1, with a total preference value of $16.95 compared to $483.10. The preference value if extended to all Hispanic borrowers is the same as in Example 1 at $6780.34, even though only 2 preferential loans were given instead of 57. The comparison of Examples 1 and 2 with Example 3 shows that the determination of materiality must be part of the analysis, and is probably the reason that anecdotal evidence suggests the FDIC uses both a p-value of 0.01 and an average difference of 0.25% in making enforcement determinations.
You can have statistically identifiable disparate treatment without the disparate treatment being financially material. Although just as statistically significant as Example 1, the average non-Hispanic white preference value of $0.08 does not rise to any reasonable level of financial materiality at the household level. With an average interest rate difference between Hispanic borrowers and non-Hispanic whites of 0.02%, this probably would not rise to the level of FDIC action during an examination.
This example illustrates the point that statistical significance and financial materiality are not related.
| Measure | Value |
| --- | --- |
| Loan Term | 12.00 months |
| Average non-Hispanic white Int Rate | 14.98% |
| Normal Cumulative Interest | $83.10 |
| Preferential Cumulative Interest | $74.62 |
| Individual Preference Value | $8.48 |
| Total Preference Value Given | $16.95 |
| Average Preference Value for non-Hispanic white Population | $0.08 |
| Average Preference Value as a Percent of Cumulative Interest | 0.10% |
| Total Preference Spread Across Hispanic Population | $0.02 |
| Total Preference Value if preferential rate extended to all Hispanic borrowers | $6780.34 |
| Number of Preferential Loans | 2 |
Example 4: How Many Preferential Loans are Needed When Loan Pricing is Perfect and Preferential Rate is 0.05% Better?
To fully understand the difference between statistical significance and financial materiality, let's look at another example where all loans are priced exactly equally and the preferential rate is only 0.05% better: instead of 15.00%, the preferential rate is 14.95%. Figures 15, 16, 17 and 18 look almost identical to their counterparts in Example 3; with perfect pricing (in statistical terms, a standard deviation of 0), the tiny preferential rate is just as statistically significant as the large one.
How do the two cases differ in financial materiality? Table 4 shows the estimate of the value of the preference given is $0.57 and the value of the preference extended to all Hispanic borrowers is $226.48. This is a clear case where the disparate treatment is statistically significant but not financially material nor frequent.
| Measure | Value |
| --- | --- |
| Loan Term | 12.00 months |
| Average non-Hispanic white Int Rate | 15.00% |
| Normal Cumulative Interest | $83.10 |
| Preferential Cumulative Interest | $82.82 |
| Individual Preference Value | $0.28 |
| Total Preference Value Given | $0.57 |
| Average Preference Value for non-Hispanic white Population | $0.00 |
| Average Preference Value as a Percent of Cumulative Interest | 0.00% |
| Total Preference Spread Across Hispanic Population | $0.00 |
| Total Preference Value if preferential rate extended to all Hispanic borrowers | $226.48 |
| Number of Preferential Loans | 2 |
Example 5: The Danger Zone for Likely Enforcement
The anecdotal FDIC enforcement triggers of a 0.01 level of significance and a 0.25% difference in the interest rate between protected groups and non-Hispanic whites make it possible to develop a chart estimating how many preferential loans of a particular size are likely to trigger an enforcement action. The FDIC has not made public statements of these triggers, so these assumptions are just that: assumptions. Do not depend upon the accuracy of this curve, as the assumptions could be completely wrong, and this simulation uses a contrived two-group population rather than the complex populations that occur in real life. To the extent that the various assumptions are reasonable, Figures 19 and 20 give an estimate of the zone where banks are in danger of triggering an enforcement action. Above the curve, enforcement action would be likely under these assumptions; enforcement action becomes less likely as a portfolio moves farther below the curve. This isn't a hard line: the curve assumes the variability used in Examples 1 and 2 (in statistical terms, a standard deviation of 1.25%); with higher variability the curve will move up, while with lower variability it will move down. Figure 19 shows all of the data points from 20 simulations; think of this as running the test with portfolios from 20 different banks. Figure 20 is drawn from the same data, but shows the 95% confidence interval for the enforcement boundary.
The key lesson from this analysis is that a small number of big “sweetheart deals” in a small pool of non-Hispanic white borrowers may be enough to cause significant problems for a bank during a Fair Lending examination.
Example 6: How Does Inaccuracy in Race/Ethnicity Estimation Alter Results?
All of the previous examples assumed that the identification of Hispanic and non-Hispanic white borrowers was perfect. In reality, it is not. The approach thought to be used by the FDIC to estimate a borrower's race and ethnicity is described in Using the Census Bureau's Surname List to Improve Estimates of Race/Ethnicity and Associated Disparities. The paper lists correlations of 0.7 for identification of black borrowers, 0.76 for non-Hispanic white/other and 0.82 for Hispanic borrowers, which correspond roughly to accuracies of 49%, 58% and 67% respectively. If, through identification errors, non-Hispanic whites (some of whom have preferential loan rates) are mixed in with black or Hispanic borrowers who do not have preferential loan rates, how does this alter the number of preferential loans needed to reach a p-value of 0.01 and an average rate difference of 0.25%?
Figures 21 and 22 show the region where enforcement action is likely for four scenarios where the fraction of Hispanic borrowers correctly identified is varied from 100% to 70% and the fraction of non-Hispanic white borrowers correctly identified is varied from 100% to 60%. Generally, as accuracy decreases, the likely enforcement region relaxes upward and to the right.
Although this simulation is a binary example between two populations, it illustrates a possible cause for the anecdotal observation that there are few enforcement actions involving disparate pricing for blacks; the lower accuracy of the racial/ethnic identification of blacks relaxes the enforcement region significantly to the right. From high to low, the relative accuracy of identification is Hispanic, non-Hispanic white and black. This makes it more likely that enforcement will occur for disparate pricing to a Hispanic population than to a black population.
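A back-of-the-envelope way to see why misidentification relaxes the enforcement region: if misidentified borrowers simply swap groups, the measured rate gap between the groups shrinks. The sketch below uses the article's group sizes; the 0.25% true gap and the simple swap model are my assumptions, not the article's simulation.

```python
def measured_gap(n_h, mean_h, n_w, mean_w, p_h_correct, p_w_correct):
    """Measured Hispanic-minus-white rate gap when a fraction of each group
    is misidentified as the other (a simplified swap model)."""
    # The "white" pool: correctly identified whites plus misidentified Hispanics.
    w_n = p_w_correct * n_w + (1 - p_h_correct) * n_h
    w_mean = (p_w_correct * n_w * mean_w + (1 - p_h_correct) * n_h * mean_h) / w_n
    # The "Hispanic" pool: correctly identified Hispanics plus misidentified whites.
    h_n = p_h_correct * n_h + (1 - p_w_correct) * n_w
    h_mean = (p_h_correct * n_h * mean_h + (1 - p_w_correct) * n_w * mean_w) / h_n
    return h_mean - w_mean

true_gap = measured_gap(800, 15.00, 200, 14.75, 1.0, 1.0)   # perfect identification
diluted = measured_gap(800, 15.00, 200, 14.75, 0.7, 0.6)    # 70%/60% identification
print(f"perfect identification: {true_gap:.3f}%  70%/60%: {diluted:.3f}%")
```

Under this toy model a true 0.25% gap is measured at roughly 0.05%, which is why lower accuracy generally pushes the enforcement boundary outward.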
The simulation was run 20 times with a different random number seed each time--think of this as running the simulation using portfolios from many banks. The discontinuities in the various curves (especially the 70%/60% curve) are due to the variability of the data. In a single bank, a small number of loans could move the bank from the non-enforcement region into the enforcement region.
Table 5 shows the number of portfolios from this simulation where the 70%/60% curve crosses below the 100%/100% curve; these are the particular portfolios where an enforcement action might occur when perfect race/ethnicity identification would not cause it to occur. The inaccuracy in race/ethnicity assignment makes it much less likely that most banks would face enforcement, but for a small number of banks, enforcement could occur when it would not occur with perfect race/ethnicity identification.
| Measure | Value |
| --- | --- |
| Simulations where 70%/60% Identification Curve More Strict than Perfect 100%/100% Identification Curve | 0 |
| Percent of Simulations with 70%/60% Identification Curve More Strict than Perfect 100%/100% Identification Curve | 0% |
The differences between Figures 23/24 and 25/26 illustrate the variability introduced by having a small non-Hispanic white population and a large Hispanic population. Figure 24 (varying correct identification of the Hispanic population) shows curves that are not smooth and have some discontinuities, while Figure 26 (varying correct identification of the non-Hispanic white population) shows curves that are very smooth. Incorrectly identifying a few Hispanic borrowers as non-Hispanic white significantly changes the interest rate distribution of the small non-Hispanic white population, while incorrectly identifying a non-Hispanic white borrower as Hispanic does not significantly alter the interest rate distribution of the large Hispanic population. The differences between these two figures emphasize how a small number of preferential loans in a small non-Hispanic white population can quickly move a bank from the non-enforcement region to the likely enforcement region.
The examples in this article are grossly simplified from a real case in a few ways:
- Some of the examples exactly identify Hispanic and non-Hispanic white borrowers. In reality, race and ethnicity are inferred using the borrower's surname and address in a process that is accurate for about 50-64% of borrowers from the four major racial groups (accuracy is lowest for blacks and highest for Hispanic borrowers). See Using the Census Bureau’s Surname List to Improve Estimates of Race/ethnicity and Associated Disparities for a discussion of how race and ethnicity can be estimated.
- All of the examples assume all borrowers are exactly equally creditworthy. In a real loan portfolio there would be numerous differences in the creditworthiness between individuals that would have to be corrected before doing any type of analysis for disparate treatment. The simple statistical tests used in this article could not be used for a real loan portfolio, although the concepts would be the same.
- All of the examples assume that 80% of the borrowers are Hispanic and 20% are non-Hispanic white, and that the loan portfolio is exactly 1000 loans. The various curves would change significantly as the number of borrower groups and the proportions of borrower groups change.
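The surname-and-address estimation mentioned above is often called Bayesian Improved Surname Geocoding (BISG): a surname's race/ethnicity distribution is combined with the neighborhood's. A toy sketch of the combination step follows; every probability below is invented for illustration, and real implementations use the Census surname list and tract-level population tables rather than these made-up figures.

```python
# Toy BISG-style posterior: P(race | surname, tract) is proportional to
# P(race | surname) * P(race | tract) / P(race overall).
# All numbers below are invented for illustration.
P_RACE_GIVEN_SURNAME = {"hispanic": 0.90, "white": 0.06, "black": 0.04}
P_RACE_GIVEN_TRACT = {"hispanic": 0.60, "white": 0.30, "black": 0.10}
P_RACE_OVERALL = {"hispanic": 0.17, "white": 0.62, "black": 0.13}

def bisg_posterior():
    """Combine surname and neighborhood evidence into one normalized estimate."""
    unnormalized = {
        race: (P_RACE_GIVEN_SURNAME[race] * P_RACE_GIVEN_TRACT[race]
               / P_RACE_OVERALL[race])
        for race in P_RACE_GIVEN_SURNAME
    }
    total = sum(unnormalized.values())
    return {race: value / total for race, value in unnormalized.items()}

print({race: round(p, 3) for race, p in bisg_posterior().items()})
```

The output is a probability for each group rather than a hard label, which is one source of the classification errors discussed throughout this article.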
Disparate Treatment Can Occur but Not Rise to Level of Enforcement
Figure 7 shows that if the preference is small, disparate treatment could occur frequently (though not necessarily materially) without rising to the level of enforcement using what is currently understood to be the FDIC threshold for enforcement. When this characteristic is combined with the inaccuracy of race/ethnicity identification for some groups, disparate treatment could be pervasive for some groups without rising to the level of enforcement.
Procedures Don't Distinguish Well Between Small but Widespread Preference and Large but Unusual Preference
Figures 7, 8, 9 and 10 show that the statistics draw no distinction between a small but widespread preference and a large but unusual preference. Most people would view these cases very differently; most would agree that a widespread small preference is probably more worthy of enforcement than an infrequent large preference. In the latter case, the vast majority of non-Hispanic whites who didn't get the preferential rate will be just as angry as the Hispanic borrowers who did not get it.
Relative Inaccuracy of Race/Ethnicity Identification Reduces Likelihood of Enforcement But Can Cause Enforcement When Perfect Identification Would Not
It is clear in Figures 21 and 22 that the inaccuracy in race/ethnicity identification relaxes the enforcement region for most banks, but that it can lead to a small number of scenarios where enforcement could occur when perfect identification would not cause enforcement.
Figures 22 and 26 show that when the accuracy of identification is higher for Hispanic borrowers than non-Hispanic whites, there is a wide degree of overlap in the 95% confidence intervals for the perfect identification scenario and for the 70%/60% correct identification scenario. In Figures 21/22 and 25/26, it is clear that for some portfolios, enforcement would occur due to errors in identification when enforcement would not occur with perfect identification.
If this is the procedure used for race/ethnicity identification, then enforcement actions could occur in some circumstances where they should not. For banks that are aggressive in working to identify problems before an examination, this presents a further frustration. The easiest way to estimate race/ethnicity is to purchase data from a third-party provider. Since some third-party data includes self-reported race and ethnicity, it is likely more accurate than the procedure thought to be used by the FDIC. With the more accurate identification, the bank's own work might not find disparate treatment even though the FDIC finds disparate treatment due to the inaccuracy of its race and ethnicity estimation.
It is Unlikely that Disparate Treatment of Black Borrowers Would be Identified
The inaccuracy in the identification of blacks makes it unlikely that even widespread disparate treatment of black borrowers would rise to the level of enforcement. Blacks who are descended from slaves frequently have the same surname as the non-Hispanic white slaveholder; surnames do not provide useful information for race/ethnicity identification in this case, resulting in some number of black borrowers being misidentified as non-Hispanic white borrowers.
Recommendations to Avoid Disparate Treatment
Churches, synagogues, mosques and many other religious and social institutions have strong immigration histories and resulting race and ethnicity patterns. Given this self-segregation, if a bank's workforce is disproportionately non-Hispanic white and the bank allows relationship-based price negotiation, the bank can unintentionally have a disparate treatment pricing problem that is statistically significant. If a bank's workforce is predominately non-Hispanic white and the borrower pool for a product is predominately Hispanic, the bank is at higher risk for an unintentional disparate treatment problem. There are a few actions that a bank can take to avoid problems:
- Eliminate rate negotiation without exception.
- Work to change the lending workforce's racial/ethnic composition to match the borrower pool's racial/ethnic composition, and hope that all lenders maintain relationships within their respective racial/ethnic groups. If the racially mixed lending workforce maintains relationships with income peers rather than racial/ethnic peers, this may do nothing to resolve the problem.
- Increase marketing and originations to black borrowers and other groups that will disproportionately be misidentified as non-Hispanic white borrowers. The previous two items work to prevent disparate treatment, while this item merely masks a problem; I'm not remotely comfortable making this recommendation, but it would have the effect of diluting the non-Hispanic white pool with more borrowers who did not receive a preference. It exploits a highly undesirable side-effect of the error patterns in the race/ethnicity approach thought to be used by the FDIC.
Responding to a Finding of Disparate Treatment
If your bank is subject to a finding of disparate treatment there are a few things from an analytical standpoint that you should do in response:
- Make sure that you included ALL credit-worthiness data that your bank uses in underwriting as part of the dataset provided to the FDIC. If you include only a subset of the data elements used in underwriting, it is likely that a multivariate analysis will result in an incorrect finding of disparate treatment when the use of ALL credit-worthiness data would not. If your bank gives rate breaks for automatic payments or the presence of a checking account, include indicators for these accounts. Anecdotally, providing an incomplete dataset is a common self-inflicted problem that consumes significant time and money for both banks and regulators. Since the 70%/60% curve in Figure 22 is quite far to the right, it is likely that the anecdotal evidence is correct.
- Discuss this item with your attorney before proceeding. Perform a data enrichment with third-party race/ethnicity identification that is more accurate than that used by the FDIC, and perform an analysis to identify disparate treatment. If the more accurate race/ethnicity identification does not support a finding of disparate treatment, bring this to the attention of regulators. You may (but probably do not) have a portfolio where the specific characteristics of race/ethnicity identification cause a finding of disparate treatment when perfect identification would not.
- Examine your portfolio for frequent small-preference loans or unusual large-preference loans. If your portfolio has a small number of large-preference loans, your bank may have an insider fairness problem rather than a race/ethnicity disparate treatment problem. This still isn't a good thing, but it is perhaps better than a finding of disparate treatment.
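The first recommendation above is easy to demonstrate: if a legitimate pricing factor is left out of the dataset, a simple group comparison manufactures a rate gap out of nothing. In the toy Python illustration below, the 0.5% automatic-payment break and the group autopay shares are invented numbers, not data from any real portfolio.

```python
# Toy illustration (not the FDIC's model): omitting a legitimate pricing
# factor from the dataset creates a spurious "disparate treatment" gap.
# Pricing rule: 12.00% base rate minus a 0.5% break for automatic payment.
# Group membership never enters the pricing rule; the groups simply use
# autopay at different (invented) rates.
loans = []
for group, autopay_share, n in [("hispanic", 0.25, 800), ("white", 0.75, 200)]:
    n_autopay = int(n * autopay_share)
    for i in range(n):
        autopay = i < n_autopay
        rate = 12.0 - (0.5 if autopay else 0.0)
        loans.append((group, autopay, rate))

def mean_rate(rows):
    return sum(rate for _, _, rate in rows) / len(rows)

hispanic = [l for l in loans if l[0] == "hispanic"]
white = [l for l in loans if l[0] == "white"]
raw_gap = mean_rate(hispanic) - mean_rate(white)
print(round(raw_gap, 2))  # 0.25 -- without the autopay field, this looks like a rate gap

# Controlling for the autopay indicator, the gap vanishes entirely:
for flag in (True, False):
    gap = (mean_rate([l for l in hispanic if l[1] == flag])
           - mean_rate([l for l in white if l[1] == flag]))
    print(flag, gap)  # 0.0 within each stratum
```

A dataset without the autopay indicator would show a 0.25% unexplained gap; with the indicator included, the same multivariate analysis finds nothing.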
There is the old saying that "there are lies, damned lies and statistics"; in discussions of regulatory use of statistics in Fair Lending analysis, bankers who avoid learning statistical terminology do themselves, their bank and their customers a disservice. A banker who ignores the issues of workforce racial and ethnic composition in a racially and ethnically self-segregated community does the bank and the bank's customers a disservice. When a regulator uses inaccurate borrower race/ethnicity identification and arrives at a finding of statistically significant disparate treatment, the regulator should take steps to confirm that enforcement would still be warranted with perfect race/ethnicity identification.
The simulations discussed in this article show that it is disturbingly easy to have a statistically significant disparate treatment problem that is not material and that the enforcement region is very nebulous and can vary by a factor of 2 due to the misidentification errors in the procedures used to estimate race and ethnicity.
Deriving a closed-form equation to estimate the probability of enforcement when perfect race/ethnicity identification would not cause enforcement would be useful, as would graphical models to illustrate the characteristics of more racially complex populations than were considered here.
- Using the Census Bureau's Surname List to Improve Estimates of Race/Ethnicity and Associated Disparities
- Anatomy of a Fair-Lending Exam: The Uses and Limitations of Statistics
The charts in this article were prepared using R, RStudio and the knitr package to integrate text and statistical analysis. The charts use the ggplot2 package. Multi-core parallel processing for the simulations uses the doMC package.
print(proc.time() - startTime)
##      user    system   elapsed
## 11545.773   215.456  2250.363
- Written by Bruce Moore
Stopping Phone Spam from Rachel from Cardholder Services
Over 80% of the calls on our home phone are spam marketing calls of one type or another. Our home phone line gets frequent calls from "Rachel at Cardholder Services" with a social engineering scam to get your credit card number. Sometimes her name is Carmen or some other name. Of late, the caller ID information has been spoofed--a felony punishable by a $10,000 fine for each violation. Blocking unknown callers doesn't do any good, because most of the scam calls have caller ID information--though the caller ID is bogus. In some cases the scammers calling our number have spoofed United Parcel Service; in others they spoofed a number a few digits off from one used by a residence a few blocks away, and occasionally they spoofed our own number. It had gotten to the point that my wife and I were starting to use our cell phones to call one another at home, so I started looking at solutions and ended up with a three-layer system that now catches most of the spam calls. The first layer is call blocking at our telco, the second is a service called NoMoRobo, and the third is a low-power computer running a program called Network Caller ID. The article that follows talks about how to implement this and covers other topics on telephone spam:
- Turning on Call Blocking at Telco
- Sign up with NoMoRoBo
- Political Robo Calls
- Using Call Tracing to Prepare to Turn over to Law Enforcement
- File a Complaint with FCC
- Alternatives for Blacklist Devices
- Installing Network Caller ID Package and Enabling Blacklist Hang-up
- Results from Installing NCID
Turning on Call Blocking at Telco
The first step was logging on to our telco and searching through the features on our account to find call blocking; our telco allows us to block up to ten specific numbers, or block anonymous calls, but not both. Since most of the calls were coming from spoofed numbers, I checked the box for blocking a specific list and started filling in the numbers off of our caller ID. This step cut the volume of scam calls from 10 per day to 2 per day. The remaining scam calls were mostly ones that were anonymous and did not spoof the caller ID.
For this call blocking, it is probably best to enter at least your own phone number, since spoofing of your own number is not likely to be widespread enough to end up in one of the blacklists described below.
In the Google Voice interface, you can block individual phone numbers that have called your Google Voice number. This is an important step, as many of the "Google Listing" spam calls appear to use Google Voice directly rather than arriving through the external phone numbers that Google Voice forwards to.
Change to a Telephone Provider that Supports NoMoRoBo and Other Call Blocking Features
If your telephone provider does not support NoMoRoBo or provide any other call blocking features, consider switching to a provider that does. Strictly for cost reasons (we had been paying about $30/month), we switched to voip.ms, a voice over IP (VOIP) service that supports NoMoRoBo. Voip.ms also provides a lot of call blocking features that Verizon/Frontier did not offer; it is not as robust as the NCID solution that I describe later, but it allows 500 blocked numbers instead of the 10 or 20 that Verizon/Frontier allowed, and it can apply "regular expressions" to the caller ID line. The cost will be about $5/month for our typical use, with about $85 in initial costs for hardware and setup.
To make this work on all of the phones in our house, I installed an Obi202 box (about $70 of the $85 total cost) to connect the VOIP line to our home phone wiring. Setting this up requires some technical skills. You should be comfortable configuring IP addresses and opening ports on a router before you attempt this.
There are three caveats to going to a VOIP service:
- They don’t claim to provide telco-level reliability for 911 calls. You can set 911 up, but you should not go this route unless you have a backup approach for calling 911; a cell phone will do fine.
- Setting up your outbound caller ID takes some doing, and requires a one-time $10 charge.
- If you are on Frontier, when you port your number they will close all of the services on your account, including Internet and TV; the customer service people did not know this even two months into the transfer from Verizon to Frontier. Getting Internet working again will require several calls.
All said, the transfer to VOIP has worked well, and it appears to do a better job of spam call blocking. I think voip.ms transfers calls to NoMoRoBo faster than Verizon/Frontier did: the hang-up occurs midway through the first ring most of the time, and in some cases it may actually hang up before the first ring.
Sign up with NoMoRoBo
The second step was easy and fairly effective. Because the robo-dialer scam problem has gotten so bad, some businesses have started to help address the problem. Nomorobo is one such service. I’m not sure how they make their money at this point, but I suspect that they will start offering subscriptions or will offer the service through telcos at some point. In any case, my wife signed us up; it works similarly to the Network Caller ID (NCID) system described below, but it is much easier to set up. The service is currently limited to phone lines that can ring simultaneously in two places, primarily VOIP. The phone rings once, then Nomorobo looks at the caller ID and hangs up if the number is on their list.
In practice, Nomorobo has hung up on some calls from numbers that were not yet in my NCID log, and in other cases it hung up on phone calls that were legitimate; there is no way that I can find to white-list numbers. Fortunately, the NCID log is easy to use, so I could recognize the number and call it back.
Not all telcos support NoMoRoBo; in particular, Google Voice and MagicJack do not at this writing. See the NoMoRoBo Supported Carriers list on the sign-up screen to check for yours.
Political Robo Calls
The legislation that requires legitimate telemarketers to honor the Do Not Call list exempts charities and political robocalls. Because NoMoRoBo is an opt-in service, it has the option to block political robo calls, but you must check off an item in your profile to do so. In practice, it isn’t all that effective at blocking political robo calls, and may be the subject of some manipulation. In a recent primary, NoMoRoBo did not stop many (if any) of the robo calls from PACs on one side of the contest, but it did stop the second and subsequent in-person calls from a resident of my town who was a volunteer for the other candidate. As I maintained my NCID blacklist, it was reasonably effective at blocking the PAC robo calls.
File a Complaint with FCC
The third step initially felt like a waste of time, but has turned out to be quite important: you should file a complaint on the FCC web site. This may not do anything in the short run, but it will help in the longer term; the FTC has actually sponsored a contest for solutions for dealing with “Rachel Robocalls” and now publishes a list of phone numbers associated with complaints. The list is updated monthly and is very useful; since I installed it on the Network Caller ID server described below, it has caught almost 100% of spam calls. There are Android apps that appear to use this list as well.
Using Call Tracing to Prepare to Turn over to Law Enforcement
The next step took a little bit more research, and may cost me some money. After a scam call that used a spoofed caller ID (a felony), I pressed *57 which initiates a telephone company trace that is kept for 90 days and which the telco can turn over to law enforcement. Some sites indicate that telcos charge for this while our telco web site is silent about any extra charges for traces. It will take a while to find out whether or not this does anything, and whether or not there is enough information to pass on to law enforcement.
Alternatives for Blacklisting Devices
There are both commercial devices and open source software that will allow you to blacklist specific phone numbers or, in some cases, patterns. The commercial devices are easier to set up but don’t necessarily allow you to specify patterns, while the open source solution (Network Caller ID) is more flexible but also more complex to set up. The next sections describe some commercial devices and the open source solution that I am using successfully.
Commercial Devices
I have not used these devices, but they have been recommended in other reviews, and the features described are ones that I have found to be useful in my own NCID setup.
- Digitone Call Blocker Plus. This is a central device; you may have to go to it to add a number.
- Panasonic Home Monitoring telephones with Call Blocking. These are generally limited to 250 numbers; my block list is rapidly approaching that length. Phone systems have the advantage of allowing you to add block numbers from any handset.
Open Source Devices
Open source call blocking software is available. It can be configured to run on a Raspberry Pi, an old laptop (especially one with a modem) or any computer that is left running. The adventurous might even be able to get it running on an old router or a Western Digital NAS device.
- Network Caller ID (NCID). I use this very successfully; instructions for configuring it on a Raspberry Pi are given below.
- Telemarketing (Junk) Call Blocker. I have not used this.
- Various Android applications
Network Caller ID (NCID)
Network Caller ID (NCID) is a great open source package for setting up sophisticated call blocking. The remainder of this article is dedicated to setting up NCID on a Raspberry Pi low-power server.
Installing Network Caller ID Package and Enabling Blacklist Hang-up
Network Caller ID (NCID) is much more technical than all of the previous solutions, but is by far the most flexible. For users comfortable with the command line this is pretty easy, but it will be difficult for users that don't regularly use command-line utilities. NCID allows you to hook up a modem to a phone line and then automatically hang up calls that match rules in a blacklist file. This program will address anonymous calls and repeated spoofed calls simultaneously--something I can't do through the telco web site. Call blocking at my telco won't allow me to block numbers that have a leading 1, as in 1-xxx-xxx-xxxx, where the caller ID spoofers put a 1 in front of the area code. NCID will allow me to blacklist these numbers.
NCID is available for Linux, Mac and Windows. To find installation instructions for your particular platform and/or distribution, search on ncid, ncid-client, ncid-mythtv, and ncid-pop. For the most recent versions of Ubuntu, this may be part of the standard repository. There is a binary available on the NCID web site for Cygwin, so it should be possible to run NCID on an old Windows laptop if you don't want to load a Linux distribution, though I have not tried this.
NCID has an app for Android that allows you to send caller ID and SMS text information from your cell phone to NCID and then to your computer display, allowing you to know when your cell phone rings when it isn't right next to your desk. I haven't configured this feature.
NCID won't completely block the call, but will automatically hang up after the first ring if the call matches one of the rules in your blacklist file.
Installing NCID on a Raspberry Pi Server
For my NCID installation, I used a TrendNet TFM-561U modem, which was about $25 at a local computer store. I attached it to a Raspberry Pi low-power server that I use for a few utility functions that aren't computationally intensive. NCID wasn't available in the standard Raspbian repositories, but I was able to get useful instructions from the NCID web site; these have subsequently been deleted.
The first step is to download the .deb packages for your architecture from Sourceforge and then install them with dpkg:

dpkg -i ncid_1.8-1_armhf.deb
dpkg -i ncid_gateway_1.8-1_armhf.deb
apt-get install -f

Originally I had to use the gdebi package to install NCID, but I have since installed it successfully with dpkg. Gdebi attempts to do more resolution of package dependencies than dpkg, and has a reputation for doing a less brute-force job than apt.
To use NCID, you have to make a couple of changes in /etc/ncid/ncidd.conf to turn on blacklist call hangup and to configure your modem:
- Uncomment the line for set ttyport = /dev/ttyACM0 to enable the TrendNet modem. Which line you uncomment or change will depend upon your platform, distribution and modem type.
- Uncomment the line for set hangup = 1 to cause NCID to hang up on calls that match the blacklist.
- I did not need to modify the init string for the modem, but one article reader had to add AT+VCID=1 to the modem initialization.
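Taken together, the edits look roughly like this excerpt of /etc/ncid/ncidd.conf. The device path and init string shown here are what worked for my modem, not universal defaults; yours may differ:

```
# /etc/ncid/ncidd.conf (excerpt)
# Serial device for the TrendNet USB modem:
set ttyport = /dev/ttyACM0
# Hang up calls that match an entry in ncidd.blacklist:
set hangup = 1
# Some modems need caller ID enabled explicitly in the init string,
# e.g. set initstr = "ATZ AT+VCID=1"
```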
Configuring the NCID Blacklist
To start hanging up on anonymous and blacklisted numbers, I made the following additions to the ncidd.blacklist file:

^UNKNOWN
^unknown
^Unknown
^No Caller ID
^OUT-OF-AREA
^UNAVAILABLE
^CONSUMER SVCS
^DMCR
^RING
^000

"OUT-OF-AREA" has blocked some legitimate calls from Google Voice numbers. I had to add those numbers to the whitelist file (ncidd.whitelist) to let them through.
Make sure to include numbers both with and without the preceding 1 for long distance.
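For example, a single spam number would get two ncidd.blacklist entries, one with and one without the long-distance prefix (the number below is hypothetical):

```
^2145550123
^12145550123
```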
If you have problems with NCID hanging up on ALL calls, look in your ncidd.blacklist for a stray entry that matches everything, as this appears to cause it to hang up on all calls.
You should download and format the FTC complaint list as described in the related article Download and Format the FTC Robocall Complaint List for NCID. This list has caught almost 100% of robocalls since I installed it on my NCID server in early November, 2015.
Installing and Configuring NCID Clients
Although we now have caller ID on all of our phones, I wanted to have it display on my computer terminal. For this I downloaded and installed the NCIDPop package for Mac OS X. The first time it came up, it brought up a configuration dialog where I had to put in the IP address of the Raspberry Pi server that had the modem attached to it. NCIDPop also has a feature where it can use the Mac's say text-to-speech command to read the phone number to you. In some cases this is annoying, but in others it is useful.
The NCID Android application can optionally transfer calls on your Android phone to the NCID server. This can be useful in keeping track of robo callers and adding them to the black list. There are a number of other features that I'm not using at this point.
It was nice to be able to put caller ID on all computers using only one modem.
Results from Installing NCID
After installing Network Caller ID, it took me a few days of adding rules to cover the various marketing robo dialers. After five months, I probably spend about two minutes per day adding new spam phone numbers to the ncidd.blacklist file.
At this point NCID is automatically hanging up on about 50% of all robo dialer calls and is allowing almost all legitimate calls through. NoMoRobo catches a few that NCID does not, and both miss about 10-20% of the spam calls. NCID hung up on two legitimate calls, and I can't figure out what rule caused the hangup. I have programmed it to hang up on all calls that come in without caller information, including "OUT-OF-AREA"; this is a problem for Google Voice and other voice over IP (VOIP) telephone numbers and has blocked a small number of legitimate calls. You can avoid this for specific numbers by putting the number in the ncidd.whitelist file.
Results from NCID and NoMoRoBo
As calls come in during the month, I add all spam calls that got past NCID into the NCID blacklist. The number of valid calls can be calculated by joining the NCID blacklist file with the NCID call log on the phone number, as shown in Figure 1. The average numbers are annoying:
- About 0.58 calls per day are valid.
- About 0.14 spam calls per day are stopped by NCID based upon the local blacklist phone number (after November 1, 2015).
- About 0.21 spam calls per day are stopped by NCID based upon the FTC complaint list (after November 1, 2015).
- About 1.7 calls per day are spam calls that are either blocked by NoMoRoBo or get through to ring multiple times.
- After February 2015, 43.5% of calls were valid, while 56.5% were spam calls.
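The blacklist-to-call-log join above can be sketched in Python. This is a minimal sketch assuming the call log has already been reduced to one caller ID number per call and the blacklist to one anchored pattern per line; real NCID log records carry several *-delimited fields and would need extra parsing:

```python
import re

def classify_calls(call_numbers, blacklist_patterns):
    """Split a list of caller ID numbers into valid and spam,
    treating each blacklist entry as an anchored regular expression."""
    patterns = [re.compile(p) for p in blacklist_patterns]
    valid, spam = [], []
    for number in call_numbers:
        if any(p.match(number) for p in patterns):
            spam.append(number)
        else:
            valid.append(number)
    return valid, spam

calls = ["2145550123", "12145550123", "8175550199"]   # hypothetical numbers
blacklist = ["^2145550123", "^12145550123"]
valid, spam = classify_calls(calls, blacklist)
print(len(valid), len(spam))  # → 1 2
```

Dividing each count by the number of days in the logging period gives the per-day averages reported above.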
It is important to note that many of the calls in Figure 1 labeled “NoMoRoBo or Pass-through Spam” are stopped at one ring by NoMoRobo. Unfortunately, I don’t have a way to identify these; I may eventually look at the NCID code to see if there is a way to identify calls that only ring once and use a different code for them in the call log.
In early 2016, the phone line was ported from Verizon to a VOIP provider. This broke the NCID installation, but also made NoMoRoBo more effective; the VOIP ring was delayed a few tenths of a second, allowing NoMoRoBo to block the call before the VOIP line rang and NCID could act. The increased effectiveness of NoMoRoBo was a disincentive to fix NCID, and thus much of the data for 2016 is missing.
For additional information, you may be interested in other articles on NCID and stopping phone spam:
- Current Month Phone Spam Call Blocking Effectiveness shows the effectiveness of the various call blocking methods on our residential land line.
- Stopping Rachel from Cardholder Services covers multiple ways to address phone spam, including setting up an NCID server.
- Download and Format the FTC Robocall Complaint List for NCID shows how to download and format the FTC complaint list to give you a list of spammers before they call you.
- Using NCID on Two Phone Lines shows how to add a second modem to your NCID configuration.
- Written by Bruce Moore
Preparing for a Fair Lending Examination Statistical Analysis
At the Independent Bankers Association of Texas (IBAT) Lending Compliance Summit in April, 2014 and at the Southwest Graduate School of Banking (SWGSB) Alumni program in May, there was much discussion about the regulatory focus on Fair Lending in general and the statistical analysis that is being done to identify disparate treatment. The article that follows is the first in a series of three that discuss how banks can prepare for an examination and minimize the likelihood of problems, how a bank might proceed with an in-house study to identify and fix any disparate treatment problems, and finally, some statistical examples that help explain several questions that came up at the IBAT and SWGSB gatherings. For additional reading, you may wish to look at How a Bank Can Get in Trouble with Fair Lending Statistical Analysis and Doing Your Own Fair Lending Statistical Analysis.
The discussion of preparing for a disparate treatment statistical analysis is divided into the following sections:
- Fix Data Quality Problems
- Include Calculated Items from Credit Report
- Perform Analysis of Indirect Loans by Dealer/Originator
- Estimate Negative Equity for Indirect Loans
Fix Data Quality Problems
When I worked in IBM’s Global Business Intelligence Systems datamining group, we had a saying:
There are customers that know they have a data quality problem, and there are customers that don’t know that they have a data quality problem.
A dataset can be pristine and balance to the penny from an accounting perspective, and yet be a nightmare from the viewpoint of performing any statistical analysis. If a regulatory statistical analyst receives a poorly prepared dataset, the analyst will spend so much time cleaning up data that little time will be available to distinguish between unusual datapoints that can be discarded as mistakes and others that contain important information and must be included.
The FDIC Compliance Manual -- January 2014 describes risk factors for discrimination to be used in planning an examination on page IV-1.6:
C2. Prohibited basis monitoring information required by applicable laws and regulations is nonexistent or incomplete.
C3. Data and/or recordkeeping problems compromised reliability of previous examination reviews.
Don’t send a poorly prepared dataset for statistical analysis. As a banker, you are much better off if the analyst has time to look for data elements that explain racial/ethnic/gender patterns in your dataset. If the analyst spends hours cleaning up a poorly prepared dataset, expect to have examination problems.
All of these data quality analysis steps can be performed in Excel, though the corrections should be done on the source system so that you don’t have to repeat the clean-up process every year. Most IT personnel would probably choose to use a programming or scripting language that allows regular expressions and other features that make data manipulation easier.
Catch up on Returned Mail Address Clean-up
All returned mail identifies an address problem--either an old address, an incorrect one, or one that is entered so badly that even the U.S. Post Office can’t figure out what it is--and I am amazed at what the Post Office can deliver correctly. Before you do a data pull for any type of statistical analysis, make absolutely sure that you are caught up on fixing returned mail.
The statement mailing firm that you use probably does address standardization as part of the service that they provide, but the standardized addresses probably don't make it back to your core system. Investigate ways to get the standardized addresses into your core system.
If you don’t use address standardization software to identify and correct spelling, format and abbreviation problems in addresses, at least do a pull and get a count of addresses by city and state. Sort the list to find the cities with only one account--these are probably misspellings. If you don’t have address standardization software, you will be amazed at how many ways people can spell "Dallas" and "Houston." The Post Office correctly delivers a lot of mail that is badly misspelled. Make sure that all of the state abbreviations are valid.
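The city-count check can be sketched in Python; the 'city' and 'state' dictionary keys are hypothetical field names to adapt to your extract:

```python
from collections import Counter

def suspect_cities(addresses):
    """Count accounts per (city, state) pair and return the pairs that
    appear only once -- likely misspellings worth a manual review."""
    counts = Counter((a["city"].strip().upper(), a["state"].strip().upper())
                     for a in addresses)
    return sorted(pair for pair, n in counts.items() if n == 1)

addresses = [
    {"city": "Dallas", "state": "TX"},
    {"city": "Dallas", "state": "TX"},
    {"city": "Dalas", "state": "TX"},   # misspelling shows up only once
]
print(suspect_cities(addresses))  # → [('DALAS', 'TX')]
```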
If you don't have standardization software, you can use a geocoder to attempt to find the latitude and longitude of the address; if the geocoder can't figure out the latitude and longitude, it is either a Post Office Box, a Military address, or an invalid address. The next article in this series, Doing Your Own Fair Lending Statistical Analysis, has a significant discussion about geocoders and geocoding.
Verify Date Formats and Content
Most core systems do a very good job of preventing bogus dates from being entered, but you should check to make sure--especially for ancillary systems and datasets provided from third party vendors. At a minimum, check the following:
- Verify that all dates are valid dates. For example, 2/30/2014 is clearly an invalid date, but could get into a poorly designed software system, or be part of an incorrectly generated data extract from a third party system.
- Verify that all dates are in the right order. For example, the loan payoff date should always be after the loan opening date. There are a variety of other date relationships that should be maintained, but which sometimes aren’t.
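Both checks can be sketched in Python; the field names 'open_date' and 'payoff_date' and the MM/DD/YYYY format are assumptions to adapt to your extract layout:

```python
from datetime import datetime

def check_loan_dates(record):
    """Return a list of problems with a loan record's dates:
    invalid dates, and payoff dates that precede the open date."""
    problems = []
    parsed = {}
    for field in ("open_date", "payoff_date"):
        value = record.get(field)
        if value is None:
            continue
        try:
            parsed[field] = datetime.strptime(value, "%m/%d/%Y").date()
        except ValueError:
            problems.append(f"{field} is not a valid date: {value!r}")
    if "open_date" in parsed and "payoff_date" in parsed:
        if parsed["payoff_date"] < parsed["open_date"]:
            problems.append("payoff_date precedes open_date")
    return problems

# 2/30/2014 is flagged as invalid; the ordering check is skipped for it.
print(check_loan_dates({"open_date": "2/30/2014", "payoff_date": "1/15/2014"}))
```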
Include and Standardize Indirect Dealer/Originator Names
If you do indirect lending, make sure to include the name of the dealer or originator of the indirect loans, and that the loan type and originator are coded correctly and consistently.
Verify Interest Rates Against Rate Sheet
Take an extract of your historical rate sheets, merge the rate sheets with your loan data by time of loan origination, calculate the difference between each loan's rate and the rate sheet in effect for that period, and then rank by the absolute value of the difference. Look at the extreme values--these are probably mistakes. Investigate the reason for the largest differences and add a code or comment to explain why these particular loans have unusual deviations from the rate sheet. If they are mistakes, work with the borrower to correct the loan.
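The merge-and-rank step can be sketched in plain Python; the tuple layouts for loans and rate sheets below are hypothetical stand-ins for whatever your extract actually contains:

```python
from bisect import bisect_right

def rate_sheet_deviations(loans, rate_sheets):
    """For each loan, look up the rate sheet in effect at origination
    and rank loans by the absolute deviation from that sheet rate.
    loans: (origination_date, rate) tuples; rate_sheets:
    (effective_date, sheet_rate) tuples, dates as sortable ISO strings."""
    sheets = sorted(rate_sheets)
    dates = [d for d, _ in sheets]
    ranked = []
    for orig_date, rate in loans:
        i = bisect_right(dates, orig_date) - 1  # latest sheet on or before origination
        if i < 0:
            continue                             # loan predates all rate sheets
        sheet_rate = sheets[i][1]
        ranked.append((abs(rate - sheet_rate), orig_date, rate, sheet_rate))
    return sorted(ranked, reverse=True)          # biggest deviations first

loans = [("2014-03-10", 6.25), ("2014-05-02", 4.50)]
sheets = [("2014-01-01", 4.50), ("2014-04-01", 4.25)]
print(rate_sheet_deviations(loans, sheets)[0])   # the most suspicious loan
```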
Code Collection and Other Loan Modifications Correctly
Make sure that all loan modifications and rework of loans that were messed up somewhere along the line are coded in a way that they can be easily identified and understood. It should be easy for an analyst to figure out that a goofed up loan entry that was corrected and re-issued under another number can be legitimately excluded as an outlier.
Handle Significant Digits Properly When Exporting--Don’t Truncate or Round
In the core systems, numbers can be stored in a variety of ways--some quantities are stored as floating point, some as decimal, some as integers, and occasionally as characters. Each of these data types works differently for rounding and in some cases may just truncate everything to the right of the decimal point. If you extract using a data type that truncates or take a number with 5 decimal places and round it to 2 decimal places, you can introduce some unusual patterns in your dataset.
Always export in the data type that is used to store an element, and always export the number of digits that are stored without rounding wherever possible.
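As a toy illustration of the artifact truncation introduces, compare truncating and rounding a rate stored with five decimal places:

```python
rate = 4.68749  # a rate stored with five decimal places

truncated = int(rate * 100) / 100  # chops to 4.68, always biased downward
rounded = round(rate, 2)           # 4.69, unbiased on average

print(truncated, rounded)  # → 4.68 4.69
```

Over thousands of loans, truncation systematically understates every rate, which is exactly the kind of unusual pattern an analyst will notice.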
Include Calculated Items from Credit Report
Perhaps the biggest problem that you may encounter in a Fair Lending statistical analysis will be loan decisions that are based upon information that is present on a text-based credit report. If you calculate loan to value, debt to equity, or medical bill charge-offs to total charge-offs from a credit report, but don’t include that in the extract, you will almost certainly have problems during an examination. If these ratios have a strong statistical relationship with race/ethnicity/gender (likely, since income has a strong relationship), race/ethnicity/gender will show up as statistically significant, and you will have to spend a lot of time and money providing a corrected extract, plus the aggravation of dealing with examiners over Fair Lending disparate treatment issues.
If you include the additional credit worthiness-related variables that you used in the underwriting process, race/ethnicity/gender will probably not show up as statistically significant, and your Fair Lending examination will probably go as smoothly as Fair Lending examinations can go.
If your origination system does not calculate all of the ratios that you use, pressure the vendor to add them so that they are easy to extract. This isn’t so much to make Fair Lending examinations easier as it is to make fraud and abuse analysis easier for you to do. You should use the Fair Lending dataset for a fraud and abuse analysis; you will probably quickly recover the cost of preparing the dataset and will start using your fraud and abuse dataset as the one you submit for Fair Lending analysis.
Perform Analysis of Indirect Loans by Dealer/Originator
If you have an indirect auto loan program, this is an area where race/ethnic/gender discrimination may be occurring without your knowledge or control. It is also an area where there is significant opportunity for fraud and abuse by an auto dealer, or specific employees at an auto dealer. The analysis that you do for indirect lending should be at least quarterly, as salespeople move from one dealership to another frequently--a dealer that has demonstrated exemplary performance for years can go south quickly when a new salesperson comes onto the floor.
The discussion that follows is really oriented toward dealer-level fraud and abuse problems rather than Fair Lending, but if a dealer or an employee at a dealer is willing to commit fraud or abuse, discrimination based upon race/ethnicity/gender would not be a far stretch and vice versa. To get to this point, you will have put in a fair amount of work; you should reap the benefit of that labor, and a simple fraud and abuse analysis is the way to do it. For regulatory purposes, this analysis may or may not constitute a review of Fair Lending practices that would require you to correct any problems found; that is a question for your attorney.
Look at Fraud and Abuse Metrics
For a simple fraud and abuse analysis that can be done in Excel, calculate and rank dealers by the following quantities:
- First payment defaults
- Defaults immediately after end of recourse period
- Defaults and delinquency by age
For a dealer that ranks at the top of each list, investigate individual loans that have defaulted or are delinquent. It is likely that this work will be financially rewarding to the bank.
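One of these rankings, first payment defaults, can be sketched in Python; the per-loan fields 'dealer' and 'first_payment_default' are hypothetical names for whatever your extract contains:

```python
from collections import defaultdict

def rank_dealers_by_fpd(loans):
    """Rank dealers by first-payment-default rate, worst first.
    Each loan is a dict with a 'dealer' name and a boolean
    'first_payment_default' flag."""
    totals = defaultdict(int)
    fpds = defaultdict(int)
    for loan in loans:
        totals[loan["dealer"]] += 1
        if loan["first_payment_default"]:
            fpds[loan["dealer"]] += 1
    rates = [(fpds[d] / totals[d], d) for d in totals]
    return sorted(rates, reverse=True)

loans = [
    {"dealer": "A", "first_payment_default": True},
    {"dealer": "A", "first_payment_default": False},
    {"dealer": "B", "first_payment_default": False},
]
print(rank_dealers_by_fpd(loans))  # → [(0.5, 'A'), (0.0, 'B')]
```

The other two metrics work the same way: substitute a flag for default-after-recourse, or bucket defaults and delinquency by loan age.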
Rank by Dealer Participation Fee
Rank the loans by dealer participation for each dealer, and for all dealers. For the highest participations, are there any patterns? A high dealer participation could be an indicator for negative equity rolled into a deal for benign reasons, it could be negative equity rolled into a deal in anticipation of bankruptcy, it could be good negotiating on the part of the dealer, or it could be the result of discrimination based upon race/ethnicity/gender.
Estimate Negative Equity for Indirect Loans
If you have an indirect lending program, negative equity rolled into a deal is a strong predictor of a lot of interesting behavior. Estimating negative equity is painful if not impossible, as vehicles rarely sell for the Manufacturer’s Suggested Retail Price (MSRP) and there really isn’t a good way to capture the "value" of the vehicle. If you do capture MSRP and Kelley Blue Book (KBB) or a similar metric, it is worth calculating the difference between the purchase price and the MSRP/KBB value as a proxy for negative equity.
Try to figure out a way to estimate the negative equity rolled into a loan. The dealer knows this exactly, but most lending systems don’t really have a way to record it. If high dealer participations are due to negative equity, you have a credit risk problem to monitor; if high dealer participations are not due to negative equity rolled into a deal, you absolutely have a customer satisfaction problem (the painfully high loan rate that gives the dealer room to roll in negative equity or overcharge has your name on it each month--not the auto dealer’s) and you may have a Fair Lending problem.
Although this article is about preparing for a Fair Lending examination statistical analysis, there is little in the steps to this point that is directly related to Fair Lending--most of this preparation is related to general data quality and to simple fraud and abuse analysis. Everything in this article can be done using Excel, though there are other tools that your IS staff may have that are better suited to the task.
- Written by Bruce Moore