Bankopedia

R Squared (R²) — Meaning, Definition & Full Explanation

R squared (R²), also called the coefficient of determination, is a statistical measure that shows what proportion of the variation in a dependent variable is explained by one or more independent variables in a regression model. For an ordinary least squares model with an intercept it ranges from 0 to 1: a value closer to 1 indicates that the model explains most of the variability in the data, while a value closer to 0 suggests the model has little explanatory power. R squared is the ratio of explained variation to total variation, or equivalently R² = 1 − (Unexplained Variation / Total Variation).
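
Written out in symbols, the definition can be sketched as follows. The notation is an assumption added here for illustration: y_i is an observed value, \hat{y}_i is the model's prediction for it, \bar{y} is the mean of the observed values, and n is the number of observations. The first equality assumes an ordinary least squares fit with an intercept.

```latex
R^2 \;=\; \frac{\text{Explained Variation}}{\text{Total Variation}}
    \;=\; 1 - \frac{\text{Unexplained Variation}}{\text{Total Variation}}
    \;=\; 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
```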

What is R Squared?

R squared quantifies the goodness of fit of a regression model—how well the regression line or plane fits the observed data points. It tells analysts and investors how much of the movement in the dependent variable (such as a bank's profitability or a stock's price) can be attributed to changes in the independent variables (such as interest rates, inflation, or market indices).

An R² of 0.85, for example, means that 85% of the variation in the dependent variable is explained by the independent variables in the model, while 15% remains unexplained. This unexplained portion represents errors, randomness, or the influence of omitted variables. R squared is widely used in credit risk modeling, asset pricing, regression analysis of financial performance, and econometric forecasting. It serves as a quick diagnostic tool to assess whether a model is worth using for predictions or policy decisions. However, R² alone should never be the sole criterion for model selection; it must be paired with other statistical tests, domain knowledge, and practical judgment.

How R Squared Works

R squared is calculated through a systematic process:

  1. Collect data: Gather observations of the dependent variable (outcome) and independent variable(s) (predictors).

  2. Fit a regression model: Use regression analysis (ordinary least squares or other methods) to find the line (or plane, in multivariate models) of best fit through the data.

  3. Calculate predicted values: For each observation, the regression model generates a predicted value based on the independent variables.

  4. Compute unexplained variation: Subtract each predicted value from the actual observed value. Square these differences (errors) and sum them. This sum is called the Sum of Squared Residuals (SSR) or unexplained variation.

  5. Compute total variation: Subtract the mean (average) of all actual values from each individual actual value. Square these differences and sum them. This sum is called the Total Sum of Squares (SST).

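  6. Calculate explained variation: Subtract unexplained variation from total variation (SST − SSR). This is the explained variation, also called the regression sum of squares (SSE). Some textbooks swap these labels, using SSR for the regression sum of squares and SSE for the error sum of squares; this entry uses SSR for the residuals throughout.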

  7. Divide to get R²: Divide explained variation by total variation. Alternatively, use the formula R² = 1 − (SSR / SST).
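
The seven steps can be traced in a few lines of code. This is a minimal sketch in plain Python, assuming a single predictor and an ordinary least squares fit with an intercept; the data values and variable names are invented for illustration and do not come from any real portfolio.

```python
# Step 1: hypothetical observations (x = predictor, y = outcome).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
x_mean = sum(x) / n
y_mean = sum(y) / n

# Step 2: fit y = intercept + slope * x by ordinary least squares.
sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
sxx = sum((xi - x_mean) ** 2 for xi in x)
slope = sxy / sxx
intercept = y_mean - slope * x_mean

# Step 3: predicted value for each observation.
predicted = [intercept + slope * xi for xi in x]

# Step 4: unexplained variation (sum of squared residuals, SSR).
ssr = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))

# Step 5: total variation (total sum of squares, SST).
sst = sum((yi - y_mean) ** 2 for yi in y)

# Step 6: explained variation (SSE in this entry's notation).
sse = sst - ssr

# Step 7: both forms of the formula give the same R².
r_squared = sse / sst
r_squared_alt = 1 - ssr / sst

print(round(r_squared, 4))      # 0.6
print(round(r_squared_alt, 4))  # 0.6
```

With these five points the fitted line is y = 2.2 + 0.6x, SSR is 2.4 and SST is 6, so the model explains 60% of the variation.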

R squared values are usually read as percentages: an R² of 0.72 means the model explains 72% of the variance. In an ordinary least squares fit, adding another independent variable never lowers R² and almost always raises it, even if the variable is irrelevant; adjusted R² (which penalizes extra variables) is therefore often preferred when comparing models.
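
The penalty applied by adjusted R² can be illustrated with the standard formula, adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1), where n is the number of observations and k the number of predictors. The sketch below is only an illustration: the function name, the sample size of 30 and the R² values for the two models are hypothetical.

```python
def adjusted_r_squared(r_squared: float, n_obs: int, n_predictors: int) -> float:
    """Adjusted R² for an OLS model with an intercept (standard formula)."""
    if n_obs - n_predictors - 1 <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Hypothetical comparison: a fourth, irrelevant predictor nudges R² upward
# from 0.720 to 0.721 on the same 30 observations.
adj_three = adjusted_r_squared(0.720, n_obs=30, n_predictors=3)
adj_four = adjusted_r_squared(0.721, n_obs=30, n_predictors=4)

print(round(adj_three, 4))   # 0.6877
print(round(adj_four, 4))    # 0.6764
print(adj_four < adj_three)  # True: the extra variable did not earn its place
```

Here plain R² rises slightly while adjusted R² falls, which is the signal that the added variable is not improving the model.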

R Squared in Indian Banking

In Indian banking and finance, R squared is a core concept taught in the CAIIB (Certified Associate of Indian Institute of Bankers) syllabus, particularly in modules on credit risk modeling, advanced financial analysis, and econometric forecasting. The RBI and banking regulators rely on R² and related statistical measures when evaluating the predictive power of internal risk models used by banks for capital adequacy and stress testing.

Banks in India use R² extensively in credit scoring models to assess the explanatory power of variables such as borrower income, debt-to-income ratio, loan tenure, and collateral value in predicting default probability. Insurance companies regulated by IRDAI also employ R² in mortality modeling and premium-setting algorithms. SEBI-regulated investment advisors and fund managers use R² to gauge how closely fund returns track benchmark indices; a mutual fund scheme's R² against its benchmark helps investors understand how much of the variation in its returns is driven by index movement and how much by other factors, such as active management decisions.

The National Stock Exchange (NSE) and Bombay Stock Exchange (BSE) research departments publish R² metrics for listed companies' earnings models. Under RBI's regulatory framework for stress testing and internal model validation, banks must demonstrate the statistical robustness of models used to calculate capital requirements under Basel III norms; R² is a key diagnostic statistic in that validation process. JAIIB candidates studying financial mathematics and statistics will encounter R² in the context of regression analysis for loan portfolio analysis and economic forecasting.

Practical Example

Ashok Kumar, a credit analyst at ICICI Bank in Mumbai, is building a logistic regression model to predict the probability of default among salaried employees applying for personal loans. He collects data from 5,000 approved loans over the past five years, recording the actual default outcome (0 = no default, 1 = default), along with independent variables: age, monthly income (in ₹), loan-to-income ratio, credit score, and number of existing credit products.

After fitting the model, Ashok obtains a goodness-of-fit value of 0.68. Because the outcome is binary and the model is logistic, this is a pseudo-R² (such as McFadden's) rather than the ordinary least squares R², so it is read only loosely as the share of variation in default outcomes across the 5,000 borrowers that the five variables account for. The remainder reflects factors outside the model, such as involuntary job loss, health emergencies, or data errors.

Ashok interprets this as moderate model fit: the model is useful for ranking borrowers by risk and screening applications, but it is not perfect. He decides to add two more variables—employment tenure and savings-to-loan ratio—and refits the model. The new R² rises to 0.74. Ashok then checks the adjusted R² (which penalizes added variables) to confirm the improvement is genuine, not just statistical overfitting. He also examines residual plots and performs robustness tests before presenting the model to ICICI's risk committee for approval to deploy in the loan origination system.

R Squared vs Correlation Coefficient (r)

| Aspect | R Squared (R²) | Correlation (r) |
| --- | --- | --- |
| Definition | Proportion of variance explained by the model | Strength and direction of linear relationship |
| Range | 0 to 1 (never negative) | −1 to +1 (can be negative) |
| Interpretation | 72% of variation explained (if R² = 0.72) | Strong positive association (if r = 0.85) |
| Use | Model goodness of fit and predictive power | Measuring association between two variables |

In simple linear regression (one independent variable), R squared equals the square of the correlation coefficient (R² = r²), but the two measure different concepts. Correlation tells you whether two variables move together and in which direction; R² tells you how much of the dependent variable's movement is captured by the independent variable(s) in your model. In portfolio analysis, the correlation coefficient helps you understand diversification benefits, while R² (in a regression context) shows how much of the variation in a fund's return is explained by its benchmark.
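
The identity R² = r² for a single predictor can be checked numerically. The sketch below uses plain Python and regresses a fund's returns on its benchmark's returns; the monthly return figures and the function names are hypothetical, chosen only to illustrate the calculation.

```python
import math

# Hypothetical monthly returns in percent (invented for illustration only).
benchmark = [1.2, -0.5, 2.1, 0.3, -1.4, 1.8]
fund = [1.0, -0.2, 2.5, 0.1, -1.0, 1.5]


def mean(values):
    return sum(values) / len(values)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def regression_r_squared(x, y):
    """R² of an OLS regression of y on x (one predictor plus an intercept)."""
    x_bar, y_bar = mean(x), mean(y)
    slope = (sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
             / sum((a - x_bar) ** 2 for a in x))
    intercept = y_bar - slope * x_bar
    ssr = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    sst = sum((b - y_bar) ** 2 for b in y)
    return 1 - ssr / sst


r = pearson_r(benchmark, fund)
r2 = regression_r_squared(benchmark, fund)

# The two routes agree up to floating-point rounding.
print(math.isclose(r ** 2, r2, rel_tol=1e-9))  # True
```

The sign of r is lost when it is squared, which is why R² alone cannot tell you whether the fund moves with or against its benchmark.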

Key Takeaways

  • R squared ranges from 0 to 1 and represents the percentage of variation in the dependent variable explained by the independent variables in a regression model.
  • R² is calculated as 1 minus the ratio of unexplained variation (sum of squared residuals) to total variation (total sum of squares).
  • An R² of 0.85 means 85% of the variance is explained; a higher value indicates a closer fit to the data, though what counts as high depends on the field and the purpose of the model.
  • Adding more independent variables never lowers R² in an ordinary least squares fit and almost always raises it; use adjusted R² when comparing models with different numbers of predictors to avoid rewarding overfitting.
  • In Indian banking, R² is essential for credit risk modeling, stress testing, and regulatory validation under RBI guidelines and Basel III capital requirements.
  • R squared is taught in CAIIB syllabi and is a standard diagnostic tool in bank internal models, mutual fund performance analysis, and econometric forecasting.
  • A high R² does not guarantee causation; correlation and statistical fit do not imply that one variable causes another.
  • Residual analysis and other diagnostic tests must always accompany R² interpretation to ensure the regression model is valid and suitable for decision-making.

Frequently Asked Questions

Q: What is a "good" R squared value?

A: There is no universal threshold; what counts as good depends on context. In credit risk modeling, banks often target R² ≥ 0.65–0.75. In economic forecasting, R² > 0.50 may be acceptable. In physics or engineering, 0.95+ is expected. Always pair R² with domain expertise and residual diagnostics.

Q: Does a high R squared mean the model is correct?

A: No. A high R² only means the model fits the historical data well. The model can