Multiple regression works very well under some conditions. We ask: Are these conditions true in most real property valuations?
This is actually a vast topic, so we can only reflect on some of the conditions, or assumptions, that must hold for a multiple regression to be valid.
Much (if not all) “advanced” appraiser education makes some sweeping or hidden assumptions about the underlying data. We start with the most common “statistical errors and mistakes.”
Regression math is a model completely separate from the inferential probability model. Yet current “advanced” appraiser education somehow assumes that the regression model always, magically, rests on a random sampling model. (Gotta show off those clever statistical tests!)
There is a major problem here.
First, the population to be studied is not the census tract, or the zip code, or the neighborhood, or the whole town. It is certainly not all the houses, or properties in an area. It is just the sales. Sales of similar properties. (Some have argued that we can pretend that all the houses in the neighborhood sold, and that ‘picking comps’ is somehow a random experiment.) Right!
Second, we seldom have the dozens of similar, competitive, recent sales needed for a sample large enough to estimate multiple regression coefficients reliably. Sold transactions are our ‘population’, not all the houses. My house is not for sale. Neither is the apple in my refrigerator. Neither is part of the supply. I’m gonna eat the apple, and live in my house.
Third, and laughably simple: given today’s instant, complete (or substantially complete) data and computing power . . . there is absolutely no reason not to analyze the whole data set. No reason. Using the full set of similar competitive sales provides better accuracy, reduced variation, and clearer communication of the market situation.
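The "reduced variation" point can be put into numbers with a minimal Python sketch. It uses simulated data only; nothing comes from a real market. The sketch estimates a price-per-square-foot rate from all 200 invented sales, then re-estimates it from many random 30-sale subsets of the same sales. The data-generating rule ($140 per sq ft plus noise), the seed, and the helper name `slope` are all assumptions made up for this illustration.

```python
import random
import statistics


def slope(x, y):
    """Least-squares slope of y on x for a single predictor."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


rng = random.Random(7)  # fixed seed so the run is repeatable

# 200 simulated sales (invented data; the assumed "true" rate is $140 per sq ft).
sqft = [rng.uniform(1000, 3000) for _ in range(200)]
price = [60_000 + 140 * s + rng.gauss(0, 25_000) for s in sqft]

full_rate = slope(sqft, price)

# Re-estimate the same rate from 500 random 30-sale subsets of those sales.
subset_rates = []
for _ in range(500):
    idx = rng.sample(range(200), 30)
    subset_rates.append(slope([sqft[i] for i in idx], [price[i] for i in idx]))

print(f"All 200 sales:   ${full_rate:,.1f} per sq ft")
print(f"30-sale subsets: ${min(subset_rates):,.1f} to ${max(subset_rates):,.1f} per sq ft "
      f"(std dev {statistics.stdev(subset_rates):,.1f})")
```

Each subset gives a different rate. The spread of those rates is the extra variation you accept when you analyze only part of the available sales.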
Multiple regression coefficients provide a marginal result. This means that at a given value of one predictor (let’s say the subject’s square feet), the other predictors are not held at a fixed point; they also vary in their own different ways. Unfortunately, this does not match appraiser thinking or the “appraiser adjustment tables” shown as “reasonable” adjustment amounts. In marginal models, some coefficients, at their margin, may even be negative and differ from ‘rule of thumb’ appraiser adjustments. (This difference is due to correlation, or “multicollinearity,” between predictor variables such as living area, bedrooms, site size, baths, . . .)
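A small, made-up example shows how correlated predictors can change a coefficient's size and even its sign. The Python sketch below uses only the standard library; `solve` and `ols` are hypothetical helper names written for this illustration. It fits bedrooms alone, then bedrooms together with living area, on six invented sales. The prices were generated from an assumed rule: $50,000 plus $150 per sq ft minus $5,000 per bedroom.

```python
def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def ols(rows, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    xtx = [[sum(xi[i] * xi[j] for xi in X) for j in range(k)] for i in range(k)]
    xty = [sum(xi[i] * yi for xi, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical six-sale data set, invented for illustration only.
sqft = [1200, 1500, 1600, 2000, 2200, 2600]
beds = [2, 3, 3, 4, 4, 5]
# Prices built from an assumed rule: $50,000 + $150/sq ft - $5,000/bedroom.
price = [50_000 + 150 * s - 5_000 * b for s, b in zip(sqft, beds)]

multi = ols(list(zip(sqft, beds)), price)  # [intercept, per sq ft, per bedroom]
simple = ols([[b] for b in beds], price)   # [intercept, per bedroom]

print(f"bedrooms, one-variable slope:  {simple[1]:>10,.0f}")
print(f"bedrooms, multiple regression: {multi[2]:>10,.0f}")
```

The one-variable bedroom slope comes out near +$67,000 per bedroom. The multiple regression returns the -$5,000 built into the data. The difference arises because bedrooms largely stand in for living area when living area is left out of the model.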
Appraiser adjustments tend to be conditional. This means you pretend to hold all other variables constant. This is a fiction. In reality, appraiser mental analysis falls somewhere in between: a combination of conditional and marginal thinking. To my knowledge, ‘conditional’ versus ‘marginal’ has never been clearly defined in the appraisal literature. (It’s a clumsy topic to address.)
Worse yet, there is a pile of other assumptions under the math of regression. One is that you did not leave out any functional variables. Another is that categorical or ordinal variables have been properly transformed to numerical (interval or ratio) variables before input. But wait, there are more: homoscedasticity, normality, no serial correlation, and linearity. Seldom or never are these assumptions true for real estate market segments. Real estate market data is heteroscedastic, skewed, autocorrelated, and nonlinear. Darn.
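Several of these assumptions can at least be checked on a fitted model's residuals. The sketch below is a rough illustration using only the standard library. It is not a substitute for the formal tests in a statistics package. It computes three things:

- a Durbin-Watson statistic for serial correlation, with sales ordered by sale date;
- a crude Goldfeld-Quandt-style ratio of residual spread for larger versus smaller homes;
- the residual skewness.

The ten sales are invented, and the function names are hypothetical.

```python
import statistics


def simple_ols(x, y):
    """Intercept and slope of a one-predictor least-squares line."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - slope * mx, slope


def durbin_watson(resid):
    """Durbin-Watson statistic; residuals must be listed in sale-date order.
    About 2 suggests little first-order serial correlation; well below 2 suggests positive correlation."""
    num = sum((resid[i] - resid[i - 1]) ** 2 for i in range(1, len(resid)))
    return num / sum(e * e for e in resid)


def spread_ratio(x, resid):
    """Rough Goldfeld-Quandt-style check: residual variance for the upper half of x
    divided by that for the lower half. Values well above 1 hint at heteroscedasticity."""
    pairs = sorted(zip(x, resid))
    half = len(pairs) // 2
    low = [e for _, e in pairs[:half]]
    high = [e for _, e in pairs[-half:]]
    return statistics.pvariance(high) / statistics.pvariance(low)


def skewness(values):
    """Population skewness of a list: mean of cubed standardized deviations (0 if symmetric)."""
    m, sd = statistics.fmean(values), statistics.pstdev(values)
    return statistics.fmean(((v - m) / sd) ** 3 for v in values)


# Hypothetical sales listed in sale-date order (invented numbers, for illustration only).
sqft = [1650, 2300, 1100, 3000, 1800, 1350, 2600, 2000, 3400, 1500]
price = [281_000, 405_000, 201_000, 540_000, 318_000, 236_000, 420_000, 335_000, 560_000, 262_000]

a, b = simple_ols(sqft, price)
resid = [p - (a + b * s) for s, p in zip(sqft, price)]
print(f"Durbin-Watson:              {durbin_watson(resid):.2f}")
print(f"Spread ratio (large/small): {spread_ratio(sqft, resid):.2f}")
print(f"Residual skewness:          {skewness(resid):.2f}")
```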
Ban Multiple Regression!
Scott
February 8, 2023 @ 2:46 am
I once used all sales (about 200) in my neighborhood to build a regression model. Next I took a random sample of 30 of those sales and applied the model. The total of the 30 predicted values was within 1% of the total of their sold prices. But individual predicted prices varied from their actual sold prices. Some were close. Others were 10% high or 10% low. A very good tool for a portfolio of houses. Lousy for an appraisal of a single house. Even if individual sale prices could be predicted with the MVR model, it makes no sense to pry a coefficient or two out of the MVR model and cram it into a different model like the sales grid.
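Scott's result has a simple mechanical explanation. When a least-squares model with an intercept is fit to a set of sales, its residuals sum to zero across those sales. Over- and under-predictions therefore largely offset in a total, even when individual errors are large. A minimal sketch with 200 simulated sales reproduces the pattern. The data, the seed, and the `fit_line` helper are invented for illustration; none of it comes from Scott's neighborhood.

```python
import random
import statistics


def fit_line(x, y):
    """Least-squares intercept and slope for a single predictor."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - slope * mx, slope


rng = random.Random(42)  # fixed seed so the run is repeatable

# 200 simulated neighborhood sales (invented data, not real sales).
sqft = [rng.uniform(1000, 3000) for _ in range(200)]
price = [60_000 + 140 * s + rng.gauss(0, 25_000) for s in sqft]

a, b = fit_line(sqft, price)
predicted = [a + b * s for s in sqft]

# Pick 30 of the same sales at random and compare predicted with actual prices.
idx = rng.sample(range(200), 30)
pct_errors = [(predicted[i] - price[i]) / price[i] for i in idx]
actual_total = sum(price[i] for i in idx)
total_error = (sum(predicted[i] for i in idx) - actual_total) / actual_total

print(f"Error on the 30-sale total: {total_error:+.1%}")
print(f"Individual errors: {min(pct_errors):+.1%} to {max(pct_errors):+.1%}")
```

The error on the total is a price-weighted average of the individual percentage errors. Its size can never exceed the largest individual error, and it is usually much smaller because positive and negative errors cancel.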
Tom Stowe
February 8, 2023 @ 8:07 am
If possible, I would interview the buyer (and/or Realtor) and ask what other properties they considered buying, and why. I would also ask the same question for the sales I considered for the analysis.
This quickly built up an understanding that buyers were selecting or not selecting properties available to them at the time they needed to purchase. So, selecting sales data based on buyer substitutes became more important than finding properties with identical characteristics. (A warehouse in Town A might be just as useful to the buyer’s distribution needs as the identical warehouse in Town B. Or conversely, any house in school district C is more valuable to that buyer than an identical house in school district D.)
Gary Kristensen
February 8, 2023 @ 9:58 pm
Mind blown.
Michael V. Sanders, MAI, SRA
February 15, 2023 @ 11:09 am
I’m going to be a contrarian here. I understand the argument about inferential statistics that rests on the assumption that we are analyzing a POPULATION of sold properties within whatever time frame and geographic area we specify. But I believe that we are actually analyzing a SAMPLE of properties that happen to have sold over the specified time frame and within the specified geography (with whatever other filters we might also use). Not a random sample, to be sure, but a sample nonetheless. We use this data sample and associated analyses to develop useful information, which we might reasonably apply to all the similar properties within the geographic area, whether they sold or not.
We can agree to advocate using ALL relevant sales data, whether we call it a population or a sample. But considering the dataset as a sample (of sales within the population of all properties) makes multiple regression a valuable analytical tool. Of course the coefficients represent marginal values, which is exactly what the adjustments in a standard sales comparison grid are supposed to be. Can multicollinearity between independent variables be a problem? Of course, but a correlation matrix can identify the variables that are highly related (common sense works well here too). Note also that multicollinearity doesn’t invalidate the model, although it might affect the individual coefficients for the collinear variables.
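The correlation-matrix screen described above takes only a few lines of Python. The sales, the column names, and the 0.8 cutoff are assumptions chosen for illustration. The cutoff is arbitrary, and a flagged pair is a prompt for judgment, not an automatic reason to drop a variable.

```python
import itertools
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def flag_collinear(columns, threshold=0.8):
    """Return (name1, name2, r) for every predictor pair with |r| at or above threshold."""
    flagged = []
    for (n1, x1), (n2, x2) in itertools.combinations(columns.items(), 2):
        r = pearson(x1, x2)
        if abs(r) >= threshold:
            flagged.append((n1, n2, r))
    return flagged


# Hypothetical predictors for eight sales (invented values).
predictors = {
    "living_sqft": [1200, 1450, 1600, 1750, 1900, 2100, 2400, 2800],
    "bedrooms":    [2, 3, 3, 3, 4, 4, 4, 5],
    "baths":       [1, 2, 1.5, 2, 2, 2.5, 3, 3],
    "site_acres":  [0.25, 0.50, 0.20, 0.35, 1.00, 0.30, 0.45, 0.40],
}

for a, b, r in flag_collinear(predictors):
    print(f"{a} vs {b}: r = {r:+.2f}")
```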
I’m not aware of any assumption that requires all functional variables to be included in a regression model. The more functional variables that are accounted for, the higher the coefficient of determination (R²). Leaving some out doesn’t necessarily invalidate the model; it just means that you have more unexplained variation. The concern is to make sure the variables used don’t become proxies for those left out. There are a variety of ways to properly account for categorical variables and lack of linearity (variable transformation, for example) without dismissing multiple regression out of hand.
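Two of the transformations mentioned above can be sketched in Python under stated assumptions:

- A categorical "view" rating is turned into 0/1 indicator (dummy) columns. One reference level is left out so the indicators are not perfectly collinear with the intercept.
- Price is log-transformed. This is one common remedy for curvature and for errors that grow with price.

The data and the `dummy_code` helper are hypothetical.

```python
import math


def dummy_code(values, reference):
    """Build one 0/1 indicator column per category except the reference level.

    Leaving the reference level out avoids perfect collinearity with the intercept
    (the so-called dummy variable trap).
    """
    levels = sorted(set(values) - {reference})
    return {f"is_{level}": [1 if v == level else 0 for v in values] for level in levels}


# Hypothetical sales: a categorical view rating and sale prices (invented values).
view = ["none", "partial", "full", "none", "full", "partial"]
price = [310_000, 335_000, 390_000, 298_000, 405_000, 342_000]

indicators = dummy_code(view, reference="none")
# In a model of log(price), a one-unit change in a predictor multiplies price by
# exp(coef); for small coefficients that is roughly a coef * 100 percent change.
log_price = [math.log(p) for p in price]

for name, column in indicators.items():
    print(name, column)
print([round(v, 3) for v in log_price])
```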
Having said all that, multiple regression is not the right tool for every valuation assignment. You need lots of data with some degree of homogeneity, and some skills in applying this type of modeling (plug here for the AI Quantitative Analysis class). But if you have an assignment that would benefit from examining multiple variables within the context of a single model, and you have adequate data, I will argue strongly for use of this accepted analytical technique.
Julie
March 8, 2023 @ 9:44 am
Definitely NOT. Advanced regression techniques, such as elastic net regression, are still the BEST option in rural markets. There is NO data for paired sales analysis in rural New England. Advanced regression models are the ONLY way to support adjustments in rural areas.
Julie
March 8, 2023 @ 9:50 am
In addition, advanced regression models take collinearity into account.
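Elastic net regression, mentioned above, is least squares plus a penalty on the coefficients. The penalty blends the lasso's absolute-value term with ridge regression's squared term, and the squared term is what tends to stabilize estimates when predictors are collinear. One common way to fit it is cyclic coordinate descent, the approach used by the glmnet package.

This pure-Python version is a minimal sketch, not a tested library. It standardizes price and the predictors so that the penalty strength `lam` is unitless. The values of `lam` and `alpha`, the sweep count, and the six sales are illustrative assumptions, and the helper names are hypothetical.

```python
import math


def standardize(col):
    """Center a column and scale it to unit (population) standard deviation."""
    n = len(col)
    mean = sum(col) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
    return [(v - mean) / sd for v in col], sd


def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma, returning exactly 0 inside [-gamma, gamma]."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def elastic_net(cols, y, lam, alpha=0.5, sweeps=1000):
    """Cyclic coordinate descent for the elastic net on standardized data.

    Minimizes (1/2n)*||y - Xb||^2 + lam * (alpha*||b||_1 + (1 - alpha)/2 * ||b||_2^2)
    with X and y standardized (so no intercept is needed), then rescales b to original units.
    """
    n, p = len(y), len(cols)
    std = [standardize(c) for c in cols]
    X = [z for z, _ in std]
    y_std, y_sd = standardize(y)
    resid = list(y_std)  # residuals while every coefficient is 0
    b = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            # Inner product of x_j with the partial residual (b_j's own contribution added back).
            z = sum(x * r for x, r in zip(X[j], resid)) / n + b[j]
            new = soft_threshold(z, lam * alpha) / (1 + lam * (1 - alpha))
            if new != b[j]:
                resid = [r - x * (new - b[j]) for x, r in zip(X[j], resid)]
                b[j] = new
    # Convert standardized coefficients back to per-unit values (e.g. dollars per sq ft).
    return [bj * y_sd / sd for bj, (_, sd) in zip(b, std)]


# Hypothetical, strongly correlated predictors (invented values for illustration).
sqft = [1200, 1500, 1600, 2000, 2200, 2600]
beds = [2, 3, 3, 4, 4, 5]
price = [220_000, 262_000, 271_000, 333_000, 357_000, 418_000]

for lam in (0.0001, 0.05, 0.5):
    per_sqft, per_bed = elastic_net([sqft, beds], price, lam=lam)
    print(f"lam={lam:<7g} per sq ft {per_sqft:8.1f}  per bedroom {per_bed:10,.0f}")
```

As `lam` grows, the coefficients shrink toward zero. The squared-penalty share, `1 - alpha`, tends to keep correlated predictors in the model together rather than dropping one arbitrarily. Choosing `lam` and `alpha` is a judgment call, commonly made by cross-validation.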
Paul
April 3, 2024 @ 6:09 pm
Excuse me while I locate my dictionary… 🙂