Editor’s Note: “Are You Using Big Data in Appraisals?” is a first-time Guest Post from John Fariss, MNAA. John is the owner of Fariss Appraisal Services, located in Bakersfield, CA. He is a frequent contributor to George Dell’s Valuemetrics.Info classes and a Moderator for the Community of Asset Analysts©. Recently, John wrote an extensive article for The Asset Analyst Report© entitled “Painting With Numbers.”
Big data has been a hot topic within the real estate industry lately, and appraisers are beginning to pick up on it. Are 3 competing sales enough to develop a value? For residential appraisers completing work for lenders, that’s all the 1004 requires in the grid. Adding more sales, and even listings, is relatively easy in modern reporting software, so should we stop at 3? Shouldn’t appraisers embrace big data too?
First of all, how many sales and listings constitute big data? I’ve heard lots of numbers thrown around: 10, 30, 100. Well, which is it? And how could we possibly include 100, or even 30, comparables in a report? Some will tell you it’s upwards of 100, and that you need a random sample of about 30 for the results to be statistically significant. I’ll tell you it’s neither. Appraisers do not use big data, and they don’t take random samples.
How can I say that? First, we need to define what big data is in the world of data science. There, big data refers to data sets too large or complex to be dealt with by traditional data-processing application software (Eakins 2019). The article “Information Overload” by Sarah Everts says that digital big data is now “measured in exabytes, zettabytes, and yottabytes, or 10¹⁸, 10²¹, and 10²⁴ bytes, respectively.” MIT gives a further explanation: big data consists of observations with high volume, velocity, veracity, and variety, and the data typically exceeds current storage and processing capacity.
We are talking about huge volumes of data used by the likes of Amazon, Facebook, and Google to create advertising algorithms. For instance, Amazon knows with strong veracity that 10,000 people bought Widget Y. They also know that, of those buyers, 3,000 also bought Widget Z. They can confidently advertise both widgets to anyone shopping for either one.
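The arithmetic behind that confidence is simple conditional probability. Here is a minimal sketch in Python, using only the illustrative widget numbers from the paragraph above:

```python
# Illustrative only: the co-purchase rate behind Amazon-style targeting,
# using the hypothetical numbers from the paragraph above.
bought_y = 10_000        # shoppers who bought Widget Y
bought_y_and_z = 3_000   # of those, shoppers who also bought Widget Z

# P(buys Z | bought Y): share of Widget Y buyers who also bought Widget Z
co_purchase_rate = bought_y_and_z / bought_y
print(f"P(buys Z | bought Y) = {co_purchase_rate:.0%}")  # prints 30%
```

At big-data volumes, a 30% co-purchase rate measured over millions of shoppers is stable enough to act on; that stability is exactly what low-volume real estate data lacks.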
Real estate does not have this type of data. There is one buyer buying one house, or maybe one investor buying a few properties, and those may be of the same type or variety. The overall volume of real estate transactions can be relatively high (though low compared to big data), but the velocity of those transactions is low, the veracity is medium-to-high, and the variety is high.
This is quite different from those widget sales, which were occurring at a rate of 25-30 per day, were easily verified, and involved a high variety of buyers. Real estate is lucky to have 25-30 sales in a month in a given market. Real estate is not dealing in big data. So, what type of data do we have?
Real estate data is more appropriately referred to as wide data. In comparison to big data, wide data has low volume and velocity, medium-to-high veracity, and high variety. Wide data generally requires prepping data from multiple sources (think MLS or CoStar plus public records), and it takes much more effort to access, analyze, and join the relevant data sources.
We spend a great deal of time collecting data from our primary source, verifying it with a second source, filling in the blanks, and combining information to create a usable data set. Our data is wide; it’s not big.
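To make that prep work concrete, here is a minimal sketch in Python (pandas) of joining an MLS export with public records. The file names, column names, and field choices are all hypothetical; your MLS export will differ:

```python
import pandas as pd

# Hypothetical wide-data prep: join an MLS export to public records on a
# shared parcel number. All file and column names are placeholders.
mls = pd.read_csv("mls_export.csv")         # e.g., sale price, DOM, status
public = pd.read_csv("public_records.csv")  # e.g., year built, lot size

# Combine the two sources, keeping only parcels present in both
combined = mls.merge(public, on="parcel_number", how="inner")

# Fill blanks in the MLS living-area field from public records, then flag
# rows still missing it for verification against a second source by hand
combined["gla"] = combined["gla_mls"].fillna(combined["gla_public"])
needs_review = combined[combined["gla"].isna()]
print(f"{len(needs_review)} records still need manual verification")
```

The point is not the particular tool; it is that most of the effort in wide data goes into this kind of accessing, cleaning, and joining before any analysis starts.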
More interestingly, there is no set number of observations for either big or wide data. It has more to do with those four characteristics of the data: volume, velocity, veracity, and variety. Volume is the total amount of relevant data being collected. Velocity is the frequency of incoming data. Veracity is the quality or trustworthiness of the data. And variety is the diversity of the data.
Using these characteristics, we know that real estate has low volume and velocity. Veracity can be good, but if your MLS is like mine, it can also be pretty poor. Variety is wide in real estate: we have a variety of buyers and sellers, and a variety of property characteristics: type, quality, condition, age, etc. We have wide data.
So, let’s get away from the term “big data.” We aren’t working with it, and it doesn’t refer to a specific number of observations (sales or listings). Our data is wide, and it’s our job as appraisers to wade through all of it and determine the relevant, competing sales for a subject property. We must sift the wide data down to the competing market segment, as sketched below. This could be 8 observations, or it could be 108. Either way, it’s still not big data. What is important is that it is the complete competing market segment.
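Here is what that sifting might look like in Python (pandas), filtering a combined data set down to a competing market segment. The thresholds and column names are hypothetical; in practice the appraiser sets them from the subject property and the market:

```python
import pandas as pd

# Hypothetical filter from wide data down to the competing market segment.
# Thresholds and column names are placeholders, not a prescribed method.
sales = pd.read_csv("combined_sales.csv")

subject = {"gla": 1_850, "bedrooms": 3, "quality": "Q4"}

competing = sales[
    sales["gla"].between(subject["gla"] * 0.8, subject["gla"] * 1.2)
    & sales["bedrooms"].isin([2, 3, 4])
    & (sales["quality"] == subject["quality"])
]

# Whether this yields 8 rows or 108, it is the complete competing
# segment: not a random sample, and not big data.
print(f"Competing market segment: {len(competing)} sales and listings")
```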
The answer then is, “No, you are not using big data in appraisals.”
Michael V. Sanders, MAI, SRA
December 28, 2022 @ 12:30 pm
John, very informative and useful. Big data gets a lot of media coverage, and you are entirely correct that we are not dealing with THAT. Never heard the four Vs used to describe big data, but I did a little additional research, and I’ll certainly remember that in the future. Thanks for your continuing contributions to the Asset Analyst community.
Mark Hastert, ASA
December 30, 2022 @ 5:07 am
Sales comps cited in our reports aren’t, and never have been, “proof”; they are merely examples that support our opinion.
J.parsons
January 4, 2023 @ 12:40 pm
Algorithms. The new terminology or definition is “Embedded Silicon Valley Bias.” Algorithms have been proven (when made public) to be biased. One of the reasons behind this is that 90% or more of the software developers in Silicon Valley are white males. AI has also been shown to mimic reality. Now, 95% of the users of algorithms (appraisers/asset analysts) are also white. Most MLS systems have statistical data available at the push of a button.
Was This Home Overimproved or Underimproved? – Cleveland Appraisal Blog
January 19, 2023 @ 1:04 pm
[…] Are You Using Big Data in Appraisals? – George Dell’s Analogue Blog […]