A Natural Language Processing Model for House Price Forecasting

Market Overview:

This artificial intelligence (AI) algorithm uses natural language processing (NLP) to quantify the hidden value embedded in property descriptions and forecast property values more accurately. The global NLP market is forecast to reach $16.07 billion in 2021, an increase of $8.44 billion since 2016. Most real estate transaction sites predict listing prices with regression models based on property features such as the number of bedrooms, square footage, and construction year. Unfortunately, this method excludes one major component of the data: the unique qualities of a property that may significantly increase its value. Clemson University researchers have developed a novel tool that accounts for this information, in conjunction with current and past selling prices and information about comparable properties, to provide economic insights into the impact of real estate uniqueness on sales prices.

 

Application                                                                         Stage of Development

Machine Learning; Artificial Intelligence                           Prototype and Animal Studies Complete

 

Advantages

  • Natural Language Processing (NLP) captures the hidden value in text descriptions, which adds between 1% and 6% to property value, on average
  • Neural network deep learning approach detects detailed nuances, increasing prediction accuracy
  • The AI model's detailed analysis delivers improved market insights to home buyers and sellers, investors, bankers, and policymakers

 

Technical Summary

Real estate agents frequently write property descriptions to convey a property's unique features and history, reduce market friction, and emphasize any competitive advantages, thereby increasing the likelihood of a sale. This algorithm analyzes that text using mathematical theory and neural network deep learning, an AI approach that imitates the human brain, to obtain distributed representations of words, sentences, and paragraphs. The technology preserves the semantic meanings of words within the context of the paragraph and detects more detailed nuances than sentiment analysis methods based on words' positive/negative polarity. Additionally, because MLS systems impose a 250-word limit on description length, agents often rely on abbreviations and descriptions may contain typos; the algorithm is able to understand these and to learn an individual agent's writing style. The technology also takes the order of sentences and paragraphs into account, as the relative position of a sentence in a description often implies the importance of the feature it describes.
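To make the idea of distributed representations more concrete, the following is a minimal toy sketch and not the inventors' model. It assumes hand-written three-dimensional word vectors (`TOY_VECTORS`), a hypothetical abbreviation table (`ABBREVIATIONS`), simple averaging in place of a trained neural network, and a geometric `decay` weight so that earlier sentences count more. All of these names and values are illustrative assumptions; a real system would learn its representations from listing data.

```python
import re

# Hypothetical 3-dimensional word vectors (assumption for illustration only);
# a real model would learn these from a large corpus of listing descriptions.
TOY_VECTORS = {
    "renovated": [0.9, 0.1, 0.0],
    "kitchen": [0.2, 0.8, 0.1],
    "granite": [0.7, 0.3, 0.2],
    "hardwood": [0.6, 0.2, 0.4],
    "floors": [0.1, 0.5, 0.5],
    "garage": [0.0, 0.4, 0.9],
}

# Hypothetical abbreviation table (assumption); the source does not list one.
ABBREVIATIONS = {"reno": "renovated", "kit": "kitchen",
                 "hdwd": "hardwood", "gar": "garage"}

DIM = 3  # dimensionality of the toy vectors


def tokenize(sentence):
    """Lowercase the text, split it into words, and expand known abbreviations."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return [ABBREVIATIONS.get(word, word) for word in words]


def mean_vector(vectors):
    """Average a list of vectors; return a zero vector if the list is empty."""
    if not vectors:
        return [0.0] * DIM
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(DIM)]


def sentence_vector(sentence):
    """Represent a sentence as the mean of its known word vectors."""
    known = [TOY_VECTORS[w] for w in tokenize(sentence) if w in TOY_VECTORS]
    return mean_vector(known)


def description_vector(description, decay=0.5):
    """Combine sentence vectors, weighting earlier sentences more heavily.

    The geometric decay is an assumed stand-in for the position
    sensitivity described in the text.
    """
    sentences = [s for s in re.split(r"[.!?]+", description) if s.strip()]
    if not sentences:
        return [0.0] * DIM
    weights = [decay ** i for i in range(len(sentences))]
    total = sum(weights)
    vectors = [sentence_vector(s) for s in sentences]
    return [sum(w * v[i] for w, v in zip(weights, vectors)) / total
            for i in range(DIM)]
```

In this sketch, `description_vector("Reno kit. Gar.")` and `description_vector("Gar. Reno kit.")` return different vectors, because the sentence that comes first receives the larger weight. The resulting vector could then be supplied to a price model alongside structured features such as bedrooms and square footage.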


____________________________________________________________________________________________

 

Inventors:                     Dr. Yannan Shen, Dr. Yiqiang Han

Patent Type:                  NA

Serial Number:             NA

CURF Ref No:              2018-034

For Information, Contact:
Andy Bluvas
Technology Commercialization Officer
Clemson University Research Foundation
bluvasa@clemson.edu