PhD-BPD Dissertation Defense Presentation: Han Li
Title: Improving Residential HVAC Performance with Data Discovery and Advanced Modeling of a National Smart Thermostat Database
Name: Han Li, Ph.D. candidate in Building Performance and Diagnostics (PhD-BPD)
Date: Wednesday, January 14, 2026
Time: 10:00am-12:00pm ET
Location: Jared L. Cohon University Center, Dowd Room (2nd Floor) & Zoom
Advisory Committee:
Prof. Vivian Loftness, FAIA, LEED AP (Chair)
University Professor
School of Architecture
Carnegie Mellon University
Prof. Erica Cochran Hameen, Ph.D., Assoc. AIA, NOMA, LEED AP
Associate Professor
Co-Director, Center for Building Performance & Diagnostics
School of Architecture
Carnegie Mellon University
Tianzhen Hong, Ph.D., PE, FIBPSA, FASHRAE
Senior Scientist, Deputy for Research
Building & Industrial Energy Systems Division
Lawrence Berkeley National Laboratory
Abstract:
Residential buildings consume nearly 20% of total U.S. energy, with heating, ventilation, and air conditioning (HVAC) systems accounting for over half of household energy use. Smart thermostats, now deployed in more than 25 million American homes, generate continuous measurements that offer an unprecedented opportunity to reveal how occupants, equipment, and buildings interact. Yet a fundamental gap persists between this observational richness and actionable insights for building energy research, energy efficiency programs, product development, and policymaking. This dissertation addresses this data-knowledge gap by demonstrating how a multi-year, large-scale smart thermostat dataset can enable rigorous characterization of occupant behavior, building thermal properties, and their integrated implications for residential HVAC performance improvement.
The research analyzes the ecobee Donate Your Data dataset, the largest publicly available residential smart thermostat archive, comprising metadata and telemetry from over 200,000 U.S. homes across more than eight years (2017 to 2025). A cloud-native processing and analytical pipeline was developed that reduces large-scale analysis time from months to hours while remaining cost-effective and robust. The analytical framework spans two dimensions, occupant behavior and building thermal properties, and can be extended to other types of analysis.
Behavioral analysis investigates three integral aspects: (1) Thermostat schedule clustering identifies distinct behavioral archetypes across the population, enabling applications such as synthetic schedule generation for building simulations and differentiated targeting of operational interventions. (2) Setback analysis reveals that Away mode is effective in reducing runtime but that its adoption has declined significantly since the COVID-19 pandemic; the analysis quantifies the resulting re-engagement opportunities and achievable savings potential. (3) Override classification shows that a substantial fraction of manual interventions seek conservation rather than comfort, which identifies engaged users for efficiency programs and informs personalized automation design. Together, these characterizations provide empirical benchmarks for demand-side management and behavior-driven program design.
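As a rough illustration of what schedule clustering involves, the sketch below groups 24-hour setpoint profiles with a plain k-means loop. It is not the dissertation's method: the abstract does not name the clustering algorithm, and the function names, the farthest-point initialization, and the six synthetic profiles are all assumptions made for this example.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(profiles, k, iterations=20):
    """Plain Lloyd's k-means; returns (labels, centroids)."""
    # Farthest-point initialization (an assumption) keeps the sketch deterministic.
    centroids = [list(profiles[0])]
    while len(centroids) < k:
        farthest = max(profiles, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(farthest))
    labels = [0] * len(profiles)
    for _ in range(iterations):
        # Assignment step: each profile joins its nearest centroid.
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in profiles]
        # Update step: each centroid becomes the hourly mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def constant_profile(setpoint_c):
    """Hypothetical heating schedule with one setpoint all day (deg C)."""
    return [setpoint_c] * 24


def night_setback_profile(day_c, night_c):
    """Hypothetical heating schedule with a setback from 22:00 to 06:00."""
    return [night_c if (h >= 22 or h < 6) else day_c for h in range(24)]


# Synthetic data only: three constant schedules and three night-setback schedules.
profiles = [
    constant_profile(21.0),
    constant_profile(21.5),
    constant_profile(20.5),
    night_setback_profile(21.0, 17.0),
    night_setback_profile(21.5, 16.5),
    night_setback_profile(20.5, 17.5),
]
labels, centroids = kmeans(profiles, k=2)
print(labels)
```

On this toy input the two groups separate cleanly; real thermostat schedules would need preprocessing (for example, handling weekday versus weekend programs) that this sketch omits.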
Thermal property inference introduces a novel regime-specific approach that extracts scale-invariant parameter groups (i.e., thermal time constant, duty-load sensitivity, and capacity-to-mass ratio) from thermostat-only signals without requiring metered energy data or detailed equipment specifications. The proposed parameter identification method is shown to generalize to buildings with different metadata characteristics. The identified parameters form a valuable database that can provide benchmarks, inform retrofit prioritization, guide utility program design, and support equipment performance diagnostics.
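To make the idea of a thermal time constant concrete, the following sketch estimates it from a free-floating (HVAC off) temperature decay. It assumes a textbook first-order model with a constant outdoor temperature, T_in(t) - T_out = (T_0 - T_out) * exp(-t / tau), and fits the log-linear form by least squares. This is an illustrative assumption, not the regime-specific method developed in the dissertation; the function name and the synthetic data are hypothetical.

```python
import math


def estimate_time_constant(times_h, indoor_c, outdoor_c):
    """Fit ln|T_in - T_out| = a - t / tau by least squares; return tau in hours.

    Assumes HVAC is off, outdoor temperature is constant, and T_in never
    equals T_out (otherwise the logarithm is undefined).
    """
    y = [math.log(abs(t_in - outdoor_c)) for t_in in indoor_c]
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_y = sum(y) / n
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    sxy = sum((t - mean_t) * (v - mean_y) for t, v in zip(times_h, y))
    slope = sxy / sxx  # equals -1 / tau under the assumed model
    return -1.0 / slope


# Synthetic decay: 21 deg C indoors cooling toward 0 deg C outdoors, 15-minute samples.
TRUE_TAU_H = 12.0
OUTDOOR_C = 0.0
times = [0.25 * k for k in range(17)]  # 0 to 4 hours
indoor = [OUTDOOR_C + 21.0 * math.exp(-t / TRUE_TAU_H) for t in times]

tau_hat = estimate_time_constant(times, indoor, OUTDOOR_C)
print(round(tau_hat, 3))
```

Because the time constant is a ratio of thermal capacitance to heat-loss conductance, it can be recovered from temperature traces alone, which is consistent with the abstract's point that no metered energy data is required.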
Integrated analysis couples behavioral and thermal dimensions for decision-relevant applications: (1) To quantify the energy impact of behavioral disengagement, a weather-normalized analysis asks: what if users restored their pre-pandemic Away adoption behavior? Results show 10-17% runtime savings potential, highlighting substantial re-engagement opportunities for utility programs. (2) To understand what determines temperature setback effectiveness, interpretable logistic regression models were applied, revealing that building thermal characteristics shape whether setbacks effectively reduce runtime. This enables differentiated intervention targeting: behavioral campaigns for thermally favorable homes versus retrofit prioritization for homes constrained by HVAC capacity. These two case studies illustrate the potential of actionable analysis built upon the integrated framework. Further applications can build on the same behavior-thermal foundation.
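One common way to weather-normalize a what-if comparison is a degree-day regression: fit daily runtime against heating degree days (HDD) for each period, then compare predicted runtime at the same reference HDD. The sketch below shows that idea only. The abstract does not specify the normalization procedure, so the linear model, the function names, and every number here are assumptions; the synthetic values are invented and unrelated to the dissertation's 10-17% result.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def normalized_runtime(hdd, runtime_h, reference_hdd):
    """Predicted daily runtime (hours) at a common reference HDD."""
    intercept, slope = fit_line(hdd, runtime_h)
    return intercept + slope * reference_hdd


# Synthetic data: the two periods saw different weather, so raw runtimes
# are not directly comparable without normalization.
hdd_high_away = [5.0, 10.0, 15.0, 20.0]
runtime_high_away = [0.4 * h for h in hdd_high_away]  # frequent Away use
hdd_low_away = [8.0, 12.0, 18.0, 25.0]
runtime_low_away = [0.5 * h for h in hdd_low_away]    # infrequent Away use

REFERENCE_HDD = 15.0
r_high = normalized_runtime(hdd_high_away, runtime_high_away, REFERENCE_HDD)
r_low = normalized_runtime(hdd_low_away, runtime_low_away, REFERENCE_HDD)

# Fraction of runtime avoided if the low-Away period behaved like the high-Away one.
savings = 1.0 - r_high / r_low
print(round(savings, 3))
```

Comparing at a shared reference HDD removes the effect of one period simply being colder than the other, so the remaining difference can be attributed to behavior under the stated model.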
By establishing what smart thermostat data can reveal about residential occupant behavior and building thermal performance at population scale, this dissertation bridges the gap between observational richness and practical decision-making. The methodological frameworks and empirical benchmarks provide a foundation for scalable, behavior-aware residential energy research and practice.
Keywords: residential buildings, smart thermostat, occupant behavior, building thermal properties