Supervised Machine Learning with scikit-learn for Thermal Comfort

saman aboutorab
Jan 18, 2024

This supervised machine learning project aims to build a model that predicts thermal sensation from a mix of environmental, climatic, and personal factors that may influence thermal comfort. The features are the year, season, climate, city, country, building type, building-level cooling strategy, sex, clothing insulation (Clo), metabolic rate (Met), air temperature (°C), relative humidity (%), and air velocity (m/s).

The target variable for our predictive model is "thermal sensation," a crucial aspect in understanding how individuals perceive and experience thermal comfort. Thermal sensation is a subjective measure that encapsulates the overall feeling of warmth or coolness experienced by individuals in a given environment.

Our machine learning model is trained on a labeled dataset in which each instance is described by the feature values listed above and labeled with the recorded thermal sensation. Because the target used here is the rounded thermal sensation vote, which is a discrete class, we treat the task as classification and use a random forest classifier to capture the relationships between the features and the thermal sensation responses.

Training feeds the model historical data so that it learns the mapping from the input features to the target thermal sensation. Once trained, the model can make predictions on new, unseen data. We assess its performance with classification metrics (accuracy, per-class precision and recall, and a confusion matrix) and compare it against a simple baseline classifier.

This project holds the potential to enhance our understanding of the intricate dynamics influencing thermal comfort, providing valuable insights for optimizing building design, climate control strategies, and personal well-being in various locations and contexts.

Import Libraries

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

from sklearn import svm, datasets
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, roc_curve, auc, precision_recall_curve
import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23
from sklearn.model_selection import RandomizedSearchCV
from sklearn.feature_selection import SelectKBest

Load Data

ieq_data = pd.read_csv("ashrae_thermal_comfort_database_2.csv", index_col='Unnamed: 0')

ieq_data.info()

Classification Objective -- Predict Thermal Sensation using a Random Forest Model

ieq_data.head()

list(ieq_data.columns)

feature_columns = [
 'Year',
 'Season',
 'Climate',
 'City',
 'Country',
 'Building type',
 'Cooling startegy_building level',
 'Sex',
 'Clo',
 'Met',
 'Air temperature (C)',
 'Relative humidity (%)',
 'Air velocity (m/s)']

features = ieq_data[feature_columns]

target = ieq_data['ThermalSensation_rounded']

target.head()
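For readers unfamiliar with the target, thermal sensation votes are conventionally reported on the ASHRAE seven-point scale from -3 (cold) to +3 (hot). The sketch below assumes that ThermalSensation_rounded stores these integer codes; the votes shown are made up, and the names ASHRAE_SCALE and example_votes are illustrative only.

```python
import pandas as pd

# ASHRAE seven-point thermal sensation scale.
ASHRAE_SCALE = {
    -3: "cold",
    -2: "cool",
    -1: "slightly cool",
    0: "neutral",
    1: "slightly warm",
    2: "warm",
    3: "hot",
}

# Made-up votes, assumed to be stored as rounded numbers like the target column.
example_votes = pd.Series([-1.0, 0.0, 2.0])

# Map numeric codes to readable labels for inspection or plotting.
example_labels = example_votes.astype(int).map(ASHRAE_SCALE)
print(example_labels.tolist())
```

The same mapping could be applied to target for more readable plots, provided the column really uses these codes.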

features_withdummies = pd.get_dummies(features)
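To make the encoding step concrete, here is a minimal sketch on a tiny made-up frame (not rows from the ASHRAE database). It shows that pd.get_dummies one-hot encodes text columns and leaves numeric columns unchanged, and how new data could be aligned to the training columns with reindex. The frames toy and new_rows are hypothetical.

```python
import pandas as pd

# Made-up data: one categorical column and one numeric column.
toy = pd.DataFrame({
    "Season": ["Summer", "Winter", "Summer"],
    "Air temperature (C)": [26.1, 21.4, 27.3],
})

# Text columns become one indicator column per category;
# numeric columns pass through untouched.
toy_encoded = pd.get_dummies(toy)
print(list(toy_encoded.columns))

# New data may lack some categories. Reindexing to the training columns
# adds the missing indicator columns, filled with 0.
new_rows = pd.DataFrame({"Season": ["Summer"], "Air temperature (C)": [24.0]})
new_encoded = pd.get_dummies(new_rows).reindex(
    columns=toy_encoded.columns, fill_value=0
)
print(list(new_encoded.columns))
```

Aligning columns this way matters whenever the model is applied to data that was encoded separately from the training set.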

Create the Train and Test Split using scikit-learn

features_train, features_test, target_train, target_test = train_test_split(features_withdummies, target, test_size=0.3, random_state=2)
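The split above is purely random. When classes are imbalanced, as thermal sensation votes often are, one option is to pass stratify so that train and test keep similar class proportions. The following is a sketch on synthetic data, not the original workflow; X, y, and the class probabilities are invented for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic features and imbalanced labels on a seven-point scale.
X = rng.normal(size=(1000, 3))
y = rng.choice(
    [-3, -2, -1, 0, 1, 2, 3],
    size=1000,
    p=[0.03, 0.07, 0.20, 0.40, 0.20, 0.07, 0.03],
)

# test_size=0.3 holds out 30% of the rows; stratify=y preserves
# the class proportions in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=2, stratify=y
)

print(len(X_train), len(X_test))
print(np.mean(y_train == 0), np.mean(y_test == 0))
```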

Train the Random Forest Model and make the classification prediction

# max_features='auto' was removed in scikit-learn 1.3; 'sqrt' is the equivalent setting for classifiers
model_rf = RandomForestClassifier(oob_score=True, max_features='sqrt', n_estimators=100, min_samples_leaf=2, random_state=2)

model_rf.fit(features_train, target_train)

Out-of-Bag (OOB) Score Calculation

mean_model_accuracy = model_rf.oob_score_

print("Model accuracy: "+str(mean_model_accuracy))

Model accuracy: 0.4872587380396541
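Note that oob_score_ is an accuracy estimate computed on the training rows: each row is scored only by the trees that did not see it during bootstrapping. It usually tracks held-out accuracy closely, but it is not the same quantity as a test-set score. The sketch below compares the two on synthetic data from make_classification; the data and the name rf are illustrative, not the ASHRAE model.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic three-class problem standing in for the real data.
X, y = make_classification(
    n_samples=1000, n_features=10, n_informative=5,
    n_classes=3, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=2
)

rf = RandomForestClassifier(
    n_estimators=100, oob_score=True, min_samples_leaf=2, random_state=2
)
rf.fit(X_train, y_train)

# OOB accuracy: estimated from training rows left out of each tree's bootstrap.
oob_accuracy = rf.oob_score_
# Test accuracy: measured on rows the forest never saw.
test_accuracy = rf.score(X_test, y_test)

print("OOB accuracy: ", oob_accuracy)
print("Test accuracy:", test_accuracy)
```

In this project, model_rf.score(features_test, target_test) would give the corresponding held-out figure.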

Create a Baseline Model to compare the accuracy of the model

# Dummy classifier as a baseline: 'stratified' guesses labels at random
# in proportion to the class frequencies seen in the training data
baseline_rf = DummyClassifier(strategy='stratified', random_state=0)
baseline_rf.fit(features_train, target_train)
# Note: this is test-set accuracy, whereas the random forest figure above
# is an out-of-bag estimate from the training data
baseline_model_accuracy = baseline_rf.score(features_test, target_test)
print("Model accuracy: "+str(baseline_model_accuracy))

Model accuracy: 0.2833908707326429
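The choice of baseline strategy changes the number to beat. With 'stratified', the expected accuracy is the sum of squared class frequencies, while 'most_frequent' scores the share of the largest class, which is never lower. A small sketch on synthetic labels (the frequencies 0.25, 0.5, 0.25 are invented) illustrates the gap:

```python
import numpy as np
from sklearn.dummy import DummyClassifier

rng = np.random.default_rng(0)

# Synthetic labels with known class frequencies.
y = rng.choice([-1, 0, 1], size=2000, p=[0.25, 0.50, 0.25])
X = np.zeros((len(y), 1))  # DummyClassifier ignores the features

baseline_scores = {}
for strategy in ("stratified", "most_frequent"):
    dummy = DummyClassifier(strategy=strategy, random_state=0)
    dummy.fit(X, y)
    baseline_scores[strategy] = dummy.score(X, y)

# Expected: roughly 0.25**2 + 0.5**2 + 0.25**2 = 0.375 for 'stratified'
# and roughly 0.5 for 'most_frequent'.
print(baseline_scores)
```

Reporting both baselines gives a fuller picture of how much the random forest actually learns from the features.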

Classification Report

y_pred = model_rf.predict(features_test)
y_true = np.array(target_test)
categories = np.array(target.sort_values().unique())
print(classification_report(y_true, y_pred))
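If the per-class numbers are needed for further analysis rather than just printing, classification_report can return a dictionary via output_dict=True. The sketch below uses a handful of made-up labels (toy_true and toy_pred are hypothetical) and collects the per-class rows into a DataFrame.

```python
import pandas as pd
from sklearn.metrics import classification_report

# Made-up true and predicted labels for three classes.
toy_true = [-1, 0, 0, 1, 1, 1]
toy_pred = [-1, 0, 1, 1, 1, 0]

report = classification_report(
    toy_true, toy_pred, output_dict=True, zero_division=0
)

# Keep only the per-class entries (keys are the labels as strings);
# 'accuracy' is a single float, so it is handled separately.
per_class = pd.DataFrame(
    {label: stats for label, stats in report.items() if isinstance(stats, dict)}
).T

print(per_class[["precision", "recall", "f1-score", "support"]])
print("accuracy:", report["accuracy"])
```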

Feature Importance

importances = model_rf.feature_importances_
std = np.std([tree.feature_importances_ for tree in model_rf.estimators_], axis=0)
indices = np.argsort(importances)[::-1]

# Print the feature ranking
print("Feature ranking:")

for f in range(features_withdummies.shape[1]):
    print("%d. feature %s (%f)" % (f + 1, features_withdummies.columns[indices[f]], importances[indices[f]]))

# Plot the 15 most important features (or fewer if the model has fewer)
top_n = min(15, len(indices))
plt.figure(figsize=(15,6))
plt.title("Feature Importances")
plt.barh(range(top_n), importances[indices][:top_n], align="center")
plt.yticks(range(top_n), features_withdummies.columns[indices][:top_n])
plt.gca().invert_yaxis()
plt.tight_layout(pad=0.4)
plt.show()
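Because the categorical features were one-hot encoded, the importance of a feature such as City is spread across many indicator columns. One way to read the ranking per original feature is to sum the importances of its indicator columns. The helper below, group_importances, is a hypothetical sketch with made-up numbers; it assumes indicator columns are named "<feature>_<category>", as pd.get_dummies does by default, and that no original feature name is a prefix of another one followed by an underscore.

```python
import pandas as pd

def group_importances(importances, encoded_columns, original_columns):
    """Sum one-hot column importances back onto their source feature."""
    series = pd.Series(list(importances), index=list(encoded_columns))
    grouped = {}
    for col in original_columns:
        # A numeric feature keeps its own name; a categorical one
        # appears as several columns named "<col>_<category>".
        mask = [c == col or c.startswith(col + "_") for c in series.index]
        grouped[col] = series.loc[mask].sum()
    return pd.Series(grouped).sort_values(ascending=False)

# Made-up importances for illustration only.
encoded = ["Clo", "Season_Summer", "Season_Winter", "Sex_Female", "Sex_Male"]
values = [0.40, 0.20, 0.10, 0.15, 0.15]

grouped_importances = group_importances(values, encoded, ["Clo", "Season", "Sex"])
print(grouped_importances)
```

In this project the call would be group_importances(importances, features_withdummies.columns, feature_columns).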

Classification Confusion Matrix Visualization

def plot_confusion_matrix(cm, categories, title='Confusion matrix', cmap='Reds'):
    # Draw the matrix as a heat map with one tick per class
    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()
    tick_marks = np.arange(len(categories))
    plt.xticks(tick_marks, categories, rotation=90)
    plt.yticks(tick_marks, categories)
    plt.ylabel('True label')
    plt.xlabel('Predicted label')
    plt.tight_layout()

# Compute confusion matrix: http://scikit-learn.org/stable/auto_examples/model_selection/plot_confusion_matrix.html
sns.set(font_scale=1.4)
cm = confusion_matrix(y_true, y_pred)
np.set_printoptions(precision=2)
print('Confusion matrix, without normalization')
print(cm)
plt.figure(figsize=(12,10))
plot_confusion_matrix(cm, categories)

# Normalize the confusion matrix by row (i.e by the number of samples
# in each class)
cm_normalized = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
print('Normalized confusion matrix')
print(cm_normalized)
plt.figure(figsize=(12,10))
plot_confusion_matrix(cm_normalized, categories, title='Normalized Classification Error Matrix')
plt.show()
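Row normalization divides each row of the confusion matrix by the number of true samples in that class, so the diagonal then reads as per-class recall. As a small check on made-up labels (toy_true and toy_pred are hypothetical), the manual division matches what confusion_matrix returns with normalize="true", a parameter available since scikit-learn 0.22:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Made-up labels for three classes.
toy_true = [0, 0, 0, 1, 1, 2]
toy_pred = [0, 0, 1, 1, 2, 2]

toy_cm = confusion_matrix(toy_true, toy_pred)

# Manual row normalization: each row is divided by its own total.
manual = toy_cm / toy_cm.sum(axis=1, keepdims=True)

# Built-in equivalent: normalize over the true labels (rows).
builtin = confusion_matrix(toy_true, toy_pred, normalize="true")

print(toy_cm)
print(manual)
```

One caveat: if a class never appears among the true labels, its row sums to zero and the manual division produces NaN values.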