In Part 1 of this two-part series, I examined some of the ways that the artificial intelligence (AI) algorithms used in HVAC control systems use data to approximate functions. Simplifying somewhat, the AI algorithms used in our industry can be grouped into two types, classification and regression, both of which fall under the category of supervised learning, i.e., learning to approximate a function by “seeing” labeled data. Classification algorithms are used to predict discrete values, e.g., true or false.
This article focuses on regression-type algorithms. Examples of regression-type algorithms include simple linear regression, multiple linear regression, polynomial regression, support vector regression, decision tree regression, and random forest regression. These algorithms are used to predict continuous variables, e.g., chiller plant energy consumption or cooling coil leaving air temperature. Each of these variables can be approximated as a function of a single input variable or of multiple input variables.
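To make the difference between simple linear regression and polynomial regression concrete, here is a minimal Python sketch using NumPy. It assumes a single input (plant load, in tons) and a made-up, curved relationship between load and plant demand; the data, coefficients, and variable names are hypothetical and do not come from a real plant. The sketch fits a straight line and a second-degree polynomial to the same data and compares their predictions and largest errors.

```python
import numpy as np

# Hypothetical single-input data: plant load (tons) vs. plant demand (kW),
# generated from an assumed curved relationship plus random noise.
rng = np.random.default_rng(seed=2)
load = np.linspace(100, 1000, 50)
demand = 150 + 0.3 * load + 0.0006 * load**2 + rng.normal(0, 8, load.size)

# Simple linear regression (degree 1) vs. polynomial regression (degree 2).
linear = np.polynomial.Polynomial.fit(load, demand, deg=1)
quadratic = np.polynomial.Polynomial.fit(load, demand, deg=2)

for name, model in [("linear", linear), ("quadratic", quadratic)]:
    worst = np.max(np.abs(demand - model(load)))
    print(f"{name}: {model(600):.0f} kW predicted at 600 tons, largest error {worst:.0f} kW")
```

Because the assumed relationship is curved, the degree-2 fit tracks the data more closely here; with real trend data, the appropriate model form has to be found by testing rather than assumed.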
For example, the graph in Figure 1 shows chiller plant energy consumption, denoted y (the output variable), represented as a function of the chiller plant load (the input variable x) and the outside air temperature (the input variable z). Mathematically, this function is written y = f(x, z). Given trend data from the building automation system (BAS) for x and z, along with the measured values of y, one could train an AI algorithm to approximate y as a function of x and z. Evaluating how well the trained algorithm performs requires appropriate metrics. It’s important to note that accuracy is not a metric for regression-type algorithms; it is a metric for classification-type algorithms. Common metrics for regression-type algorithms include mean absolute error (MAE) and root mean squared error (RMSE).
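The training step described above can be sketched with ordinary least squares, the simplest form of multiple linear regression. This is a minimal sketch, assuming NumPy and synthetic data generated from a made-up linear relationship in place of real BAS trend logs; an actual plant will rarely be this linear, and names such as `predict_kw` are hypothetical.

```python
import numpy as np

# Hypothetical synthetic "trend data" standing in for BAS logs:
# x = chiller plant load (tons), z = outside air temperature (deg F).
rng = np.random.default_rng(seed=0)
x = rng.uniform(200, 1000, size=500)
z = rng.uniform(60, 100, size=500)

# Assumed "true" plant demand in kW, used only to generate the example data.
y = 0.8 * x + 2.0 * z + 50 + rng.normal(0, 10, size=500)

# Multiple linear regression: approximate y = f(x, z) as b0 + b1*x + b2*z
# by ordinary least squares (column of ones gives the intercept b0).
X = np.column_stack([np.ones_like(x), x, z])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
b0, b1, b2 = coef

def predict_kw(load_tons, oat_f):
    """Predicted plant demand (kW) from the fitted linear model."""
    return b0 + b1 * load_tons + b2 * oat_f

print(f"y = {b0:.1f} + {b1:.3f}*x + {b2:.2f}*z")
print(f"Predicted demand at 800 tons and 95 F: {predict_kw(800, 95):.0f} kW")
```

Swapping in one of the other regression algorithms changes how f is represented, but the workflow (collect x, z, and y; fit; predict) stays the same.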
MAE measures the average magnitude of the errors in a set of predictions without considering their direction (i.e., whether a prediction error is positive or negative), where the prediction error for an instance is the difference between the actual value and the predicted value. Prediction errors are also called residuals; they measure how far the data points are from the regression line. MAE is the average, over the data sample (e.g., training data from the BMS), of the absolute differences between predicted values and observed values.
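Written as a formula, with y_i the observed value for sample i, ŷ_i the predicted value, and n the number of samples (symbols introduced here for illustration):

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left| y_i - \hat{y}_i \right|$$

Each term y_i - ŷ_i is the residual for sample i; taking its absolute value is what discards the direction of the error.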
RMSE is the square root of the average of the squared differences between predicted values and observed values. RMSE can also be interpreted as the standard deviation of the residuals (strictly, when the residuals average to zero). RMSE indicates how closely the observed data cluster around the predicted function. Like MAE, RMSE is a negatively oriented score: The lower the score, the better the AI algorithm is at predicting values close to the actual values.
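With y_i the observed value for sample i, ŷ_i the predicted value, and n the number of samples (symbols introduced here for illustration), RMSE is:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left( y_i - \hat{y}_i \right)^2}$$

Because the residuals are squared before they are averaged, large misses count for more in RMSE than in MAE, so for any set of predictions RMSE is greater than or equal to MAE.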
Let’s assume we have an AI algorithm that predicts the electric demand (i.e., kW) of a 1,000-ton chilled water plant, and the algorithm has an MAE of 10.5 kW and an RMSE of 20 kW. Further, let’s assume the owner’s contract with the utility company has a provision that reads similarly to: “The billing demand will be the maximum 10-minute demand (either kilowatts or 90% of kilovolt-amperes) as determined by the meter during the monthly billing period.” In other words, the utility company will look at the highest demand (in kW) over any 10-minute period within a billing cycle and use that number to charge the owner. Some utility companies also set a demand threshold, i.e., a kW value above which the owner’s building incurs additional charges. This overall building demand threshold (in kW) could then be translated into a chilled water plant demand threshold. Assuming the 1,000-ton plant operates at peak capacity with an efficiency of 1 kW/ton, i.e., a peak demand of 1,000 kW, an RMSE of 20 kW, or 2%, may give the impression that the AI algorithm has done a good job of approximating the demand of the chiller plant; however, this impression may be false. Because both RMSE and MAE average the errors when scoring the performance of an AI algorithm, there is a risk that some individual predicted values are more than 20 kW from the actual values. This, in turn, may cause the actual plant demand to push the overall building demand (in kW) over the threshold, and the owner will incur additional charges. In this scenario, one will need to improve the performance of the AI algorithm such that the RMSE and the MAE are lowered even further.
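The arithmetic in this example, and the way an averaged score can hide a single large miss, can be checked with a short Python sketch. Only the 1,000-ton capacity, the 1 kW/ton efficiency, and the 20 kW RMSE come from the text; the residuals, the 990 kW plant demand threshold, and the resulting predicted peak are hypothetical values chosen for illustration.

```python
import math

# Figures from the text: 1,000-ton plant at 1 kW/ton, model RMSE of 20 kW.
peak_kw = 1000 * 1.0
stated_rmse_kw = 20.0
print(f"RMSE relative to peak demand: {stated_rmse_kw / peak_kw:.0%}")  # 2%

# Hypothetical residuals (actual - predicted, kW) for ten 10-minute intervals.
residuals = [5, -8, 12, -3, 6, -10, 4, 55, -7, 2]
mae = sum(abs(r) for r in residuals) / len(residuals)
rmse = math.sqrt(sum(r * r for r in residuals) / len(residuals))
worst_kw = max(residuals, key=abs)  # largest single miss, sign preserved
print(f"MAE = {mae:.1f} kW, RMSE = {rmse:.1f} kW, worst miss = {worst_kw} kW")

# Hypothetical plant demand threshold translated from the building threshold.
threshold_kw = 990.0
predicted_peak_kw = threshold_kw - 2 * stated_rmse_kw  # 950 kW, seemingly safe
actual_peak_kw = predicted_peak_kw + worst_kw          # 1,005 kW
print(f"Actual peak {actual_peak_kw:.0f} kW over threshold: {actual_peak_kw > threshold_kw}")
```

Here the averaged scores (MAE of about 11 kW, RMSE of about 19 kW) are in the same range as the example above, yet one 10-minute interval was missed by 55 kW, enough to push the plant past the hypothetical threshold. Reporting the maximum absolute error alongside MAE and RMSE is one simple way to expose this.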
No matter what AI algorithm one chooses to use, it will only be as good as the data used to train it. AI algorithms, in particular machine learning (ML) algorithms in the supervised learning category, do not recognize outliers in the data, nor will they understand what caused the outliers to occur. Figure 4 shows an example of outlier data; in this case, the function that the AI algorithm is trying to approximate is negatively impacted by the outliers. Outlier data may be caused, for example, by a pump’s variable frequency drive (VFD) left in hand mode or by a chiller control valve actuator that has failed open. To identify and compensate for outlier data, one can manually process the data, identify outliers, and then remove them from the training data. An alternative to manual processing of data is to use anomaly detection algorithms. These algorithms are programmed to identify variables that are operating outside of an "expected" range of operation and, depending on the amount of variation, may issue an alarm to a building operator. In addition to identifying anomalies in data, a strong anomaly detection algorithm typically provides a list of potential "causes" and solutions to "fix" the anomaly. For example, an algorithm may detect that a chilled water pump has been operating at a horsepower outside of its expected range for more than 30 minutes. The algorithm may then issue an alarm at the BMS and provide the building engineer with a few action items: 1) check whether the VFD is in hand mode; 2) check the chiller control valves; and 3) check the leaving air temperature setpoints at the air-handling units served by the chiller plant. The building engineer will then need to work through the action items one at a time until the cause of the pump operating at a higher horsepower is identified.
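The "outside an expected range for more than 30 minutes" rule in this example can be sketched as a simple persistence check. This is a minimal, rule-based stand-in written in Python under stated assumptions: a fixed expected band of 20 to 30 hp, 5-minute trend samples, and hypothetical names such as `flag_persistent_out_of_range`. Commercial anomaly detection algorithms are typically more sophisticated, and this is not how any particular product works. Flagged samples are then dropped before training.

```python
from dataclasses import dataclass

@dataclass
class Sample:
    minute: int     # minutes since the start of the trend log
    pump_hp: float  # chilled water pump power draw, hp

def flag_persistent_out_of_range(samples, low_hp, high_hp, min_minutes=30):
    """Return indices of samples in out-of-range runs lasting longer than min_minutes.

    A simple rule-based check for illustration, not a full anomaly detector.
    """
    flagged, run = [], []

    def close_run():
        # Keep the run only if the out-of-range readings persisted long enough.
        if run and samples[run[-1]].minute - samples[run[0]].minute > min_minutes:
            flagged.extend(run)
        run.clear()

    for i, s in enumerate(samples):
        if low_hp <= s.pump_hp <= high_hp:
            close_run()
        else:
            run.append(i)
    close_run()  # a run may last until the end of the log
    return flagged

# Hypothetical 5-minute trend: expected 20 to 30 hp, then a sustained excursion
# (e.g., a VFD left in hand mode) and, later, a single short spike.
hp = [25, 26, 24, 40, 41, 40, 40, 41, 40, 40, 43, 25, 24, 38, 26]
log = [Sample(minute=5 * i, pump_hp=v) for i, v in enumerate(hp)]
bad = set(flag_persistent_out_of_range(log, low_hp=20, high_hp=30))
training = [s for i, s in enumerate(log) if i not in bad]
print(f"Flagged sample indices: {sorted(bad)}; {len(training)} samples kept for training")
```

The sustained excursion is removed while the single 38 hp reading is kept; whether short spikes should also be removed is a judgment call for whoever prepares the training data.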
Just because the anomaly detection algorithm provides a list of potential causes, it doesn’t mean the algorithm has performed some sort of causal analysis. Usually, the list of potential causes behind an anomaly is generated based on past experience. One could reasonably infer that neither an AI algorithm nor an anomaly detection algorithm is a causal algorithm.
Causal analysis is a controversial topic. Causal analysis attempts to draw valid inferences about cause-and-effect relationships from measured data. Most of us are taught, early in our college programs, that correlation does not imply causation; just because the outside air temperature has increased from 75°F to 95°F, it does not necessarily follow that this increase is the direct cause of an increase in the energy consumption of a chilled water plant. The two variables, y (plant energy consumption) and z (outside air temperature), are at minimum correlated, but which one causes the other? To attempt to identify a "causal relationship" between our variables, one will need to draw a causal model similar to the one shown in Figure 5. In this graph, the outside air temperature may have a direct effect on the building load (variable x), which, in turn, has a direct effect on y, meaning variable z has both a direct effect and an indirect effect (through x) on y. However, each of the three variables (x, y, z) is also affected by latent variables (V1, V2, and V3). Latent variables, which can act as confounding variables, are those one may choose not to measure (or cannot measure) but must still account for in a causal model.
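One common way to formalize a diagram like the one in Figure 5 is a linear structural causal model. The Python sketch below simulates such a model with made-up coefficients and treats V1, V2, and V3 as independent random disturbances; that independence is an assumption, and in a real building a shared latent variable would confound the estimates. The sketch shows that regressing y on z alone recovers the total effect of z (direct plus indirect through x), while adding x to the regression isolates the direct effect, but only because the causal graph is assumed to be known.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
n = 10_000

# Hypothetical linear structural causal model for the diagram:
# z -> x, z -> y, and x -> y, with V1, V2, V3 as independent latent disturbances.
V1 = rng.normal(0, 5, n)
V2 = rng.normal(0, 50, n)
V3 = rng.normal(0, 20, n)

z = 80 + V1                 # outside air temperature, deg F
x = 20.0 * z - 900 + V2     # building load, tons (direct effect of z: 20 tons/F)
y = 0.7 * x + 3.0 * z + V3  # plant energy use, kW (direct effects of x and z)

def ols(columns, target):
    """Ordinary least squares coefficients, intercept first."""
    A = np.column_stack([np.ones(len(target))] + columns)
    return np.linalg.lstsq(A, target, rcond=None)[0]

# Regressing y on z alone captures the total effect: 3.0 + 0.7 * 20.0 = 17 kW/F.
total_effect = ols([z], y)[1]
# Adding x to the regression blocks the path through x, leaving the direct effect.
direct_effect = ols([x, z], y)[2]

print(f"corr(z, y) = {np.corrcoef(z, y)[0, 1]:.2f}")
print(f"Total effect of z on y:  {total_effect:.1f} kW per F")
print(f"Direct effect of z on y: {direct_effect:.1f} kW per F")
```

The correlation between z and y is strong either way; it is the assumed causal graph, not the data alone, that determines which regression answers which question.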
In general, causal models are used in the social sciences, medicine, and similar fields where the number of variables being analyzed is relatively small. In HVAC systems, we are confronted with hundreds of variables, sometimes thousands. These variables interact with other variables within the same system and across multiple systems; for example, the air-handling unit (AHU) supply air setpoint affects the zones the AHU serves as well as the energy performance of the chilled water plant. One will need to design and use relatively complex causal algorithms to attempt to identify the causal effect of one variable across multiple systems.