One of the aims of my Tiny Dishwasher project was to look at how Matter's Device Energy Management works.
Part of this protocol is understanding how a device might forecast its own energy usage. For some appliances, this is easy. Take my Tiny Dishwasher. It runs a program of the user’s choosing and that program will be reasonably predictable. 20 minutes wetting all the dishes. 30 minutes heating water. 20 minutes blasting the dishes etc. etc. All parts of the cycle will be known and repeatable.
Other appliances are less predictable. Take a solar inverter. Its output will vary day to day, hour to hour, minute to minute.
Or take a heat pump. Its output (and therefore its input) will vary depending on weather conditions & schedule.
As part of my Tiny Matter Appliances project, I want to add a heat pump emulator. Before writing the code, I thought it would be interesting to build something to forecast a heat pump’s power demand.
You essentially want an idea of power consumption for a given outside temperature. Then, armed with a forecast of the day’s temperatures, you know how much power will be required at each part of the day.
Something that’s very good at making predictions like that is machine learning (ML).
Machine Learning
Armed with some historical data, an ML model can be trained to make future predictions. ML is used pretty widely, from thermostats and calendars to Netflix recommendations and fraud prevention.
I’ve done precisely one piece of ML at this point in my life. I trained an iPhone how to recognise a radiator. Whilst that was surprisingly straightforward, it’s iOS only. I wanted something a little more open source and a little more portable.
From what I understand at this point, my model needs to know what sorts of questions I’m going to ask. As I’m interested in power forecasting, I only need to ask one question:
If it’s -5°C outside, how much power will the heat pump consume to keep a house at 21°C inside?
Obviously, I would need to ask that question quite a few times across a day, as the outside temperature changes. Hopefully you get the idea.
To answer that question, I need information about the heat pump’s input and output at different temperatures. The more data, the more accurate the model will be. This data is called a training set.
Python seems to be the programming language of choice when it comes to ML. The internet is filled with guides on how to load and analyse data to build models. I’m okay with Python so it seemed like a good starting point!
Unfortunately, I don’t have a heat pump (yet), so I don’t have historical data of my own.
Fortunately for me, people have been sharing their heat pump data for years. Enter Heat Pump Monitor!
HeatPumpMonitor.org
Heat Pump Monitor is a website where people share their heat pump’s data. It’s powered by Open Energy Monitor, an open-source hardware and software pairing.
At the top level, it has the location, make and model of the heat pump and its rating. It also logs data like temperatures (indoor, outdoor, flow and return), power input and heat output. To name a few!

All the historical data you could ever want for heat pumps is available. The cherry on top is that there is a public API available at https://heatpumpmonitor.org/api-helper. This API gives access to all historical data and it’s what I’m going to use to build a model.
The first endpoint of note is /system/list/public.json. It returns all the public systems in their database and an entry looks something like this (in JSON):
{
"id": 576,
"userid": 574,
"published": 1,
"last_updated": 1738081331,
"location": "Folkestone",
"installer_name": "Heat Geek",
"installer_url": "https://www.heatgeek.com/",
"url": "https://emoncms.org/app/view?name=VaillantArothermPlus5kW&readkey=52f63b7f22950aac289b6bb36d3129dd",
"share": 1,
"hp_model": "Arotherm+",
"hp_type": "Air Source",
"hp_output": 5,
"refrigerant": "R290",
"dhw_method": "Cylinder with plate heat exchanger",
"cylinder_volume": 210,
"dhw_coil_hex_area": 3,
"new_radiators": 1,
"old_radiators": 1,
"fan_coil_radiators": 0,
"UFH": 0,
"hydraulic_separation": "None",
"flow_temp": 48,
"flow_temp_typical": null,
"wc_curve": 0.4,
"freeze": "Anti-freeze valves",
"zone_number": 1,
"space_heat_control_type": "Weather compensation with a little room influence",
"dhw_control_type": "Daily scheduled heat up of tank",
"dhw_target_temperature": 55,
"legionella_frequency": "Disabled",
"legionella_target_temperature": 0,
"property": "Detached",
"floor_area": 180,
"heat_demand": 12591,
"heat_loss": 5.49,
"age": "1983 to 2011",
"insulation": "Fully insulated walls, floors and loft",
"electricity_tariff": null,
"electricity_tariff_type": null,
"electricity_tariff_unit_rate_all": null,
"electricity_tariff_unit_rate_hp": null,
"solar_pv_generation": 0,
"solar_pv_self_consumption": 0,
"solar_pv_divert": 0,
"battery_storage_capacity": 0,
"electric_meter": "SDM120 Modbus/MBUS Single Phase (class 1)",
"heat_meter": "Heat pump integration",
"metering_inc_boost": 0,
"metering_inc_central_heating_pumps": 1,
"metering_inc_brine_pumps": 0,
"metering_inc_controls": 0,
"notes": "EBUSD integration for all data apart from Electric (SDM120). Mixergy Cylinder with plate heat exchanger. Flow and return sensor difference of 0.1 °C corrected in return temperature feed. ",
"mid_metering": 0,
"kwh_m2": null,
"design_temp": -2,
"water_heat_demand": 4585,
"EPC_spaceheat_demand": null,
"EPC_waterheat_demand": null,
"heatgeek": 1,
"indoor_temperature": 1,
"betateach": "",
"emoncmsorg_userid": 0,
"installer_logo": "heatgeek.png",
"ultimaterenewables": 0,
"heatingacademy": 0,
"youtube": "",
"data_flag": 0,
"data_flag_note": "",
"hp_max_output": 6.5,
"measured_base_DT": 0,
"measured_design_DT": 0,
"measured_heat_loss": 0,
"measured_heat_loss_range": 0,
"hp_max_output_test": 4.8,
"volumiser": 0,
"legionella_immersion": 0,
"metering_inc_immersion": 0,
"uses_backup_heater": 0,
"metering_inc_secondary_heating_pumps": 0,
"weighted_average_flow_minus_outside": "0",
"measured_mean_flow_temp_coldest_day": 36.44,
"measured_max_flow_temp_coldest_day": 0,
"measured_outside_temp_coldest_day": 1.75,
"measured_emitter_spec": "0",
"measured_system_volume": "0",
"weighted_average_ideal_carnot": "0",
"weighted_flow_temp": "0",
"measured_room_temp_coldest_day": 20.77,
"latitude": 51.0791,
"longitude": 1.17941,
"hp_manufacturer": "Vaillant"
}
There is quite a lot of data in there, but you can find things like installer, manufacturer, outputs, temperatures etc.
Whilst this is great for identifying the types and locations of heat pumps, it’s not suitable to train a model. It doesn’t contain the power or temperature information I need. Thankfully, the API has some timeseries endpoints.
Starting with /timeseries/available, we can fetch a list of all the different time series available for a heat pump. Timeseries is a word that sounds complicated, but it’s just a value against a point in time. If we run that endpoint for the system above (576), we get back this list:
{
"feeds": {
"heatpump_elec": {
"start_time": 1718373650,
"end_time": 1757396690,
"interval": 10,
"npoints": 3902305
},
"heatpump_elec_kwh": {
"start_time": 1718381640,
"end_time": 1757396690,
"interval": 10,
"npoints": 3901506
},
"heatpump_heat": {
"start_time": 1737458950,
"end_time": 1757396680,
"interval": 10,
"npoints": 1993774
},
"heatpump_heat_kwh": {
"start_time": 1737383660,
"end_time": 1757396680,
"interval": 20,
"npoints": 1000652
},
"heatpump_flowT": {
"start_time": 1737145810,
"end_time": 1757396680,
"interval": 10,
"npoints": 2025088
},
"heatpump_returnT": {
"start_time": 1737145810,
"end_time": 1757396680,
"interval": 10,
"npoints": 2025088
},
"heatpump_flowrate": {
"start_time": 1737459250,
"end_time": 1757396680,
"interval": 10,
"npoints": 1993744
},
"heatpump_roomT": {
"start_time": 1737557730,
"end_time": 1757396680,
"interval": 10,
"npoints": 1983896
},
"heatpump_outsideT": {
"start_time": 1737145750,
"end_time": 1757396680,
"interval": 10,
"npoints": 2025094
},
"heatpump_dhw": {
"start_time": 1737383090,
"end_time": 1757396680,
"interval": 10,
"npoints": 2001360
},
"heatpump_ch": {
"start_time": 1737383270,
"end_time": 1757396680,
"interval": 10,
"npoints": 2001342
},
"heatpump_targetT": {
"start_time": 1737147640,
"end_time": 1757396680,
"interval": 10,
"npoints": 2024905
}
},
"start_date": 1737504060
}
This system appears to have started logging on the 22nd of January, 2025 (the start_date value, 1737504060, is Unix epoch time).
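As a quick check on that date, the start_date value can be converted with Python's standard library. This is only a minimal sketch; it assumes the timestamp is in seconds and interprets it as UTC.

```python
from datetime import datetime, timezone

# start_date from the /timeseries/available response for system 576
start_date = 1737504060

# Assumption: the value is seconds since the Unix epoch, interpreted as UTC
started = datetime.fromtimestamp(start_date, tz=timezone.utc)
print(started.isoformat())  # 2025-01-22T00:01:00+00:00
```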
You can see there are different headings, like heatpump_elec, heatpump_heat, heatpump_flowT and heatpump_outsideT. These represent different slices of the data. From the names, I can guess what they represent. Let’s look at heatpump_elec. This should return the input of the heat pump.
To access this, we use the /timeseries/data endpoint. We set the dates and the feeds and set the interval to 3600 (I’m guessing this is 1 hour in seconds):
timeseries/data?id=576&feeds=heatpump_elec&start=22-01-2025&end=22-07-2025&interval=3600&average=1&timeformat=notime
It returned this
{
"heatpump_elec": [
961.9548570033483,
790.3271179199219,
976.7832861908574,
786.7259376371971,
901.8389181657271,
682.6818187020042,
672.9968927243335,
655.919999759298,
692.3785302113679,
704.5058650011457,
751.9529911052127,
747.5420613833456,
747.4431091487583,
742.6933523189126,
755.602000906808,
733.6570210716444,
786.6092688742648,
1043.6349173604442,
1427.6960215568542,
2226.885254876462,
1778.6068580845424,
1456.4853881573608,
1332.1820511777178,
828.7937511097301,
859.0935949469011,
1028.7005701119558,
...<truncated>...
}
I’m guessing this is the wattage of the heat pump at hourly intervals.
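If those values really are average watts for each hour (a guess, the response itself doesn't say), then each one is also the watt-hours used in that hour, and adding them up gives energy. A small sketch using the first three values from the heatpump_elec response above:

```python
# First three hourly values from the heatpump_elec response above
hourly_watts = [961.9548570033483, 790.3271179199219, 976.7832861908574]

# Assumption: each value is the mean power (W) over a 3600-second interval,
# so mean power multiplied by 1 hour gives watt-hours for that hour.
interval_hours = 3600 / 3600
energy_kwh = sum(w * interval_hours for w in hourly_watts) / 1000

print(f"{energy_kwh:.3f} kWh over {len(hourly_watts)} hours")  # 2.729 kWh over 3 hours
```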
The API also appears to allow multiple timeseries to be requested. I tried asking for the input (heatpump_elec) and the output (heatpump_heat). Sure enough, it returned both values.
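Requesting several feeds in one call can be sketched like this. The query parameters are the ones from the request shown earlier; the base URL and the comma-separated feeds value are assumptions, and build_timeseries_url is just a hypothetical helper name, not something the API provides.

```python
from urllib.parse import urlencode

# Assumption: the timeseries endpoint lives on the same host as the system list
BASE_URL = "https://heatpumpmonitor.org/timeseries/data"

def build_timeseries_url(system_id, feeds, start, end, interval=3600):
    """Build a /timeseries/data URL (hypothetical helper, not part of the API)."""
    params = {
        "id": system_id,
        # Assumption: multiple feeds are passed as one comma-separated value
        "feeds": ",".join(feeds),
        "start": start,  # dd-mm-yyyy, as in the request shown earlier
        "end": end,
        "interval": interval,
        "average": 1,
        "timeformat": "notime",
    }
    # safe="," keeps the comma readable instead of encoding it as %2C
    return f"{BASE_URL}?{urlencode(params, safe=',')}"

url = build_timeseries_url(576, ["heatpump_elec", "heatpump_heat"],
                           "22-01-2025", "22-07-2025")
print(url)
```

Building the URL in one place makes it easy to reuse the same request for every system by changing only the id.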
I was beginning to get a clear idea of what I needed based on the question I needed to ask.
Step one would be identifying all the systems of the same type. Let’s pick a 5kW Vaillant aroTHERM as they are pretty popular. The exact make and model isn’t important as it’s just for demonstration purposes.
Building the Training Set
Now that I had found a rich source of data, I needed to get it into a format I could use with Python ML. The most common format I could see in the examples was CSV (comma separated values).
I needed to take the HeatPumpMonitor JSON data from the API and flatten it into a CSV file. This would put each related value alongside each other, rather than separate like in the JSON.
To accomplish this, I started building a Python script. It fetches the public systems and then filters by manufacturer, model and output.
import urllib.request, json
# Fetch the list of all public systems
endpoint = "https://heatpumpmonitor.org/system/list/public.json"
with urllib.request.urlopen(endpoint) as url:
    data = json.loads(url.read().decode())
    #print(data)
# Keep only the 5kW Vaillant Arotherm+ systems
vaillant_systems_dict = [x for x in data if x['hp_manufacturer'] == 'Vaillant'
                         and x['hp_model'] == 'Arotherm+'
                         and x['hp_output'] == 5]
print(len(vaillant_systems_dict))
This script outputs a value of 84, which I’m hoping means we have 84 5kW Arotherms available.
We now need to iterate through these 84 systems and pull down their time series.
I started by opening a CSV file and adding a header row:
import csv
# newline='' stops the csv module writing blank lines between rows on Windows
with open('trainingset.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile, delimiter=',', quoting=csv.QUOTE_MINIMAL)
    writer.writerow(['Input', 'TargetTemperature', 'OutsideTemperature'])
I then made a call to the data endpoint, starting from the 1st of Jan, 2023 up to today, 6th of September 2025.
I wrote all the results, row by row, into the CSV, being careful to omit null values:
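# Runs inside the "with open(...)" block; time_series_data is the parsed
# /timeseries/data response for one system
for i in range(len(time_series_data['heatpump_elec'])):
    elec = time_series_data['heatpump_elec'][i]
    # The measured room temperature goes into the TargetTemperature column
    roomT = time_series_data['heatpump_roomT'][i]
    outsideT = time_series_data['heatpump_outsideT'][i]
    # Skip any interval where one of the three readings is missing
    if elec is not None and roomT is not None and outsideT is not None:
        writer.writerow([elec, roomT, outsideT])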
After running this, I got a few errors about missing data but ended up with almost 900,000 rows of data! More than enough for me to move onto the model part.
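Those errors probably come from systems that don't publish every feed (an assumption, I haven't dug into them). One way to make the flattening step tolerant of that is a small helper. flatten_rows is a hypothetical name, and the sample data below is made up purely to show the behaviour.

```python
def flatten_rows(time_series_data,
                 feeds=("heatpump_elec", "heatpump_roomT", "heatpump_outsideT")):
    """Return one row per interval, skipping intervals with any null value."""
    # If a system lacks one of the feeds, contribute no rows rather than raising
    if any(feed not in time_series_data for feed in feeds):
        return []
    columns = [time_series_data[feed] for feed in feeds]
    # zip stops at the shortest feed, so uneven lengths can't raise IndexError
    return [list(row) for row in zip(*columns)
            if all(value is not None for value in row)]

# Made-up sample data, not real HeatPumpMonitor values
sample = {
    "heatpump_elec": [900.0, None, 650.0],
    "heatpump_roomT": [20.5, 20.6, 20.4],
    "heatpump_outsideT": [1.0, 1.5, None],
}
print(flatten_rows(sample))                      # [[900.0, 20.5, 1.0]]
print(flatten_rows({"heatpump_elec": [900.0]}))  # []
```

Each returned row can be passed straight to writer.writerow, so the CSV layout stays the same.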
Building and Training a Model
I had now reached the point where I was at sea. I knew what I wanted to be able to do, I just didn’t know how to do it.
I had three columns of data and I knew the values I wanted to input and the value I wanted to get back.
After a little googling, I settled on multiple regression as the approach I needed. I followed this guide – https://www.w3schools.com/python/python_ml_multiple_regression.asp
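For this training set, multiple regression boils down to fitting a straight-line relationship with two inputs. In my own notation (the symbols are mine, not the guide's), the model being fitted looks like this:

```latex
\hat{P}_{\text{in}} = \beta_0 + \beta_1 \, T_{\text{target}} + \beta_2 \, T_{\text{outside}}
```

Here \hat{P}_{\text{in}} is the predicted input power in watts, T_{\text{target}} and T_{\text{outside}} are the two temperature columns from the CSV, and the \beta values are the coefficients learned from the data.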
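My train_model.py script looked like this:
import pandas
from sklearn import linear_model
df = pandas.read_csv("trainingset.csv")
# Features: the two temperatures. Target: the heat pump's input power.
X = df[['TargetTemperature', 'OutsideTemperature']]
y = df['Input']
regr = linear_model.LinearRegression()
regr.fit(X, y)
# Use the same column names as the training data to avoid a feature-name warning
query = pandas.DataFrame([[21, -5]], columns=['TargetTemperature', 'OutsideTemperature'])
predictedInput = regr.predict(query)
print(predictedInput)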
This reads the trainingset.csv file and splits it into TargetTemperature, OutsideTemperature and Input.
I then ask it to spit out the predictedInput at 21°C inside and -5°C outside.
On first run, after a few seconds, it printed a value of 914W.
I ran it again using 21 and 10 and I got an output of 343W.
I guess that means it’s working!
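Because the model is linear and both predictions used 21°C inside, those two numbers are enough to sketch a forecast across a day without loading the model at all. This is only an illustration: the reported values are rounded, the hourly temperatures are invented, and predicted_input is a hypothetical helper name.

```python
# The two predictions reported above, both at 21°C inside
p_cold, t_cold = 914.0, -5.0   # W at -5°C outside
p_mild, t_mild = 343.0, 10.0   # W at 10°C outside

# A linear model changes by a fixed amount per degree of outside temperature
watts_per_degree = (p_mild - p_cold) / (t_mild - t_cold)

def predicted_input(outside_temp):
    """Interpolate input power (W) along the line through the two predictions."""
    return p_cold + watts_per_degree * (outside_temp - t_cold)

# Hypothetical outside temperatures (°C), one per hour, invented for illustration
forecast = [-2.0, -1.0, 0.0, 2.0, 4.0, 3.0]
hourly_watts = [predicted_input(t) for t in forecast]
energy_kwh = sum(hourly_watts) / 1000  # each value covers one hour

print(f"{watts_per_degree:.1f} W per °C")  # -38.1 W per °C
for temp, watts in zip(forecast, hourly_watts):
    print(f"{temp:5.1f} °C -> {watts:4.0f} W")
print(f"{energy_kwh:.2f} kWh over {len(forecast)} hours")
```

In other words, the model thinks every extra degree outside saves roughly 38W of input power, which is the kind of per-hour figure an energy forecast needs.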
Summary
I’m quite surprised how easy that was. It didn’t take very long to write some basic Python capable of predicting the power input of a Vaillant aroTHERM.
To be honest, a value of 914W for an outside temp of -5°C seems low. That said, I lack context. It doesn’t factor in heat loss, flow temperature, size of property, time of day (solar gain!), hot water and numerous other factors. That data is available from HeatPumpMonitor and I’ll add it into the model in due course.
I want to run this prediction on an ESP32, so my next step is understanding how that might work!
All of the code is available here – https://github.com/tomasmcguinness/ml-python-heatpump-model
