One always makes mistakes along the way when developing a system. In my journey toward a robust, reliable and long-term home automation system, I've learned some lessons. Some of these cost me only a minor inconvenience, such as device control failing. Others required a complete recovery of the system. Hopefully this information will save someone from having to learn these lessons the hard way by avoiding the scenarios which caused them.
Don't Abuse The Recorder Database
We have to talk about the database. Let me reiterate: Home Assistant is an amazing, stable, active, high quality product made by a talented team of dedicated employees and, mostly, volunteers. If I had the Python knowledge to contribute, I would; I hope that documenting my Home Assistant story is a worthwhile contribution in its own right. That said, the database used to be crap.
However, as is typical in a fast moving project subject to continual improvement, significant changes were made to the database schema during the considerable time it took me to write this up. Starting in release 2022.4, Home Assistant moved to storing entity attributes relationally, eliminating a large amount of redundant data and improving query speed dramatically. This continued in release 2022.5, which optimized the frequency of database reads/writes (especially important for SD card users). Release 2022.6 changed event storage to a relational model as well.
You can read about the database structure in the HA Data Science Documentation.
These changes essentially nullify most of my criticisms below, thankfully! There are still, unfortunately, a lot of VARCHAR(255) columns, and states are not stored as their native data types but instead as the aforementioned VARCHAR. However, removing the massive JSON blob of redundant attribute data is probably the most important improvement that could have been made.
So take what is below as a historical overview of a previously huge problem with Home Assistant, from the beginning to 2022.4, which the devs have since begun actively fixing based on feedback from the community. If this isn't a net positive and a perfect example of why it is important for a platform like Home Assistant to be community developed and open, then I don't know what is. And maybe some of this information will be useful to someone just beginning a database design of their own.
Home Assistant ships with SQLite as the engine managing the internal database to which the Recorder integration saves data. Primarily entity state data is saved to the database, such as a temperature sensor reading, or whether a light is on/off. This data is saved to the "states" table. Unfortunately, this table used to be treated like a flat file with each entity state change stored as a row containing the full entity name, the state, a date stamp, and a JSON string containing detailed attributes. It looked something like this example from the "states" table:
state_id (int(11)) | domain (varchar(64)) | entity_id (varchar(255)) | state (varchar(255)) | attributes (text) | event_id (int(11)) | last_changed (datetime) | created (datetime) | last_updated (datetime) | old_state_id (int(11)) |
---|---|---|---|---|---|---|---|---|---|
21555648 | sensor | sensor.power_grid_voltage_phase_1 | 123.71 | {"unit_of_measurement": "V", "friendly_name": "Power Grid Voltage Phase 1"} | 21735303 | 2020-10-11 12:29:25 | 2020-10-11 12:29:25 | 2020-10-11 12:29:25 | 21555638 |
Upon viewing that, everyone who has worked with databases before is in full facepalm mode.
And therein lies the problem. Every state change (think about a power monitoring sensor posting 10 readings every 2 seconds) added another row to the DB full of redundant entity-name varchar data and JSON attribute text data. This had several effects:
It was an absolute horror of repeated data in varchar fields and JSON encoded data in text fields.
How much did this bloat the database? Just search the Home Assistant Community for "database size" to see some examples. All it took was a few dozen entities and a 30 day database purge time to have a multi-gigabyte database. And if you were trying to save more frequently occurring data for a longer period (for example, the completely reasonable goal of saving 60 days of power monitoring data updated every few seconds) then a 20GB+ database was inevitable.
This meant the database was not an option for storing long term, frequently occurring data, so users had to tightly control how much data was written and go through extra steps to migrate data to another database for long term retention. If this wasn't done, the database was a ticking time bomb which would eventually grow to fill the drive. It was worse for someone running Home Assistant on an SD card based system (a Raspberry Pi, for example): frequent database writes will quickly burn out the SD card, and there are many, many examples of this. In fact, it happened to me early on.
The database needed to be redesigned into a proper relational database. Here's my very quick and dirty example of how that could have been done: the "states" table could contain 5 columns: state_id (int), entity_id (int), StateNumeric (int), StateText (text), LastChanged (datetime). entity_id should link to an "entities" table which contains the entity_id, a text representation of the entity name, and the rest of the entity metadata but NOT attributes. Then an "attribute_names" table contains a master list of all possible attribute names in varchar fields, with an attribute_name_id. Then a table to link those two, and another table to store the attribute values, linked to their friendly name and state ID. A proper relational database: no redundant information, native database types, elimination of all JSON. The result would be a database multiple orders of magnitude smaller, and multiple orders of magnitude faster.
This would have required the entire database abstraction of Home Assistant to be rewritten, but should not have required any changes above the abstraction layer.
I had previously stated: I hope in the future this is considered, because until then, in every Home Assistant installation the database must be managed lest it become an inevitable failure point, making the database useless for long term storage, trend analysis and the other things one would think are key components of a home automation system.
An interesting comparison I made in the past involved a correctly set up database using native data types. I had built an energy monitor posting 10 numeric values to Home Assistant via MQTT every 2 seconds. Within 1 week, my Home Assistant database had increased in size by 8GB. History, graphs and the logbook were useless, as all this JSON data needed to be churned through, and I was unwilling to wait the 5+ minutes that takes. It also had other effects on HA, which included slowing automations, making camera displays take several minutes to load (or never load at all), and causing the system to essentially lock up during recorder maintenance. So I changed the energy monitor to post the same data to a MySQL database directly, using native data types. 1 week of saved data was approximately 25MB. I then used SQL sensors in Home Assistant to take a 10 second average and display those values. That also bloated the Home Assistant database by about 4GB after a week or two, but at least it was tolerable for display.
Don't Run HA On Raspberry Pi With SD Cards
This may come as a shock to you, since Home Assistant heavily suggests using a Raspberry Pi at nearly every opportunity, but just don't do it.
If you install HA on a Raspberry Pi (or other SBC such as OrangePi, Banana Pi, etc.) to an SD card, you have created a ticking time bomb that will eventually fail. Flash memory (as used in SD cards and many other devices like SSDs) has a limited number of erase/write cycles per cell. Home Assistant makes constant small writes due to the recorder database, logging and state storage (in .storage). This is therefore murder on SD card storage.
Many SD cards include wear levelling to ensure that the same cells are not overwritten constantly, but many do not. It can sometimes be hard to tell, because even the same SD card model may change between revisions. You can absolutely mitigate this by buying high endurance or industrial SD cards, which include lots of extra flash that can be used by more advanced wear levelling. Just be careful, because many of these cards are designed for surveillance camera applications and are thus optimized for continuous full block overwrites of the entire card, not many small constant writes.
Having burned out multiple SD cards, ranging from no-name junk drawer cards to expensive brand name high endurance industrial rated cards, in SBC (Raspberry Pi, OrangePi, etc.) applications involving a database, I'd recommend you avoid the hassle: just don't do it.
Wear aside, it is well known that SD cards on the RasPi have the propensity to become corrupted if power is interrupted. Indeed the Home Assistant Community is full of stories of corrupt SD cards after unexpected power loss in addition to those users who have burned out their cards with excessive writes.
I'd go one step further and just advise to avoid RasPi style SBCs entirely unless you know for absolute certain that your Home Assistant installation will never be much more than the basics. It is almost inevitable that you will outgrow the limited resources of the SBC and will need to migrate to something with more horsepower eventually so you may as well just start properly from the beginning. That 8 year old laptop sitting in the closet is going to be faster and more reliable than a Raspberry Pi (and it has a built in UPS!). Looking at it another way, by the time you have purchased a Raspberry Pi, a quality SD card, a case and a quality power supply (which is extremely critical) you are spending more money than just buying a used Lenovo Tiny from eBay. And unlike the RasPi you don't need to jump through hoops to avoid using the SD card.
Those inexpensive SBCs are quite useful for exploring Home Assistant or testing a configuration, though. But then, so is a virtual machine.
Don't Use ENC28J60 Chip Based Ethernet Modules
If you are building your own hardware, save yourself a lot of hassle and don't try to use ENC28J60 Ethernet modules. My experience with them echoes that of others; these modules almost universally suck. Every piece of hardware that I have built with them has experienced freezing issues in short order, and in multiple cases these issues progressed to total failure of the module to do anything beyond establishing link at the physical layer.
Thing is, the cause is not a single problem but a combination of factors.
There are well known errata with the chip itself. Both the UIPEthernet and EthernetENC libraries implement the "errata 14" fix, which I can only assume is effective; I've not spent any time debugging the code to determine this.
Many of the modules are simply poor quality. They don't implement the required number of filter capacitors, or use capacitors of too small a value. Often the voltage regulators are generic "jellybean" (common) parts of questionable quality, or under specced. I've even seen PCBs that were not cleaned of flux or that have just awful soldering jobs. Some of the modules use the Arduino's built in 3.3V regulator, which cannot supply the necessary current (especially on the Nano, which gets its 3.3V from the USB to serial chip).
Save yourself a lot of time and trouble by using WIZnet W5100/W5200/W5500 based modules. They cost a dollar or two more, but that is a premium well paid for a part that simply works. They also have far better libraries, with wide support for most MCUs (Atmel AVR, ESP8266, ESP32, STM32, etc.). Even then you must be careful, because there are some W5100 modules out there with the wrong value resistors installed on the TX/RX lines before the magnetics, which means these modules may not establish link reliably depending on the switch at the other end (some are more sensitive than others).
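As an illustration, the stock Arduino Ethernet library (which supports the W5100/W5200/W5500) gets one of these modules online in a handful of lines. This is a minimal sketch; the MAC address and fallback IP are placeholder assumptions you would replace with your own:

```cpp
#include <SPI.h>
#include <Ethernet.h>

// Placeholder MAC; use the address supplied with your module or a
// locally administered one of your choosing.
byte mac[] = { 0xDE, 0xAD, 0xBE, 0xEF, 0xFE, 0xED };

void setup() {
  Serial.begin(115200);
  // With only a MAC, Ethernet.begin() attempts DHCP and returns 0 on failure.
  if (Ethernet.begin(mac) == 0) {
    Serial.println(F("DHCP failed, falling back to a static address"));
    Ethernet.begin(mac, IPAddress(192, 168, 1, 177));  // placeholder address
  }
  Serial.print(F("IP address: "));
  Serial.println(Ethernet.localIP());
}

void loop() {
  Ethernet.maintain();  // renews the DHCP lease when it is due
}
```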
Use Virtual Machines
You're likely going to want to run Home Assistant Core (see below), but don't do so "on the metal". Instead, running HA within a virtual machine provides some significant advantages and could even save your bacon in the event of problems.
My favourite hypervisor environment is VMware vSphere ESXi. It is free for home use or mild commercial use, widely supported and extremely stable. Among its advantages over Microsoft Hyper-V (another solid platform) are lower hardware requirements and the ability to use hardware passthrough to connect a device plugged into the host to a VM.
Virtual machines give you the advantage of being able to abstract away the hardware from the system running on it. The virtual machine is just a set of files so it is easy to back up and move to a new platform in the event of a hardware failure of the host system, or simply a migration to a new host system.
The ability to take a snapshot of the VM before major changes or upgrades means that it is easy to revert back to the previous configuration if something goes wrong. This can be especially important for breaking changes. For example, if one wasn't paying attention and upgraded to Home Assistant 2022.6, one would have found all of their MQTT devices no longer working with multiple configuration errors. With a snapshot, it becomes a 5 minute fix to roll back and then make a plan to update the MQTT configurations for every device when not in a state of abject panic.
A virtual machine is also easily duplicated for convenient creation of a test environment, letting you confirm those major changes aren't going to cause problems without having to affect your live environment.
You also get another level of remote access. If there is an issue which prevents the virtual machine from booting fully, you might not have SSH access. Opening a console to the virtual machine from the hypervisor is the equivalent of sitting at the physical monitor and keyboard, with the advantage of being able to do it from wherever you can make a TCP/IP connection.
Run Home Assistant Core
This may be an unpopular opinion, but if you want a home automation platform which is truly under your control, you need to run Home Assistant Core. Home Assistant Core is simply the standalone HA software itself, installed directly on your own operating system, usually in a Python virtual environment. The key word there is "standalone".
The confusingly named "Home Assistant" installation method, previously called "hassio", is actually a complete operating system image (Alpine Linux) with Home Assistant Core configured in a Docker container and an additional component, the Supervisor, which manages Home Assistant Core and add-on installation.
By using the "Supervisor", you give up a lot of control. Automatic updates will happen to a number of components, which may introduce breaking changes or create unintended breakage. There are many instances of users reporting Supervisor issues which have broken their entire Home Assistant installation, messed up Docker containers, or broken network connectivity, among many other problems. With the ability for a remote system to command an upgrade to HA, you run the real risk of waking up to a broken installation.
One of the long term complaints about a non-core installation is the struggle to prevent it from bypassing local DNS and instead making DNS queries to 1.1.1.1 (Cloudflare) and 8.8.8.8 (Google). There are topics upon topics on the Home Assistant Community forums where this has caused connectivity, discovery and network traffic issues, not to mention the obvious privacy concerns with forcefully bypassing the user's DNS choice. What is more concerning is the developers' unwillingness to take these concerns seriously.
The operating system and dependency requirements of the Supervisor component may change as the Supervisor is updated by the development team. Once those requirements change such that your underlying OS no longer meets them (in the case of a Supervised installation), your installation is marked as "Unhealthy", which prevents future updates.
The limited Alpine Linux configuration of the base OS does not give the user the control or tools expected in a typical Linux installation. One constantly sees questions in the community forums from users attempting to do something in the base OS, only to run into problems because the specific utility or command is not available. If you intend to use Home Assistant as supplied, like a sort of appliance with expansion or advanced features limited to add-ons, then this may not be an issue. However, if you are the type of user who expects to be able to use the underlying operating system (for example, to run other services), then it would be best to avoid this limited installation and instead run HA Core on the operating system of your choice.
That said, just be aware that a Core installation is considered advanced. You may have to chase down dependencies during an install (2022.8 required rust, which was not documented, plus the sqlalchemy and fnvhash libraries were missing from the install script) or an upgrade. As new versions of Python are required, you will have to recreate your venv to use them. On the plus side, if you experience an installation/upgrade problem, just searching the error typically turns up the solution.
Use Watchdogs
If you are building and programming your own hardware, assuming that neither you nor the hardware are perfect (a big assumption, I know), it is inevitable that mistakes will be made. Not even your mistake, of course, but possibly a mistake by the author of one of the many libraries you may be using, or even a flaw in one of the modules you may be using (see the ENC28J60 freezes and W5100 incorrect resistors above). The end result is that your hardware may lock up, end up in an endless loop, or lose the network connection and be unable to re-establish it.
In this scenario, there is nothing more annoying than having to power cycle some controller in the basement because nothing happened when hitting a light switch one floor above. This scenario should never happen.
The fact is, even the best hardware may lock up through no fault of your own. Surges, spikes, radiation, a moist dead spider lying across PCB traces or any number of other factors may cause this.
All of the common microcontrollers (Atmega, ESP8266, ESP32) support built in watchdogs. A "watchdog" in this context is a dedicated piece of hardware (etched onto the microcontroller die) that must be continually reset, or "petted", otherwise it resets the main processor.
By enabling the built in watchdog, you can assure your hardware is reset in the event of a lockup. If your code stops petting the watchdog within a preset (configurable) period of time, the dog bites and the processor is reset.
Enabling the watchdog on Atmega chips (Arduino UNO, Nano, Mega, Atmega1284, etc.) is easy via the Watchdog Library. Just be aware that a lot of Nano clones have an old bootloader with a bug causing the processor to continually reset when the watchdog is triggered. This is fixed by updating the bootloader to the newest Optiboot.
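In sketch form (here using avr-libc's <avr/wdt.h> directly rather than any particular library), enabling and petting the watchdog looks something like this:

```cpp
#include <avr/wdt.h>

void setup() {
  // Enable the watchdog with a 2 second timeout. If wdt_reset() is not
  // called within that window, the MCU is reset. (Remember the bootloader
  // caveat above: old Nano bootloaders will reset-loop when this fires.)
  wdt_enable(WDTO_2S);
}

void loop() {
  // ... normal work ...

  // Pet the watchdog once per pass through loop(). If the code hangs
  // anywhere above, this line is never reached and the chip reboots.
  wdt_reset();
}
```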
The Arduino ESP8266 core exposes the watchdog timer functions, so it is very easy to use with a few lines of code and no external libraries. ESP32 is similarly easy.
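On the ESP8266 the software watchdog is already running, so the job is mainly to keep feeding it inside any long-running code; a minimal sketch of the pattern (on the ESP32 the equivalent is the task watchdog, via the esp_task_wdt_* functions):

```cpp
void setup() {
  Serial.begin(115200);
}

void loop() {
  for (int i = 0; i < 1000; i++) {
    // ... long-running work that doesn't return to loop() quickly ...

    ESP.wdtFeed();  // pet the software watchdog explicitly
    yield();        // also lets the WiFi stack run, keeping the hardware watchdog happy
  }
}
```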
The Arduino core for STM32 includes a library to use the watchdog timer.
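With the STM32duino IWatchdog library the pattern is the same; the timeout below is an arbitrary example:

```cpp
#include <IWatchdog.h>

void setup() {
  // Start the independent hardware watchdog with a 4 second timeout
  // (the argument is in microseconds).
  IWatchdog.begin(4000000);
}

void loop() {
  // ... normal work ...

  IWatchdog.reload();  // pet the watchdog before the timeout expires
}
```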
One can also use the old school approach of an external astable timer which needs to be continually reset by the processor. The 555 timer is a good choice, and there are many, many, many examples of this. The advantage is that you can easily tie the reset into external components (Ethernet chipset, I/O expander) without using additional processor pins to reset those items. The disadvantages are the additional components and the need to dedicate a processor pin to petting the external watchdog.
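Petting an external watchdog is just a matter of pulsing a pin on a schedule. A sketch of the idea, where the pin number and interval are assumptions that must suit your 555 circuit's timeout:

```cpp
// Pulse a dedicated pin to pet an external watchdog (e.g. a 555 based
// missing-pulse detector). Pin and interval are placeholder values.
const uint8_t WDT_PET_PIN = 7;
const unsigned long PET_INTERVAL_MS = 500;  // keep well under the 555 timeout

unsigned long lastPet = 0;

void setup() {
  pinMode(WDT_PET_PIN, OUTPUT);
}

void loop() {
  // ... normal work ...

  // Non-blocking: toggle the pet pin every PET_INTERVAL_MS.
  if (millis() - lastPet >= PET_INTERVAL_MS) {
    lastPet = millis();
    digitalWrite(WDT_PET_PIN, !digitalRead(WDT_PET_PIN));
  }
}
```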
An important thing to remember is that your software must be capable of recovering from a reboot into a state which is desirable and safe. Depending on what you are controlling, it can be very important to bring pin states to a safe configuration. For example, if you are controlling two relays to reverse the rotation of a motor, activating both relays at once on a reboot will short the power supply. With many Arduino libraries, this means writing your desired pin states (i.e. digitalWrite()) as early as possible, in the first lines of setup(). On a higher level, the device has to recover to a desired state. So if a board had 3 relays active which turned on 3 lights, it needs to reboot as quickly as possible into the same state. There are various options for this, such as saving the pin states to EEPROM/flash on each change (being careful not to wear it out) and restoring those pins by reading the saved state early in the boot process. I also publish MQTT messages as retained, so that upon connection with the MQTT broker, the board receives the desired state.
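Putting those pieces together, here is a minimal AVR-style sketch of the recovery idea. The pin numbers and the single-byte state layout are assumptions for illustration: outputs are driven to a safe state in the very first lines of setup(), then the last saved state is restored from EEPROM:

```cpp
#include <EEPROM.h>

// Placeholder pin assignments and EEPROM address.
const uint8_t RELAY_PINS[3] = {4, 5, 6};
const int STATE_ADDR = 0;  // one byte: bit n = state of relay n

void applyStates(uint8_t states) {
  for (uint8_t i = 0; i < 3; i++) {
    digitalWrite(RELAY_PINS[i], (states >> i) & 1);
  }
}

void setup() {
  // Safe state first, before anything slow (network, sensors) runs.
  for (uint8_t i = 0; i < 3; i++) {
    pinMode(RELAY_PINS[i], OUTPUT);
    digitalWrite(RELAY_PINS[i], LOW);
  }

  // Then restore the last saved states so a watchdog reboot is invisible.
  applyStates(EEPROM.read(STATE_ADDR));
}

void setRelay(uint8_t index, bool on) {
  digitalWrite(RELAY_PINS[index], on ? HIGH : LOW);
  // EEPROM.update() only writes when the value actually changed,
  // which reduces wear on the EEPROM cells.
  uint8_t states = EEPROM.read(STATE_ADDR);
  states = on ? (states | (1 << index)) : (states & ~(1 << index));
  EEPROM.update(STATE_ADDR, states);
}

void loop() {
  // ... handle MQTT messages, switch inputs, etc., calling setRelay() ...
}
```

The retained MQTT approach works the same way at a higher level: PubSubClient's publish() accepts a retained flag, so publishing each state change with it set means the broker hands the board its last known state on reconnect.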
Avoid Songle Relays
The ubiquitous Songle relay. If you've bought or shopped for a relay board (usually found searching for "arduino relay module" or similar) then you either own them, or have seen them. They're even in pictures on this website because these boards are so cheap and so universal. They can't be too bad, right? Wrong.
One can assume the Ningbo Songle Relay Co. certainly makes some of the relays marked as "Songle". The problem is, they likely don't make most of them. Counterfeits are rampant on eBay and AliExpress, so when you can buy a board of eight 10A rated relays for the cost of one quality Omron relay, something has to give. These boards are either using the poorest quality genuine Songle relays (perhaps even re-marked for higher currents? Pure speculation, of course) or counterfeits. Just looking at the variety of "Songle" markings displayed on these boards makes me very suspicious; companies tend to be consistent in branding.
I have seen a failure rate of these relays in the area of 50%. It doesn't matter whether they are switching 10A (for example, the large amount of fluorescent lighting in my shop) or only a few mA (for example, the 24VAC thermostat connection on my shop heater); the symptom is the same: the contacts become intermittent. Most often this causes the relay to stick on, and then run hot. At lower voltages, they tend not to make contact at all.
This is likely due to poorly made or poorly coated contacts. Quality relays have silver based contacts: silver tin oxide, silver nickel or silver cadmium oxide. These are hard wearing, highly conductive materials that resist erosion and contact welding. However, because of the silver content, they are expensive materials, and almost certainly where cheap or counterfeit relays skimp during manufacture. The end result is contacts which wear out quickly and are severely limited in both switching current and number of cycles.
At minimum this is a nuisance. The danger is that stuck relay contacts could leave a device energized which is intended to be turned off. Worse, stuck contacts are held together by tiny welds and/or the interlocking of their rough surfaces, and as such are high resistance. This causes heat at the contact point, which increases resistance, causing more heat, and before you know it you have a melted relay.
I also wouldn't trust the current ratings on these relays. The most common relays used seem to be rated at 10A 120/240VAC and 10A 30VDC. I've observed the relays getting rather hotter than I would expect when switching half that current at 120VAC, the expectation being no heat at all. This is further evidence of low quality contacts.
Finding a quality relay board is rather difficult unless you buy the relays from a known reliable source such as Digikey or Mouser. Relays marked with reputable names sold on Aliexpress or eBay out of China are quite likely counterfeit. Due to the lack of quality relay boards I will be designing my own in the near future.