Algorithms
Before designing a (machine learning) algorithm to solve a problem, it is wise to start with a thorough understanding of the input data. Consider aspects such as different types of noise, missing values, labels, potential bias, unexpected interdependencies, and over- or undersampling. Proper data cleanup is essential to make any machine learning algorithm work. This is an area where data science meets data engineering, especially when the data is not static but regularly refreshed or even real-time, which requires a data ingestion architecture. Automated learning on previously unseen data is exciting, but it has a number of pitfalls and requires many safety valves and preparation for the unexpected.
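A first look at the data can be as modest as counting missing values and checking how the labels are distributed. The following is a minimal sketch of that idea, not a description of any specific project. It assumes the data arrives as a list of dictionaries; the function name `profile_rows` and the field names in the example are hypothetical.

```python
from collections import Counter
import math


def profile_rows(rows, label_key):
    """Summarize missing values per field and the label distribution.

    Assumption: `rows` is a list of dicts. A value counts as missing
    when it is None or a float NaN. Fields that are absent from a row
    altogether are not counted here.
    """
    missing = Counter()
    labels = Counter()
    for row in rows:
        for key, value in row.items():
            if value is None or (isinstance(value, float) and math.isnan(value)):
                missing[key] += 1
        label = row.get(label_key)
        if label is not None:
            labels[label] += 1

    total = len(rows)
    return {
        "rows": total,
        # Fraction of rows in which each field is missing.
        "missing_fraction": {k: missing[k] / total for k in missing} if total else {},
        # Raw counts per label, useful for spotting over- or undersampling.
        "label_counts": dict(labels),
    }


# Hypothetical example data: power readings with a status label.
rows = [
    {"power": 1.0, "label": "ok"},
    {"power": None, "label": "fault"},
    {"power": float("nan"), "label": "ok"},
    {"power": 2.0, "label": "ok"},
]
report = profile_rows(rows, "label")
```

A report like this does not replace domain knowledge, but it tends to surface skewed labels and sparsely filled fields before they end up in a training set.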
As for algorithm design, I usually prefer to start as simple as possible and incrementally add complexity, while keeping fallback options for exceptions. It is often not necessary to optimize for all possible inputs right away; it is better to focus first on the important 80%. But all inputs must give a reasonable output. (The exception is non-ergodic systems with a risk of ruin, where the extremes are all-important.) Going from simple to complex also means gradually adding black box solutions such as neural networks for function approximation or feature selection. But always be aware that while black box methods (deep learning) will in the end outperform systems based solely on domain knowledge, they may behave strangely in unexpected situations, simply because there are fewer examples of those situations in the training data.
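One way to keep a fallback option is to wrap the more complex model so that a simple baseline answers whenever the input falls outside the range seen in training, or the model fails or returns something unusable. The sketch below illustrates that pattern under stated assumptions: inputs are single numbers, and `predict_with_fallback`, `model`, `baseline`, `lower` and `upper` are hypothetical names chosen for this example.

```python
import math


def predict_with_fallback(x, model, baseline, lower, upper):
    """Return (prediction, source), where source is "model" or "baseline".

    Assumptions: `model` and `baseline` are callables taking one number,
    and [lower, upper] is the input range covered by the training data.
    """
    # Inputs outside the trained range go straight to the simple baseline.
    if not (lower <= x <= upper):
        return baseline(x), "baseline"
    try:
        y = model(x)
    except Exception:
        # Any failure inside the complex model must still yield an output.
        return baseline(x), "baseline"
    # Reject non-numeric, NaN or infinite outputs.
    if not isinstance(y, (int, float)) or not math.isfinite(y):
        return baseline(x), "baseline"
    return y, "model"


# Hypothetical stand-ins: a fragile "complex" model and a simple baseline.
def fragile_model(x):
    return 1.0 / (x - 5.0)  # raises ZeroDivisionError at x == 5


def simple_baseline(x):
    return 2.0 * x


in_range = predict_with_fallback(4.0, fragile_model, simple_baseline, 0.0, 10.0)
on_error = predict_with_fallback(5.0, fragile_model, simple_baseline, 0.0, 10.0)
out_of_range = predict_with_fallback(50.0, fragile_model, simple_baseline, 0.0, 10.0)
```

Returning the source alongside the prediction makes it possible to log how often the fallback is used, which is a useful signal for where the complex model still lacks training examples.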
Below you can find some of my recent data science projects.
Relevant projects
- Automatic detection of solar installation malfunction: Design of an algorithm to detect and diagnose problems with solar installations. (Oct 2021 - April 2022)
- 3D LIDAR Low Poly modeling: 3D low polygon estimation for all buildings in the Netherlands, based on LIDAR and Kadaster data. Used for estimation of roof and wall surface areas. (Apr 2018 - Dec 2019)
- Automated Solar Panel Layout Plans: Algorithm design and implementation of an optimization engine to determine both the optimal and the maximal number of solar panels for a building, based on LIDAR data. (Jan 2019 - Dec 2019)
- Energy Measures Computation Engine: Algorithm design and implementation to estimate the effect of insulation and installation measures on energy consumption. (Sept 2017 - Dec 2019)
- Housing Types: Algorithm to determine the housing type for all buildings in the Netherlands using Open Data. (Jan 2018 - Mar 2018)