A case for Fixed-Horizon Temporal Difference methods in RL

Introduction

Value learning algorithms usually centre around the infinite-horizon Bellman equation. When we estimate the value of an action, we are estimating the value of the entire future, given a current state and a proposed action. However, such value learning approaches are notoriously unstable. Fixed-Horizon Temporal Difference methods were recently reintroduced to shorten the horizon over which value is estimated to a fixed, finite number of time steps. The authors claim proven training stability and higher predictive power from this approach. We additionally found them to converge in fewer time steps. A finite number of steps can approximate an infinite number…
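
To make the idea concrete, here is a minimal sketch of a tabular fixed-horizon TD update for value prediction. It assumes one value table per horizon h, where the h-step estimate bootstraps from the (h-1)-step estimate of the next state and the 0-step value is zero by definition. The function name, its arguments and the toy transition format are illustrative assumptions, not code from the post or from the authors.

```python
import numpy as np

def fixed_horizon_td(transitions, n_states, horizon, alpha=0.1, gamma=1.0):
    """Tabular fixed-horizon TD prediction (illustrative sketch).

    V[h, s] estimates the expected discounted reward over the next h steps
    starting from state s. V[0, :] stays at zero.
    transitions: iterable of (state, reward, next_state, done) tuples.
    """
    V = np.zeros((horizon + 1, n_states))
    for s, r, s_next, done in transitions:
        # Each horizon h bootstraps from horizon h-1 at the next state,
        # so no estimate ever bootstraps from itself.
        bootstrap = np.zeros(horizon) if done else V[:-1, s_next]
        targets = r + gamma * bootstrap          # targets for h = 1..horizon
        V[1:, s] += alpha * (targets - V[1:, s])
    return V

# Toy usage: a single state that loops to itself with reward 1 per step.
# With gamma = 1 the h-step value should approach h.
V = fixed_horizon_td([(0, 1.0, 0, False)] * 200, n_states=1, horizon=3, alpha=0.5)
print(V[:, 0])
```

Because the longest-horizon table depends only on shorter horizons, the chain of targets ends at a fixed value (zero), which is the intuition behind the stability claim above.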

Lighting for living and working by; a British winter at home

Evenings are drawing in, lockdown 2.0 is in full swing, and most office-based companies are signalling that there won’t be a return to work until at least March 2021. Watching the sun set from 4pm, with one to two hours of work still to go and an evening planned beyond that, is a draining experience that usually lasts the whole of winter. Here is what I found out about brightness and “colour-temperature” recommendations for productive work, whilst staying conscious of sleep quality.