In this work, a voice activity detection (VAD) system with dynamic energy-quality (EQ) scalability is presented. EQ scalability is enabled by inserting multiple knobs at different levels of the signal chain, from analog-to-digital conversion through to the classification stage. These knobs are co-optimized at runtime to achieve a given quality target with minimal energy. This co-optimization is also shown to improve the fit of the machine learning algorithm, allowing for more graceful quality degradation. The proposed system, fabricated on a 28 nm test chip, classifies at 81.2% accuracy while consuming 51.9 nJ/frame in a 10 dB noise context. Scaling up energy consumption by 3.5x improves accuracy by 5.2%.
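
The runtime selection described above (meet a quality target at minimal energy) can be illustrated with a minimal sketch. This is not the system's actual controller: the `OperatingPoint` type, the point labels, and the `pick_min_energy` function are hypothetical names introduced here. The two operating points are derived from the figures quoted above, under the assumption that the 5.2% accuracy gain is in absolute percentage points.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class OperatingPoint:
    """One joint setting of all knobs (hypothetical abstraction)."""
    name: str                    # illustrative label, not from the source
    energy_nj_per_frame: float   # energy per classified frame
    accuracy_pct: float          # classification accuracy in percent


def pick_min_energy(
    points: Sequence[OperatingPoint], target_accuracy_pct: float
) -> Optional[OperatingPoint]:
    """Return the lowest-energy point meeting the target, or None if none does."""
    feasible = [p for p in points if p.accuracy_pct >= target_accuracy_pct]
    if not feasible:
        return None
    return min(feasible, key=lambda p: p.energy_nj_per_frame)


# Baseline figures quoted in the text (10 dB noise context).
BASE = OperatingPoint("low_energy", 51.9, 81.2)
# 3.5x energy; the +5.2 is ASSUMED to mean absolute percentage points.
SCALED = OperatingPoint("high_quality", 51.9 * 3.5, 81.2 + 5.2)
POINTS = [BASE, SCALED]

if __name__ == "__main__":
    for target in (80.0, 85.0, 95.0):
        choice = pick_min_energy(POINTS, target)
        print(target, choice.name if choice else "no feasible point")
```

With only two points the search is trivial; with several knobs per stage the candidate set grows combinatorially, which is where co-optimizing the knobs jointly, rather than tuning each stage in isolation, would matter.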