This study applies machine learning methods to nationally representative Thrive by Five Index 2024 data to predict early learning outcomes among children exposed to socioeconomic and developmental risk. It evaluates how much different types of indicators (child, caregiver, and programme-level variables) contribute to predicting whether children are on track in early learning. The analysis compares multiple predictive models and assesses their performance in real-world contexts. It also explores how routinely collected data can be used to identify vulnerable children earlier. The findings contribute to a growing evidence base on how data science can support more targeted and effective early childhood interventions.
Key findings
Child and caregiver characteristics, including age, home language, and household resources, are the strongest predictors of whether a child is on track. Programme-level factors add limited predictive value.
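For data teams, one way to check whether a group of indicators adds predictive value is to fit the same model with and without that group and compare accuracy on held-out children. The sketch below illustrates the idea on synthetic data only. It is not the paper's method, data, or results: the three features, their effect sizes, the simple logistic regression, and every function name are assumptions made for illustration.

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is repeatable


def sigmoid(z):
    # Clamp extreme inputs to avoid overflow in math.exp.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def make_child():
    # Three standardised, made-up indicators (not Thrive by Five variables).
    age = random.gauss(0, 1)
    resources = random.gauss(0, 1)
    programme = random.gauss(0, 1)
    # Assumed effect sizes: strong for child and household, weak for programme.
    logit = 1.5 * age + 1.0 * resources + 0.1 * programme
    on_track = 1 if random.random() < sigmoid(logit) else 0
    return [age, resources, programme], on_track


def fit_logistic(X, y, lr=0.5, epochs=200):
    # Plain batch gradient descent on the logistic loss.
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def accuracy(w, b, X, y):
    # Share of children whose predicted class matches the label.
    hits = 0
    for xi, yi in zip(X, y):
        predicted = 1 if sum(wj * xj for wj, xj in zip(w, xi)) + b > 0 else 0
        hits += predicted == yi
    return hits / len(y)


def holdout_accuracy(columns, train, test):
    # Fit on the chosen feature columns only, score on held-out children.
    X_train = [[x[c] for c in columns] for x, _ in train]
    X_test = [[x[c] for c in columns] for x, _ in test]
    w, b = fit_logistic(X_train, [t for _, t in train])
    return accuracy(w, b, X_test, [t for _, t in test])


children = [make_child() for _ in range(2000)]
train, test = children[:1500], children[1500:]

acc_child_household = holdout_accuracy([0, 1], train, test)
acc_all = holdout_accuracy([0, 1, 2], train, test)
print(f"child + household only: {acc_child_household:.3f}")
print(f"plus programme feature: {acc_all:.3f}")
```

Because the synthetic programme feature is given only a weak effect, the two accuracies should come out close together. With real data the gap between the two models is the quantity of interest, and a metric suited to imbalanced outcomes may be a better choice than plain accuracy.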
Why this matters
Better targeting can make early childhood interventions more efficient and ensure support reaches the most vulnerable children.
Who this is for
Government planners, data teams, and organisations designing targeting systems.
Author: Michelle Leal
Paper: On-Track in Early Childhood: Machine Learning Prediction of Early Learning Status in the Context of Socioeconomic and Developmental Risk