Automating Feature Preprocessing Beats Manual Tuning for Tabular ML
Study of 15 search algorithms on 45 datasets reveals evolution and random search outperform complex surrogate models for automated feature pipeline construction.
Automating feature preprocessing selection and ordering outperforms manual construction; evolution-based and random search algorithms lead.
- Feature preprocessing order and selection critically affect classical ML model performance on tabular data.
- Manual pipeline construction requires data scientists to make many sequential decisions with unclear payoff.
- The Auto-FP problem maps to hyperparameter optimization (HPO) or neural architecture search (NAS) frameworks.
- Evolution-based algorithms achieve the best average ranking across 45 public datasets.
- Random search performs surprisingly well, beating many sophisticated surrogate-model approaches.
- Bandit-based and surrogate-model algorithms underperform for Auto-FP despite their success in HPO and NAS.
- Bottleneck analysis identifies gaps between current algorithms and optimal preprocessing discovery.
- AutoML tools show limitations when integrated with automated preprocessing pipelines.
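The search problem summarized in the points above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's implementation: the preprocessor names, the `max_len` cap, and the `toy_score` objective are all hypothetical placeholders. In a real setting the scoring function would fit each candidate pipeline together with a downstream model and return a validation metric.

```python
import random
from typing import Callable, Tuple

# Hypothetical preprocessor names; a real search space would map each
# name to an actual transformer object.
PREPROCESSORS = ["standardize", "minmax_scale", "normalize",
                 "binarize", "power_transform"]

Pipeline = Tuple[str, ...]


def search_space_size(n_preprocessors: int, max_len: int) -> int:
    """Count ordered pipelines of length 1..max_len, repeats allowed."""
    return sum(n_preprocessors ** k for k in range(1, max_len + 1))


def sample_pipeline(rng: random.Random, max_len: int) -> Pipeline:
    """Pick a random length, then pick each step uniformly at random."""
    length = rng.randint(1, max_len)
    return tuple(rng.choice(PREPROCESSORS) for _ in range(length))


def random_search(score: Callable[[Pipeline], float], n_trials: int,
                  max_len: int = 4, seed: int = 0) -> Tuple[Pipeline, float]:
    """Evaluate n_trials random pipelines and keep the best one seen."""
    rng = random.Random(seed)
    best = sample_pipeline(rng, max_len)
    best_score = score(best)
    for _ in range(n_trials - 1):
        candidate = sample_pipeline(rng, max_len)
        candidate_score = score(candidate)
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
    return best, best_score


def toy_score(pipeline: Pipeline) -> float:
    """Made-up, order-sensitive objective used only for demonstration.

    It rewards 'power_transform' appearing before 'standardize' and
    charges a small cost per step. It does not model any real dataset.
    """
    score = -0.05 * len(pipeline)
    if "power_transform" in pipeline and "standardize" in pipeline:
        if pipeline.index("power_transform") < pipeline.index("standardize"):
            score += 1.0
    return score
```

Calling `random_search(toy_score, n_trials=200)` returns the best pipeline found and its score. Because the space grows geometrically with pipeline length (`search_space_size` makes this explicit), even a small set of preprocessors yields many candidate orderings, which is part of why manual construction is hard to do well.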
Astrobobo tool mapping
- Knowledge Capture: Record the preprocessing steps you currently apply by hand to tabular datasets, including their order and rationale. Use this as a reference when evaluating Auto-FP tools.
- Focus Brief: Summarize the key finding (evolution-based and random search outperform more complex methods) and share it with your ML team to reset expectations about automation complexity.
- Reading Queue: Queue the full arXiv paper for deeper study of the bottleneck analysis section, which identifies specific failure modes in surrogate-model and bandit-based approaches.
Frequently asked
- The preprocessing search space appears to be irregular and high-dimensional, making it difficult for surrogate models and bandit algorithms to build accurate predictive models of performance. Random search avoids the overhead of model building and explores the space more uniformly. Evolution-based methods succeed because they adapt through mutation and selection without relying on learned surrogates.
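The mutation-and-selection idea in the answer above can be sketched as a small steady-state evolutionary loop. This is a generic illustration, not necessarily the variant evaluated in the study: the preprocessor names, the three mutation operators, the tournament size, and the `toy_score` objective are all assumptions made for the example. Note that no surrogate model is fitted anywhere; the loop only compares scores of pipelines it has actually evaluated.

```python
import random
from typing import Callable, Tuple

# Hypothetical preprocessor names, used only for illustration.
PREPROCESSORS = ["standardize", "minmax_scale", "normalize",
                 "binarize", "power_transform"]

Pipeline = Tuple[str, ...]


def mutate(pipeline: Pipeline, rng: random.Random, max_len: int) -> Pipeline:
    """Apply one random edit: replace, insert, or delete a step."""
    steps = list(pipeline)
    ops = ["replace"]
    if len(steps) < max_len:
        ops.append("insert")
    if len(steps) > 1:
        ops.append("delete")
    op = rng.choice(ops)
    if op == "replace":
        steps[rng.randrange(len(steps))] = rng.choice(PREPROCESSORS)
    elif op == "insert":
        steps.insert(rng.randint(0, len(steps)), rng.choice(PREPROCESSORS))
    else:
        del steps[rng.randrange(len(steps))]
    return tuple(steps)


def evolve(score: Callable[[Pipeline], float], generations: int = 50,
           pop_size: int = 10, max_len: int = 4,
           seed: int = 0) -> Tuple[Pipeline, float]:
    """Steady-state evolution: mutate a good parent, drop the worst member."""
    rng = random.Random(seed)
    # Start from random single-step pipelines.
    population = [(rng.choice(PREPROCESSORS),) for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: the best of a small random sample becomes the parent.
        contenders = rng.sample(population, k=min(3, len(population)))
        parent = max(contenders, key=score)
        population.append(mutate(parent, rng, max_len))
        # Survival: remove the lowest-scoring member, so the best is kept.
        population.remove(min(population, key=score))
    best = max(population, key=score)
    return best, score(best)


def toy_score(pipeline: Pipeline) -> float:
    """Made-up, order-sensitive objective used only for demonstration."""
    score = -0.05 * len(pipeline)
    if "power_transform" in pipeline and "standardize" in pipeline:
        if pipeline.index("power_transform") < pipeline.index("standardize"):
            score += 1.0
    return score
```

A call such as `evolve(toy_score)` returns the best surviving pipeline and its score. In a realistic setting each call to the scoring function trains a model, so caching scores for pipelines already evaluated would be a sensible addition to this sketch.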
Cite
Danrui Qi, Jinglin Peng, Yongjun He, Jiannan Wang. (2026, April 17). Automating Feature Preprocessing Beats Manual Tuning for Tabular ML. Astrobobo Content Engine (rewrite of arxiv/cs.AI). https://astrobobo-content-engine.vercel.app/article/automating-feature-preprocessing-beats-manual-tuning-for-tabular-ml-619375
Danrui Qi, Jinglin Peng, Yongjun He, Jiannan Wang. "Automating Feature Preprocessing Beats Manual Tuning for Tabular ML." Astrobobo Content Engine, 17 Apr 2026, https://astrobobo-content-engine.vercel.app/article/automating-feature-preprocessing-beats-manual-tuning-for-tabular-ml-619375. Based on "arxiv/cs.AI", https://arxiv.org/abs/2310.02540.
@misc{astrobobo_automating-feature-preprocessing-beats-manual-tuning-for-tabular-ml-619375_2026,
author = {Danrui Qi and Jinglin Peng and Yongjun He and Jiannan Wang},
title = {Automating Feature Preprocessing Beats Manual Tuning for Tabular ML},
year = {2026},
url = {https://astrobobo-content-engine.vercel.app/article/automating-feature-preprocessing-beats-manual-tuning-for-tabular-ml-619375},
note = {Astrobobo rewrite of arxiv/cs.AI, https://arxiv.org/abs/2310.02540},
}