LightGBM Factor Stack · CSI 300
Gradient-boosted nonlinear factor interactions on A-shares, with proper walk-forward validation and turnover caps. A working production ML template.
Most "ML in quant" tutorials are data leakage with extra steps. This template shows the validation discipline that separates a deployable model from a cherry-picked backtest.
Why this works
Tree-based models capture nonlinear interactions between factors that linear Fama–French-style regressions miss, for example the interaction between accrual quality and momentum in A-shares. Walk-forward training is the key discipline: the model is refit on a rolling window and scored only on the period that follows it, whereas fitting once on the whole history leaks future information into every prediction. The turnover cap is what turns a paper strategy into something deployable at 15 bp round-trip costs.
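As a rough illustration of the walk-forward discipline, the sketch below generates rolling train/test windows over monthly labels. The function name `walk_forward_splits`, the month labels, and the choice to roll forward by one test window are assumptions made for this example, not the template's actual code; the 36/6 month defaults mirror the 3y train / 6m test setup described in the backtest notes.

```python
from typing import Iterator, List, Tuple


def walk_forward_splits(
    months: List[str], train_len: int = 36, test_len: int = 6
) -> Iterator[Tuple[List[str], List[str]]]:
    """Yield (train_months, test_months) pairs.

    Each test window starts immediately after its training window,
    and the whole pair rolls forward by `test_len` months per step,
    so no test month is ever seen during the fit that scores it.
    """
    start = 0
    while start + train_len + test_len <= len(months):
        train = months[start:start + train_len]
        test = months[start + train_len:start + train_len + test_len]
        yield train, test
        start += test_len


# Hypothetical sample: six calendar years of monthly labels.
months = [f"{y}-{m:02d}" for y in range(2015, 2021) for m in range(1, 13)]
splits = list(walk_forward_splits(months))

for train, test in splits:
    print(f"train {train[0]}..{train[-1]} -> test {test[0]}..{test[-1]}")
```

In a real pipeline the model would be refit inside the loop on rows from `train` and used to predict only rows from `test`; the out-of-sample track record is then the concatenation of those test-window predictions.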
Common pitfalls
- Training on the full history at once. Even with cross-validation, shuffled folds put future observations in the training set, and that temporal leakage destroys out-of-sample Sharpe.
- Forgetting the T+1 rule on A-shares. Shares bought on day T cannot be sold until the next trading day, so same-day round trips are not allowed; your backtest must respect that.
- Letting turnover run free. Without the cap the model wants 600% yearly turnover; nobody pays for that edge.
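One simple way to enforce a turnover cap is to trade only part of the way from the current portfolio to the model's target whenever the full trade would exceed the budget. The sketch below assumes one-way turnover is defined as half the sum of absolute weight changes; the function name `cap_turnover`, the tickers, and the 20% per-rebalance cap are hypothetical and not taken from the template.

```python
from typing import Dict


def cap_turnover(
    current: Dict[str, float], target: Dict[str, float], max_turnover: float
) -> Dict[str, float]:
    """Move from `current` weights toward `target`, scaling all trades
    down proportionally if one-way turnover would exceed `max_turnover`."""
    names = set(current) | set(target)
    trades = {n: target.get(n, 0.0) - current.get(n, 0.0) for n in names}
    # Assumed convention: one-way turnover = 0.5 * sum of |weight changes|.
    turnover = 0.5 * sum(abs(t) for t in trades.values())
    if turnover <= max_turnover:
        return {n: target.get(n, 0.0) for n in names}
    scale = max_turnover / turnover
    return {n: current.get(n, 0.0) + scale * trades[n] for n in names}


# Hypothetical holdings: the model wants to swap A for C entirely.
current = {"A": 0.5, "B": 0.5}
target = {"B": 0.5, "C": 0.5}
capped = cap_turnover(current, target, max_turnover=0.2)
print(capped)  # A is only partly sold, C only partly bought
```

Proportional scaling keeps the direction of every trade intact; a production version might instead prioritize the trades with the largest predicted edge, which is a design choice this sketch does not attempt.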
Try it yourself
Fork the template into your workspace. The entire configuration — code, parameters, backtest window, cost model — lands in a new private session. Tweak it, break it, and see how robust the edge actually is.
Backtest result
[Figure: equity curve]
Walk-forward with a 3-year training window and a 6-month test window. Long the top decile, short the bottom decile. Cost model: 15 bp round trip for A-shares. Rebalanced monthly.
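To make the portfolio construction and cost model concrete, here is a minimal sketch of decile long/short weights and a per-rebalance cost charge. It assumes equal weighting within each decile, a dollar-neutral book (100% long, 100% short), and that the 15 bp round trip is split evenly between the buy and sell legs; none of these details are stated above, and the names `decile_long_short` and `rebalance_cost` and the sample scores are hypothetical.

```python
from typing import Dict


def decile_long_short(scores: Dict[str, float]) -> Dict[str, float]:
    """Equal-weight long the top decile and short the bottom decile
    of model scores; every other name gets zero weight."""
    ranked = sorted(scores, key=scores.get)  # ascending by score
    k = max(1, len(ranked) // 10)            # names per decile
    weights = {name: 0.0 for name in ranked}
    for name in ranked[:k]:
        weights[name] = -1.0 / k             # bottom decile: short
    for name in ranked[-k:]:
        weights[name] = 1.0 / k              # top decile: long
    return weights


def rebalance_cost(
    old: Dict[str, float], new: Dict[str, float], round_trip_bp: float = 15.0
) -> float:
    """Cost as a fraction of capital, charging half the round-trip
    cost on every unit of notional traded (assumed even split)."""
    names = set(old) | set(new)
    traded = sum(abs(new.get(n, 0.0) - old.get(n, 0.0)) for n in names)
    return traded * (round_trip_bp / 2.0) / 1e4


# Hypothetical scores for 20 names; higher score means more attractive.
scores = {f"s{i:02d}": float(i) for i in range(20)}
weights = decile_long_short(scores)
cost = rebalance_cost({}, weights)
print(weights["s19"], weights["s00"], cost)
```

Subtracting `rebalance_cost` from each month's gross return is what links the turnover figure to net performance: the more the book changes between rebalances, the larger the charge.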
Related tutorials
Momentum: the only factor that keeps working
From Fama–French critique to A-share rotation — a working primer on the most robust anomaly in finance.
50ETF Momentum Rotation · the most robust factor
Monthly cross-sectional momentum on CN blue chips with an Amihud illiquidity penalty. The simplest viable production momentum strategy.
Deep RL · Perp Funding Farmer
A PPO agent trained to harvest funding carry across 40 crypto perps with dynamic directional hedge. Advanced ML with a real reward signal.