Deep RL · Perp Funding Farmer
A PPO agent trained to harvest funding carry across 40 crypto perps with a dynamic directional hedge. Advanced ML with a real reward signal.
Most RL-for-trading tutorials are toy environments with fake rewards. This one trains on real perp funding data with real transaction costs — the learning curve is harsh and informative.
Why this works
Perp funding is a structural, periodic cash flow between longs and shorts, with the direction set by the sign of the funding rate, and it recurs reliably in crypto. RL fits here because the decision is sequential (when to scale in, when to hedge, when to exit) and the state is non-stationary enough that fixed rules tend to miss regime transitions. This tutorial teaches RL with a real financial reward signal rather than a toy environment.
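To make the cash flow concrete, here is a minimal sketch of the funding payment for one settlement. It assumes the common sign convention that a positive funding rate means longs pay shorts, and it treats notional as constant between settlements. The function name `funding_payment` and the rates are hypothetical illustrations, not values from the template.

```python
def funding_payment(position_notional: float, funding_rate: float) -> float:
    """Cash flow received by the position holder at one funding settlement.

    Assumed convention: a positive funding_rate means longs pay shorts.
    position_notional is signed: positive = long, negative = short.
    """
    return -position_notional * funding_rate


# Hypothetical example: hold a 10,000 USD short through three settlements.
rates = [0.0001, 0.0003, -0.0002]  # made-up per-interval funding rates
carry = sum(funding_payment(-10_000.0, r) for r in rates)
print(f"net funding collected: {carry:.2f} USD")
```

The short collects funding while the rate is positive and pays when it turns negative, which is why the timing of entries, hedges, and exits matters.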
Common pitfalls
- Training on 30 days of data. Funding regimes shift on 90+ day cycles; shorter windows overfit to a single regime.
- Using reward = raw PnL. Subtract turnover cost or the agent learns to flip positions every tick.
- Deploying without out-of-sample (OOS) validation. RL policies often look spectacular in training; check the full walk-forward curve before risking capital.
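The walk-forward check from the last pitfall can be sketched as a rolling split over day indices. This is an illustrative sketch, not the template's own validation code: the helper `walk_forward_splits` and the 180-day train and 30-day test windows are assumptions, chosen only so that each training window spans more than one 90-day regime.

```python
def walk_forward_splits(n_days, train_days=180, test_days=30):
    """Yield (train, test) ranges of day indices, rolling forward in time.

    Each test window starts right after its training window, so the
    policy is always evaluated on days it has not seen.
    """
    start = 0
    while start + train_days + test_days <= n_days:
        train = range(start, start + train_days)
        test = range(start + train_days, start + train_days + test_days)
        yield train, test
        start += test_days  # advance by one test window


# Hypothetical usage on one year of daily data.
for train, test in walk_forward_splits(365):
    print(f"train {train.start}-{train.stop - 1}, test {test.start}-{test.stop - 1}")
```

Stitching the test-window results together in order gives the walk-forward equity curve; the training-window results are not part of it.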
Try it yourself
Fork the template into your workspace. The entire configuration — code, parameters, backtest window, cost model — lands in a new private session. Tweak it, break it, and see how robust the edge actually is.
Backtest result
Equity curve
PPO with a 128-hidden-unit MLP policy. State: funding z-score, open-interest (OI) delta, basis, realised vol. Reward: PnL − 0.1 × turnover. Trained for 2M steps.
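The reward and one of the state features in this caption can be sketched in a few lines. The 0.1 coefficient comes from the caption. Everything else is an assumption made for illustration: turnover is taken as the absolute change in position, PnL and position are assumed to be in the same normalised units, and the z-score uses a simple trailing window. The function names are hypothetical.

```python
from statistics import mean, pstdev

TURNOVER_PENALTY = 0.1  # coefficient from the caption


def funding_zscore(history, current):
    """Z-score of the current funding rate against a trailing window."""
    sigma = pstdev(history)
    if sigma == 0:
        return 0.0  # flat history: no meaningful deviation
    return (current - mean(history)) / sigma


def step_reward(pnl, prev_position, new_position):
    """Reward = PnL - 0.1 * turnover (assumed: turnover = |position change|)."""
    turnover = abs(new_position - prev_position)
    return pnl - TURNOVER_PENALTY * turnover


# Flipping from fully short to fully long costs twice as much as opening from flat.
print(step_reward(0.5, 0.0, 1.0))
print(step_reward(0.5, -1.0, 1.0))
```

With this shaping, a position flip has to earn more than its turnover cost to be worth taking, which discourages the tick-by-tick flipping that a raw-PnL reward invites.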
Related tutorials
BTC/ETH Pair Trade · stat-arb with a Kalman filter
Classic cointegration trading on the crypto majors, with a dynamic hedge ratio and funding-aware position sizing. Demonstrates why static OLS hedging fails.
LightGBM Factor Stack · CSI 300
Gradient-boosted nonlinear factor interactions on A-shares, with proper walk-forward validation and turnover caps. A working production ML template.