Advanced · Crypto · ML · 18 min read · Updated 2026-03-30

Deep RL · Perp Funding Farmer

A PPO agent trained to harvest funding carry across 40 crypto perps with a dynamic directional hedge. Advanced ML with a real reward signal.

Most RL-for-trading tutorials are toy environments with fake rewards. This one trains on real perp funding data with real transaction costs — the learning curve is harsh and informative.

Fork the template to follow along:
Template · Crypto · ML
Deep RL · Perp Funding Farmer
PPO agent trained to farm perp funding across 40 contracts
Sharpe 2.64 · Return +44.7% · Max DD −9.3% · Forks 77

Why this works

Perp funding is a periodic payment between longs and shorts, with the direction set by the sign of the funding rate, and it shows up reliably in crypto as a structural cash flow. RL fits here because the decision is sequential (when to scale in, when to hedge, when to exit) and the state is non-stationary enough that rules-based approaches miss the regime transitions. That makes it a good setting for teaching RL with a real financial reward signal rather than a toy environment.
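To make the funding cash flow concrete, here is a minimal sketch of the payment for a single funding interval. The function name `funding_payment`, the sign convention (a positive rate means longs pay shorts), and the numbers are assumptions for illustration; they are not taken from the template, and exchanges differ in interval length and exact formula.

```python
def funding_payment(position_notional: float, funding_rate: float) -> float:
    """Cash flow to the position holder for one funding interval.

    Assumed convention: a positive funding rate means longs pay shorts.
    A long position has positive notional, a short has negative notional.
    """
    return -position_notional * funding_rate


# Hypothetical numbers: 10,000 USD notional, 0.01% funding for the interval.
short_carry = funding_payment(-10_000.0, 0.0001)  # short receives funding
long_carry = funding_payment(10_000.0, 0.0001)    # long pays funding

print(f"short receives {short_carry:.2f} USD, long pays {-long_carry:.2f} USD")
```

Under this convention a short position collects carry while funding stays positive, which is the cash flow the agent is trying to harvest while hedging its directional exposure.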

Common pitfalls

  1. Training on 30 days of data. Funding regimes shift on 90+ day cycles; shorter windows overfit to a single regime.
  2. Using reward = raw PnL. Subtract turnover cost or the agent learns to flip positions every tick.
  3. Deploying without OOS validation. RL policies look spectacular in-sample; check the full walk-forward curve on out-of-sample (OOS) data before risking capital.
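The first and third pitfalls can be addressed together with a walk-forward split: train on a rolling window, evaluate on the days that follow, then roll forward. The sketch below is one way to generate those windows. The helper `walk_forward_splits` and the 90-day and 30-day window lengths are hypothetical choices for illustration, not the template's configuration.

```python
from typing import Iterator, Tuple


def walk_forward_splits(
    n_days: int, train_days: int, test_days: int
) -> Iterator[Tuple[range, range]]:
    """Yield (train, test) day-index ranges that roll forward in time.

    Each test window starts right after its train window, so the policy
    is always evaluated on data it has not seen during training.
    """
    start = 0
    while start + train_days + test_days <= n_days:
        train = range(start, start + train_days)
        test = range(start + train_days, start + train_days + test_days)
        yield train, test
        start += test_days  # roll forward by one test window


# Illustrative windows only: 180 days of data, 90-day train, 30-day test.
splits = list(walk_forward_splits(n_days=180, train_days=90, test_days=30))
for train, test in splits:
    print(f"train days {train.start}-{train.stop - 1}, "
          f"test days {test.start}-{test.stop - 1}")
```

Stitching the per-window test results together gives the walk-forward curve; a policy that only performs inside its training windows will show it here.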

Try it yourself

Fork the template into your workspace. The entire configuration — code, parameters, backtest window, cost model — lands in a new private session. Tweak it, break it, and see how robust the edge actually is.

Backtest result

Sharpe: 2.64
Return: +44.7%
Max drawdown: −9.3%
Win rate: 68.0%
Trades: 1,420
Days: 180

[Figure: equity curve, strategy vs. benchmark]

Setup: PPO with a 128-hidden MLP policy. State: funding z-score, open-interest (OI) delta, basis, realised vol. Reward: PnL − 0.1 × turnover. Trained for 2M steps.
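As a rough illustration of two pieces of this setup, the sketch below computes a trailing funding z-score and a turnover-penalised reward. Only the 0.1 coefficient comes from the description above. The function names, the window length, and the definition of turnover as the sum of absolute weight changes are assumptions; the template may define them differently.

```python
import statistics
from typing import Sequence

TURNOVER_PENALTY = 0.1  # coefficient from the stated reward: PnL - 0.1 * turnover


def funding_zscore(history: Sequence[float], window: int) -> float:
    """Z-score of the latest funding rate against a trailing window (assumed form)."""
    recent = list(history[-window:])
    mean = statistics.fmean(recent)
    std = statistics.pstdev(recent)
    # A flat window has no dispersion; return 0 instead of dividing by zero.
    return 0.0 if std == 0.0 else (recent[-1] - mean) / std


def step_reward(
    pnl: float,
    prev_weights: Sequence[float],
    new_weights: Sequence[float],
    penalty: float = TURNOVER_PENALTY,
) -> float:
    """Reward for one step: PnL minus a penalty on turnover.

    Turnover is assumed here to be the sum of absolute weight changes.
    """
    turnover = sum(abs(n - p) for p, n in zip(prev_weights, new_weights))
    return pnl - penalty * turnover


# Hypothetical step: same PnL, once holding the book and once flipping it.
hold = step_reward(0.002, [0.5, -0.5], [0.5, -0.5])
flip = step_reward(0.002, [0.5, -0.5], [-0.5, 0.5])
print(f"hold reward {hold:.3f}, flip reward {flip:.3f}")
```

With the penalty in place, flipping the whole book costs far more than the step's PnL, so the agent has no incentive to churn positions every tick.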
