Prevent H12 and H15 errors when deploying Socket.IO on Heroku

It's 2017 and you want to build a real-time web application. WebSocket is fairly mature, but to cover your bases you have chosen Socket.IO. You happily type away at the implementation and deploy your first cut on Heroku. Then the horror begins…

Your metrics and logs start filling up with rows of H12 - Request timeout and H15 - Idle connection errors, and you stare at them in disbelief. What the hell went wrong?

That was me yesterday, and I spent a day or so trying to resolve the issue. Here I document my solution.

First of all, make sure that you did not hit the domain issue described here.

Also, ensure that you are using sticky sessions (called Session Affinity on Heroku).

Otherwise, follow along…

heroku logs --tail

Reading my logs, I discovered that H12 is caused by Socket.IO's long-polling transport. For long-polling and streaming responses, Heroku requires the server to send the first byte of its response within 30 seconds. Heroku also leaves the responsibility of detecting disconnected clients to the server, which must then close those connections promptly.
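To make those two router rules concrete, here is a minimal sketch of a hand-rolled long-polling endpoint. It is an illustration only, not how Socket.IO's polling transport is implemented, and the names `createLongPoll`, `holdMs`, `publish` and the 25-second hold are hypothetical choices. Each request is held until a message arrives, but it is always answered with an empty `204` before `holdMs` elapses (inside the 30-second window), and a response whose connection drops is forgotten immediately instead of being kept open.

```javascript
// Hypothetical long-polling endpoint that respects Heroku's router limits.
// Illustration only: this is NOT Socket.IO's own polling transport.
function createLongPoll({ holdMs = 25000 } = {}) {
  // Responses currently held open, mapped to their timeout handles.
  const waiting = new Map();

  // Stop tracking a response and cancel its timeout (safe to call twice).
  function release(res) {
    clearTimeout(waiting.get(res));
    waiting.delete(res);
  }

  // Send a final answer and stop tracking the response.
  function finish(res, status, body) {
    release(res);
    res.writeHead(status, { "Content-Type": "text/plain" });
    res.end(body);
  }

  // Node-style request handler: (req, res) => void.
  function handler(_req, res) {
    // Always answer before holdMs, well inside the 30 s window;
    // an empty 204 tells the client to simply poll again.
    const timer = setTimeout(() => finish(res, 204), holdMs);
    waiting.set(res, timer);

    // "close" also fires if the connection drops before we answer,
    // so a disconnected client is forgotten instead of being held open.
    res.on("close", () => release(res));
  }

  // Deliver a message to every client that is currently waiting.
  function publish(message) {
    for (const res of [...waiting.keys()]) {
      finish(res, 200, message);
    }
  }

  return { handler, publish, pendingCount: () => waiting.size };
}
```

In a real app the handler would be mounted with something like `http.createServer(createLongPoll().handler)`. Keeping the hold time comfortably below 30 seconds is the design choice that matters here; the exact value is a guess.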

H15, on the other hand, is caused by WebSocket connections that have no activity for 55 seconds.

Socket.IO, for its part, exchanges ping/pong packets between server and clients to check that their connections are still responsive. Two options dictate how the server reacts to unresponsive connections: pingInterval (default 25s) and pingTimeout (default 60s).
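One way to picture the server side of this heartbeat is the simplified model below. It assumes Engine.IO 3.x behaviour (the transport layer under Socket.IO 2.x), where every packet from the client restarts a single timer of pingInterval + pingTimeout and the connection is closed if that timer fires. `HeartbeatMonitor` and its injectable `setTimer`/`clearTimer` functions are hypothetical, added only so the timing can be inspected.

```javascript
// Simplified model of a server-side heartbeat check, ASSUMING Engine.IO 3.x
// semantics: every packet from the client restarts one timer of
// pingInterval + pingTimeout ms, and the connection closes if it fires.
class HeartbeatMonitor {
  constructor({ pingInterval = 25000, pingTimeout = 60000 } = {}, timers = {}) {
    this.deadlineMs = pingInterval + pingTimeout;
    // Injectable timer functions (hypothetical) so the timing can be inspected.
    this.setTimer = timers.setTimer || ((fn, ms) => setTimeout(fn, ms));
    this.clearTimer = timers.clearTimer || ((t) => clearTimeout(t));
    this.timer = null;
    this.closed = false;
    this.closeReason = null;
  }

  // Call for every packet received from the client, pings included.
  onPacket() {
    if (this.closed) return;
    this.clearTimer(this.timer);
    this.timer = this.setTimer(() => this.close("ping timeout"), this.deadlineMs);
  }

  // Close the connection and cancel any pending deadline.
  close(reason) {
    this.clearTimer(this.timer);
    this.timer = null;
    this.closed = true;
    this.closeReason = reason;
  }
}
```

Under this model, a client that vanishes without closing its socket is only dropped after 25s + 60s = 85s with the defaults, well past Heroku's 55-second idle window, whereas 15s + 30s drops it after 45s.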

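// Server
import http from "http";
import SocketIO from "socket.io";

const server = http.createServer();
const sio = SocketIO(server, {
  pingInterval: 15000, // ms between heartbeat pings
  pingTimeout: 30000,  // ms to wait for a heartbeat before closing the connection
});

// Heroku supplies the port to bind through the PORT environment variable
server.listen(process.env.PORT || 3000);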

Through trial and error, I discovered that it is the sum pingInterval + pingTimeout, not pingTimeout alone, that must stay below 55s to avoid H15. So I set mine to total 45s (15s for the interval and 30s for the timeout), giving my server a 10s buffer before it hits Heroku's time limit.
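The rule of thumb above can be turned into a small startup check. The 55,000 ms limit and the 10,000 ms default buffer mirror the numbers in this post; `checkHerokuHeartbeat` and `socketOptions` are hypothetical names, not part of Socket.IO.

```javascript
// Hypothetical startup check for the rule of thumb in this post:
// pingInterval + pingTimeout should stay below Heroku's 55 s idle limit,
// with a safety buffer left over.
const HEROKU_IDLE_LIMIT_MS = 55000;

function checkHerokuHeartbeat({ pingInterval, pingTimeout }, bufferMs = 10000) {
  const totalMs = pingInterval + pingTimeout;
  const headroomMs = HEROKU_IDLE_LIMIT_MS - totalMs;
  return { totalMs, headroomMs, ok: headroomMs >= bufferMs };
}

// Usage: warn at startup if the options leave too little headroom.
const socketOptions = { pingInterval: 15000, pingTimeout: 30000 };
const heartbeat = checkHerokuHeartbeat(socketOptions);
if (!heartbeat.ok) {
  console.warn(
    `pingInterval + pingTimeout = ${heartbeat.totalMs} ms leaves only ` +
      `${heartbeat.headroomMs} ms before Heroku's 55 s idle limit`
  );
}
```

Passing the same `socketOptions` object to `SocketIO(server, socketOptions)` would keep the check and the real configuration from drifting apart.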

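// Client: `io` is the global exposed by the Socket.IO client script
var socket = io({
  transports: ["websocket"], // WebSocket only; skip the long-polling transport
});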

On the client, I turned off polling. My earlier tests hinted that Heroku might be recording 408 Request Timeout responses returned by the server as H12, but that is a hypothesis I have not fully tested. Also, it's 2017; the need for polling should be diminishing…

With this setup, I was able to eliminate both H12 and H15. I have not fully tested my hypothesis or my solution, so if you discover anything interesting, I would love to hear about it. Hope this helps!

Cheers!