How NAT Hole Punching Works


Trying To Avoid The Relay Service

In a Unity networked game you’d like to connect clients directly, bypassing a costly Relay service that also adds latency.

For Unity users who have read the Netcode manual’s section on NAT Hole Punching, I also provide some corrections and amendments.

The STUN Protocol

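NAT Hole Punching typically relies on the STUN protocol, which lets a client discover the public IP address and port that its router (the NAT) assigns to its outgoing traffic.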

STUN provides no guarantee that a connection can be established between two clients – this depends on both clients’ network topology and configuration. 
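To give an idea of what a STUN exchange looks like on the wire, here is a minimal Python sketch (not Unity code) that builds a Binding Request and decodes the XOR-MAPPED-ADDRESS attribute of a success response, following the message layout in RFC 5389. It handles IPv4 only, the function names are my own, and actually sending the request over UDP to a server is left out.

```python
import os
import socket
import struct

MAGIC_COOKIE = 0x2112A442      # fixed value defined by RFC 5389
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
XOR_MAPPED_ADDRESS = 0x0020


def build_binding_request(transaction_id=None):
    """Return (packet, transaction_id) for a STUN Binding Request without attributes."""
    tid = transaction_id or os.urandom(12)
    header = struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE)
    return header + tid, tid


def parse_binding_response(data, transaction_id):
    """Extract the public (ip, port) from a Binding Success Response (IPv4 only)."""
    if len(data) < 20:
        raise ValueError("packet shorter than a STUN header")
    msg_type, msg_len, cookie = struct.unpack("!HHI", data[:8])
    if msg_type != BINDING_SUCCESS or cookie != MAGIC_COOKIE:
        raise ValueError("not a STUN Binding Success Response")
    if data[8:20] != transaction_id:
        raise ValueError("transaction ID does not match our request")

    pos, end = 20, min(len(data), 20 + msg_len)
    while pos + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", data[pos:pos + 4])
        value = data[pos + 4:pos + 4 + attr_len]
        if attr_type == XOR_MAPPED_ADDRESS and len(value) >= 8 and value[1] == 0x01:
            xport, xaddr = struct.unpack("!HI", value[2:8])
            # Port and address are XOR-ed with (parts of) the magic cookie.
            port = xport ^ (MAGIC_COOKIE >> 16)
            ip = socket.inet_ntoa(struct.pack("!I", xaddr ^ MAGIC_COOKIE))
            return ip, port
        # Attributes are padded to a multiple of 4 bytes.
        pos += 4 + attr_len + (-attr_len % 4)
    raise ValueError("response carries no IPv4 XOR-MAPPED-ADDRESS")
```

The address and port returned here are what the other side has to send its packets to. Whether packets sent there actually arrive is exactly the part STUN can’t promise.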

The diagram below illustrates the complexity of the NAT Hole Punching process:

NAT Hole Punching process diagram
By Christoph Sommer – Own work, CC BY-SA 3.0, Source: Wikimedia

There are no fewer than seven possible outcomes (rounded boxes)!

Three of them lead to a failure (red), three of them are successes (yellow) and the last (green) is the rare case where a direct route is possible.

The direct route means the client either has port forwarding set up or is connected to the Internet directly, with no router or firewall in between. Either way, Hole Punching is not needed in this rare case.
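Assuming the diagram follows the classic discovery procedure from RFC 3489 (which I believe it does), the seven outcomes can be written down as a small decision function. This is a Python sketch for illustration only: the parameter names are mine, and the function takes the results of the individual STUN tests as inputs rather than performing them. Note that the later STUN specification, RFC 5389, dropped this classification because real-world NATs don’t always fit these categories.

```python
from typing import Optional, Tuple

Endpoint = Tuple[str, int]


def classify_nat(local: Endpoint,
                 mapped_1: Optional[Endpoint],
                 test2_replied: bool,
                 mapped_2: Optional[Endpoint] = None,
                 test3_replied: bool = False) -> str:
    """Classify a client from the results of the RFC 3489 style tests.

    local         -- the client's own (ip, port)
    mapped_1      -- public endpoint reported by the server, None if no reply
    test2_replied -- did a reply arrive from a different server IP *and* port?
    mapped_2      -- public endpoint reported by a second server address
    test3_replied -- did a reply arrive from the same IP but a different port?
    """
    if mapped_1 is None:
        return "UDP blocked"
    if mapped_1 == local:
        # No address translation between client and server.
        return "Open Internet" if test2_replied else "Symmetric UDP firewall"
    if test2_replied:
        return "Full cone NAT"
    if mapped_2 != mapped_1:
        # A different mapping per destination: hole punching will not work.
        return "Symmetric NAT"
    return "Restricted cone NAT" if test3_replied else "Port-restricted cone NAT"
```

For example, `classify_nat(("192.168.1.5", 7777), ("203.0.113.10", 40000), False, ("203.0.113.10", 40001))` lands on "Symmetric NAT", because the router handed out a different public port for the second destination.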

Hole Punching Requires A Server!

Initially the clients cannot talk to each other, thus they have to find some way to exchange their connection information.

This is only possible by relaying that information through a public server. The server is only used to initially establish the connection.

I specifically mention this because I’ve read numerous times that developers intend for clients to connect to each other without the use of a public server. It ain’t gonna work.
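Here’s a toy Python simulation of why that server is needed. It models both routers as port-restricted cone NATs (an assumption, real routers vary), which only let a packet in if the inside machine has already sent one to exactly that remote address and port. All addresses are documentation placeholders and the class is made up for this example.

```python
SERVER = ("198.51.100.1", 3478)  # placeholder for the public server


class PortRestrictedNat:
    """Toy router: inbound traffic is only accepted from endpoints we sent to."""

    def __init__(self, public_ip):
        self.public_ip = public_ip
        self.next_port = 40000
        self.mappings = {}    # private endpoint -> public port
        self.allowed = set()  # (public endpoint, remote endpoint) pairs

    def send(self, private_ep, remote_ep):
        """An outgoing packet creates a mapping and opens a 'hole' for remote_ep."""
        if private_ep not in self.mappings:
            self.mappings[private_ep] = self.next_port
            self.next_port += 1
        public_ep = (self.public_ip, self.mappings[private_ep])
        self.allowed.add((public_ep, remote_ep))
        return public_ep  # this is the endpoint the outside world sees

    def accepts(self, public_ep, remote_ep):
        """Would a packet from remote_ep to public_ep be let through?"""
        return (public_ep, remote_ep) in self.allowed


def simulate():
    nat_a = PortRestrictedNat("203.0.113.10")
    nat_b = PortRestrictedNat("203.0.113.20")
    a_private = ("192.168.1.5", 7777)
    b_private = ("10.0.0.8", 7777)

    # Step 1: both clients contact the server, which sees their public endpoints.
    a_public = nat_a.send(a_private, SERVER)
    b_public = nat_b.send(b_private, SERVER)

    # At this point B could not reach A: A's router has no hole for B yet.
    blocked_before = not nat_a.accepts(a_public, b_public)

    # Step 2: the server tells each client the other's public endpoint,
    # and both send a packet to it. That is the actual "punch".
    nat_a.send(a_private, b_public)
    nat_b.send(b_private, a_public)

    open_after = (nat_a.accepts(a_public, b_public)
                  and nat_b.accepts(b_public, a_public))
    return a_public, b_public, blocked_before, open_after
```

Without the server, neither client would know which public endpoint to send that first packet to.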

So who is providing a STUN server?

Free STUN Server List

There’s a STUN server list, but it makes no guarantees about any server’s availability, location, responsiveness or limits.

If you search around you’ll likely find STUN servers from reliable sources. But most importantly for a world-wide game you have to consider using a STUN server that’s as close to the clients as possible.

Unless the STUN service integrates or helps you with choosing a regional server, you may have to look up the region by the host’s IP address and then choose the STUN server accordingly.

If you don’t pick a STUN server that’s close to the host, it will decrease the chance of success and also slow down the connection process.
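As a rough Python sketch of that selection step: the table below is entirely hypothetical (the hostnames are placeholders, not real services), and the region code is assumed to come from an IP geolocation lookup of the host that isn’t shown here.

```python
# Hypothetical region -> STUN server table. Replace with servers you trust.
STUN_SERVERS = {
    "eu": ("stun.eu.example.com", 3478),
    "us": ("stun.us.example.com", 3478),
    "asia": ("stun.asia.example.com", 3478),
}
DEFAULT_REGION = "us"


def pick_stun_server(host_region):
    """Return (hostname, port) for the host's region, or the default server."""
    return STUN_SERVERS.get(host_region, STUN_SERVERS[DEFAULT_REGION])
```

Falling back to a default server keeps things working for regions you haven’t covered, at the cost of the lower success rate and slower connection process mentioned above.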

The TURN Protocol

If STUN fails, you could resort to the TURN protocol. Or so you might think …

However, with TURN all game traffic is routed through the TURN server, which adds latency to the communication, and it’s a paid service.

Hmmm … doesn’t that sound strangely familiar?

Correct: From our perspective you can consider Unity’s Relay service and a TURN server as equals. Both relay all traffic to clients, both add latency to all clients, both are paid services.

If you do happen to find a free TURN server, you can expect it to be incapable of handling sustained realtime game traffic.

What About ICE?

ICE is not actually a NAT Hole Punching protocol but a means to determine if clients can directly communicate or whether they ought to use a relay service instead.

In other words: ICE decides whether TURN or STUN is used.

That makes ICE meaningless from our perspective: if you can’t succeed in NAT Hole Punching with STUN, you already know you’re going to have to use Unity’s Relay service.

How To Use STUN and Relay?

Can you use both STUN and Relay together?

No Per-Client Relay Fallback

Unfortunately, you cannot simply try to connect a client via STUN first and then fall back to using Unity Relay on a case-by-case basis.

This is because Unity Relay has to be enabled by the host before the network session starts so that the host gets a relay join code to share with clients.

Without that join code, a client failing STUN would not be able to join through Relay. But by enabling Relay on the host side, all traffic is already routed through the relay server.

Other P2P solutions, such as Epic’s P2P, also do not allow a single network session with mixed connections, where some connect directly while others connect via Relay.

There may be technical reasons for this, but it’s also more fair and consistent. Similar latency for all players, and everyone’s IPs are either obscured or not.

Make The Decision In A Lobby

This is only theoretical but I suppose it will work.

The host first creates a Lobby. A lobby is simply a group of players connected via the Lobby service but they don’t necessarily have an established network connection with each other. They may not even have decided who is to host the game.

As clients join the lobby, the Lobby host can run a STUN test with each connecting client. If STUN succeeds, the Lobby remembers that state for the given player in the Lobby player data.

When starting the game, the host would then check if any connected Lobby client requires the Relay service. If so, the host would enable the relay service.
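The host-side check could be as simple as the following Python sketch. This is not the Unity Lobby API: the `stun_ok` key is a made-up name for whatever flag you store in the Lobby player data, and a player without that flag is treated as needing Relay.

```python
def session_needs_relay(lobby_players):
    """Return True if at least one lobby player failed (or skipped) the STUN test.

    lobby_players -- dict mapping player id -> player data dict, where the
                     hypothetical 'stun_ok' flag records the STUN test result.
    """
    return any(not data.get("stun_ok", False)
               for data in lobby_players.values())
```

With `{"alice": {"stun_ok": True}, "bob": {"stun_ok": False}}` the function returns True, so the host would enable Relay for everyone.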

Possible Implementation?

This is a hypothesis and needs to be proven. Feedback welcome!

Assume the Lobby host is also the game host. The host will start a Netcode session without Relay right away. Any Lobby client would also automatically join the network session after or with a successful STUN connection test.

If at any time a client joins that cannot punch through, the host will shut down the network session and restart it, this time with Relay enabled. All Lobby clients are told to re-join the session via Relay.

If the last Relay-only client leaves the Lobby and the game hasn’t actually commenced yet, the Host could then do the same as above but in reverse – restarting the session without Relay.

But as soon as the gameplay commences, the network session is locked in to work either with or without Relay.
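To make the idea a bit more concrete, here is the bookkeeping as a small Python state machine. It is just as unproven as the hypothesis itself: the class and method names are invented, and the actual shutdown and re-hosting of the Netcode session is reduced to a counter.

```python
class HostSession:
    """Tracks whether the host's session should run direct or via Relay."""

    def __init__(self):
        self.mode = "direct"     # the host starts without Relay
        self.locked = False      # True once gameplay has commenced
        self.relay_only = set()  # clients whose STUN test failed
        self.restarts = 0        # how often the session had to be restarted

    def _switch(self, mode):
        if mode != self.mode:
            self.mode = mode
            self.restarts += 1   # real code would shut down and re-host here

    def on_join(self, client_id, stun_ok):
        """Return True if the client can be admitted to the session."""
        if not stun_ok:
            if self.locked and self.mode == "direct":
                return False     # locked without Relay: client cannot join
            self.relay_only.add(client_id)
            if not self.locked:
                self._switch("relay")
        return True

    def on_leave(self, client_id):
        self.relay_only.discard(client_id)
        # Last Relay-only client gone and game not started: go back to direct.
        if not self.locked and not self.relay_only:
            self._switch("direct")

    def start_game(self):
        self.locked = True       # from now on the mode never changes
```

Every `_switch` means all connected clients have to re-join, so in practice you’d probably want to limit how often that can happen before the game starts.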

Late-Joining Clients

For a client who was in the original non-Relay session, got disconnected, kicked or left and wants to join the same session at a later time there should be no issues. It’s already been proven that punchthrough works for that client.

But if the game session has already started without the relay service, any late-joining clients who cannot connect directly to the host can’t join the session at all.

Host Chooses Manually

You could also offer the host the choice of only allowing NAT Hole Punching connections before starting the network connection. This would work without a Lobby service.

Inform Your Players!

You ought to make both hosts and clients aware that sessions without Relay may be unreachable for some clients.

You don’t want to risk players contacting support or leaving bad reviews in such cases.

Hole Punching Privacy And Security

You should not overlook the fact that NAT Hole Punching leads to both clients being aware of each other’s public IP addresses. This is both a security and privacy concern.

Security Concerns

A malicious host could collect joining clients’ IP addresses. For an attacker it’s quite nice to know that this IP address is currently online!

Furthermore, the host knows the IP is from someone playing a specific game and can deduce things about the prospective victim.

For instance that the machine is likely running gaming-specific services, such as Steam, Discord, Twitch, and what not. Perhaps even a long forgotten game server background process with a known security hole?

The host may even be able to exploit a security flaw within your game, directing the game to, say, delete all the user’s savegames (griefing) or gift all the clients’ virtual currency to the host.

Privacy Concerns

If a client has a static IP address it is generally considered personal data. Whether an IP is static or dynamic cannot be determined reliably.

This means you have to assume that all IP addresses are personal data regardless of legislation.

If the game itself stores the host’s or client’s IP addresses in the cloud, even temporarily, you ought to be aware of and abide by the governing privacy laws.

You may have to get users’ consent, or allow them to opt out.

“Developer Defined Data”

The Unity Relay privacy page provides a glimpse into what needs to be considered in terms of privacy regulations.

Especially the “developer defined data” is where you take full responsibility – collecting IP addresses from Hole Punching clients falls under your responsibility too.

Do You Trust The STUN Server?

Your responsibility extends to the STUN server, too!

Why? Because you can’t tell whether the STUN server stores the client IP addresses or not, and as the enabling party in this communication you take responsibility.

Therefore, don’t use a STUN server from an untrusted source!

NAT Hole Punching On The Web

Since Web games require the use of WebSockets, no direct connections between clients are possible to begin with.

Meaning: A web client cannot host a networked game session. No way to go around this.

Furthermore, in order to connect web clients to a session hosted by a non-web application, either everyone uses WebSockets or everyone uses the Relay service.

Summary

The idea of relying solely on NAT Hole Punching for your client-hosted game is wishful thinking.

Many clients will not be able to connect to hosts, and some hosts may not allow any incoming connections whatsoever. Negative reactions and reviews to such a game are to be expected.

Integrating Relay as a fallback option when NAT Hole Punching fails is currently not possible.

If you can be patient, the Unity Netcode for GameObjects roadmap lists Hole Punching as “under consideration”. At least then you wouldn’t have to worry about unreliable or possibly compromised STUN servers.

PS: If you liked this article please check out my Write Better Netcode tutorial series.
