5. Netcode With Relay (Write Better Netcode)

Having explained the Statemachine system in depth in the previous article, it’s time to dive into the code details and the overall flow of the Netcode Statemachine:

Netcode With Relay Statemachine Diagram

The Netcode Statemachine

I’ll walk you through each State, and its Conditions and Actions. I’ll also provide rationale why this Statemachine helps to write better, safer code.

If I had to sum it up, I’d point to the graph above. In my experience, it allowed me to follow its logic visually, which is incredibly helpful!

I found loopholes and optimization opportunities much more easily than if the flow were dispersed throughout textual C# code. After two cycles of “Oh, I don’t need to handle this separately!”, I had condensed the initially “designed” logic by roughly 30% (my best guess).

This improves readability and reduces the potential for issues.

The NetcodeState MonoBehaviour

The Netcode Statemachine is in the NetcodeState component which I’ve added to its own GameObject. This is a pattern I frequently use.

Rather than cramming multiple components on the same object, I prefer to have one object per component for the sort of “central” components. This makes locating the component easier and lets you focus on just that component’s properties in the Inspector.

Fields

We only need a reference to the Statemachine, and the configuration objects (structs with data) for some of the Conditions/Actions:

These “config” types are part of the C/As that use them. You can consider them data transfer objects to exchange data (state) between the Statemachine and outside code. I’ll detail those later.

Initialization

Upon Start, the new FSM() is instantiated with a name (required) and WithStates which is simply a list of strings:

I also allow the SM to perform multiple state changes in quick succession to avoid adding unnecessary latency. I was careful to avoid any looping back and forth between two states.

Since the State’s names were taken from an enum, I can then assign each FSM.State instance to a local var for readability.

I also define the config vars and assign them to their fields, as well as adding a “RelayInitOnce” flag used to prevent a transition from activating repeatedly.

Defining States

Just take any state variable and call AddTransition to create a transition in that state:

The name makes reading and debugging the SM easier, but other than that it has no purpose and could be omitted. For more details on the SM setup, read the previous article.

Start Your Machines

After we set up the States and Transitions, we have to call m_Statemachine.Start() once and only once to initialize it.

The Start() method verifies the integrity of the Statemachine and tries to find any logic loopholes, such as unused States or Transitions which want to activate a State that’s not in the SM.

Start() also calls the OnStart() event method in all Conditions and Actions, if implemented. The IStatemachineEvents interface provides a default no-op implementation. OnStart() and the corresponding OnStop() can be used to perform once-only initialization or disposal.

The MonoBehaviour equivalents would be Start() and OnDestroy().

Then the first state is set active, firing the OnEnterState() message for that particular state. There is of course a corresponding OnExitState() and these can be used to perform, you guessed it, code that may be needed to set or clean up the Condition/Action.

These may get called repeatedly, so their MonoBehaviour equivalents would be OnEnable() and OnDisable().
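
To illustrate the four lifecycle callbacks, here is a minimal sketch that uses C# 8 default interface methods for the no-op implementations. The interface below is a stand-in written for this example. The member names follow the text above, but the real library's signatures may differ, and LoggingCondition is purely hypothetical.

```csharp
using System.Collections.Generic;

// Stand-in for the article's IStatemachineEvents (assumed shape, not the real API).
// Default interface methods let implementers override only what they need.
public interface IStatemachineEvents
{
    void OnStart() { }      // once, when the Statemachine starts
    void OnStop() { }       // once, when the Statemachine stops
    void OnEnterState() { } // each time the owning State becomes active
    void OnExitState() { }  // each time the owning State is left
}

// Hypothetical Condition that only cares about entering and leaving its State.
public sealed class LoggingCondition : IStatemachineEvents
{
    public readonly List<string> Log = new List<string>();

    public void OnEnterState() => Log.Add("enter");
    public void OnExitState() => Log.Add("exit");
}
```

Note that default interface methods are only reachable through the interface type, so the Statemachine would hold its Conditions and Actions as IStatemachineEvents when raising these events.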

The States Of The Machine

Now let’s check each State and what it does in detail.

Initializing

The Initializing State exists to avoid any execution order issues:

Uhm … initializing what?

The NetworkManager.Singleton instance isn’t created until the component’s OnEnable has run. Thus the Statemachine waits until that has occurred, if necessary.

Note: If you add a component to the same object as NetworkManager and call GetComponent<NetworkManager>() you can get the NetworkManager reference even in Awake. But doing so provides no guarantee that the NetworkManager is actually initialized.

Offline

The Initializing State moves to the Offline State:

Offline State: waiting to start an online session.

Here we’re waiting for a network connection to be initiated through code outside the Statemachine, be it a GUI button or the command line or Multiplayer Playmode.

Here’s the SM setup code for the Offline State:

I haven’t created IsRelayDisabled because it’s used only once.

NetcodeConfig

Whichever way we start, it’s assumed that it will change the Role field of the NetcodeConfig variable to anything other than None:

The Role also toggles starting and stopping an online session.

This is the NetcodeRole enum:

The None role simply signals that we have no role assigned. This is also used to end a session if the role gets reset to None.

The IsNetcodeRole condition is, as most are, rather simple. I’ll provide this one as an example but omit the others that are similarly simple:

Theoretically, I could have also used an IntVar but .. what role is 3? And then having to cast an enum to (int) .. not very readable.

It’s exactly because parameters like these are often related that I created the various config structs so I can easily add more fields.
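
As a rough illustration of why an enum-based condition reads better than an int comparison, here is a minimal sketch. The enum member order, the ICondition contract and the use of a class for NetcodeConfig are assumptions made for this example and not taken from the actual source.

```csharp
// Assumed member order; only None, Server, Host and Client are named in the text.
public enum NetcodeRole { None, Server, Host, Client }

// Hypothetical config object shared between the Statemachine and outside code.
// A class is used here so both sides see the same instance.
public sealed class NetcodeConfig
{
    public NetcodeRole Role = NetcodeRole.None;
}

// Hypothetical Condition contract; the real library's interface may differ.
public interface ICondition
{
    bool IsSatisfied();
}

// True while the configured role equals the role this Condition was created with.
public sealed class IsNetcodeRole : ICondition
{
    private readonly NetcodeConfig m_Config;
    private readonly NetcodeRole m_Role;

    public IsNetcodeRole(NetcodeConfig config, NetcodeRole role)
    {
        m_Config = config;
        m_Role = role;
    }

    public bool IsSatisfied() => m_Config.Role == m_Role;
}
```

Outside code would then start a session by setting Role to Host, Server or Client, and end it by setting Role back to None.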

IsRelayEnabled?

Lastly, the IsRelayEnabled condition simply checks if the UseRelayService bool is set in RelayConfig. We’ll get to that.

If we don’t use Relay, the Offline state moves directly to the NetworkStarting state. But let’s assume we intend to use Relay. In that case we take a small detour to the RelayStarting state.

RelayStarting

In here, we’re simply making the necessary Service calls to initiate a Relay connection, either as a Host (Create) or as a Client (Join):

Awaiting the Authentication and Relay Service calls …

This is the corresponding C# definition:

Initializing Services and waiting for the result of the awaited Actions.

The relayInitOnceVar toggles true to prevent repeatedly activating the Relay transition. This var gets reset back to false in the compound action resetNetcodeState which runs when we’re either leaving the RelayStarting state or stopping the session.

Note that we need this safeguard because this is a self-transition: it doesn’t change State. And we also won’t be leaving this State for several more Statemachine updates.
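
The once-only guard can be pictured with a minimal sketch like the one below. BoolVar and SelfTransition are hypothetical stand-ins written for this example. In the real Statemachine the guard is expressed as a Condition plus an Action on the Transition rather than as an if statement.

```csharp
using System;

// Hypothetical shared flag, standing in for the article's relayInitOnceVar.
public sealed class BoolVar
{
    public bool Value;
}

// A self-transition does not change State, so it is evaluated on every update.
public sealed class SelfTransition
{
    private readonly BoolVar m_InitOnce;
    private readonly Action m_StartServiceCalls;

    public SelfTransition(BoolVar initOnce, Action startServiceCalls)
    {
        m_InitOnce = initOnce;
        m_StartServiceCalls = startServiceCalls;
    }

    // Called on every Statemachine update while the State is active.
    public void Update()
    {
        if (m_InitOnce.Value)
            return;              // guard: the calls were already started

        m_InitOnce.Value = true; // flip the flag before starting the calls
        m_StartServiceCalls();
    }
}
```

Resetting the flag to false, as the reset action does when leaving the State or stopping the session, re-arms the transition for the next session.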

I’ll just skip the Services initialization because it’s trivial. Instead, we focus on SignInAnonymously:

As you’d expect.

Since it implements IAsyncAction it has to implement the ExecuteAsync method returning an awaitable Task. Here it simply returns the Service call’s Task.

It is crucial to check whether we’re in the position to make a Service call. If we’re already signed in, we don’t need to sign in, and we actually can’t .. because trying to sign in while being signed in will throw an exception!

On the other hand, it’s safe to assume that AuthenticationService.Instance is non-null. If the UnityServicesInit call had failed (and it can!), the Transition would already have activated the Error state and would never get to execute the SignInAnonymously Action.
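
The sign-in guard could look roughly like the following sketch. To keep it runnable without Unity, IAuthService is a stand-in for the two members of Unity's authentication service used here (IsSignedIn and SignInAnonymouslyAsync). The IAsyncAction shape and FakeAuthService are assumptions for this example.

```csharp
using System;
using System.Threading.Tasks;

// Stand-in for the members of Unity's authentication service used by this Action.
public interface IAuthService
{
    bool IsSignedIn { get; }
    Task SignInAnonymouslyAsync();
}

// Assumed async Action contract, following the ExecuteAsync description above.
public interface IAsyncAction
{
    Task ExecuteAsync();
}

public sealed class SignInAnonymously : IAsyncAction
{
    private readonly IAuthService m_Auth;

    public SignInAnonymously(IAuthService auth) => m_Auth = auth;

    // Returns the service call's Task, or a completed Task if already signed in.
    public Task ExecuteAsync() =>
        m_Auth.IsSignedIn ? Task.CompletedTask : m_Auth.SignInAnonymouslyAsync();
}

// In-memory fake that throws on a second sign-in, mimicking the behaviour described above.
public sealed class FakeAuthService : IAuthService
{
    public bool IsSignedIn { get; private set; }
    public int SignInCalls { get; private set; }

    public Task SignInAnonymouslyAsync()
    {
        if (IsSignedIn)
            throw new InvalidOperationException("already signed in");

        SignInCalls++;
        IsSignedIn = true;
        return Task.CompletedTask;
    }
}
```

With the guard in place, executing the Action a second time in the same session is harmless.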

Relay: Create Or Join

The Relay code is trivial. It would complicate the SM if we had to add Transitions checking our NetcodeRole – therefore RelayCreateOrJoinAllocation makes this decision internally:

Relay: To Host or to Join, that is the question.

The NetcodeConfig‘s Role is used to decide whether we host or join. The RelayConfig has corresponding methods to set a host or join allocation, which will null the respective other allocation to make sure we don’t hold onto an old one unnecessarily.

The RelayConfig has these fields:

RelayConfig fields. Methods omitted.

The JoinCode field will later be used to present it to the host user for sharing. For now, we just move that data from the service calls into RelayConfig and call it a day. Some other code will make use of it.

Again, we needn’t worry about exception handling within the Action because the Transition takes care of this for us.
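
The "set one allocation, null the other" behaviour might be sketched as follows. HostAllocation and JoinAllocation are empty stand-ins for the Relay service's allocation types, and all member names except UseRelayService and JoinCode are assumptions for this example.

```csharp
// Empty stand-ins for the Relay service's host and join allocation types.
public sealed class HostAllocation { }
public sealed class JoinAllocation { }

public sealed class RelayConfig
{
    public bool UseRelayService;
    public string JoinCode;

    public HostAllocation HostAllocation { get; private set; }
    public JoinAllocation JoinAllocation { get; private set; }

    // Relay counts as ready once either allocation is available.
    public bool IsReady => HostAllocation != null || JoinAllocation != null;

    // Hosting: keep the new allocation and its join code, drop any old join allocation.
    public void SetHostAllocation(HostAllocation allocation, string joinCode)
    {
        HostAllocation = allocation;
        JoinAllocation = null;
        JoinCode = joinCode;
    }

    // Joining: keep the new allocation, drop any old host allocation.
    public void SetJoinAllocation(JoinAllocation allocation)
    {
        JoinAllocation = allocation;
        HostAllocation = null;
    }

    // Used when a session ends or an error occurs.
    public void ClearAllocations()
    {
        HostAllocation = null;
        JoinAllocation = null;
    }
}
```

Because only one allocation can be non-null at a time, later code can pick the right one without consulting the role again.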

Waiting For The Awaitables

Let’s quickly focus on the Transition that leads out towards the NetworkStarting state:

Initialized? Check. Signed in? Check. Relay? Waiting, waiting, …

It’s obviously waiting for both the user being signed in and Relay being readied. But why?

The reason is the awaitable IAsyncAction implementations that make the Service calls. In a single Statemachine update, we began awaiting three calls: UnityServicesInit, SignInAnonymously and RelayCreateOrJoinAllocation. These will be awaited for several Statemachine updates since at least the latter two send and receive network messages.

Hence the waiting, but also the RelayInitOnce var, which prevents these Actions from running on every update while we’re in RelayStarting.

Note that technically we needn’t check for IsSignedIn as that is a prerequisite of IsRelayReady becoming true. Still I found it best to express that we’re waiting for both to succeed.

And yes, strictly speaking there ought to be an AreServicesInitialized condition. I took the middle ground.

NetworkStarting

Now that Relay is ready we move to the NetworkStarting State. If we weren’t using Relay, we would have arrived here directly from the Offline State.

Are you not listening?

This is what it looks like in the Statemachine setup code:

Let’s start with the misnomer: IsNotListening. Aren’t we all?

Listening is a technical term for having an open network connection. I might just rename it to IsOffline but that’s also pretty vague. IsListening is a NetworkManager flag that is set true the moment you call StartServer, StartHost or StartClient.

Which we do after TransportSetup.

TransportSetup

This Action is responsible for configuring the Transport. I want to handle all common cases: with or without Encryption, WebSockets, Relay and who knows what else – all in a single, reusable Action:

Encryption is not yet implemented. But adding that will be trivial and non-disruptive since only data needs to be added to TransportConfig and the corresponding strings (secrets, certificates) applied. No Statemachine changes needed.

_{With encrypted WebSockets, the connection type must be “wss”. That’s the simple double ternary in the code. What a lovely phrase: simple double ternary. 🤓}

For the rest, the code checks if we’re using Relay or not. With Relay, it calls SetRelayServerData with either Join or Host allocation. Without Relay we merely set the Address, Port and Listen Address.

Pretty neat and simple once you decompose those Transport calls.

_{The ServerListenAddress needs to be set to “0.0.0.0” if you want your server, be it dedicated or host, to be reachable by clients. Not using this third parameter is a common source of “clients can’t connect to the host / server” issues because SetConnectionData has made that third parameter optional, with the default being interpreted as “127.0.0.1” (localhost). This will allow only client connections from local processes, and fails if the client is not on the same machine.}
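
The two small decisions discussed here, the connection type and the listen address, can be isolated into pure functions as in this sketch. The TransportConfig fields are assumed names, the "dtls" fallback for non-WebSocket connections is an assumption, and falling back to "0.0.0.0" when no listen address is configured is one possible design choice rather than the author's confirmed behaviour.

```csharp
// Assumed shape of the transport configuration data.
public struct TransportConfig
{
    public string Address;
    public ushort Port;
    public string ServerListenAddress;
    public bool UseWebSockets;
    public bool UseEncryption;
}

public static class TransportSetupSketch
{
    // The "simple double ternary": WebSockets pick "ws" or "wss" depending
    // on encryption. Everything else is assumed to use "dtls" here.
    public static string GetRelayConnectionType(TransportConfig config) =>
        config.UseWebSockets ? (config.UseEncryption ? "wss" : "ws") : "dtls";

    // Listen on all interfaces unless a specific address was configured,
    // so that remote clients can reach a host or server.
    public static string GetListenAddress(TransportConfig config) =>
        string.IsNullOrEmpty(config.ServerListenAddress)
            ? "0.0.0.0"
            : config.ServerListenAddress;
}
```

In the actual Action, these values would then be handed to the Transport, for example as the third argument of SetConnectionData in the non-Relay case.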

NetworkStart

With the Transport configured, we can now start networking:

Start with your Role of choice …

Based on the NetcodeConfig Role we call the appropriate Start method. The only curiosity, and often overlooked fact, is what happens when a Start method returns false?

What, StartXxxx can fail? Yes, absolutely! It happens rarely, but good production code ALWAYS assumes that every possible failure point can fail and handles it accordingly.

Here, I deliberately throw an Exception if starting the network failed, so that the exception handling in the Transition gets triggered. Otherwise we could be deadlocked in the NetworkStarting state.
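
Turning a false return value into an exception could be sketched like this. INetworkStarter stands in for NetworkManager's StartServer, StartHost and StartClient methods, which each return a bool. The class layout, exception types and FakeNetwork are assumptions for this example.

```csharp
using System;

public enum NetcodeRole { None, Server, Host, Client }

// Stand-in for the NetworkManager start methods, which return false on failure.
public interface INetworkStarter
{
    bool StartServer();
    bool StartHost();
    bool StartClient();
}

public sealed class NetworkStart
{
    private readonly INetworkStarter m_Network;

    public NetworkStart(INetworkStarter network) => m_Network = network;

    // Starts networking for the given role and throws if the start call reports failure.
    public void Execute(NetcodeRole role)
    {
        var didStart = role switch
        {
            NetcodeRole.Server => m_Network.StartServer(),
            NetcodeRole.Host => m_Network.StartHost(),
            NetcodeRole.Client => m_Network.StartClient(),
            _ => throw new ArgumentOutOfRangeException(nameof(role), role, "no role to start"),
        };

        // Surface the failure so the Transition's error handling can react.
        if (!didStart)
            throw new InvalidOperationException($"starting network as {role} failed");
    }
}

// In-memory fake whose start methods return a configurable result.
public sealed class FakeNetwork : INetworkStarter
{
    public bool StartResult = true;
    public string LastStarted;

    public bool StartServer() { LastStarted = "Server"; return StartResult; }
    public bool StartHost() { LastStarted = "Host"; return StartResult; }
    public bool StartClient() { LastStarted = "Client"; return StartResult; }
}
```

Without the final check, a failed start would leave the Statemachine waiting in NetworkStarting for a started event that never arrives.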

Error Handling

And that is why the ToErrorState and ErrorActions exist in the Network Starting Transition:

NetworkStart might fail!

Any error (exception) activates the Offline state and executes the compound Action ResetNetcodeState. This ensures that the role is None again and any Allocation in the RelayConfig var is null’ed:

Now, if an error were to happen, we can simply start over again.

_{Pro Tip: Always handle all expectable Exceptions! More precisely, every I/O operation – be it File I/O or Network I/O – may fail and REQUIRES exception handling, without exception or excuse!}

Started: Server Or Client?

Assuming everything went okay, we still have these two Transitions that lead us out of the NetworkStarting state:

They activate the ServerOnline or ClientOnline states, respectively. Also note that if we’re playing as the Host, logically speaking we’re the Server. The Netcode Statemachine has no need to distinguish the Host separately from the Server.

Both Conditions look alike, here’s the IsLocalClientStarted:

Do note that the transport failure event shuts down NetworkManager, so it has to be handled in this Condition because this will also stop the client.

Now we don’t have to separately handle a Transport Failure anywhere within the Statemachine – it’s just a sort of automated “Stop” event that we automatically handle. We don’t get any information other than “transport failure” which we might relay to the user generically phrased as: “connection interrupted”.

_{I’m a fan of telling the user what the situation is in a way that they understand. Avoid technical developer jargon. We may understand what a transport failure is, while the user doesn’t even know what a Transport is (Beam me up, Scotty?).}

ServerOnline

Let’s assume we started a Server and it got started, then we’re in the ServerOnline state:

The corresponding State setup:

Since NetcodeState is only about handling the transition between online and offline, we don’t need to do anything but to wait for the session to end.

For the server this happens either when IsLocalServerStopped (remember: this is also the case if we encounter a transport failure) or when some outside code deliberately set the NetcodeRole field back to None to end the session.

NetworkStop

If either case becomes true, we execute NetworkStop. This simple Action merely calls NetworkManager.Shutdown:

Note that NetworkManager ignores the Shutdown call if it’s currently shutting down or already shut down. So we’re safe to call this regardless of the current NetworkManager state.

_{I try to double-check such behaviours. Fortunately, we can just navigate into the corresponding code since NGO is open source.}

ClientOnline

The client is a bit more involved. Once the Client is online, it still needs to wait for its connection to be approved:

While waiting for the connection, the client could be stopped, either due to a transport failure or by manually cancelling the connection. That’s why we also handle Client stopped here. I won’t repeat this code because it’s effectively the same as for the Server.

_{While writing, I discovered that I have a loophole here: the case of a client getting rejected via connection approval is not handled. In such a case, a ClientDisconnect event is received. The likely course of action is to create a condition that checks if DisconnectReason is non-null in the disconnect event.}

IsLocalClientConnected

This condition is similar to the Started/Stopped conditions as it hooks into NetworkManager’s OnConnectionEvent:

This waits for ClientConnected and ClientDisconnected events and sets the m_IsClientConnected flag accordingly.

The only notable thing is that we use the convenient switch expression style available since C# 8.0. The final underscore arm just assigns the existing value. It seems odd, but otherwise I’d get a warning that the switch “does not handle all possible inputs (it is not exhaustive)”.
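
The switch expression with its discard arm can be sketched as follows. The ConnectionEvent enum here is a local stand-in with the same member names as NGO's enum, and the handler takes only the event type. The real OnConnectionEvent handler receives the NetworkManager and an event data struct instead.

```csharp
// Local stand-in mirroring the member names of NGO's ConnectionEvent enum.
public enum ConnectionEvent
{
    ClientConnected,
    PeerConnected,
    ClientDisconnected,
    PeerDisconnected,
}

public sealed class IsLocalClientConnected
{
    private bool m_IsClientConnected;

    public bool IsSatisfied() => m_IsClientConnected;

    // Simplified handler; in NGO this would be subscribed to OnConnectionEvent.
    public void OnConnectionEvent(ConnectionEvent eventType)
    {
        m_IsClientConnected = eventType switch
        {
            ConnectionEvent.ClientConnected => true,
            ConnectionEvent.ClientDisconnected => false,
            // Any other event keeps the current value. This arm also makes
            // the switch exhaustive, which silences the compiler warning.
            _ => m_IsClientConnected,
        };
    }
}
```

Peer events therefore leave the flag untouched instead of accidentally resetting it.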

ClientPlaying

Once the client is connected, we move to the ClientPlaying state:

This is essentially the same as the ServerOnline state. The only difference is the additional IsLocalClientDisconnected condition. This might become true if the client gets kicked by the server.

Just like for the Server, we call NetworkStop if one of these conditions combined with logical OR becomes true. If so, we move to the NetworkStopping state.

NetworkStopping

In NetworkStopping we check whether we went offline:

You may be wondering why we aren’t going directly to the Offline state? After all, we did run NetworkStop.

The reason is that Shutdown is not an instant operation!

At least not for the Server (or Host) since the Server has to wait for remaining client messages, and client disconnect events, and possibly a timeout if a client happens to become unresponsive at this very moment.

IsNetworkOffline

To be on the safe side, IsNetworkOffline checks for all possible flags that may indicate whether we’re still not fully offline:

First, if NetworkManager happens to be null, we have to assume we’re offline.

Neither ShutdownInProgress nor IsListening was still set after a shutdown in my tests, but I’d rather be safe than sorry.

The main indicators whether we’re still in the process of shutting down are the IsServer and IsClient flags which, according to the NGO source code, are reset last.

Only if all these checks pass can we safely assume that we’re fully offline, and thus ready to start a new network session.

_{A common failure point is to call NetworkManager.StartXxxxx directly after NetworkManager.Shutdown. This just won’t work!}
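
Put together, the offline check described in this section might look like the sketch below. INetworkStatus is a stand-in exposing the four NetworkManager flags named above so the example runs without Unity. The helper class and the fake are assumptions for this example.

```csharp
// Stand-in for the NetworkManager flags inspected by the offline check.
public interface INetworkStatus
{
    bool ShutdownInProgress { get; }
    bool IsListening { get; }
    bool IsServer { get; }
    bool IsClient { get; }
}

public static class NetworkStatusChecks
{
    // True only if no flag indicates an active or still-closing session.
    public static bool IsNetworkOffline(INetworkStatus network)
    {
        // No NetworkManager at all: assume we're offline.
        if (network == null)
            return true;

        return !network.ShutdownInProgress
            && !network.IsListening
            && !network.IsServer
            && !network.IsClient;
    }
}

// Simple fake with settable flags for trying out the check.
public sealed class FakeNetworkStatus : INetworkStatus
{
    public bool ShutdownInProgress { get; set; }
    public bool IsListening { get; set; }
    public bool IsServer { get; set; }
    public bool IsClient { get; set; }
}
```

Only when this returns true would the Statemachine move from NetworkStopping back to Offline, after which a new start call is safe.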

Starting All Over Again

Aaaaaand now we’re back in the Offline state:

It means we’re ready to repeat the same process all over again – either in the same way or differently. With or without Relay, with or without Encryption, as a Server, Host or Client, whether we’re still signed in or not.

The whole loop is entirely handled by the Netcode Statemachine!

Now we can move on to other things. For me, the next steps will be confirming that this works under all conditions, like a Web build.

To that end, I’ll have to add a minimalistic GUI (with UI Toolkit) and perhaps command line parameters before I get to actually spawn network objects.

Source Code on GitHub (GPL3 License)

CodeSmile