4. Netcode Statemachine (Write Better Netcode)


I finally cracked the flow of starting / stopping Netcode, with or without Relay, with all potential failure cases handled:

Netcode Start/Stop flow diagram
This diagram was generated from C# Statemachine code!

I only needed to write a Statemachine system (read how that came to be). Not from scratch, but from memory of what I had created 20 years ago and refined over several years. Then I just had to add a PlantUML dump.

It was totally worth spending two weeks on this! 😊

In this article I’ll explain how the Statemachine works, and in the next I’ll dive into the actual Netcode within the Statemachine. In the future, I’ll aim for writing shorter, more frequent articles.

Statemachine Introduction

Already know what a Statemachine is? Feel free to skip this section.

In a Statemachine (SM), logic is divided into multiple States (S) where only one State is active at any given time.

A State is a container for Transitions (T). Each Transition contains Conditions (C). If all of its Cs are satisfied, the Transition executes its Actions (A) and may optionally change the active State.

I will use the SM/S/T/C/A abbreviations to shorten the text here and there.

I clearly need to work on my mouse-writing skills …

Conditions and Actions are lightweight classes, often containing very little code (you’ll see soon). Cs return a truth value while As can run any code – including awaitable methods. They’re C# classes implementing either ICondition, IAction, or IAsyncAction.
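To make the three building blocks concrete, here is a minimal sketch. The interface names come from the text above, and IsSatisfied / Execute are mentioned later in this article. The ExecuteAsync name, the exact signatures and the FakeNetwork stand-in are my assumptions and not taken from the actual source code.

```csharp
using System;
using System.Threading.Tasks;

// Interface names as used in the article; member signatures are assumptions.
public interface ICondition { bool IsSatisfied(); }
public interface IAction { void Execute(); }
public interface IAsyncAction { Task ExecuteAsync(); }

// Hypothetical stand-in for whatever the real code queries (e.g. NetworkManager).
public static class FakeNetwork
{
    public static bool IsListening;
}

// A Condition only answers a yes/no question.
public sealed class IsNotListening : ICondition
{
    public bool IsSatisfied() => !FakeNetwork.IsListening;
}

// An Action "does something", here it flips the stand-in flag.
public sealed class NetworkStart : IAction
{
    public void Execute() => FakeNetwork.IsListening = true;
}
```

Most Cs and As stay about this small, which is why they are cheap to write and easy to reuse.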

My Statemachine has equatable variables (BoolVar, IntVar) and a Var<T> type for complex data. They are simple container classes. These variables either have local (SM) or global scope.

Variables can control the flow of the SM just like any other condition. They’re also used to exchange (complex) data with the outside code.

“Running” the SM can be done in Update() or FixedUpdate(), or in a coroutine, on a timer, at an interval – it’s totally up to you. This allows for time-interleaved SM updates to spread the load.

By default, the SM update stops evaluating the active State’s Transitions every time a state-changing T activates. This prevents fast-tracking through several states.

Alternatively, the SM can stop as soon as it reaches a State where, after updating all Ts, there was no state change. This allows for multiple State changes in a single update but might cause an infinite loop (safeguarded, logged).
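The two update modes can be sketched with a deliberately tiny state machine. Everything here (MiniStatemachine, AllowMultipleChanges, the limit of 100) is hypothetical and only illustrates the idea; states are plain strings and conditions are delegates to keep it short.

```csharp
using System;
using System.Collections.Generic;

public sealed class MiniStatemachine
{
    // Safeguard against endless state flipping; the value is arbitrary.
    public const int MaxChangesPerUpdate = 100;

    readonly Dictionary<string, List<(Func<bool> Condition, string Target)>> transitions
        = new Dictionary<string, List<(Func<bool> Condition, string Target)>>();

    public string ActiveState { get; private set; }

    // false (default): stop after the first state change in an update.
    // true: keep going until a state produces no further change.
    public bool AllowMultipleChanges { get; set; }

    public MiniStatemachine(string initialState) => ActiveState = initialState;

    public void AddTransition(string from, Func<bool> condition, string target)
    {
        if (!transitions.TryGetValue(from, out var list))
            transitions[from] = list = new List<(Func<bool> Condition, string Target)>();
        list.Add((condition, target));
    }

    public void Update()
    {
        for (var changes = 0; changes < MaxChangesPerUpdate; changes++)
        {
            if (!TryChangeState()) return;      // settled: no transition changed state
            if (!AllowMultipleChanges) return;  // default: one change per update
        }
        Console.Error.WriteLine("Statemachine: too many state changes in a single update");
    }

    bool TryChangeState()
    {
        if (!transitions.TryGetValue(ActiveState, out var list)) return false;
        foreach (var (condition, target) in list)
        {
            if (!condition()) continue;
            ActiveState = target;
            return true;  // stop evaluating the remaining transitions of the old state
        }
        return false;
    }
}
```

With transitions A to B and B to C that are always true, the default mode ends the first update in B, while AllowMultipleChanges runs straight through to C.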

Statemachine Benefits

The main benefit of programming in States is reduced mental load!

Something goes wrong in a particular State? You only need to focus on that State’s Transitions and its C/As. Whatever the issue may be, it cannot be hiding just about anywhere. This immediately narrows your focus close to the issue’s origin.

In addition, computational cost is limited to only the Cs that the active State evaluates, and possibly a few of its As.

The other major benefit is that you will program modular Cs and As. Much of that code is trivial. The more complex code still typically fits entirely on your screen.

These building blocks can extend to combining multiple C/As and even entire States with their transitions. As a result, programming and debugging efforts tend to scale down over time.

One last thing to consider: the detailed implementations (C/A) and the flow of logic (SM, S, T) are clearly separated. This means you mentally switch between lower and higher levels of implementation.

For those like me with ADD this clear focus is a productivity boost.

Statemachine Code Examples

The Statemachine diagram at the top looks like this as C# code:

First and last states of the diagram not shown here. It’s a 1440p screenshot.

I couldn’t fit all lines in the shot. You can review the full code here.

It may take a moment to let this kind of logic click, not unlike LINQ (which is way, way more obtuse). There’s no ifs and elses nor trys, catches, and whatnots. But you can read this out loud:

If the Condition(s):

IsNotListening

is/are satisfied, then execute the Action(s):

TransportSetup
NetworkStart

Try to find this transition in the screenshots above as an exercise.

I’m aware that IsNotListening is ambiguous. I may refactor at a later time, perhaps to IsNetworkManagerNotListening (Bad boy!). I strive towards brevity first, disambiguation when the (expanding) context requires it.

Creating States

The Netcode Statemachine creates states from an existing enum by using Enum.GetNames():

Just supplying a list of strings is possible, too. I opt for an enum for type safety and it’s just easier to work with than const string fields.
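A minimal sketch of creating one State per enum member via Enum.GetNames(). The State class, the StateFactory helper and the enum members shown are stand-ins I made up for illustration; the real API may look different.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical subset of states, named after the conventions in this article.
public enum NetcodeState { Initializing, Offline, RelayStarting, NetworkStopping }

// Stand-in State type: here it only carries its name.
public sealed class State
{
    public string Name { get; }
    public State(string name) => Name = name;
}

public static class StateFactory
{
    // Creates one State per enum member, keyed by the enum value for type safety.
    public static Dictionary<TEnum, State> CreateStates<TEnum>() where TEnum : struct, Enum
    {
        var states = new Dictionary<TEnum, State>();
        foreach (var name in Enum.GetNames(typeof(TEnum)))
            states[(TEnum)Enum.Parse(typeof(TEnum), name)] = new State(name);
        return states;
    }
}
```

Usage would be along the lines of `var states = StateFactory.CreateStates<NetcodeState>();` followed by `var offlineState = states[NetcodeState.Offline];`. A typo in the enum member is a compile error, which is the type safety argument made above.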

Adding Transitions To A State

Pick one of the State variables from above, and call AddTransition:

As with LINQ, you can neatly chain calls. Most are optional. A transition does not need to specify a “goto” state – omitting it makes it a self-transition, which should be guarded against repeat execution.

Omitting Cs in a Transition makes the T always true.

You can nicely read this block of code from top to bottom:

The initState transitions to offlineState if the condition IsNetworkManagerSingletonAssigned() is true. If so, execute the Action UnityServicesInit().

Did you notice? The diagram lists IsNetworkOffline as the condition for this transition, which includes a NetworkManager null check. But while writing, I decided to create a separate condition just for clarity, as I often do.

Transition Exception Handling

You may have noticed that some Ts have a ToErrorState and WithErrorActions:

These are used for error (read: Exception) handling. If either of the above two Actions were to throw an exception, the T activates (goes to) the error state instead and executes the error actions. Error actions will typically need to reset variables.
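Here is a rough sketch of how a chainable Transition with an error path could work. ToErrorState and WithErrorActions are named in the text; WithConditions, WithActions, ToState, the Update method and the use of plain delegates are assumptions for the sake of a short example.

```csharp
using System;
using System.Collections.Generic;

public sealed class Transition
{
    readonly List<Func<bool>> conditions = new List<Func<bool>>();
    readonly List<Action> actions = new List<Action>();
    readonly List<Action> errorActions = new List<Action>();
    string gotoState;
    string errorState;

    // Every builder method returns 'this' so that calls can be chained.
    public Transition WithConditions(params Func<bool>[] cs) { conditions.AddRange(cs); return this; }
    public Transition WithActions(params Action[] acts) { actions.AddRange(acts); return this; }
    public Transition ToState(string name) { gotoState = name; return this; }
    public Transition ToErrorState(string name) { errorState = name; return this; }
    public Transition WithErrorActions(params Action[] acts) { errorActions.AddRange(acts); return this; }

    // Returns the name of the next state, or null if the state should not change.
    public string Update()
    {
        foreach (var condition in conditions)
            if (!condition()) return null;  // logical AND with early out

        try
        {
            foreach (var action in actions) action();  // in the order they were added
            return gotoState;
        }
        catch (Exception)
        {
            // An Action threw: run the error actions and go to the error state instead.
            foreach (var action in errorActions) action();
            return errorState;
        }
    }
}
```

If the second of two Actions throws, Update returns the error state's name after running the error actions, so the surrounding SM never sees the exception.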

Note that Conditions are required to catch all and not throw any exceptions. Instead, the C must always return a truth value. The logic flow of the SM should not be interrupted by exceptions.

In the above SM, many Cs use NetworkManager.Singleton, which is initially null. Using IsNetworkManagerNotListening as the very first C in the very first State allows me to avoid having to try/catch this potential issue in all NetworkManager conditions.

If, however, you were to read a file’s contents in a C then any exception, like permission or file not found, needs to be caught within the C so that it returns false in such cases. Exception logging is recommended, at least during development.
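As a sketch of that rule, a file-reading Condition could look like this. IsFileNotEmpty is a hypothetical example class and the ICondition signature is an assumption.

```csharp
using System;
using System.IO;

public interface ICondition { bool IsSatisfied(); }

// Hypothetical Condition that must never let an exception escape.
public sealed class IsFileNotEmpty : ICondition
{
    readonly string path;

    public IsFileNotEmpty(string path) => this.path = path;

    public bool IsSatisfied()
    {
        try
        {
            return File.ReadAllText(path).Length > 0;
        }
        catch (Exception e)
        {
            // Missing file, no permission, invalid path: log it and report 'false'.
            Console.Error.WriteLine($"{nameof(IsFileNotEmpty)}: {e.Message}");
            return false;
        }
    }
}
```

The catch-all is deliberate here: from the SM's point of view, "could not read the file" and "file is empty" both simply mean the Condition is not satisfied.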

There is still the issue of informing outside code of an error. For example, the GUI may need to inform the user about a connection failure. It’s under consideration.

Order Of Execution

Unless “random logic” is used anywhere within the Cs or As, the SM logic is fully deterministic by default.

Each State’s transitions are evaluated in order. Here in the diagram they are ordered from left to right:

T: Network Starting is evaluated first, then Server started, Client started last.

In C# code, the execution order is from top to bottom.

Conditions are combined using logical AND by default. The first C that returns false ends the T’s evaluation. This is the same short-circuit (“early out”) behaviour as C#’s && operator.

If IsNetcodeRole is None, the IsRelayEnabled C is not evaluated.

Later I’ll explain other logical operators for conditions.

If the Actions execute, they too run from top to bottom. This is just as important as the order of statements in C# code:

A: First setup transport, then start networking.

You wouldn’t start networking without configuring the transport.

Naming States And Transitions

A State is usually ongoing, whereas a Transition encodes a fact that happens at a specific point in time.

Therefore I prefer to name States with the -ing suffix (present progressive) wherever it seems feasible, such as:

  • Initializing
  • RelayStarting
  • ClientPlaying
  • NetworkStopping

Exceptions include Offline and Online states because frankly we know that these are ongoing continuously, and often too damn long.

Contrast this with Transitions where I prefer to use past tense:

  • RelayStarted
  • ClientDisconnected
  • NetworkStopped
  • Init Complete(d)

Transitions encode the thing that just happened, or happens momentarily. Verbs work fine in some cases, as in “complete” vs “completed”, but may sound awkward, e.g. ClientDisconnect.

I also use <object-adjective> (ClientDisconnected) rather than <verb-object> (DisconnectClient) because the object we act upon has greater importance. The adjective signals that we’re altering one of the object’s attributes.

Declaring Variables

Variables let you control the flow of the Statemachine with custom values of your own. You also use them to pass runtime-modifiable data into Cs and As.

All variables inherit from the abstract VariableBase class. The base class provides default equality checks.

Variables need to be defined on the SM before use:

Not shown: You can also specify an initial value for a Var.

The Vars property is where SM scoped (local) variables are stored. There’s a corresponding GlobalVars property that all Statemachines share.

Global vars should be used with care but they can make communicating with other SMs easy. Global vars allow relaying information without an event system. It is best to prefix all global vars to avoid name clashes.

Speaking of which: Variables are indexed by name, with nameof(T) being the default for Var<T> types. Once a var is defined, you should prefer to use the returned instance rather than string indexing for obvious reasons (typos).

A benefit of storing variables within the Statemachine: at any point in time you can explore or dump all the SM’s variables without having to use the debugger.
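A small sketch of such a name-indexed variable store. VariableBase, BoolVar, Var<T> and RelayConfig are names from this article; the Variables class, the DefineBool / DefineVar methods and the field types are assumptions. Note that inside generic code, nameof(T) would literally yield "T", so the sketch uses typeof(T).Name to get the type's name as the default key.

```csharp
using System;
using System.Collections.Generic;

public abstract class VariableBase { }
public sealed class BoolVar : VariableBase { public bool Value; }
public sealed class Var<T> : VariableBase { public T Value; }

// Stand-in for the RelayConfig struct mentioned in this article; field types are placeholders.
public struct RelayConfig
{
    public object HostAllocation;
    public object JoinAllocation;
}

// Hypothetical container, playing the role of the SM's 'Vars' property.
public sealed class Variables
{
    readonly Dictionary<string, VariableBase> vars = new Dictionary<string, VariableBase>();

    public BoolVar DefineBool(string name, bool initialValue = false) =>
        Add(name, new BoolVar { Value = initialValue });

    // The type's name is the default key for Var<T>.
    public Var<T> DefineVar<T>(T initialValue = default, string name = null) =>
        Add(name ?? typeof(T).Name, new Var<T> { Value = initialValue });

    // String indexing works, but is prone to typos.
    public VariableBase this[string name] => vars[name];

    TVar Add<TVar>(string name, TVar variable) where TVar : VariableBase
    {
        vars.Add(name, variable);  // throws on a name clash
        return variable;           // prefer holding on to this instance
    }
}
```

Keeping the instance returned by `DefineBool("RelayInitOnce")` means a misspelled name becomes a compile error instead of a KeyNotFoundException at runtime.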

Conditions & Actions

The modularity of Conditions/Actions is part of what makes a Statemachine system so wonderful. The C/A code tends to be minimal, and on the logic level you’ll work with a relatable keyword like IsLocalServerStarted.

Conditions

IsLocalServerStarted for instance is a relatively complex condition. It subscribes to three events: start, stop, failure.

Since events will occur outside the Statemachine update, it merely updates its private bool field based on these events occurring:

Never ‘misses’ an event by storing the state in a bool.
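The pattern can be sketched as follows. FakeServerEvents is a made-up stand-in for the three events (start, stop, failure); the real Condition subscribes to Netcode's events instead, and the ICondition signature is an assumption.

```csharp
using System;

public interface ICondition { bool IsSatisfied(); }

// Hypothetical event source standing in for the networking layer.
public sealed class FakeServerEvents
{
    public event Action ServerStarted;
    public event Action ServerStopped;
    public event Action TransportFailed;

    public void RaiseStarted() => ServerStarted?.Invoke();
    public void RaiseStopped() => ServerStopped?.Invoke();
    public void RaiseTransportFailed() => TransportFailed?.Invoke();
}

public class IsLocalServerStarted : ICondition
{
    bool isStarted;

    public IsLocalServerStarted(FakeServerEvents events)
    {
        // Events fire at arbitrary times; we only remember their outcome.
        events.ServerStarted += () => isStarted = true;
        events.ServerStopped += () => isStarted = false;
        events.TransportFailed += () => isStarted = false;  // failure counts as "not started"
    }

    // Polled during the SM update, whenever that happens to run.
    public virtual bool IsSatisfied() => isStarted;
}
```

Because the bool is the Condition's memory, it does not matter whether the event fired a frame or a second before the next SM update.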

About transport failures: when one occurs, NetworkManager.Shutdown gets called, whether you want it or not!

So if you’re not handling transport failure events in your code, this won’t just end the session for the user but likely leave the app in an unusable state, still considering the player to be connected.

Handling Transport failures within this Condition as part of its state avoids having to specify extra “failure” Transitions in the SM.

Logical Operators For Conditions

Enclose Cs with FSM.OR to combine the contained Cs with logical OR, where the first C that’s true ends evaluation:

C: The T activates if client disconnected or stopped or the role is ‘None’.

The code for this is:

Code for the diagram above.

It may sometimes help to improve performance by merely reordering Cs for both AND and OR. The same is true for C# conditionals of course.

For AND, it’s best to check the C that’s “least likely to be true” first; for OR, the one that’s most likely to be true – unless perhaps if it’s heavy-weight. The C that’s consuming the most CPU cycles should be checked last.

I always cringe seeing the simplest bool (that’s almost never true) being the last one checked in a complex conditional. 🫤

You can do even fancier logical combinations – if absolutely necessary – by nesting two FSM.AND within a FSM.OR. You also have FSM.NAND, FSM.NOR and FSM.NOT available.
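One way such operators could be built is as small wrapper Conditions. The FSM.AND / OR / NOT / NAND / NOR names are from this article; their signatures, the LambdaCondition helper and the LINQ-based implementation are my assumptions.

```csharp
using System;
using System.Linq;

public interface ICondition { bool IsSatisfied(); }

// Hypothetical helper that wraps a delegate as a Condition.
public sealed class LambdaCondition : ICondition
{
    readonly Func<bool> predicate;
    public LambdaCondition(Func<bool> predicate) => this.predicate = predicate;
    public bool IsSatisfied() => predicate();
}

public static class FSM
{
    // All() stops at the first false, Any() stops at the first true.
    public static ICondition AND(params ICondition[] cs) =>
        new LambdaCondition(() => cs.All(c => c.IsSatisfied()));

    public static ICondition OR(params ICondition[] cs) =>
        new LambdaCondition(() => cs.Any(c => c.IsSatisfied()));

    public static ICondition NOT(ICondition c) =>
        new LambdaCondition(() => !c.IsSatisfied());

    public static ICondition NAND(params ICondition[] cs) => NOT(AND(cs));
    public static ICondition NOR(params ICondition[] cs) => NOT(OR(cs));
}
```

Since every operator returns an ICondition, nesting comes for free: `FSM.OR(FSM.AND(a, b), FSM.AND(c, d))` is just another Condition.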

To NOT or Not To NOT

While you can negate any C by using FSM.NOT(condition) I find that readability is improved when there’s a separate negated C.

The difference in readability to me is obvious:

  • FSM.NOT(IsSatisfied)
  • IsNotSatisfied

There are two ways to implement a negated condition:

  • subclass, override IsSatisfied, return negated result of the base class IsSatisfied
  • duplicate class, negate the conditional, best for one-liners: return !NetworkManager.Singleton.IsListening

Complex negated conditions should take the subclass approach. This is how IsLocalServerStopped does it:

For simple conditionals like IsNotListening, the negated C would be a separate class that does the same check, but negated:
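Both approaches, sketched side by side. The class names are from this article, but the bodies are simplified stand-ins: the base class here gets its state set by plain methods instead of events, and FakeNetwork replaces NetworkManager.Singleton.

```csharp
using System;

public interface ICondition { bool IsSatisfied(); }

// Simplified stand-in for the event-driven base Condition.
public class IsLocalServerStarted : ICondition
{
    bool isStarted;
    public void OnServerStarted() => isStarted = true;
    public void OnServerStopped() => isStarted = false;
    public virtual bool IsSatisfied() => isStarted;
}

// Approach 1: subclass and negate the base result. All the bookkeeping is inherited.
public sealed class IsLocalServerStopped : IsLocalServerStarted
{
    public override bool IsSatisfied() => !base.IsSatisfied();
}

// Hypothetical stand-in for NetworkManager.Singleton.IsListening.
public static class FakeNetwork
{
    public static bool IsListening;
}

// Approach 2: for one-liners, a separate class with the negated check.
public sealed class IsListening : ICondition
{
    public bool IsSatisfied() => FakeNetwork.IsListening;
}

public sealed class IsNotListening : ICondition
{
    public bool IsSatisfied() => !FakeNetwork.IsListening;
}
```

The subclass pays off as soon as the base Condition has state to maintain; duplicating a one-liner costs less than the indirection.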

Variable Conditions

Several Conditions exist for variables of type BoolVar, IntVar and FloatVar, as you’d expect:

  • IsEqual
  • IsNotEqual
  • IsGreater
  • IsGreaterOrEqual
  • IsLess
  • IsLessOrEqual

All of these accept a VariableBase as input and a value or another VariableBase as the compare value.

For BoolVar there are shorthand Conditions to avoid writing them verbosely as IsEqual(variable, true):

  • IsTrue
  • IsFalse

These only take a single VariableBase parameter as input.

As I said earlier, these classes are named as briefly as possible. Perhaps I will have to refactor them eventually towards IsVarEqual. For now there is little ambiguity.

Actions

Let’s have a look at an Action, like the generic LambdaAction:

LambdaAction invokes a System.Action (sorry for the confusion).

The LambdaAction is intended for quick prototyping and one-off actions, since it’s not reusable. It simply invokes the System.Action that was passed into its ctor.
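A minimal sketch of what such a class boils down to, assuming an IAction interface with a single Execute method:

```csharp
using System;

public interface IAction { void Execute(); }

// Wraps a System.Action so that it can be used wherever an IAction is expected.
public sealed class LambdaAction : IAction
{
    readonly Action callback;

    public LambdaAction(Action callback) =>
        this.callback = callback ?? throw new ArgumentNullException(nameof(callback));

    public void Execute() => callback();
}
```

For example, `new LambdaAction(() => Console.WriteLine("transition fired"))` is handy as a temporary trace while wiring up a new State.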

An Action’s Execute can run any code that “does something” – be it changing a variable value or calling a method. It should however only rarely use conditionals as the conditional logic should preferably be expressed through the SM itself.

Compound Conditions/Actions

A CompoundCondition or CompoundAction combines multiple C/As into a single, named condition or action. This is great for brevity, readability (code, logs, diagram), and to prevent copypasta of the same As. Remember: DRY!

For example, ResetNetcodeState seen earlier is a CompoundAction used in multiple places. This resets any variables that need resetting when we go offline, be it unexpectedly or deliberately:

It’s like defining a ResetNetcodeState() method.
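A sketch of the idea, assuming a CompoundAction simply runs its children in order. The constructor shape, the Name property and the two reset flags are hypothetical.

```csharp
using System;

public interface IAction { void Execute(); }

// Small helper so the example needs no further Action classes.
public sealed class LambdaAction : IAction
{
    readonly Action callback;
    public LambdaAction(Action callback) => this.callback = callback;
    public void Execute() => callback();
}

// Runs several Actions under one readable name.
public class CompoundAction : IAction
{
    readonly IAction[] actions;

    public string Name { get; }  // what logs and diagrams would show

    public CompoundAction(string name, params IAction[] actions)
    {
        Name = name;
        this.actions = actions;
    }

    public void Execute()
    {
        foreach (var action in actions)
            action.Execute();  // same order as passed in
    }
}

public static class NetcodeActions
{
    // Hypothetical flags that need resetting when going offline.
    public static bool RelayInitOnce = true;
    public static bool IsSignedIn = true;

    public static readonly IAction ResetNetcodeState = new CompoundAction("ResetNetcodeState",
        new LambdaAction(() => RelayInitOnce = false),
        new LambdaAction(() => IsSignedIn = false));
}
```

Every Transition that goes offline can then list the single ResetNetcodeState instead of repeating the individual reset Actions.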

Awaitable Actions

Now what about actions that need to be awaited? We obviously have to handle these when starting a Relay connection:


To prevent the Relay Alloc/Join Transition from running repeatedly while awaiting, the RelayInitOnce bool var is used to toggle the transition off after it has activated once.

On first T activation, both SignInAnonymously and RelayCreateOrJoinAllocation are awaited – one after the other, and both taking some time to complete.

The SignInAnonymously Action implements the IAsyncAction interface returning a Task:

The Transition knows it needs to await IAsyncAction instances:

The SM will receive multiple updates while an IAsyncAction is awaited. This has to be accounted for in the SM logic. Consider that the SM is polling every update: are we done yet? Are we done yet?
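The "are we done yet?" polling can be sketched like this. SignInAnonymously here is a fake that only waits 50 ms and makes no service call, RelayStartingSketch is a made-up class that condenses one State into a single Update method, and ExecuteAsync is an assumed method name.

```csharp
using System;
using System.Threading.Tasks;

public interface IAsyncAction { Task ExecuteAsync(); }

// Fake sign-in: no real service call, just a delay standing in for network latency.
public sealed class SignInAnonymously : IAsyncAction
{
    volatile bool isSignedIn;  // may be written from another thread than the one polling
    public bool IsSignedIn => isSignedIn;

    public async Task ExecuteAsync()
    {
        await Task.Delay(50);
        isSignedIn = true;
    }
}

public sealed class RelayStartingSketch
{
    readonly SignInAnonymously signIn = new SignInAnonymously();
    bool relayInitOnce;  // plays the role of the RelayInitOnce bool var
    Task pending;        // kept so that a failure can be observed later

    public int Polls { get; private set; }
    public bool HasFailed => pending != null && pending.IsFaulted;

    // Called on every SM update; returns true once the follow-up transition may fire.
    public bool Update()
    {
        Polls++;
        if (!relayInitOnce)
        {
            relayInitOnce = true;             // guard: start the async work only once
            pending = signIn.ExecuteAsync();  // started here, but not blocked on
        }
        return signIn.IsSignedIn;             // cheap check, fine to repeat every update
    }
}
```

Update returns false on the first call and on every call until the delay has passed, which mirrors the SM being updated several times while the IAsyncAction is still in flight.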

Looking at the entire state, you’ll notice that ‘Relay Started’ has corresponding Conditions:

Waiting until signed in and ‘Relay Ready’.

Here, since we’re merely checking the IsSignedIn bool and IsRelayReady also only checks if the HostAllocation or JoinAllocation field in the RelayConfig struct is non-null, checking those repeatedly is not a performance concern:

Relay is considered ‘ready’ when either host or join allocation exists.

I suppose all IAsyncActions can be checked for completion with a similarly simple conditional. Thus there is currently no need for IAsyncAction completion events.

Any failure during Init, SignIn or Relay allocation triggers the error handling part of the Transition.

Variable Actions

Of course we’ll need to modify variables. There exists a set of Actions that perform the usual mathematical operations:

  • SetValue
  • AddValue
  • SubValue
  • MulValue
  • DivValue
  • IncValue
  • DecValue

All of these take a variable as input and another variable or a value as the operand.

And once more, shorthands for BoolVar specifically:

  • SetTrue
  • SetFalse

These only take a BoolVar as input.

Summary

That’s all you need to know about the Statemachine implementation I wrote. The SM code is here (GPL3 License).

I hope this wasn’t too distracting but I plan on using the SM system for many other things. Consider player authentication for instance:

Typical first-time user experience (FTUE). Source: Auth. Best Practices

You can see right away how that is best handled with a Statemachine. And that graph isn’t even including error handling, token expiration, unlinking an ID provider or account deletion.

Next …

Continue reading with 5. Netcode With Relay

Return to the Write Better Netcode Overview

Source Code on GitHub (GPL3 License)
