In Week 11 of the SPL, we experienced broadcast-breaking issues during the Obey Alliance vs Luminosity match. Persistent technical problems forced us to take the broadcast offline and complete the set off the air. This report will detail what happened with this set, what tech issues led to the broadcast coming down, and what steps the Hi-Rez team is taking to ensure that a similar situation doesn’t happen again.
What Happened With The Broadcast?
Game 1 of LG vs Obey went off without a hitch. With LG up one game in the set, we rolled straight into Game 2. Eleven minutes into this game, spectator crashed into stasis mode. After a brief pause, we resorted to our “failsafe” for spectating matches and began broadcasting the game from the third-person player POV. We were able to keep this up until about 21 minutes into Game 2, when Joshy’s client crashed and he was unable to rejoin the match.
In instances of a complete game crash where a player is unable to rejoin, our league operations team can take two courses of action — either remake the game or automatically grant the victory to the team that was winning at the time of the crash. In these cases, ALL of the following win conditions must be met in order for a team to be given an automatic victory:
- 20 minutes or more into the game
- At least a 15,000 gold lead
- All towers down on the opponent’s end
- At least 2 phoenixes down on the opponent’s end
- Friendly team must have at least 1 tower up in all lanes
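Since every one of these conditions must hold, the rule behaves like a logical AND over all five checks. A minimal sketch in Python (the field names are hypothetical and illustrative, not Hi-Rez's actual tooling):

```python
def auto_victory_eligible(state):
    """Return True only if EVERY win condition is met.

    `state` is a hypothetical snapshot of the leading team's game
    at the moment of the crash; all field names are invented here.
    """
    return all([
        state["minutes_elapsed"] >= 20,        # 20 minutes or more into the game
        state["gold_lead"] >= 15000,           # at least a 15,000 gold lead
        state["enemy_towers_standing"] == 0,   # all opponent towers down
        state["enemy_phoenixes_down"] >= 2,    # at least 2 opponent phoenixes down
        all(count >= 1                         # at least 1 friendly tower per lane
            for count in state["friendly_towers_per_lane"].values()),
    ])
```

Because the conditions are ANDed together, missing even one of them (as LG did here) means no automatic victory and the game must be remade.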
LG did not meet these win conditions at the time of the crash, so our league operations team asked the teams to remake the game entirely with the same picks and bans. However, we began experiencing persistent issues loading into custom matches while trying to recreate this game. We made multiple attempts to get all players loaded in, but each time we created a custom lobby, at least one player would crash out and be unable to rejoin. This caused a significant delay on broadcast.
After several failed attempts and a lot of troubleshooting with the help of our game ops team, we were finally able to get all players loaded into the game and spectate the match. But three minutes into the remade game, we experienced another spectator crash, followed by another game crash after which players were unable to rejoin the match. By this time the broadcast had been delayed by almost an hour, and because we were unable to immediately identify the source of the persistent issues, we were unsure how long they could take to fix. Because of this, we chose to take the broadcast offline rather than have viewers sit on a wait screen with no updates.
After going offline, we determined that our only options were to either move players to a backup server and try to finish the set, or to reschedule the match entirely. We also determined that if we did move players to our backup server, we would not try to broadcast the rest of the set for two reasons:
- We were unsure whether or not the persistent issues were in any way related to the fact that games were being broadcast.
- We did not want to force our audience to sit through a stream that could continue to experience game-breaking issues and broadcast delays.
LG and Obey were presented with these options, and the teams unanimously decided that they would rather continue their set off-air via backup servers than try to reschedule the match. So we moved them to a secondary server, remade Game 2 with the same picks and bans, and continued the set. Games 2, 3, and 4 were played all the way through with no further issues, and LG took a 3-1 set victory over Obey Alliance. While we were unable to broadcast any footage from the rest of the games in that set, we did release a play-by-play breakdown of each game, including picks/bans and post-game stat screens.
What Caused These Issues?
During our weekly server/spectate tests, the league operations team didn’t experience any of the technical issues that we saw later in the week — nor did we experience them during the matches on the Thursday prior to this incident. So everything seemed to be working fine, and there was no indication that we were headed towards a broadcast-breaking technical error.
Let’s get the obvious culprit out of the way. Unlike other delays/remakes we’ve had in the past, the issues we experienced during the LG vs Obey set were not related to spectator. While spectator did crash as a result of these issues, spectator itself was not the source of the problem. In this case, we were experiencing server issues that led to crashes with the Instance Manager, which affects everything from the game client down to the spectator client.
According to our game operations team, the Instance Manager on our primary server began crashing due to an OS failure: the operating system locked up and became completely unresponsive. This hang was caused by the sheer number of game files housed on the system — while the drive itself had plenty of free disk space, the volume of files alone was enough to freeze the OS.
At first glance this might sound like a simple oversight in server maintenance, but we’d like to break down how files are generated on this system so our audience can better understand how something like this might happen.
We have very specific configurations for spectating on our LAN servers. The files created when an instance is spectated are housed in their own directory so that there's as little lag as possible and our spectator response time stays as fast as possible. Depending on match length, we might generate upwards of 250 such files for a single Match ID that's being spectated. After a while, that starts to add up to an unwieldy number of files.
While our operations team had implemented a clean-up script to help keep disk space free, it didn’t affect the folder that housed those specific game files. So in spite of consistent server maintenance, that dedicated directory was packed with game files dating all the way back to the server’s creation — circa mid-2018. As you might have guessed, that’s a ton of files.
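To put that growth in perspective, here is a rough back-of-envelope estimate. The files-per-match figure comes from above; the match volume and timespan are assumed round numbers, not figures from Hi-Rez's logs:

```python
# Back-of-envelope estimate of spectate-file accumulation.
# Only FILES_PER_SPECTATED_MATCH comes from the report ("upwards of 250");
# the other two figures are hypothetical assumptions for illustration.
FILES_PER_SPECTATED_MATCH = 250
MATCHES_PER_WEEK = 10          # assumed broadcast load
WEEKS_SINCE_MID_2018 = 52      # roughly a year of server uptime

total_files = FILES_PER_SPECTATED_MATCH * MATCHES_PER_WEEK * WEEKS_SINCE_MID_2018
print(total_files)  # 130000
```

Even under these conservative assumptions, the dedicated directory accumulates files on the order of a hundred thousand — more than enough, evidently, to hang an OS regardless of free disk space.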
Because we had a clean-up script in place to keep the servers properly maintained, we never really had to worry about disk space. But unfortunately, we did not have an accompanying script to clear out that specific directory and prevent this type of OS hang. Why not? The answer is simple: in the 6+ years that we've been running SMITE esports broadcasts, we had never experienced this type of hang before.
With this being the first year where all pro league games are spectated on the same LAN server, it would make sense that this problem is just now cropping up instead of being noticed in prior seasons. This also explains why moving to a backup server immediately resolved the issue. Because it’s a backup server and we don’t spectate nearly as many matches on it (so there aren’t nearly as many game files), the OS was running smoothly and the Instance Manager was working as intended.
After clearing all the old game files off the primary server, the OS hangup was immediately resolved and the Instance Manager stabilized. We were able to load into custom matches and spectate them without any further issue. It’s unfortunate that we were unable to implement this fix in time to salvage the broadcast, but the troubleshooting process that led us to this solution took several hours.
How Are We Fixing It?
Fortunately, the solution to this problem is simple. Our operations team has implemented a script that will better maintain the directory where we keep game files generated by the Instance Manager. This script will be run regularly in order to clear out old match files at more consistent intervals. If our repeated server tests are any indication, this solution should prevent anything like this from happening in future matches — and now our team is aware that we need to keep a closer eye out for this type of OS failure to make sure that our LAN servers are performing reliably.
If by some slim chance this does happen again during future broadcasts, we now know that we can switch players to a backup server and spectate their games there to keep the stream going.
Okay, That’s Great. But…Spectator is Still a Problem.
While the issues with the LG vs Obey set were in no way related to spectator, we’re aware that spectator is another persistent issue that our development team needs to address. To this end, we’d like to share a little bit of extra information regarding the work that’s already been done to spectator this year, and the work we’re currently undertaking to improve spectator for the future.
Although the repeated spectator issues this year might make it seem like we don’t devote resources to fixing the client, we’d like to emphasize that following every spectator crash this season, we’ve isolated and fixed the bugs that have caused those specific crashes — and issues related to those specific bugs have not happened again once a permanent fix was implemented. A few examples include:
- The Morrigan bug. This issue took several attempts for us to fix permanently, but after several weeks of test fixes we were finally able to resolve the bug where Morrigan could crash spectator if games went over 50 minutes. (Reported in 2018, resolved in Spring 2019.)
- Gold Fury and Fire Giant UI elements not displaying properly. This issue was fixed within 10 days of being reported. (Reported in late April 2019, fixed by May 2019.)
- Objective secure messages displaying multiple times when a team secures an objective. This issue was fixed just over 10 days after being reported. (Reported in mid-March 2019, fixed late March 2019.)
These are just a handful of the larger spectator bugs the team has resolved. Just this year, we’ve reported and fixed over 50 issues with spectator. Unfortunately new issues can and do arise, but we are committed to addressing each of those issues as they come up. Our goal is to never experience the same issue twice, and our dev team works diligently to squash new bugs in hopes of minimizing their impacts on broadcast.
The Good News: Changes & Additions Are Coming to Spectator!
With that said, we’d also like to take this opportunity to let fans know that a small subsection of our development team is taking on a “Live Spectator” project that aims to rework and improve the spectator client for our esports broadcasts — not so much the UI and controls, but the core tech behind it.
Our current form of spectator uses a file system, where the spectator receives a single file that is constantly updated with the game state. The new “Live Spectator” mode instead connects the spectator to the game directly, as though they were another player, allowing them to spectate all game states in real time. The real-time nature of this tech has a few trade-offs that will limit its applications. Since it’s a real-time mode and does not save the match to a file, the rewind feature won’t work — but Live Spectator can run alongside traditional spectator so the match is still saved and can be rewatched later.
Given the real-time nature of this reworked spectator mode, we can never release it to general players for normal or ranked matches. No delay can be implemented, and therefore it could easily be used to cheat if we made it publicly available. It’s possible that we could implement it for custom matches, because then all participants could “opt-in” to a live spectate mode, but we do not want to commit to any specific release plan for Live Spectator at this time.
Initially, we will only be using Live Spectator internally and on specific accounts. SPL coaches will have access to the feature, but only in custom games. This should hopefully eliminate stasis mode for both our coaches and our production team, and ultimately decrease any chances of spectator crashes or server issues during our esports broadcasts.
Although this project is still very much underway, we wanted to share this much information to let our community know that spectator is still very much a priority for us. We believe that this Live Spectator initiative is our best chance of mitigating the persistent issues we’ve seen with spectator over the years, and we look forward to updating our community with more information when we’re ready to share it.
Once again, we’d like to apologize to our community for the technical difficulties that affected the LG vs Obey set. We’d also like to apologize to the pro players for any impact these issues had on the competitive integrity of their game, and we appreciate their patience and willingness to work with us as we tried to find solutions.
Moving forward, we hope that the fixes we’ve implemented will keep our LAN servers happy and healthy — and just to be safe, we now have many more eyes checking in to make sure those servers stay well-maintained and our broadcasts run smoothly.
We appreciate our community’s patience as we resolve this issue, and as always we will strive to be better and bring our fans the amazing content they deserve. If anyone has further questions about the information in this report or other outstanding concerns regarding last week’s set, please reach out to our esports community manager, TitanAuvey (Twitter: @auverin) to connect with us directly.