
DD bug-hunting help needed.

Ok, so we're working with Unity right now to see if we can figure out the root cause of the bugs that have been plaguing Desktop Dungeons for the last couple of months.

Symptoms include:
-Missing textures
-Illegal operation crashes preventing the plugin from starting

This seems to happen across most browsers right now, although it started in Chrome and still remains pretty bad there.

What you can do: Play the game and, if you get either of those two bugs, post your logs in this thread on the QCF forums (don't worry, I explain how to get your logs in that thread).

I could also do with some advice and/or ideas on what I might be able to do to solve the problem. I'm tired of going crazy here... There are some things that it probably isn't though:

Straight up browser running out of memory issues: No. Chrome runs out of memory on texture 167 on my laptop with only 1 tab running. It did not do this 2 months ago. Firefox on the same machine runs the game perfectly fine with my usual 20+ tabs open. If there's a memory restriction, then it's got to be a sneakier issue, like the game only requesting memory per scene and our scenes being too small to give the plugin container enough memory to use (due to the game dynamically loading all textures and creating quads on the fly).

Changes to our rendering/material code: This is exactly the same code that was running fine for all our users 3 months ago. About the only thing that's happened is memory load has come down as I plugged memory leaks around textures that were being loaded multiple times. I have no idea how that could trigger out of memory issues...

So yeah, ideas and thoughts welcome. Or, y'know, just post error logs here and I'll copy-pasta them across to Unity support. If anyone needs a DD beta login (ie: the game doesn't crash on load for you), PM me and I'll sort you out.

Comments

  • How reliably can you reproduce the bugs? Memory issues are notorious for being difficult to predict, but I wager that if it has been around for long enough, you might have found a reliable way.

    It is strange that there is a difference between Firefox and Chrome, as I would assume that the core part of the plugin is identical, and it is merely the interface that is slightly different. Does Unity have plugins for other browsers?

    Unity also exports to a standalone binary format, correct? Does this have the same bug?

    One other thing I just thought of is: I would guess Unity uses some sort of garbage collection, so is it possible to explicitly do GC at some regular interval and see if that helps?

    I tried looking at the logs, but I am not very familiar with Unity and they don't mean very much to me. However, I quite like fixing difficult bugs and I can confidently say I haven't found a bug that I couldn't solve in my 10+ years of programming (I could have a small memory lapse here), so if you require some external help, I would be interested.
  • Not sure I understand exactly the symptoms.

    Does it break on Chrome only? Or on Firefox, IE and Chrome?

    Does a standalone build show memory leaks? It would be easier to monitor the footprint without the browser bumph around it.
  • The standalone has none of these issues. This is strictly a web plugin problem. Sorry, I should have made that clear earlier. The standalone used to have a memory leak caused by Unity duplicating materials as a side effect of using certain accessors, but that has since been fixed and the standalone's memory usage is rock solid - it comes down as the game is played and monsters/covering meshes are removed.

    I can reproduce the failed texture load bug on my laptop in Chrome, but not Firefox. I tend not to consider IE a real browser... I can't reproduce the texture bug on my desktop, nor can I reproduce the illegal operation crash on either machine (barring running the game in a browser while also running the Unity editor, which is a known issue anyway and probably a different crash). Users seem to be split into two groups: Those that can reproduce their particular problem reliably and those that can still sometimes play the game.

    Unity has plugins for FF, Chrome, IE and Opera. We've had users complain that it happens in all of them, other users complain that it only breaks in a subset. There seem to be no hardware similarities across the symptoms reported :(

    Nope, it's not the GC. GC calls aren't excessive, meaning the app isn't straining for memory. Even when I've bound a GC purge to a keypress and mashed that continuously, I still get the same problem. When profiling the game in the editor, it doesn't have any large memory spikes or issues around loading textures...
  • A few more questions

    Have you considered any security elements that might be the cause? There are a few differences that the Unity team highlight between the web player and the standalone, the two main ones being the capped frame rate when not in full screen and the security sandbox. I haven't found a specific side-by-side list of the differences, mind you.

    Do you load resource bundles from the web or are they all included at run time?

    From the conversation we had in Cape Town I understand you build all the geometry manually in your code. Does this include the textures too, or are they already assets that are accessed?

    Sorry to ask so many questions; it's pretty tough to debug without that information.

    I ran DD in firefox with no issues. Will give Chrome a try today.
  • Questions are cool, at the very least they help me frame my thinking around the problem. Plus, even if some of this is stuff I've gone over before, it never hurts to take a second look at something I may have dismissed too fast the first time around ;)

    All our textures are already pre-packaged at compile time. We don't stream anything, despite players asking us to in order to make the game load quicker! The framerate thing shouldn't make too much difference. The only real issue I can see there is the game somehow getting too fast after the optimising I've been doing recently and maybe some internal Unity threading issues suddenly surfacing, although that seems unlikely, especially given how people are reporting the errors when not in fullscreen (at which point the framerate is limited anyway).

    The thing that really gets to me is that this isn't an issue that's cropped up due to something we changed: The game was working fine, all we were doing was adding content and changing menus, removing temporary filler resources as we did so. These symptoms have appeared over the course of browser and plugin updates, not major DD code changes... We went from stable to unstable with no alterations to the parts of the game that started breaking.
  • Glad the questions are not annoying :)

    These types of intermittent issues are always a pain to solve. Have you tried isolating elements of the game in a test? Delete whole sections until you get down to nothing and see if you can find the aspect of the engine responsible. It might not be something you can solve, but perhaps it could be something you could work around: a system you avoid using.

    Other than just being curious and wanting to assist, I really would like to know what the root cause is, since we're betting hard on Unity for a few of our upcoming projects (mostly for the cross-platform aspect, but not wholly) and as such definitely want to know if there are some pitfalls.

    You said that in some cases the game crashed on startup? That situation, if repeatable, would be an easy place to try to isolate the issue: just start up less and less until it disappears. Trial-and-error debugging sucks, and of course it doesn't help if the issue is sporadic and tied to some sort of callback from the OS you don't control, so the results might seem random.

    When I first looked at the logs I thought it might be texture RAM that was the issue, but I see now that one of the logs simply reported running out of RAM loading a string resource. I might be stretching here, but do you have any resource pre-building threads? If so, are you sure they could not, under any circumstance, run away and build too many resources? One thing I noticed in a lot of the logs was that the BoxCollider could not be built as such an Object already exists. Is that something you expect? Like I said, bit of a stretch, but if there was some kind of exit semaphore that was missed when the wrong thread finished too soon, it could trigger a runaway loop.

    I still need to test it on Chrome, hope to have a little more insight then.
  • Your suspicion about threading issues could be quite valid. It is possible that some part of Unity is not thread-safe, but you assume it is. I assume you have multiple threads, otherwise you probably wouldn't raise this. So have you tried to put some concurrency primitives in to limit concurrent system calls, especially those related to memory allocation?
  • I've been doing some testing. In the latest build (Beta 083), I consistently get the "content error" in Chrome and IE9 when the loading bar reaches 100%, but before the Desktop Dungeons login screen appears. Firefox loads the login screen fine, though.

    Using this tool (http://jsfiddle.net/BinaryCaveman/G5pv6/embedded/result/), I've gone through several previous versions of the beta (found here http://www.desktopdungeons.net/Game) in Chrome:

    Beta 072 (07 Sep) - Shows login
    Beta 073 (14 Sep) - Shows login
    Beta 074 (21 Sep) - Shows login
    Beta 075 (28 Sep) - Shows login
    Beta 076 (05 Oct) - Shows login
    Beta 077 (12 Oct) - Shows login
    Beta 078 (19 Oct) - Shows login
    Beta 079 (26 Oct) - Shows login
    Beta 079a (31 Oct) - Content error <-- Problem starts
    Beta 079b (01 Nov) - Content error
    Beta 080 (02 Nov) - Content error
    Beta 081a (09 Nov) - Content error
    Beta 081b (09 Nov) - Content error
    Beta 082 (16 Nov) - Content error
    Beta 083 (23 Nov) - Content error

    It looks like the change that is causing the problem happened between Beta 079 and Beta 079a. I suggest comparing these two versions to see what changes were made in 079a.
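
The manual sweep above could also be done as a binary search, which matters more when there are dozens of builds to check. A minimal Python sketch, assuming the builds are ordered oldest to newest and that every build after the first broken one is also broken (which holds for the results listed above). The names `first_failing`, `BUILDS`, `BROKEN` and `fails` are made up for illustration; in practice `fails` would be a person loading that build in Chrome and reporting whether the content error appears.

```python
from typing import Callable, Optional, Sequence


def first_failing(builds: Sequence[str],
                  fails: Callable[[str], bool]) -> Optional[str]:
    """Return the earliest build for which fails() is true, or None.

    Assumes builds are ordered oldest to newest and that once a build
    fails, every later build fails too.
    """
    lo, hi = 0, len(builds)
    while lo < hi:
        mid = (lo + hi) // 2
        if fails(builds[mid]):
            hi = mid        # first failure is at mid or earlier
        else:
            lo = mid + 1    # first failure is after mid
    return builds[lo] if lo < len(builds) else None


# Results copied from the manual Chrome test in this post.
BUILDS = ["072", "073", "074", "075", "076", "077", "078", "079",
          "079a", "079b", "080", "081a", "081b", "082", "083"]
BROKEN = {"079a", "079b", "080", "081a", "081b", "082", "083"}

checked = []  # builds that actually had to be loaded


def fails(build: str) -> bool:
    checked.append(build)
    return build in BROKEN


result = first_failing(BUILDS, fails)
print(result, checked)
```

With the results from this post, the search lands on Beta 079a after loading four builds instead of fifteen.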

    I've attached my Chrome log file.
    ddlog_chrome.txt (3K)
  • Okay, update on this:

    I've been seriously bug-hunting, getting into numbered printf mode eventually. Give me breakpoints, code-stepping and memory inspection instead pls :(

    So, I discovered that all our graphical errors actually had nothing to do with graphics. The TL;DR version? Unity's Web Player has some sort of maximum Mono collection size issue that breaks Unity's communication with a lot of different things. It's not like the heap gets too big, because splitting the same data structure into two things neatly excised the problem (well, I'm assuming it only does this on most machines). So yeah, be aware that a cause of illegal operation errors in the Web Player can be structures simply getting too large.

    More in-depth explanation:

    DD has a lot of text. Seriously, 3176 separate localized strings (lots of those are multi-line monstrosities, full of jokes). Not counting names of characters or shops. This has grown over the course of the game as we added more quests, more items and more rewards. I am actually afraid to run this through a word-count because it might convince @Nandrew that he's written a novel...

    All these strings need to be localised. Eventually. So we loaded them into a dictionary and organised them via keys: A game object might need to know what it's called as far as the player is concerned, so it asks the TextManager for the string and font associated with the particular key that it knows to ask for. This is pretty standard stuff. It works great too, provided you don't run out of heap memory in a sandbox environment... The standalone exe can simply shunt memory around and we don't actually give a crap how large that Dictionary eventually gets. For some reason (I assume it has to do with paging concerns) the web player's heap management is less awesome and different machines have different thresholds at which their heaps fall over.

    Except it doesn't properly fall over, nor tell you that this is happening. It just fails to allow other things to allocate memory as needed, which is what happened with our missing textures: Unity goes "Hey, I need a texture", creates memory for that, the heap gets stressed out, Unity runs a GC and life is ok. But then it asks D3D for the data to load into that texture, and D3D doesn't have access to the GC, so it goes "Hey, there's no room in this heap for me to do this kind of memory access" (because you need chunks of memory available for D3D to read from disk into, before it smacks stuff into the texture waiting in Unity-space) and promptly falls over. So you get things that look and behave like the texture you asked for in Unity, all the right size and stuff, but they're missing the actual texture data because D3D didn't have space to store that temporarily.

    The vertex buffer thing seems to be a related issue with render to texture setup and a stressed-out heap: Unity apparently builds a plane for the texture anyway, which then somehow gets converted into a vertex buffer of exactly 1GB in size by D3D when it's stressed out. I'm less sure of the actual mechanics there, but it's definitely RTT-related.

    So yeah. Strings will kill your game.

    The solution so far seems to be splitting our enormous Dictionary into smaller chunks. Right now I've got a rather derp solution that means that some key lookups will incur multiple "misses" across the stitched-together Dictionaries before returning a result, but it's not the end of the world. I initially worried that this would simply postpone the problem again and move it to really low-end machines that don't have space for even 1 Dictionary with 1000 entries, so I'll probably re-work it into a proper solution next week. Ironically enough, trying to make the World's Largest Switch Statement™ actually made the issue recur (and made me feel like a horrible programmer), which is what makes me think that it's actually heap management/pagination rather than sheer size.
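
For what it's worth, one shape the "proper solution" could take is to hash each key to pick its chunk, so a lookup goes straight to one small dictionary and never misses across the others. This is only a sketch in Python rather than Unity's C#, and not the code DD actually uses: `ShardedStrings`, `shard_count` and the `quest_N` key format are hypothetical, and whether chunking by hash sidesteps the Web Player issue on every machine is an assumption that would need testing.

```python
import zlib
from typing import Dict, List, Optional


class ShardedStrings:
    """String table split across several small dicts instead of one big one."""

    def __init__(self, shard_count: int = 8) -> None:
        self._shards: List[Dict[str, str]] = [{} for _ in range(shard_count)]

    def _shard_for(self, key: str) -> Dict[str, str]:
        # crc32 is stable across runs, unlike Python's built-in hash() for str,
        # so a given key always maps to the same shard.
        index = zlib.crc32(key.encode("utf-8")) % len(self._shards)
        return self._shards[index]

    def add(self, key: str, text: str) -> None:
        self._shard_for(key)[key] = text

    def get(self, key: str) -> Optional[str]:
        # Exactly one shard is consulted; there are no cross-shard misses.
        return self._shard_for(key).get(key)

    def shard_sizes(self) -> List[int]:
        return [len(shard) for shard in self._shards]


table = ShardedStrings(shard_count=8)
for i in range(3176):  # same count as the localized strings mentioned above
    table.add(f"quest_{i}", f"text for quest {i}")

print(table.get("quest_42"))
print(table.shard_sizes())
```

The trade-off is that the shard count is fixed up front; changing it later means re-inserting every string.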
  • <whistles> Wow, that is one obscure bug! Congratulations on finding it.
  • "because it might convince @Nandrew that he's written a novel..."

    Neato, you could sell DD "the extended version" on Steam with game art and a joke book. Hehe.
  • @Dislekcia Wow. Well done facing off with that truly terrifying bug and coming out on top.
  • @BlackShipsFilltheSky: I reckon that the crash problems you were having with Broforce might have been a similar sort of problem - It could simply be that a collection has to be a crapload bigger before it hits the same problems in a standalone exe.

    I mean, 3000 strings aren't that huge :( We use up more texture memory.
  • @Dislekcia Yeah... thanks for the tip... I'll look into it... Our problem was more like an infinite loop than a crash (it freezes but the music plays)... but I'll see if there are any big data structures that might get unstable or something.
  • @BlackShipsFilltheSky: Have you got logs from when the crash occurs? Sometimes they're rather useless, but if you have general progression debug messages, you can use them to find out where a crash is happening at least.
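
A rough idea of what such progression messages can look like, as a Python sketch rather than Unity code. `Checkpoints`, `mark` and `last_checkpoint` are hypothetical names. The only real point is that each message is numbered and flushed immediately, so the last line that made it into the log brackets where the crash or freeze happened.

```python
import io
import itertools
from typing import Optional, TextIO


class Checkpoints:
    """Writes numbered progress lines and flushes after every one."""

    def __init__(self, stream: TextIO) -> None:
        self._stream = stream
        self._counter = itertools.count(1)

    def mark(self, label: str) -> int:
        number = next(self._counter)
        self._stream.write(f"[{number:04d}] {label}\n")
        self._stream.flush()  # make sure the line survives a hard crash
        return number


def last_checkpoint(log_text: str) -> Optional[str]:
    """Return the last checkpoint line in a log, or None if there is none."""
    lines = [line for line in log_text.splitlines() if line.startswith("[")]
    return lines[-1] if lines else None


log = io.StringIO()  # stands in for a real log file
cp = Checkpoints(log)
cp.mark("strings loaded")
cp.mark("textures requested")
# If the game died here, "textures requested" would be the last line logged.
print(last_checkpoint(log.getvalue()))
```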
  • @dislekcia hey awesome! Well done. Would love a programmer-oriented talk at some point about memory concepts like the heap and how it affects games. Would help all the people who have vaguely mystical ideas about how memory really works.

    Also, isn't breakpoint debugging possible with MonoDevelop and Unity? Not sure if you were missing that. Though I agree that great memory inspection tools are a bit of a silly thing to be missing from Unity.

  • Glad to hear the bug was found and resolved. At least it wasn't one of those bugs that disappears when you slow the program down with printf calls - those are "fun" to find.
  • @Dislekcia We've never been able to reproduce our freeze bug... so no logs (really sad face).

    @TheFuntastic I suspect that because the Desktop Dungeons bug only occurred in the web player, QCF had to resort to lots of printf (instead of breakpointing etc).
  • @TheFuntastic: I couldn't get Monodevelop debugging to work with the WebPlayer. Is that actually a thing or was I just being daft? I couldn't get the profiler to hook up to the WebPlayer either though: The illegal operation error would nuke the plugin container before the profiler got any frame data, so it was pretty useless :(

    Also, the debugging stuff is the only reason I'd try to use Monodevelop... I'm, uh, not a fan.
  • @dislekcia zing! lightbulb at the back of my brain goes off. Remember that profiling wasn't possible. I've also had to use printf statements before. Not fun.
    This page, though, seems to imply it is possible: http://docs.unity3d.com/Documentation/Manual/Debugger.html. It could still be difficult to attach the debugger, though, if it's happening on startup.

    Also, the feature list of Unity 4 includes "Remote Unity Web Player debugging", so I would say it's definitely possible in the new version. Might be worth a trial license install to see if things are better in 4.