Tuesday, June 23, 2009

Again fixes and a benchmark (part II)

I had focused this day for some of last remaining NewShapeModifiers issues like zero length input. But most of the day I tried to what is the minimalist part installing. This means installing Windows in VM (thanks VirtualBox!) but still I am not find either which are the default VC Runtime 2008 SP1 dlls and how to enable them. Was a frustrating experience regarding installers and the fact that Windows does not have either bundles (as OS X does) or a package manager to say that NaroCAD need this, this and that, and if you don't have installed, the OS will do this job for you. But I still searching.

Do you remember this benchmark? This shows the scalability of NaroCAD engine. And the main limitation seem to be Undo on high level count shapes.

Entering a bit under the hood it was very easy to see why it was such an amount time (and was growing) doing Undo/Redo diff.

The algorithm of doing Undo was this: at every Document.Transact() call, there was a call to make a copy snapshot of the entire tree. This is time consuming, at least on high count of shapes. But this is only the first part of the story. At the Document.Commit call, there is another entire snapshot of the entire tree. At the end, those two trees are compared branch by branch and leaf by leaf to make two diffs: Undo diff and Redo diffs.

This approach is for sure safe, testable, the tree can be serialized to Xml (so this is the format that is saved in xml files, is the entire scene saved) but for sure is slow and it does a lot of computations that are not needed. Also, working on big trees, will create a lot of small objects on heap, and mostly strings that fragment deeply the heap.

The new algorithm right now was started from the day I've seen this bottleneck and was almost finished in weekend, only today I had did test if all tests are passing and do following optimizations:
- AutoTransact is Enable by default (means almost never you will serialize your entire tree)
- modified nodes are marked as changed, so the computation is not on entire tree, but on nodes that are changed. This will make a small overhead on access but it will pay off on undo in it's worst case.

Let's see the results:


As it seems, the increase in time for marking is under 10% for high count shapes so most users will not notice. The decrease in undo time means that you can work adding shape by shape.
Also, as I looked more in details, the Undo time include the UpdateUI code, which is constant (like updating the tree), and removing them from counting will mean for example that for adding 10K shapes, Undo only time (using the new algorithm is aproximatively 800ms, means that for 10k shapes the new algorithm is aproximatively 2 times faster, and the difference increase as much as the shape count increase. Adding at 10 k shapes three shapes, Undo takes with the new algorithm only 15 ms (excluding updating the UI).

What seems to be still the bottlenecks?
UI itself, means that if I will hide the TreeView, to not update it and to not generate the entire tree, excluding the first time is used.
OpenCascade is still the main issue, OpenCascade seems to be inoperable on high count of small shapes. Probably some efforts will improve it. OpenCascade seems to use glVertex/etc. in it's code and does not rely on VertexBuffer Objects (known as VBO) so it will be till the future refactor CPU bound, as glVertex & co OpenGL functions need a lot of driver work (as need to do transformations on CPU or at least to push them explicitly to graphic card frame by frame).

Doing testing I solved a crash on Document.Revert case, that was pretty hard to notice (if Revert was done in an inconsistent scene state, the Redo to it will make scene crash as was inconsistent, so was not either a Revert problem or a restore problem, but was because Restore keeps in redo list the last state that was inconsistent. The Revert right now do not save this in the list).

No comments: