Thursday, September 3, 2009

Integer comparisons faster than string comparison in .NET

NaroCAD is as a high level diagram a high level CAD/CAM like framework on top of OpenCascade. Naro can stand independently of OpenCascade, it can have by design parametric modeling, a tree XML like document. Also all changes of what you see in NaroCAD are stored in a persistent way with undo/redo events.

This abstraction gives a lot of advantages mostly when you debug and when you work with your classes as you can stop your program, save all your custom attributes to NaroCAD tree (and can be stored in a file). They are also exposed as generic classes, meaning that you will mostly get a really clean as design class like.

As we've got Lua added to Naro we did found much easier the limitations of Naro. One of them is that may be slow (we did previous tests on Naro to fix the Undo/Redo times) mostly because the attribute like convention is string based, also the Undo/Redo have a big ram increase as it store everything.

Once we had a benchmark test and we found that a lot of time is spent in OpenCascade and Undo time. But as there are lies, lies and benchmarks, I've tried to optimize the performance on NaroCAD regardless of other components. As there is a specific blog just presenting OpenCascade performance, and NaroCAD is a framework on top of OpenCascade, we tried to optimize performance on NaroCAD framework level.

For this we used a Lua script to test how it behave Naro on performance level:
i = 0
while(i < 40000) do
line(0, 0, 0, 100, 100, 100)
i = i + 1
end

First, some persons may ask: why you draw 40000 lines? Why not 40000 circles? The answer was the problem we wanted to check: is Naro framework enough fast for most users? Is .NET platform not only easy to use, but fast for common operations? Also, we did wanted to make other components to be our code limitation, like OpenCascade or Windows.Forms TreeView's update speed, not the NaroCAD framework. By optimizing NaroCAD's framework the visualization of shapes using OpenCascade will be as limited as OpenCascade is.

What was found after doing this testing?
- Lua interpreter is really fast, I can say that you will feel it as real-time for even complex code
- OpenCascade it feels slow. It is CPU bound and the entire logic regarding OpenCascade we had, took a lot of CPU time (close to 30+ % before optimization to close to 40% of entire operation time after we've optimized). This time was mainly (90% of OCC time) one function: context.Display(AIS_Shape, false);
- NaroCAD wrapper have no penality regarding speed, but for small objects (like in our case, was a lot of small points), the Naro wrapper store a bridge pointer which may add a bit to memory consumption. So the solution is to store as much as possible from separate words in their worlds. I encourage you to write as much as possible in .NET language of your choice as is easier to debug and finish this code.
- Log.net's Debug function for this drawing task it took a hefty 5% of time. May not sound much, but a lot other operations took much less. This is because of disk IO I think and String.Concat calls that are made inside
- NaroCAD attribute lookup was fast, because was mostly binary search to locate a node and binary search to locate an attribute, but pays back on large(r) attribute/shape count. This is because we use SortedDictionary class (equivalent with C++ std::map). I've used strength reduction for calls, and right now attributes are located via integer operations instead string one. This in itself speeds up the Naro code by 25%. This is also combined with simplified attribute declaration and a mapped name for attribute. This will mean also that at first time you create an attribute and you add to Naro's document tree, you will be able to save this document.
- Undo/Redo used a lot of memory. And was slow. It was somelike for every shape you have declared in the Naro's document tree, you should have around 1.5 times more memory used for storing the Undo/Redo information. Right now we compute minimalist data needed like: we compute Undo but not Redo until is not needed. Also, the complex attributes like: Transform, Point3D, Layers are saved using helper methods that save/restore much faster arrays of integers and doubles. Those optimizations did make computing a 40k shapes tree to speedup with 75%. Also will mean a small memory decrease, depends a lot of your usage and the garbage collecting policy. For sure you will see some memory free. For debug purposes, I've make diffs to show their changes as an XML, but there is no visual way to see in NaroCAD. But may be interesting for users to see what changes they've did or what means an Undo for them.
- I've tried extreme solutions for Undo/Redo like zip-ing in-memory the diff datas, but I'm not sure if my algorithm have a bug, was a SharpZipLib bug or a .NET memory policy/bug, but the RAM usage was increased and the "gained" memory was added with the compressing/decompressing diff times. I've tried a forced GC.Collect to make things better but no changes.

Those code techniques will improve long usage/high count primitives usage cases that may appear in using NaroCAD. Also may make persons that considered OCAF tree was fast(er) than any "managed" or "interpreted" implementation to reconsider and to give to NaroCAD framework a try. It may be surprised in a good way about it's versatility and performance.

No comments: