Hi Maarten!
The algorithm is a bit different from the usual ones. It does not find shapes and then apply a combination of CV filters. Instead, it makes one very tricky first pass that scans the whole frame and a second pass that collects the data. These passes can run on every frame or only on some frames. There are no separate engines for finding the markers and for tracking them. There is only one tracking engine, which is low-latency and should start working immediately.
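The "tricky" first pass itself is not described here, so the following is only a minimal sketch of the two-pass shape, assuming a plain intensity threshold as a stand-in for it. Pass 1 scans the whole frame once and assigns provisional labels to dark pixels. Pass 2 resolves those labels and collects per-blob data (centroid and area). This is the classic two-pass connected-component labeling, not the author's method. `find_blobs` and `threshold=128` are hypothetical names and values. The same function would run on every processed frame, so there is no separate "find" and "track" stage.

```python
from typing import List, Tuple


def find_blobs(frame: List[List[int]], threshold: int = 128) -> List[Tuple[float, float, int]]:
    """Return (cx, cy, area) for each 4-connected dark region of a grayscale frame.

    Hypothetical stand-in: a pixel counts as 'marker' if it is darker than threshold.
    """
    h, w = len(frame), (len(frame[0]) if frame else 0)
    labels = [[0] * w for _ in range(h)]
    parent = [0]  # union-find over provisional labels; label 0 means background

    def find(a: int) -> int:
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    # Pass 1: scan the whole frame once, give each dark pixel a provisional label,
    # and record which labels turn out to belong to the same region.
    for y in range(h):
        for x in range(w):
            if frame[y][x] >= threshold:
                continue
            left = labels[y][x - 1] if x > 0 else 0
            up = labels[y - 1][x] if y > 0 else 0
            if not left and not up:
                parent.append(len(parent))      # new label, its own root
                labels[y][x] = len(parent) - 1
            elif left and up:
                ra, rb = find(left), find(up)
                labels[y][x] = min(ra, rb)
                parent[max(ra, rb)] = min(ra, rb)  # merge the two regions
            else:
                labels[y][x] = left or up

    # Pass 2: resolve every label to its root and accumulate centroid sums and areas.
    stats = {}
    for y in range(h):
        for x in range(w):
            if labels[y][x]:
                root = find(labels[y][x])
                sx, sy, n = stats.get(root, (0, 0, 0))
                stats[root] = (sx + x, sy + y, n + 1)
    return [(sx / n, sy / n, n) for sx, sy, n in stats.values()]
```

Blobs come out in raster order of their first pixel, and the cost is linear in the number of pixels no matter how many markers are visible.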
It was originally designed to track a large number of 2D markers at high speed, with very little time needed before tracking starts. What to do with the results is up to the user. If you want a Vuforia-like experience with 3D pose tracking, you need to use more than one of these small markers and triangulate. Suppose you do not need a small marker to anchor objects but, for example, want to measure the exact location of the viewer in a hall. In that case you had better not do it the Vuforia way. Instead, use as many 2D markers arranged in patterns as you like. Then you not only know where you are, but may even be able to map the walls around you, because you know which patterns to expect.
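For the "use more than one marker" case, suppose four markers sit on a flat wall at known positions. One common first step is to estimate the planar homography that maps wall coordinates to the detected marker centroids in the image, using the direct linear transform (DLT). This is an assumption about how it could be done, not something the algorithm above already does. Recovering the full 3D camera pose would additionally need the camera intrinsics, and with more than four markers you would solve the same equations in a least-squares sense. `homography`, `project` and `solve` are hypothetical helper names.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("degenerate configuration (e.g. three markers collinear)")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def homography(world: Sequence[Point], image: Sequence[Point]) -> List[List[float]]:
    """3x3 homography (h33 = 1) from exactly four wall-plane -> image correspondences."""
    a, b = [], []
    for (X, Y), (u, v) in zip(world, image):
        # u = (h0 X + h1 Y + h2) / (h6 X + h7 Y + 1), rearranged to be linear in h
        a.append([X, Y, 1, 0, 0, 0, -u * X, -u * Y])
        b.append(u)
        a.append([0, 0, 0, X, Y, 1, -v * X, -v * Y])
        b.append(v)
    h = solve(a, b) + [1.0]
    return [h[0:3], h[3:6], h[6:9]]


def project(H: List[List[float]], X: float, Y: float) -> Point:
    """Map a wall-plane point into the image with homography H."""
    w = H[2][0] * X + H[2][1] * Y + H[2][2]
    return ((H[0][0] * X + H[0][1] * Y + H[0][2]) / w,
            (H[1][0] * X + H[1][1] * Y + H[1][2]) / w)
```

Once `H` is known, `project` predicts where every other marker of the expected wall pattern should appear. That is one way the "knowing what patterns you are expecting" idea could be checked.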
I am not sure yet how much information can be encoded in the marker itself, but with my current approach it seems to be very limited. When more information is needed, you combine several markers. The markers I am talking about here look very simple and should work even when they are small.
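How many distinct IDs a single marker can carry is still open. Purely as an illustration, assume each marker can show one of `ids_per_marker` IDs and that a group of markers is always read in a fixed spatial order. Then a group of n markers can carry ids_per_marker ** n distinct values. For example, three markers with 4 IDs each give 4 ** 3 = 64 values, or 6 bits. The `encode`/`decode` helpers below are hypothetical and simply treat the group as a base-`ids_per_marker` number.

```python
from typing import List


def encode(value: int, ids_per_marker: int, n_markers: int) -> List[int]:
    """Split value into per-marker IDs, most significant marker first (hypothetical scheme)."""
    if not 0 <= value < ids_per_marker ** n_markers:
        raise ValueError("value does not fit in this many markers")
    ids = []
    for _ in range(n_markers):
        value, digit = divmod(value, ids_per_marker)
        ids.append(digit)
    return ids[::-1]


def decode(ids: List[int], ids_per_marker: int) -> int:
    """Inverse of encode: read the marker IDs as digits in base ids_per_marker."""
    value = 0
    for d in ids:
        value = value * ids_per_marker + d
    return value
```

Capacity grows exponentially with the group size, so even a very small per-marker alphabet adds up quickly when markers are combined.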
Strengths:
- Scales well to a lot of markers, and I am not talking about tracking 10 of them simultaneously, but many more
- There is no separate recognizer and tracker. The tracker also recognizes, so the latency between first seeing a marker and starting to track it is very low (close to zero).
- The markers look really simple, so they should be easy to track reliably and with good quality
- Few resources are needed, because it uses simple, low-level operations, only a few passes, and has good asymptotic runtime.
- Scales well to tracking whole walls, so following movement through a large environment (more than one room) should be possible
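The first two strengths (many markers, and one engine that both recognizes and tracks) can be illustrated with a frame-to-frame association step. The sketch below is hypothetical: `GridTracker` and `radius` are made-up names, and greedy nearest-neighbour matching is a simple swapped-in choice, not the author's method. Previous positions are bucketed into a uniform grid (a spatial hash), so each detection only checks nearby cells. The work therefore stays roughly linear in the number of markers instead of comparing all pairs. A detection with no nearby track gets a new ID in the same frame, which is the "close to zero latency" behaviour.

```python
from collections import defaultdict
from itertools import count
from math import hypot
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]


class GridTracker:
    """Greedy nearest-neighbour association using a uniform grid (hypothetical sketch)."""

    def __init__(self, radius: float = 8.0):
        self.radius = radius                # assumed max movement (pixels) between frames
        self.tracks: Dict[int, Point] = {}  # track id -> last known position
        self._ids = count()

    def _cell(self, p: Point) -> Tuple[int, int]:
        # With cell size == radius, any match lies in the same or an adjacent cell.
        return int(p[0] // self.radius), int(p[1] // self.radius)

    def update(self, detections: List[Point]) -> Dict[int, Point]:
        grid = defaultdict(list)
        for tid, p in self.tracks.items():
            grid[self._cell(p)].append(tid)
        new_tracks: Dict[int, Point] = {}
        for d in detections:
            cx, cy = self._cell(d)
            best: Optional[int] = None
            best_dist = self.radius
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    for tid in grid.get((cx + dx, cy + dy), ()):
                        px, py = self.tracks[tid]
                        dist = hypot(d[0] - px, d[1] - py)
                        # Each old track may be claimed by at most one detection.
                        if tid not in new_tracks and dist <= best_dist:
                            best, best_dist = tid, dist
            if best is None:
                best = next(self._ids)  # first sighting: tracking starts this frame
            new_tracks[best] = d
        self.tracks = new_tracks  # tracks without a detection this frame are dropped
        return new_tracks
```

For example, feeding it `[(10, 10), (50, 50)]` and then `[(12, 11), (51, 49), (100, 100)]` keeps IDs 0 and 1 for the moved markers and assigns ID 2 to the new one immediately.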
Weaknesses:
- Encoding data in a tracked pose usually requires more than one marker
- Pose estimation needs more than one marker, plus geometry calculations
- I have not done any lens distortion handling yet; methods should be taken from open-source projects such as ARToolKit5. I think this could be a fast, maybe even constant-time, operation when done well, but I have never tried it…
- It is not done yet
I think this is the worst of the weaknesses, as most of the information here is just my own speculation about how it will work.
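On the lens distortion point: if only the marker centroids need correcting (not the whole image), the cost per marker is indeed constant. The sketch below does not use ARToolKit5's own distortion code or parameters, which have not been checked here. Instead it assumes the common radial model with two coefficients `k1`, `k2` (Brown-Conrady radial terms) in normalized camera coordinates. It undistorts a point by fixed-point iteration with a fixed iteration count. The intrinsics `fx`, `fy`, `cx`, `cy` and all function names are hypothetical placeholders for values that would come from a camera calibration.

```python
from typing import Tuple


def pixel_to_normalized(u: float, v: float, fx: float, fy: float,
                        cx: float, cy: float) -> Tuple[float, float]:
    """Convert pixel coordinates to normalized camera coordinates (assumed pinhole intrinsics)."""
    return (u - cx) / fx, (v - cy) / fy


def distort(x: float, y: float, k1: float, k2: float) -> Tuple[float, float]:
    """Apply radial distortion to a normalized, undistorted point."""
    r2 = x * x + y * y
    s = 1 + k1 * r2 + k2 * r2 * r2
    return x * s, y * s


def undistort(xd: float, yd: float, k1: float, k2: float,
              iters: int = 10) -> Tuple[float, float]:
    """Invert distort() by fixed-point iteration.

    The iteration count is fixed, so the cost is constant per point. This converges
    for moderate distortion; strong fisheye lenses would need a different method.
    """
    x, y = xd, yd
    for _ in range(iters):
        r2 = x * x + y * y
        s = 1 + k1 * r2 + k2 * r2 * r2
        x, y = xd / s, yd / s
    return x, y
```

Correcting only a handful of centroids per frame keeps this cheap, whereas undistorting the full image would add a per-pixel pass.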
I hope this answers some of your questions. Could you tell me (and possibly others) what kind of native support we could get? It would be awesome to start experimenting and stuff.