Friday, October 2, 2009
Augmented Reality (AR) has been all over the web recently, but is everybody talking about the same thing? Is everybody talking about the whole scope of AR or just a subset? Even inside YDreams we disagree on some of this… Wikipedia, while not the holder of the ultimate truth, is a good place to start from. The AR definition, as of 9/30/2009 15:11 (UTC), is the following:
“Augmented reality (AR) is a term for a live direct or indirect view of a physical real-world environment whose elements are merged with-, or augmented by virtual computer-generated imagery – creating a mixed reality. The augmentation is conventionally in real-time and in semantic context with environmental elements, [...]. With the help of advanced AR technology (e.g. adding computer vision and object recognition) the information about the surrounding real world of the user becomes interactive and digitally usable.”
Well, I agree with this definition even though it might be prone to several interpretations. I’ve highlighted the parts I think are important and I’m going to explain my interpretation.
I believe everybody agrees that AR has to be real-time, so this excludes video post-processing. Videos like the Coca-Cola Avatar ad and street test are not AR but, as technology evolves, such effects may become achievable in real time in the not-so-distant future.
The definition includes a “live direct or indirect view of a physical real-world environment”. “Direct” can be achieved by using “see-through” displays, goggles or contact lenses, which show the “computer-generated imagery” while allowing the “physical real-world environment” to be seen through them.
In the “indirect” method, images of the “physical real-world environment” are captured, the “computer-generated imagery” is blended into these images and the resulting images are displayed to the user. The capture can be performed by the camera of a mobile device or a cellular phone, or by the webcam of a desktop or laptop computer.
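To make the “indirect” method concrete, here is a minimal sketch of that capture-blend-display loop, in Python with OpenCV (my choice for illustration; none of the apps mentioned here necessarily uses it). It just draws a fixed virtual circle over the live video, so it is not yet “in semantic context” with anything, which is exactly the point of the next paragraph.

```python
import cv2
import numpy as np

# Indirect view: capture the real world, blend computer-generated imagery
# on top, and show the result to the user.
cap = cv2.VideoCapture(0)              # webcam of a desktop/laptop computer
while True:
    ok, frame = cap.read()
    if not ok:
        break
    overlay = np.zeros_like(frame)
    cv2.circle(overlay, (frame.shape[1] // 2, frame.shape[0] // 2), 60, (0, 255, 0), -1)
    blended = cv2.addWeighted(frame, 1.0, overlay, 0.5, 0)   # virtual circle over live video
    cv2.imshow("indirect AR view", blended)
    if cv2.waitKey(1) == 27:           # Esc to quit
        break
cap.release()
cv2.destroyAllWindows()
```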
From the definition, blending computer-generated imagery in real-time is not enough. It has to be “in semantic context with environmental elements”. This means that, if the camera or any of the “physical real-world” objects move, the “computer-generated imagery” has to move accordingly. Great!…
Assuming that the “physical real-world” objects never move, e.g. real estate, only the camera movement needs to be detected. One way to do this is to use a GPS to detect location and a digital compass to detect orientation. That’s how it’s done by Wikitude, Layar, Sekai Camera, Bionic Eye, Monocle, Acrossair, Bradesco, Gule Sider, Cyclopedia, iPhone ARKit, etc.
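To give an idea of what these GPS/compass apps have to compute, here is a rough Python sketch of the geometry involved. The function names and the simple linear mapping from bearing to screen position are my own simplifications, not how any of the apps above actually does it.

```python
import math

def bearing_to_poi(lat, lon, poi_lat, poi_lon):
    """Compass bearing (degrees, 0 = north) from the device to a point of interest,
    using the standard great-circle bearing formula."""
    phi1, phi2 = math.radians(lat), math.radians(poi_lat)
    dlon = math.radians(poi_lon - lon)
    y = math.sin(dlon) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlon)
    return math.degrees(math.atan2(y, x)) % 360.0

def screen_x(bearing, heading, h_fov_deg, screen_width):
    """Horizontal pixel position of the overlay, or None if the POI is outside
    the camera's horizontal field of view. 'heading' is the digital compass reading."""
    delta = (bearing - heading + 180.0) % 360.0 - 180.0   # signed angle, -180..180
    if abs(delta) > h_fov_deg / 2.0:
        return None
    return screen_width * (0.5 + delta / h_fov_deg)

# Device somewhere in Lisbon, facing east; point of interest to its north-east.
b = bearing_to_poi(38.7223, -9.1393, 38.7250, -9.1300)
print(screen_x(b, heading=90.0, h_fov_deg=60.0, screen_width=480))
```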
If the objects move and/or the accuracy of the GPS and compass is not enough, the image captured by the camera can be analyzed to find the position of objects relative to the camera. Unfortunately, computer vision algorithms are not as fast and accurate as our brains, so we currently have to cheat by adding a set of easy-to-detect markers to the “physical real-world environment”. The technical name for these is fiducial markers. The most commonly used markers are planar black-and-white squares with an image inside, made popular by ARToolKit and its Flash port, FLARToolKit. Int13 uses planar markers with circles on them.
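This is not ARToolKit’s or FLARToolKit’s actual code, just a rough OpenCV sketch of the same idea: find a black square in the camera image and recover the camera pose from its four corners. Identifying the image inside the square and getting the corner ordering right are glossed over here.

```python
import cv2
import numpy as np

# An 8 cm black square marker; its corner coordinates in the marker's own frame (metres).
MARKER_CORNERS_3D = np.array([[-0.04, -0.04, 0], [0.04, -0.04, 0],
                              [0.04, 0.04, 0], [-0.04, 0.04, 0]], dtype=np.float32)

def find_marker_pose(frame, camera_matrix, dist_coeffs):
    """Return (rvec, tvec) for the first square marker found, or None.

    Classic pipeline: threshold -> contours -> quadrilaterals -> pose from 4 points.
    """
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    for c in contours:
        quad = cv2.approxPolyDP(c, 0.03 * cv2.arcLength(c, True), True)
        if len(quad) == 4 and cv2.contourArea(quad) > 1000:
            corners_2d = quad.reshape(4, 2).astype(np.float32)   # assumed to match the 3D corner order
            ok, rvec, tvec = cv2.solvePnP(MARKER_CORNERS_3D, corners_2d,
                                          camera_matrix, dist_coeffs)
            if ok:
                return rvec, tvec   # camera pose relative to the marker
    return None
```

With rvec and tvec, the “computer-generated imagery” can be rendered as if it were sitting on the marker, so it moves accordingly when the marker or the camera moves.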
This is the AR that has been hyped recently. But these methods have several limitations (mainly lack of accuracy in the GPS-based apps and lack of interactivity in the online apps) and may even create a backlash against the AR concept, causing an earlier-than-expected dive in the hype cycle. I already went through this in the late 90s, when the hype was all about virtual reality…
Just like regular bar-codes, fiducial markers don’t usually blend well with the environment, and we can’t assume that all environments have them. Detection of natural features is the way to go, but these algorithms are computationally more expensive. Companies like YDreams, Total Immersion and Metaio are leading the market in this field of AR. Amazing stuff is coming out of research groups like the Active Vision Laboratory in Oxford, the Christian Doppler Laboratory in Graz, the Computer Vision Laboratory at Lausanne, etc. I bet many surprises will come out of this year’s ISMAR.
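For a flavour of what working with natural features looks like, here is a simplified sketch, certainly not the pipeline any of these companies or labs actually uses, that locates a known planar object, say a poster, in a camera frame using feature points instead of a fiducial marker.

```python
import cv2
import numpy as np

def locate_planar_target(reference_img, frame):
    """Find a known planar object (e.g. a poster) in the camera frame using
    natural feature points. Both images are expected to be grayscale.

    Returns the 3x3 homography mapping reference coordinates to frame
    coordinates, or None if there are not enough good matches.
    """
    orb = cv2.ORB_create(1000)                    # detect and describe natural features
    kp_ref, des_ref = orb.detectAndCompute(reference_img, None)
    kp_frm, des_frm = orb.detectAndCompute(frame, None)
    if des_ref is None or des_frm is None:
        return None
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des_ref, des_frm)
    if len(matches) < 10:
        return None
    src = np.float32([kp_ref[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp_frm[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    return H   # warp the computer-generated imagery onto the object with H
```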
The article “16 Top Augmented Reality Business Models”, by Gary Hayes, is excellent reading on this broader view of AR.
YDreams, in particular, is working hard to make the last sentence of Wikipedia’s AR definition possible: “the information about the surrounding real world of the user becomes interactive and digitally usable”.
Tags: augmented reality