Nvidia's AI Turns Photos Into 3D In No Time

Converting 2D images into 3D scenes is a relatively old concept, but modern technology is fast making it a formality.

Graphics enthusiasts and the wizards at Nvidia have had a long-running love affair, one the company keeps alive with impressive technical demonstrations. The green team has just done it again with a rather eye-catching new proof of concept: meet Instant NeRF, a system capable of converting a handful of images into photorealistic 3D renderings in a matter of seconds.

The basic idea is not new. For years, specialists have been trying to transfer 2D images into three-dimensional space, with widely varying results. The older systems struggle to produce identifiable images. But the latest ones, based on a technique called NeRF, can already deliver visually acceptable, even very good, 3D conversions.

On the other hand, even the most advanced systems suffer from two often prohibitive problems. The first, as is often the case in AI-related projects, is computation time. These are often extremely heavyweight algorithms that can take a long time to process even a handful of images.

A computer magic trick in no time

The other concern is the range of viewing angles. With older systems, it is difficult to get a clean result from every angle unless you use hundreds or even thousands of images for the 3D reconstruction. Otherwise, the final product ends up with visual artifacts such as blurring or distortion (see video above).

In summary, the technique already exists, but it involves a constant compromise between quality and processing time. Today’s best systems can produce a very good rendering in minutes, but training the model beforehand still takes many hours.

But with Instant NeRF, Nvidia promises the best of both worlds: ultra-precise 3D rendering from just a few dozen photos and, most importantly, up to 1,000 times the performance of the best current systems, or in “a few hundredths of a second”!

To achieve this breathtaking performance, Nvidia developed a technique called “Multi-Resolution Hash Grid Encoding”. The company shares few details about how it works. Broadly speaking, however, the concept is to create multiple small neural subnets, each of which is much faster to train than a single mega-network.
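
To make the name a little more concrete, here is a minimal sketch of what a multi-resolution hash grid can look like, in 2D and plain Python for brevity. It is an illustration under stated assumptions, not Nvidia's implementation: the function names (hash_index, make_tables, encode), the table sizes, the hashing constant and the doubling of resolution between levels are all choices made here for the example. In a real system the table values would be learned during training and passed to a small neural network; here they are simply random numbers.

```python
import math
import random

# Hypothetical hashing constant, borrowed from common spatial-hashing practice.
HASH_PRIME = 2654435761


def hash_index(ix, iy, table_size):
    """Map integer grid coordinates to a slot in a fixed-size table."""
    return (ix ^ (iy * HASH_PRIME)) % table_size


def make_tables(num_levels, table_size, features_per_entry, seed=0):
    """One small feature table per resolution level (random stand-in values)."""
    rng = random.Random(seed)
    return [
        [[rng.uniform(-1e-4, 1e-4) for _ in range(features_per_entry)]
         for _ in range(table_size)]
        for _ in range(num_levels)
    ]


def encode(x, y, tables, base_resolution=4, growth=2.0):
    """Encode a 2D point in [0, 1] x [0, 1] as concatenated per-level features."""
    features = []
    for level, table in enumerate(tables):
        # Each level overlays a finer grid on the same unit square.
        resolution = int(base_resolution * growth ** level)
        gx, gy = x * resolution, y * resolution
        x0, y0 = math.floor(gx), math.floor(gy)
        tx, ty = gx - x0, gy - y0
        # Bilinear interpolation weights for the four surrounding grid corners.
        corners = [
            (x0, y0, (1 - tx) * (1 - ty)),
            (x0 + 1, y0, tx * (1 - ty)),
            (x0, y0 + 1, (1 - tx) * ty),
            (x0 + 1, y0 + 1, tx * ty),
        ]
        level_features = [0.0] * len(table[0])
        for cx, cy, weight in corners:
            entry = table[hash_index(cx, cy, len(table))]
            for k, value in enumerate(entry):
                level_features[k] += weight * value
        features.extend(level_features)
    return features


tables = make_tables(num_levels=4, table_size=64, features_per_entry=2)
vector = encode(0.3, 0.7, tables)
```

Each lookup touches only a few table entries per level, whatever the size of the scene. That is one plausible reason an approach like this could be cheap to train and to evaluate.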

Opportunities for use in all sectors

The proof of concept is already impressive, but what is most interesting is its many concrete future prospects. The next step, for example, will probably be to go up a gear: converting photos not into simple 3D renderings, but into real three-dimensional digital objects that professionals can manipulate.

The press release explains, for instance, that a system like Instant NeRF could be used in architecture or in the entertainment world. It could then be used to create content quickly, on the fly: a few photos of a historical monument and, presto, there it is in a game or a movie in no time at all!

Instant NeRF might even find functional applications. Think in particular of autonomous navigation: many machines, such as cars, drones or even industrial machines, have a level of autonomy and reliability that depends directly on their ability to translate 2D images into a 3D representation of their environment. You can’t stop progress!