When we open our eyes, we immediately see our surroundings in great detail. How the brain is able to form these richly detailed representations of the world so quickly is one of the biggest unsolved puzzles in the study of vision.
Scientists who study the brain have tried to replicate this phenomenon using computer models of vision, but so far, leading models only perform much simpler tasks such as picking out an object or a face against a cluttered background. Now, a team led by MIT cognitive scientists has produced a computer model that captures the human visual system’s ability to quickly generate a detailed scene description from an image, and offers some insight into how the brain achieves this.
“What we were trying to do in this work is to explain how perception can be so much richer than just attaching semantic labels on parts of an image, and to explore the question of how do we see all of the physical world,” says Josh Tenenbaum, a professor of computational cognitive science and a member of MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) and the Center for Brains, Minds, and Machines (CBMM).
The new model posits that when the brain receives visual input, it quickly performs a series of computations that reverse the steps that a computer graphics program would use to generate a 2D representation of a face or other object. This type of model, known as efficient inverse graphics (EIG), also correlates well with electrical recordings from face-selective regions in the brains of nonhuman primates, suggesting that the primate visual system may be organized in much the same way as the computer model, the researchers say.
Source: “A new model of vision”, Anne Trafton, MIT News