| |
| Electrical engineer
helps machines make sense of the world. |
Parham Aarabi (PhD 2001 EE) has been interested
in connecting computers to the physical world since he was 11. He now directs
the Artificial Perception
Lab at the University of Toronto where he is an associate professor
of electrical and computer engineering. His research focuses on systems
that allow machines to make some sense of the world around them. Projects
include building arrays of microphones that can locate a speaker in a room
and software that can classify images based on what they depict.
Aarabi completed his PhD with advisors Vaughn Pratt and Bernard Widrow in
just two years. Technology Review magazine recently named Aarabi
one of its top 35 innovators under 35 years of age. In fact, when he was
hired at age 24, he became Canada’s youngest professor. |
|
|
What is Artificial
Perception? |
|
|
It is a term that I use to mean
computers extracting any kind of information about an environment that allows
them to understand certain aspects of the environment or the people within.
It’s extracting important information and making sense of it. |
 |
One source of that information
is sound. Why is that something we want machines to “perceive”? |
| |
Speech recognition would be useful for many applications. It would be a lot easier to simply tell your car to turn the radio off or to change the station than to press buttons. Whether we are using computers or appliances, speech is an interface that comes very naturally to us humans. For a long time we’ve had to adapt and learn to use computers. With speech, the goal is to make computers adapt to how we communicate. |
 |
You’re not just doing speech recognition.
You are also localizing it in a room. |
|
|
Yes. My general work is extracting useful information from noisy sources.
The problem with speech recognition is that the systems companies make
commercially available do not work well in very noisy environments, for
example in an office building where many people might be talking.
So what I try to do is design systems that, in very simple ways, try to mimic
how we humans communicate through speech.
We focus on a single conversation and we tune out other conversations so that we can understand, even in a very noisy environment, what the other person is saying. It boils down to listening in a specific direction — and you have to know what direction you want to listen to — and trying to tune out or cancel voices and sounds from other directions. There is the first step of localization — finding the direction you want — and the second step of speech enhancement.
|
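The two steps described above, localization and then speech enhancement, can be illustrated with a minimal sketch. It is not the lab's actual system: it assumes just two microphones, a synthetic random "voice", independent noise at each microphone, and whole-sample delays. The function names (`best_lag`, `delay_and_sum`) and all constants are made up for illustration. The delay between the microphones stands in for the direction of the speaker, and averaging the aligned channels stands in for "listening in that direction".

```python
import random

def best_lag(a, b, max_lag):
    """Localization step: find the shift (in samples) at which channel b
    lines up best with channel a, by maximizing their cross-correlation."""
    def score(lag):
        return sum(a[n] * b[n + lag]
                   for n in range(len(a)) if 0 <= n + lag < len(b))
    return max(range(-max_lag, max_lag + 1), key=score)

def delay_and_sum(a, b, lag):
    """Enhancement step: align b to a using the lag, then average.
    Sound from the chosen direction adds up; unaligned noise partly cancels."""
    out = []
    for n in range(len(a)):
        m = n + lag
        # Near the edges b has no matching sample, so fall back to a alone.
        out.append(0.5 * (a[n] + b[m]) if 0 <= m < len(b) else a[n])
    return out

def mean_squared_error(x, y):
    return sum((p - q) ** 2 for p, q in zip(x, y)) / len(x)

# Synthetic data (assumed for illustration): microphone b hears the voice
# TRUE_LAG samples later than microphone a, and each has its own noise.
random.seed(0)
N, TRUE_LAG = 400, 3
voice = [random.gauss(0, 1) for _ in range(N)]
mic_a = [voice[n] + random.gauss(0, 0.5) for n in range(N)]
mic_b = [(voice[n - TRUE_LAG] if n >= TRUE_LAG else 0.0) + random.gauss(0, 0.5)
         for n in range(N)]

lag = best_lag(mic_a, mic_b, max_lag=10)
enhanced = delay_and_sum(mic_a, mic_b, lag)
print("estimated lag:", lag)
print("error, one microphone:", mean_squared_error(mic_a, voice))
print("error, after delay-and-sum:", mean_squared_error(enhanced, voice))
```

On this synthetic data the estimated lag should match the delay that was built in, and the averaged signal should sit closer to the clean voice than either microphone alone. Real arrays use more microphones and fractional delays, which this sketch leaves out.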
 |
What is another example of what you do at the
lab? |
| |
My speech work has been mostly in the past two to five years. More recently,
in the last two years, my group and I have become very heavily involved in
searching images. As an example, consider a very large database of images
that are not all tagged. So you don’t have a person sitting there
writing that an image contains an apple and a flower. How would you search
this database if you wanted to find a flower? We’ve focused on trying
to extract the contents of these images – very simple information
relating one flower in one image to a different flower in a different image.
By these relational links that we produce we allow this database to be very
easily searched. So you could click on one flower and all of the flowers
in the database would automatically come up without having a human operator
directly describing what each image contains. |
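The "click on one flower and all of the flowers come up" behavior can be pictured as a walk over the relational links. The sketch below assumes the links have already been produced as pairs of image names. How the lab derives them from image content is not described in the interview and is not shown here. The image names and the helpers `build_links` and `related_images` are hypothetical.

```python
from collections import defaultdict, deque

def build_links(pairs):
    """Store each relational link in both directions."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)
    return graph

def related_images(graph, start):
    """Breadth-first walk: everything reachable from the clicked image
    through a chain of links, without any human-written tags."""
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbour in graph.get(current, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return sorted(seen - {start})

# Hypothetical links: two flower-to-flower links and one apple-to-apple link.
links = [("flower_1.jpg", "flower_2.jpg"),
         ("flower_2.jpg", "flower_3.jpg"),
         ("apple_1.jpg", "apple_2.jpg")]
graph = build_links(links)
print(related_images(graph, "flower_1.jpg"))
```

Here `flower_3.jpg` is found from `flower_1.jpg` even though the two are not linked directly, because both are linked to `flower_2.jpg`. Following chains this way is a design choice of the sketch, and a real system might restrict how far a chain is trusted.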
 |
How did you become interested in helping computers
perceive? |
| |
It all goes back to when I was 11 and my family was living in Atlanta
at the time. My parents got me my first personal computer, a PC XT. I remember
that after playing with all the games that came with it, I started taking
it apart and looking around at all the wires and poking around the back
to see what all these wires did. For the next few years I was intrigued
by trying to connect devices to this computer. I tried to connect my exercise
bike to the computer so that when I would bike I would see a virtual image
going by. The faster I would bike the faster I would see these images go
by. I would pretend that I was biking on a lane or a road. So I was very
intrigued about making computers connect to the world. Later on I realized
that this connection is sometimes very hard because it is very hard for
computers to make sense of information. The automatic extraction of information
is somewhat difficult. Making sense of it is extremely difficult. This became
my undergraduate thesis, my Masters thesis and eventually my PhD thesis
at Stanford. |
 |
What did you work on while you were here? |
| |
I worked on sound localization using microphone arrays. The idea that if you have multiple microphones you can find out the location of the speaker has been known for a long time. What I tried to do in my PhD thesis, the novelty, was to answer the question of what if you are not sure about the location of your microphones? Some of the microphones could be moving around. Some of them could be faulty or broken. What if you had a microphone array that was either damaged partially, deformable, or was changing? Could you use those microphones to find out the location of the sound source? The answer in many cases turned out to be yes. You would try to find the location of the source and if the microphones couldn’t agree on a location you would go back and see why they wouldn’t agree and you would revise their positions or status. And then you would look again. So you would iterate through a series of microphone position estimates as well as localization estimates and after a few iterations you would obtain a good estimate of the location of both the microphones and the speaker. |
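The "look, disagree, revise, look again" loop can be shown in a heavily simplified form. This is not the thesis algorithm: it revises only each microphone's status (trusted or not) and does not re-estimate microphone positions. It also assumes each microphone has already produced its own rough guess of the speaker's location. The function name `consensus_location`, the tolerance, and the example coordinates are all invented for illustration.

```python
import math
from statistics import median

def consensus_location(estimates, tolerance=0.5):
    """estimates maps a microphone name to its (x, y) guess of the speaker.
    Repeatedly form a consensus, and if the microphones do not agree,
    drop the one that disagrees most and look again."""
    trusted = dict(estimates)
    while True:
        # Current localization estimate: coordinate-wise median of the guesses.
        cx = median(p[0] for p in trusted.values())
        cy = median(p[1] for p in trusted.values())
        # How far each trusted microphone is from the consensus.
        errors = {name: math.hypot(p[0] - cx, p[1] - cy)
                  for name, p in trusted.items()}
        worst = max(errors, key=errors.get)
        # Stop once everyone agrees, or when too few microphones remain.
        if errors[worst] <= tolerance or len(trusted) <= 2:
            return (cx, cy), sorted(trusted)
        # Revise status: treat the worst microphone as faulty and iterate.
        del trusted[worst]

# Hypothetical guesses in metres; mic4 is broken and points somewhere else.
guesses = {"mic1": (2.0, 3.0), "mic2": (2.1, 2.9),
           "mic3": (1.9, 3.1), "mic4": (6.0, 0.0)}
location, trusted_mics = consensus_location(guesses)
print(location, trusted_mics)
```

In this example one pass is enough to exclude `mic4`, after which the remaining three microphones agree. The approach described in the interview goes further and also corrects the assumed microphone positions between passes.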
 |
Have you been back here since then? |
| |
I came back last year to give a talk. I love Stanford, I must say. My two
PhD years at Stanford were the best learning experience of my life.
Hopefully I’m going to come back
again in a few months to visit some friends. It’s a wonderful place.
There is something in the atmosphere at Stanford. It is more than just the
weather. It is the people and the sort of environment that is so innovative
and intellectually stimulating. |
 |
Have any applications come out of the lab yet? |
| |
In the next few months my students and I are going to start a company
based on the image search idea. We have a Web site that is in an alpha testing
mode right now. We are trying to fine-tune it. By next summer we will have
released this to the public. All I can say at this point is that it is not
exactly an Internet image searching site. There are some unique twists here
and there. It is certainly an idea that Google or Yahoo! might be very interested
in. It complements what they have, but doesn’t try to redo what they
do. |
| |
|