China's research enriches computing experience
Zatni Arbi, Columnist, Beijing, zatni@cbn.net.id
Anyone can record the one-hour Liputan Enam news report from
SCTV, digitize it and store it on the hard disk. However, if you
want to get to the precise segment that contains the report on
the progress in identifying the party behind the cruel and
devastating bombings in Bali, you will have to rely on the fast
forward and reverse buttons on the screen and it will be a time-
consuming process.
Wouldn't it be nicer if we could record news stories on our
hard disk, run a program that would segment the entire session
into predefined chunks, create an index and then cut the entire
video file into smaller parts to be saved in different folders?
At the recent demo room set up by Microsoft Research Asia (MSR
Asia) at Kerry Center Hotel in Beijing, I had the rare
opportunity to see several of the interesting technologies being
developed at their lab, which is Microsoft's second largest
research lab. One of them was the very tool that would
automatically detect the transitions between the segments in a
continuous video recording and then cut them into separate,
individually indexed video files so that they could be easily
managed.
The process, however, is not as simple as going fast forward
and reverse. First, the audio needs to be normalized, so that
louder voices will be softened and quieter voices will be
amplified. Once the audio is normalized, speaker recognition will
detect the different speakers -- for example, the news anchors
and the interviewed officials -- and the segments that contain
too much noise to be processed. It will also detect segments
containing music or commercials. Then the sequences of the
segments will be shown in a bar with different colors. For
example, light blue represents the female news anchor, green
represents the male anchor, white represents a pause, dark blue
represents the voice of an interviewed official and so on. If you
want to listen to the interview, all you have to do is click on
the dark blue segment.
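
To make these first steps a little more concrete, here is a minimal Python sketch of the two ideas: leveling the volume window by window, and turning a sequence of speaker labels into the colored bar. It is not MSR Asia's code. The function names, the window size, the target level and the label names are all assumptions made for illustration; only the colors come from the demo.

```python
import math
from itertools import groupby

# Colors as described in the demo; the label names themselves are hypothetical.
COLORS = {
    "female_anchor": "light blue",
    "male_anchor": "green",
    "pause": "white",
    "official": "dark blue",
}

def rms(samples):
    """Root-mean-square level of a list of float samples in [-1.0, 1.0]."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def normalize_windows(samples, window=4, target_rms=0.1, silence=1e-4):
    """Scale each window toward target_rms: loud windows are softened,
    quiet ones amplified. Near-silent windows are left untouched so that
    background noise is not blown up."""
    out = []
    for start in range(0, len(samples), window):
        chunk = samples[start:start + window]
        level = rms(chunk)
        if level < silence:
            out.extend(chunk)
        else:
            gain = target_rms / level
            # Clamp to the valid sample range after applying the gain.
            out.extend(max(-1.0, min(1.0, s * gain)) for s in chunk)
    return out

def to_segments(frame_labels):
    """Merge consecutive identical labels into (start, end, label, color)
    tuples, which is all a colored timeline bar needs."""
    segments = []
    start = 0
    for label, run in groupby(frame_labels):
        length = len(list(run))
        segments.append((start, start + length, label, COLORS.get(label, "grey")))
        start += length
    return segments
```

In a real system the labels would come from a speaker recognition model rather than being typed in by hand, and the gain would be smoothed between windows to avoid audible jumps.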
Then speech recognition will analyze the content of the
speech of each of these speakers. This is a very intricate
process, in which the recognized words are compared to a set of
databases to determine whether the content is political news,
business news, cultural news, etc. In the demo that I saw, a
recent news report from CNN was broken down into a predefined
list of such topical areas. So, for instance, if I wanted to hear
President Bush's speech, I would just go to the politics folder,
find the video file and play it.
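
The idea of comparing recognized words against topic databases can be sketched in a few lines of Python. This is only a toy version under stated assumptions: the word lists, the topic names and the `classify` function are invented for illustration, and the tool shown in the demo almost certainly relies on far larger databases and more careful statistics.

```python
import re
from collections import Counter

# Hypothetical miniature "databases": one small vocabulary per topic.
TOPIC_WORDS = {
    "politics": {"president", "election", "parliament", "minister", "senate"},
    "business": {"market", "shares", "profit", "bank", "economy"},
    "culture": {"film", "museum", "festival", "music", "novel"},
}

def classify(transcript):
    """Return the topic whose vocabulary appears most often in the
    transcript, or "unsorted" if no topic word appears at all."""
    words = Counter(re.findall(r"[a-z']+", transcript.lower()))
    scores = {
        topic: sum(words[w] for w in vocab)
        for topic, vocab in TOPIC_WORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unsorted"
```

Each video segment would then be saved in the folder named after the topic that `classify` returns for its transcript.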
* Advancing the state of the art
The video analysis was just one set of tools that were
demonstrated to around 80 Chinese journalists and around 20
journalists from other Asian countries during the MSR Asia Day.
There were dozens of other interesting technologies being
developed by its researchers. Across the room, for example, was a
demo of how we could quickly create a question and answer system
by first labeling around 400 models of answers for questions
found on the websites. So, unlike getting hundreds of web pages
containing the term "radicalism" with a search engine, the tool
demonstrated at the event would give a single definition of the
term based on all the information contained in the collection of
documents.
"With this tool, you can create a Q&A service very quickly and
automatically," I was told by the young Chinese researcher who
showed me the demo.
Earlier, Rick Rashid, Microsoft Senior Vice President for
Research Group, explained in his presentation: "When we use
Google to find an answer to a question, the search engine would
simply give us a list of web documents that contain the keywords
that we had entered. A tool being developed at this Beijing lab
can be trained to extract the right information so that we can
get the real answer to our question instead of just a list of web
documents."
The focus areas of research at MSR Asia include operating
systems, programming tools and methodology, databases and human
factors. However, in general, each of the researchers is free to
pursue whatever interest they may have. The overall direction was
probably given by chairman Gates, when he said: "The future of
computing is the computer that talks, listens, sees and learns."
Toward this goal, MSR Asia has been working closely with
universities in China, Hong Kong and Australia and has provided
grants to enable PhD candidates and PhDs to participate in
teaching, learning and research projects.
It is not a place for those who are more interested in
partying, obviously. As Dr. Ya-Qin Zhang, MSR Asia's Managing
Director, said at a group interview, the research institute
demands the best from its researchers and the beneficiaries of
its grants.
"When they submit a paper to me that they want to present in a
conference, I always make sure that it will be the most
prestigious scientific conference in the world," Zhang said. The
results have been quite encouraging. At the recent SIGGRAPH, for
example, 10 percent of around 50 papers presented at this highly
reputable conference came from MSRA researchers.
* Exciting new technologies
Are you trying to find a song whose title you have
forgotten or do not even know? All that you remember is a part of
the tune, and the song is stored on your hard disk. One of
the interesting tools shown during the MSR Asia Day would let you
hum the tune and would then find the song for you. Whether you
hum the beginning, the refrain or the end of the song, the tool
would find it.
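
One well-known way to approach this kind of search, though not necessarily the one used at MSR Asia, is to reduce every melody to its contour: whether each note goes up, goes down or repeats. A hummed fragment can then be matched anywhere inside a song, regardless of the key it is hummed in. The Python sketch below assumes that pitches have already been extracted as numbers (MIDI note numbers, for instance); the function names and the tiny song library are hypothetical.

```python
def contour(pitches):
    """Parsons-style contour of a melody: 'U' for up, 'D' for down and
    'R' for a repeated note, one letter per pair of neighboring notes."""
    return "".join(
        "U" if b > a else "D" if b < a else "R"
        for a, b in zip(pitches, pitches[1:])
    )

def find_song(hummed, library):
    """Return the titles whose contour contains the hummed contour.
    A substring match means the beginning, middle or end all work."""
    query = contour(hummed)
    if not query:
        return []
    return [title for title, notes in library.items() if query in contour(notes)]

# Hypothetical library: title -> list of pitch numbers.
LIBRARY = {
    "Song A": [60, 60, 67, 67, 69, 69, 67],
    "Song B": [64, 64, 65, 67, 67, 65, 64, 62],
}
```

Humming the opening of "Song B" an octave higher, as `[72, 72, 74, 76, 76]`, still finds it, because only the ups and downs are compared. A practical tool would also have to tolerate wrong notes and unsteady rhythm, which exact matching does not.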
"In the future, it will be possible to connect to a website,
hum the tune on your computer and ask the application on that
site to find the song that you are looking for," the MSRA
researcher told me.
Or, do you have a few photos that you want to combine into a
video? The researchers have already made such a tool available.
All you have to do is input the names of the photo files, the
effects that you want to apply, the required parameters such as
the speed and trajectory, and the music that will accompany your
video. The tool even has the intelligence to detect a human face,
and will zoom in to create a special focus. I was quite impressed
by the result of the automatic combining of still images into a
video.
Or, do you want to automatically shorten the home video that
you took as a novice some years ago by throwing away those boring
segments? The tool is also available. It uses complex processes
such as video content analysis to identify key frames, and uses
these key frames to create a more exciting, more watchable video.
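
The content analysis in the real tool is surely more sophisticated, but the basic notion of a key frame can be illustrated with a much simpler stand-in technique: keep a frame only when it differs enough from the last frame that was kept. In the Python sketch below, each frame is assumed to be a flat list of grayscale pixel values of equal length; the function names and the threshold are assumptions for illustration.

```python
def mean_abs_diff(frame_a, frame_b):
    """Average absolute difference between two equally sized frames."""
    return sum(abs(x - y) for x, y in zip(frame_a, frame_b)) / len(frame_a)

def key_frames(frames, threshold=30.0):
    """Return the indexes of frames that differ from the previously
    selected key frame by more than the threshold. The first frame
    is always kept."""
    if not frames:
        return []
    keys = [0]
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[i], frames[keys[-1]]) > threshold:
            keys.append(i)
    return keys
```

Long stretches in which nothing changes collapse into a single key frame, which is roughly what throwing away the boring segments amounts to.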
Some people from the media have conjectured that the actual
motive behind Microsoft's decision to invest in a research lab in
Beijing was to win the heart of the Beijing government, which had
reportedly stipulated the use of mainly Linux in their offices.
However, having seen such a broad range of research projects that
the center has produced in a period of four years, I think the
results are much more important than the question of the real
agenda.
As in any leading-edge research lab around the world, not
all of the technologies and tools invented and developed at MSR
Asia will eventually be available commercially. Some will, and
some will not -- because, perhaps, a competing technology may
have emerged and made a commercial product less viable.
Obviously, to fund research projects such as the ones that MSR
Asia has engaged in requires a lot of cash, which a company like
Microsoft can only expect to get from the sales of its software
licenses. Having seen the work that they have done, I think we
can have a better understanding of why Microsoft is so concerned
with intellectual property rights.
All in all, however, I should tell you that it is always
exciting to talk with researchers like those I met at the MSR
Asia Day.