Mon, 21 Oct 2002

China's research enriches computing experience

Zatni Arbi, Columnist, Beijing, zatni@cbn.net.id

Anyone can record the one-hour Liputan Enam news report from SCTV, digitize it and store it on a hard disk. However, if you want to get to the precise segment that contains the report on the progress in identifying the party behind the cruel and devastating bombings in Bali, you will have to rely on the fast-forward and rewind buttons on the screen, and that is a time-consuming process.

Wouldn't it be nicer if we could record news stories on our hard disk, run a program that would segment the entire session into predefined chunks, create an index and then cut the entire video file into smaller parts to be saved in different folders?

In the demo room that Microsoft Research Asia (MSR Asia) recently set up at the Kerry Center Hotel in Beijing, I had the rare opportunity to see several of the interesting technologies being developed at the lab, which is Microsoft's second largest research lab. One of them was the very tool that would automatically detect the transitions between the segments in a continuous video recording and then cut them into separate, individually indexed video files so that they could be easily managed.

The process, however, is not as simple as pressing fast forward and reverse. First, the audio needs to be normalized, so that louder voices are softened and quieter voices are amplified. Once the audio is normalized, speech recognition detects the different speakers -- for example, the news anchors and the interviewed officials -- as well as the segments that contain too much noise to be processed. It also detects segments containing music or commercials. The sequence of segments is then shown as a bar in different colors. For example, light blue represents the female news anchor, green represents the male anchor, white represents a pause, dark blue represents the voice of an interviewed official and so on. If you want to listen to the interview, all you have to do is click on the dark blue segment.
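The column does not describe how the MSR Asia tool does this internally, so the following is only a minimal sketch of the two ideas in the paragraph above: leveling the loudness of a chunk of audio, and turning per-frame speaker labels into the colored bar. The function names, the one-second frame length, the target level and the label names are all assumptions made for illustration; the labels themselves would come from a speaker-detection step that is not shown.

```python
from itertools import groupby
from math import sqrt

# Hypothetical color legend, following the demo described in the column.
COLORS = {
    "female_anchor": "light blue",
    "male_anchor": "green",
    "pause": "white",
    "official": "dark blue",
}


def normalize_rms(samples, target_rms=0.1):
    """Scale one chunk of audio samples so that its RMS level equals target_rms."""
    if not samples:
        return []
    rms = sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0:
        return list(samples)  # pure silence: nothing to scale
    gain = target_rms / rms   # > 1 amplifies quiet chunks, < 1 softens loud ones
    return [s * gain for s in samples]


def build_timeline(frame_labels, frame_seconds=1.0):
    """Merge consecutive frames with the same label into (start, end, label) segments."""
    timeline, start = [], 0.0
    for label, run in groupby(frame_labels):
        end = start + len(list(run)) * frame_seconds
        timeline.append((start, end, label))
        start = end
    return timeline


def segment_at(timeline, t):
    """Return the segment containing time t, as if the user clicked the bar there."""
    for start, end, label in timeline:
        if start <= t < end:
            return start, end, label
    return None
```

With labels such as three frames of `"female_anchor"`, one `"pause"` and four frames of `"official"`, `build_timeline` yields three segments, and `segment_at(timeline, 5.5)` returns the interview segment, which a player could then start from its beginning.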

Next, the speech recognition analyzes what each of these speakers is saying. This is a very intricate process, in which the recognized words are compared to a set of databases to determine whether the content is political news, business news, cultural news, etc. In the demo that I saw, a recent news report from CNN was broken down into a predefined list of such topical areas. So, for instance, if I wanted to hear President Bush's speech, I would just go to the politics folder, find the video file and play it.
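As a rough illustration of comparing recognized words against topic databases, here is a minimal sketch that counts keyword hits per topic. It is not the method used in the demo, which the column does not detail; the tiny keyword lists and the `classify` function are hypothetical stand-ins for much larger databases.

```python
import re
from collections import Counter

# Hypothetical keyword lists; a real system would use far larger databases.
TOPIC_KEYWORDS = {
    "politics": {"president", "election", "parliament", "minister", "senate"},
    "business": {"market", "shares", "profit", "economy", "bank"},
    "culture": {"film", "museum", "festival", "music", "artist"},
}


def classify(transcript, topic_keywords=TOPIC_KEYWORDS):
    """Return the topic whose keywords appear most often, or None if none match."""
    words = re.findall(r"[a-z']+", transcript.lower())
    counts = Counter(words)
    scores = {
        topic: sum(counts[w] for w in keywords)
        for topic, keywords in topic_keywords.items()
    }
    best = max(scores, key=scores.get)  # on a tie, the first topic listed wins
    return best if scores[best] > 0 else None
```

A transcript that mentions the president, the senate and an election would land in the politics folder, while one with no known keywords returns `None` and could be left unfiled.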

* Advancing the state of the art

The video analysis was just one set of tools that were demonstrated to around 80 Chinese journalists and around 20 journalists from other Asian countries during the MSR Asia Day. There were dozens of other interesting technologies being developed by its researchers. Across the room, for example, was a demo of how we could quickly create a question and answer system by first labeling around 400 models of answers for questions found on websites. So, unlike a search engine, which returns hundreds of web pages containing the term "radicalism", the tool demonstrated at the event would give a single definition of the term based on all the information contained in the collection of documents.

"With this tool, you can create a Q&A service very quickly and automatically," I was told by the young Chinese researcher who showed me the demo.

Earlier, Rick Rashid, Microsoft Senior Vice President for Research Group, explained in his presentation: "When we use Google to find an answer to a question, the search engine would simply give us a list of web documents that contain the keywords that we had entered. A tool being developed at this Beijing lab can be trained to extract the right information so that we can get the real answer to our question instead of just a list of web documents."

The focus areas of research at MSR Asia include operating systems, programming tools and methodology, databases and human factors. However, in general, each of the researchers is free to pursue whatever interest they may have. The overall direction was probably given by chairman Gates, when he said: "The future of computing is the computer that talks, listens, sees and learns."

Toward this goal, MSR Asia has been working closely with universities in China, Hong Kong and Australia and has provided grants to enable PhD candidates and PhDs to participate in teaching, learning and research projects.

It is not a place for those who are more interested in partying, obviously. As Dr. Ya-Qin Zhang, MSR Asia's Managing Director, said in a group interview, the research institute demands the best from its researchers and the beneficiaries of its grants.

"When they submit a paper to me that they want to present in a conference, I always make sure that it will be the most prestigious scientific conference in the world," Zhang said. The results have been quite encouraging. At the recent SIGGRAPH, for example, 10 percent of around 50 papers presented at this highly reputable conference came from MSRA researchers.

* Exciting new technologies

Are you trying to find a song whose title you have forgotten or never even knew? All that you remember is a part of the tune, and the song is stored on your hard disk. One of the interesting tools shown during the MSR Asia Day would let you hum the tune and would then find the song for you. It would not matter whether you hummed the beginning, the refrain or the end of the song; the tool would find it.
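One simple way to picture this kind of search is to compare melodic contours: each tune is reduced to a string of up, down and repeat steps, and the hummed fragment is looked up anywhere inside each song's string. This is only a sketch under that assumption, not the technique the researchers used, which the column does not describe. The function names are hypothetical, pitches are assumed to arrive as MIDI note numbers, and the pitch-tracking step that would turn humming into notes is not shown.

```python
def contour(pitches):
    """Reduce a pitch sequence to U (up), D (down) and R (repeat) steps."""
    steps = []
    for prev, cur in zip(pitches, pitches[1:]):
        steps.append("U" if cur > prev else "D" if cur < prev else "R")
    return "".join(steps)


def find_songs(hummed_pitches, library):
    """Return titles whose contour contains the hummed contour anywhere."""
    if len(hummed_pitches) < 2:
        return []  # a single note carries no contour to match
    query = contour(hummed_pitches)
    return [title for title, pitches in library.items() if query in contour(pitches)]


# Hypothetical two-song library, given as MIDI note numbers.
LIBRARY = {
    "song_a": [60, 60, 67, 67, 69, 69, 67],
    "song_b": [64, 64, 65, 67, 67, 65, 64, 62],
}
```

Because only the direction of each step is kept, a fragment hummed in a different key still matches, and the substring test finds it whether it comes from the start, the middle or the end of the song. The price is that short fragments can match many songs, so a practical tool would need a finer comparison.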

"In the future, it will be possible to connect to a website, hum the tune on your computer and ask the application on that site to find the song that you are looking for," the MSRA researcher told me.

Or, do you have a few photos that you want to combine into a video? The researchers have already made such a tool available. All you have to do is input the names of the photo files, the effects that you want to apply, the required parameters such as the speed and trajectory, and the music that will accompany your video. The tool even has the intelligence to detect a human face, and will zoom in to create a special focus. I was quite impressed by the result of the automatic combining of still images into a video.

Or, do you want to automatically shorten the home video that you took as a novice some years ago by throwing away those boring segments? A tool for that is also available. It uses complex processes such as video content analysis to identify key frames, and uses these key frames to create a more exciting, more watchable video.
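The column does not explain how the key frames are identified. A common textbook approach, sketched here purely as an assumption, is to keep a frame only when it differs enough from the last frame that was kept. Frames are represented as flat lists of grayscale values, and the function names and the threshold of 30 are hypothetical.

```python
def frame_difference(a, b):
    """Mean absolute difference between two equally sized grayscale frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def select_key_frames(frames, threshold=30.0):
    """Return indices of frames that differ enough from the last key frame."""
    if not frames:
        return []
    keys = [0]  # always keep the first frame
    for i in range(1, len(frames)):
        if frame_difference(frames[keys[-1]], frames[i]) > threshold:
            keys.append(i)
    return keys
```

Long stretches in which little changes collapse into a single key frame, which is one way the boring segments could be dropped; a real tool would likely also weigh motion, faces and audio.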

Some people in the media have conjectured that the actual motive behind Microsoft's decision to invest in a research lab in Beijing was to win the heart of the Beijing government, which had reportedly stipulated the use of mainly Linux in its offices. However, having seen such a broad range of research projects that the center has produced in a period of four years, I think the results are much more important than the question of the real agenda.

As in any leading-edge research lab around the world, not all of the technologies and tools invented and developed at MSR Asia will eventually be available commercially. Some will, and some will not -- because, perhaps, a competing technology may have emerged and made a commercial product less viable.

Obviously, funding research projects such as the ones that MSR Asia has engaged in requires a lot of cash, which a company like Microsoft can only expect to get from the sales of its software licenses. Having seen the work that they have done, I think we can have a better understanding of why Microsoft is so concerned with intellectual property rights.

All in all, however, I should tell you that it is always exciting to talk with researchers like those I met at the MSR Asia Day.