Welcome to Ming Zhao's homepage!

MAY GOD BLESS YOU!

For God so loved the world that he gave his one and only Son, that whoever believes in him shall not perish but have eternal life. (John 3:16)

The Spirit gives life; the flesh counts for nothing. (John 6:63)

Therefore, if anyone is in Christ, he is a new creation; the old has gone, the new has come! (2 Corinthians 5:17)


I'm currently working at Google Inc. in the Mountain View office, USA. Before that, I was a research fellow (postdoc) at the National University of Singapore (NUS), working on multimedia retrieval and face recognition under the supervision of Prof. Chua Tat-Seng; I also worked with Prof. Ramesh Jain and Prof. Terence Sim.


What's new?

Two papers have been accepted to CVPR 2010 (details coming soon)

Google Landmark: Tour the World

  • ACM Multimedia 2009 Demo: "Tour the World: a Technical Demonstration of a Web-Scale Landmark Recognition Engine," Yan-Tao Zheng, Ming Zhao, Yang Song, Hartwig Adam, Ulrich Buddemeier, Alessandro Bissacco, Fernando Brucher, Tat-Seng Chua, Hartmut Neven, Jay Yagnik. demo
  • CVPR 2009 Paper: "Tour the World: building a web-scale landmark recognition engine," Yantao Zheng, Ming Zhao, Yang Song, Hartwig Adam, Ulrich Buddemeier, Alessandro Bissacco, Fernando Brucher, Tat-Seng Chua, and Hartmut Neven. pdf
  • CVPR 2009 Demo: "A World-Wide Landmark Recognition Engine with Web Learning", Yantao Zheng, Ming Zhao, Yang Song, Hartwig Adam, Ulrich Buddemeier, Alessandro Bissacco, Fernando Brucher, Tat-Seng Chua, Hartmut Neven, Jay Yagnik. demo

Face Recognition on Web Videos and Personal Photo Albums

Concept Detection/Visual Object Recognition


Research Experience

         TRECVID 2005 (Jul. 2005 -- Sep. 2005)
TRECVID is the TREC Video Retrieval Evaluation, sponsored by the National Institute of Standards and Technology (NIST). It is the most challenging evaluation for video retrieval in the world, and most of the well-known research groups in video retrieval participate, including IBM, CMU, and Columbia University. There are four tasks this year: Shot boundary detection, Low-level feature extraction (camera motion), High-level feature extraction, Search and Exploring BBC rushes. We ranked first in the search task.

         3D Face Reconstruction and Animation (May. 2005 -- Now)
3D face reconstruction from one or more images has a wide range of applications, such as face animation and recognition. The 3D morphable model is slow mainly because of its texture mapping. To improve speed, we use only shape matching to recover the 3D shape, and apply texture mapping afterwards to obtain the texture. However, with shape information alone, a single image is not enough for accurate 3D face reconstruction, so we propose using multiple images with the morphable shape model. First, given feature points on the multiple images, the 3D coordinates of the feature points are estimated by pose estimation. Then, frontal and profile 2D morphable shape models are built to estimate the 3D morphable shape model. These two steps are iterated to improve the result. Finally, the texture is extracted from the multiple images using the pose estimated from the reconstructed 3D face.

         Photo Album Annotation  (Feb. 2005 -- Now)
Home photos are becoming commonplace, and large quantities of them are available on the Internet. There is a need for efficient techniques to manage these large collections of photos, some with text annotations but many without. In particular, we need to identify the essential attributes of home photos: place, time, and people. With these attributes, a range of questions can be asked about photos by time, place, people, and their combinations.
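
The kind of attribute-based question described above can be illustrated with a minimal sketch. It assumes each photo has already been annotated with a place, a time, and a set of people; the `Photo` record, its field names, and the `query` function are hypothetical names chosen for illustration, not the actual system.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Photo:
    # Hypothetical record; the field names are assumptions for illustration.
    filename: str
    taken_at: datetime
    place: str
    people: frozenset = field(default_factory=frozenset)


def query(photos, place=None, person=None, start=None, end=None):
    """Return the photos that match every attribute that was specified."""
    results = []
    for p in photos:
        if place is not None and p.place != place:
            continue
        if person is not None and person not in p.people:
            continue
        if start is not None and p.taken_at < start:
            continue
        if end is not None and p.taken_at > end:
            continue
        results.append(p)
    return results


# Made-up sample album, used only to exercise the query.
album = [
    Photo("a.jpg", datetime(2005, 2, 10), "Singapore", frozenset({"Alice", "Bob"})),
    Photo("b.jpg", datetime(2005, 6, 1), "Beijing", frozenset({"Alice"})),
    Photo("c.jpg", datetime(2004, 12, 25), "Singapore", frozenset({"Carol"})),
]

# "Which photos of Alice were taken in Singapore during 2005?"
hits = query(album, place="Singapore", person="Alice",
             start=datetime(2005, 1, 1), end=datetime(2005, 12, 31))
print([p.filename for p in hits])
```

Leaving an argument as `None` means "no constraint on this attribute", so the same function covers queries by time, place, or people alone as well as any combination of them.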

         TRECVID 2004 (Aug. 2004 -- Sep. 2004)
TRECVID is the TREC Video Retrieval Evaluation, sponsored by the National Institute of Standards and Technology (NIST). It is the most challenging evaluation for video retrieval in the world, and most of the well-known research groups in video retrieval participate, including IBM, CMU, and Columbia University. There are four tasks this year: Shot boundary detection, Story segmentation, Feature extraction, and Search. We ranked first in the search task.

         Person-X Detection in News Video (Aug. 2004 -- Feb. 2005)
With the development of computer technology, more and more digital video is available, which demands more efficient access to video content. Video retrieval has thus become a hot research topic in multimedia. To achieve the goal of video retrieval, it is important to find the objects in a video that are of interest to users. For news video, which is a significant source of video, persons are the most important objects. Finding a specific person, called finding "Person-X", is therefore essential to understanding and retrieving news video. The goal of finding Person-X is to find the shots where Person-X visually appears.

         Face Alignment for Face Recognition (Aug. 2003 ~ Jul. 2004)
I aimed to use face alignment to improve the performance of face recognition. Although face alignment is very important for high-performance face recognition, existing face recognition systems often use simple alignment strategies or assume that alignment is done beforehand. I planned to first improve face alignment algorithms and then combine face alignment with face recognition.

         Face Alignment and Iris Localization (Dec. 2002 ~ Jul. 2003)
This work was performed while I was a visiting student in the Visual Computing Group of Microsoft Research Asia. The research focused on iris localization for iris recognition and face alignment for face recognition, under the supervision of Stan Z. Li.

         Face Alignment for 3D Facial Reconstruction (Sep. 2002 ~ Nov. 2002)
The task of face alignment is to accurately locate facial features such as the eyes, nose, mouth and outline. Accurate extraction of facial features offers advantages for many applications and is crucial for highly accurate face recognition and synthesis. We used face alignment for "Real-Time Realistic-Looking 3D Facial Reconstruction and Interaction by Voice-Driven Expression Animation", supported by the National Natural Science Foundation of China (60203013).

         Computer Aided Medical Imaging Diagnosis (Dec. 2001 ~ Apr. 2002)
I worked with medical students to develop a system that assists medical imaging diagnosis for the "Science Research Challenge Cup" of Zhejiang University. This system won the third prize.

         Content-Based Video Retrieval and Browsing (Aug. 2001 ~ Jul. 2002)
The goal is to help people quickly find the videos they want and efficiently grasp the gist of their contents. We developed a system for video analysis, segmentation, abstraction, classification, indexing, retrieval and browsing. For home video abstraction, we proposed an algorithm that combines audio and video and is especially suitable for home videos.

         Video Object Segmentation (Sep. 1999 ~ Jul. 2001)
The goal is to segment semantic video objects from videos. We developed two techniques: automatic video object segmentation based on statistical inference, and semi-automatic video object segmentation based on hierarchical optical flow.


Publications

  1. Yantao Zheng, Ming Zhao, Yang Song, Hartwig Adam, Ulrich Buddemeier, Alessandro Bissacco, Fernando Brucher, Tat-Seng Chua, and Hartmut Neven, "Tour the World: building a web-scale landmark recognition engine," CVPR 2009, pdf, demo
  2. Mehmet Emre Sargin, Hrisikesh Aradhye, Pedro J. Moreno, Ming Zhao, "Audiovisual Celebrity Recognition in Unconstrained Web Videos," ICASSP 2009. pdf
  3. Gang Wang, Tat-Seng Chua, Ming Zhao, "Exploring Knowledge of Sub-domain in a Multi-resolution Bootstrapping Framework for Concept Detection in News Video," ACM Multimedia, October 2008. pdf
  4. Ming Zhao, Jay Yagnik, Hartwig Adam, David Bau, "Large Scale Learning and Recognition of Faces in Web Videos," The 8th IEEE International Conference on Automatic Face and Gesture Recognition, September 2008. pdf
  5. Ming Zhao, Tat-Seng Chua, "Markovian Mixture Face Recognition with Discriminative Face Alignment," The 8th IEEE International Conference on Automatic Face and Gesture Recognition, September 2008. pdf
  6. Yan-Tao Zheng, Ming Zhao, Shi-Yong Neo, Tat-Seng Chua, Qi Tian, "Visual Synset: Towards a Higher-level Visual Representation," CVPR, June 2008. pdf
  7. Ming Zhao, Tat-Seng Chua, and Ramesh Jain, "Combining Metadata and Context Information in Annotating Personal Media," International Workshop on Advanced Image Technology, January 2007. pdf
  8. Ming Zhao, YongWei Teo, Siliang Liu, Tat-Seng Chua, and Ramesh Jain, "Automatic Person Annotation of Family Photo Album," International Conference on Image and Video Retrieval, July 2006, LNCS 4071, pp. 163-172. pdf
  9. Ming Zhao, Tat-Seng Chua, "Face alignment with unified subspace optimization of active statistical models," The 7th IEEE International Conference on Automatic Face and Gesture Recognition, April 2006, pp. 67-72. pdf
  10. Ming Zhao, Tat-Seng Chua, Terence Sim, "Morphable face reconstruction with multiple images," The 7th IEEE International Conference on Automatic Face and Gesture Recognition, April 2006, pp. 597-602. pdf
  11. Ming Zhao, Shi-Yong Neo, Hai-Kiat Goh, Tat-Seng Chua, "Multi-faceted contextual model for person identification in news video", The 12th Multimedia Modelling Conference, Jan 2006. pdf
  12. Tat-Seng Chua, Shi-Yong Neo, Hai-Kiat Goh, Ming Zhao, Yang Xiao, Gang Wang, Sheng Gao, and Kai Chen, "TRECVID 2005 by NUS PRIS," TRECVID 2005 workshop, 2005. pdf
  13. Tat-Seng Chua, Shi-Yong Neo, Ke-Ya Li, Gang Wang, Rui Shi, Ming Zhao, and Huaxin Xu, "TRECVID 2004 Search and Feature Extraction Task by NUS PRIS," TRECVID 2004 workshop, November 2004, pp. 159-170. pdf
  14. Ming Zhao, Stan Z. Li, and Chun Chen, "Analysis and optimization for ASM based face alignment," The International Conference on Pattern Recognition, August 2004. (Accepted)
  15. Ming Zhao, Chun Chen, Stan Z. Li and Jiajun Bu, "Subspace analysis and optimization for AAM based face alignment," IEEE International Conference on Automatic Face and Gesture Recognition, May 2004, pp. 290-295. pdf
  16. Chun Chen, Ming Zhao, Stan Z. Li, and Jiajun Bu, "Parameter optimization for active shape models," in Proceedings of the Asian Conference on Computer Vision, January 2004, vol. 2, pp. 1068-1073. pdf
  17. Ming Zhao, Stan Z. Li, Chun Chen, and Jiajun Bu, "Shape evaluation for weighted active shape models," in Proceedings of the Asian Conference on Computer Vision, January 2004, vol. 2, pp. 1074-1079. pdf
  18. Ming Zhao, Jiajun Bu, and Chun Chen, "Audio and video combined for home video abstraction," in Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, Hong Kong, April 2003, vol. 5, pp. 620-623. pdf
  19. Ming Zhao, Na Li, and Chun Chen, "Statistical inference for automatic video object segmentation," Journal of Computer-Aided Design and Computer Graphics (Chinese), vol. 15, no. 3, pp. 318-323, 2003. pdf
  20. Ming Zhao, Caifu Chen, Chun Chen, and Jiajun Bu, "Automatic home video abstraction using audio contents," in SPIE: Electronic Imaging and Multimedia Technology III, October 2002, vol. 4925, pp. 317-325. pdf
  21. Ming Zhao, Jiajun Bu, and Chun Chen, "Semi-automatic video object segmentation basing on hierarchy optical flow," in SPIE: Electronic Imaging and Multimedia Technology III, October 2002, vol. 4925, pp. 307-316. pdf
  22. Ming Zhao, Jiajun Bu, and Chun Chen, "Robust background subtraction in HSV color space," in SPIE: Multimedia Systems and Applications V, Boston, USA, July 2002, vol. 4861, pp. 325-332. pdf
  23. Ming Zhao, Chun Chen, and Zhengping Wu, "Hierarchy optical flow based semi-automatic spatial-temporal video segmentation," Journal of Image and Graphics (Chinese), vol. 7, no. 8, pp. 759-764, 2002. pdf

Links:

Last updated on Feb 25, 2010 PDT.