Chunsheng (Victor) FANG, Ph.D.

 

Senior Data Scientist,

EMC Greenplum


Currently I am a Senior Data Scientist at Greenplum, EMC Corporation, working on enterprise-level, large-scale machine learning and Big Data analytics.

From Jul 2011 to Jul 2012, I was a Senior Research Scientist at Riverain Medical, working on the next-generation, industry-leading software product for registration and early detection of lung cancer in medical images, using advanced machine learning techniques.

I obtained my Ph.D. in Computer Science (advisor: Prof. Anca Ralescu) from the University of Cincinnati in 2011 (GPA 3.96/4.00). During my Ph.D. studies I was a research assistant in CS@UC and BMI@CCHMC. Prior to that I was a Research SDE for intelligent video analysis at the Institute of Automation, Chinese Academy of Sciences, Beijing. I received my Bachelor of Electrical Engineering from USTC in 2006. I have been a student member of SIAM and IEEE since 2008.

From Apr to Jun 2011, I was an SDE intern with the eCommerce Platform team at Amazon.com, a world-leading team that processes a huge volume of global transactions on a cloud computing architecture.

My research interests cover machine learning in social network mining (spectral graph theory, ensemble classifiers, graph mining), computer vision (feature representation, indexing), and their combination in large-scale problems such as information retrieval and bioinformatics. I have published in top research conferences and journals, e.g. NIPS, KDD, ICDM, HBM, ICPR, and IEEE Trans on Intelligent System.

I served as Vice President of the Chinese Students & Scholars Association @ UC (2010-2011) and as Secretary of the Computer Science Grad Student Association @ UC (2008-2009). Since 2008, I have been the web administrator of www.UCBBS.com, the largest online forum in the Greater Cincinnati area, with 5,000 users.

Media report: 'Jeopardy' features man against machine on WCPO-TV, Feb 14, 2011. [static]


Selected Research Projects [more]

Novel Frameworks for Mining Heterogeneous and Dynamic Networks

Unlike the traditional "static" approach to link prediction, a recent paradigm shift models temporal social networks from a "dynamic" perspective. We seek a more general framework for modeling dynamic graphs (social networks, biological networks) that can incorporate historical data. A second interest is how to integrate heterogeneous networks drawn from different knowledge domains.

  • Chunsheng Fang, "Novel Frameworks for Mining Heterogeneous and Dynamic Networks", Ph.D. Dissertation, University of Cincinnati, Nov 2011. [request a copy]
  • Chunsheng Fang, M. Kohram, X. Meng, Anca Ralescu, "Graph Embedding Framework for Link Prediction and Vertex Behavior Modeling in Temporal Social Networks", ACM SIGKDD (oral presentation, acceptance rate < 25%), SNA workshop, Aug 2011. [link] [PDF preprint]
  • Chunsheng Fang, M. Kohram, Anca Ralescu, "Spectral Regression with Low-Rank Approximation for Dynamic Graph Link Prediction", IEEE Transaction on Intelligent System, 2011. [link]
  • Invited Seminar Talk, "Framework for Mining Dynamic Social Networks", Amazon.com Research Team, May 2011.
  • Chunsheng Fang, Jason Lu, Anca Ralescu, "Graph Spectra Regression with Low-Rank Approximation for Dynamic Graph Link Prediction", NIPS 2010 Workshop on Low-rank Methods for Large-scale Machine Learning, Vancouver, Canada, December 2010. [PDF]
  • Minlu Zhang, Chunsheng Fang, Jason Lu, "Integrative scoring approach to identify transcriptional regulations controlling lung surfactant homeostasis", International Conference on Data Mining 2010 (ICDM2010), Sydney, Australia, Dec 2010. [link] [PDF]
  • M Zhang, J Deng, Chunsheng Fang, X Zhang, Jason Lu, "Molecular Network Analysis and Applications", Chapter 11 of "Knowledge-Based Bioinformatics.", John Wiley & Sons, Ltd, July 2010 [link];
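
To make the "dynamic" perspective concrete, here is a minimal sketch of a link predictor that uses historical snapshots instead of only the latest graph. It is a simple time-decayed common-neighbor baseline, not the graph embedding or spectral regression methods of the papers above; the function names and the `decay` parameter are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def decayed_adjacency(snapshots, decay=0.5):
    """Merge edge-list snapshots (oldest first) into edge weights.

    The newest snapshot has weight 1; each step back in time is
    multiplied by `decay` (an illustrative assumption).
    """
    weights = {}
    last = len(snapshots) - 1
    for t, edges in enumerate(snapshots):
        w = decay ** (last - t)
        for u, v in edges:
            if u == v:
                continue  # ignore self-loops
            key = frozenset((u, v))
            weights[key] = weights.get(key, 0.0) + w
    return weights


def score_candidates(snapshots, decay=0.5):
    """Rank node pairs that are not linked in the latest snapshot.

    Score = sum over shared neighbors z of w(u, z) * w(v, z).
    """
    neighbors = {}
    for key, w in decayed_adjacency(snapshots, decay).items():
        u, v = tuple(key)
        neighbors.setdefault(u, {})[v] = w
        neighbors.setdefault(v, {})[u] = w
    current = {frozenset(e) for e in snapshots[-1]}
    scores = {}
    for u, v in combinations(sorted(neighbors), 2):
        if frozenset((u, v)) in current:
            continue  # already linked now
        shared = set(neighbors[u]) & set(neighbors[v])
        s = sum(neighbors[u][z] * neighbors[v][z] for z in shared)
        if s > 0:
            scores[(u, v)] = s
    return sorted(scores.items(), key=lambda kv: -kv[1])
```

For example, `score_candidates([[("a", "b"), ("b", "c")], [("a", "b"), ("b", "c"), ("c", "d")]])` ranks the pair `("a", "c")` first, because both of its edges to the shared neighbor `b` persist across the two snapshots.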

Image mining & retrieval in developmental gene expression patterns

Developed an innovative feature, the Curve Profiling Feature, for embryonic images, and used kernel SVM and manifold learning to mine the implicit spatial-temporal relationships. The approach achieves 98% accuracy and high ROC-AUC in keyword prediction, beating the state of the art while requiring less space and time. (Ranked first in the University Research Council Summer Award, 2010)

  • Chunsheng Fang, Minlu Zhang, Anca Ralescu, Jason Lu: Curve Profiling Feature, in International Conference for Data Mining, Workshop 2010, Sydney, Australia. [link][PDF]

MRI structural changes in the Parkinson's diseased brain

The aim is to automatically identify Parkinson's-related volumetric patterns, in a joint project with the UC College of Medicine. We developed a regional ensemble learning algorithm for detecting and classifying diseased regional patterns; it achieves high AUC-ROC scores, and the results are consistent with neuropathological evidence.

  • Chunsheng Fang, Judd Storrs, Anca Ralescu, Jing-Huei Lee, Jason Lu, "Detecting Parkinson's brain changes using local feature based regional SVM ensemble on MRI images", Human Brain Mapping 2011 [PDF]

Probability based similarity for heterogeneous data

We consider a probability-based approach in which the similarity of two values (in the same domain) is the probability of value pairs whose components are farther apart than the two values under consideration. Similarities across the attributes of the heterogeneous data are combined using the Fisher transformation. Results of applying this approach to an image retrieval problem are also presented.

  • Chunsheng Fang, Anca Ralescu, "ProbSim-Annotation: a novel image annotation algorithm using probability based similarity", 20th Midwest Artificial Intelligence & Cognitive Science Conference (MAICS), Fort Wayne, Indiana, Apr 18-19, 2009. [PDF]
  • Chunsheng Fang, Anca Ralescu, "Experiments on Probability based Similarity Measures Applied to Image Similarity", 19th International Conference on Pattern Recognition (ICPR2008) Sensing Web workshop, Tampa, FL, Dec 7-11, 2008. [PDF]
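
The definition above can be illustrated with a small sketch. It is not the published ProbSim code, and it rests on two assumptions: the probability is estimated empirically from a sample of the attribute's values (counting pairs at least as far apart as the two values), and "Fisher transformation" is read as Fisher's method for combining independent probabilities. The function names and the `floor` parameter are hypothetical.

```python
import math
from itertools import combinations


def prob_similarity(a, b, sample):
    """Fraction of value pairs in `sample` at least as far apart as a and b.

    Values that are close together relative to the sample get a
    similarity near 1; values that are unusually far apart get one near 0.
    """
    gap = abs(a - b)
    pairs = list(combinations(sample, 2))
    return sum(1 for x, y in pairs if abs(x - y) >= gap) / len(pairs)


def fisher_combine(probs, floor=1e-12):
    """Combine per-attribute similarities with Fisher's method.

    -2 * sum(ln p) follows a chi-square distribution with 2k degrees of
    freedom when the k probabilities are independent; for even degrees of
    freedom the upper-tail probability has the closed form used below.
    `floor` avoids log(0) and is an implementation assumption.
    """
    k = len(probs)
    half = -sum(math.log(max(p, floor)) for p in probs)  # statistic / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(k))
```

With `sample = [0, 1, 2, 3]`, `prob_similarity(0, 2, sample)` is 0.5, since three of the six value pairs are at least 2 apart. Per-attribute similarities computed this way (one per attribute of the heterogeneous record) can then be passed to `fisher_combine` to obtain a single score.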

Real time object detection and classification in video stream

Project funded by NSF-China, carried out while I was an R&D software engineer at the National Lab of Pattern Recognition, Chinese Academy of Sciences, Beijing, China, 2006-2007.

Developed Abnormal Human Behavior Detection and Perimeter Intrusion Detection algorithm modules based on Gaussian Mixture Models and AdaBoost, and optimized the system for multi-camera real-time video analysis. Implemented in C++ (MFC) and DirectX.

 

Selected Development Projects [more]

UCBIR: image search engine

Developed the UCbir web image search engine on a Beowulf cluster. It integrates Heritrix as the image crawler, PHP for feature extraction, and MPI parallel computing with a manager-worker model as the back end for archiving and similarity search. It collected 30,000 images from uc.edu, with a retrieval time of less than 1 second per query image. Scalability tests were performed on 18 nodes (36 cores). [Try it out!]

  • Chunsheng Fang, Ryan Anderson, "A Parallel Implementation of Content-Based Image Retrieval", Parallel computing, 2008 Fall; [PDF]
  • Chunsheng Fang, Ryan Anderson, Anca Ralescu, "UCbir: Large-scale content based web image retrieval and tagging system", Excellence Award in ECECS Grad Symposium, 2008. [POSTER]
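
The manager-worker pattern behind the similarity search can be sketched as follows. This is an illustration only: it uses a Python thread pool in place of MPI ranks, squared Euclidean distance in place of the actual UCbir features and metric, and hypothetical function names.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def search_shard(shard, query, k):
    """Worker: return the k closest (distance, image_id) pairs in one shard."""
    scored = (
        (sum((a - b) ** 2 for a, b in zip(vec, query)), image_id)
        for image_id, vec in shard
    )
    return heapq.nsmallest(k, scored)


def manager_search(database, query, k=3, workers=4):
    """Manager: split the database, dispatch shards, merge partial results.

    `database` is a list of (image_id, feature_vector) pairs.
    """
    shards = [database[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda s: search_shard(s, query, k), shards))
    merged = [hit for part in partials for hit in part]
    return heapq.nsmallest(k, merged)
```

Each worker only returns its local top k, so the manager merges at most `workers * k` candidates regardless of the database size.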

Human Motion Capture with Particle Swarm Optimization

How can 3D point clouds from passive motion-capture sensors be mapped robustly and efficiently onto a predefined template skeleton? We use the Particle Swarm Optimization (PSO) algorithm to solve this challenging problem. First, we formulate it as a 3D rigid-body registration problem. Second, the search space is pruned from 6 dimensions to 3 using physical heuristics. Finally, the PSO agents converge to the global optimum of the objective function in fewer than 35 iterations, completed in 100 ms.

  • Chunsheng Fang, Yunzheng He, and Michael Tolston, "Flying Sparrows Capture Poses: Using PSO to Map Appropriate Nomenclature to Motion Capture Markers",Complex System; Networks, 2010 Winter;[PDF]
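
A minimal PSO sketch for a 3-parameter registration is given below. It is not the project's implementation: it assumes that the three remaining degrees of freedom are a pure translation and that marker-to-template correspondences are already known, both of which are simplifications made for illustration. The swarm size, coefficients, and function names are likewise hypothetical.

```python
import random


def registration_error(shift, markers, template):
    """Sum of squared distances between shifted markers and template points."""
    return sum(
        (m[d] + shift[d] - t[d]) ** 2
        for m, t in zip(markers, template)
        for d in range(3)
    )


def pso_translation(markers, template, particles=30, iterations=100,
                    inertia=0.7, c_personal=1.5, c_global=1.5, seed=0):
    """Search for the 3D shift that best aligns markers to the template."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-5, 5) for _ in range(3)] for _ in range(particles)]
    vel = [[0.0] * 3 for _ in range(particles)]
    best_pos = [p[:] for p in pos]
    best_err = [registration_error(p, markers, template) for p in pos]
    g = min(range(particles), key=best_err.__getitem__)
    g_pos, g_err = best_pos[g][:], best_err[g]
    for _ in range(iterations):
        for i in range(particles):
            for d in range(3):
                # Pull toward the particle's own best and the swarm's best.
                vel[i][d] = (
                    inertia * vel[i][d]
                    + c_personal * rng.random() * (best_pos[i][d] - pos[i][d])
                    + c_global * rng.random() * (g_pos[d] - pos[i][d])
                )
                pos[i][d] += vel[i][d]
            err = registration_error(pos[i], markers, template)
            if err < best_err[i]:
                best_pos[i], best_err[i] = pos[i][:], err
                if err < g_err:
                    g_pos, g_err = pos[i][:], err
    return g_pos, g_err
```

Calling `pso_translation(markers, template)` returns the best shift found and its residual error; handling rotation or unknown correspondences would require a richer parameterization and objective than this sketch uses.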

Genome Wide Association DataBase for large-scale SNP visualization (GWADB@CCHMC)

SNP population stratification and visualization. Co-developed the back-end job submission, which uses the Torque scheduler to perform PCA on HapMap Single Nucleotide Polymorphism microarray data. The front end integrates AJAX and Google Web Toolkit for visualization. (Internal web service; CCHMC VPN access required.)