Shanjiang Tang


Associate Prof./Master Supervisor

College of Intelligence and Computing
Tianjin University

Contact

55-B517, CS, Beiyang Campus, JinNan District, Tianjin, China, 300350 [map]
Tel: +86-13652068936
tashj [at] tju [dot] edu [dot] cn
http://cic.tju.edu.cn/faculty/tangshanjiang


About Me

I am an Associate Professor in the College of Intelligence and Computing, Tianjin University. I received my Ph.D. degree from Nanyang Technological University in 2015, and the B.Eng. and M.Sc. degrees from the School of Software Engineering and the School of Computer Science & Technology at Tianjin University in 2008 and 2011, respectively.

My general research interests primarily focus on large-scale computing systems, big data, deep learning and cloud computing, with special emphasis on the resource management and job scheduling for Hadoop/YARN system. Specifically, I am interested in designing new scheduling and resource allocation algorithms, analyzing their performance, and implementing them in large-scale computing systems. I am also interested in problems at the intersection of computing systems and economics.


I am looking for undergraduate and graduate students. If you are interested in big data, machine learning, parallel computing, or cloud computing, please contact me.

News

Selected Publications   [Entry@DBLP] [Google scholar]

Journal Articles

  1. [TII'21] Shanjiang Tang, Chunjiang Wang, Jiangtian Nie, Neeraj Kumar, Yang Zhang, Zehui Xiong, Ahmed Barnawi, “EDL-COVID: Ensemble Deep Learning for COVID-19 Cases Detection from Chest X-Ray Images,” IEEE Transactions on Industrial Informatics, 2021.

  2. [TCC'20] Shanjiang Tang, Ce Yu, Yusen Li, “Fairness-Efficiency Scheduling for Cloud Computing with Soft Fairness Guarantees,” IEEE Transactions on Cloud Computing, 2020. [Supplementary Document]

  3. [TKDE'20] Shanjiang Tang, Bingsheng He, Ce Yu, Yusen Li, Kun Li, “A Survey on Spark Ecosystem: Big Data Processing Infrastructure, Machine Learning, and Applications,” IEEE Transactions on Knowledge and Data Engineering, 2020.

  4. [IJPP'19]  Huihui Zou, Shanjiang Tang, Ce Yu, Hao Fu, Yusen Li, Wenjie Tang, “ASW: Accelerating Smith–Waterman Algorithm on Coupled CPU–GPU Architecture,” International Journal of Parallel Programming, 2019.

  5. [TSC'19]  Zhaojie Niu, Shanjiang Tang, Bingsheng He, “An Adaptive Efficiency-Fairness Meta-scheduler for Data-Intensive Computing,” IEEE Transactions on Services Computing, 2019.

  6. [TPDS'18]  Shanjiang Tang, Zhaojie Niu, Bingsheng He, Bu-Sung Lee, Ce Yu, “Long-Term Multi-Resource Fairness for Pay-as-you Use Computing Systems,” IEEE Transactions on Parallel and Distributed Systems, 2018. [Supplementary Document]

  7. [TSC'18]  Shanjiang Tang, Bu-Sung Lee, and Bingsheng He, “Fair Resource Allocation for Data-Intensive Computing in the Cloud,” IEEE Transactions on Services Computing, 2018. [Supplementary Document]

  8. [TBE'17]  Peng Ren, Shanjiang Tang, Fang Fang, et al., “Gait Rhythm Fluctuation Analysis for Neurodegenerative Diseases by Empirical Mode Decomposition,” IEEE Transactions on Biomedical Engineering, 2017.

  9. [BMC'17]  Xi Chen, Chen Wang, Shanjiang Tang, Ce Yu, Quan Zou, “CMSA: a heterogeneous CPU/GPU computing system for multiple similar RNA/DNA sequence alignment,” BMC Bioinformatics, 2017.

  10. [PP'16]  Chen Wang, Ce Yu, Shanjiang Tang, Jian Xiao, Jizhou Sun, Xiangfei Meng, “A General and Fast Distributed System for Large-scale Dynamic Programming Applications,” Parallel Computing, 2016.

  11. [TSC'16]  Shanjiang Tang, Bu-Sung Lee, and Bingsheng He, “Dynamic Job Ordering and Slot Configurations for MapReduce Workloads,” IEEE Transactions on Services Computing, 2016. [Supplementary Document]

  12. [TCC'14]  Shanjiang Tang, Bu-Sung Lee, and Bingsheng He, “DynamicMR: A Dynamic Slot Allocation Optimization Framework for MapReduce Clusters,” IEEE Transactions on Cloud Computing, 2014. [Supplementary Document][slides]

  13. [TPDS'12]  Shanjiang Tang, Ce Yu, Jizhou Sun, Bu-Sung Lee, Tao Zhang, Zheng Xu, and Huabei Wu, “EasyPDP: An Efficient Parallel Dynamic Programming Runtime System for Computational Biology,” IEEE Transactions on Parallel and Distributed Systems, 2012. [Supplementary Document][slides]

Conference and Workshop Proceedings

  1. [ICPP'20]  Shanjiang Tang, Qifei Chai, Ce Yu, Yusen Li, Chao Sun, “Balancing Fairness and Efficiency for Cache Sharing in Semi-external Memory System,” in the 49th International Conference on Parallel Processing (ICPP'20), Aug 2020.

  2. [MM'19]  Yusen Li, Haoyuan Liu, Xiwei Wang, Lingjun Pu, Trent G. Marbach, Shanjiang Tang, Gang Wang, Xiaoguang Liu, “Themis: Efficient and Adaptive Resource Partitioning for Reducing Response Delay in Cloud Gaming,” in the 27th ACM International Conference on Multimedia (MM'19), Oct 2019.

  3. [HPDC'19]  Yusen Li, Chuxu Shan, Ruobing Chen, Xueyan Tang, Wentong Cai, Shanjiang Tang, Xiaoguang Liu, Gang Wang, Xiaoli Gong, Ying Zhang, “GAugur: Quantifying Performance Interference of Colocated Games for Improving Resource Utilization in Cloud Gaming,” in the 28th International Symposium on High-Performance Parallel and Distributed Computing (HPDC'19), June 2019.

  4. [ICSOC'18]  Shanjiang Tang, Ce Yu, Chao Sun, Jian Xiao, Yinglong Li, “QKnober: A Knob-based Fairness-Efficiency Scheduler for Cloud Computing with QoS Guarantees,” in the 16th International Conference on Service Oriented Computing (ICSOC'18), Nov 2018. [slides]

  5. [ICPP'18]  Hao Fu, Shanjiang Tang, Bingsheng He, Ce Yu, Jizhou Sun, “GLP4NN: A Convergence-invariant and Network-agnostic Light-Weight Parallelization Framework for Deep Neural Networks on Modern GPUs,” in the 47th International Conference on Parallel Processing (ICPP'18), August 2018. [slides]

  6. [SC'16]  Shanjiang Tang, Bingsheng He, Shuhao Zhang, Zhaojie Niu, “Elastic Multi-Resource Fairness: Balancing Fairness and Efficiency in Coupled CPU-GPU Architectures,” in the International Conference for High Performance Computing, Networking, Storage and Analysis (SC'16), Nov 2016. [slides]

  7. [CloudCom'15]  Zhaojie Niu, Shanjiang Tang, and Bingsheng He, “Gemini: An Adaptive Performance-Fairness Scheduler for Data-Intensive Cluster Computing,” in CloudCom'15, Dec 2015. [slides]

  8. [CloudCom'14]  Shanjiang Tang, Bu-Sung Lee, and Bingsheng He, “Towards Economic Fairness for Big Data Processing in Pay-as-you-go Cloud Computing,” in CloudCom 2014 (Ph.D. Consortium), Singapore, Dec 2014. [slides]

  9. [ICS'14]  Shanjiang Tang, Bu-Sung Lee, Bingsheng He and Haikun Liu, “Long-Term Resource Fairness: Towards Economic Fairness on Pay-as-you-use Computing Systems,” in the 28th International Conference on Supercomputing (ICS'14), Munich, Germany, June 2014. [slides]

  10. [Cluster'13]  Shanjiang Tang, Bu-Sung Lee, and Bingsheng He, “Dynamic slot allocation technique for MapReduce clusters,” in IEEE Cluster 2013, Indiana, USA, Sept 2013. [slides]

  11. [Euro-Par'13]  Shanjiang Tang, Bu-Sung Lee, and Bingsheng He, “MROrder: Flexible Job Ordering Optimization for Online MapReduce Workloads,” in Euro-Par 2013, Aachen, Germany, Aug 2013. [slides]

Book Chapter

  1. Shanjiang Tang, Bingsheng He, Haikun Liu, Bu-Sung Lee, “Resource Management in Big Data Processing Systems,” invited chapter in Big Data: Principles and Paradigms. (Eds. R. Buyya, R. N. Calheiros, and A.V. Das), Academic Press, June 2016.

  2. Chen Wang, Shanjiang Tang, Ce Yu, “Parallel Dynamic Programming for Large-scale Data Applications,” invited chapter in Horizons in Computer Science Research, Volume 14. (Eds. Thomas S. Clary), Nova Science Publishers, Feb 2017.

Patents

  1. Chao Sun, Jizhou Sun, Shanjiang Tang, et al., Multi-level parallel programming method, CN201010205530, Nov 2010.

  2. Ce Yu, Shanjiang Tang, Jizhou Sun, et al., Parallel programming model system of DAG oriented data driving type application and realization method, CN200910312089, May 2010.

  3. Chao Sun, Jizhou Sun, Shanjiang Tang, et al., Visual modeling and code skeleton generating method for supporting design of multinuclear parallel program, CN201010171361, Nov 2010.

  4. Ce Yu, Jizhou Sun, Zhen Xu, Huabei Wu, Shizhong Liao, Xiaojing Meng, Shanjiang Tang, et al., MPI parallel programming system based on visual modeling and automatic skeleton code generation method, CN200910067715, June 2009.

Research Funding

  1. Optimization on Multi-tenant Deep Learning Computing in a Shared Computing System (NSFC 61972277, 2020.01 - 2023.12), 610K, PI.

  2. Efficient Spark-based Query Processing Techniques for Big Spatial Data in Supercomputing Systems (18JCZDJC30800, 2018.4 - 2021.03), 200K, PI.

  3. Parallel and Distributed Computing (2018XRG-0027, 2018.1 - 2019.12), 100K, PI.

  4. Fair Resource Allocation for Big Data Processing in Multi-tenant Cloud Computing System (NSFC 61602336, 2017.1 - 2019.12), 210K, PI.

Projects

I focus on systems and algorithms for large-scale data-intensive computing. My projects include:

MRYARN: Pay-as-you-go is a popular cloud billing model based on users' resource usage. Since a user's demand changes over time, it is difficult to maintain high resource utilization, and hence cost efficiency, at all times. Resource sharing is an effective way to raise utilization, and given the heterogeneous resource demands of cloud workloads, fair multi-resource allocation is essential for sharing in cloud computing. MRYARN is a multi-resource fair allocation framework for the cloud. It ensures that, in the long term, each user receives at least as many total resources as she would under an exclusive, non-sharing environment. Moreover, MRYARN guarantees that no user can increase her total allocation over time by lying about her demands. Finally, MRYARN includes a mechanism that discourages users from submitting cost-inefficient workloads, especially for idle resources they do not truly need.(homepage)(TPDS'18)
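
The instantaneous multi-resource fairness that MRYARN's long-term notion builds on can be illustrated with a toy sketch of Dominant Resource Fairness (DRF), the standard multi-resource policy. This is an illustrative simplification with names of my own choosing, not MRYARN's actual algorithm:

```python
def drf_allocate(capacity, demands, rounds):
    """Toy Dominant Resource Fairness: each round, launch one task for
    the user with the lowest dominant share, as long as resources fit."""
    n = len(capacity)
    usage = {u: [0.0] * n for u in demands}
    launched = {u: 0 for u in demands}

    def dominant_share(u):
        # A user's dominant share is her largest per-resource usage fraction.
        return max(usage[u][r] / capacity[r] for r in range(n))

    for _ in range(rounds):
        used = [sum(usage[v][r] for v in usage) for r in range(n)]
        # Users whose next task still fits within total capacity.
        fits = [u for u in demands
                if all(used[r] + demands[u][r] <= capacity[r] for r in range(n))]
        if not fits:
            break
        u = min(fits, key=dominant_share)
        for r in range(n):
            usage[u][r] += demands[u][r]
        launched[u] += 1
    return launched
```

On the classic DRF example (a cluster with 9 CPUs and 18 GB, user A's tasks needing 1 CPU and 4 GB, user B's needing 3 CPUs and 1 GB), this yields 3 tasks for A and 2 for B, equalizing both users' dominant shares at 2/3.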

LTYARN: Life is not fair, but with a little help, existing large-scale data processing systems (e.g., YARN, Spark, Dryad) can be, by ensuring resource sharing between users. However, past work on fair sharing considered memoryless fairness: an instantaneous fair share that ignores historical allocations. In cloud (i.e., pay-as-you-use) computing, this fails to satisfy service-as-you-pay fairness, under which the total service each user enjoys should be proportional to her payment over the long term. Long-Term Resource Fairness (LTRF) generalizes max-min fairness for this case. LTYARN implements LTRF for YARN in cloud computing. (homepage) (demo) (ICS'14) (CloudCom'14)(TSC'18)
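
The long-term idea can be sketched in a few lines (my own simplification, not the LTRF algorithm from the paper): instead of equalizing each round's shares in isolation, charge every grant against a user's cumulative allocation, so a user who received less in the past catches up later.

```python
def long_term_round(cumulative, demands, capacity):
    """History-aware max-min sketch: repeatedly grant one resource unit
    to the demanding user whose cumulative allocation (past consumption
    plus grants in this round) is currently the lowest."""
    grants = {u: 0 for u in demands}
    remaining = dict(demands)
    for _ in range(capacity):
        eligible = [u for u in remaining if remaining[u] > 0]
        if not eligible:
            break
        u = min(eligible, key=lambda v: cumulative[v] + grants[v])
        grants[u] += 1
        remaining[u] -= 1
    return grants
```

For example, if user A has already consumed 10 units and user B only 4, a round with capacity 8 and a demand of 6 from each gives B all 6 of its units and A only 2, pulling the cumulative totals toward parity (12 vs. 10) rather than splitting the round 4/4.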

DynamicMR: Hadoop MRv1 uses a slot-based resource model with a static configuration of map/reduce slots. Because map slots and reduce slots are pre-configured separately and are not fungible on slave nodes, slots can be severely under-utilized, which significantly degrades performance. Although YARN addresses this problem with a new 'container' resource model on which either map or reduce tasks can run, we keep the slot-based model and propose an alternative technique called Dynamic Hadoop Slot Allocation (DHSA). It relaxes the slot allocation constraint so that slots can be reallocated to either map or reduce tasks depending on their needs. Our experiments show that it consistently outperforms YARN by about 2% ~ 9% for multiple jobs, owing to its ratio control mechanism for running map/reduce tasks. Second, speculative execution can tackle the straggler problem and has been shown to improve the performance of a single job, but at the expense of cluster efficiency. In view of this, we propose Speculative Execution Performance Balancing (SEPB) to balance the performance tradeoff between a single job and a batch of jobs. Third, delay scheduling has been shown to improve data locality, but at the cost of fairness. Alternatively, we propose a technique called Slot PreScheduling that improves data locality with no impact on fairness. Finally, combining these techniques, we build a step-by-step slot allocation system called DynamicMR that improves the performance of Hadoop MRv1 significantly while maintaining fairness. (homepage) (TCC'14) (Cluster'13)
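
The core DHSA intuition can be sketched as follows (a deliberately simplified toy with hypothetical names, ignoring DHSA's actual ratio control and load balancing): slots keep their map/reduce type, but idle slots of one type may serve pending tasks of the other.

```python
def dhsa_assign(map_pending, reduce_pending, map_slots, reduce_slots):
    """Toy dynamic slot allocation: first fill each slot type with its
    own pending tasks, then lend idle slots across types."""
    run_map = min(map_pending, map_slots)
    run_reduce = min(reduce_pending, reduce_slots)
    idle_map = map_slots - run_map
    idle_reduce = reduce_slots - run_reduce
    # Borrow idle slots of the other type for the remaining tasks.
    run_reduce += min(reduce_pending - run_reduce, idle_map)
    run_map += min(map_pending - run_map, idle_reduce)
    return run_map, run_reduce
```

With 4 map and 4 reduce slots, 10 pending map tasks, and no pending reduce tasks, a static configuration runs only 4 tasks, whereas this sketch runs 8 map tasks by borrowing the idle reduce slots.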

MROrder: In Hadoop MRv1, different job submission orders can yield significantly different performance. MROrder is an automated MapReduce job ordering optimization prototype system. It targets online MapReduce workloads, where jobs arrive over time, and optimizes various performance metrics such as makespan and total completion time. Users only need to supply a few simple arguments, e.g., the job ordering performance metric (makespan or total completion time); MROrder then performs job ordering optimization automatically for online MapReduce jobs according to the user's configuration. (homepage) (Euro-Par'13) (TSC'16)
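
If each job is modeled as a map stage followed by a reduce stage, makespan-oriented job ordering resembles the classic two-machine flow-shop problem, for which Johnson's rule gives an optimal order. The sketch below illustrates that connection only; it is not MROrder's actual algorithm, and the names are mine:

```python
def johnson_order(jobs):
    """Johnson's rule for a two-stage (map, reduce) pipeline:
    jobs with map time <= reduce time go first in ascending map time,
    the rest follow in descending reduce time."""
    first = sorted((n for n, (m, r) in jobs.items() if m <= r),
                   key=lambda n: jobs[n][0])
    second = sorted((n for n, (m, r) in jobs.items() if m > r),
                    key=lambda n: jobs[n][1], reverse=True)
    return first + second

def makespan(order, jobs):
    """Makespan when maps run back-to-back and each job's reduce starts
    after both its own map and the previous job's reduce finish."""
    m_end = r_end = 0
    for n in order:
        m, r = jobs[n]
        m_end += m
        r_end = max(r_end, m_end) + r
    return r_end
```

For three jobs with (map, reduce) times a=(3, 2), b=(1, 4), c=(2, 3), Johnson's rule orders them b, c, a for a makespan of 10, versus 12 for the order a, b, c.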

EasyPDP: To tackle the growing volume of genomic data, an efficient and easily-programmed high-performance computing system is needed. EasyPDP is a parallel dynamic programming runtime system and an abstract programming model for computational biology and scientific computing applications. As a runtime system, it automatically handles low-level thread creation, mapping, resource management, and fault tolerance, regardless of system characteristics or scale. As a programming model, it lets users describe applications and specify concurrency at a high level, without concern for the complex details of parallel programming. (homepage) (demo) (TPDS'12) (IPDPS Workshop'13) (IPDPS Workshop'12) (Parallel Computing'16)
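
The dependency pattern such runtimes exploit can be illustrated with a toy wavefront traversal of an edit-distance table (my own example, not EasyPDP code): each cell depends only on its left, upper, and upper-left neighbors, so all cells on the same anti-diagonal are independent and could be filled in parallel.

```python
def edit_distance_wavefront(a, b):
    """Edit distance computed by sweeping anti-diagonals of the DP table.
    Cells on diagonal d depend only on diagonals d-1 and d-2, so each
    diagonal's cells are mutually independent (the parallelizable unit)."""
    n, m = len(a), len(b)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        D[i][0] = i
    for j in range(m + 1):
        D[0][j] = j
    for d in range(2, n + m + 1):          # anti-diagonal index d = i + j
        for i in range(max(1, d - m), min(n, d - 1) + 1):
            j = d - i
            cost = 0 if a[i - 1] == b[j - 1] else 1
            D[i][j] = min(D[i - 1][j] + 1,      # deletion
                          D[i][j - 1] + 1,      # insertion
                          D[i - 1][j - 1] + cost)  # match/substitution
    return D[n][m]
```

A real runtime would hand each anti-diagonal's cells to a pool of workers; the serial sweep above only demonstrates the dependency structure.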

Open Source

Almost all of my work is open source.

Professional Activities

PC Member for

Reviewer for

Others

Interesting Links