A Data Distribution-aware Method for Sub-dataset Analysis On Distributed File System_Shanghai Maritime University

Shanghai Maritime University, 免费考研网 (Free Kaoyan Network) / 2018-05-04

Talk title: A Data Distribution-aware Method for Sub-dataset Analysis On Distributed File System

Time: June 15 (Wednesday), 15:00

Venue: Information Building, Room 350

Speaker: Jun Wang (Professor, University of Central Florida, USA; Shanghai **** Distinguished Professor)

Talk abstract:

In this work, we study the problem of sub-dataset analysis over distributed file systems, e.g., the Hadoop file system. Our experiments show that the sub-datasets' distribution over HDFS blocks can often cause the corresponding analysis to suffer from a seriously imbalanced parallel execution. This is because the locality of individual sub-datasets is hidden by the Hadoop file system, and the content clustering of sub-datasets results in some computational nodes carrying out much more workload than others. We conduct a comprehensive analysis of how the imbalanced computing patterns occur and their sensitivity to the size of a cluster. We then propose a novel method, referred to as DataNet, to optimize sub-dataset analysis over distributed storage systems. DataNet aims to achieve distribution-aware and workload-balanced computing and consists of the following three parts. Firstly, we propose an efficient algorithm with linear complexity to obtain the meta-data of sub-dataset distributions. Secondly, we design an elastic storage structure called ElasticMap, based on the HashMap and BloomFilter techniques, to store the meta-data. Thirdly, we employ a distribution-aware algorithm for sub-dataset applications to achieve workload balance in parallel execution. Our proposed method can benefit different sub-dataset analyses with various computational requirements. Experiments are conducted on PRObE's Marmot 128-node cluster testbed, and the results show the performance benefits of DataNet.
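The abstract names the building blocks but gives no implementation detail. As a rough illustration only, and not DataNet's actual code, the Python sketch below shows one way a hash map and a Bloom filter could be combined to record which sub-datasets a block holds, followed by a simple greedy assignment that sends each block to the least-loaded node holding a replica. All names here (`BloomFilter`, `BlockMeta`, `assign_blocks`, `threshold`, `small_guess`) are hypothetical, and the split between exact and approximate entries is an assumption made for the example.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: may give false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def __contains__(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


class BlockMeta:
    """Per-block metadata (assumed layout): exact sizes in a dict for large
    sub-datasets, membership only in a Bloom filter for small ones."""

    def __init__(self, sizes, threshold):
        self.exact = {}
        self.small = BloomFilter()
        for sub_id, size in sizes.items():
            if size >= threshold:
                self.exact[sub_id] = size
            else:
                self.small.add(sub_id)

    def estimate(self, sub_id, small_guess=1):
        # Exact size if recorded, a fixed guess if probably present, else 0.
        if sub_id in self.exact:
            return self.exact[sub_id]
        return small_guess if sub_id in self.small else 0


def assign_blocks(block_meta, replicas, sub_id):
    """Greedy balancing: heaviest blocks first, each assigned to the
    least-loaded node among those holding a replica of the block."""
    load = {node: 0 for nodes in replicas.values() for node in nodes}
    weights = {b: m.estimate(sub_id) for b, m in block_meta.items()}
    plan = {}
    for block in sorted(weights, key=weights.get, reverse=True):
        if weights[block] == 0:
            continue  # block holds none of the target sub-dataset
        node = min(replicas[block], key=lambda n: load[n])
        load[node] += weights[block]
        plan[block] = node
    return plan, load


# Hypothetical data: per-block sub-dataset sizes and replica locations.
sizes = {
    "b0": {"A": 90, "B": 2},
    "b1": {"A": 80},
    "b2": {"A": 10, "B": 50},
    "b3": {"B": 60},
}
replicas = {
    "b0": ["n1", "n2"],
    "b1": ["n1", "n2"],
    "b2": ["n2", "n3"],
    "b3": ["n3", "n1"],
}
meta = {b: BlockMeta(s, threshold=5) for b, s in sizes.items()}
plan, load = assign_blocks(meta, replicas, "A")
```

The dictionary keeps exact figures where they matter most for balancing, while the Bloom filter bounds the memory spent on the many small entries at the cost of occasional false positives.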

Speaker bio (Dr. Jun Wang):

Dr. Jun Wang joined the Department of Electrical Engineering and Computer Science at the University of Central Florida in 2006. Prior to that, he was a faculty member in the Computer Science and Engineering Department of the University of Nebraska, Lincoln. He is the recipient of the National Science Foundation Early Career Award 2009 and the Department of Energy Early Career Principal Investigator Award 2005. Recently, he has won the 2015 UCF Reach for the Stars award, the 2013 Dean's Research Professorship Award, Charles N. Millican Faculty Fellow 2010-2012, and the University of Central Florida Research Incentive Award 2010. In December 2015, he was appointed Shanghai **** Distinguished Professor.

His research has been sponsored mainly by the National Science Foundation and the Department of Energy. His work aims to generate impact in the high-performance I/O systems community. He has authored over 80 publications in premier journals such as IEEE Transactions on Computers and IEEE Transactions on Parallel and Distributed Systems, and in leading HPC and systems conferences such as IPDPS, HPDC, EuroSys, ICS, Middleware, and FAST. He has graduated 9 Ph.D. students, who upon graduation were employed by major US IT corporations (e.g., Apple, Google, Microsoft, EMC). He has served as an Associate Editor for the IEEE Transactions on Parallel and Distributed Systems, the IEEE Transactions on Cloud Computing, and the International Journal of Parallel, Emergent and Distributed Systems (IJPEDS). He has conducted extensive research in the areas of Computer Systems and High Performance Computing.

His specific research interests include:

· Big Data and Big Compute Systems

· Data-intensive High Performance Computing

· Massive Storage and File System

· I/O Architecture

Interested faculty and students are welcome to attend.
