Data Management Lab, Wangxuan Institute of Computer Technology, Peking University

Lab News

Peng Peng: gStoreD: Partial Evaluation for Distributed SPARQL Queries

Time

January 18, 2019, 09:00-10:00

Location

Conference Room 106, Institute of Computer Science and Technology Building, Peking University

Abstract

With the increasing size of RDF data published on the Web, the computational requirements of evaluating SPARQL queries over large RDF graphs have stressed the limits of single-machine processing. In many applications, RDF graphs are geographically or administratively distributed over sites, and the RDF repository partitioning strategy is not controlled by the distributed RDF system itself. Therefore, partitioning-tolerant SPARQL processing is desirable. In this study, for partitioning-tolerant SPARQL processing on distributed RDF graphs, we evaluate SPARQL queries in the “partial evaluation and assembly” framework. We explore the intrinsic structural characteristics of partial matches to filter out some irrelevant partial results while providing performance guarantees. We also propose an efficient assembly algorithm that uses the characteristics of partial matches to merge them and form the final results. To further improve the efficiency of finding partial matches, we propose an optimization that communicates variables’ candidates among the sites to avoid redundant computations. In addition, although our approach is partitioning-tolerant, we also evaluate different partitioning strategies for our approach. Experiments over both real and synthetic RDF datasets confirm the superiority of our approach.

Biography

Peng Peng is an assistant professor at the College of Computer Science and Electronic Engineering at Hunan University. He received his Ph.D. from Peking University in 2016 and joined Hunan University after graduation. His research interests are distributed RDF data management and query processing. He has published papers in top conferences and journals, including VLDB Journal, TKDE, and EDBT.