Big data in biodiversity research

The public availability of species distribution information has increased drastically in the last 10 years. Public aggregators such as GBIF, IUCN, iDigBio or BIEN provide more than 800 million occurrence records across all taxonomic groups. This tremendous development has brought ‘big data’ into biogeography and enables an entirely new perspective at macroscopic spatial scales. However, the systematic use of these data has been hampered by data-quality issues and a lack of bioinformatics tools to process such large amounts of data. One of my major interests is to develop tools that improve data quality in large-scale species distribution databases, and to apply these data to biogeographic questions. I am particularly interested in doing so in the context of a similar revolution in data availability that is currently under way for genetic data. Recently, I have developed tools for the automated cleaning of geographic data and for preparing large-scale distribution data for analyses in historical biogeography. Furthermore, I contribute to tools for bioregionalization based on large-scale distribution data and for making biogeographic inferences from fossils.