Automated Data Collection with R - Simon Munzert, Christian Rubba, Dominic Nyhuis - [PDF][N27]
Description

Automated Data Collection with R: A Practical Guide to Web Scraping and Text Mining
by Simon Munzert, Christian Rubba, Peter Meißner, Dominic Nyhuis
English | PDF | ISBN-10: 111883481X | ISBN-13: 978-1118834817
January 20, 2015 | Wiley
Databases & Big Data, Data Mining

CONTENTS
Intro
Automated Data Collection with R
Contents
Preface
1 Introduction
Part One A Primer on Web and Data Technologies
Part Two A Practical Toolbox for Web Scraping and Text Mining
Part Three A Bag of Case Studies
References
General index
Package index
Function index
EULA

Excerpt:

1.1 Case study: World Heritage Sites in Danger

The United Nations Educational, Scientific and Cultural Organization (UNESCO) is an organization of the United Nations which, among other things, fights for the preservation of the world’s natural and cultural heritage. As of today (November 2013), there are 981 heritage sites, most of which are man-made, like the Pyramids of Giza, but natural phenomena like the Great Barrier Reef are listed as well. Unfortunately, some of the awarded places are threatened by human intervention. Which sites are threatened and where are they located? Are there regions in the world where sites are more endangered than in others? What are the reasons that put a site at risk? These are the questions that we want to examine in this first case study.

What do scientists always do first when they want to get up to speed on a topic? They look it up on Wikipedia! Checking out the page of the world heritage sites, we stumble across a list of currently and previously endangered sites at http://en.wikipedia.org/wiki/List_of_World_Heritage_in_Danger. When you access the link, you find a table listing the current sites. It contains the name, location (city, country, and geographic coordinates), type of danger that is facing the site, the year the site was added to the world heritage list, and the year it was put on the list of endangered sites.
Let us investigate how the sites are distributed around the world. While the table holds information on the places, it is not immediately clear where they are located and whether they are regionally clustered. Rather than trying to eyeball the table, it could be very useful to plot the locations of the places on a map. As humans deal well with visual information, we will try to visualize results whenever possible throughout this book. But how to get the information from the table to a map? This sounds like a difficult task, but with the techniques that we are going to discuss extensively in the next pages, it is in fact not. For now, we simply provide you with a first impression of how to tackle such a task with R. Detailed explanations of the commands in the code snippets are provided later and more systematically throughout the book.
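
Note on the excerpt: the book's own snippets are written in R and are not reproduced in this listing. As a rough, language-neutral illustration of the first step described above (turning an HTML table into coordinate records that a mapping tool could plot), here is a minimal Python sketch using only the standard library. It is not the authors' code. The `SAMPLE_HTML` string, its column names, and the two rows are hypothetical placeholders standing in for a downloaded page, and `TableParser` and `extract_sites` are names made up for this example.

```python
from html.parser import HTMLParser

# Hypothetical stand-in for a downloaded page. The rows are placeholders,
# not real World Heritage data.
SAMPLE_HTML = """
<table>
  <tr><th>Name</th><th>Latitude</th><th>Longitude</th></tr>
  <tr><td>Site A</td><td>10.5</td><td>-20.25</td></tr>
  <tr><td>Site B</td><td>-33.0</td><td>151.2</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collect the text of every table cell, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # completed rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_sites(html):
    """Return one dict per data row, with coordinates converted to floats."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    header, *body = parser.rows  # the first row holds the column names
    sites = []
    for row in body:
        record = dict(zip(header, row))
        sites.append({
            "name": record["Name"],
            "lat": float(record["Latitude"]),
            "lon": float(record["Longitude"]),
        })
    return sites


sites = extract_sites(SAMPLE_HTML)
for site in sites:
    print(site["name"], site["lat"], site["lon"])
```

The resulting list of latitude and longitude pairs is the kind of input a plotting library would need to draw points on a map. The real Wikipedia table is messier than this placeholder (merged location cells, footnote markers), so an actual scraper would need extra cleaning steps.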