Automated Data Collection with R - Simon Munzert, Christian Rubba, Peter Meißner, Dominic Nyhuis

Description

Automated Data Collection with R: A Practical Guide to Web Scraping and Text Mining by Simon Munzert, Christian Rubba, Peter Meißner, Dominic Nyhuis

English | PDF | ISBN-10: 111883481X | ISBN-13: 978-1118834817

January 20, 2015 | Wiley

Databases & Big Data, Data Mining

CONTENTS

Preface
1 Introduction
Part One: A Primer on Web and Data Technologies
Part Two: A Practical Toolbox for Web Scraping and Text Mining
Part Three: A Bag of Case Studies
References
General index
Package index
Function index

Excerpt:
1.1 Case study: World Heritage Sites in Danger

The United Nations Educational, Scientific and Cultural Organization (UNESCO) is an organization of the United Nations which, among other things, fights for the preservation of the world’s natural and cultural heritage. At the time of writing (November 2013), there are 981 heritage sites. Most of them are man-made, like the Pyramids of Giza, but natural phenomena like the Great Barrier Reef are listed as well. Unfortunately, some of the designated places are threatened by human intervention. Which sites are threatened, and where are they located? Are there regions in the world where sites are more endangered than in others? What are the reasons that put a site at risk? These are the questions we want to examine in this first case study.

What do scientists always do first when they want to get up to speed on a topic? They look it up on Wikipedia! Checking out the page on World Heritage Sites, we stumble across a list of currently and previously endangered sites at http://en.wikipedia.org/wiki/List_of_World_Heritage_in_Danger. Following the link, you find a table of the sites currently listed. It contains the name, the location (city, country, and geographic coordinates), the type of danger the site faces, the year the site was added to the World Heritage list, and the year it was put on the list of endangered sites. Let us investigate how the sites are distributed around the world.

While the table holds information on the places, it is not immediately clear where they are located and whether they are regionally clustered. Rather than trying to eyeball the table, it is much more useful to plot the locations on a map. As humans deal well with visual information, we will try to visualize results whenever possible throughout this book. But how do we get the information from the table onto a map? This sounds like a difficult task, but with the techniques we discuss extensively in the following pages, it is in fact not. For now, we simply give you a first impression of how to tackle such a task with R. Detailed explanations of the commands in the code snippets follow later, more systematically, throughout the book.
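The book carries out this step in R. Purely as a language-neutral illustration, and not the authors' code, here is a minimal Python sketch of the core idea: turn the rows of an HTML table into records and pull out coordinate pairs that a mapping library could plot. It makes several simplifying assumptions. The table's first row is a header. Latitude and longitude already sit in separate decimal columns, whereas the real Wikipedia table stores coordinates in a more complex format. The column names `Name`, `Lat`, and `Lon` and the two sample rows are hypothetical placeholders, not data from the Wikipedia list.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Only keep text that appears inside a cell.
        if self._cell is not None:
            self._cell.append(data)


def table_to_records(html):
    """Map each body row to a dict keyed by the header row (assumed to be row 1)."""
    parser = TableParser()
    parser.feed(html)
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Hypothetical sample table: names and coordinates are placeholders, not real sites.
SAMPLE = """
<table>
  <tr><th>Name</th><th>Lat</th><th>Lon</th></tr>
  <tr><td>Site A</td><td>12.5</td><td>-3.25</td></tr>
  <tr><td>Site B</td><td>-8.0</td><td>40.75</td></tr>
</table>
"""

records = table_to_records(SAMPLE)
# (name, latitude, longitude) triples, ready to hand to a plotting library.
points = [(r["Name"], float(r["Lat"]), float(r["Lon"])) for r in records]
for name, lat, lon in points:
    print(f"{name}: lat={lat}, lon={lon}")
```

Keeping parsing (`table_to_records`) separate from the coordinate conversion means a real page, whose coordinate cells need extra cleaning, only requires changing the last step.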
