R and Data Mining - Examples and Case Studies [PDF] (Size: 9.67 MB)
Description
(¯`·._.·[ R and Data Mining - Examples and Case Studies [PDF] ]·._.·´¯)
English | ISBN: 0123969638 | 2013 | 256 pages | PDF | 9 MB

This book guides R users into data mining and helps data miners who use R in their work. It provides a how-to method using R for data mining applications from academia to industry. It presents an introduction to using R for data mining applications, covering the most popular data mining techniques; provides code examples and data so that readers can easily learn the techniques; and features case studies in real-world applications to help readers apply the techniques in their work and studies.

The R code and data for the book are provided at the RDataMining.com website.

The book helps researchers in the field of data mining, postgraduate students who are interested in data mining, and data miners and analysts from industry. For the many universities that have courses on data mining, this book is an invaluable reference for students studying data mining and its related subjects. In addition, it is a useful resource for anyone involved in industrial training courses on data mining and analytics. The concepts in this book help readers as R becomes increasingly popular for data mining applications.
(¯`·._.·[ From the Author ]·._.·´¯)
Table of Contents:

1 Introduction
  1.1 Data Mining
  1.2 R
  1.3 Datasets
    1.3.1 The Iris Dataset
    1.3.2 The Bodyfat Dataset
2 Data Import and Export
  2.1 Save and Load R Data
  2.2 Import from and Export to .CSV Files
  2.3 Import Data from SAS
  2.4 Import/Export via ODBC
    2.4.1 Read from Databases
    2.4.2 Output to and Input from EXCEL Files
3 Data Exploration
  3.1 Have a Look at Data
  3.2 Explore Individual Variables
  3.3 Explore Multiple Variables
  3.4 More Explorations
  3.5 Save Charts into Files
4 Decision Trees and Random Forest
  4.1 Decision Trees with Package party
  4.2 Decision Trees with Package rpart
  4.3 Random Forest
5 Regression
  5.1 Linear Regression
  5.2 Logistic Regression
  5.3 Generalized Linear Regression
  5.4 Non-linear Regression
6 Clustering
  6.1 The k-Means Clustering
  6.2 The k-Medoids Clustering
  6.3 Hierarchical Clustering
  6.4 Density-based Clustering
7 Outlier Detection
  7.1 Univariate Outlier Detection
  7.2 Outlier Detection with LOF
  7.3 Outlier Detection by Clustering
  7.4 Outlier Detection from Time Series
  7.5 Discussions
8 Time Series Analysis and Mining
  8.1 Time Series Data in R
  8.2 Time Series Decomposition
  8.3 Time Series Forecasting
  8.4 Time Series Clustering
    8.4.1 Dynamic Time Warping
    8.4.2 Synthetic Control Chart Time Series Data
    8.4.3 Hierarchical Clustering with Euclidean Distance
    8.4.4 Hierarchical Clustering with DTW Distance
  8.5 Time Series Classification
    8.5.1 Classification with Original Data
    8.5.2 Classification with Extracted Features
    8.5.3 k-NN Classification
  8.6 Discussions
  8.7 Further Readings
9 Association Rules
  9.1 Basics of Association Rules
  9.2 The Titanic Dataset
  9.3 Association Rule Mining
  9.4 Removing Redundancy
  9.5 Interpreting Rules
  9.6 Visualizing Association Rules
  9.7 Discussions and Further Readings
10 Text Mining
  10.1 Retrieving Text from Twitter
  10.2 Transforming Text
  10.3 Stemming Words
  10.4 Building a Term-Document Matrix
  10.5 Frequent Terms and Associations
  10.6 Word Cloud
  10.7 Clustering Words
  10.8 Clustering Tweets
    10.8.1 Clustering Tweets with the k-means Algorithm
    10.8.2 Clustering Tweets with the k-medoids Algorithm
  10.9 Packages, Further Readings and Discussions
11 Social Network Analysis
  11.1 Network of Terms
  11.2 Network of Tweets
  11.3 Two-Mode Network
  11.4 Discussions and Further Readings
12 Case Study I: Analysis and Forecasting of House Price Indices
  12.1 Importing HPI Data
  12.2 Exploration of HPI Data
  12.3 Trend and Seasonal Components of HPI
  12.4 HPI Forecasting
  12.5 The Estimated Price of a Property
  12.6 Discussion
13 Case Study II: Customer Response Prediction and Profit Optimization
  13.1 Introduction
  13.2 The Data of KDD Cup 1998
  13.3 Data Exploration
  13.4 Training Decision Trees
  13.5 Model Evaluation
  13.6 Selecting the Best Tree
  13.7 Scoring
  13.8 Discussions and Conclusions
14 Case Study III: Predictive Modeling of Big Data with Limited Memory
  14.1 Introduction
  14.2 Methodology
  14.3 Data and Variables
  14.4 Random Forest
  14.5 Memory Issue
  14.6 Train Models on Sample Data
  14.7 Build Models with Selected Variables
  14.8 Scoring
  14.9 Print Rules
    14.9.1 Print Rules in Text
    14.9.2 Print Rules for Scoring with SAS
  14.10 Conclusions and Discussion
15 Online Resources
  15.1 R Reference Cards
  15.2 R
  15.3 Data Mining
  15.4 Data Mining with R
  15.5 Classification/Prediction with R
  15.6 Time Series Analysis with R
  15.7 Association Rule Mining with R
  15.8 Spatial Data Analysis with R
  15.9 Text Mining with R
  15.10 Social Network Analysis with R
  15.11 Data Cleansing and Transformation with R
  15.12 Big Data and Parallel Computing with R

(¯`·._.·[ About the Author ]·._.·´¯)
Dr. Yanchang Zhao is a Senior Data Mining Specialist in the Australian public sector. Before joining the public sector, he was an Australian Postdoctoral Fellow (Industry) at the University of Technology, Sydney from 2007 to 2009. He is the founder of the RDataMining.com website and of an RDataMining Group on LinkedIn. He has extensive experience in R and data mining.
He began his research on data mining in 2001 and has been applying data mining in real-world business applications since 2006. He has over 50 publications on data mining research and applications, including three books. He is a senior member of IEEE, and has been a Program Chair of the Australasian Data Mining Conference (AusDM 2012 & 2013) and a program committee member for more than 50 academic conferences.