Clojure Data Analysis Cookbook (Packt, 2013) (Size: 3.4 MB)
Chapter 1, Importing Data for Analysis, will cover how to read data from a variety of sources, including CSV files, web pages, and linked semantic web data.
Chapter 2, Cleaning and Validating Data, will present strategies and implementations for normalizing dates, fixing spelling, and working with large datasets. Getting data into a useable shape is an important, but often overlooked, stage of data analysis.
Chapter 3, Managing Complexity with Concurrent Programming, will cover Clojure's
concurrency features and how we can use them to simplify our programs.
Chapter 4, Improving Performance with Parallel Programming, will cover using Clojure's parallel processing capabilities to speed up processing data.
Chapter 5, Distributed Data Processing with Cascalog, will cover using Cascalog as a wrapper over Hadoop and the Cascading library to process large amounts of data distributed over multiple computers. The final recipe in this chapter will use Pallet to run a simple analysis on Amazon's EC2 service.
Chapter 6, Working with Incanter Datasets, will cover the basics of working with Incanter datasets. Datasets are the core data structure used by Incanter, and understanding them is necessary to use Incanter effectively.
Chapter 7, Preparing for and Performing Statistical Data Analysis with Incanter, will cover a variety of statistical processes and tests used in data analysis. Some of these are quite simple, such as generating summary statistics. Others are more complex, such as performing linear regressions and auditing data with Benford's Law.
Chapter 8, Working with Mathematica and R, will talk about setting up Clojure to talk to Mathematica or R. These are powerful data analysis systems, and sometimes we might want to use them. This chapter will show us how to get these systems to work together, as well as some tasks we can do once they are communicating.
Chapter 9, Clustering, Classifying, and Working with Weka, will cover more advanced machine learning techniques. In this chapter, we'll primarily use the Weka machine learning library, and some recipes will discuss how to use it and the data structures its built on, while other recipes will demonstrate machine learning algorithms.
Chapter 10, Graphing in Incanter, will show how to generate graphs and other visualizations in Incanter. These can be important for exploring and learning about your data and also for publishing and presenting your results.
Chapter 11, Creating Charts for the Web, will show how to set up a simple web application to present findings from data analysis. It will include a number of recipes that leverage the powerful D3 visualization library.