6.2 Using “R” to Analyze Data
6.2.1 What is “R”
You collected a data set using a GCDC logger and realized, “Wow, that's a lot
of data! Now what?” Data analysis is tedious, and the process is particular to
each user's application. Don't expect to find a magic software solution that
will reduce your data to a perfect answer. However, don't despair.
Several options are available that, combined with a little user effort,
provide powerful and versatile analysis capabilities.
Spreadsheets, such as Microsoft Excel or LibreOffice Calc, are great choices for plotting moderately
sized data sets. The user interfaces are highly polished and customized plotting is easy to handle.
However, most spreadsheets can handle only about 100,000 lines of data before performance begins to
slow. Furthermore, scripting complex analysis procedures in a spreadsheet is cumbersome. We
recommend trying “R” because it is more powerful than a spreadsheet and it is easy to learn.
“R” is a high-level programming language used most commonly for statistical analysis of data. R is an
open-source project based on the “S” language, which was developed at Bell Laboratories in the
1970s. R provides a simple workspace environment that can manipulate large data sets using simple
math commands and complex function libraries. R is widely used by statisticians and data miners and
the language is well supported by the open-source community. The software is compact, free, and
available for Windows, Mac, and Linux (visit
).
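As a rough illustration of "simple math commands" applied to a large data set, the sketch below builds a synthetic three-axis acceleration record and computes its magnitude. The variable names (t, ax, ay, az, mag), the sample spacing, and the values themselves are assumptions made up for this example; they are not read from a GCDC data file.

```r
# Synthetic stand-in for logger data; every value here is made up for illustration.
set.seed(1)
n  <- 200000                               # more rows than a spreadsheet handles comfortably
t  <- seq(0, by = 0.01, length.out = n)    # hypothetical time stamps, in seconds
ax <- rnorm(n, mean = 0, sd = 0.05)        # hypothetical X-axis acceleration, in g
ay <- rnorm(n, mean = 0, sd = 0.05)        # hypothetical Y-axis acceleration, in g
az <- rnorm(n, mean = 1, sd = 0.05)        # hypothetical Z-axis acceleration, in g

# Simple math commands operate on whole vectors at once; no loop is needed.
mag <- sqrt(ax^2 + ay^2 + az^2)

# Built-in library functions summarize the result.
mean(mag)
sd(mag)
summary(mag)
```

Typing these lines at the R prompt leaves t, ax, ay, az, and mag in the workspace, where they can be inspected or reused in later commands.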
Matlab is another common software application for analyzing data, but it is usually limited to
universities or businesses with copious budgets (it's expensive software!). Octave is a free, open-source
adaptation of Matlab with nearly the same capabilities. However, Octave is a significantly larger
download and a more complicated installation than R. We favor R because it's small, easy to learn, and
free.
Gulf Coast Data Concepts
X16-5, Rev New
Figure 24: R Command Line Interface