Purchasing a mobile phone can be a surprisingly challenging process. With so many models and brands to choose from, finding the phone that fits your usage takes research and an understanding of what each product offers. Fortunately, plenty of product reviews and price comparisons are available to help consumers make the right selection. This practical data is collected and stored for consumers to draw on while making decisions, or for a comprehensive analysis of the product itself. For a consumer, finding the right data and making sense of it takes some sophistication. This is where R programming is useful. With R, a short script can quickly produce the statistics needed for an analysis. Let us look at some of the features and uses of R for handling data.
Different Ways to Handle Data in R
R can read data from:
- Spreadsheets
- Excel sheets
- Databases
- Images
- Text files
- Many other special formats
Get Data Into R
Whether data is local or available on the Web, with R programming you will be able to successfully import data in different formats.
Read Data From Files
Typically, data is available in a file stored on the local system. To read or write this data, all you need to know is the directory in which the file is stored and how it relates to R's current working directory.
Setting Directories
One of the first things to do is set up the working directory.
To identify the current directory (folder), use the command getwd().
On a Linux PC, the output shows the path as follows:
    > getwd()
    [1] "/home/test"
On Windows, the path starts with a drive letter, and R prints it with forward slashes:
    [1] "C:/data/test"
To set the directory in which the data file is saved, use the command setwd("path"), where path is the directory, including any subdirectories, in which the data file is located. For example, if the data is in the file temp.txt and the file is in the folder /home/test/example/, then issue:
    > setwd("/home/test/example/")
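On Windows, write the path with forward slashes (or doubled backslashes), because a single backslash starts an escape sequence inside an R string:
    > setwd("C:/mydata/test")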
It is necessary to know the folder in which the file is saved.
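As a minimal sketch of the steps above, the snippet below switches into a data folder, writes a file there, and then restores the original directory. The folder name example and the file temp.txt are only placeholders, and the folder is created under R's temporary directory so that nothing on your system is touched.

```r
# Remember the current working directory so it can be restored later.
old_dir <- getwd()

# Hypothetical data folder, created under R's temporary directory.
# file.path() joins the parts with "/" which R accepts on every platform.
data_dir <- file.path(tempdir(), "example")
dir.create(data_dir, showWarnings = FALSE)

# Switch to the data folder.
setwd(data_dir)

# A file written with a relative name now lands in data_dir.
writeLines("this is a sample file", "temp.txt")
found <- file.exists(file.path(data_dir, "temp.txt"))
print(found)

# Restore the original working directory.
setwd(old_dir)
```

Saving the result of getwd() before calling setwd() is a small habit that keeps a script from leaving the session in an unexpected folder.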
Reading Text File
Data contained in text files can be read into an R session using the scan command.
Remember to pass the option what="" to scan, which indicates that the input is of character data type.
For this session, I have created the file textsample.txt, which can be read into the R session.
    > fdata <- scan("textsample.txt", what="")
Now fdata holds the data from the .txt file, one word per element.
Let's review the first few entries with the command head(fdata):
    > head(fdata)
    [1] "this"      "is"        "a"         "sample"    "file"      "generated"
To convert the words to lower case, use tolower.
    > fdata <- tolower(fdata)
Each word in the file is stored as a separate element, and some of the words occur more than once.
To count the frequency of each word, use table:
    > ft <- table(fdata)
To view a pie chart of ft, use the command:
    > pie(ft)
In the resulting chart, the words “file” and “the” have the highest frequency.
The maximum frequency of the words in ft can be found directly with the max command.
    > max(ft)
    [1] 4
To see the individual counts, look at the first few entries of ft:
    > head(ft)
    fdata
            a        be        by       can character   command
            1         3         1         2         1         1
A dot chart plots each word against its frequency.
    > dotchart(ft)
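The whole word-count workflow can be sketched end to end as follows. The sample sentences are made up for illustration and are written to a temporary file, since the contents of textsample.txt are not reproduced here. The variable names words, freq and top_words are likewise only illustrative.

```r
# Hypothetical sample text, written to a temporary file.
sample_file <- tempfile(fileext = ".txt")
writeLines(c("This is a sample file", "the file holds the sample"), sample_file)

# what = "" makes scan() read whitespace-separated tokens as character strings.
words <- tolower(scan(sample_file, what = "", quiet = TRUE))

# Count each distinct word and sort with the most frequent first.
freq <- sort(table(words), decreasing = TRUE)
print(freq)

# Names of the word(s) that reach the maximum frequency.
top_words <- names(freq)[freq == max(freq)]
print(top_words)
```

Comparing against max(freq) rather than taking only the first entry returns every word that ties for the top count, which a pie chart can make hard to judge by eye.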
Commands to Read Data From File
Two of the most common data file formats are .csv and .xls, where a csv file holds comma-separated values and xls is the file extension of an Excel file.
Delimited files such as these can be read with the commands read.csv and read.table:
    > read.csv("test.csv", header=TRUE)
    1 Status   Age    V1    V2    V3    V4
    2      P 23646 45190 50333 55166 56271
    3     CC 26174 35535 38227 37911 41184
    4     CC 27723 25691 25712 26144 26398
    5     CC 27193 30949 29693 29754 30772
    6     CC 24370 50542 51966 54341 54273
    7     CC 28359 58591 58803 59435 61292
    8     CC 25136 45801 45389 47197 47126
    > read.table("test.csv", header=TRUE)
      Status   Age    V1    V2    V3    V4
    1      P 23646 45190 50333 55166 56271
    2     CC 26174 35535 38227 37911 41184
    3     CC 27723 25691 25712 26144 26398
    4     CC 27193 30949 29693 29754 30772
    5     CC 24370 50542 51966 54341 54273
    6     CC 28359 58591 58803 59435 61292
    7     CC 25136 45801 45389 47197 47126
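The two commands differ mainly in their defaults. Here is a small sketch, assuming a genuinely comma-separated file. The file is a temporary one created for the example, and its three columns are a cut-down, hypothetical version of the table shown above.

```r
# Hypothetical comma-separated data, written to a temporary file.
csv_file <- tempfile(fileext = ".csv")
writeLines(c("Status,Age,V1",
             "P,23646,45190",
             "CC,26174,35535"), csv_file)

# read.csv() assumes sep = "," and header = TRUE.
via_csv <- read.csv(csv_file)

# read.table() assumes whitespace separators and no header, so state both.
via_table <- read.table(csv_file, header = TRUE, sep = ",")

# Inspect the column types, then compare the two results.
str(via_csv)
print(all.equal(via_csv, via_table))
```

Spelling out header and sep in the read.table call makes it clear which assumptions about the file layout the script depends on.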
Fetch Data Directly From the Web
It is possible to read data directly from the Web. R fetches the data at the given URL straight into memory. The sample data used here is available at http://lib.stat.cmu.edu/datasets/csb/ch3a.dat.
Read the data directly with the read.csv or read.table command.
    > data1 <- read.table("http://lib.stat.cmu.edu/datasets/csb/ch3a.dat")
    > head(data1)
            V1    V2    V3    V4    V5
    1 07/08/91 47.33 52.82 19.58 17.78
    2 07/09/91 42.58 53.25  9.42  6.06
    3 07/10/91 59.55 56.32 19.83 14.81
    4 07/11/91 52.92 50.06 15.08  9.75
    5 07/12/91 55.25 59.50 28.75 27.21
    6 07/13/91 54.75 56.80 27.83 20.84
    > data2 <- read.csv("http://lib.stat.cmu.edu/datasets/csb/ch3a.dat")
    > head(data2)
      X07.08.91....47.33....52.82....19.58....17.78
    1 07/09/91 42.58 53.25 9.42 6.06
    2 07/10/91 59.55 56.32 19.83 14.81
    3 07/11/91 52.92 50.06 15.08 9.75
    4 07/12/91 55.25 59.50 28.75 27.21
    5 07/13/91 54.75 56.80 27.83 20.84
    6 07/14/91 35.33 40.88 11.83 15.65
data1 and data2 hold the same file parsed in two different ways. read.table splits each line on whitespace and assumes no header, giving five columns named V1 to V5. read.csv expects commas and a header row, so it turns the first line into a single column name and reads every remaining line as one column.
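The difference between the two results comes from the default separator and header settings, not from the Web download itself. The sketch below reproduces it offline with two rows copied from the output above, written to a temporary file. The names dat_file, d1, d2 and d3 are only illustrative.

```r
# Two rows in the same whitespace-separated layout as ch3a.dat,
# written to a temporary file so the sketch runs without network access.
dat_file <- tempfile(fileext = ".dat")
writeLines(c("07/08/91 47.33 52.82 19.58 17.78",
             "07/09/91 42.58 53.25  9.42  6.06"), dat_file)

# Whitespace separator, no header: five columns named V1 to V5.
d1 <- read.table(dat_file)

# Comma separator, header = TRUE: the first row becomes the column name
# and a single column remains.
d2 <- read.csv(dat_file)

# Overriding both defaults makes read.csv() parse the file like read.table().
d3 <- read.csv(dat_file, header = FALSE, sep = "")

print(dim(d1))
print(dim(d2))
print(dim(d3))
```

For a whitespace-separated file like this one, read.table is the more natural choice. The override in d3 is shown only to make the role of the defaults visible.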
Reading Spreadsheets
To read spreadsheet data, install and load the gdata package.
    > install.packages("gdata")
    > library(gdata)
With this package loaded, the new command read.xls becomes available.
The data file test.xls can then be read with read.xls("test.xls").
Fill Spreadsheet-Type Data Through the Editor in R
    > x <- edit(as.data.frame(NULL))
Datasets in R
The datasets that ship with R can be listed with data(). To load one of them, pass its name (names are case-sensitive):
    > data(AirPassengers)
To see the description of the data, use the command:
    > help(AirPassengers)
To see the actual data, use the head command:
    > head(AirPassengers)
    [1] 112 118 132 129 121 135
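Beyond head, a few base R calls give a quick picture of what kind of object a built-in dataset is. This is a small sketch using AirPassengers. The particular set of checks is just one reasonable choice.

```r
# AirPassengers ships with R (package datasets), so no import is needed.
data(AirPassengers)

# The object is a monthly time series, not a plain vector.
print(class(AirPassengers))      # "ts"
print(start(AirPassengers))      # first observation: year 1949, period 1
print(frequency(AirPassengers))  # 12 observations per year
print(length(AirPassengers))     # 144 monthly values

# Largest monthly passenger count in the series.
print(max(AirPassengers))
```

Checking class first is useful because functions such as plot and summary behave differently for a time series than for a data frame.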
More about importing data can be found in the R manual.
Here is the GitHub repo link for the code used in this post.
Manjusha Joshi is a freelancer for free open source software in scientific computing. She is a mathematician and a member of the Pune Linux User group.