Whether we want to analyze a dataset, build a machine learning model for a business problem, or create a visualization in R, the first step is to bring the data into R or RStudio.
The data may be available in different source formats (e.g. Excel, CSV, text, SQL) and on different systems (e.g. the local desktop, the web, or a data warehouse).
R can read data from many sources and in many formats. In this blog, we will look at some common data formats and how to read each of them into R.
Reading a CSV file
Suppose a comma-separated values (CSV) file is available in a folder and we want to read it into R. First, we need to know the name and path of the file. We also need to know the separator or delimiter (a comma in a CSV file) and whether the first row contains column names.
In this example, the file is in the folder "C:\\Learn R\\Data" and is named "binary.csv".
We can use the R code below to read the file into R.
## Reading data from a comma-separated (csv) file
df_csv <- read.csv(file="C:\\Learn R\\Data\\binary.csv")
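To make the role of the header and separator explicit, here is a minimal, self-contained sketch. It writes a tiny CSV to a temporary file and reads it back, so it does not depend on the folder above. The column names and values (admit, gre, gpa) are made up for illustration and are not taken from binary.csv.

```r
# Write a small CSV to a temporary file so the example runs on any machine.
# The column names and values are hypothetical sample data.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("admit,gre,gpa",
             "0,380,3.61",
             "1,660,3.67"), csv_path)

# header = TRUE: the first row holds the column names.
# sep = ",": values are separated by commas (both are read.csv defaults).
df_demo <- read.csv(file = csv_path, header = TRUE, sep = ",",
                    stringsAsFactors = FALSE)

str(df_demo)   # check the column types R inferred
head(df_demo)  # look at the first few rows
```

Calling str() and head() right after reading is a quick way to confirm that the delimiter and header settings were correct.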
We can also read a CSV file that is available on the web through a link. In this example, the file termCrosssell.csv is hosted on a website, and we pass its URL to read.csv() in place of a local path.
termCrosssell1 <- read.csv(file="http://dni-institute.in/blogs/wp-content/uploads/2016/07/termCrosssell.csv")
Reading a Text file
We can also read data from a text file, but in addition to the file name and path, we need to tell R which separator is used so that it can split the values correctly.
In the example below, the file tab_delimited_data.txt is in the current working directory. If it is in a different folder, we can change the working directory using setwd() or give the full path to the file.
The values are separated by tabs and the first line contains the column names, so we pass header = TRUE and sep = '\t' to the read.table() function.
## Read a tab-delimited file
input_tabdlmtd.df <- read.table(file="tab_delimited_data.txt", header = TRUE, sep = '\t')
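The same idea can be tried without any existing file. The sketch below creates a small tab-delimited file in R's temporary folder and reads it using a full path, which avoids changing the working directory. The file name tab_demo.txt and its contents are hypothetical.

```r
# Build a full path inside R's temporary folder (hypothetical file name).
txt_path <- file.path(tempdir(), "tab_demo.txt")

# Write three tab-separated lines: a header row and two data rows.
writeLines(c("id\tname\tscore",
             "1\tAsha\t82.5",
             "2\tRavi\t77.0"), txt_path)

# header = TRUE uses the first line as column names; sep = "\t" splits on tabs.
tab_demo <- read.table(file = txt_path, header = TRUE, sep = "\t",
                       stringsAsFactors = FALSE)

print(getwd())    # the folder R searches when only a file name is given
print(tab_demo)
```

Giving the full path is often safer in scripts than calling setwd(), because the script then does not depend on where it is run from.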
Reading a SAS dataset
SAS Institute is a leading data science and analytics software company, and SAS is its analytics tool. In some cases, we need to bring SAS data into R for further analysis or model development.
One way to do this is to install and load the "sas7bdat" package and use it to read the dataset into R. Here are the steps.
## Read SAS dataset Data
install.packages("sas7bdat")   # one-time installation
library(sas7bdat)              # load the package
library(help=sas7bdat)         # optional: show the package documentation index
bank_ins <- read.sas7bdat("C:\\Trainings\\R Programming for Data Science\\data\\bank_additional_full.sas7bdat")
Extract Data from Web
The web holds a large amount of data, and depending on the analysis, we may want to extract data from websites. Web scraping is a detailed topic in itself, but here is how we can extract the contents of a table from a web page.
library(RCurl)
library(XML)
## Download the page source, then parse the second HTML table on the page
pages <- getURL("https://en.wikipedia.org/wiki/India%E2%80%93Pakistan_cricket_rivalry")
overall_matches <- readHTMLTable(pages, header=TRUE, which=2, stringsAsFactors=FALSE)
Here, which=2 tells R to read the second table on the page, and header=TRUE tells it to use the first row of that table as the column names.
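Because live pages change over time, the table position on a real site can shift. To see what the which and header arguments do in isolation, here is a small sketch that parses an HTML string held in memory instead of a downloaded page. It assumes the XML package is installed; the two tables and their contents are invented for illustration.

```r
library(XML)

# A hypothetical page containing two tables (invented sample data).
html_text <- paste0(
  "<html><body>",
  "<table><tr><th>Format</th></tr><tr><td>Test</td></tr></table>",
  "<table>",
  "<tr><th>Team</th><th>Wins</th></tr>",
  "<tr><td>A</td><td>10</td></tr>",
  "<tr><td>B</td><td>12</td></tr>",
  "</table>",
  "</body></html>"
)

# Parse the string as HTML content rather than as a file name.
doc <- htmlParse(html_text, asText = TRUE)

# which = 2 picks the second table; header = TRUE uses its first row as names.
second_table <- readHTMLTable(doc, header = TRUE, which = 2,
                              stringsAsFactors = FALSE)
print(second_table)
```

Omitting which returns a list of all tables on the page, which is a convenient way to find the index of the table we actually need.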