In this post we discuss some commonly used Base R graphics functions:
- plot: line charts and scatter plots
- boxplot: box-and-whisker plot for a continuous variable, or for its distribution across different groups
- hist: histogram
We will create some sample data points and use them for a scatter plot.
```r
# Our first plot
par(mfrow=c(2,2))   # split the plotting window into a 2 x 2 grid of panels
x <- c(1, 2, 3, 4, 5)
y <- c(1, 5, 3, 2, 0)
plot(x, y, pch=20, col="red", main="Scatter Plot", xlab="X Variable", ylab="Y variable")
```
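The x and y vectors supply the coordinates of the points. Because par(mfrow=c(2,2)) splits the window into a 2 x 2 grid, this single plot fills only the top-left panel. The arguments used in plot are:
- pch: plotting character; 20 is a small filled dot
- col: color of the plotting character (red in the example above)
- main: chart title
- xlab: X axis label
- ylab: Y axis label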
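It can help to see which symbol each pch code produces. The sketch below draws the standard numeric codes 0 to 25 on a small grid. The grid layout, the sizes and the yellow bg fill (which applies only to codes 21 to 25) are arbitrary choices for illustration.

```r
# Reference chart of plotting characters (pch codes 0 to 25)
codes <- 0:25

# Place the 26 symbols on two rows of 13
xpos <- (codes %% 13) + 1
ypos <- ifelse(codes < 13, 2, 1)

plot(xpos, ypos, pch = codes, col = "red", bg = "yellow", cex = 2,
     xlim = c(0.5, 13.5), ylim = c(0.5, 2.5),
     axes = FALSE, xlab = "", ylab = "", main = "pch codes 0-25")

# Label each symbol with its code just below it
text(xpos, ypos - 0.3, labels = codes, cex = 0.8)
```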
Scenario: We want to explore the relationship between parent height and child height. A scatter plot is a useful way to visualise the relationship between two continuous variables. The galton data frame contains mid-parent and child heights. It is available in the psych package, so we have to install and load the package to use it.
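```r
install.packages("psych")   # only needed once
library(psych)
library(help = psych)       # optional: opens the package documentation
# Note: newer psych releases may ship galton in the companion psychTools package instead
data(galton, package = "psych")
```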
Now we can use the plot function to examine the relationship. A linear relationship between parent and child heights is a reasonable starting assumption. A straight line is defined by its intercept and slope, and the lm (linear model) function estimates both.
The fitted lm object is passed to the abline function, which draws the estimated line over the scatter plot of parent and child heights.
lty sets the line type and lwd sets the line width.
```r
par(mfrow=c(1,2))

# Add elements to the graph
plot(galton$parent, galton$child, xlab = "Height of Parent", ylab = "Height of Children",
     main = "Relationship between Parent and Children Heights")

# Change the plotting character and color
plot(galton$parent, galton$child, xlab = "Height of Parent", ylab = "Height of Children",
     main = "Relationship between Parent and Children Heights", pch = "a", col = "blue")

# Fit a line between X and Y, i.e. Height of Parent and Children
abline(lm(galton$child ~ galton$parent), col = "green", lwd = 5, lty = 6)
```
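To see exactly what abline receives from lm, it helps to print the fitted intercept and slope. So that it runs without the psych package, the sketch below swaps in the built-in cars data for galton. It also uses the lm(y ~ x, data = ...) form, which fits the same model as the galton$child ~ galton$parent form above.

```r
# Fit a straight line: dist = intercept + slope * speed
fit <- lm(dist ~ speed, data = cars)

# coef() returns a named vector: "(Intercept)" and the slope for "speed"
b <- coef(fit)
print(b)

plot(cars$speed, cars$dist, pch = 20, xlab = "Speed", ylab = "Stopping distance",
     main = "Fitted line from lm()")

# abline(fit) reads the intercept and slope from the fitted model;
# abline(a = ..., b = ...) draws the same line from the numbers explicitly
abline(fit, col = "green", lwd = 3, lty = 2)
abline(a = b[["(Intercept)"]], b = b[["speed"]], col = "black", lwd = 1)
```

For cars, the printed coefficients are an intercept of about -17.58 and a slope of about 3.93, so the two abline calls overlap.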
Line Plot or Time Series Plot
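The plot function also draws line charts and time series. Its type argument controls how the points are drawn and connected. For example, "p" draws points, "l" lines, "b" both, "o" points overplotted on lines, "s" stair steps and "h" vertical histogram-like lines.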
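A quick way to compare these options is to draw the same points once per type value. In the sketch below, the y values are made up purely for illustration.

```r
# Compare how the type argument of plot() draws and joins the same points
x <- 1:10
y <- c(3, 5, 4, 6, 8, 7, 9, 8, 10, 12)
types <- c("p", "l", "b", "o", "s", "h")

op <- par(mfrow = c(2, 3))   # 2 x 3 grid; keep the old settings in op
for (tp in types) {
  plot(x, y, type = tp, pch = 20, col = "blue", main = paste0('type = "', tp, '"'))
}
par(op)                      # restore the previous layout
```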
Scenario: We have monthly temperature data for a place over several years, and we want to see how temperature varies across months and years.
The nottem time series is available in the datasets package. That package ships with base R and is loaded by default, so no installation is needed; data() simply loads the series explicitly.
```r
# ------------------- Time Series Plot or Line Chart -------------------------
# Scenario - How Average Monthly Temperature is changing across years
# nottem: Average Monthly Temperatures at Nottingham, 1920-1939
library(help = "datasets")
data(nottem, package = "datasets")

# Add elements; type = "s" draws stair steps (pch has no effect with this type)
plot(nottem, xlab = "Years", ylab = "Avg Monthly Temp", main = "Temp across years",
     col = "blue", type = "s", pch = 20)
```
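The line chart shows the yearly cycle but makes months hard to compare with each other. One option, sketched here as an illustration rather than as the approach used above, is a box plot of the same series grouped by month; boxplot is one of the functions listed at the start of this post. cycle() returns each observation's position within the year, which for this monthly series is the month number.

```r
data(nottem, package = "datasets")

# cycle() gives the month index (1-12) of each monthly observation
month <- factor(as.integer(cycle(nottem)), levels = 1:12, labels = month.abb)
temp <- as.numeric(nottem)

# One box per month: median, quartiles and whiskers across 1920-1939
boxplot(temp ~ month, col = "lightblue",
        xlab = "Month", ylab = "Avg Monthly Temp",
        main = "Nottingham temperature by month, 1920-1939")
```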
To see the distribution of a continuous variable, we divide its range into equal-width bins and count the number of observations in each bin. Plotting the count (or proportion) for each bin gives the histogram.
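The binning step can be reproduced by hand and checked against hist. The sketch uses nine toy values and unit-width bins, both chosen only for illustration. Passing plot = FALSE makes hist return the counts without drawing anything.

```r
# Toy data: nine observations
x <- c(1, 2, 2, 3, 3, 3, 4, 4, 5)
breaks <- 0:5   # five equal-width bins of width 1

# Manual binning: assign each value to a bin, then count per bin.
# right = TRUE and include.lowest = TRUE mirror hist()'s defaults,
# so the bins are [0,1], (1,2], (2,3], (3,4], (4,5].
bins <- cut(x, breaks = breaks, right = TRUE, include.lowest = TRUE)
manual_counts <- as.vector(table(bins))
manual_counts              # 1 2 3 2 1

# hist() computes the same counts
h <- hist(x, breaks = breaks, plot = FALSE)
h$counts                   # 1 2 3 2 1

# Dividing by the total gives proportions instead of counts
manual_counts / sum(manual_counts)
```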
Scenario: We have a list of customers and their ages. We want to see the distribution of age rather than only its summary statistics. The hist function produces a histogram of the age variable.
```r
# Generate Age data
## Generate a numeric vector for Age
Age <- as.integer(rnorm(10000, mean = 55, sd = 15))

# Plot histogram
hist(Age)
```
In the above R code, we created 10,000 Age values by drawing normally distributed random numbers with mean 55 and standard deviation 15; as.integer then truncates them to whole numbers.
Once we had the Age vector, we used the hist function to plot its histogram.
Additional arguments of the hist function make the histogram look better and easier to read.
```r
# Add elements or beautify the histogram
hist(Age, breaks=30, col="green", border="red", xlab="Age", ylab="Counts", main="Histogram:Age")
```
Most arguments of hist work as they do in plot. col sets the fill color of the bars, border sets the color of the bar outlines, xlab and ylab label the X and Y axes, and main gives the chart title. breaks suggests the number of bins (30 here).
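When breaks is a single number, hist treats it only as a suggestion and rounds the bin edges to "pretty" values, so the number of bars can differ from 30. To control the bins exactly, breaks can instead be a vector of bin boundaries. In the sketch below, the 5-year width is an arbitrary choice, and the seed is an assumption added so the regenerated Age vector is reproducible.

```r
set.seed(123)   # assumed seed, for reproducibility only
Age <- as.integer(rnorm(10000, mean = 55, sd = 15))

# A single number is only a suggestion; hist() may choose a different bin count
h_suggested <- hist(Age, breaks = 30, plot = FALSE)
length(h_suggested$counts)

# Exact 5-year bins whose edges cover the full range of the data
edges <- seq(floor(min(Age) / 5) * 5, ceiling(max(Age) / 5) * 5, by = 5)
hist(Age, breaks = edges, col = "green", border = "red",
     xlab = "Age", ylab = "Counts", main = "Histogram: Age (5-year bins)")
```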
We can overlay a density curve by first drawing the histogram on the density scale with freq = FALSE. On that scale the bar areas sum to 1, which matches the scale of the density curve.
```r
hist(Age, xlim = c(-10, 150), breaks = 30, col = "red", border = "orange",
     xlab = "Age", ylab = "Density", main = "Histogram: Age Density", freq = FALSE)
lines(density(Age, na.rm = TRUE), col = "orange", lwd = 2)  # density computes a kernel density estimate
```
The density function computes a kernel density estimate of Age, and the lines function adds that curve to the existing plot.
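Two points can be checked directly. First, with freq = FALSE the bar areas (height times width) sum to 1. Second, density() accepts an adjust argument that scales its default bandwidth. The seed and the adjust = 2 value in this sketch are assumptions chosen for illustration.

```r
set.seed(123)   # assumed seed, for reproducibility only
Age <- as.integer(rnorm(10000, mean = 55, sd = 15))

# freq = FALSE puts bar heights on the density scale
h <- hist(Age, breaks = 30, freq = FALSE, col = "red", border = "orange",
          xlab = "Age", ylab = "Density", main = "Histogram: Age Density")

# Bar areas sum to 1 (up to floating-point rounding)
sum(h$density * diff(h$breaks))

# Default kernel density estimate, plus a smoother one with twice the bandwidth
lines(density(Age), col = "orange", lwd = 2)
lines(density(Age, adjust = 2), col = "blue", lwd = 2, lty = 2)
```

A larger adjust gives a smoother curve that can hide detail, and a smaller one follows the noise more closely.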
More on Visualisation using R
- Base R Graphic Elements
- Line Charts using Base R Graphic
- Formatted Line Chart for Forecasting Example
- Histogram using Base R Graphic
- Histogram using ggplot2
- Histogram Function: Creating histogram for each numeric variable of data frame
- Cute Column Chart using Base R Visualization
- Cute Column Chart using ggplot2 Visualization
- Clustered Column Chart using Base R Graphics
- Plots used for Moving Average and Weighted Moving Average
- Scatterplot using Base R Graphics
- Bubble Chart using Base R and ggplot2 graphics