Visualisation using R - Commonly used functions

We will discuss some of the commonly used base R graphics functions:

  • plot: Line chart and scatter plot
  • boxplot: Box-and-whisker plot for a continuous variable, or its distribution across groups
  • hist: Histogram

Scatter Plot

We will create some sample data points and then use them for the scatter plot.
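A minimal sketch of such a plot is shown here. The sample values, title and axis labels are made up for illustration; only the argument names (pch, col, main, xlab, ylab) come from the discussion in this section.

```r
# Illustrative sample points (values are made up for this sketch)
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1)

# Scatter plot of y against x
plot(x, y,
     pch  = 20,               # filled dot as the plotting character
     col  = "red",            # color of the plotting character
     main = "Scatter Plot",   # chart title
     xlab = "X values",       # X axis label
     ylab = "Y values")       # Y axis label
```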

The x and y coordinate values are supplied as the vectors x and y.

pch: Plotting character; 20 indicates a filled dot

col: Color of the plotting character; in the example it is red.

main: Chart Title

xlab: X Axis Label

ylab: Y Axis Label

Scatter Plot v1

Scenario: We want to explore the relationship between parent height and child height. A scatter plot is a useful way to visualise the relationship between two continuous variables. The data frame "galton" holds mid-parent and child heights. It is available in the psych package, so we have to install and load the package before we can use the data frame.

Now we can use the plot function to examine the relationship. It may be appropriate to fit a linear relationship between parent and child heights. For a linear relationship we need to estimate an intercept and a slope, and the lm (linear model) function estimates both.

The output of the lm function is passed to the abline function, which draws the fitted straight line representing the relationship between the heights of parent and child.

lty selects the line type and lwd sets the line width.
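The steps above can be sketched as follows. This is an illustration under a few assumptions: the column names parent and child are assumed; depending on the installed version, the galton data may ship with the companion psychTools package rather than psych, so the sketch checks both; and get_galton is a hypothetical helper written for this example that falls back to simulated heights when neither package provides the data.

```r
# Hypothetical helper for this sketch: look for `galton` in psychTools or
# psych, and fall back to simulated heights if neither provides it.
get_galton <- function() {
  for (pkg in c("psychTools", "psych")) {
    if (requireNamespace(pkg, quietly = TRUE)) {
      env <- new.env()
      suppressWarnings(data(list = "galton", package = pkg, envir = env))
      if (exists("galton", envir = env, inherits = FALSE)) {
        return(get("galton", envir = env))
      }
    }
  }
  # Simulated stand-in, NOT the real Galton data (values are made up)
  set.seed(1)
  parent <- rnorm(200, mean = 68, sd = 1.8)
  data.frame(parent = parent,
             child  = 24 + 0.65 * parent + rnorm(200, sd = 2.2))
}

galton <- get_galton()

# Scatter plot of child height against mid-parent height
plot(galton$parent, galton$child,
     pch  = 20,
     col  = "blue",
     main = "Parent vs Child Height",
     xlab = "Mid-parent height",
     ylab = "Child height")

# Estimate intercept and slope of the linear relationship
fit <- lm(child ~ parent, data = galton)
print(coef(fit))

# Draw the fitted line: lty = 2 gives a dashed line, lwd = 2 doubles the width
abline(fit, col = "red", lty = 2, lwd = 2)
```

Passing the fitted model straight to abline avoids copying the intercept and slope by hand.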

Scatter Plot with line

Line Plot or Time Series Plot

For a time series or line plot we can again use the plot function. Within the plot function, the type parameter controls how the points are drawn and connected.
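To see what the type parameter does, the same data can be drawn with a few of its common values. The monthly values in this sketch are made up purely for illustration.

```r
# Made-up monthly values, purely for illustration
month <- 1:12
value <- c(3, 4, 7, 10, 14, 17, 19, 18, 15, 11, 6, 4)

# Draw the same data with four different `type` settings in a 2 x 2 grid
old_par <- par(mfrow = c(2, 2))
plot(month, value, type = "p", main = 'type = "p" (points only)')
plot(month, value, type = "l", main = 'type = "l" (lines only)')
plot(month, value, type = "b", main = 'type = "b" (points joined by lines)')
plot(month, value, type = "o", main = 'type = "o" (lines drawn over points)')
par(old_par)  # restore the previous layout
```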

Scenario: We have temperature data for a place across years, and we want to see the pattern of temperature across months and years.

The data is available in the "datasets" package, which ships with R and is attached by default, so no separate installation is needed.
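A minimal sketch of a time series plot follows. The data set is not named here, so this example assumes nottem from the datasets package (average monthly air temperatures at Nottingham, 1920 to 1939, in degrees Fahrenheit). Note that nottem is a ts object rather than a data frame. The colors and labels are also illustrative choices.

```r
# nottem is an assumed choice of data set for this sketch
data(nottem)

# Line plot of the monthly series; type = "l" connects the points with lines
plot(nottem,
     type = "l",
     col  = "blue",
     lwd  = 2,
     main = "Monthly Temperature at Nottingham",
     xlab = "Year",
     ylab = "Temperature (degrees F)")
```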

Time Series Plot

Histogram

When we want to see the distribution of a continuous variable, we create equal-width bins and count the number of observations in each bin. When we plot the count or proportion for each bin, we get the histogram.

Scenario: We have a list of customers and their ages. We want to see the distribution of age instead of just looking at summary statistics. A histogram of the age variable shows this distribution, and R has the hist function to draw it.
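A minimal sketch of this step is shown here. The sample size of 1000 and the seed are assumptions made for this example; the mean of 55 and standard deviation of 15 follow the description in this section.

```r
# Fix the seed so the random values are reproducible (seed is arbitrary)
set.seed(123)

# Simulated customer ages: normally distributed, mean 55, standard deviation 15
# (the sample size of 1000 is an assumption for this sketch)
Age <- rnorm(1000, mean = 55, sd = 15)

# Basic histogram with default settings; hist also returns the bin counts
h <- hist(Age)
```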

In the above R code, we created a series of values for Age (normally distributed random values with mean 55 and standard deviation 15).

Once we had the vector Age, we used the hist function to get the histogram plot.

We can use some additional arguments of the hist function to make the histogram look better and improve its readability.
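A sketch of a more readable histogram follows. The colors, labels, sample size and seed are illustrative choices for this example.

```r
# Simulated ages (sample size and seed are assumptions for this sketch)
set.seed(123)
Age <- rnorm(1000, mean = 55, sd = 15)

# Histogram with fill color, border color, axis labels and a title
h <- hist(Age,
          col    = "lightblue",   # fill color of the bars
          border = "darkblue",    # color of the bar borders
          xlab   = "Age (years)",
          ylab   = "Number of customers",
          main   = "Distribution of Customer Age")
```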

Histogram of Age

Arguments for the hist function are similar to those of the plot function: col sets the fill color of the bars, border sets the color of the bar borders, xlab and ylab label the X and Y axes, and main gives the chart title.

We can add a density curve by first turning the histogram into a probability histogram, which is done by setting the argument freq to FALSE.
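A sketch of a probability histogram with a density curve on top is shown here. As before, the simulated data, colors and labels are illustrative assumptions.

```r
# Simulated ages (sample size and seed are assumptions for this sketch)
set.seed(123)
Age <- rnorm(1000, mean = 55, sd = 15)

# freq = FALSE puts density on the Y axis instead of counts
hist(Age,
     freq   = FALSE,
     col    = "lightblue",
     border = "darkblue",
     xlab   = "Age (years)",
     main   = "Distribution of Customer Age")

# Estimate the density and draw it over the existing histogram
d <- density(Age)
lines(d, col = "red", lwd = 2)
```

With freq = FALSE the bar areas sum to one, so the histogram and the density curve share the same vertical scale.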

Histogram with density curve

The density function estimates the density values, and the lines function draws the resulting curve on the existing plot.

