Normal quantile plot

gshaase
Joined: 01/11/2012

I am trying to reproduce a very common way of plotting data to look for outliers in an otherwise normally distributed data set. It is a standard feature in Minitab, JMP, and other statistical software.

I wish to plot data on a single normal quantile plot, such that the data would lie on a straight line if they are normally distributed. The measured values would be on a linear scale on the X axis, while the Y axis would show the cumulative probability percentile (say, 0.1% to 99.9%) on a NON-linear scale, such that the points fall on a straight line (rather than having an error-function shape).

Any idea how to obtain such a Y-axis?

Also, I wish to plot several data sets on the same quantile plot. qqnorm will not allow me to plot a few sets in different colors/shapes.

I will appreciate any advice,
Gaddi





JWaddell
Joined: 02/07/2012

EDIT: On the y-axis, the 66% should be 84%.
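
A quick way to check tick labels like these, assuming the ticks sit at whole standard deviations from -3 to 3 as in the example, is to ask pnorm() for the cumulative percentage at each tick. This is only a sanity check, not part of the original code:

```r
# Cumulative percentage of a standard normal at z = -3, ..., 3
z <- -3:3
pct <- round(100 * pnorm(z), 2)
names(pct) <- z
print(pct)
```

The value at z = 1 comes out at about 84%, which matches the correction. The other labels in the example (0.15%, 2.5%, 16%, ...) are rounded rule-of-thumb values, so they differ slightly from what pnorm() returns.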

JWaddell
Joined: 02/07/2012

Hi Gaddi,

For changing the axes, use xaxt = "n" or yaxt = "n" in the initial qqnorm call, then construct the axis manually with axis().
I created a quick function, addQQpoints(), for adding further sets of data to an existing plot (it is just a slightly altered copy of the qqnorm() function).
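
As a minimal sketch of the axis step on its own: the percent labels are exact only on the theoretical-quantile axis, which is the x axis in a default qqnorm() plot, so that is the axis relabeled here. The percentile choices, the demo data, and the names pct and qq are illustrative assumptions, not part of the original post.

```r
set.seed(1)                            # reproducible demo data
x1 <- rnorm(100, mean = 10, sd = 2)

# Percentiles to label on the probability axis (illustrative choice)
pct <- c(0.1, 1, 10, 50, 90, 99, 99.9)

# Suppress the default x axis and widen xlim so every tick is visible
qq <- qqnorm(x1, xaxt = "n", pch = 19,
             xlim = qnorm(c(0.001, 0.999)),
             xlab = "Cumulative probability")

# Ticks sit at the normal quantile of each percentile,
# so the spacing between the labels is non-linear
axis(1, at = qnorm(pct / 100), labels = paste0(pct, "%"))
qqline(x1)
```

Because the ticks are placed with qnorm(), normally distributed data still fall on a straight line even though the labels read as percentages.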

Lastly, for the colors, I used a color palette I created that allows for easy semi-transparent colors, so that you can see the data when there are overlaps. This is of course optional.
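
If installing an extra package is not an option, base R can produce similar semi-transparent colors. This is a sketch of an alternative using adjustcolor() from grDevices, not the palette used in this post; the name cols is just for illustration.

```r
# adjustcolor() ships with base R (grDevices); alpha.f scales the opacity
cols <- c(orange = adjustcolor("orange", alpha.f = 0.7),
          blue   = adjustcolor("blue",   alpha.f = 0.7),
          green  = adjustcolor("green",  alpha.f = 0.7))
print(cols)
```

Any of these can be passed as col = cols["orange"] wherever a plotting call takes a color.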

Let me know if this is what you are looking for:

#install.packages("oaColors", repos = "http://repos.openanalytics.eu", type = "source")
library(oaColors)  # provides oaColors(); optional, only used for the colors

# Add a further data set to an existing normal Q-Q plot.
# Adapted from qqnorm(); ylim, main, xlab and ylab are kept from that
# signature but are not used when adding points.
addQQpoints <- function(y, ylim, main = "Normal Q-Q Plot",
                        xlab = "Theoretical Quantiles",
                        ylab = "Sample Quantiles", plot.it = TRUE,
                        datax = FALSE, ...)
{
  # Drop NAs, remembering where they were
  if (has.na <- any(ina <- is.na(y))) {
    yN <- y
    y <- y[!ina]
  }
  if (0 == (n <- length(y)))
    stop("y is empty or has only NAs")
  if (plot.it && missing(ylim))
    ylim <- range(y)
  # Theoretical normal quantiles, matched to the order of the data
  x <- qnorm(ppoints(n))[order(order(y))]
  if (has.na) {
    y <- x
    x <- yN
    x[!ina] <- y
    y <- yN
  }
  if (plot.it) {
    if (datax)
      plot(y, x, ...)    # note: this starts a new plot instead of adding to one
    else
      points(x, y, ...)
  }
  invisible(if (datax) list(x = y, y = x) else list(x = x, y = y))
}

pdf("exampleQQ.pdf", width = 8, height = 8)
x1 <- rnorm(100)
x2 <- rnorm(100)
x3 <- rnorm(100)
qqnorm(x1, pch = 19, col = oaColors("orange", alpha = 0.7), yaxt = "n",
       ylim = c(-3, 3), ylab = "Sample Percentiles")
# Ticks at whole standard deviations, labeled with approximate cumulative percentages
axis(2, at = seq(-3, 3, by = 1),
     labels = c("0.15%", "2.5%", "16%", "50%", "84%", "97.5%", "99.85%"))
qqline(x1)
addQQpoints(x2, pch = 19, col = oaColors("blue", alpha = 0.7))
addQQpoints(x3, pch = 19, col = oaColors("green", alpha = 0.7))
dev.off()

Example graph (http://imgur.com/xanls, apologies about the poor image quality)

gshaase
Joined: 01/11/2012

JWaddell, thank you very much once again for posting a solution and I apologize for taking so long to get back to this topic.

The sequence of R commands that you gave works very well.

However, what I (and many others in the science, engineering, and data-collection communities) require is to have the actual measured values (x1, x2, x3 in your example) presented on a linear scale on the X axis (not the quantiles).

When I add datax=TRUE to the first qqnorm command, and/or in the call or the body of the function addQQpoints, everything goes bad... I played with this for a long time and gave up...

Can you please help?
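
One likely reason datax=TRUE misbehaves is that the datax branch of addQQpoints() calls plot(), which starts a new plot instead of adding points to the existing one. A minimal sketch of one way around it, assuming base R graphics only: set up an empty plot with the measured values on X, label the Y axis at qnorm() positions, and add each data set with points(). The helper probPoints(), the demo data, and the percentile choices are hypothetical and for illustration only.

```r
# Hypothetical helper: add one data set with the data on the x axis
probPoints <- function(x, ...) {
  x <- sort(x[!is.na(x)])
  z <- qnorm(ppoints(length(x)))   # theoretical normal quantiles
  points(x, z, ...)
  invisible(list(x = x, y = z))
}

set.seed(1)                        # reproducible demo data
sets <- list(a = rnorm(100, 10, 2),
             b = rnorm(100, 12, 2),
             c = rnorm(100, 11, 3))
cols <- c("orange", "blue", "darkgreen")
pct  <- c(0.1, 1, 10, 50, 90, 99, 99.9)   # percentiles to label

# Empty plot: linear x axis for the data, no default y axis
plot(NA, xlim = range(unlist(sets)), ylim = qnorm(c(0.001, 0.999)),
     yaxt = "n", xlab = "Measured value",
     ylab = "Cumulative probability", main = "Normal probability plot")

# Non-linear probability axis: ticks at the normal quantiles
axis(2, at = qnorm(pct / 100), labels = paste0(pct, "%"), las = 1)

# Add every data set in its own color
res <- Map(function(x, col) probPoints(x, pch = 19, col = col), sets, cols)

# Reference line for the first set, drawn with the data on the x axis
qqline(sets$a, datax = TRUE, col = cols[1])
```

Each data set that is close to normal should fall near its own straight line, with the slope and position reflecting its standard deviation and mean.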
