Normal quantile plot
- January 11, 2012 at 4:43 pm #348
gshaase
Member
I am trying to reproduce a very common way of plotting data to look for outliers in an otherwise normally distributed data set. It is a common feature in Minitab, JMP, and other statistical software:
I wish to plot data on a single normal quantile plot, such that the data would lie on a straight line if it is normally distributed. The measured values would be on a linear scale on the X axis, while the Y axis would show the cumulative probability percentile (say, 0.1% to 99.9%) on a NON-linear scale, such that the points fall on a straight line (rather than having an error-function shape).
Any idea how to obtain such a Y-axis?
Also, I wish to plot several data sets on the same quantile plot. qqnorm will not allow me to plot a few sets in different colors/shapes.
I will appreciate any advice,
Gaddi
- February 7, 2012 at 10:23 am #369
JWaddle
Member
Hi Gaddi,
For changing the axes, use xaxt = "n" or yaxt = "n" in the initial qqnorm call, then construct the axis manually with axis().
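As a small illustration of the axis() approach, the sketch below suppresses the default theoretical-quantile axis and relabels it with percentiles. The sample data, the seed, and the list of percentiles are made up for the example and are not from the thread.

```r
set.seed(1)                       # arbitrary seed, only for reproducibility
x <- rnorm(100)                   # made-up sample data

# Percentiles to show on the axis (an arbitrary choice for this sketch)
pct <- c(0.1, 1, 10, 25, 50, 75, 90, 99, 99.9)

# xaxt = "n" suppresses the default x axis (theoretical quantiles);
# xlim is widened so that every requested tick falls inside the plot
qqnorm(x, xaxt = "n", xlim = range(qnorm(pct / 100)),
       xlab = "Theoretical percentile")

# Ticks sit at the z-scores; labels show the matching percentiles
axis(1, at = qnorm(pct / 100), labels = paste0(pct, "%"))
qqline(x)
```

Because the tick positions come from qnorm(), the labels stay correct whatever set of percentiles is chosen.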
I created a quick function, addQQpoints(), for plotting additional sets of data on an existing plot (this function is just a slight alteration of the qqnorm() function).
Lastly, for the colors, I used a color palette I created that allows for easy semi-transparent colors, so that you can see the data when there are overlaps. This is of course optional.
Let me know if this is what you are looking for:
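#install.packages("oaColors", repos = "http://repos.openanalytics.eu", type = "source")
library(oaColors)  # provides oaColors(); only needed for the semi-transparent colors

# Add the points of another sample to an existing Q-Q plot.
# Adapted from qqnorm(): same quantile calculation, but drawn with points().
addQQpoints <- function (y, ylim, main = "Normal Q-Q Plot",
                         xlab = "Theoretical Quantiles", ylab = "Sample Quantiles",
                         plot.it = TRUE, datax = FALSE, ...) {
  # Set the NAs aside so they do not affect the quantile calculation
  if (has.na <- any(ina <- is.na(y))) {
    yN <- y
    y <- y[!ina]
  }
  if (0 == (n <- length(y)))
    stop("y is empty or has only NAs")
  if (plot.it && missing(ylim))
    ylim <- range(y)
  # Theoretical normal quantiles, in the original order of the data
  x <- qnorm(ppoints(n))[order(order(y))]
  # Put the NAs back in their original positions
  if (has.na) {
    y <- x
    x <- yN
    x[!ina] <- y
    y <- yN
  }
  # Always add to the current plot; datax = TRUE swaps the axes
  if (plot.it)
    if (datax) points(y, x, ...) else points(x, y, ...)
  invisible(if (datax) list(x = y, y = x) else list(x = x, y = y))
}

pdf("exampleQQ.pdf", width = 8, height = 8)
x1 <- rnorm(100)
x2 <- rnorm(100)
x3 <- rnorm(100)
qqnorm(x1, pch = 19, col = oaColors("orange", alpha = 0.7), yaxt = "n",
       ylim = c(-3, 3), ylab = "Sample Percentiles")
# Approximate percentiles at z = -3, ..., 3 (valid here because the samples are standard normal)
axis(2, at = seq(-3, 3, by = 1),
     labels = c("0.15%", "2.5%", "16%", "50%", "84%", "97.5%", "99.85%"))
qqline(x1)
addQQpoints(x2, pch = 19, col = oaColors("blue", alpha = 0.7))
addQQpoints(x3, pch = 19, col = oaColors("green", alpha = 0.7))
dev.off()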
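If the oaColors package is not available, base R's adjustcolor() (from grDevices) can produce similar semi-transparent colors. This is only a minimal sketch of that substitution, not the palette used in the post; the color names, seed, and sample are made up here.

```r
# Base-R stand-in for the oaColors() calls (assumption: plain named colors are acceptable)
cols <- adjustcolor(c("orange", "blue", "green"), alpha.f = 0.7)

set.seed(2)                       # arbitrary seed, only for reproducibility
x1 <- rnorm(100)                  # made-up sample data

# cols[1] can be used anywhere a col = argument is accepted
qqnorm(x1, pch = 19, col = cols[1])
qqline(x1)
```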
Example graph (http://imgur.com/xanls, apologies about the poor image quality)
- March 21, 2012 at 5:28 pm #373
gshaase
Member
JWaddell, thank you very much once again for posting a solution, and I apologize for taking so long to get back to this topic.
The sequence of R commands that you gave works very well.
However, what I (and many others in the science, engineering, and data-collection communities) require is to have the actual measured values (X1, X2, X3 in your example) be presented on a linear scale on the X axis (not the quantiles).
When I add datax=TRUE to the first qqnorm command, and/or in the call or the body of the function addQQpoints, everything goes bad… I played with this for a long time and gave up…
Can you please help?
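One possible way to get this layout is sketched below: the measured values go on a linear x axis, and the y axis is plotted in z-score units but labeled with cumulative percentiles. It is offered as an untested-in-the-thread sketch under stated assumptions: the three samples, the seed, the colors, and the percentile list are made up, and the reference lines use each sample's mean and standard deviation rather than the quartile-based line that qqline() draws.

```r
set.seed(42)                                  # arbitrary seed, only for reproducibility
samples <- list(x1 = rnorm(100, mean = 10, sd = 2),   # made-up data sets
                x2 = rnorm(100, mean = 12, sd = 1),
                x3 = rnorm(100, mean = 9,  sd = 3))
cols <- c("orange", "blue", "darkgreen")      # arbitrary colors

pct   <- c(0.1, 1, 5, 25, 50, 75, 95, 99, 99.9)  # percentiles to label
z_lim <- qnorm(c(0.001, 0.999))                  # y range, in z-score units

# Empty frame: linear x axis spanning all data, y axis drawn by hand
plot(NA, xlim = range(unlist(samples)), ylim = z_lim, yaxt = "n",
     xlab = "Measured value", ylab = "Cumulative probability",
     main = "Normal probability plot")
axis(2, at = qnorm(pct / 100), labels = paste0(pct, "%"), las = 1)

for (i in seq_along(samples)) {
  y <- sort(samples[[i]])
  z <- qnorm(ppoints(length(y)))   # theoretical z-score of each ordered value
  points(y, z, pch = 19, col = adjustcolor(cols[i], alpha.f = 0.7))
  # Line a perfectly normal sample would follow: z = (y - mean) / sd
  abline(a = -mean(y) / sd(y), b = 1 / sd(y), col = cols[i])
}
legend("topleft", legend = names(samples), pch = 19, col = cols, bty = "n")
```

Drawing every sample with points() on a frame created once avoids the problem of a second plot() call replacing the first, and it keeps the percentile axis independent of any one data set.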