The Big R-Book - Philippe J. S. De Brouwer - Page 273
8.1.3 The Mode
The mode is the value that has the highest probability of occurring. For a series of observations, this is the value that occurs most often. Note that the mode is also defined for variables that have no order relation: even labels such as “green,” “yellow,” etc. have a mode, but not a mean or median (unless one adds further abstraction or a numerical representation).
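The point about unordered labels can be illustrated with a small sketch. The vector `colours` is made up for this illustration, and counting with `table()` is just one possible way to find the most frequent label:

```r
# Hypothetical data: labels without any order relation
colours <- c("green", "yellow", "green", "red", "green", "yellow")

# Count how often each label occurs
counts <- table(colours)

# The most frequent label is well defined ...
most_frequent <- names(which.max(counts))
most_frequent
## [1] "green"

# ... but a mean is not: R returns NA (and warns) for character data
label_mean <- suppressWarnings(mean(colours))
label_mean
## [1] NA
```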
In R, the functions mode() and storage.mode() return a character string describing how a variable is stored, not its most frequent value. R does not have a standard function to calculate the statistical mode, so let us create our own:
    # my_mode
    # Finds the first mode (only one)
    # Arguments:
    #   v -- numeric vector or factor
    # Returns:
    #   the first mode
    my_mode <- function(v) {
      uniqv <- unique(v)
      tabv  <- tabulate(match(v, uniqv))
      uniqv[which.max(tabv)]
    }

    # now test this function
    x <- c(1,2,3,3,4,5,60,NA)
    my_mode(x)
    ## [1] 3

    x1 <- c("relevant", "N/A", "undesired", "great", "N/A",
            "undesired", "great", "great")
    my_mode(x1)
    ## [1] "great"

    # text from https://www.r-project.org/about.html
    t <- "R is available as Free Software under the terms of the Free Software Foundation's GNU General Public License in source code form. It compiles and runs on a wide variety of UNIX platforms and similar systems (including FreeBSD and Linux), Windows and MacOS."
    v <- unlist(strsplit(t, split = " "))
    my_mode(v)
    ## [1] "and"
While this function works fine on the examples provided, it only returns the first mode encountered. In general, however, the mode is not necessarily unique and it might make sense to return them all. This can be done by modifying the code as follows:
    # my_mode
    # Finds the mode(s) of a vector v
    # Arguments:
    #   v          -- numeric vector or factor
    #   return.all -- boolean -- set to TRUE to return all modes
    # Returns:
    #   the modal elements
    my_mode <- function(v, return.all = FALSE) {
      uniqv <- unique(v)
      tabv  <- tabulate(match(v, uniqv))
      if (return.all) {
        uniqv[tabv == max(tabv)]
      } else {
        uniqv[which.max(tabv)]
      }
    }

    # example:
    x <- c(1,2,2,3,3,4,5)
    my_mode(x)
    ## [1] 2
    my_mode(x, return.all = TRUE)
    ## [1] 2 3
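One detail worth keeping in mind is that my_mode() counts NA like any other value, so a vector with many missing values can report NA as its mode. The following is a minimal sketch of one way to handle this. The name `my_mode_na` and the `na.rm` argument are assumptions introduced for this illustration and do not come from the book:

```r
# my_mode_na (hypothetical variant of my_mode)
# Arguments:
#   v          -- numeric vector, character vector or factor
#   return.all -- set to TRUE to return all modes
#   na.rm      -- set to TRUE to drop missing values before counting
# Returns:
#   the modal element(s); an empty vector if nothing is left to count
my_mode_na <- function(v, return.all = FALSE, na.rm = TRUE) {
  if (na.rm) v <- v[!is.na(v)]
  if (length(v) == 0) return(v)
  uniqv <- unique(v)
  tabv  <- tabulate(match(v, uniqv))
  if (return.all) {
    uniqv[tabv == max(tabv)]
  } else {
    uniqv[which.max(tabv)]
  }
}

# example: NA is the most frequent entry
y <- c(1, 2, 2, NA, NA, NA)
my_mode_na(y)                 # NAs dropped first
## [1] 2
my_mode_na(y, na.rm = FALSE)  # NA counted like any other value
## [1] NA
```

Whether dropping missing values is the right default depends on the data: if "missing" is itself an informative category, counting it may be the better choice.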