fanny
Fuzzy Analysis Clustering
Description
Computes a fuzzy clustering of the data into k clusters.
Usage
fanny(x, k, diss = inherits(x, "dist"), memb.exp = 2,
      metric = c("euclidean", "manhattan", "SqEuclidean"),
      stand = FALSE, iniMem.p = NULL, cluster.only = FALSE,
      keep.diss = !diss && !cluster.only && n < 100,
      keep.data = !diss && !cluster.only,
      maxit = 500, tol = 1e-15, trace.lev = 0)
Arguments
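x | data matrix or data frame, or dissimilarity matrix, depending on the value of the diss argument. In case of a matrix or data frame, each row corresponds to an observation, and each column corresponds to a variable. All variables must be numeric. Missing values (NAs) are allowed. In case of a dissimilarity matrix, x is typically the output of daisy or dist. |
k | integer giving the desired number of clusters. It is required that 0 < k < n/2, where n is the number of observations. |
diss | logical flag: if TRUE (default for dist or dissimilarity objects), then x is assumed to be a dissimilarity matrix. If FALSE, then x is treated as a matrix of observations by variables. |
memb.exp | number r strictly larger than 1 specifying the membership exponent used in the fit criterion; see the ‘Details’ below. Default: 2. |
metric | character string specifying the metric to be used for calculating dissimilarities between observations. Options are "euclidean" (default), "manhattan", and "SqEuclidean". Euclidean distances are root sum-of-squares of differences, Manhattan distances are sums of absolute differences, and "SqEuclidean" distances are sums of squared differences. Using "SqEuclidean" is equivalent to (but somewhat slower than) computing so-called fuzzy C-means. If x is already a dissimilarity matrix, this argument is ignored. |
stand | logical; if true, the measurements in x are standardized before calculating the dissimilarities. Measurements are standardized for each variable (column) by subtracting the variable's mean value and dividing by the variable's mean absolute deviation. If x is already a dissimilarity matrix, this argument is ignored. |
iniMem.p | numeric n x k matrix or NULL (the default); can be used to specify a starting membership matrix, i.e., a matrix of non-negative numbers, each row summing to one. |
cluster.only | logical; if true, no silhouette information will be computed and returned, see details. |
keep.diss, keep.data | logicals indicating if the dissimilarities and/or input data x should be kept in the result. Setting these to FALSE can give much smaller results and hence even save memory allocation time. |
maxit, tol | maximal number of iterations and default tolerance for convergence (relative convergence of the fit criterion) for the FANNY algorithm. The defaults maxit = 500 and tol = 1e-15 used to be hardwired inside the algorithm. |
trace.lev | integer specifying a trace level for printing diagnostics during the C-internal algorithm. Default 0 does not print anything; higher values print increasingly more. |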
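The metric and stand options can be mimicked in base R to see which dissimilarities the clustering starts from. The sketch below is an illustration under assumptions, not the internal code of fanny: the helper names standardize_mad() and dissimilarities() are hypothetical. It standardizes each column by subtracting its mean and dividing by its mean absolute deviation, and builds Euclidean, Manhattan and squared Euclidean dissimilarities with stats::dist().

```r
# Sketch only: hypothetical helpers mimicking the 'stand' and 'metric'
# options with base R; not the internal code of cluster::fanny().

# Standardize each column: subtract its mean, divide by its mean absolute deviation
standardize_mad <- function(x) {
  x <- as.matrix(x)
  apply(x, 2, function(col) {
    m <- mean(col, na.rm = TRUE)
    s <- mean(abs(col - m), na.rm = TRUE)
    (col - m) / s
  })
}

# Dissimilarities for the three metric choices
dissimilarities <- function(x, metric = c("euclidean", "manhattan", "SqEuclidean")) {
  metric <- match.arg(metric)
  switch(metric,
         euclidean   = dist(x, method = "euclidean"),   # root sum of squared differences
         manhattan   = dist(x, method = "manhattan"),   # sum of absolute differences
         SqEuclidean = dist(x, method = "euclidean")^2) # sum of squared differences
}

x <- rbind(c(0, 0), c(3, 4))
dissimilarities(x, "euclidean")    # distance 5
dissimilarities(x, "manhattan")    # distance 7
dissimilarities(x, "SqEuclidean")  # distance 25
standardize_mad(x)                 # each column becomes c(-1, 1)
```

Because the squared Euclidean values grow quadratically, large separations dominate the criterion more strongly under "SqEuclidean" than under the other two metrics.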
Details
In a fuzzy clustering, each observation is “spread out” over the various clusters. Denote by u(i,v) the membership of observation i to cluster v.
The memberships are nonnegative, and for a fixed observation i they sum to 1. The particular method fanny stems from chapter 4 of Kaufman and Rousseeuw (1990) (see the references in daisy) and has been extended by Martin Maechler to allow user-specified memb.exp, iniMem.p, maxit, tol, etc.
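The membership constraints (non-negative entries, each row summing to 1) are easy to check and to satisfy in base R. The hypothetical helpers below build a random matrix of the n x k shape that iniMem.p expects, verify the constraints, and derive a hard assignment by taking each observation's largest membership. That last step is an illustrative choice of this sketch, not necessarily how fanny defines its own crisp clustering.

```r
# Sketch only: hypothetical helpers for n x k membership matrices u(i, v).

# Random starting memberships: non-negative entries, each row summing to 1
random_membership <- function(n, k) {
  u <- matrix(runif(n * k), nrow = n, ncol = k)
  u / rowSums(u)  # rowSums(u) is recycled down the columns, i.e. row-wise division
}

# Check the membership constraints up to a numerical tolerance
is_membership <- function(u, tol = 1e-8) {
  all(u >= 0) && all(abs(rowSums(u) - 1) < tol)
}

# Hard assignment: the cluster with the largest membership for each observation
crisp_assignment <- function(u) max.col(u, ties.method = "first")

set.seed(1)
u0 <- random_membership(n = 5, k = 3)
is_membership(u0)   # TRUE
crisp_assignment(matrix(c(0.7, 0.3,
                          0.2, 0.8), nrow = 2, byrow = TRUE))  # 1 2
```

A matrix built this way could be supplied as a starting value, e.g. fanny(x, k, iniMem.p = random_membership(nrow(x), k)).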
Fanny aims to minimize the objective function
$$\sum_{v=1}^{k} \frac{\sum_{i=1}^{n} \sum_{j=1}^{n} u(i,v)^r \, u(j,v)^r \, d(i,j)}{2 \sum_{j=1}^{n} u(j,v)^r}$$
where n is the number of observations, k is the number of clusters, r is the membership exponent memb.exp and d(i,j) is the dissimilarity between observations i and j.
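To make the criterion concrete, the hypothetical helper fanny_objective() below evaluates it directly for a given membership matrix u, dissimilarities d and exponent r. It only computes the value being minimized; it does not implement the FANNY iterations that search for the minimizing memberships. Clusters with zero total membership are skipped to avoid dividing by zero, which is an assumption of this sketch.

```r
# Sketch only: direct evaluation of the objective
#   sum_v  sum_{i,j} u(i,v)^r u(j,v)^r d(i,j) / (2 sum_j u(j,v)^r)
fanny_objective <- function(u, d, r = 2) {
  d <- as.matrix(d)            # accepts a "dist" object or a full n x n matrix
  w_all <- u^r                 # u(i,v)^r for all i and v
  total <- 0
  for (v in seq_len(ncol(u))) {
    w <- w_all[, v]
    if (sum(w) == 0) next      # assumption: ignore clusters with no membership mass
    num <- sum(outer(w, w) * d)  # double sum over all pairs (i, j)
    total <- total + num / (2 * sum(w))
  }
  total
}

d <- dist(rbind(c(0, 0), c(3, 4)))     # two observations, d(1,2) = 5
fanny_objective(diag(2), d)            # crisp, one point per cluster: 0
fanny_objective(matrix(0.5, 2, 2), d)  # completely fuzzy: 1.25
```

The two calls show why the criterion rewards crisp, well-separated clusters: putting the two distant points in separate clusters costs nothing, while spreading both evenly over the two clusters costs 1.25.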
Note that r -> 1 gives increasingly crisper clusterings, whereas r -> Inf leads to complete fuzziness. K&R (1990), p. 191, note that values too close to 1 can lead to slow convergence. Further note that even the default, r = 2, can lead to complete fuzziness, i.e., memberships u(i,v) == 1/k. In that case a warning is signalled and the user is advised to choose a smaller memb.exp (= r).
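Complete fuzziness, where every u(i,v) equals 1/k, can be detected directly from a membership matrix. The hypothetical helpers below check for it and also compute Dunn's partition coefficient F(k), the sum of squared memberships divided by n, which ranges from 1/k (completely fuzzy) to 1 (crisp), together with its normalized version (F(k) - 1/k) / (1 - 1/k) on [0, 1]. This is a sketch for inspecting memberships, not the check fanny performs internally before signalling its warning.

```r
# Sketch only: hypothetical helpers for judging how fuzzy a membership matrix is.

# TRUE if every membership equals 1/k (up to a tolerance)
is_completely_fuzzy <- function(u, tol = 1e-6) {
  all(abs(u - 1 / ncol(u)) < tol)
}

# Dunn's partition coefficient F(k) and its normalized version
dunn_coefficient <- function(u) {
  k <- ncol(u)
  f <- sum(u^2) / nrow(u)                           # between 1/k and 1
  c(F = f, normalized = (f - 1 / k) / (1 - 1 / k))  # between 0 and 1
}

u_fuzzy <- matrix(1 / 3, nrow = 4, ncol = 3)
u_crisp <- diag(3)
is_completely_fuzzy(u_fuzzy)   # TRUE
dunn_coefficient(u_fuzzy)      # F = 1/3, normalized = 0 (up to rounding)
dunn_coefficient(u_crisp)      # F = 1, normalized = 1
```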
Compared to other fuzzy clustering methods, fanny has the following features: (a) it also accepts a dissimilarity matrix; (b) it is less sensitive to departures from the spherical cluster assumption; (c) it provides a novel graphical display, the silhouette plot (see plot.partition).
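The silhouette plot displays, for each observation i of a crisp clustering, the silhouette width s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is the average dissimilarity of i to the other members of its own cluster and b(i) is the smallest average dissimilarity of i to the members of any other cluster. The hypothetical helper silhouette_widths() below computes these values in base R under that standard definition, setting s(i) = 0 for singleton clusters; in practice the cluster package's own silhouette() function is the tool to use.

```r
# Sketch only: silhouette widths for a crisp clustering (needs >= 2 clusters).
silhouette_widths <- function(clustering, d) {
  d <- as.matrix(d)
  stopifnot(length(unique(clustering)) >= 2, nrow(d) == length(clustering))
  s <- numeric(length(clustering))
  for (i in seq_along(clustering)) {
    own <- clustering == clustering[i]
    own[i] <- FALSE                      # exclude i itself from a(i)
    if (!any(own)) next                  # singleton cluster: keep s(i) = 0
    a <- mean(d[i, own])                 # average within-cluster dissimilarity
    others <- setdiff(unique(clustering), clustering[i])
    b <- min(vapply(others, function(cl) mean(d[i, clustering == cl]), numeric(1)))
    s[i] <- (b - a) / max(a, b)
  }
  s
}

# Two tight groups on a line: {0, 1} and {10, 11}
sil <- silhouette_widths(c(1, 1, 2, 2), dist(c(0, 1, 10, 11)))
round(sil, 3)   # 0.905 0.895 0.895 0.905
```

Widths close to 1 indicate observations that sit well inside their cluster; values near 0 or below flag observations lying between clusters.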
Value
an object of class "fanny" representing the clustering. See fanny.object for details.
See Also
agnes for background and references; fanny.object, partition.object, plot.partition, daisy, dist.
Examples
library(cluster)  # fanny() and the ruspini data set are in package 'cluster'

## generate 10+15 objects in two clusters, plus 3 objects lying
## between those clusters.
x <- rbind(cbind(rnorm(10, 0, 0.5), rnorm(10, 0, 0.5)),
           cbind(rnorm(15, 5, 0.5), rnorm(15, 5, 0.5)),
           cbind(rnorm( 3,3.2,0.5), rnorm( 3,3.2,0.5)))
fannyx <- fanny(x, 2)
## Note that observations 26:28 are "fuzzy" (closer to cluster 2):
fannyx
summary(fannyx)
plot(fannyx)

(fan.x.15 <- fanny(x, 2, memb.exp = 1.5)) # 'crisper' for obs. 26:28
(fanny(x, 2, memb.exp = 3))               # more fuzzy in general

data(ruspini)
f4 <- fanny(ruspini, 4)
stopifnot(rle(f4$clustering)$lengths == c(20,23,17,15))
plot(f4, which = 1)
## Plot similar to Figure 6 in Struyf et al (1996)
plot(fanny(ruspini, 5))
Copyright (©) 1999–2012 R Foundation for Statistical Computing.
Licensed under the GNU General Public License.