【3.5】factor, levels, reorder

> haha<-c("aa","ab","ac",'aa')
> haha
[1] "aa" "ab" "ac" "aa"
> ha<-factor(haha)
> ha
[1] aa ab ac aa
Levels: aa ab ac
> h<-factor(haha,ordered=TRUE)
> h
[1] aa ab ac aa
Levels: aa < ab < ac


status<-c("poor","improved","excellent","poor")
# Default levels are sorted alphabetically: the vector is encoded as (3,2,1,3),
# with the internal mapping 1=excellent, 2=improved, 3=poor.
status<-factor(status,ordered=TRUE)
# Explicit levels give the intended order: 1=poor, 2=improved, 3=excellent.
status<-factor(status,ordered=TRUE,levels=c("poor","improved","excellent"))
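The integer codes behind the two calls above can be checked directly with as.integer(). This is a minimal sketch; the names s_default and s_custom are illustrative and do not appear in the original notes.

```r
status <- c("poor", "improved", "excellent", "poor")

# Default: levels are sorted alphabetically (excellent, improved, poor)
s_default <- factor(status, ordered = TRUE)

# Explicit levels: the order follows the levels argument
s_custom <- factor(status, ordered = TRUE,
                   levels = c("poor", "improved", "excellent"))

as.integer(s_default)  # 3 2 1 3
as.integer(s_custom)   # 1 2 3 1
levels(s_custom)       # "poor" "improved" "excellent"
```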


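levels specifies the possible levels of the factor (by default, the distinct values of the vector x), that is, the set of discrete values; labels gives the names of the levels; exclude lists the values of x to leave out of the levels; ordered is a logical argument specifying whether the levels of the factor are ordered. Recall that x is of mode numeric or character.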

Some examples follow:
> factor(1:3)
[1] 1 2 3
Levels: 1 2 3
> factor(1:3, levels=1:5)
[1] 1 2 3
Levels: 1 2 3 4 5
> factor(1:3, labels=c("A", "B", "C"))
[1] A B C
Levels: A B C
> factor(1:5, exclude=4)
[1] 1    2    3    <NA> 5
Levels: 1 2 3 5

> ff <- factor(c(2, 4), levels=2:5)
> ff
[1] 2 4
Levels: 2 3 4 5
> levels(ff)
[1] "2" "3" "4" "5"


factor(x, levels = sort(unique(x), na.last = TRUE),
labels = levels, exclude = NA, ordered = is.ordered(x))


1. Creating a factor.

> colour <- c('G', 'G', 'R', 'Y', 'G', 'Y', 'Y', 'R', 'Y')
> colour
[1] "G" "G" "R" "Y" "G" "Y" "Y" "R" "Y"
> col <- factor(colour)
> col
[1] G G R Y G Y Y R Y
Levels: G R Y
> col1 <- factor(colour, levels = c('G', 'R', 'Y'), labels = c('Green', 'Red', 'Yellow'))
> col1
[1] Green Green Red Yellow Green Yellow
[7] Yellow Red Yellow
Levels: Green Red Yellow
> col2 <- factor(colour, levels = c('G', 'R', 'Y'), labels = c('1', '2', '3'))
> col2
[1] 1 1 2 3 1 3 3 2 3
Levels: 1 2 3
> col_vec <- as.vector(col2) # convert to a character vector
> col_vec
[1] "1" "1" "2" "3" "1" "3" "3" "2" "3"
> col_num <- as.numeric(col2) # convert to a numeric vector (returns the internal integer codes)
> col_num
[1] 1 1 2 3 1 3 3 2 3
> col3 <- factor(colour, levels = c('G', 'R'))
> col3
[1] G    G    R    <NA> G    <NA> <NA> R
[9] <NA>
Levels: G R
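A caveat on as.numeric() as used above: it returns the internal integer codes, which matched the labels '1', '2', '3' only because those labels happen to equal the codes. A small sketch with a hypothetical factor f (not from the original notes) shows the difference and two ways to recover the actual numbers.

```r
# Hypothetical factor whose labels are numbers stored as text
f <- factor(c("10", "20", "10", "30"))

as.numeric(f)                # integer codes: 1 2 1 3
as.numeric(as.character(f))  # the values themselves: 10 20 10 30
as.numeric(levels(f))[f]     # same result; converts each level only once
```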


2. Creating an ordered factor.

> score <- c('A', 'B', 'A', 'C', 'B')
> score1 <- ordered(score, levels = c('C', 'B', 'A'))
> score1
[1] A B A C B
Levels: C < B < A
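Once the levels are ordered, comparisons and order-based summaries work on the factor. A short sketch reusing the score example; the particular calls chosen here are only an illustration.

```r
score <- c("A", "B", "A", "C", "B")
score1 <- ordered(score, levels = c("C", "B", "A"))

score1 > "C"                # element-wise comparison: TRUE TRUE TRUE FALSE TRUE
as.character(max(score1))   # highest level present: "A"
as.character(sort(score1))  # sorted by level order: "C" "B" "B" "A" "A"
```

For an unordered factor, `<`, `>` and max() are not meaningful; R returns NA with a warning for the comparisons and raises an error for max().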


3. Using cut() to convert ordinary numeric data into a factor or an ordered factor.

exam <- c(98, 97, 52, 88, 85, 75, 97, 92, 77, 74, 70, 63, 97, 71, 98,
65, 79, 74, 58, 59, 60, 63, 87, 82, 95, 75, 79, 96, 50, 88)
exam1 <- cut(exam, breaks = 3) # split into 3 groups (equal-width intervals)
exam2 <- cut(exam, breaks = c(0, 59, 69, 79, 89, 100)) # split at user-defined break points
attr(exam1, 'levels'); attr(exam2, 'levels'); attr(exam2, 'class')
ordered(exam2, labels = c('bad', 'ok', 'average', 'good', 'excellent')) # an ordered factor
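cut() can also assign the labels and the ordering in one call. The sketch below is a variation on the example above, not the original author's code: it assumes left-closed intervals (right = FALSE) with round break points, so the grade boundaries 60/70/80/90 are an assumption.

```r
exam <- c(98, 97, 52, 88, 85, 75, 97, 92, 77, 74, 70, 63, 97, 71, 98,
          65, 79, 74, 58, 59, 60, 63, 87, 82, 95, 75, 79, 96, 50, 88)

# Intervals [0,60), [60,70), [70,80), [80,90), [90,100];
# include.lowest = TRUE closes the last interval so that 100 is included.
grade <- cut(exam, breaks = c(0, 60, 70, 80, 90, 100),
             labels = c("bad", "ok", "average", "good", "excellent"),
             right = FALSE, include.lowest = TRUE,
             ordered_result = TRUE)

table(grade)  # counts per grade: 4 4 9 5 8
```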


4. table() counts how many times each level of a factor occurs.

> table(score1)
score1
C B A
1 2 2
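table() reports counts in level order and also lists levels that never occur. A minimal sketch; the extra level "D" is hypothetical and added only to show the zero count.

```r
# "D" is declared as a level but does not occur in the data
score1 <- ordered(c("A", "B", "A", "C", "B"),
                  levels = c("D", "C", "B", "A"))

counts <- table(score1)
counts                     # D has a count of 0
names(counts)              # "D" "C" "B" "A"
table(droplevels(score1))  # drop unused levels first if zeros are unwanted
```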


> ff <- factor(1:100)
> ff
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
[26] 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
[51] 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75
[76] 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100
100 Levels: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 ... 100
> ff2 <- ff[1]
> ff2
[1] 1
100 Levels: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 ... 100
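As the output above shows, subsetting a factor keeps all 100 levels even though only one value remains. A short sketch of two standard ways to discard the unused levels:

```r
ff <- factor(1:100)
ff2 <- ff[1]

nlevels(ff2)                 # 100: subsetting keeps every level
nlevels(droplevels(ff2))     # 1: unused levels removed afterwards
nlevels(ff[1, drop = TRUE])  # 1: unused levels removed while subsetting
```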



reorder

reorder.default {stats} R Documentation
Reorder Levels of a Factor
reorder is a generic function. The "default" method treats its first argument as a categorical variable, and reorders its levels based on the values of a second variable, usually numeric.

Usage
reorder(x, ...)
## Default S3 method:
reorder(x, X, FUN = mean, ...,
        order = is.ordered(x))

Examples
require(graphics)
bymedian <- with(InsectSprays, reorder(spray, count, median))
boxplot(count ~ bymedian, data = InsectSprays,
        xlab = "Type of spray", ylab = "Insect count",
        main = "InsectSprays data", varwidth = TRUE,
        col = "lightgray")
PS: I still don't fully understand FUN here, but in any case reorder can be used with geom_bar in ggplot; see http://blog.sina.com.cn/s/blog_670445240102v1xw.html
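On the FUN argument: the default method applies FUN to the values of X within each level of x (as tapply() does) and then sorts the levels by the resulting scores in increasing order. A minimal sketch with made-up data; spray and count here are hypothetical vectors, not the InsectSprays columns.

```r
# Hypothetical data: three groups with two observations each
spray <- factor(c("A", "A", "B", "B", "C", "C"))
count <- c(10, 12, 2, 4, 20, 30)

# The per-level score that reorder() computes internally
tapply(count, spray, median)   # A = 11, B = 3, C = 25

# Levels sorted by increasing median
by_median <- reorder(spray, count, FUN = median)
levels(by_median)              # "B" "A" "C"

# Negating the values reverses the order (largest median first)
by_desc <- reorder(spray, -count, FUN = median)
levels(by_desc)                # "C" "A" "B"
```

Plotting functions lay categories out in level order, which is why a reordered factor changes the order of boxes or bars.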