[3] Data Analysis -- 10 -- Scientific Computing -- Pandas -- 3 -- DataFrame Indexing

  • How do you change Series and DataFrame objects?
  • Adding or reordering: reindexing
  • Deleting: drop

Getting the row names:

list(dataframe.index)

Getting the column names:

column_names_1 = list(data_info.columns.values)
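
As a small sketch of both lookups on a toy frame (the name frame and its data are made up for illustration), note that .tolist() on the Index gives the same plain list as wrapping it in list():

```python
import pandas as pd

# Toy frame; the name and the data are illustrative only.
frame = pd.DataFrame({'one': [1, 2], 'two': [9, 8]}, index=['a', 'b'])

row_names = list(frame.index)          # row labels as a plain Python list
column_names = frame.columns.tolist()  # column labels as a plain Python list

print(row_names)
print(column_names)
```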

1. reindex()

.reindex() changes or reorders the index of a Series or DataFrame. It returns a new object and leaves the original untouched.

import pandas as pd
import numpy as np


dt = {'one': [1, 2, 3, 4], 'two': [9, 8, 7, 6]}
d = pd.DataFrame(dt, index=['a', 'b', 'c', 'd'])
print(d)
   one  two
a    1    9
b    2    8
c    3    7
d    4    6

d = d.reindex(columns=['two', 'one'])
print(d)
   two  one
a    9    1
b    8    2
c    7    3
d    6    4

d = d.reindex(index=['d', 'a', 'c', 'b'])
print(d)
   two  one
d    6    4
a    9    1
c    7    3
b    8    2

Parameters of .reindex(index=None, columns=None, …)

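Parameter        Description
index, columns   New custom labels for the rows and columns
fill_value       Value used to fill the missing positions created by reindexing
method           Fill method: ffill carries the previous value forward, bfill fills backward from the next value
limit            Maximum number of consecutive positions to fill
copy             Defaults to True, which creates a new object; with False, no copy is made when the new and old indexes are equal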
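
To make fill_value, method and limit concrete, here is a minimal sketch on a made-up Series (the data and variable names are illustrative, not from the original examples). Note that method-based filling needs a monotonic index:

```python
import pandas as pd

# Hypothetical data: values exist only at labels 0 and 3.
s = pd.Series([10, 20], index=[0, 3])

# No fill options: labels missing from the original index become NaN.
plain = s.reindex(range(6))

# fill_value: a constant for every newly created position.
filled = s.reindex(range(6), fill_value=0)

# method='ffill' with limit=1: carry the previous value forward,
# but across at most one consecutive missing label.
ffilled = s.reindex(range(6), method='ffill', limit=1)

print(plain)
print(filled)
print(ffilled)
```

With limit=1, labels 2 and 5 stay NaN because each is the second missing label in a row.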

Example:

import pandas as pd
import numpy as np


dt = {'one': [1, 2, 3, 4], 'two': [9, 8, 7, 6]}
d = pd.DataFrame(dt, index=['a', 'b', 'c', 'd'])
print(d)
   one  two
a    1    9
b    2    8
c    3    7
d    4    6


d2 = d.columns.insert(2, 'add')
d3 = d.reindex(columns=d2, fill_value=200)

print(d3)
   one  two  add
a    1    9  200
b    2    8  200
c    3    7  200
d    4    6  200

The index of a Series or DataFrame is an Index object. Index objects are immutable. Commonly used Index methods:

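Method               Description
.append(idx)         Concatenates another Index object, producing a new Index object
.difference(idx)     Computes the set difference, producing a new Index object
.intersection(idx)   Computes the intersection
.union(idx)          Computes the union
.delete(loc)         Deletes the element at position loc
.insert(loc, e)      Inserts element e at position loc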
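
A short sketch of these methods on two made-up Index objects (idx and other are illustrative names). In current pandas the set difference is spelled .difference. Every call returns a new Index, and assigning to an element raises TypeError because the Index is immutable:

```python
import pandas as pd

idx = pd.Index(['a', 'b', 'c'])
other = pd.Index(['b', 'c', 'd'])

appended = idx.append(other)       # plain concatenation, duplicates are kept
diff = idx.difference(other)       # labels in idx that are not in other
inter = idx.intersection(other)    # labels present in both
uni = idx.union(other)             # labels present in either
deleted = idx.delete(0)            # new Index without position 0
inserted = idx.insert(1, 'x')      # new Index with 'x' at position 1

# In-place modification is rejected.
try:
    idx[0] = 'z'
    mutated = True
except TypeError as exc:
    mutated = False
    print('Index is immutable:', exc)
```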

Example:

import pandas as pd
import numpy as np


dt = {'one': [1, 2, 3, 4], 'two': [9, 8, 7, 6]}
d = pd.DataFrame(dt, index=['a', 'b', 'c', 'd'])
print(d)
   one  two
a    1    9
b    2    8
c    3    7
d    4    6

nc = d.columns.delete(1)
ni = d.index.insert(4, 'w')
d3 = d.reindex(index=ni, columns=nc, method='ffill')

print(d3)
   one
a    1
b    2
c    3
d    4
w    4

2. drop()

.drop() removes the specified row or column labels from a Series or DataFrame and returns the result as a new object.

import pandas as pd
import numpy as np

dt = {'one': [1, 2, 3, 4], 'two': [9, 8, 7, 6]}
d = pd.DataFrame(dt, index=['a', 'b', 'c', 'd'])
print(d)
   one  two
a    1    9
b    2    8
c    3    7
d    4    6

e = d.drop('two', axis=1)
print(e)
   one
a    1
b    2
c    3
d    4

f = d.drop('c')
print(f)
   one  two
a    1    9
b    2    8
d    4    6

m = d.drop(['c', 'd'])
print(m)
   one  two
a    1    9
b    2    8
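
As a supplementary sketch (not from the original examples), drop() also accepts index= and columns= keywords in place of axis, and errors='ignore' skips labels that do not exist instead of raising KeyError. The frame below repeats the one used above:

```python
import pandas as pd

d = pd.DataFrame({'one': [1, 2, 3, 4], 'two': [9, 8, 7, 6]},
                 index=['a', 'b', 'c', 'd'])

# Keyword form, equivalent to d.drop('two', axis=1).
no_two = d.drop(columns='two')

# Rows and columns can be dropped in a single call.
trimmed = d.drop(index=['c', 'd'], columns=['two'])

# 'z' is not a row label; errors='ignore' returns the frame unchanged.
safe = d.drop(index=['z'], errors='ignore')

print(no_two)
print(trimmed)
print(safe)
```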

3. Ways to rename DataFrame columns

The data:

>>> import pandas as pd
>>> a = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
>>> a
   A  B  C
0  1  4  7
1  2  5  8
2  3  6  9

Method 1: the brute-force approach

>>> a.columns = ['a', 'b', 'c']
>>> a
   a  b  c
0  1  4  7
1  2  5  8
2  3  6  9

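The drawback is that you must supply all three names; a list of any other length raises a ValueError (length mismatch).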
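
A minimal sketch of that failure, using the same frame a. The variable message is only there to capture the error text:

```python
import pandas as pd

a = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

message = ''
try:
    a.columns = ['a', 'b']   # only two names for three columns
except ValueError as exc:
    message = str(exc)
    print(message)

# The failed assignment leaves the original column names in place.
print(a.columns.tolist())
```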

Method 2: a better approach (starting again from the original a)

>>> a.rename(columns={'A': 'a', 'B': 'b', 'C': 'c'}, inplace=True)
>>> a
   a  b  c
0  1  4  7
1  2  5  8
2  3  6  9

The advantage is that you can rename any number of columns (again starting from the original a):

>>> a.rename(columns={'A': 'a', 'C': 'c'}, inplace=True)
>>> a
   a  B  c
0  1  4  7
1  2  5  8
2  3  6  9

This renames only 'A' and 'C' and leaves 'B' unchanged.
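
Two more rename() behaviours, as a supplementary sketch not taken from the original examples: the columns argument can also be a function applied to every label, and without inplace=True the call returns a new frame. Keys that match no column (the hypothetical 'Z' below) are silently ignored by default:

```python
import pandas as pd

a = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

# A function is applied to every column label.
lowered = a.rename(columns=str.lower)

# 'Z' matches no column and is ignored under the default errors='ignore'.
partial = a.rename(columns={'A': 'a', 'Z': 'z'})

print(lowered.columns.tolist())
print(partial.columns.tolist())
print(a.columns.tolist())   # the original frame is untouched
```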

4. Changing the column order of a pandas DataFrame

Here is my df:

                             Net   Upper   Lower  Mid  Zsore
Answer option                                                
More than once a day          0%   0.22%  -0.12%   2    65 
Once a day                    0%   0.32%  -0.19%   3    45
Several times a week          2%   2.45%   1.10%   4    78
Once a week                   1%   1.63%  -0.40%   6    65

How can the Mid column be moved to the first position?

                      Mid   Upper   Lower  Net  Zsore
Answer option
More than once a day    2   0.22%  -0.12%   0%     65
Once a day              3   0.32%  -0.19%   0%     45
Several times a week    4   2.45%   1.10%   2%     78
Once a week             6   1.63%  -0.40%   1%     65

Method 1: loc

In [27]:
# get a list of columns
cols = list(df)
# move the column to the head of the list using index, pop and insert
cols.insert(0, cols.pop(cols.index('Mid')))
cols
Out[27]:
['Mid', 'Net', 'Upper', 'Lower', 'Zsore']
In [28]:
# use loc to reorder (.ix was removed in pandas 1.0)
df = df.loc[:, cols]
df
Out[28]:
                      Mid Net  Upper   Lower  Zsore
Answer_option                                      
More_than_once_a_day    2  0%  0.22%  -0.12%     65
Once_a_day              3  0%  0.32%  -0.19%     45
Several_times_a_week    4  2%  2.45%   1.10%     78
Once_a_week             6  1%  1.63%  -0.40%     65

Method 2: drop and insert

In [39]:
mid = df['Mid']
df.drop(labels=['Mid'], axis=1, inplace=True)
df.insert(0, 'Mid', mid)
df
Out[39]:
                      Mid Net  Upper   Lower  Zsore
Answer_option                                      
More_than_once_a_day    2  0%  0.22%  -0.12%     65
Once_a_day              3  0%  0.32%  -0.19%     45
Several_times_a_week    4  2%  2.45%   1.10%     78
Once_a_week             6  1%  1.63%  -0.40%     65
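
As a third option, sketched here with a cut-down two-row copy of the frame (rebuilt by hand for illustration), you can build the new column list with a list comprehension and pass it to plain indexing or to reindex(columns=...):

```python
import pandas as pd

# Hand-built two-row stand-in for the df above (illustrative only).
df = pd.DataFrame({'Net': ['0%', '0%'],
                   'Upper': ['0.22%', '0.32%'],
                   'Lower': ['-0.12%', '-0.19%'],
                   'Mid': [2, 3],
                   'Zsore': [65, 45]},
                  index=['More_than_once_a_day', 'Once_a_day'])

# 'Mid' first, then every other column in its existing order.
cols = ['Mid'] + [c for c in df.columns if c != 'Mid']

reordered = df[cols]               # selection by list of labels
same = df.reindex(columns=cols)    # same result via reindex

print(reordered)
```

Both forms return a new DataFrame and leave df as it was.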