【3.3.3】Pandas--排序(sort_index/sort_values/argsort)

May 06, 2018 pandas 阅读量：次

常用的3种方法:

.sort_index()方法在指定轴上根据索引进行排序，默认升序 .sort_index(axis=0, ascending=True)
.sort_values()方法在指定轴上根据数值进行排序，默认升序 Series.sort_values(axis=0, ascending=True)

一、sort_values

DataFrame.sort_values(by, axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last')

参数说明

axis:{0 or ‘index’, 1 or ‘columns’}, default 0，默认按照索引排序，即纵向排序，如果为1，则是横向排序    
by:str or list of str；如果axis=0，那么by="列名"；如果axis=1，那么by="行名"；  
ascending:布尔型，True则升序，可以是[True,False]，即第一字段升序，第二个降序  
inplace:布尔型，是否用排序后的数据框替换现有的数据框  
kind:排序方法，{‘quicksort’, ‘mergesort’, ‘heapsort’}, default ‘quicksort’。似乎不用太关心  
na_position : {‘first’, ‘last’}, default ‘last’，默认缺失值排在最后面

按行或列对应的值来排序

DataFrame.sort_values(by, axis=0, ascending=True)

by : axis轴上的某个索引或索引列表

NaN统一放到排序末尾

import pandas as pd
import numpy as np 

b = pd.DataFrame(np.arange(20).reshape(4,5),index=['c','a','d','b'])

print b
	0   1   2   3   4
c   0   1   2   3   4
a   5   6   7   8   9
d  10  11  12  13  14
b  15  16  17  18  19


g= b.sort_values(2,ascending=False)
print g
	0   1   2   3   4
b  15  16  17  18  19
d  10  11  12  13  14
a   5   6   7   8   9
c   0   1   2   3   4

h = b.sort_values('a',axis=1,ascending=False)
print h
	4   3   2   1   0
c   4   3   2   1   0
a   9   8   7   6   5
d  14  13  12  11  10
b  19  18  17  16  15

一列增序，一列降序

b,c为列名，inplace=True时，替换原来的数据

df.sort_values(['b', 'c'], ascending=[True, False], inplace=True)

二、sort_index

sort_index(axis=0, level=None, ascending=True, inplace=False, kind='quicksort', na_position='last', sort_remaining=True, by=None)

参数说明

axis：0按照行名排序；1按照列名排序
level：默认None，否则按照给定的level顺序排列---貌似并不是，文档
ascending：默认True升序排列；False降序排列
inplace：默认False，否则排序之后的数据直接替换原来的数据框
kind：默认quicksort，排序的方法
na_position：缺失值默认排在最后{"first","last"}
by：按照那一列数据进行排序，但是by参数貌似不建议使用

按行或列的索引来排序

import pandas as pd
import numpy as np 

b = pd.DataFrame(np.arange(20).reshape(4,5),index=['c','a','d','b'])

print b
	0   1   2   3   4
c   0   1   2   3   4
a   5   6   7   8   9
d  10  11  12  13  14
b  15  16  17  18  19

c = b.sort_index()
print c
	0   1   2   3   4
a   5   6   7   8   9
b  15  16  17  18  19
c   0   1   2   3   4
d  10  11  12  13  14

e =b.sort_index(ascending=False)
print e
	0   1   2   3   4
d  10  11  12  13  14
c   0   1   2   3   4
b  15  16  17  18  19
a   5   6   7   8   9

m = b.sort_index(axis=1,ascending=False)
print m
	4   3   2   1   0
c   4   3   2   1   0
a   9   8   7   6   5
d  14  13  12  11  10
b  19  18  17  16  15

例如；

## 对x1列升序排列，x2列升序。处理x1有相同值的情况  
import pandas as pd  
x = pd.DataFrame({"x1":[1,2,2,3],"x2":[4,3,2,1]})  
x.sort_index(by = ["x1","x2"],ascending = [False,True])  

	x1	x2
3	3	1
2	2	2
1	2	3
0	1	4

三、argsort

使用argsort(来自StackOverflow)选择数据最接近某个值的行

>>> df = pd.DataFrame({
...     'a': [4, 5, 6, 7],
...     'b': [10, 20, 30, 40],
...     'c': [100, 50, -30, -50]
... })

>>> df
     a    b    c
0    4   10  100
1    5   20   50
2    6   30  -30
3    7   40  -50

>>> df.loc[(df.c - 43).abs().argsort()]
     a    b    c
1    5   20   50
0    4   10  100
2    6   30  -30
3    7   40  -50

三、具体案例

3.1 修改列的顺序

这是我的df:

                             Net   Upper   Lower  Mid  Zsore
Answer option                                                
More than once a day          0%   0.22%  -0.12%   2    65 
Once a day                    0%   0.32%  -0.19%   3    45
Several times a week          2%   2.45%   1.10%   4    78
Once a week                   1%   1.63%  -0.40%   6    65

怎样将mid这一列移动到第一列?

    　　　　　　　　　　　　　　　Mid   Upper   Lower  Net  Zsore
Answer option                                                
More than once a day          2   0.22%  -0.12%   0%    65 
Once a day                    3   0.32%  -0.19%   0%    45
Several times a week          4   2.45%   1.10%   2%    78
Once a week                   6   1.63%  -0.40%   1%    65

方法一：

new_col = ['name','sum','H','L']
df_4 = df_3.loc[:,new_col]

方法二：

In [39]:
mid = df['Mid']
df.drop(labels=['Mid'], axis=1,inplace = True)
df.insert(0, 'Mid', mid)
df
Out[39]:
                      Mid Net  Upper   Lower  Zsore
Answer_option                                      
More_than_once_a_day    2  0%  0.22%  -0.12%     65
Once_a_day              3  0%  0.32%  -0.19%     45
Several_times_a_week    4  2%  2.45%   1.10%     78
Once_a_week             6  1%  1.63%  -0.40%     65

3.2 修改行的顺序

示例数据：

>>> df = pd.DataFrame({
...     'col1' : ['A', 'A', 'B', np.nan, 'D', 'C'],
...     'col2' : [2, 1, 9, 8, 7, 4],
...     'col3': [0, 1, 9, 4, 2, 3],
... })
>>> df
    col1 col2 col3
0   A    2    0
1   A    1    1
2   B    9    9
3   NaN  8    4
4   D    7    2
5   C    4    3

Sort by col1

>>> df.sort_values(by=['col1'])
    col1 col2 col3
0   A    2    0
1   A    1    1
2   B    9    9
5   C    4    3
4   D    7    2
3   NaN  8    4

根据多列排序

>>> df.sort_values(by=['col1', 'col2'])
    col1 col2 col3
1   A    1    1
0   A    2    0
2   B    9    9
5   C    4    3
4   D    7    2
3   NaN  8    4

排序方式

>>> df.sort_values(by='col1', ascending=False)
    col1 col2 col3
4   D    7    2
5   C    4    3
2   B    9    9
0   A    2    0
1   A    1    1
3   NaN  8    4

NA放的位置

>>> df.sort_values(by='col1', ascending=False, na_position='first')
    col1 col2 col3
3   NaN  8    4
4   D    7    2
5   C    4    3
2   B    9    9
0   A    2    0
1   A    1    1

注：

na_position : {‘first’, ‘last’}
axis : {0 or ‘index’, 1 or ‘columns’}, default 0 ；列还是行的排序

参考资料

北京理工大学嵩山 www.python123.org

这里是一个广告位，，感兴趣的都可以发邮件聊聊：tiehan@sina.cn

个人公众号，比较懒，很少更新，可以在上面提问题，如果回复不及时，可发邮件给我： tiehan@sina.cn