【6.1.1】numpy的ndaray与pandas的series和dataframe、list、dict之间互转

data可以为list、series、hash

一、初始化数据

创建serires数据

import numpy as np
import pandas as pd
from pandas import Series,DataFrame
data = [[2000,'Ohino',1.5],
        [2001,'Ohino',1.7],
        [2002,'Nevada',2.4]]
ser = Series(data,index=['one','two','three'])

print 'Ser的结果:\n'
print ser

结果:

Ser的结果:

one       [2000, Ohino, 1.5]
two       [2001, Ohino, 1.7]
three    [2002, Nevada, 2.4]
dtype: object

创建dataframe:

df = DataFrame(data,index=['one','two','three'],columns=['year','state','pop'])

print '\ndataframe的结果:\n'
print df

结果:

dataframe的结果:

       year   state  pop
one    2000   Ohino  1.5
two    2001   Ohino  1.7
three  2002  Nevada  2.4

二、数据转换

2.1 series 转matrix

foo = ser.as_matrix()

print '\n ser to matrix的结果:\n'
print foo

结果:

 ser to matrix的结果:

[list([2000, 'Ohino', 1.5]) 
 list([2001, 'Ohino', 1.7])
 list([2002, 'Nevada', 2.4])]

2.2 Series转frame

虽然Series有一个to_frame()方法,但是当Series的index也需要转变为DataFrame的一列时,这个方法转换会有一点问题。所以,下面我采用将Series对象转换为list对象,然后将list对象转换为DataFrame对象。

这里的month为一个series对象:

type(month)
pandas.core.series.Series

它的index为月份,values为数量,下面将这两列都转换为DataFrame的columns。

import pandas as pd

dict_month = {'month':month.index,'numbers':month.values}
df_month = pd.DataFrame(dict_month)

2.2 dataframe转matrix

foo = df.as_matrix()

print '\n dataframe to matrix的结果:\n'
print foo

结果:

 dataframe to matrix的结果:

[[2000 'Ohino' 1.5]
 [2001 'Ohino' 1.7]
 [2002 'Nevada' 2.4]]

2.3 dataframe转array

foo_2 = np.array(df)

print '\n dataframe to array的结果:\n'
print foo_2

[[2000 'Ohino' 1.5]
 [2001 'Ohino' 1.7]
 [2002 'Nevada' 2.4]]

例子

foo_3 = df.as_matrix(['pop'])
print '\n dataframe to array的结果:\n'
print foo_3

输出结果

 dataframe to array的结果:

[[1.5]
 [1.7]
 [2.4]]

2.4 转成list

import pandas as pd

>>> df = pd.DataFrame({'a':[1,3,5,7,4,5,6,4,7,8,9],
                   'b':[3,5,6,2,4,6,7,8,7,8,9]})


>>> df['a'].values.tolist()
[1, 3, 5, 7, 4, 5, 6, 4, 7, 8, 9]
or you can just use

>>> df['a'].tolist()
[1, 3, 5, 7, 4, 5, 6, 4, 7, 8, 9]
To drop duplicates you can do one of the following:

>>> df['a'].drop_duplicates().values.tolist()
[1, 3, 5, 7, 4, 6, 8, 9]

>>> list(set(df['a'])) # as pointed out by EdChum
[1, 3, 4, 5, 6, 7, 8, 9]

# convert df to list[list] 
>>> df.values.tolist()


# conver series to list
>>> Series.tolist()

3.4 dataframe 转dict

示例代码:

import pandas as pd
df = pd.DataFrame({'col1': [1, 2],
                    'col2': [0.5, 0.75]},
                   index=['row1', 'row2'])

df_2 = df.set_index('col1').to_dict() 
print df_2

print  df.set_index('col1')['col2'].to_dict()

示例结果

{'col2': {1: 0.5, 2: 0.75}}

{1: 0.5, 2: 0.75}

更多例子:

>>> df = pd.DataFrame({'col1': [1, 2],
...                    'col2': [0.5, 0.75]},
...                   index=['row1', 'row2'])
>>> df
      col1  col2
row1     1  0.50
row2     2  0.75
>>> df.to_dict()
{'col1': {'row1': 1, 'row2': 2}, 'col2': {'row1': 0.5, 'row2': 0.75}}
You can specify the return orientation.


>>> df.to_dict('series')
{'col1': row1    1
         row2    2
Name: col1, dtype: int64,
'col2': row1    0.50
        row2    0.75
Name: col2, dtype: float64}


>>> df.to_dict('split')
{'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],
 'data': [[1, 0.5], [2, 0.75]]}

>>> df.to_dict('records')
[{'col1': 1, 'col2': 0.5}, {'col1': 2, 'col2': 0.75}]

>>> df.to_dict('index')
{'row1': {'col1': 1, 'col2': 0.5}, 'row2': {'col1': 2, 'col2': 0.75}}
You can also specify the mapping type.

>>> from collections import OrderedDict, defaultdict
>>> df.to_dict(into=OrderedDict)
OrderedDict([('col1', OrderedDict([('row1', 1), ('row2', 2)])),
             ('col2', OrderedDict([('row1', 0.5), ('row2', 0.75)]))])


If you want a defaultdict, you need to initialize it:

>>> dd = defaultdict(list)
>>> df.to_dict('records', into=dd)
[defaultdict(<class 'list'>, {'col1': 1, 'col2': 0.5}),
 defaultdict(<class 'list'>, {'col1': 2, 'col2': 0.75})]

其他例子:

In [1]: import pandas as pd

In [2]: import numpy as np

In [3]: df = pd.DataFrame({'colA' : list('AABCA'), 'colB' : ['X',np.nan,'Ya','Xb','Xa'],'colC' : [100,50,30,5
   ...: 0,20], 'colD': [90,60,60,80,50]})

In [4]: df
Out[4]:
  colA colB  colC  colD
0    A    X   100    90
1    A  NaN    50    60
2    B   Ya    30    60
3    C   Xb    50    80
4    A   Xa    20    50

In [5]: df.to_dict(orient='dict')
Out[5]:
{'colA': {0: 'A', 1: 'A', 2: 'B', 3: 'C', 4: 'A'},
 'colB': {0: 'X', 1: nan, 2: 'Ya', 3: 'Xb', 4: 'Xa'},
 'colC': {0: 100, 1: 50, 2: 30, 3: 50, 4: 20},
 'colD': {0: 90, 1: 60, 2: 60, 3: 80, 4: 50}}

In [6]: df.to_dict(orient='list')
Out[6]:
{'colA': ['A', 'A', 'B', 'C', 'A'],
 'colB': ['X', nan, 'Ya', 'Xb', 'Xa'],
 'colC': [100, 50, 30, 50, 20],
 'colD': [90, 60, 60, 80, 50]

In [7]: df.to_dict(orient='series')
Out[7]:
{'colA': 0    A
 1    A
 2    B
 3    C
 4    A
 Name: colA, dtype: object, 'colB': 0      X
 1    NaN
 2     Ya
 3     Xb
 4     Xa
 Name: colB, dtype: object, 'colC': 0    100
 1     50
 2     30
 3     50
 4     20
 Name: colC, dtype: int64, 'colD': 0    90
 1    60
 2    60
 3    80
 4    50
 Name: colD, dtype: int64}

In [8]: df.to_dict(orient='split')
Out[8]:
{'columns': ['colA', 'colB', 'colC', 'colD'],
 'data': [['A', 'X', 100, 90],
  ['A', nan, 50, 60],
  ['B', 'Ya', 30, 60],
  ['C', 'Xb', 50, 80],
  ['A', 'Xa', 20, 50]],
 'index': [0, 1, 2, 3, 4]}

In [9]: df.to_dict(orient='records')
Out[9]:
[{'colA': 'A', 'colB': 'X', 'colC': 100, 'colD': 90},
 {'colA': 'A', 'colB': nan, 'colC': 50, 'colD': 60},
 {'colA': 'B', 'colB': 'Ya', 'colC': 30, 'colD': 60},
 {'colA': 'C', 'colB': 'Xb', 'colC': 50, 'colD': 80},
 {'colA': 'A', 'colB': 'Xa', 'colC': 20, 'colD': 50}]

In [10]: df.to_dict(orient='index')
Out[10]:
{0: {'colA': 'A', 'colB': 'X', 'colC': 100, 'colD': 90},
 1: {'colA': 'A', 'colB': nan, 'colC': 50, 'colD': 60},
 2: {'colA': 'B', 'colB': 'Ya', 'colC': 30, 'colD': 60},
 3: {'colA': 'C', 'colB': 'Xb', 'colC': 50, 'colD': 80},
 4: {'colA': 'A', 'colB': 'Xa', 'colC': 20, 'colD': 50}}

参考资料

药企,独角兽,苏州。团队长期招人,感兴趣的都可以发邮件聊聊:tiehan@sina.cn
个人公众号,比较懒,很少更新,可以在上面提问题,如果回复不及时,可发邮件给我: tiehan@sina.cn