【3.3.5】Pandas: DataFrame.dropna

Remove missing values.

DataFrame.dropna(self, axis=0, how='any', thresh=None, subset=None, inplace=False)

1. Parameters

Every parameter has a default value, so all of them are optional.

axis ({0 or 'index', 1 or 'columns'}, default 0; optional)
    Whether rows or columns that contain missing values are removed. 0 or 'index' drops rows that contain missing values; 1 or 'columns' drops columns that contain missing values.

how ({'any', 'all'}, default 'any'; optional)
    Whether a row or column is removed when it has at least one NA value or only when all of its values are NA. 'any' drops the row or column if any NA value is present; 'all' drops it only if every value is NA.

thresh (int, default None; optional)
    Keep only the rows or columns that have at least this many non-NA values.

subset (array-like, default None; optional)
    Labels along the other axis to consider. For example, when dropping rows, this is a list of the columns to check for missing values.

inplace (bool, default False; optional)
    If True, perform the operation in place and return None.
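
As a quick illustration of the defaults described above, the sketch below uses a small made-up frame (the name `demo` and its values are assumptions for illustration, not data from this article). It checks that calling dropna() with no arguments matches axis=0, how='any', and that axis=1 drops columns instead of rows.

```python
import numpy as np
import pandas as pd

# Hypothetical demo data, chosen only to illustrate the defaults.
demo = pd.DataFrame({"a": [1.0, np.nan, 3.0],
                     "b": [np.nan, np.nan, 6.0],
                     "c": [7.0, 8.0, 9.0]})

# With no arguments, dropna() uses axis=0 and how='any'.
default_result = demo.dropna()
explicit_result = demo.dropna(axis=0, how="any")

# axis=1 (or 'columns') drops columns that contain missing values.
column_result = demo.dropna(axis=1)

print(default_result.equals(explicit_result))
print(list(column_result.columns))
print(demo.shape)  # the original frame is not modified
```

Because inplace defaults to False, each call returns a new DataFrame and `demo` keeps all of its rows and columns.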

2. Examples

import numpy as np
import pandas as pd


df = pd.DataFrame({"name": ['Superman', 'Batman', 'Spiderman'],
                   "toy": [np.nan, 'Batmobile', 'Spiderman toy'],
                   "born": [pd.NaT, pd.Timestamp("1956-06-26"),
                            pd.NaT]})
df

Output:

        name            toy       born
0   Superman            NaN        NaT
1     Batman      Batmobile 1956-06-26
2  Spiderman  Spiderman toy        NaT

2.1 Drop the rows where at least one element is missing:

df.dropna()

     name        toy       born
1  Batman  Batmobile 1956-06-26

2.2 Drop the columns where at least one element is missing:

df.dropna(axis='columns')

        name
0   Superman
1     Batman
2  Spiderman

2.3 Drop the rows where all elements are missing:

df.dropna(how='all')

        name            toy       born
0   Superman            NaN        NaT
1     Batman      Batmobile 1956-06-26
2  Spiderman  Spiderman toy        NaT
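
No row in the example frame is entirely empty, so how='all' returns it unchanged. To see a row actually being dropped, here is a minimal sketch with a made-up frame (`demo` and its values are assumptions for illustration) in which one row has no valid values at all.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the row with index 1 is missing in every column.
demo = pd.DataFrame({"name": ["Superman", np.nan, "Batman"],
                     "toy": [np.nan, np.nan, "Batmobile"]})

# how='all' drops only the rows in which every value is NA.
kept = demo.dropna(how="all")

# Row 0 survives because 'name' is present, even though 'toy' is missing.
print(list(kept.index))
```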

2.4 Keep only the rows with at least 2 non-NA values:

df.dropna(thresh=2)

        name            toy       born
1     Batman      Batmobile 1956-06-26
2  Spiderman  Spiderman toy        NaT
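
One way to read thresh is as a filter on the number of non-NA values per row. The sketch below rebuilds the example frame and compares dropna(thresh=2) with an explicit count made with notna(); the variable names `non_na_counts`, `by_mask` and `by_thresh` are introduced here only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"name": ['Superman', 'Batman', 'Spiderman'],
                   "toy": [np.nan, 'Batmobile', 'Spiderman toy'],
                   "born": [pd.NaT, pd.Timestamp("1956-06-26"),
                            pd.NaT]})

# Count the non-NA values in each row.
non_na_counts = df.notna().sum(axis=1)

# Keep rows that have at least 2 non-NA values, using a boolean mask.
by_mask = df[non_na_counts >= 2]

# The same selection expressed with thresh.
by_thresh = df.dropna(thresh=2)

print(non_na_counts.tolist())
print(by_mask.equals(by_thresh))
```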

2.5 Define in which columns to look for missing values:

df.dropna(subset=['name', 'born'])

     name        toy       born
1  Batman  Batmobile 1956-06-26
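
subset can also be combined with how. As a sketch under the same example data, the call below drops a row only when both 'toy' and 'born' are missing; the choice of these two columns is just an illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"name": ['Superman', 'Batman', 'Spiderman'],
                   "toy": [np.nan, 'Batmobile', 'Spiderman toy'],
                   "born": [pd.NaT, pd.Timestamp("1956-06-26"),
                            pd.NaT]})

# Only 'toy' and 'born' are inspected; a row is dropped when both are NA.
result = df.dropna(subset=['toy', 'born'], how='all')

# Superman (row 0) has neither a toy nor a birth date, so it is removed.
print(list(result.index))
```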

2.6 Keep the DataFrame with valid entries in the same variable:

df.dropna(inplace=True)
df

     name        toy       born
1  Batman  Batmobile 1956-06-26
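
As the parameter description says, inplace=True returns None, so the result of that call should not be assigned. The sketch below contrasts the two styles on copies of the example data; the names `original`, `reassigned`, `mutated` and `returned` are introduced here only for illustration.

```python
import numpy as np
import pandas as pd

original = pd.DataFrame({"name": ['Superman', 'Batman', 'Spiderman'],
                         "toy": [np.nan, 'Batmobile', 'Spiderman toy'],
                         "born": [pd.NaT, pd.Timestamp("1956-06-26"),
                                  pd.NaT]})

# Style 1: keep the original and bind the cleaned frame to a new name.
reassigned = original.dropna()

# Style 2: modify a frame in place; the call itself returns None.
mutated = original.copy()
returned = mutated.dropna(inplace=True)

print(returned is None)
print(mutated.equals(reassigned))
print(len(original))  # the original still has all of its rows
```

Writing `df = df.dropna(inplace=True)` would therefore replace `df` with None, which is a common source of confusion.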

About Sam
Focused on bioinformatics and translational medicine