This is a python cheat sheet for Data Manipulation using pandas.
Check types of columns
df.info()
nrow and ncol
df.shape
check the data of head and tail
df.head(20)
df.tail(20)
check summary statistic
df.describe()
check columns distribution
df.hist()
df[rowid list]
df[rowMask list]
df[colname list]
df['colname1']
df.colname1
df.loc[rowslicing, colname list]
df.iloc[rowid list, rowid list],
df.append(pd.DataFrame(dict)
ignore_index=True)
df[colname list] = df2
df['colname1'] = series
df.colname1 = series
df.loc[rowslicing, colname list] = data
df.iloc[rowid list, rowid list] = data
Delete rows:
df.drop(index)
df.drop(labels, axis=0)
df.dropna(axis=0)
df.drop_duplicates()
Delete columns:
df.drop(columns)
df.drop(labels, axis=1)
df.dropna(axis=1)
Delete level:
df.drop(level)
Type of join | Pandas | SQL | Venn diagram |
---|---|---|---|
Inner join | pd.merge(df1, df2, on='key') |
SELECT * FROM table1 INNER JOIN table2 ON table1.key = table2.key; |
|
Left join | pd.merge(df1, df2, on='key',how='left') |
SELECT * FROM table1 LEFT JOIN table2 ON table1.key = table2.key; |
|
Right join | pd.merge(df1, df2, on='key',how='right') |
SELECT * FROM table1 RIGHT JOIN table2 ON table1.key = table2.key; |
|
Full outer join | pd.merge(df1, df2, on='key',how='outer') |
SELECT * FROM table1 FULL JOIN table2 ON table1.key = table2.key; |
Pivot: long-to-wide, unstack row axis into measured columns
df.pivot(index='colname1', columns='colname2'],values='values')
df.pivot(index=['colname1','colname2'], columns=['colname3', 'colname4'], values='values')
pivot_table: Pivot/long-to-wide + aggfun,
pivot_table()
Unpivot: wide-to-long, Stack the measured columns into row axis
pd.melt(df, id_vars=['colname1'], value_vars=['colname2'])
pd.melt(df, id_vars=['colname1'], value_vars=['colname2', 'colname3'], ignore_index=False)
pd.wide_to_long()
stack: Stack the prescribed level from columns to hierarchical index
df.stack()
unstack: unstack a level of the hierarchical index to columns
df_stacked.unstack()
Function | Object | Location of effect | Example |
---|---|---|---|
map() |
Series | - element-wise | - mapping values from one domain to another: df['A'].map({1:'a', 2:'b', 3:'c'}) |
applymap() |
DataFrame | - element-wise | - mapping values from one domain to another: df.applymap(lambda x: x**2) |
apply() |
DataFrame | - row / column, - group-wise |
- calculate over an axis: df.apply(np.sum) - calculate by group, and return the grouped shape df.groupby('label).apply(lambda x: x.mean()) |
aggregate() |
DataFrame | - row / column, - group-wise |
- calculate over an axis: df.agg(['mean']) - calculate by group: df.groupby('label').agg(['mean']) |
transform() |
DataFrame | - element-wise, - group-wise |
- return the input shapedf.transform(lambda x: x.mean()) - calculate by group, and return the input shape df.groupby('label).transform(lambda x: x.mean()) |
!python3 -m jupyter nbconvert jupyter_PandasDataManipulation.ipynb --to html
[NbConvertApp] Converting notebook jupyter_PandasDataManipulation.ipynb to html [NbConvertApp] Writing 582031 bytes to jupyter_PandasDataManipulation.html