Here are several code snippets demonstrating efficient data processing techniques using Pandas in Python. These examples cover common tasks such as data cleaning, transformation, and analysis.
🧹 Data Cleaning
- Removing Duplicate Rows:
import pandas as pd
df = pd.DataFrame({'col1': ['A', 'A', 'B', 'C', 'B'],
                   'col2': [1, 1, 2, 3, 2]})
# Drop rows that duplicate an earlier row across all columns (first occurrence is kept)
df_unique = df.drop_duplicates()
print(df_unique)
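By default, drop_duplicates() compares entire rows. When only some columns define a duplicate, the `subset` and `keep` parameters narrow the check, and `duplicated()` returns the underlying boolean mask so you can inspect what would be removed first. A minimal sketch on the same toy DataFrame:

```python
import pandas as pd

df = pd.DataFrame({'col1': ['A', 'A', 'B', 'C', 'B'],
                   'col2': [1, 1, 2, 3, 2]})

# Boolean mask: True for every row that repeats an earlier row
mask = df.duplicated()
print(mask.tolist())  # [False, True, False, False, True]

# Treat rows as duplicates when 'col1' alone matches, keeping the last occurrence
df_last = df.drop_duplicates(subset=['col1'], keep='last')
print(df_last)

# Reset the index for a clean 0..n-1 index after dropping rows
df_clean = df.drop_duplicates().reset_index(drop=True)
print(df_clean)
```

Note that dropping rows leaves gaps in the index; reset_index(drop=True) is only needed if later code relies on consecutive positions.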
- Handling Missing Values (NaN):
import pandas as pd
import numpy as np
df = pd.DataFrame({'col1': [1, 2, np.nan, 4],
                   'col2': [5, np.nan, 7, 8]})
# Fill NaN values with a specific value (e.g., 0)
df_filled = df.fillna(0)
print(df_filled)
# Drop rows with NaN values
df_dropna = df.dropna()
print(df_dropna)
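Before choosing between filling and dropping, it helps to count the gaps in each column. fillna() also accepts a dict so each column gets its own fill value, and dropna(subset=...) drops a row only when the listed columns are missing. The column-mean fill below is just one illustrative strategy, not a general recommendation:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, np.nan, 4],
                   'col2': [5, np.nan, 7, 8]})

# Count missing values in each column
missing_counts = df.isna().sum()
print(missing_counts)

# Fill each column differently: column mean for 'col1', 0 for 'col2'
df_mixed = df.fillna({'col1': df['col1'].mean(), 'col2': 0})
print(df_mixed)

# Drop a row only if 'col1' is missing; NaNs in other columns are kept
df_col1_present = df.dropna(subset=['col1'])
print(df_col1_present)
```

Here mean() skips NaN by default, so the fill value for 'col1' is the mean of the three present values.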
Data Transformation
- Applying Functions to Columns:
import pandas as pd
df = pd.DataFrame({'col1': [1, 2, 3, 4]})
# Using apply with a lambda function (the lambda is called once per element)
df['col2'] = df['col1'].apply(lambda x: x * 2)
print(df)
# Using a defined function
def square(x):
    return x ** 2
df['col3'] = df['col1'].apply(square)
print(df)
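apply() calls a Python function once per element, which is convenient but slow on large columns. For simple arithmetic like this, operating on the whole column at once gives the same result through pandas' vectorized operations. A minimal comparison on the same col1 data; numpy.where is shown as one common vectorized substitute for an if/else lambda, and the 'parity' column is purely illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4]})

# Vectorized equivalents of the apply() calls: no per-element Python call
df['col2'] = df['col1'] * 2
df['col3'] = df['col1'] ** 2

# Check that they match the apply-based versions
assert df['col2'].equals(df['col1'].apply(lambda x: x * 2))
assert df['col3'].equals(df['col1'].apply(lambda x: x ** 2))

# Vectorized if/else instead of apply(lambda x: 'even' if x % 2 == 0 else 'odd')
df['parity'] = np.where(df['col1'] % 2 == 0, 'even', 'odd')
print(df)
```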
- Vectorized String Operations:
import pandas as pd
df = pd.DataFrame({'name': ['Alice', 'Bob', 'Charlie']})
# Convert names to uppercase
df['name_upper'] = df['name'].str.upper()
print(df)
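The .str accessor exposes many more string methods that work over the whole column without an explicit loop. A few common ones, applied to the same names (the new column names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'name': ['Alice', 'Bob', 'Charlie']})

# Length of each string
df['name_len'] = df['name'].str.len()

# Literal substring test; na=False treats missing values as "no match"
df['has_li'] = df['name'].str.contains('li', regex=False, na=False)

# First character, using indexing on the accessor
df['initial'] = df['name'].str[0]

# Replace a literal substring
df['name_fixed'] = df['name'].str.replace('Bob', 'Robert', regex=False)
print(df)
```

Passing regex=False makes the pattern literal, which avoids surprises when it contains characters such as "." or "(".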
Data Analysis
- Grouping and Aggregating Data:
import pandas as pd
df = pd.DataFrame({'category': ['A', 'A', 'B', 'B', 'A'],
                   'value': [10, 12, 15, 18, 20]})
# Group by 'category' and calculate the sum of 'value'
df_grouped = df.groupby('category')['value'].sum()
print(df_grouped)
# Multiple aggregations
df_agg = df.groupby('category')['value'].agg(['sum', 'mean', 'count'])
print(df_agg)
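Two related patterns often come up next. Named aggregation lets you choose the output column names directly, and transform() returns a result aligned with the original rows, which is handy for per-group ratios. A short sketch on the same data; the names total, average, group_total, and share_of_group are illustrative:

```python
import pandas as pd

df = pd.DataFrame({'category': ['A', 'A', 'B', 'B', 'A'],
                   'value': [10, 12, 15, 18, 20]})

# Named aggregation: output_column=(input_column, aggregation)
summary = df.groupby('category').agg(total=('value', 'sum'),
                                     average=('value', 'mean'))
print(summary)

# transform keeps the original row index, so each row gets its group's total
df['group_total'] = df.groupby('category')['value'].transform('sum')
df['share_of_group'] = df['value'] / df['group_total']
print(df)
```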
- Filtering Rows with Boolean Conditions:
import pandas as pd
df = pd.DataFrame({'col1': [1, 2, 3, 4, 5],
                   'col2': [6, 7, 8, 9, 10]})
# Filter rows where 'col1' is greater than 2
df_filtered = df[df['col1'] > 2]
print(df_filtered)
# Multiple conditions: combine with & (and) or | (or), wrapping each condition in parentheses
df_filtered_multiple = df[(df['col1'] > 2) & (df['col2'] < 10)]
print(df_filtered_multiple)
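For longer conditions, DataFrame.query() accepts the same filter as a string expression, isin() tests membership against a list, and .loc filters rows and selects columns in one step. A minimal sketch reusing the same DataFrame:

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4, 5],
                   'col2': [6, 7, 8, 9, 10]})

# Same filter as the boolean-mask version, written as a query string
df_query = df.query('col1 > 2 and col2 < 10')
print(df_query)

# Membership test; ~ negates the mask
df_odd = df[df['col1'].isin([1, 3, 5])]
df_not_odd = df[~df['col1'].isin([1, 3, 5])]

# Filter rows and select a single column at the same time
col2_values = df.loc[df['col1'] > 2, 'col2']
print(col2_values)
```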
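These snippets should help you process data more efficiently in Pandas. Where performance matters, prefer vectorized operations (arithmetic on whole columns, built-in aggregations such as sum and mean) over apply with a Python function, because apply calls that function once per element.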