5. Time Series & Advanced Operations
pandas has strong built-in support for time series data, including datetime parsing, date-based indexing, resampling, and window calculations.
5.1. Date and Time Handling
Converting strings to datetime objects is the first step in time series analysis. Setting the datetime column as the index then enables date-based slicing, resampling, and window calculations.
import pandas as pd

# Create a DataFrame with a date column
data = {'Date': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05'],
        'Value': [10, 12, 15, 13, 16]}
df_time = pd.DataFrame(data)
# Convert 'Date' column to datetime objects
df_time['Date'] = pd.to_datetime(df_time['Date'])
# Set 'Date' as the index
df_time.set_index('Date', inplace=True)
print(df_time.head())
# info() prints its summary itself and returns None, so it is not wrapped in print()
df_time.info()
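As a minimal sketch of what a datetime index makes possible, the example below rebuilds the same five-row dataset under the illustrative name df_demo (an assumption, chosen so the block runs on its own) and selects rows by date label. It is one way to use the index, not the only one.

```python
import pandas as pd

# Same illustrative data as above, rebuilt so this block is self-contained
df_demo = pd.DataFrame(
    {"Value": [10, 12, 15, 13, 16]},
    index=pd.to_datetime(
        ["2023-01-01", "2023-01-02", "2023-01-03", "2023-01-04", "2023-01-05"]
    ),
)
df_demo.index.name = "Date"

# Label-based slicing with date strings; both endpoints are included
subset = df_demo.loc["2023-01-02":"2023-01-04"]
print(subset)

# Partial-string indexing: every row that falls in January 2023
january = df_demo.loc["2023-01"]
print(len(january))

# Datetime components are available directly on the index
day_names = df_demo.index.day_name().tolist()
print(day_names)
```

Note that, unlike positional slicing, a date-label slice on a DatetimeIndex includes the end date.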
5.2. Resampling Time Series Data
Resampling is the process of converting a time series from one frequency to another (e.g., daily to monthly). The usual pattern is to call .resample() with the target frequency and then apply an aggregation function such as .mean() or .sum().
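# Resample daily data to weekly data, taking the mean
# ('W' bins end on Sunday by default)
weekly_mean = df_time.resample('W')['Value'].mean()
print(weekly_mean)
# Resample to monthly sum
# (pandas 2.2 and later deprecate the 'M' alias in favor of 'ME', month end)
monthly_sum = df_time.resample('M')['Value'].sum()
print(monthly_sum)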
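Resampling works in both directions. The sketch below, which rebuilds the same five daily values as a Series named series (an illustrative name), first downsamples to weekly bins with several aggregations at once and then upsamples to a 12-hour frequency. Forward fill is only one possible way to populate the new time slots and is used here as an assumption.

```python
import pandas as pd

# Five daily observations, matching the values used in this section
dates = pd.date_range("2023-01-01", periods=5, freq="D")
series = pd.Series([10, 12, 15, 13, 16], index=dates, name="Value")

# Downsampling: several aggregations per weekly bin.
# 2023-01-01 is a Sunday, so it closes its own week and forms a one-row bin.
weekly = series.resample("W").agg(["mean", "min", "max"])
print(weekly)

# Upsampling: daily -> 12-hourly. The new slots have no data of their own,
# so ffill() carries the most recent observation forward.
half_daily = series.resample("12h").ffill()
print(half_daily.head(4))
```

With this data the weekly result has two rows: one for the single Sunday observation and one for the remaining four days.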
5.3. Rolling and Expanding Windows
Window functions allow you to perform calculations over a sliding (rolling) or growing (expanding) window of data. This is useful for moving averages, cumulative sums, etc.
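# Calculate a rolling mean over 3 rows (3 days here, since the data is daily with no gaps)
# The first two rows are NaN because a full window is not yet available
df_time['Rolling_Mean_3D'] = df_time['Value'].rolling(window=3).mean()
print(df_time)
# Calculate an expanding (cumulative) sum
df_time['Expanding_Sum'] = df_time['Value'].expanding().sum()
print(df_time)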
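An integer window counts rows, not calendar time. The following sketch, using a standalone Series named values (an illustrative name), compares a fixed-size window, the same window with min_periods=1, and an offset-based window of '3D' that covers the trailing three calendar days. On gap-free daily data the last two give the same result; on irregular data they would differ.

```python
import pandas as pd

dates = pd.date_range("2023-01-01", periods=5, freq="D")
values = pd.Series([10, 12, 15, 13, 16], index=dates, name="Value")

# Fixed-size window of 3 observations: the first two results are NaN
fixed = values.rolling(window=3).mean()

# min_periods=1 produces a result as soon as one observation is available
partial = values.rolling(window=3, min_periods=1).mean()

# Offset-based window: all observations within the trailing 3 calendar days
# (requires a datetime-like index)
time_based = values.rolling("3D").mean()

# An expanding sum grows from the first row onward, like a cumulative sum
running_total = values.expanding().sum()

print(pd.DataFrame({
    "fixed": fixed,
    "partial": partial,
    "time_based": time_based,
    "running_total": running_total,
}))
```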
Together, datetime indexing, resampling, and window functions cover most routine time series tasks in pandas, from changing the frequency of a series to computing moving averages and running totals.