2. Data Inspection & Selection
Once data is loaded into a DataFrame, inspect its structure and content before working with it. pandas also provides several ways to select subsets of rows and columns.
2.1. Inspecting Data
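Use these methods and attributes to get an overview of your DataFrame:
df.head(n) / df.tail(n): View the first/last n rows (n defaults to 5).
df.info(): Print a summary of the DataFrame, including each column's data type and non-null count.
df.describe(): Generate descriptive statistics (count, mean, standard deviation, min, quartiles, max) for the numerical columns.
df.shape: An attribute (not a method) holding the number of rows and columns as a tuple.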
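print(df.head())
df.info()  # info() prints its summary itself and returns None, so wrapping it in print() would also print "None"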
print(df.describe())
print(df.shape)
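The snippet above assumes `df` was loaded earlier. For a self-contained sketch, the following builds a small hypothetical DataFrame (the names, ages, and cities are made-up sample data, reusing this section's column names) and runs each inspection call on it.

```python
import pandas as pd

# Hypothetical sample data; column names mirror those used in this section
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Carol', 'Dave', 'Eve', 'Frank'],
    'Age': [24, 35, 29, 42, 31, 19],
    'City': ['New York', 'Boston', 'New York', 'Chicago', 'New York', 'Boston'],
})

print(df.head(3))     # first 3 rows
print(df.tail(2))     # last 2 rows
df.info()             # prints dtypes and non-null counts; returns None
print(df.describe())  # statistics for the numeric 'Age' column only
print(df.shape)       # (rows, columns) tuple
```

Because `describe()` covers only numeric columns by default, 'Name' and 'City' are left out here; `df.describe(include='all')` also summarizes non-numeric columns.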
2.2. Selecting Data
You can select columns using bracket notation and rows using .loc (by index label) or .iloc (by integer position). Both .loc and .iloc also accept a second argument that selects columns.
Selecting Columns
# Select a single column (returns a Series)
ages = df['Age']
print(ages.head())
# Select multiple columns (returns a DataFrame)
subset = df[['Name', 'Age']]
print(subset.head())
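The number of brackets determines the return type: a single label gives a Series, while a list of labels (even a one-element list) gives a DataFrame. A minimal sketch on hypothetical sample data:

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Carol'],
    'Age': [24, 35, 29],
    'City': ['New York', 'Boston', 'New York'],
})

ages = df['Age']              # single label -> Series
age_frame = df[['Age']]       # list with one label -> one-column DataFrame
subset = df[['Name', 'Age']]  # list of labels -> DataFrame, columns in the order listed

print(type(ages).__name__)       # Series
print(type(age_frame).__name__)  # DataFrame
print(subset.columns.tolist())   # ['Name', 'Age']
```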
Selecting Rows with .loc (Label-based)
# Select the row whose index label is 0 (returns a Series)
row_0 = df.loc[0]
print(row_0)
# Select rows with index labels 0 through 2 (.loc includes the end label, so 3 rows) and columns 'Name', 'Age'
rows_cols = df.loc[0:2, ['Name', 'Age']]
print(rows_cols)
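The label-based nature of .loc is easier to see when the index is not the default 0..n-1. This sketch uses a hypothetical DataFrame indexed by name and shows that a .loc slice includes its end label.

```python
import pandas as pd

# Hypothetical sample data indexed by name instead of the default 0..n-1
df = pd.DataFrame(
    {'Age': [24, 35, 29, 42], 'City': ['New York', 'Boston', 'New York', 'Chicago']},
    index=['Alice', 'Bob', 'Carol', 'Dave'],
)

# .loc slices by label and includes BOTH endpoints
print(df.loc['Alice':'Carol', 'Age'])  # rows Alice, Bob and Carol

# A single row label plus a single column label returns a scalar
print(df.loc['Bob', 'City'])           # Boston

# With a default RangeIndex, labels coincide with positions,
# but the end label is still included
default_idx = df.reset_index(drop=True)
print(len(default_idx.loc[0:2]))       # 3
```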
Selecting Rows with .iloc (Integer-based)
# Select the first row
first_row = df.iloc[0]
print(first_row)
# Select the first 3 rows and first 2 columns (.iloc excludes the end position, like Python slicing)
first_3_rows_2_cols = df.iloc[0:3, 0:2]
print(first_3_rows_2_cols)
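Because .iloc works purely on positions, it ignores the index labels entirely and supports Python-style negative positions and lists of positions. A short sketch with hypothetical data and a non-numeric index:

```python
import pandas as pd

# Hypothetical sample data; the letter index shows that .iloc ignores labels
df = pd.DataFrame(
    {'Name': ['Alice', 'Bob', 'Carol', 'Dave'],
     'Age': [24, 35, 29, 42],
     'City': ['New York', 'Boston', 'New York', 'Chicago']},
    index=['a', 'b', 'c', 'd'],
)

print(df.iloc[0:3, 0:2])     # positions 0-2 for rows, 0-1 for columns (end excluded)
print(df.iloc[-1])           # last row, as with negative list indexing
print(df.iloc[[0, 2], [2]])  # specific row and column positions via lists
print(df.iloc[1, 1])         # single cell at row 1, column 1 -> 35
```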
Conditional Selection
Filter rows with a condition: a comparison such as df['Age'] > 30 produces a boolean Series, and passing that Series inside brackets keeps only the rows where it is True.
# Select all rows where 'Age' is greater than 30
over_30 = df[df['Age'] > 30]
print(over_30.head())
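Splitting the filter into two steps makes the intermediate boolean mask visible. This sketch uses hypothetical sample data; note that the filtered rows keep their original index labels.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Carol', 'Dave'],
    'Age': [24, 35, 29, 42],
})

mask = df['Age'] > 30            # boolean Series aligned with df's index
print(mask.tolist())             # [False, True, False, True]

over_30 = df[mask]               # keeps only rows where mask is True (labels 1 and 3)
print(over_30['Name'].tolist())  # ['Bob', 'Dave']

# .loc accepts the same mask and can select columns at the same time
names_over_30 = df.loc[mask, 'Name']
```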
# Combine multiple conditions with & (and) or | (or), wrapping each condition in parentheses
filtered_data = df[(df['Age'] > 25) & (df['City'] == 'New York')]
print(filtered_data.head())
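Besides &, boolean masks can be combined with | (or) and negated with ~ (not); Python's own `and`, `or`, and `not` do not work element-wise on a Series and raise an error instead. The sketch below, on hypothetical sample data, also shows `Series.isin()` for membership tests and `DataFrame.query()`, which expresses the same kind of filter as a string.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Carol', 'Dave', 'Eve'],
    'Age': [24, 35, 29, 42, 31],
    'City': ['New York', 'Boston', 'New York', 'Chicago', 'New York'],
})

# Parentheses are required because & | ~ bind more tightly than > and ==
ny_over_25 = df[(df['Age'] > 25) & (df['City'] == 'New York')]    # AND
young_or_boston = df[(df['Age'] < 25) | (df['City'] == 'Boston')]  # OR
not_ny = df[~(df['City'] == 'New York')]                           # NOT

# isin() tests membership in a list of values
east_coast = df[df['City'].isin(['New York', 'Boston'])]

# query() expresses the first filter as a string expression
same_as_first = df.query("Age > 25 and City == 'New York'")
```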