2. Data Inspection & Selection

Once data is loaded, inspect its structure and content before working with it. pandas provides methods for summarizing a DataFrame and for selecting rows and columns by label, by position, or by condition.

2.1. Inspecting Data

Use these methods to get an overview of your DataFrame:

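df.info()             # column dtypes and non-null counts; prints directly and returns None
print(df.describe())  # summary statistics for numeric columns
print(df.shape)       # (number of rows, number of columns)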
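
To make the inspection calls concrete, here is a minimal sketch run against a small made-up DataFrame. The sample rows are an assumption for illustration only; the column names (Name, Age, City) mirror the ones used later in this section.

```python
import pandas as pd

# Hypothetical sample data, not taken from the original dataset.
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Carol", "Dan"],
    "Age": [34, 27, 41, 23],
    "City": ["New York", "Boston", "New York", "Chicago"],
})

df.info()                 # prints dtypes and non-null counts; returns None
summary = df.describe()   # by default, only numeric columns are summarized
print(summary.loc["mean", "Age"])  # 31.25
print(df.shape)           # (4, 3)
```

Note that describe() returns a DataFrame of its own, so individual statistics can be read from it with .loc.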

2.2. Selecting Data

You can select columns using bracket notation, and rows (optionally together with columns) using .loc (label-based) or .iloc (integer position-based).

Selecting Columns

# Select a single column (returns a Series)
ages = df['Age']
print(ages.head())

# Select multiple columns (returns a DataFrame)
subset = df[['Name', 'Age']]
print(subset.head())
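
The difference between single and double brackets is easy to miss, so the following sketch checks the returned types directly. The two-row DataFrame is hypothetical sample data.

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [34, 27]})

as_series = df["Age"]    # single brackets: one-dimensional Series
as_frame = df[["Age"]]   # list of names: DataFrame, even with one column

print(type(as_series).__name__)  # Series
print(type(as_frame).__name__)   # DataFrame
print(as_series.shape)           # (2,)
print(as_frame.shape)            # (2, 1)
```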

Selecting Rows with .loc (Label-based)

# Select the row whose index label is 0 (the integer label, not the position)
row_0 = df.loc[0]
print(row_0)

# Select rows with index labels 0 through 2 (the end label is included) and columns 'Name' and 'Age'
rows_cols = df.loc[0:2, ['Name', 'Age']]
print(rows_cols)

Selecting Rows with .iloc (Integer-based)

# Select the first row
first_row = df.iloc[0]
print(first_row)

# Select the first 3 rows and first 2 columns (end positions are excluded)
first_3_rows_2_cols = df.iloc[0:3, 0:2]
print(first_3_rows_2_cols)
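
With the default index (0, 1, 2, ...), .loc and .iloc can look interchangeable. The sketch below uses a hypothetical DataFrame whose index labels are 10, 20, 30, 40 to show where they differ: .loc slices by label and includes the end label, while .iloc slices by position and excludes the end position.

```python
import pandas as pd

# Hypothetical data with non-default index labels.
df = pd.DataFrame(
    {"Name": ["Alice", "Bob", "Carol", "Dan"], "Age": [34, 27, 41, 23]},
    index=[10, 20, 30, 40],
)

by_label = df.loc[10:30, ["Name", "Age"]]  # labels 10, 20, 30 (end included)
by_position = df.iloc[0:2, 0:2]            # positions 0 and 1 (end excluded)

print(list(by_label.index))     # [10, 20, 30]
print(list(by_position.index))  # [10, 20]

# No row is labeled 0 here, so a label lookup for 0 fails,
# while the positional lookup df.iloc[0] still returns the first row.
try:
    df.loc[0]
except KeyError:
    print("label 0 not found")
```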

Conditional Selection

A comparison such as df['Age'] > 30 produces a boolean Series with one True/False value per row. Passing that Series inside the brackets keeps only the rows where it is True.

# Select all rows where 'Age' is greater than 30
over_30 = df[df['Age'] > 30]
print(over_30.head())

# Combine multiple conditions with & (element-wise AND); wrap each condition in parentheses
filtered_data = df[(df['Age'] > 25) & (df['City'] == 'New York')]
print(filtered_data.head())
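
As a short sketch of how the boolean mask works, the example below builds the mask as a separate variable and also shows | (element-wise OR), ~ (element-wise NOT), and a mask combined with a column list in .loc. The sample rows and variable names are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Carol", "Dan"],
    "Age": [34, 27, 41, 23],
    "City": ["New York", "Boston", "New York", "Chicago"],
})

mask = df["Age"] > 30        # boolean Series, one value per row
print(mask.tolist())         # [True, False, True, False]

# OR and NOT; each condition still needs its own parentheses.
young_or_boston = df[(df["Age"] < 25) | (df["City"] == "Boston")]
not_new_york = df[~(df["City"] == "New York")]

# .loc accepts a mask for rows plus a list of columns.
names_over_30 = df.loc[mask, ["Name"]]

print(young_or_boston["Name"].tolist())  # ['Bob', 'Dan']
print(names_over_30["Name"].tolist())    # ['Alice', 'Carol']
```

Python's and, or, and not keywords do not work on a Series; the operators &, |, and ~ are the element-wise equivalents.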