Boost Your Data Handling with Top 25 Pandas Tips and Tricks
If you're working with data in Python, Pandas is an indispensable library that can make data analysis a breeze. Kevin from Data School has shared his top 25 Pandas tricks, gleaned from five years of teaching, that can help you work more efficiently, write superior code, and impress your peers. Below, we'll explore these tricks in detail to enhance your data analysis skills.
Trick 1: Show Installed Versions
Knowing which version of Pandas you are working with is crucial, especially when referencing the documentation. Use pd.__version__ to check the Pandas version, or pd.show_versions() for a comprehensive list of versions for Python, Pandas, NumPy, and other dependencies.
Trick 2: Create an Example DataFrame
Creating an example DataFrame is common for demonstrating Pandas code. Pass the DataFrame constructor a dictionary with keys as column names and values as lists, or for larger DataFrames, pass it the output of NumPy's random.rand function. By default the columns of such an array get integer names; to use non-numeric column names, convert a string of characters to a list and pass it to the columns parameter.
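As a minimal sketch of both approaches (the column names and values are made up for illustration):

```python
import numpy as np
import pandas as pd

# Small DataFrame from a dictionary: keys become column names
df = pd.DataFrame({"col one": [100, 200], "col two": [300, 400]})

# Larger DataFrame of random floats with 4 rows and 8 columns.
# list("abcdefgh") turns the string into eight single-letter column names.
big = pd.DataFrame(np.random.rand(4, 8), columns=list("abcdefgh"))
```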
Trick 3: Rename Columns
Renaming columns can be accomplished through several methods. The rename method renames specific columns using a dictionary, while overwriting the columns attribute renames all columns at once. For simple replacements, such as changing spaces to underscores, str.replace is efficient. Additionally, the add_prefix and add_suffix methods add a prefix or suffix to every column name.
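The four approaches might look like this on a small, hypothetical DataFrame (the names "X_" and "_Y" are arbitrary):

```python
import pandas as pd

df = pd.DataFrame({"col one": [100, 200], "col two": [300, 400]})

# Rename specific columns with a dictionary of old -> new names
renamed = df.rename(columns={"col one": "col_one", "col two": "col_two"})

# Rename every column at once by overwriting the columns attribute
overwritten = df.copy()
overwritten.columns = ["first", "second"]

# Replace spaces with underscores in all column names
underscored = df.copy()
underscored.columns = underscored.columns.str.replace(" ", "_")

# Add a prefix or suffix to every column name
prefixed = underscored.add_prefix("X_")
suffixed = underscored.add_suffix("_Y")
```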
Trick 4: Reverse Row Order
To reverse the row order, use the loc accessor with a ::-1 slice. To renumber the index starting from zero, chain reset_index(drop=True).
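A short sketch of the row reversal, using a made-up three-row DataFrame:

```python
import pandas as pd

# Hypothetical frame with the default index 0, 1, 2
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# Reverse the rows; the original index labels travel with them (2, 1, 0)
reversed_rows = df.loc[::-1]

# Same reversal, then renumber the index from zero and discard the old one
renumbered = df.loc[::-1].reset_index(drop=True)
```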
Trick 5: Reverse Column Order
Similarly, reverse the column order by using the loc accessor with :, ::-1, which keeps all rows while reversing the columns.
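For instance, on a hypothetical three-column DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# The first slice (:) keeps every row; the second (::-1) reverses the columns
reversed_cols = df.loc[:, ::-1]
```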
Trick 6: Select Columns by Data Type
Select columns of specific data types using select_dtypes(include=[dtype]), or exclude certain types with exclude=[dtype]. This is handy for keeping only numeric or only object columns.
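A minimal sketch with invented column names; "number" is the selector pandas accepts for all numeric dtypes:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Ann", "Bob"],   # text column
    "age": [34, 27],          # integer column
    "fare": [7.25, 71.28],    # float column
})

# Keep only the numeric columns
numeric = df.select_dtypes(include="number")

# Keep everything except the numeric columns
non_numeric = df.select_dtypes(exclude="number")
```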
Trick 7: Convert Strings to Numbers
To perform mathematical operations on columns that store numbers as strings, convert them to a numeric type using astype. If some strings cannot be parsed as numbers, astype raises an error; in that case, pd.to_numeric with errors='coerce' converts the invalid values to NaN, which can then be replaced with zeros using fillna(0).
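The following sketch assumes a hypothetical DataFrame in which one column uses "-" as a placeholder for missing data:

```python
import pandas as pd

df = pd.DataFrame({
    "col_one": ["1.1", "2.2", "3.3"],
    "col_two": ["4.4", "5.5", "6.6"],
    "col_three": ["7.7", "8.8", "-"],  # "-" cannot be parsed as a number
})

# astype works when every string in the column is a valid number
converted = df.astype({"col_one": "float", "col_two": "float"})

# to_numeric with errors="coerce" turns invalid strings into NaN,
# and fillna(0) then replaces those NaN values with zero
col_three = pd.to_numeric(df["col_three"], errors="coerce").fillna(0)

# Apply the same conversion to every column of the DataFrame at once
all_numeric = df.apply(pd.to_numeric, errors="coerce").fillna(0)
```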
Trick 8: Reduce DataFrame Size
To manage memory usage, read only the columns you need using read_csv(usecols=[cols]), and convert object columns that hold categorical data to the category data type to reduce the DataFrame's size.
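Here is a sketch of both steps. The CSV text is generated in memory purely for illustration, and the saving from the category type depends on how often values repeat, so measure it on your own data:

```python
import io
import pandas as pd

# Hypothetical CSV with 1000 rows and a heavily repeated "continent" column
csv_text = "country,continent,beer\n" + "".join(
    f"c{i},{'Europe' if i % 2 else 'Asia'},{i}\n" for i in range(1000)
)

# Step 1: read only the columns that are needed
df = pd.read_csv(io.StringIO(csv_text), usecols=["country", "continent"])

# Step 2: convert the repeated text column to the category type
before = df["continent"].memory_usage(deep=True)
df["continent"] = df["continent"].astype("category")
after = df["continent"].memory_usage(deep=True)
```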
Trick 9: Build a DataFrame from Multiple Files Row-wise
Combine a dataset spread across multiple files into a single DataFrame by using the glob module to match file names against a pattern and pd.concat to stack the rows.
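A self-contained sketch follows. It first writes two small CSV files to a temporary directory so there is something to read; the file names and contents are invented. Sorting the matched paths is an added precaution, since glob does not guarantee any particular order.

```python
import glob
import os
import tempfile

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    # Create two hypothetical input files with the same columns
    pd.DataFrame({"date": ["2016-10-03"], "close": [31.5]}).to_csv(
        os.path.join(tmp, "stocks1.csv"), index=False
    )
    pd.DataFrame({"date": ["2016-10-04"], "close": [31.9]}).to_csv(
        os.path.join(tmp, "stocks2.csv"), index=False
    )

    # Match every file that fits the pattern, in a predictable order
    files = sorted(glob.glob(os.path.join(tmp, "stocks*.csv")))

    # Read each file and stack the rows; ignore_index renumbers from zero
    combined = pd.concat((pd.read_csv(f) for f in files), ignore_index=True)
```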
Trick 10: Build a DataFrame from Multiple Files Column-wise
When each file contains different columns of the same dataset, read the files individually with read_csv and combine them side by side using pd.concat with axis=1.
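As a sketch, two in-memory CSV strings stand in for the separate files here (their contents are hypothetical). Note that pd.concat aligns rows by index, so this assumes both files list the rows in the same order:

```python
import io
import pandas as pd

# Two hypothetical sources holding different columns of the same rows
left = pd.read_csv(io.StringIO("country,beer\nFrance,127\nItaly,85\n"))
right = pd.read_csv(io.StringIO("wine,spirit\n370,151\n237,42\n"))

# axis="columns" (equivalent to axis=1) places the frames side by side
combined = pd.concat([left, right], axis="columns")
```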
Trick 11: Create a DataFrame from the Clipboard
The read_clipboard function quickly imports copied data into a DataFrame. It is ideal for data copied from spreadsheets, though not recommended for reproducible work.
Trick 12: Split a DataFrame into Two Random Subsets
Use the sample method to draw a random fraction of the rows as the first subset, then pass that subset's index to drop to obtain the remaining rows as the second subset. This only works reliably when the index values are unique.
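A minimal sketch of a 75/25 split; the fraction and the random_state seed are arbitrary choices:

```python
import pandas as pd

# Hypothetical frame with 100 rows and a unique default index
df = pd.DataFrame({"x": range(100)})

# Randomly pick 75% of the rows; random_state makes the draw repeatable
part_1 = df.sample(frac=0.75, random_state=1234)

# Everything not picked above becomes the second subset
part_2 = df.drop(part_1.index)
```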
Trick 13: Filter a DataFrame by Multiple Categories
Filtering by categories is simplified with the isin method, which checks whether each value appears in a list of categories. To exclude those categories instead, prefix the condition with ~.
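For example, with an invented movies table:

```python
import pandas as pd

movies = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "genre": ["Action", "Drama", "Western", "Comedy"],
})

wanted = ["Action", "Drama", "Western"]

# Keep rows whose genre is in the list
selected = movies[movies["genre"].isin(wanted)]

# The ~ operator inverts the condition: keep rows whose genre is NOT in the list
others = movies[~movies["genre"].isin(wanted)]
```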
Trick 14: Filter a DataFrame by Largest Categories
To filter by the largest categories, count each category with value_counts, take the most frequent ones with nlargest, and pass their labels to isin so that only the top categories remain in your analysis.
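The three steps might be chained as follows; the data and the choice of the top two genres are illustrative:

```python
import pandas as pd

movies = pd.DataFrame({
    "genre": ["Drama", "Drama", "Drama", "Comedy", "Comedy", "Action"],
})

# Count rows per genre, then keep the labels of the two most frequent
counts = movies["genre"].value_counts()
top_genres = counts.nlargest(2).index

# Keep only the rows that belong to those genres
filtered = movies[movies["genre"].isin(top_genres)]
```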
Trick 15: Handle Missing Values
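Identify missing values in your DataFrame with isna, for example isna().sum() to count them per column. Remove rows or columns that contain them with dropna, or use its thresh parameter to keep only the rows or columns that have at least a given number of non-null values.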
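A sketch on a small, made-up DataFrame; the 50% threshold is an arbitrary example:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1, np.nan, 3, np.nan],            # 2 missing values
    "b": [np.nan, np.nan, np.nan, 4],       # 3 missing values
    "c": [1, 2, 3, 4],                      # complete
})

# Number and share of missing values per column
missing_counts = df.isna().sum()
missing_share = df.isna().mean()

# Drop every column that contains at least one missing value
complete_cols = df.dropna(axis="columns")

# Keep only columns with non-null values in at least half of the rows
min_non_null = int(len(df) * 0.5)
mostly_complete = df.dropna(thresh=min_non_null, axis="columns")
```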
Trick 16: Split a String into Multiple Columns
Split a string column into separate columns using str.split(expand=True). To keep only some of the parts, index the resulting DataFrame.
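For illustration, with hypothetical name and location columns:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["John Arthur Doe", "Jane Ann Smith"],
    "location": ["Los Angeles, CA", "Washington, DC"],
})

# expand=True returns a DataFrame with one column per split part
df[["first", "middle", "last"]] = df["name"].str.split(" ", expand=True)

# Keep only the first part (the city) by selecting column 0 of the result
df["city"] = df["location"].str.split(", ", expand=True)[0]
```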
Trick 17: Expand a Series of Lists into a DataFrame
Expand a column containing lists into a separate DataFrame with apply(pd.Series), then attach it to the original DataFrame using pd.concat.
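A minimal sketch with invented data. The new columns are named 0 and 1 after the list positions:

```python
import pandas as pd

df = pd.DataFrame({
    "col_one": ["a", "b", "c"],
    "col_two": [[10, 40], [20, 50], [30, 60]],
})

# Each list becomes a row of a new DataFrame, one column per list position
expanded = df["col_two"].apply(pd.Series)

# Place the expanded columns next to the original ones
result = pd.concat([df, expanded], axis="columns")
```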
Trick 18: Aggregate by Multiple Functions
Use groupby followed by agg([functions]) to perform multiple aggregations at once, such as summing and counting the grouped data.
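As a sketch, using a hypothetical orders table with one row per item:

```python
import pandas as pd

orders = pd.DataFrame({
    "order_id": [1, 1, 2, 3, 3, 3],
    "item_price": [2.5, 3.0, 10.0, 1.0, 2.0, 3.0],
})

# One row per order, with the total price and the number of items
summary = orders.groupby("order_id")["item_price"].agg(["sum", "count"])
```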
Trick 19: Combine the Output of an Aggregation with a DataFrame
The transform method performs an aggregation per group but returns a result with the same length and index as the original DataFrame, which makes it useful for creating new columns based on aggregated data.
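A short sketch with a hypothetical orders table; the new column names are arbitrary:

```python
import pandas as pd

orders = pd.DataFrame({
    "order_id": [1, 1, 2, 3, 3, 3],
    "item_price": [2.5, 3.0, 10.0, 1.0, 2.0, 3.0],
})

# The per-order total is repeated on every row belonging to that order
orders["total_price"] = orders.groupby("order_id")["item_price"].transform("sum")

# With the total on each row, a per-item share is a simple division
orders["share_of_total"] = orders["item_price"] / orders["total_price"]
```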
Trick 20: Select a Slice of Rows and Columns
Use loc with row and column label slices to select a specific section of your DataFrame for a more focused analysis. Unlike positional slices, label slices include both endpoints.
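For example, slicing the output of describe on a made-up DataFrame:

```python
import pandas as pd

df = pd.DataFrame({
    "pclass": [1, 2, 3, 3],
    "age": [22.0, 38.0, 26.0, 35.0],
    "fare": [7.25, 71.28, 7.92, 53.1],
})

# describe() returns rows labeled count, mean, std, min, 25%, 50%, 75%, max
stats = df.describe()

# Rows from "min" through "max" and columns from "age" through "fare";
# both ends of each label slice are included
subset = stats.loc["min":"max", "age":"fare"]
```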
Trick 21: Reshape a MultiIndexed Series
Convert a MultiIndexed Series into a more readable DataFrame using unstack, which makes the data easier to interact with.
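A sketch with invented passenger data: grouping by two columns yields a MultiIndexed Series, and unstack moves the inner index level into the columns.

```python
import pandas as pd

passengers = pd.DataFrame({
    "sex": ["female", "female", "male", "male", "female", "male"],
    "pclass": [1, 2, 1, 2, 1, 2],
    "survived": [1, 1, 0, 0, 1, 1],
})

# Series indexed by (sex, pclass)
rates = passengers.groupby(["sex", "pclass"])["survived"].mean()

# Rows are sex, columns are pclass
table = rates.unstack()
```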
Trick 22: Create a Pivot Table
Pivot tables are versatile tools for summarizing data, with options for specifying indexes, columns, values, and aggregation functions, plus the ability to include margins.
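A minimal sketch of those options using pivot_table on invented passenger data; margins=True adds an "All" row and column holding the overall figures:

```python
import pandas as pd

passengers = pd.DataFrame({
    "sex": ["female", "female", "male", "male", "female", "male"],
    "pclass": [1, 2, 1, 2, 1, 2],
    "survived": [1, 1, 0, 0, 1, 1],
})

table = passengers.pivot_table(
    index="sex",          # one row per value
    columns="pclass",     # one column per value
    values="survived",    # the column being summarized
    aggfunc="mean",       # how to summarize it
    margins=True,         # add "All" totals
)
```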
Trick 23: Convert Continuous Data into Categorical Data
Use the cut function to sort continuous data into labeled bins, transforming it into categorical data for analysis.
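For illustration, with arbitrary bin edges and labels. By default each bin includes its right edge, so 18 falls into the first bin:

```python
import pandas as pd

ages = pd.Series([5, 18, 22, 40, 99])

# Bins are (0, 18], (18, 25] and (25, 99]
age_groups = pd.cut(
    ages,
    bins=[0, 18, 25, 99],
    labels=["child", "young adult", "adult"],
)
```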
Trick 24: Change Display Options
Adjust display options, such as decimal precision, using pd.set_option. This affects only how the data is displayed, not the underlying data.
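A sketch using the display.float_format option, with made-up values. Resetting the option afterwards is an added step so the change does not leak into later output:

```python
import pandas as pd

df = pd.DataFrame({"fare": [7.25, 71.2833]})

# Show floats with two decimal places
pd.set_option("display.float_format", "{:.2f}".format)
shown = repr(df)

# Restore the default display behavior
pd.reset_option("display.float_format")

# The stored value keeps its full precision
stored = df.loc[1, "fare"]
```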
Trick 25: Style a DataFrame
Enhance DataFrame presentation with styling options such as custom formats, hiding indexes, and adding visualizations like bar charts and color gradients.
Bonus: Profile a DataFrame
For a comprehensive analysis of a new dataset, consider the pandas-profiling package (since renamed ydata-profiling) to generate an interactive HTML report with detailed insights into your data.
These 25 tips and tricks offer a powerful set of tools for any data analyst or enthusiast working with Pandas. Whether you're a beginner or a seasoned pro, these tips can help you handle your data more effectively and efficiently.
For more details and to see these tricks in action, check out Kevin's video on Data School's YouTube channel.