import pandas as pd
20 Pandas Basics Part 2 — Workbook
In this lesson, we’re going to introduce some more fundamentals of Pandas, a powerful Python library for working with tabular data like CSV files.
20.1 The Trans-Atlantic Slave Trade Database
“[D]isplaying data alone could not and did not offer the atonement descendants of slaves sought or capture the inhumanity of this archive’s formation.”
— Jessica Marie Johnson, “Markup Bodies”
The dataset that we’re going to be working with in this lesson is taken from The Trans-Atlantic Slave Trade Database, part of the Slave Voyages project. The larger database includes information about 35,000 slave-trading voyages from 1514-1866. The dataset we’re working with here was filtered to include the 20,000 voyages that landed in the Americas. The data was filtered to also include the percentage of enslaved men, women, and children on the voyages.
= [['Avatar', 'Neytiri', 2000, 2800], ['Avatar', 'Jake', 500, 2800], ['The Avengers', 'Loki', 700, 1518],['The Avengers', 'Thor', 800, 1518], ['My College Short Film', 'Myself', 100], ['My College Short Film', 'My Mom', 900]]
data = pd.DataFrame(data, columns = ['Film', 'Character', 'Words Spoken','Box Office Gross (Millions)']) df
df
Film | Character | Words Spoken | Box Office Gross (Millions) | |
---|---|---|---|---|
0 | Avatar | Neytiri | 2000 | 2800.0 |
1 | Avatar | Jake | 500 | 2800.0 |
2 | The Avengers | Loki | 700 | 1518.0 |
3 | The Avengers | Thor | 800 | 1518.0 |
4 | My College Short Film | Myself | 100 | NaN |
5 | My College Short Film | My Mom | 900 | NaN |
'Film').sum() df.groupby(
Words Spoken | Box Office Gross (Millions) | |
---|---|---|
Film | ||
Avatar | 2500 | 5600.0 |
My College Short Film | 1000 | 0.0 |
The Avengers | 1500 | 3036.0 |
'Film')['Words Spoken'].mean() df.groupby(
Film
Avatar 1250.0
My College Short Film 500.0
The Avengers 750.0
Name: Words Spoken, dtype: float64
'Film').count() df.groupby(
Character | Words Spoken | Box Office Gross (Millions) | |
---|---|---|---|
Film | |||
Avatar | 2 | 2 | 2 |
My College Short Film | 2 | 2 | 0 |
The Avengers | 2 | 2 | 2 |
20.2 Import Pandas
20.3 Change Display Settings
By default, Pandas will display 60 rows and 20 columns. I often change Pandas’ default display settings to show more rows or columns.
= 200 pd.options.display.max_rows
20.4 Get Data
= pd.read_csv('../data/Trans-Atlantic-Slave-Trade_Americas.csv', delimiter=",", encoding='utf-8') slave_voyages_df
slave_voyages_df
year_of_arrival | flag | place_of_purchase | place_of_landing | percent_women | percent_children | percent_men | total_embarked | total_disembarked | resistance_label | vessel_name | captain's_name | voyage_id | sources | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1520 | NaN | Portuguese Guinea | San Juan | NaN | NaN | NaN | 324.0 | 259.0 | NaN | NaN | 42987 | [u'AGI,Patronato 175, r.9<><p><em>AG!</em> (Se... | |
1 | 1525 | Portugal / Brazil | Sao Tome | Hispaniola, unspecified | NaN | NaN | NaN | 359.0 | 287.0 | NaN | S Maria de Bogoña | Monteiro, Pero | 46473 | [u'ANTT,CC,Parte II, maco 131, doc 54<><i>Inst... |
2 | 1526 | Spain / Uruguay | Cape Verde Islands | Cuba, port unspecified | NaN | NaN | NaN | 359.0 | 287.0 | NaN | Carega, Esteban (?) | 11297 | [u'Pike,60-1,172<>Pike, Ruth, <i>Enterprise</i... | |
3 | 1526 | Spain / Uruguay | Cape Verde Islands | Cuba, port unspecified | NaN | NaN | NaN | 359.0 | 287.0 | NaN | Carega, Esteban (?) | 11298 | [u'Pike,60-1,172<>Pike, Ruth, <i>Enterprise</i... | |
4 | 1526 | NaN | Cape Verde Islands | Caribbean (colony unspecified) | NaN | NaN | NaN | 359.0 | 287.0 | NaN | S Anton | Leon, Juan de | 42631 | [u'Chaunus, 3: 162-63<><p>Chaunus, <em>xxxxxx<... |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
20736 | 1864 | Spain / Uruguay | Africa., port unspecified | Cuba, port unspecified | NaN | NaN | NaN | 488.0 | 465.0 | NaN | Polaca | NaN | 46554 | [u'AHNM, Ultramar, Leg. 3551, 6<><i>Archivo Hi... |
20737 | 1865 | Spain / Uruguay | Africa., port unspecified | Isla de Pinas | NaN | NaN | NaN | 152.0 | 145.0 | Slave insurrection | Gato | NaN | 4394 | [u'IUP,ST,50/B/137<>Great Britain, <i>Irish Un... |
20738 | 1865 | NaN | Africa., port unspecified | Mariel | NaN | NaN | NaN | 780.0 | 650.0 | NaN | NaN | 4395 | [u'IUP,ST,50/B/144<>Great Britain, <i>Irish Un... | |
20739 | 1865 | NaN | Congo River | Cuba, port unspecified | NaN | NaN | NaN | 1265.0 | 1004.0 | NaN | Cicerón | Mesquita | 5052 | [u'IUP,ST,50/A/23-4<>Great Britain, <i>Irish U... |
20740 | 1866 | NaN | Africa., port unspecified | Cuba, port unspecified | NaN | NaN | NaN | 851.0 | 700.0 | NaN | NaN | 4998 | [u'IUP,ST,50/B/220<>Great Britain, <i>Irish Un... |
20741 rows × 14 columns
This creates a Pandas DataFrame object — often abbreviated as df, e.g., slave_voyages_df. A DataFrame looks and acts a lot like a spreadsheet. But it has special powers and functions that we will discuss in the next few lessons.
20.5 Overview
To look at the first n rows in a DataFrame, we can use a method called .head()
.
10) slave_voyages_df.head(
year_of_arrival | flag | place_of_purchase | place_of_landing | percent_women | percent_children | percent_men | total_embarked | total_disembarked | resistance_label | vessel_name | captain's_name | voyage_id | sources | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1520 | NaN | Portuguese Guinea | San Juan | NaN | NaN | NaN | 324.0 | 259.0 | NaN | NaN | 42987 | [u'AGI,Patronato 175, r.9<><p><em>AG!</em> (Se... | |
1 | 1525 | Portugal / Brazil | Sao Tome | Hispaniola, unspecified | NaN | NaN | NaN | 359.0 | 287.0 | NaN | S Maria de Bogoña | Monteiro, Pero | 46473 | [u'ANTT,CC,Parte II, maco 131, doc 54<><i>Inst... |
2 | 1526 | Spain / Uruguay | Cape Verde Islands | Cuba, port unspecified | NaN | NaN | NaN | 359.0 | 287.0 | NaN | Carega, Esteban (?) | 11297 | [u'Pike,60-1,172<>Pike, Ruth, <i>Enterprise</i... | |
3 | 1526 | Spain / Uruguay | Cape Verde Islands | Cuba, port unspecified | NaN | NaN | NaN | 359.0 | 287.0 | NaN | Carega, Esteban (?) | 11298 | [u'Pike,60-1,172<>Pike, Ruth, <i>Enterprise</i... | |
4 | 1526 | NaN | Cape Verde Islands | Caribbean (colony unspecified) | NaN | NaN | NaN | 359.0 | 287.0 | NaN | S Anton | Leon, Juan de | 42631 | [u'Chaunus, 3: 162-63<><p>Chaunus, <em>xxxxxx<... |
5 | 1526 | NaN | Cape Verde Islands | San Domingo (a) Santo Domingo | NaN | NaN | NaN | 359.0 | 287.0 | NaN | Santa Maria de Guadalupe | Pabon, Francisco | 42679 | [u'Chaunus, 3: 162-63<><p>Chaunus, <em>xxxxxx<... |
6 | 1526 | Portugal / Brazil | Sao Tome | Spanish Caribbean, unspecified | NaN | NaN | NaN | 359.0 | 287.0 | NaN | NaN | 46474 | [u'ANTT,CC,Parte II, maco 131, doc 54<><i>Inst... | |
7 | 1527 | Spain / Uruguay | Cape Verde Islands | Puerto Rico, port unspecified | NaN | NaN | NaN | 325.0 | 260.0 | NaN | Concepción | Díaz, Alonso | 99027 | [u'SuedBadillo,57,75,76<><p>SuedBadillo, <em>x... |
8 | 1532 | Portugal / Brazil | Sao Tome | Spanish Caribbean, unspecified | NaN | NaN | NaN | 359.0 | 287.0 | NaN | S Antônio | Afonso, Martim | 11293 | [u'Ryder,66<>Ryder, A. F. C., <i>Benin</i><i> ... |
9 | 1532 | NaN | Cape Verde Islands | San Juan | NaN | NaN | NaN | 25.0 | 20.0 | NaN | de Illanes, Manuel | 28994 | [u'Tanodi, 321-22<>Tanodi, Aurelio, <i>Documen... |
Generate information about all the columns in the data
slave_voyages_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20741 entries, 0 to 20740
Data columns (total 14 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 year_of_arrival 20741 non-null int64
1 flag 19583 non-null object
2 place_of_purchase 20663 non-null object
3 place_of_landing 20741 non-null object
4 percent_women 2894 non-null float64
5 percent_children 2927 non-null float64
6 percent_men 2894 non-null float64
7 total_embarked 20722 non-null float64
8 total_disembarked 20719 non-null float64
9 resistance_label 372 non-null object
10 vessel_name 20741 non-null object
11 captain's_name 19396 non-null object
12 voyage_id 20741 non-null int64
13 sources 20741 non-null object
dtypes: float64(5), int64(2), object(7)
memory usage: 2.2+ MB
Just like Python has different data types, Pandas has different data types, too. These data types are automatically assigned to columns when we read in a CSV file. We can check these Pandas data types with the .dtypes
method.
Pandas Data Type | Explanation |
---|---|
object |
string |
float64 |
float |
int64 |
integer |
datetime64 |
date time |
Make a histogram of the DataFrame
slave_voyages_df.hist()
array([[<AxesSubplot:title={'center':'year_of_arrival'}>,
<AxesSubplot:title={'center':'percent_women'}>,
<AxesSubplot:title={'center':'percent_children'}>],
[<AxesSubplot:title={'center':'percent_men'}>,
<AxesSubplot:title={'center':'total_embarked'}>,
<AxesSubplot:title={'center':'total_disembarked'}>],
[<AxesSubplot:title={'center':'voyage_id'}>, <AxesSubplot:>,
<AxesSubplot:>]], dtype=object)
=(10,10)) slave_voyages_df.hist(figsize
array([[<AxesSubplot:title={'center':'year_of_arrival'}>,
<AxesSubplot:title={'center':'percent_women'}>,
<AxesSubplot:title={'center':'percent_children'}>],
[<AxesSubplot:title={'center':'percent_men'}>,
<AxesSubplot:title={'center':'total_embarked'}>,
<AxesSubplot:title={'center':'total_disembarked'}>],
[<AxesSubplot:title={'center':'voyage_id'}>, <AxesSubplot:>,
<AxesSubplot:>]], dtype=object)
Generate descriptive statistics for all the columns in the data
slave_voyages_df.describe()
year_of_arrival | percent_women | percent_children | percent_men | total_embarked | total_disembarked | voyage_id | |
---|---|---|---|---|---|---|---|
count | 20741.000000 | 2894.000000 | 2927.000000 | 2894.000000 | 20722.000000 | 20719.000000 | 20741.000000 |
mean | 1752.014850 | 0.274198 | 0.231582 | 0.496648 | 295.050381 | 251.573966 | 42783.741671 |
std | 59.702189 | 0.116513 | 0.149508 | 0.140324 | 147.997690 | 128.050439 | 32401.785320 |
min | 1520.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 | 1.000000 | 112.000000 |
25% | 1724.000000 | 0.195265 | 0.115380 | 0.407460 | 194.000000 | 163.000000 | 17862.000000 |
50% | 1765.000000 | 0.264110 | 0.215100 | 0.497890 | 282.000000 | 241.000000 | 31916.000000 |
75% | 1792.000000 | 0.346150 | 0.321900 | 0.586765 | 368.000000 | 313.000000 | 78283.000000 |
max | 1866.000000 | 1.000000 | 1.000000 | 1.000000 | 2024.000000 | 1700.000000 | 900206.000000 |
='all') slave_voyages_df.describe(include
year_of_arrival | flag | place_of_purchase | place_of_landing | percent_women | percent_children | percent_men | total_embarked | total_disembarked | resistance_label | vessel_name | captain's_name | voyage_id | sources | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
count | 20741.000000 | 19583 | 20663 | 20741 | 2894.000000 | 2927.000000 | 2894.000000 | 20722.000000 | 20719.000000 | 372 | 20741 | 19396 | 20741.000000 | 20741 |
unique | NaN | 8 | 156 | 187 | NaN | NaN | NaN | NaN | NaN | 6 | 5849 | 12233 | NaN | 13754 |
top | NaN | Great Britain | Africa., port unspecified | Barbados, port unspecified | NaN | NaN | NaN | NaN | NaN | Slave insurrection | Smith, John | NaN | [u'mettas,I<>Mettas, Jean, <i>R\xe9pertoire d... | |
freq | NaN | 10536 | 5999 | 2038 | NaN | NaN | NaN | NaN | NaN | 330 | 712 | 36 | NaN | 1134 |
mean | 1752.014850 | NaN | NaN | NaN | 0.274198 | 0.231582 | 0.496648 | 295.050381 | 251.573966 | NaN | NaN | NaN | 42783.741671 | NaN |
std | 59.702189 | NaN | NaN | NaN | 0.116513 | 0.149508 | 0.140324 | 147.997690 | 128.050439 | NaN | NaN | NaN | 32401.785320 | NaN |
min | 1520.000000 | NaN | NaN | NaN | 0.000000 | 0.000000 | 0.000000 | 1.000000 | 1.000000 | NaN | NaN | NaN | 112.000000 | NaN |
25% | 1724.000000 | NaN | NaN | NaN | 0.195265 | 0.115380 | 0.407460 | 194.000000 | 163.000000 | NaN | NaN | NaN | 17862.000000 | NaN |
50% | 1765.000000 | NaN | NaN | NaN | 0.264110 | 0.215100 | 0.497890 | 282.000000 | 241.000000 | NaN | NaN | NaN | 31916.000000 | NaN |
75% | 1792.000000 | NaN | NaN | NaN | 0.346150 | 0.321900 | 0.586765 | 368.000000 | 313.000000 | NaN | NaN | NaN | 78283.000000 | NaN |
max | 1866.000000 | NaN | NaN | NaN | 1.000000 | 1.000000 | 1.000000 | 2024.000000 | 1700.000000 | NaN | NaN | NaN | 900206.000000 | NaN |
20.6 Missing Data
“Black digital practice requires researchers to witness and remark on the marked and unmarked bodies, the ones that defy computation, and finds ways to hold the null vlaues up to the light.”
—Jessica Marie Johnson
20.6.1 .isna() / .notna()
Pandas has special ways of dealing with missing data. As you may have already noticed, blank rows in a CSV file show up as NaN
in a Pandas DataFrame.
To filter and count the number of missing/not missing values in a dataset, we can use the special .isna()
and .notna()
methods on a DataFrame or Series object.
'percent_women'].notna() slave_voyages_df[
0 False
1 False
2 False
3 False
4 False
...
20736 False
20737 False
20738 False
20739 False
20740 False
Name: percent_women, Length: 20741, dtype: bool
= slave_voyages_df['percent_women'].notna() not_blank_women
To explicitly count the number of blank rows, we can use the .value_counts()
method.
'percent_women'].isna().value_counts() slave_voyages_df[
True 17847
False 2894
Name: percent_women, dtype: int64
To quickly transform these numbers into percentages, we can set the normalize=
parameter to True.
'percent_women'].isna().value_counts(normalize=True) slave_voyages_df[
True 0.86047
False 0.13953
Name: percent_women, dtype: float64
Because the .count()
method always excludes NaN values, we can also count the number of values in each column and divide by the total number of rows in each column (len()
) to find the percentage of not blank data in every column.
/ len(slave_voyages_df) slave_voyages_df.count()
year_of_arrival 1.000000
flag 0.944169
place_of_purchase 0.996239
place_of_landing 1.000000
percent_women 0.139530
percent_children 0.141121
percent_men 0.139530
total_embarked 0.999084
total_disembarked 0.998939
resistance_label 0.017935
vessel_name 1.000000
captain's_name 0.935153
voyage_id 1.000000
sources 1.000000
dtype: float64
20.7 Sort Columns
We can sort a DataFrame with the .sort_values()
method, inside of which we include the parameter by=
and indicate the name of the column we want to sort by (written in quotation marks).
20.7.1 ❓ Which voyages had the highest percentage of women on board?
slave_voyages_df...
SyntaxError: invalid syntax (4198010776.py, line 1)
20.8 Calculate Columns
We can do different calculations on columns with built-in Pandas functions. These calculations will ignore NaN
values.
Pandas calculations | Explanation |
---|---|
.count() |
Number of observations |
.sum() |
Sum of values |
.mean() |
Mean of values |
.median() |
Median of values |
.min() |
Minimum |
.max() |
Maximum |
.mode() |
Mode |
.std() |
Unbiased standard deviation |
20.8.1 ❓ What was the average percentage of women aboard the voyages?
slave_voyages_df
year_of_arrival | flag | place_of_purchase | place_of_landing | percent_women | percent_children | percent_men | total_embarked | total_disembarked | resistance_label | vessel_name | captain's_name | voyage_id | sources | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1520 | NaN | Portuguese Guinea | San Juan | NaN | NaN | NaN | 324.0 | 259.0 | NaN | NaN | 42987 | [u'AGI,Patronato 175, r.9<><p><em>AG!</em> (Se... | |
1 | 1525 | Portugal / Brazil | Sao Tome | Hispaniola, unspecified | NaN | NaN | NaN | 359.0 | 287.0 | NaN | S Maria de Bogoña | Monteiro, Pero | 46473 | [u'ANTT,CC,Parte II, maco 131, doc 54<><i>Inst... |
2 | 1526 | Spain / Uruguay | Cape Verde Islands | Cuba, port unspecified | NaN | NaN | NaN | 359.0 | 287.0 | NaN | Carega, Esteban (?) | 11297 | [u'Pike,60-1,172<>Pike, Ruth, <i>Enterprise</i... | |
3 | 1526 | Spain / Uruguay | Cape Verde Islands | Cuba, port unspecified | NaN | NaN | NaN | 359.0 | 287.0 | NaN | Carega, Esteban (?) | 11298 | [u'Pike,60-1,172<>Pike, Ruth, <i>Enterprise</i... | |
4 | 1526 | NaN | Cape Verde Islands | Caribbean (colony unspecified) | NaN | NaN | NaN | 359.0 | 287.0 | NaN | S Anton | Leon, Juan de | 42631 | [u'Chaunus, 3: 162-63<><p>Chaunus, <em>xxxxxx<... |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
20736 | 1864 | Spain / Uruguay | Africa., port unspecified | Cuba, port unspecified | NaN | NaN | NaN | 488.0 | 465.0 | NaN | Polaca | NaN | 46554 | [u'AHNM, Ultramar, Leg. 3551, 6<><i>Archivo Hi... |
20737 | 1865 | Spain / Uruguay | Africa., port unspecified | Isla de Pinas | NaN | NaN | NaN | 152.0 | 145.0 | Slave insurrection | Gato | NaN | 4394 | [u'IUP,ST,50/B/137<>Great Britain, <i>Irish Un... |
20738 | 1865 | NaN | Africa., port unspecified | Mariel | NaN | NaN | NaN | 780.0 | 650.0 | NaN | NaN | 4395 | [u'IUP,ST,50/B/144<>Great Britain, <i>Irish Un... | |
20739 | 1865 | NaN | Congo River | Cuba, port unspecified | NaN | NaN | NaN | 1265.0 | 1004.0 | NaN | Cicerón | Mesquita | 5052 | [u'IUP,ST,50/A/23-4<>Great Britain, <i>Irish U... |
20740 | 1866 | NaN | Africa., port unspecified | Cuba, port unspecified | NaN | NaN | NaN | 851.0 | 700.0 | NaN | NaN | 4998 | [u'IUP,ST,50/B/220<>Great Britain, <i>Irish Un... |
20741 rows × 14 columns
20.8.2 ❓ What was the highest percentage of women aboard the voyages?
slave_voyages_df
year_of_arrival | flag | place_of_purchase | place_of_landing | percent_women | percent_children | percent_men | total_embarked | total_disembarked | resistance_label | vessel_name | captain's_name | voyage_id | sources | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1520 | NaN | Portuguese Guinea | San Juan | NaN | NaN | NaN | 324.0 | 259.0 | NaN | NaN | 42987 | [u'AGI,Patronato 175, r.9<><p><em>AG!</em> (Se... | |
1 | 1525 | Portugal / Brazil | Sao Tome | Hispaniola, unspecified | NaN | NaN | NaN | 359.0 | 287.0 | NaN | S Maria de Bogoña | Monteiro, Pero | 46473 | [u'ANTT,CC,Parte II, maco 131, doc 54<><i>Inst... |
2 | 1526 | Spain / Uruguay | Cape Verde Islands | Cuba, port unspecified | NaN | NaN | NaN | 359.0 | 287.0 | NaN | Carega, Esteban (?) | 11297 | [u'Pike,60-1,172<>Pike, Ruth, <i>Enterprise</i... | |
3 | 1526 | Spain / Uruguay | Cape Verde Islands | Cuba, port unspecified | NaN | NaN | NaN | 359.0 | 287.0 | NaN | Carega, Esteban (?) | 11298 | [u'Pike,60-1,172<>Pike, Ruth, <i>Enterprise</i... | |
4 | 1526 | NaN | Cape Verde Islands | Caribbean (colony unspecified) | NaN | NaN | NaN | 359.0 | 287.0 | NaN | S Anton | Leon, Juan de | 42631 | [u'Chaunus, 3: 162-63<><p>Chaunus, <em>xxxxxx<... |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
20736 | 1864 | Spain / Uruguay | Africa., port unspecified | Cuba, port unspecified | NaN | NaN | NaN | 488.0 | 465.0 | NaN | Polaca | NaN | 46554 | [u'AHNM, Ultramar, Leg. 3551, 6<><i>Archivo Hi... |
20737 | 1865 | Spain / Uruguay | Africa., port unspecified | Isla de Pinas | NaN | NaN | NaN | 152.0 | 145.0 | Slave insurrection | Gato | NaN | 4394 | [u'IUP,ST,50/B/137<>Great Britain, <i>Irish Un... |
20738 | 1865 | NaN | Africa., port unspecified | Mariel | NaN | NaN | NaN | 780.0 | 650.0 | NaN | NaN | 4395 | [u'IUP,ST,50/B/144<>Great Britain, <i>Irish Un... | |
20739 | 1865 | NaN | Congo River | Cuba, port unspecified | NaN | NaN | NaN | 1265.0 | 1004.0 | NaN | Cicerón | Mesquita | 5052 | [u'IUP,ST,50/A/23-4<>Great Britain, <i>Irish U... |
20740 | 1866 | NaN | Africa., port unspecified | Cuba, port unspecified | NaN | NaN | NaN | 851.0 | 700.0 | NaN | NaN | 4998 | [u'IUP,ST,50/B/220<>Great Britain, <i>Irish Un... |
20741 rows × 14 columns
20.8.3 ❓ What was the average percentage of children aboard the voyages?
slave_voyages_df
year_of_arrival | flag | place_of_purchase | place_of_landing | percent_women | percent_children | percent_men | total_embarked | total_disembarked | resistance_label | vessel_name | captain's_name | voyage_id | sources | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1520 | NaN | Portuguese Guinea | San Juan | NaN | NaN | NaN | 324.0 | 259.0 | NaN | NaN | 42987 | [u'AGI,Patronato 175, r.9<><p><em>AG!</em> (Se... | |
1 | 1525 | Portugal / Brazil | Sao Tome | Hispaniola, unspecified | NaN | NaN | NaN | 359.0 | 287.0 | NaN | S Maria de Bogoña | Monteiro, Pero | 46473 | [u'ANTT,CC,Parte II, maco 131, doc 54<><i>Inst... |
2 | 1526 | Spain / Uruguay | Cape Verde Islands | Cuba, port unspecified | NaN | NaN | NaN | 359.0 | 287.0 | NaN | Carega, Esteban (?) | 11297 | [u'Pike,60-1,172<>Pike, Ruth, <i>Enterprise</i... | |
3 | 1526 | Spain / Uruguay | Cape Verde Islands | Cuba, port unspecified | NaN | NaN | NaN | 359.0 | 287.0 | NaN | Carega, Esteban (?) | 11298 | [u'Pike,60-1,172<>Pike, Ruth, <i>Enterprise</i... | |
4 | 1526 | NaN | Cape Verde Islands | Caribbean (colony unspecified) | NaN | NaN | NaN | 359.0 | 287.0 | NaN | S Anton | Leon, Juan de | 42631 | [u'Chaunus, 3: 162-63<><p>Chaunus, <em>xxxxxx<... |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
20736 | 1864 | Spain / Uruguay | Africa., port unspecified | Cuba, port unspecified | NaN | NaN | NaN | 488.0 | 465.0 | NaN | Polaca | NaN | 46554 | [u'AHNM, Ultramar, Leg. 3551, 6<><i>Archivo Hi... |
20737 | 1865 | Spain / Uruguay | Africa., port unspecified | Isla de Pinas | NaN | NaN | NaN | 152.0 | 145.0 | Slave insurrection | Gato | NaN | 4394 | [u'IUP,ST,50/B/137<>Great Britain, <i>Irish Un... |
20738 | 1865 | NaN | Africa., port unspecified | Mariel | NaN | NaN | NaN | 780.0 | 650.0 | NaN | NaN | 4395 | [u'IUP,ST,50/B/144<>Great Britain, <i>Irish Un... | |
20739 | 1865 | NaN | Congo River | Cuba, port unspecified | NaN | NaN | NaN | 1265.0 | 1004.0 | NaN | Cicerón | Mesquita | 5052 | [u'IUP,ST,50/A/23-4<>Great Britain, <i>Irish U... |
20740 | 1866 | NaN | Africa., port unspecified | Cuba, port unspecified | NaN | NaN | NaN | 851.0 | 700.0 | NaN | NaN | 4998 | [u'IUP,ST,50/B/220<>Great Britain, <i>Irish Un... |
20741 rows × 14 columns
20.8.4 ❓ What was the highest percentage of children aboard the voyages?
slave_voyages_df
year_of_arrival | flag | place_of_purchase | place_of_landing | percent_women | percent_children | percent_men | total_embarked | total_disembarked | resistance_label | vessel_name | captain's_name | voyage_id | sources | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1520 | NaN | Portuguese Guinea | San Juan | NaN | NaN | NaN | 324.0 | 259.0 | NaN | NaN | 42987 | [u'AGI,Patronato 175, r.9<><p><em>AG!</em> (Se... | |
1 | 1525 | Portugal / Brazil | Sao Tome | Hispaniola, unspecified | NaN | NaN | NaN | 359.0 | 287.0 | NaN | S Maria de Bogoña | Monteiro, Pero | 46473 | [u'ANTT,CC,Parte II, maco 131, doc 54<><i>Inst... |
2 | 1526 | Spain / Uruguay | Cape Verde Islands | Cuba, port unspecified | NaN | NaN | NaN | 359.0 | 287.0 | NaN | Carega, Esteban (?) | 11297 | [u'Pike,60-1,172<>Pike, Ruth, <i>Enterprise</i... | |
3 | 1526 | Spain / Uruguay | Cape Verde Islands | Cuba, port unspecified | NaN | NaN | NaN | 359.0 | 287.0 | NaN | Carega, Esteban (?) | 11298 | [u'Pike,60-1,172<>Pike, Ruth, <i>Enterprise</i... | |
4 | 1526 | NaN | Cape Verde Islands | Caribbean (colony unspecified) | NaN | NaN | NaN | 359.0 | 287.0 | NaN | S Anton | Leon, Juan de | 42631 | [u'Chaunus, 3: 162-63<><p>Chaunus, <em>xxxxxx<... |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
20736 | 1864 | Spain / Uruguay | Africa., port unspecified | Cuba, port unspecified | NaN | NaN | NaN | 488.0 | 465.0 | NaN | Polaca | NaN | 46554 | [u'AHNM, Ultramar, Leg. 3551, 6<><i>Archivo Hi... |
20737 | 1865 | Spain / Uruguay | Africa., port unspecified | Isla de Pinas | NaN | NaN | NaN | 152.0 | 145.0 | Slave insurrection | Gato | NaN | 4394 | [u'IUP,ST,50/B/137<>Great Britain, <i>Irish Un... |
20738 | 1865 | NaN | Africa., port unspecified | Mariel | NaN | NaN | NaN | 780.0 | 650.0 | NaN | NaN | 4395 | [u'IUP,ST,50/B/144<>Great Britain, <i>Irish Un... | |
20739 | 1865 | NaN | Congo River | Cuba, port unspecified | NaN | NaN | NaN | 1265.0 | 1004.0 | NaN | Cicerón | Mesquita | 5052 | [u'IUP,ST,50/A/23-4<>Great Britain, <i>Irish U... |
20740 | 1866 | NaN | Africa., port unspecified | Cuba, port unspecified | NaN | NaN | NaN | 851.0 | 700.0 | NaN | NaN | 4998 | [u'IUP,ST,50/B/220<>Great Britain, <i>Irish Un... |
20741 rows × 14 columns
20.9 Rename Columns
We can rename columns with the .rename()
method and the columns=
parameter. For example, we can rename the “flag” column “national_affiliation.”
={'flag': 'national_affiliation'}) slave_voyages_df.rename(columns
year_of_arrival | national_affiliation | place_of_purchase | place_of_landing | percent_women | percent_children | percent_men | total_embarked | total_disembarked | resistance_label | vessel_name | captain's_name | voyage_id | sources | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1520 | NaN | Portuguese Guinea | San Juan | NaN | NaN | NaN | 324.0 | 259.0 | NaN | NaN | 42987 | [u'AGI,Patronato 175, r.9<><p><em>AG!</em> (Se... | |
1 | 1525 | Portugal / Brazil | Sao Tome | Hispaniola, unspecified | NaN | NaN | NaN | 359.0 | 287.0 | NaN | S Maria de Bogoña | Monteiro, Pero | 46473 | [u'ANTT,CC,Parte II, maco 131, doc 54<><i>Inst... |
2 | 1526 | Spain / Uruguay | Cape Verde Islands | Cuba, port unspecified | NaN | NaN | NaN | 359.0 | 287.0 | NaN | Carega, Esteban (?) | 11297 | [u'Pike,60-1,172<>Pike, Ruth, <i>Enterprise</i... | |
3 | 1526 | Spain / Uruguay | Cape Verde Islands | Cuba, port unspecified | NaN | NaN | NaN | 359.0 | 287.0 | NaN | Carega, Esteban (?) | 11298 | [u'Pike,60-1,172<>Pike, Ruth, <i>Enterprise</i... | |
4 | 1526 | NaN | Cape Verde Islands | Caribbean (colony unspecified) | NaN | NaN | NaN | 359.0 | 287.0 | NaN | S Anton | Leon, Juan de | 42631 | [u'Chaunus, 3: 162-63<><p>Chaunus, <em>xxxxxx<... |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
20736 | 1864 | Spain / Uruguay | Africa., port unspecified | Cuba, port unspecified | NaN | NaN | NaN | 488.0 | 465.0 | NaN | Polaca | NaN | 46554 | [u'AHNM, Ultramar, Leg. 3551, 6<><i>Archivo Hi... |
20737 | 1865 | Spain / Uruguay | Africa., port unspecified | Isla de Pinas | NaN | NaN | NaN | 152.0 | 145.0 | Slave insurrection | Gato | NaN | 4394 | [u'IUP,ST,50/B/137<>Great Britain, <i>Irish Un... |
20738 | 1865 | NaN | Africa., port unspecified | Mariel | NaN | NaN | NaN | 780.0 | 650.0 | NaN | NaN | 4395 | [u'IUP,ST,50/B/144<>Great Britain, <i>Irish Un... | |
20739 | 1865 | NaN | Congo River | Cuba, port unspecified | NaN | NaN | NaN | 1265.0 | 1004.0 | NaN | Cicerón | Mesquita | 5052 | [u'IUP,ST,50/A/23-4<>Great Britain, <i>Irish U... |
20740 | 1866 | NaN | Africa., port unspecified | Cuba, port unspecified | NaN | NaN | NaN | 851.0 | 700.0 | NaN | NaN | 4998 | [u'IUP,ST,50/B/220<>Great Britain, <i>Irish Un... |
20741 rows × 14 columns
Renaming the “flag” column as above will only momentarily change that column’s name, however. If we display our DataFrame, we will see that the column name has not changed permamently.
1) slave_voyages_df.head(
year_of_arrival | flag | place_of_purchase | place_of_landing | percent_women | percent_children | percent_men | total_embarked | total_disembarked | resistance_label | vessel_name | captain's_name | voyage_id | sources | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1520 | NaN | Portuguese Guinea | San Juan | NaN | NaN | NaN | 324.0 | 259.0 | NaN | NaN | 42987 | [u'AGI,Patronato 175, r.9<><p><em>AG!</em> (Se... |
To save changes in the DataFrame, we need to reassign the DataFrame to the same variable.
= slave_voyages_df.rename(columns={'flag': 'national_affiliation'}) slave_voyages_df
1) slave_voyages_df.head(
year_of_arrival | national_affiliation | place_of_purchase | place_of_landing | percent_women | percent_children | percent_men | total_embarked | total_disembarked | resistance_label | vessel_name | captain's_name | voyage_id | sources | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1520 | NaN | Portuguese Guinea | San Juan | NaN | NaN | NaN | 324.0 | 259.0 | NaN | NaN | 42987 | [u'AGI,Patronato 175, r.9<><p><em>AG!</em> (Se... |
20.10 Groupby Columns
The Pandas function.groupby()
allows us to group data and perform calculations on the groups.
For example, Jennifer Morgan writes about how some nations recorded more information about the gender of the enslaved people aboard their voyages than other nations did. To see the breakdown of gender information by nation, we can use a .groupby()
function.
20.10.1 ❓ Which nation recorded the most information about gender?
To answer this question, you can use the .groupby()
method and the .count()
method, which will show you the number of non-blank values in each column.
slave_voyages_df...
SyntaxError: invalid syntax (4198010776.py, line 1)
“For example, patterns emerge that suggest that English slave ship captains provided the most data related to the age or sex characteristics of the captives they transported and sold into slavery…The degree to which the practice of recording the sex of the passengers on board accords to national origin raises some interesting questions about the possible correlations between certain notational and national presumptions of accountability.”
-Jennifer Morgan, “Accounting for ‘The Most Excruciating Torment’”
20.11 Add Columns
To add a column, we simply put a new column name in square brackets and set it equal to whatever we want the new column to be.
For example, if we wanted to create new columns for the total women and men aboard each voyage, we could set them equal to the product of the “total_disembarked” column * the “percent_women” / “percent_men” columns.
'total_women'] = slave_voyages_df['total_embarked'] * slave_voyages_df['percent_women'] slave_voyages_df[
'total_men'] = slave_voyages_df['total_embarked'] * slave_voyages_df['percent_men'] slave_voyages_df[
If we scroll all the way to the right side of the DataFrame, we can see that these columns have been added.
1) slave_voyages_df.head(
year_of_arrival | national_affiliation | place_of_purchase | place_of_landing | percent_women | percent_children | percent_men | total_embarked | total_disembarked | resistance_label | vessel_name | captain's_name | voyage_id | sources | total_women | total_men | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1520 | NaN | Portuguese Guinea | San Juan | NaN | NaN | NaN | 324.0 | 259.0 | NaN | NaN | 42987 | [u'AGI,Patronato 175, r.9<><p><em>AG!</em> (Se... | NaN | NaN |
20.12 Plots and Data Visualizations
'national_affiliation')['percent_women'].count() slave_voyages_df.groupby(
national_affiliation
Denmark / Baltic 119
France 708
Great Britain 1123
Netherlands 200
Other 0
Portugal / Brazil 42
Spain / Uruguay 468
U.S.A. 223
Name: percent_women, dtype: int64
= slave_voyages_df.groupby('national_affiliation')['percent_women'].count() nation_by_gender
='bar', title='Trans-Atlantic Slave Trade (Americas): \n Slave Voyages with Recorded Gender Information') nation_by_gender.plot(kind
<AxesSubplot:title={'center':'Trans-Atlantic Slave Trade (Americas): \n Slave Voyages with Recorded Gender Information'}, xlabel='national_affiliation'>
To make a time series, we would typically want to convert our date column into datetime values rather than integers. Since we’re only working with year information, however, we can make some simple time series plots just by grouping by the year column and performing calculations on those year groupings.
'year_of_arrival')['total_women'].sum() slave_voyages_df.groupby(
year_of_arrival
1520 0.0
1525 0.0
1526 0.0
1527 0.0
1532 0.0
...
1862 0.0
1863 0.0
1864 0.0
1865 0.0
1866 0.0
Name: total_women, Length: 330, dtype: float64
= slave_voyages_df.groupby('year_of_arrival')['total_women'].sum() total_women_by_year
total_women_by_year.plot()
<AxesSubplot:xlabel='year_of_arrival'>
='line', title="Trans-Atlantic Slave Trade (Americas):\nTotal Number of Enslaved Women on Voyages") total_women_by_year.plot(kind
<AxesSubplot:title={'center':'Trans-Atlantic Slave Trade (Americas):\nTotal Number of Enslaved Women on Voyages'}, xlabel='year_of_arrival'>
We can plot multiple columns at the same time by using two square brackets. For example, we can compare the total number of women and men aboard the voyages.
= slave_voyages_df.groupby('year_of_arrival')['total_men'].sum() total_men_by_year
='line', legend= True, title="Trans-Atlantic Slave Trade (Americas):\nTotal Number of Enslaved Women on Voyages")
total_women_by_year.plot(kind=True) total_men_by_year.plot(legend
<AxesSubplot:title={'center':'Trans-Atlantic Slave Trade (Americas):\nTotal Number of Enslaved Women on Voyages'}, xlabel='year_of_arrival'>
= total_women_by_year.plot(kind='line', legend= True, title="Trans-Atlantic Slave Trade (Americas):\nTotal Number of Enslaved Women on Voyages")
axis =axis, legend=True) total_men_by_year.plot(ax
<AxesSubplot:title={'center':'Trans-Atlantic Slave Trade (Americas):\nTotal Number of Enslaved Women on Voyages'}, xlabel='year_of_arrival'>
= total_women_by_year.plot(kind='line', label="Total Women", legend= True, title="Trans-Atlantic Slave Trade (Americas):\nTotal Number of Enslaved Women on Voyages")
axis =axis, label="Total Men", legend=True) total_men_by_year.plot(ax
<AxesSubplot:title={'center':'Trans-Atlantic Slave Trade (Americas):\nTotal Number of Enslaved Women on Voyages'}, xlabel='year_of_arrival'>
= slave_voyages_df.groupby('year_of_arrival')['total_embarked'].sum() total_embarked_by_year
='line', label="Total Women", legend= True, title="Trans-Atlantic Slave Trade (Americas):\nTotal Number of Enslaved Women on Voyages")
total_women_by_year.plot(kind="Total Men", legend=True)
total_men_by_year.plot(label="Total Embarked", legend=True) total_embarked_by_year.plot(label
<AxesSubplot:title={'center':'Trans-Atlantic Slave Trade (Americas):\nTotal Number of Enslaved Women on Voyages'}, xlabel='year_of_arrival'>
= total_women_by_year.plot(kind='line', label="Total Women", legend= True, title="Trans-Atlantic Slave Trade (Americas):\nTotal Number of Enslaved Women on Voyages")
axis =axis, label="Total Men", legend=True)
total_men_by_year.plot(ax=axis, label="Total Embarked", legend=True) total_embarked_by_year.plot(ax
<AxesSubplot:title={'center':'Trans-Atlantic Slave Trade (Americas):\nTotal Number of Enslaved Women on Voyages'}, xlabel='year_of_arrival'>
If there is anything wrong, please open an issue on GitHub or email f.pianzola@rug.nl