18  Pandas Basics Part 1 — Workbook

Author

Melanie Walsh

If you want to save your work from this notebook, you should be sure to make a copy of it on your own computer.

In this workbook, we’re going to explore the basics of the Python library Pandas.

18.1 Import Pandas

To use the Pandas library, we first need to import it.

import pandas as pd

18.2 Change Display Settings

By default, Pandas will display at most 60 rows and 20 columns before truncating the output. I often change Pandas’ default display settings to show more rows or columns.

pd.options.display.max_rows = 200
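The column limit can be changed in the same way as the row limit. The sketch below is only an illustration: the value 50 for max_columns is an arbitrary choice, not a setting used in this workbook. It also reads both settings back with pd.get_option() to confirm them.

```python
import pandas as pd

# Show up to 200 rows before Pandas truncates the display (as in the workbook).
pd.options.display.max_rows = 200

# Assumption for illustration: 50 is an arbitrary column limit.
pd.options.display.max_columns = 50

# Read the settings back to confirm that they were applied.
print(pd.get_option("display.max_rows"))
print(pd.get_option("display.max_columns"))
```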

18.3 Get Data

To read in a CSV file, we will use the function pd.read_csv() and pass it the path to our file as a string.

pd.read_csv('Bellevue_Almshouse_Dataset.csv')
date_in first_name last_name full_name age gender disease profession children sent_to sender1 sender2
0 1847-04-17 Mary Gallagher Mary Gallagher 28.0 f recent emigrant married Child Alana 10 days Hospital superintendent hd. gibbens
1 1847-04-08 John Sanin (?) John Sanin (?) 19.0 m recent emigrant laborer Catherine 2 mo NaN george w. anderson edward witherell
2 1847-04-17 Anthony Clark Anthony Clark 60.0 m recent emigrant laborer Charles Riley afed 10 days Hospital george w. anderson edward witherell
3 1847-04-08 Lawrence Feeney Lawrence Feeney 32.0 m recent emigrant laborer Child NaN george w. anderson james donnelly
4 1847-04-13 Henry Joyce Henry Joyce 21.0 m recent emigrant NaN Child 1 mo NaN george w. anderson edward witherell
... ... ... ... ... ... ... ... ... ... ... ... ...
9593 1846-05-23 Joseph Aton Joseph Aton 69.0 m NaN shoemaker NaN NaN [blank] NaN
9594 1847-06-17 Mary Smith Mary Smith 47.0 f NaN NaN NaN Hospital Ward 38 [blank] NaN
9595 1847-06-22 Francis Riley Francis Riley 29.0 m lame superintendent NaN NaN [blank] NaN
9596 1847-07-02 Martin Dunn Martin Dunn 4.0 m NaN NaN NaN NaN [blank] NaN
9597 1847-07-08 Elizabeth Post Elizabeth Post 32.0 f NaN NaN NaN Hospital [blank] NaN

9598 rows × 12 columns

type(pd.read_csv('Bellevue_Almshouse_Dataset.csv'))
pandas.core.frame.DataFrame

This creates a Pandas DataFrame object, one of the two main data structures in Pandas. A DataFrame looks and acts a lot like a spreadsheet, but it has special powers and functions that we will discuss below and in the next few lessons.

Pandas objects    Explanation
DataFrame         Like a spreadsheet, 2-dimensional
Series            Like a column, 1-dimensional

We assign the DataFrame to a variable called bellevue_df. It is common convention to name DataFrame variables df, but we want to be a bit more specific.

bellevue_df = pd.read_csv('Bellevue_Almshouse_Dataset.csv')
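If you do not have the CSV file at hand, the same steps can be tried on a tiny made-up sample. In this sketch the three rows and the names sample_csv and sample_df are hypothetical, and an in-memory io.StringIO object stands in for a file on disk.

```python
import io

import pandas as pd

# Hypothetical three-row sample; StringIO stands in for a CSV file on disk.
sample_csv = io.StringIO(
    "first_name,age,profession\n"
    "Mary,28,married\n"
    "John,19,laborer\n"
    "Anthony,,laborer\n"
)

# read_csv accepts a file path or a file-like object and returns a DataFrame.
sample_df = pd.read_csv(sample_csv)

print(type(sample_df))
print(sample_df.shape)  # (number of rows, number of columns)
```

The empty age in the last row is read in as a missing value (NaN), which is why a column of ages is usually stored as floats rather than integers.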

18.4 Begin to Examine Patterns

18.4.1 Select Columns as Series Objects []

To select a column from the DataFrame, we will type the name of the DataFrame followed by square brackets and a column name in quotation marks.

bellevue_df['age']
0       28.0
1       19.0
2       60.0
3       32.0
4       21.0
        ... 
9593    69.0
9594    47.0
9595    29.0
9596     4.0
9597    32.0
Name: age, Length: 9598, dtype: float64

Technically, a single column in a DataFrame is a Series object.

type(bellevue_df['age'])
pandas.core.series.Series
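To see the difference between the two data structures, it may help to compare single and double square brackets on a small made-up DataFrame. The data and variable names here are hypothetical. Single brackets with one column name return a Series, while a list of column names inside the brackets returns a DataFrame.

```python
import pandas as pd

# Hypothetical sample data for illustration.
sample_df = pd.DataFrame(
    {
        "first_name": ["Mary", "John", "Anthony"],
        "age": [28.0, 19.0, 60.0],
    }
)

age_series = sample_df["age"]    # one column name: a 1-dimensional Series
age_frame = sample_df[["age"]]   # a list of names: a 2-dimensional DataFrame

print(type(age_series).__name__)
print(type(age_frame).__name__)
```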

18.5 Pandas Methods

Pandas method      Explanation
.sum()             Sum of values
.mean()            Mean of values
.median()          Median of values
.min()             Minimum
.max()             Maximum
.mode()            Mode
.std()             Unbiased standard deviation
.count()           Total number of non-blank values
.value_counts()    Frequency of unique values
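Here is a minimal sketch of how a few of these methods behave, using a small made-up Series rather than the Bellevue data so that the questions below remain yours to answer. The values and the name ages are hypothetical. Note that the missing value is skipped by each calculation.

```python
import pandas as pd

# Hypothetical ages; None becomes NaN (a missing value).
ages = pd.Series([28.0, 19.0, 60.0, 32.0, None], name="age")

print(ages.count())   # number of non-missing values
print(ages.mean())    # mean of the non-missing values
print(ages.median())  # median of the non-missing values
print(ages.min())     # smallest value
print(ages.max())     # largest value
```

To apply a method to a column of a DataFrame, add it after the column selection, in the form dataframe['column'].method().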

18.5.1 ❓ How old (on average) were the people admitted to the Bellevue Almshouse?

bellevue_df['age']
0       28.0
1       19.0
2       60.0
3       32.0
4       21.0
        ... 
9593    69.0
9594    47.0
9595    29.0
9596     4.0
9597    32.0
Name: age, Length: 9598, dtype: float64

18.5.2 ❓ How old was the oldest person admitted to Bellevue?

bellevue_df['age']
0       28.0
1       19.0
2       60.0
3       32.0
4       21.0
        ... 
9593    69.0
9594    47.0
9595    29.0
9596     4.0
9597    32.0
Name: age, Length: 9598, dtype: float64

18.5.3 ❓ How young was the youngest person?

bellevue_df['age']
0       28.0
1       19.0
2       60.0
3       32.0
4       21.0
        ... 
9593    69.0
9594    47.0
9595    29.0
9596     4.0
9597    32.0
Name: age, Length: 9598, dtype: float64

18.5.4 ❓ What were the most common professions among these Irish immigrants?

To count the values in a column, we can use the .value_counts() method.

What patterns do you notice in this list? What seems strange to you? What can we learn about the people in the dataset and the people who created the dataset?

bellevue_df['profession'].value_counts()
laborer                        3116
married                        1586
spinster                       1522
widow                          1055
shoemaker                       158
tailor                          116
blacksmith                      104
mason                            99
weaver                           66
carpenter                        65
baker                            48
waiter                           41
clerk                            28
stone cutter                     27
painter                          26
gardener                         25
cooper                           24
farmer                           21
peddler                          20
cartman                          15
wheelwright                      14
hostler                          12
hatter                           12
printer                          11
butcher                          11
tinsmith                         10
boot maker                       10
teacher                          10
coachman                         10
sailor                           10
servant                           9
boiler maker                      9
harness maker                     9
cabinet maker                     8
boatman                           8
nailer                            8
porter                            6
plasterer                         6
nail maker                        6
(illegible)                       6
umbrella maker                    6
seaman                            5
soldier                           5
marble polisher                   5
grocer                            5
machinist                         5
brick layer                       4
ship carpenter                    4
barkeeper                         4
book maker                        4
hackman                           4
turner                            4
dyer                              4
tinman                            4
miner                             3
merchant                          3
starch maker                      3
courier                           3
seamstress                        3
saddler                           3
varnisher                         3
locksmith                         3
brass founder                     3
pavier                            3
store keeper                      3
brewer                            3
paper stainer                     3
lawyer                            2
barker                            2
morocco dresser                   2
cab driver                        2
carver                            2
glass cutter                      2
copper smith                      2
barber                            2
engraver                          2
engineer                          2
food carrier                      2
quarry man                        2
rigger                            2
stevedore                         2
single                            2
sail maker                        2
stove maker                       2
shipwright                        2
type caster                       2
glove maker                       2
tanner                            2
sawyer                            2
calico printer                    2
paper maker                       2
chair maker                       2
gw anderson per e witherell       1
drayman                           1
groom                             1
plumber                           1
soap boiler                       1
paner                             1
wool manufacturer                 1
joiner                            1
hodman                            1
leather draper                    1
flag cutter                       1
rectifier                         1
parrier                           1
teamster                          1
clery                             1
book seller                       1
manufacturer                      1
wood sawyer                       1
auctioneer                        1
dugget                            1
mariner                           1
cook                              1
polisher                          1
sham                              1
tavern keeper                     1
surveyor                          1
ship sawyer                       1
oysterman                         1
hacker                            1
moulder                           1
magician                          1
builder                           1
soapmaker                         1
jeweller                          1
strap maker                       1
upholsterer                       1
book keeper                       1
school teacher                    1
shop keeper                       1
iron moulder                      1
glover                            1
flagger                           1
book binder                       1
apothecary                        1
saw maker                         1
marble cutter                     1
jobber                            1
flag pavier                       1
musician                          1
marble sawyer                     1
leather dresser                   1
chandler                          1
paper-carrier                     1
book pedlar                       1
cabrener                          1
miniature painter                 1
cabman                            1
truss maker                       1
marketmab                         1
copoper                           1
stage driver                      1
brass turner                      1
fishman                           1
brush maker                       1
soap comber                       1
manow(?)                          1
waterman                          1
cloth printer                     1
stone sawyer                      1
schoolmaster                      1
dancing master                    1
music painter                     1
croper                            1
cotton sampler                    1
gas fitter                        1
gas manufacturer                  1
caulker                           1
basket maker                      1
gun maker                         1
superintendent                    1
Name: profession, dtype: int64
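Two optional parameters of .value_counts() can help when reading a list like this one. The sketch below uses a small made-up Series (hypothetical values) and is not part of the original workbook: dropna=False also counts missing values, and normalize=True returns proportions instead of raw counts.

```python
import pandas as pd

# Hypothetical professions; None becomes a missing value.
professions = pd.Series(
    ["laborer", "married", "laborer", "spinster", None, "laborer"],
    name="profession",
)

# Default: missing values are left out, most frequent value first.
counts = professions.value_counts()

# Count missing values as their own category.
counts_with_missing = professions.value_counts(dropna=False)

# Proportions of the non-missing values instead of raw counts.
shares = professions.value_counts(normalize=True)

print(counts)
print(counts_with_missing)
print(shares)
```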

18.5.5 ❓ What are the most common diseases?

bellevue_df['disease'].value_counts()
sickness           2710
recent emigrant    1975
destitution         846
fever               192
insane              138
pregnant            134
sore                 79
intemperance         71
illegible            47
typhus               46
injuries             32
ulcers               26
ophthalmia           19
vagrant              17
lame                 15
debility             12
rheumatism           11
blind                 9
bronchitis            9
dropsy                8
phthisis              8
syphilis              7
old age               7
dysentery             6
erysipelas            6
diarrhea              6
cripple               5
broken bone           5
measles               3
drunkenness           3
burn                  3
delusion dreams       2
scrofula              2
tuberculosis          2
pneumonia             2
fits                  2
abandonment           2
piles                 2
sprain                2
jaundice              2
scarletina            2
phagadaena            1
spinal disease        1
tumor                 1
smallpox              1
horrors               1
hernia                1
paralysis             1
abscess               1
neuralgia             1
hypochondria          1
ungovernable          1
from trial            1
sunburn               1
colic                 1
orchitis              1
beggar                1
contusion             1
rickets               1
ascites               1
cut                   1
deaf                  1
congested head        1
eczema                1
bruise                1
severed limb          1
emotional             1
poorly                1
disabled              1
bleeding              1
seizure               1
del femur             1
throat cut            1
ague                  1
asthma                1
Name: disease, dtype: int64

18.5.6 ❓ Where were most people sent?

bellevue_df['sent_to'].value_counts()
Hospital                             3882
Blackwell's Island                    571
Bellevue Garret                       250
Randall's Island                      172
Shanty                                109
Lunatic Asylum                         93
CHECK                                  90
Bellevue Hospital Chapel               78
Almshouse                              64
Hospital Ward 38                       62
Long Island Farms                      45
Hospital Ward 46                       42
Lunatic Asylum (?)                     37
Hospital Ward 18                       23
Blackwell's Island Ward 38             14
Hospital Ward 39                       11
Hospital Ward 16                       11
Hospital Blackwell's Island            10
Hospital on Blackwell's Island         10
Lunatic Asylum Ward 38                  5
Hospital Ward 13                        5
Hosptial Ward 17                        5
Hospital Ward 11                        4
Hospital Ward 17                        3
Hospital Morgue                         3
Hospital Ward 22                        3
Hospital Ward 9                         3
Almshouse on Blackwell's Island         2
Hospital Ward 45                        2
Children's Home                         2
Hospital Ward 6                         2
Blackwell's Island Ward 39              2
Lunatic Asylum Ward 28                  2
Hospital Ward 32                        2
Hospital Ward 42                        2
Hospital Ward 24                        2
Hospital Ward 12                        2
Shanty 38                               2
Blackwell's Island Workhouse (?)        1
Lunatic Asylum Ward 5                   1
Blackwell's Island Ward 19              1
Blackwell's Island Ward 7               1
Blackwell's Island Ward 8               1
Randall's Island Ward 8                 1
Hospital Women's Ward                   1
Hospital Ward 5                         1
Lunatic Asylum Blackwell's Island       1
Blackwell's Island Ward 17              1
Shanty 6                                1
Randall's Island Ward 38                1
Hospital Ward 28                        1
Hospital Ward 34                        1
Hosital Ward 45                         1
Blackwell's Island Ward 11              1
Hospital Ward 26                        1
Blackwell's Island Ward 18              1
Blackwell's Island Ward 5               1
Blackwell's Island Shanty               1
Blackwell's Island Ward 10              1
Hospital Ward 36                        1
Hospital Ward 35                        1
Hospital Ward 21                        1
Hospital Ward 15                        1
Blackwell's Island Ward 16              1
Women's Hospital                        1
Almshouse Hospital                      1
Blackwell's Island Ward 4               1
Hospital Ward 29                        1
Randall's Island Ward 17                1
Shanty 18                               1
Shanty 1                                1
Randall's Island Ward 6                 1
Blackwell's Island Ward 35              1
Hospital Ward 30                        1
Lunatic Asylum Ward 18                  1
Lunatic Asylum Ward 16                  1
Shanty 8                                1
Name: sent_to, dtype: int64

18.6 Examine Subsets

18.6.1 ❓ Why were people being sent to Hospital Ward 38?

To explore this question, we can filter rows with a condition.

bellevue_df['sent_to'] == 'Hospital Ward 38'
0       False
1       False
2       False
3       False
4       False
        ...  
9593    False
9594     True
9595    False
9596    False
9597    False
Name: sent_to, Length: 9598, dtype: bool
bellevue_df[bellevue_df['sent_to'] == 'Hospital Ward 38']
date_in first_name last_name full_name age gender disease profession children sent_to sender1 sender2
249 1847-05-17 Elizabeth Cauley Elizabeth Cauley 24.0 f recent emigrant married Son Walter 4 mo Hospital Ward 38 moses g. leonard peter c. johnston
330 1847-03-22 Sarah Corrigan Sarah Corrigan 21.0 f recent emigrant married NaN Hospital Ward 38 george w. anderson peter c. johnston
367 1847-06-13 Bridget Reynolds Bridget Reynolds 20.0 f pregnant married NaN Hospital Ward 38 moses g. leonard oscar s. field
499 1847-06-09 Rose Dinns Rose Dinns 22.0 f pregnant married NaN Hospital Ward 38 moses g. leonard peter c. johnston
698 1847-08-21 Bridget Redding Bridget Redding 25.0 f pregnant married NaN Hospital Ward 38 william w. lyons NaN
1041 1847-03-13 Betty Dunn Betty Dunn 34.0 f recent emigrant married NaN Hospital Ward 38 george w. anderson peter c. johnston
1290 1847-02-17 Catherine Doherty Catherine Doherty 34.0 f recent emigrant widow NaN Hospital Ward 38 george w. anderson NaN
1426 1847-05-31 Catherine Riley Catherine Riley 28.0 f recent emigrant married NaN Hospital Ward 38 moses g. leonard peter c. johnston
1494 1847-06-14 Brigt Frikee? Brigt Frikee? 25.0 f recent emigrant married NaN Hospital Ward 38 moses g. leonard peter c. johnston
1584 1847-10-28 Ellen Lamb Ellen Lamb 20.0 f recent emigrant married NaN Hospital Ward 38 william w. lyons NaN
1685 1847-03-09 Catherine McManus Catherine McManus 21.0 f recent emigrant widow NaN Hospital Ward 38 george w. anderson james donnelly
1746 1847-05-08 Ellen O'Brien Ellen O'Brien 20.0 f pregnant spinster NaN Hospital Ward 38 george w. anderson peter c. johnston
1863 1847-06-01 Bridget Nash Bridget Nash 30.0 f recent emigrant married NaN Hospital Ward 38 moses g. leonard peter c. johnston
1960 1847-05-13 Mary Gallagher Mary Gallagher 18.0 f pregnant married NaN Hospital Ward 38 george w. anderson peter c. johnston
2370 1847-02-13 Eliza Latimer Eliza Latimer 19.0 f recent emigrant married NaN Hospital Ward 38 george w. anderson NaN
5194 1847-02-22 Mary Toohey Mary Toohey 34.0 f pregnant married NaN Hospital Ward 38 george w. anderson peter c. johnston
5286 1847-03-16 Mary Mullen Mary Mullen 27.0 f pregnant married NaN Hospital Ward 38 george w. anderson james donnelly
5309 1847-03-22 Margaret Welsh Margaret Welsh 24.0 f pregnant married NaN Hospital Ward 38 george w. anderson benson s. hopkins
5310 1847-03-22 Elizabeth McDonald Elizabeth McDonald 26.0 f pregnant married NaN Hospital Ward 38 george w. anderson edward witherell
5321 1847-03-24 Mary Jane Stevens Mary Jane Stevens 30.0 f pregnant spinster NaN Hospital Ward 38 george w. anderson edward witherell
5445 1847-04-15 Alice Duffy Alice Duffy 28.0 f pregnant spinster NaN Hospital Ward 38 george w. anderson peter c. johnston
5629 1847-05-10 Jane McCann Jane McCann 22.0 f pregnant spinster NaN Hospital Ward 38 george w. anderson peter c. johnston
5636 1847-05-10 Susan Patterson Susan Patterson 20.0 f pregnant spinster NaN Hospital Ward 38 george w. anderson peter c. johnston
5777 1847-08-11 Bridget Grady Bridget Grady 35.0 f pregnant spinster NaN Hospital Ward 38 thomas b. tappen NaN
5935 1847-07-15 Ellen Connolly Ellen Connolly 25.0 f pregnant married NaN Hospital Ward 38 william w. lyons NaN
6003 1847-07-19 Mary Kelly Mary Kelly 27.0 f sickness married NaN Hospital Ward 38 william w. lyons charles j. sutton
6051 1847-07-22 Ann Clark Ann Clark 23.0 f pregnant spinster NaN Hospital Ward 38 william w. lyons NaN
6052 1847-07-22 Mary Diffen Mary Diffen 32.0 f pregnant widow NaN Hospital Ward 38 william w. lyons NaN
6053 1847-07-22 Mary Ann Williamson Mary Ann Williamson 50.0 f pregnant widow NaN Hospital Ward 38 william w. lyons NaN
6102 1847-07-25 Margaret McCabe Margaret McCabe 40.0 f pregnant spinster NaN Hospital Ward 38 william w. lyons NaN
6265 1847-08-06 Elizabeth Wilkinson Elizabeth Wilkinson 13.0 f pregnant married NaN Hospital Ward 38 william w. lyons NaN
7140 1847-10-20 Bridget Riley Bridget Riley 18.0 f pregnant spinster NaN Hospital Ward 38 william w. lyons NaN
7209 1847-10-27 Ann Rourke Ann Rourke 20.0 f pregnant spinster NaN Hospital Ward 38 william w. lyons NaN
7221 1847-10-28 Catherine McDermott Catherine McDermott 28.0 f pregnant married NaN Hospital Ward 38 william w. lyons NaN
7280 1847-11-02 Ellen Sweeney Ellen Sweeney 20.0 f pregnant spinster NaN Hospital Ward 38 william w. lyons NaN
7297 1847-11-04 Mary Smith Mary Smith 30.0 f pregnant spinster NaN Hospital Ward 38 william w. lyons NaN
7312 1847-11-05 Bridget Campbell Bridget Campbell 23.0 f pregnant spinster NaN Hospital Ward 38 william w. lyons NaN
7338 1847-11-08 Bridget Laughlin Bridget Laughlin 21.0 f pregnant spinster NaN Hospital Ward 38 william w. lyons NaN
7347 1847-11-09 Catherine Gillespie Catherine Gillespie 34.0 f pregnant married NaN Hospital Ward 38 william w. lyons NaN
7417 1847-11-16 Eliza Martin Eliza Martin 30.0 f pregnant married NaN Hospital Ward 38 william w. lyons NaN
7470 1847-11-20 Catherine O'Brien Catherine O'Brien 29.0 f pregnant married NaN Hospital Ward 38 william w. lyons NaN
8009 1847-06-15 Hannah Keatherson Hannah Keatherson 14.0 f pregnant married NaN Hospital Ward 38 moses g. leonard edward witherell
8100 1847-02-12 Mary McGrath Mary McGrath 26.0 f pregnant widow NaN Hospital Ward 38 moses g. leonard benson s. hopkins
8101 1847-02-18 Jane Smith Jane Smith 25.0 f pregnant spinster NaN Hospital Ward 38 moses g. leonard benson s. hopkins
8125 1847-04-19 Mary Smith Mary Smith 23.0 f pregnant spinster NaN Hospital Ward 38 moses g. leonard benson s. hopkins
8129 1847-04-22 Ellen McCalu Ellen McCalu 24.0 f pregnant seamstress NaN Hospital Ward 38 moses g. leonard commissioners of emigration
8134 1847-05-08 Judah Fallen Judah Fallen 30.0 f pregnant widow NaN Hospital Ward 38 moses g. leonard benson s. hopkins
8180 1847-05-15 Catherine Seele Catherine Seele 25.0 f pregnant married NaN Hospital Ward 38 moses g. leonard edward witherell
8203 1847-05-18 Mary Andrews Mary Andrews 26.0 f pregnant married NaN Hospital Ward 38 moses g. leonard peter c. johnston
8285 1847-05-26 Ellen McCake Ellen McCake 23.0 f pregnant spinster NaN Hospital Ward 38 moses g. leonard benson s. hopkins
8303 1847-05-27 Sarah Campbell Sarah Campbell 22.0 f pregnant married NaN Hospital Ward 38 moses g. leonard edward witherell
8364 1847-06-01 Bridget Keanan Bridget Keanan 20.0 f pregnant married NaN Hospital Ward 38 moses g. leonard benson s. hopkins
8379 1847-06-02 Sarah Cormick Sarah Cormick 46.0 f destitution widow NaN Hospital Ward 38 moses g. leonard edward witherell
8410 1847-06-04 Mary Hart Mary Hart 37.0 f pregnant married NaN Hospital Ward 38 moses g. leonard edward witherell
8424 1847-06-05 Ann Smith Ann Smith 24.0 f pregnant married NaN Hospital Ward 38 moses g. leonard edward witherell
8479 1847-06-11 Mary McMagh Mary McMagh 28.0 f pregnant spinster NaN Hospital Ward 38 moses g. leonard oscar s. field
8498 1847-06-13 Mary Henry Mary Henry 27.0 f pregnant married NaN Hospital Ward 38 moses g. leonard edward witherell
8506 1847-06-14 Mary Keeler Mary Keeler 44.0 f pregnant cook NaN Hospital Ward 38 moses g. leonard oscar s. field
8541 1847-06-17 Mary Smith Mary Smith 23.0 f pregnant NaN NaN Hospital Ward 38 moses g. leonard peter c. johnston
8581 1847-06-19 Martha McConnelly Martha McConnelly 35.0 f pregnant spinster NaN Hospital Ward 38 moses g. leonard peter c. johnston
8714 1847-06-26 Jane Davis Jane Davis 18.0 f pregnant married NaN Hospital Ward 38 moses g. leonard edward witherell
9594 1847-06-17 Mary Smith Mary Smith 47.0 f NaN NaN NaN Hospital Ward 38 [blank] NaN
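The two steps above, building a True/False condition and then using it to filter rows, can be sketched on a small made-up DataFrame. The rows and the names is_ward_38 and ward_38_df are hypothetical. Once the subset exists, the methods from the previous section work on it as well.

```python
import pandas as pd

# Hypothetical sample data for illustration.
sample_df = pd.DataFrame(
    {
        "first_name": ["Elizabeth", "John", "Bridget", "Mary"],
        "disease": ["recent emigrant", "sickness", "pregnant", "pregnant"],
        "sent_to": [
            "Hospital Ward 38",
            "Hospital",
            "Hospital Ward 38",
            "Hospital Ward 38",
        ],
    }
)

# Step 1: a Series of True/False values, one per row.
is_ward_38 = sample_df["sent_to"] == "Hospital Ward 38"

# Step 2: keep only the rows where the condition is True.
ward_38_df = sample_df[is_ward_38]

# Count the recorded "diseases" within the subset.
print(ward_38_df["disease"].value_counts())
```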

18.7 ❓ What data is missing? What data do you wish we had?

18.8 Overview

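Generate descriptive statistics for the data. By default, .describe() summarizes only the numeric columns (here, age); with include='all', it summarizes every column.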

bellevue_df.describe()
age
count 9548.000000
mean 30.337039
std 14.179527
min 0.080000
25% 21.000000
50% 28.000000
75% 39.000000
max 97.000000
bellevue_df.describe(include='all')
date_in first_name last_name full_name age gender disease profession children sent_to sender1 sender2
count 9598 9594 9598 9598 9548.000000 9598 6509 8579 37 5666 9521 5212
unique 653 523 3159 7308 NaN 5 75 172 36 77 59 80
top 1847-05-24 Mary Kelly Mary Smith NaN m sickness laborer Child Hospital george w. anderson peter c. johnston
freq 113 979 137 21 NaN 4967 2710 3116 2 3882 3469 1666
mean NaN NaN NaN NaN 30.337039 NaN NaN NaN NaN NaN NaN NaN
std NaN NaN NaN NaN 14.179527 NaN NaN NaN NaN NaN NaN NaN
min NaN NaN NaN NaN 0.080000 NaN NaN NaN NaN NaN NaN NaN
25% NaN NaN NaN NaN 21.000000 NaN NaN NaN NaN NaN NaN NaN
50% NaN NaN NaN NaN 28.000000 NaN NaN NaN NaN NaN NaN NaN
75% NaN NaN NaN NaN 39.000000 NaN NaN NaN NaN NaN NaN NaN
max NaN NaN NaN NaN 97.000000 NaN NaN NaN NaN NaN NaN NaN
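The difference between the two calls is easier to see on a small made-up DataFrame with one numeric column and one text column. The data and variable names in this sketch are hypothetical.

```python
import pandas as pd

# Hypothetical sample data: one numeric column, one text column.
sample_df = pd.DataFrame(
    {
        "age": [28.0, 19.0, 60.0, 32.0],
        "gender": ["f", "m", "m", "m"],
    }
)

# Default: only the numeric column is summarized.
numeric_summary = sample_df.describe()

# include="all": text columns get unique, top (most frequent value), and freq.
full_summary = sample_df.describe(include="all")

print(numeric_summary)
print(full_summary)
```

In the full summary, statistics that do not apply to a column are shown as NaN, just as in the output above.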

Generate information about all the columns in the data

bellevue_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9598 entries, 0 to 9597
Data columns (total 12 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   date_in     9598 non-null   object 
 1   first_name  9594 non-null   object 
 2   last_name   9598 non-null   object 
 3   full_name   9598 non-null   object 
 4   age         9548 non-null   float64
 5   gender      9598 non-null   object 
 6   disease     6509 non-null   object 
 7   profession  8579 non-null   object 
 8   children    37 non-null     object 
 9   sent_to     5666 non-null   object 
 10  sender1     9521 non-null   object 
 11  sender2     5212 non-null   object 
dtypes: float64(1), object(11)
memory usage: 899.9+ KB
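The Non-Null Count column shows how many values are present in each column. As a complementary sketch, not part of the original workbook, missing values can also be counted directly by chaining .isna() and .sum(). The data below is made up for illustration.

```python
import pandas as pd

# Hypothetical sample data with some missing values.
sample_df = pd.DataFrame(
    {
        "age": [28.0, None, 60.0],
        "children": [None, None, "Child"],
    }
)

# .isna() marks each missing cell as True; .sum() counts the True values per column.
missing_counts = sample_df.isna().sum()

# .count() gives the matching non-null counts, as reported by .info().
non_null_counts = sample_df.count()

print(missing_counts)
print(non_null_counts)
```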

Make a histogram of each numeric column in the DataFrame (here, only age)

bellevue_df.hist()
array([[<AxesSubplot:title={'center':'age'}>]], dtype=object)
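The .hist() method draws one histogram per numeric column and returns an array of Matplotlib axes, which is what the output above shows. Below is a minimal sketch on made-up ages, assuming Matplotlib is installed. The column and bins arguments are optional, and the value 5 for bins is an arbitrary choice.

```python
import pandas as pd

# Hypothetical ages for illustration.
sample_df = pd.DataFrame(
    {"age": [4.0, 19.0, 21.0, 28.0, 29.0, 32.0, 47.0, 60.0, 69.0]}
)

# Plot only the "age" column, grouped into 5 bins (arbitrary choice).
axes = sample_df.hist(column="age", bins=5)

# The result is a 2-dimensional array holding one plot per column.
print(axes.shape)
print(axes[0, 0].get_title())
```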
