Chapter 6 Basics of SQL Language

6.1 The SQL language

The acronym SQL stands for Structured Query Language. It is the most used language worldwide for relational database management systems. Whenever you interact with a database in your daily life (for example, any time you browse an online shopping website, or any time you look up something on StackOverflow!), there’s a good chance it was written in SQL.

SQL is a language that is used and understood by different database management systems, such as Oracle, PostgreSQL, MySQL, or SQLite. While all these programs share many similarities, which one of these you’ll end up using depends on the specifics of what you need it for. For the purposes of this class, we’ll be using SQLite. SQLite provides all the core functionalities you’ll need to get familiar with databases while being lightweight in terms of setup (easy to install and to configure). At the end of this chapter, we’ll talk about why you may want to make a different choice in the future, but for now we’ll stick with SQLite.

Each database management program uses a slightly different dialect of SQL. The language is the same, but there may be some slight differences in the syntax between different programs. I’ll point out throughout the chapter when we encounter syntax that is specific to the SQLite dialect.

6.2 Writing SQL queries

The nice thing about SQL is that it sounds a lot like human language. An SQL statement, or query, always begins with a verb that describes the action we want to do. A semicolon at the end of the query lets SQL know that we are done talking. For example:

SELECT dragon_id FROM dragons;
##     dragon_id
## 1          D1
## 2         D10
## 3        D100
## 4        D101
## 5        D102
## 6        D103
## 7        D104
## 8        D105
## 9        D106
## 10       D107
## 11       D108
## 12       D109
## 13        D11
## 14       D110
## 15       D111
## 16       D112
## 17       D113
## 18       D114
## 19       D115
## 20       D116
## 21       D117
## 22       D118
## 23       D119
## 24        D12
## 25       D120
## 26       D121
## 27       D122
## 28       D123
## 29       D124
## 30       D125
## 31       D126
## 32       D127
## 33       D128
## 34       D129
## 35        D13
## 36       D130
## 37       D131
## 38       D132
## 39       D133
## 40       D134
## 41       D135
## 42       D136
## 43       D137
## 44       D138
## 45       D139
## 46        D14
## 47       D140
## 48       D141
## 49       D142
## 50       D143
## 51       D144
## 52       D145
## 53       D146
## 54       D147
## 55       D148
## 56       D149
## 57        D15
## 58       D150
## 59       D151
## 60       D152
## 61       D153
## 62       D154
## 63       D155
## 64       D156
## 65       D157
## 66       D158
## 67       D159
## 68        D16
## 69       D160
## 70       D161
## 71       D162
## 72       D163
## 73       D164
## 74       D165
## 75       D166
## 76       D167
## 77       D168
## 78       D169
## 79        D17
## 80       D170
## 81       D171
## 82       D172
## 83       D173
## 84       D174
## 85       D175
## 86       D176
## 87       D177
## 88       D178
## 89       D179
## 90        D18
## 91       D180
## 92       D181
## 93       D182
## 94       D183
## 95       D184
## 96       D185
## 97       D186
## 98       D187
## 99       D188
## 100      D189
## 101       D19
## 102      D190
## 103      D191
## 104      D192
## 105      D193
## 106      D194
## 107      D195
## 108      D196
## 109      D197
## 110      D198
## 111      D199
## 112        D2
## 113       D20
## 114      D200
## 115      D201
## 116      D202
## 117      D203
## 118      D204
## 119      D205
## 120      D206
## 121      D207
## 122      D208
## 123      D209
## 124       D21
## 125      D210
## 126      D211
## 127      D212
## 128      D213
## 129      D214
## 130      D215
## 131      D216
## 132      D217
## 133      D218
## 134      D219
## 135       D22
## 136      D220
## 137      D221
## 138      D222
## 139      D223
## 140      D224
## 141      D225
## 142      D226
## 143      D227
## 144      D228
## 145      D229
## 146       D23
## 147      D230
## 148      D231
## 149      D232
## 150      D233
## 151      D234
## 152      D235
## 153      D236
## 154      D237
## 155      D238
## 156      D239
## 157       D24
## 158      D240
## 159      D241
## 160      D242
## 161      D243
## 162      D244
## 163      D245
## 164      D246
## 165      D247
## 166      D248
## 167      D249
## 168       D25
## 169      D250
## 170      D251
## 171      D252
## 172      D253
## 173      D254
## 174      D255
## 175      D256
## 176      D257
## 177      D258
## 178      D259
## 179       D26
## 180      D260
## 181      D261
## 182      D262
## 183      D263
## 184      D264
## 185      D265
## 186      D266
## 187      D267
## 188      D268
## 189      D269
## 190       D27
## 191      D270
## 192      D271
## 193      D272
## 194      D273
## 195      D274
## 196      D275
## 197      D276
## 198      D277
## 199      D278
## 200      D279
## 201       D28
## 202      D280
## 203      D281
## 204      D282
## 205      D283
## 206      D284
## 207      D285
## 208      D286
## 209      D287
## 210      D288
## 211      D289
## 212       D29
## 213      D290
## 214      D291
## 215      D292
## 216      D293
## 217      D294
## 218      D295
## 219      D296
## 220      D297
## 221      D298
## 222      D299
## 223        D3
## 224       D30
## 225      D300
## 226      D301
## 227      D302
## 228      D303
## 229      D304
## 230      D305
## 231      D306
## 232      D307
## 233      D308
## 234      D309
## 235       D31
## 236      D310
## 237      D311
## 238      D312
## 239      D313
## 240      D314
## 241      D315
## 242      D316
## 243      D317
## 244      D318
## 245      D319
## 246       D32
## 247      D320
## 248      D321
## 249      D322
## 250      D323
## 251      D324
## 252      D325
## 253      D326
## 254      D327
## 255      D328
## 256      D329
## 257       D33
## 258      D330
## 259      D331
## 260      D332
## 261      D333
## 262      D334
## 263      D335
## 264      D336
## 265      D337
## 266      D338
## 267      D339
## 268       D34
## 269      D340
## 270      D341
## 271      D342
## 272      D343
## 273      D344
## 274      D345
## 275      D346
## 276      D347
## 277      D348
## 278      D349
## 279       D35
## 280      D350
## 281      D351
## 282      D352
## 283      D353
## 284      D354
## 285      D355
## 286      D356
## 287      D357
## 288      D358
## 289      D359
## 290       D36
## 291      D360
## 292      D361
## 293      D362
## 294      D363
## 295      D364
## 296      D365
## 297      D366
## 298      D367
## 299      D368
## 300      D369
## 301       D37
## 302      D370
## 303      D371
## 304      D372
## 305      D373
## 306      D374
## 307      D375
## 308      D376
## 309      D377
## 310      D378
## 311      D379
## 312       D38
## 313      D380
## 314      D381
## 315      D382
## 316      D383
## 317      D384
## 318      D385
## 319      D386
## 320      D387
## 321      D388
## 322      D389
## 323       D39
## 324      D390
## 325      D391
## 326      D392
## 327      D393
## 328      D394
## 329      D395
## 330      D396
## 331      D397
## 332      D398
## 333      D399
## 334        D4
## 335       D40
## 336      D400
## 337      D401
## 338      D402
## 339      D403
## 340      D404
## 341      D405
## 342      D406
## 343      D407
## 344      D408
## 345      D409
## 346       D41
## 347      D410
## 348      D411
## 349      D412
## 350      D413
## 351      D414
## 352      D415
## 353      D416
## 354      D417
## 355      D418
## 356      D419
## 357       D42
## 358      D420
## 359      D421
## 360      D422
## 361      D423
## 362      D424
## 363      D425
## 364      D426
## 365      D427
## 366      D428
## 367      D429
## 368       D43
## 369      D430
## 370      D431
## 371      D432
## 372      D433
## 373      D434
## 374      D435
## 375      D436
## 376      D437
## 377      D438
## 378      D439
## 379       D44
## 380      D440
## 381      D441
## 382      D442
## 383      D443
## 384      D444
## 385      D445
## 386      D446
## 387      D447
## 388      D448
## 389      D449
## 390       D45
## 391      D450
## 392      D451
## 393      D452
## 394      D453
## 395      D454
## 396      D455
## 397      D456
## 398      D457
## 399      D458
## 400      D459
## 401       D46
## 402      D460
## 403      D461
## 404      D462
## 405      D463
## 406      D464
## 407      D465
## 408      D466
## 409      D467
## 410      D468
## 411      D469
## 412       D47
## 413      D470
## 414      D471
## 415      D472
## 416      D473
## 417      D474
## 418      D475
## 419      D476
## 420      D477
## 421      D478
## 422      D479
## 423       D48
## 424      D480
## 425      D481
## 426      D482
## 427      D483
## 428      D484
## 429      D485
## 430      D486
## 431      D487
## 432      D488
## 433      D489
## 434       D49
## 435      D490
## 436      D491
## 437      D492
## 438      D493
## 439      D494
## 440      D495
## 441      D496
## 442      D497
## 443      D498
## 444      D499
## 445        D5
## 446       D50
## 447      D500
## 448       D51
## 449       D52
## 450       D53
## 451       D54
## 452       D55
## 453       D56
## 454       D57
## 455       D58
## 456       D59
## 457        D6
## 458       D60
## 459       D61
## 460       D62
## 461       D63
## 462       D64
## 463       D65
## 464       D66
## 465       D67
## 466       D68
## 467       D69
## 468        D7
## 469       D70
## 470       D71
## 471       D72
## 472       D73
## 473       D74
## 474       D75
## 475       D76
## 476       D77
## 477       D78
## 478       D79
## 479        D8
## 480       D80
## 481       D81
## 482       D82
## 483       D83
## 484       D84
## 485       D85
## 486       D86
## 487       D87
## 488       D88
## 489       D89
## 490        D9
## 491       D90
## 492       D91
## 493       D92
## 494       D93
## 495       D94
## 496       D95
## 497       D96
## 498       D97
## 499       D98
## 500       D99
## 501      D999

The verb SELECT is one we’ll be using over and over again. Any time you want to select data you also have to specify which table you want it from, so the SELECT clause is always followed by a FROM clause. In this case, we asked to see the column dragon_id from the dragons table. Sounds like English, right?

If we want to see multiple columns, we can list them separated by commas:

SELECT dragon_id, sex 
FROM dragons;
##     dragon_id  sex
## 1          D1    F
## 2          D2 <NA>
## 3          D3    F
## 4          D4    F
## 5          D5 <NA>
## 6          D6    F
## 7          D7    M
## 8          D8 <NA>
## 9          D9    F
## 10        D10    F
## 11        D11    F
## 12        D12    F
## 13        D13 <NA>
## 14        D14 <NA>
## 15        D15 <NA>
## 16        D16    F
## 17        D17    F
## 18        D18    F
## 19        D19    F
## 20        D20    M
## 21        D21    F
## 22        D22 <NA>
## 23        D23    M
## 24        D24 <NA>
## 25        D25    M
## 26        D26 <NA>
## 27        D27    F
## 28        D28    M
## 29        D29    F
## 30        D30    F
## 31        D31 <NA>
## 32        D32    M
## 33        D33    F
## 34        D34    F
## 35        D35    F
## 36        D36 <NA>
## 37        D37    F
## 38        D38    F
## 39        D39 <NA>
## 40        D40 <NA>
## 41        D41 <NA>
## 42        D42    M
## 43        D43    F
## 44        D44    M
## 45        D45    F
## 46        D46    M
## 47        D47    F
## 48        D48    M
## 49        D49 <NA>
## 50        D50    F
## 51        D51    F
## 52        D52    M
## 53        D53 <NA>
## 54        D54    F
## 55        D55 <NA>
## 56        D56    F
## 57        D57    F
## 58        D58    M
## 59        D59    F
## 60        D60    F
## 61        D61    M
## 62        D62    F
## 63        D63    F
## 64        D64 <NA>
## 65        D65 <NA>
## 66        D66 <NA>
## 67        D67    M
## 68        D68    M
## 69        D69    F
## 70        D70    F
## 71        D71    M
## 72        D72 <NA>
## 73        D73    F
## 74        D74 <NA>
## 75        D75    M
## 76        D76    M
## 77        D77 <NA>
## 78        D78 <NA>
## 79        D79    M
## 80        D80    F
## 81        D81 <NA>
## 82        D82    F
## 83        D83 <NA>
## 84        D84    M
## 85        D85    F
## 86        D86    F
## 87        D87    M
## 88        D88    M
## 89        D89    F
## 90        D90    M
## 91        D91    F
## 92        D92    M
## 93        D93    M
## 94        D94    M
## 95        D95    M
## 96        D96    F
## 97        D97 <NA>
## 98        D98 <NA>
## 99        D99    F
## 100      D100 <NA>
## 101      D101 <NA>
## 102      D102    F
## 103      D103    F
## 104      D104    F
## 105      D105    M
## 106      D106 <NA>
## 107      D107 <NA>
## 108      D108    F
## 109      D109    M
## 110      D110    F
## 111      D111    M
## 112      D112    F
## 113      D113 <NA>
## 114      D114 <NA>
## 115      D115    F
## 116      D116    F
## 117      D117 <NA>
## 118      D118 <NA>
## 119      D119    F
## 120      D120    F
## 121      D121 <NA>
## 122      D122    F
## 123      D123 <NA>
## 124      D124    M
## 125      D125 <NA>
## 126      D126    M
## 127      D127    M
## 128      D128 <NA>
## 129      D129    M
## 130      D130 <NA>
## 131      D131 <NA>
## 132      D132    F
## 133      D133    F
## 134      D134    F
## 135      D135 <NA>
## 136      D136    F
## 137      D137    F
## 138      D138    F
## 139      D139    F
## 140      D140 <NA>
## 141      D141    F
## 142      D142    M
## 143      D143    F
## 144      D144    M
## 145      D145    M
## 146      D146    M
## 147      D147 <NA>
## 148      D148 <NA>
## 149      D149    M
## 150      D150    F
## 151      D151 <NA>
## 152      D152    F
## 153      D153    M
## 154      D154    F
## 155      D155    F
## 156      D156    M
## 157      D157    F
## 158      D158 <NA>
## 159      D159 <NA>
## 160      D160    F
## 161      D161    F
## 162      D162    F
## 163      D163 <NA>
## 164      D164 <NA>
## 165      D165    M
## 166      D166 <NA>
## 167      D167    M
## 168      D168    F
## 169      D169    F
## 170      D170 <NA>
## 171      D171    M
## 172      D172    M
## 173      D173    F
## 174      D174 <NA>
## 175      D175    M
## 176      D176    F
## 177      D177    F
## 178      D178    F
## 179      D179 <NA>
## 180      D180    F
## 181      D181    F
## 182      D182    F
## 183      D183    F
## 184      D184    F
## 185      D185    M
## 186      D186 <NA>
## 187      D187 <NA>
## 188      D188    M
## 189      D189    F
## 190      D190    F
## 191      D191    M
## 192      D192    M
## 193      D193    M
## 194      D194 <NA>
## 195      D195    F
## 196      D196    F
## 197      D197    F
## 198      D198    M
## 199      D199 <NA>
## 200      D200 <NA>
## 201      D201 <NA>
## 202      D202    M
## 203      D203 <NA>
## 204      D204    M
## 205      D205    F
## 206      D206    F
## 207      D207 <NA>
## 208      D208    M
## 209      D209 <NA>
## 210      D210    M
## 211      D211    F
## 212      D212    F
## 213      D213 <NA>
## 214      D214    F
## 215      D215    F
## 216      D216 <NA>
## 217      D217    F
## 218      D218    F
## 219      D219    M
## 220      D220    F
## 221      D221    M
## 222      D222    M
## 223      D223    M
## 224      D224 <NA>
## 225      D225    M
## 226      D226 <NA>
## 227      D227    F
## 228      D228    F
## 229      D229 <NA>
## 230      D230    M
## 231      D231    M
## 232      D232    F
## 233      D233    F
## 234      D234    F
## 235      D235 <NA>
## 236      D236    F
## 237      D237    M
## 238      D238    F
## 239      D239    F
## 240      D240    F
## 241      D241 <NA>
## 242      D242    M
## 243      D243 <NA>
## 244      D244    F
## 245      D245 <NA>
## 246      D246 <NA>
## 247      D247    M
## 248      D248    M
## 249      D249 <NA>
## 250      D250    F
## 251      D251 <NA>
## 252      D252 <NA>
## 253      D253    F
## 254      D254    F
## 255      D255 <NA>
## 256      D256 <NA>
## 257      D257 <NA>
## 258      D258 <NA>
## 259      D259    F
## 260      D260    F
## 261      D261 <NA>
## 262      D262    M
## 263      D263    M
## 264      D264    F
## 265      D265    M
## 266      D266 <NA>
## 267      D267 <NA>
## 268      D268    F
## 269      D269    F
## 270      D270 <NA>
## 271      D271    F
## 272      D272    F
## 273      D273 <NA>
## 274      D274    F
## 275      D275    F
## 276      D276    M
## 277      D277    F
## 278      D278    F
## 279      D279 <NA>
## 280      D280    M
## 281      D281    F
## 282      D282    F
## 283      D283    F
## 284      D284    F
## 285      D285 <NA>
## 286      D286    F
## 287      D287    M
## 288      D288    M
## 289      D289 <NA>
## 290      D290 <NA>
## 291      D291    F
## 292      D292 <NA>
## 293      D293    F
## 294      D294    F
## 295      D295    M
## 296      D296 <NA>
## 297      D297 <NA>
## 298      D298    M
## 299      D299    F
## 300      D300    F
## 301      D301    F
## 302      D302    F
## 303      D303 <NA>
## 304      D304    F
## 305      D305    M
## 306      D306 <NA>
## 307      D307 <NA>
## 308      D308    F
## 309      D309 <NA>
## 310      D310 <NA>
## 311      D311    M
## 312      D312    M
## 313      D313    F
## 314      D314    M
## 315      D315    F
## 316      D316    M
## 317      D317    F
## 318      D318 <NA>
## 319      D319 <NA>
## 320      D320 <NA>
## 321      D321    M
## 322      D322 <NA>
## 323      D323    M
## 324      D324 <NA>
## 325      D325    F
## 326      D326    F
## 327      D327    M
## 328      D328 <NA>
## 329      D329    M
## 330      D330    F
## 331      D331 <NA>
## 332      D332    F
## 333      D333 <NA>
## 334      D334 <NA>
## 335      D335    F
## 336      D336 <NA>
## 337      D337    M
## 338      D338 <NA>
## 339      D339 <NA>
## 340      D340 <NA>
## 341      D341    F
## 342      D342 <NA>
## 343      D343    F
## 344      D344 <NA>
## 345      D345    M
## 346      D346    F
## 347      D347    F
## 348      D348 <NA>
## 349      D349    M
## 350      D350    M
## 351      D351    M
## 352      D352    M
## 353      D353 <NA>
## 354      D354    F
## 355      D355 <NA>
## 356      D356    F
## 357      D357    F
## 358      D358 <NA>
## 359      D359    F
## 360      D360    M
## 361      D361    M
## 362      D362    F
## 363      D363 <NA>
## 364      D364 <NA>
## 365      D365    M
## 366      D366    M
## 367      D367    M
## 368      D368    F
## 369      D369    M
## 370      D370 <NA>
## 371      D371 <NA>
## 372      D372    M
## 373      D373    F
## 374      D374    M
## 375      D375    M
## 376      D376 <NA>
## 377      D377    F
## 378      D378    M
## 379      D379    F
## 380      D380 <NA>
## 381      D381    M
## 382      D382    F
## 383      D383    F
## 384      D384    M
## 385      D385    F
## 386      D386 <NA>
## 387      D387    M
## 388      D388    F
## 389      D389    F
## 390      D390    F
## 391      D391 <NA>
## 392      D392    M
## 393      D393    M
## 394      D394    M
## 395      D395 <NA>
## 396      D396    F
## 397      D397 <NA>
## 398      D398    F
## 399      D399    F
## 400      D400 <NA>
## 401      D401    M
## 402      D402    F
## 403      D403 <NA>
## 404      D404    F
## 405      D405    M
## 406      D406    M
## 407      D407    M
## 408      D408    F
## 409      D409 <NA>
## 410      D410    F
## 411      D411 <NA>
## 412      D412 <NA>
## 413      D413    F
## 414      D414    F
## 415      D415    F
## 416      D416    F
## 417      D417    M
## 418      D418    M
## 419      D419    F
## 420      D420 <NA>
## 421      D421 <NA>
## 422      D422    F
## 423      D423    M
## 424      D424    F
## 425      D425    M
## 426      D426    F
## 427      D427    F
## 428      D428 <NA>
## 429      D429 <NA>
## 430      D430    F
## 431      D431 <NA>
## 432      D432    M
## 433      D433    M
## 434      D434 <NA>
## 435      D435 <NA>
## 436      D436 <NA>
## 437      D437    M
## 438      D438 <NA>
## 439      D439    F
## 440      D440    M
## 441      D441    F
## 442      D442    M
## 443      D443    F
## 444      D444    M
## 445      D445    M
## 446      D446    F
## 447      D447 <NA>
## 448      D448    F
## 449      D449    M
## 450      D450    F
## 451      D451    F
## 452      D452    F
## 453      D453    F
## 454      D454    M
## 455      D455    F
## 456      D456    F
## 457      D457    F
## 458      D458    F
## 459      D459    F
## 460      D460    F
## 461      D461    M
## 462      D462 <NA>
## 463      D463    M
## 464      D464    F
## 465      D465    M
## 466      D466 <NA>
## 467      D467 <NA>
## 468      D468    M
## 469      D469    F
## 470      D470    F
## 471      D471    F
## 472      D472    M
## 473      D473    M
## 474      D474    F
## 475      D475    M
## 476      D476    F
## 477      D477    M
## 478      D478    M
## 479      D479    F
## 480      D480    F
## 481      D481    F
## 482      D482    M
## 483      D483 <NA>
## 484      D484    M
## 485      D485    M
## 486      D486 <NA>
## 487      D487    F
## 488      D488 <NA>
## 489      D489    F
## 490      D490    F
## 491      D491    F
## 492      D492    F
## 493      D493    F
## 494      D494    F
## 495      D495    M
## 496      D496    F
## 497      D497    F
## 498      D498    F
## 499      D499    F
## 500      D500    F
## 501      D999    F

And if we want to see the whole table (not specific columns), we can use a wildcard:

SELECT * 
FROM dragons;
##     dragon_id  sex age_class             species    update_timestamp
## 1          D1    F  Subadult     Hebridean Black                <NA>
## 2          D2 <NA>  Juvenile   Romanian Longhorn                <NA>
## 3          D3    F     Adult     Hebridean Black                <NA>
## 4          D4    F     Adult Peruvian Vipertooth                <NA>
## 5          D5 <NA>  Juvenile Ukrainian Ironbelly                <NA>
## 6          D6    F     Adult Norwegian Ridgeback                <NA>
## 7          D7    M     Adult     Hebridean Black                <NA>
## 8          D8 <NA>  Juvenile Peruvian Vipertooth                <NA>
## 9          D9    F     Adult Norwegian Ridgeback                <NA>
## 10        D10    F     Adult  Common Welsh Green                <NA>
## 11        D11    F     Adult Peruvian Vipertooth                <NA>
## 12        D12    F     Adult Swedish Short-Snout                <NA>
## 13        D13 <NA>  Juvenile Peruvian Vipertooth                <NA>
## 14        D14 <NA>  Juvenile     Hebridean Black                <NA>
## 15        D15 <NA>  Juvenile    Chinese Fireball                <NA>
## 16        D16    F     Adult    Chinese Fireball                <NA>
## 17        D17    F     Adult Peruvian Vipertooth                <NA>
## 18        D18    F     Adult Norwegian Ridgeback                <NA>
## 19        D19    F     Adult  Common Welsh Green                <NA>
## 20        D20    M  Subadult    Chinese Fireball                <NA>
## 21        D21    F     Adult Ukrainian Ironbelly                <NA>
## 22        D22 <NA>  Juvenile Norwegian Ridgeback                <NA>
## 23        D23    M     Adult Swedish Short-Snout                <NA>
## 24        D24 <NA>  Juvenile  Common Welsh Green                <NA>
## 25        D25    M     Adult Peruvian Vipertooth                <NA>
## 26        D26 <NA>  Juvenile  Common Welsh Green                <NA>
## 27        D27    F     Adult Norwegian Ridgeback                <NA>
## 28        D28    M     Adult  Common Welsh Green                <NA>
## 29        D29    F     Adult Norwegian Ridgeback                <NA>
## 30        D30    F     Adult    Chinese Fireball                <NA>
## 31        D31 <NA>  Juvenile  Common Welsh Green                <NA>
## 32        D32    M     Adult Ukrainian Ironbelly                <NA>
## 33        D33    F     Adult   Romanian Longhorn                <NA>
## 34        D34    F     Adult Norwegian Ridgeback                <NA>
## 35        D35    F     Adult Ukrainian Ironbelly                <NA>
## 36        D36 <NA>  Juvenile   Romanian Longhorn                <NA>
## 37        D37    F  Subadult  Common Welsh Green                <NA>
## 38        D38    F     Adult  Hungarian Horntail                <NA>
## 39        D39 <NA>  Juvenile  Antipodean Opaleye                <NA>
## 40        D40 <NA>  Juvenile  Common Welsh Green                <NA>
## 41        D41 <NA>  Juvenile Ukrainian Ironbelly                <NA>
## 42        D42    M     Adult Swedish Short-Snout                <NA>
## 43        D43    F  Subadult  Antipodean Opaleye                <NA>
## 44        D44    M     Adult   Romanian Longhorn                <NA>
## 45        D45    F     Adult Peruvian Vipertooth                <NA>
## 46        D46    M  Subadult     Hebridean Black                <NA>
## 47        D47    F     Adult Ukrainian Ironbelly                <NA>
## 48        D48    M     Adult Ukrainian Ironbelly                <NA>
## 49        D49 <NA>  Juvenile  Common Welsh Green                <NA>
## 50        D50    F     Adult    Chinese Fireball                <NA>
## 51        D51    F  Subadult Norwegian Ridgeback                <NA>
## 52        D52    M     Adult  Common Welsh Green                <NA>
## 53        D53 <NA>  Juvenile Norwegian Ridgeback                <NA>
## 54        D54    F     Adult Norwegian Ridgeback                <NA>
## 55        D55 <NA>  Juvenile  Common Welsh Green                <NA>
## 56        D56    F  Subadult Norwegian Ridgeback                <NA>
## 57        D57    F     Adult  Common Welsh Green                <NA>
## 58        D58    M     Adult     Hebridean Black                <NA>
## 59        D59    F     Adult Peruvian Vipertooth                <NA>
## 60        D60    F     Adult     Hebridean Black                <NA>
## 61        D61    M  Subadult   Romanian Longhorn                <NA>
## 62        D62    F     Adult  Hungarian Horntail                <NA>
## 63        D63    F     Adult  Antipodean Opaleye                <NA>
## 64        D64 <NA>  Juvenile   Romanian Longhorn                <NA>
## 65        D65 <NA>  Juvenile   Romanian Longhorn                <NA>
## 66        D66 <NA>  Juvenile Ukrainian Ironbelly                <NA>
## 67        D67    M     Adult Norwegian Ridgeback                <NA>
## 68        D68    M     Adult  Antipodean Opaleye                <NA>
## 69        D69    F  Subadult     Hebridean Black                <NA>
## 70        D70    F     Adult Norwegian Ridgeback                <NA>
## 71        D71    M     Adult     Hebridean Black                <NA>
## 72        D72 <NA>  Juvenile  Common Welsh Green                <NA>
## 73        D73    F     Adult   Romanian Longhorn                <NA>
## 74        D74 <NA>  Juvenile  Antipodean Opaleye                <NA>
## 75        D75    M     Adult Swedish Short-Snout                <NA>
## 76        D76    M     Adult   Romanian Longhorn                <NA>
## 77        D77 <NA>  Juvenile     Hebridean Black                <NA>
## 78        D78 <NA>  Juvenile Norwegian Ridgeback                <NA>
## 79        D79    M     Adult  Common Welsh Green                <NA>
## 80        D80    F     Adult   Romanian Longhorn                <NA>
## 81        D81 <NA>  Juvenile Norwegian Ridgeback                <NA>
## 82        D82    F     Adult     Hebridean Black                <NA>
## 83        D83 <NA>  Juvenile Peruvian Vipertooth                <NA>
## 84        D84    M     Adult Norwegian Ridgeback                <NA>
## 85        D85    F     Adult     Hebridean Black                <NA>
## 86        D86    F     Adult     Hebridean Black                <NA>
## 87        D87    M     Adult  Hungarian Horntail                <NA>
## 88        D88    M     Adult Swedish Short-Snout                <NA>
## 89        D89    F     Adult Norwegian Ridgeback                <NA>
## 90        D90    M     Adult Peruvian Vipertooth                <NA>
## 91        D91    F     Adult     Hebridean Black                <NA>
## 92        D92    M     Adult     Hebridean Black                <NA>
## 93        D93    M  Subadult Peruvian Vipertooth                <NA>
## 94        D94    M     Adult   Romanian Longhorn                <NA>
## 95        D95    M     Adult   Romanian Longhorn                <NA>
## 96        D96    F     Adult  Common Welsh Green                <NA>
## 97        D97 <NA>  Juvenile    Chinese Fireball                <NA>
## 98        D98 <NA>  Juvenile     Hebridean Black                <NA>
## 99        D99    F     Adult     Hebridean Black                <NA>
## 100      D100 <NA>  Juvenile  Common Welsh Green                <NA>
## 101      D101 <NA>  Juvenile  Hungarian Horntail                <NA>
## 102      D102    F  Subadult  Hungarian Horntail                <NA>
## 103      D103    F     Adult     Hebridean Black                <NA>
## 104      D104    F     Adult     Hebridean Black                <NA>
## 105      D105    M     Adult  Common Welsh Green                <NA>
## 106      D106 <NA>  Juvenile Peruvian Vipertooth                <NA>
## 107      D107 <NA>  Juvenile  Hungarian Horntail                <NA>
## 108      D108    F     Adult Norwegian Ridgeback                <NA>
## 109      D109    M     Adult Swedish Short-Snout                <NA>
## 110      D110    F     Adult Peruvian Vipertooth                <NA>
## 111      D111    M     Adult Norwegian Ridgeback                <NA>
## 112      D112    F     Adult Peruvian Vipertooth                <NA>
## 113      D113 <NA>  Juvenile   Romanian Longhorn                <NA>
## 114      D114 <NA>  Juvenile    Chinese Fireball                <NA>
## 115      D115    F     Adult  Common Welsh Green                <NA>
## 116      D116    F     Adult Norwegian Ridgeback                <NA>
## 117      D117 <NA>  Juvenile  Antipodean Opaleye                <NA>
## 118      D118 <NA>  Juvenile  Hungarian Horntail                <NA>
## 119      D119    F     Adult Norwegian Ridgeback                <NA>
## 120      D120    F     Adult     Hebridean Black                <NA>
## 121      D121 <NA>  Juvenile  Common Welsh Green                <NA>
## 122      D122    F     Adult Ukrainian Ironbelly                <NA>
## 123      D123 <NA>  Juvenile  Common Welsh Green                <NA>
## 124      D124    M     Adult    Chinese Fireball                <NA>
## 125      D125 <NA>  Juvenile Swedish Short-Snout                <NA>
## 126      D126    M     Adult Swedish Short-Snout                <NA>
## 127      D127    M  Subadult Ukrainian Ironbelly                <NA>
## 128      D128 <NA>  Juvenile Norwegian Ridgeback                <NA>
## 129      D129    M     Adult   Romanian Longhorn                <NA>
## 130      D130 <NA>  Juvenile Peruvian Vipertooth                <NA>
## 131      D131 <NA>  Juvenile Norwegian Ridgeback                <NA>
## 132      D132    F     Adult Norwegian Ridgeback                <NA>
## 133      D133    F     Adult Ukrainian Ironbelly                <NA>
## 134      D134    F     Adult     Hebridean Black                <NA>
## 135      D135 <NA>  Juvenile Peruvian Vipertooth                <NA>
## 136      D136    F  Subadult    Chinese Fireball                <NA>
## 137      D137    F     Adult  Antipodean Opaleye                <NA>
## 138      D138    F     Adult  Common Welsh Green                <NA>
## 139      D139    F     Adult     Hebridean Black                <NA>
## 140      D140 <NA>  Juvenile     Hebridean Black                <NA>
## 141      D141    F     Adult   Romanian Longhorn                <NA>
## 142      D142    M  Subadult  Common Welsh Green                <NA>
## 143      D143    F     Adult  Antipodean Opaleye                <NA>
## 144      D144    M     Adult Norwegian Ridgeback                <NA>
## 145      D145    M     Adult Norwegian Ridgeback                <NA>
## 146      D146    M     Adult Norwegian Ridgeback                <NA>
## 147      D147 <NA>  Juvenile  Common Welsh Green                <NA>
## 148      D148 <NA>  Juvenile   Romanian Longhorn                <NA>
## 149      D149    M     Adult Norwegian Ridgeback                <NA>
## 150      D150    F     Adult  Hungarian Horntail                <NA>
## 151      D151 <NA>  Juvenile  Common Welsh Green                <NA>
## 152      D152    F  Subadult Peruvian Vipertooth                <NA>
## 153      D153    M     Adult  Antipodean Opaleye                <NA>
## 154      D154    F     Adult     Hebridean Black                <NA>
## 155      D155    F     Adult  Common Welsh Green                <NA>
## 156      D156    M     Adult Peruvian Vipertooth                <NA>
## 157      D157    F  Subadult Swedish Short-Snout                <NA>
## 158      D158 <NA>  Juvenile Ukrainian Ironbelly                <NA>
## 159      D159 <NA>  Juvenile  Common Welsh Green                <NA>
## 160      D160    F     Adult  Common Welsh Green                <NA>
## 161      D161    F     Adult    Chinese Fireball                <NA>
## 162      D162    F     Adult  Hungarian Horntail                <NA>
## 163      D163 <NA>  Juvenile Peruvian Vipertooth                <NA>
## 164      D164 <NA>  Juvenile    Chinese Fireball                <NA>
## 165      D165    M     Adult  Hungarian Horntail                <NA>
## 166      D166 <NA>  Juvenile Swedish Short-Snout                <NA>
## 167      D167    M     Adult Norwegian Ridgeback                <NA>
## 168      D168    F     Adult Swedish Short-Snout                <NA>
## 169      D169    F     Adult     Hebridean Black                <NA>
## 170      D170 <NA>  Juvenile Norwegian Ridgeback                <NA>
## 171      D171    M  Subadult Ukrainian Ironbelly                <NA>
## 172      D172    M     Adult Peruvian Vipertooth                <NA>
## 173      D173    F     Adult Swedish Short-Snout                <NA>
## 174      D174 <NA>  Juvenile   Romanian Longhorn                <NA>
## 175      D175    M     Adult    Chinese Fireball                <NA>
## 176      D176    F     Adult Ukrainian Ironbelly                <NA>
## 177      D177    F     Adult   Romanian Longhorn                <NA>
## 178      D178    F     Adult     Hebridean Black                <NA>
## 179      D179 <NA>  Juvenile  Common Welsh Green                <NA>
## 180      D180    F     Adult Norwegian Ridgeback                <NA>
## 181      D181    F     Adult  Common Welsh Green                <NA>
## 182      D182    F     Adult  Common Welsh Green                <NA>
## 183      D183    F     Adult  Common Welsh Green                <NA>
## 184      D184    F     Adult Norwegian Ridgeback                <NA>
## 185      D185    M     Adult    Chinese Fireball                <NA>
## 186      D186 <NA>  Juvenile  Common Welsh Green                <NA>
## 187      D187 <NA>  Juvenile     Hebridean Black                <NA>
## 188      D188    M     Adult  Hungarian Horntail                <NA>
## 189      D189    F     Adult  Hungarian Horntail                <NA>
## 190      D190    F     Adult     Hebridean Black                <NA>
## 191      D191    M     Adult Peruvian Vipertooth                <NA>
## 192      D192    M     Adult Norwegian Ridgeback                <NA>
## 193      D193    M     Adult Peruvian Vipertooth                <NA>
## 194      D194 <NA>  Juvenile   Romanian Longhorn                <NA>
## 195      D195    F     Adult   Romanian Longhorn                <NA>
## 196      D196    F     Adult   Romanian Longhorn                <NA>
## 197      D197    F     Adult  Hungarian Horntail                <NA>
## 198      D198    M     Adult Ukrainian Ironbelly                <NA>
## 199      D199 <NA>  Juvenile  Hungarian Horntail                <NA>
## 200      D200 <NA>  Juvenile  Hungarian Horntail                <NA>
## 201      D201 <NA>  Juvenile  Common Welsh Green                <NA>
## 202      D202    M     Adult  Common Welsh Green                <NA>
## 203      D203 <NA>  Juvenile   Romanian Longhorn                <NA>
## 204      D204    M     Adult   Romanian Longhorn                <NA>
## 205      D205    F     Adult  Hungarian Horntail                <NA>
## 206      D206    F     Adult  Hungarian Horntail                <NA>
## 207      D207 <NA>  Juvenile   Romanian Longhorn                <NA>
## 208      D208    M     Adult Norwegian Ridgeback                <NA>
## 209      D209 <NA>  Juvenile Norwegian Ridgeback                <NA>
## 210      D210    M     Adult Norwegian Ridgeback                <NA>
## 211      D211    F     Adult     Hebridean Black                <NA>
## 212      D212    F     Adult  Antipodean Opaleye                <NA>
## 213      D213 <NA>  Juvenile     Hebridean Black                <NA>
## 214      D214    F  Subadult  Hungarian Horntail                <NA>
## 215      D215    F     Adult Norwegian Ridgeback                <NA>
## 216      D216 <NA>  Juvenile  Antipodean Opaleye                <NA>
## 217      D217    F     Adult Peruvian Vipertooth                <NA>
## 218      D218    F     Adult     Hebridean Black                <NA>
## 219      D219    M     Adult Peruvian Vipertooth                <NA>
## 220      D220    F     Adult  Common Welsh Green                <NA>
## 221      D221    M     Adult Peruvian Vipertooth                <NA>
## 222      D222    M  Subadult  Hungarian Horntail                <NA>
## 223      D223    M     Adult  Antipodean Opaleye                <NA>
## 224      D224 <NA>  Juvenile Peruvian Vipertooth                <NA>
## 225      D225    M     Adult Norwegian Ridgeback                <NA>
## 226      D226 <NA>  Juvenile Swedish Short-Snout                <NA>
## 227      D227    F     Adult   Romanian Longhorn                <NA>
## 228      D228    F     Adult     Hebridean Black                <NA>
## 229      D229 <NA>  Juvenile  Hungarian Horntail                <NA>
## 230      D230    M     Adult Norwegian Ridgeback                <NA>
## 231      D231    M     Adult Norwegian Ridgeback                <NA>
## 232      D232    F  Subadult     Hebridean Black                <NA>
## 233      D233    F  Subadult   Romanian Longhorn                <NA>
## 234      D234    F     Adult Norwegian Ridgeback                <NA>
## 235      D235 <NA>  Juvenile     Hebridean Black                <NA>
## 236      D236    F     Adult  Hungarian Horntail                <NA>
## 237      D237    M     Adult  Hungarian Horntail                <NA>
## 238      D238    F     Adult Swedish Short-Snout                <NA>
## 239      D239    F     Adult     Hebridean Black                <NA>
## 240      D240    F     Adult Ukrainian Ironbelly                <NA>
## 241      D241 <NA>  Juvenile Norwegian Ridgeback                <NA>
## 242      D242    M     Adult   Romanian Longhorn                <NA>
## 243      D243 <NA>  Juvenile Peruvian Vipertooth                <NA>
## 244      D244    F     Adult  Hungarian Horntail                <NA>
## 245      D245 <NA>  Juvenile Norwegian Ridgeback                <NA>
## 246      D246 <NA>  Juvenile Peruvian Vipertooth                <NA>
## 247      D247    M     Adult Norwegian Ridgeback                <NA>
## 248      D248    M  Subadult  Common Welsh Green                <NA>
## 249      D249 <NA>  Juvenile   Romanian Longhorn                <NA>
## 250      D250    F     Adult   Romanian Longhorn                <NA>
## 251      D251 <NA>  Juvenile Peruvian Vipertooth                <NA>
## 252      D252 <NA>  Juvenile  Common Welsh Green                <NA>
## 253      D253    F     Adult Norwegian Ridgeback                <NA>
## 254      D254    F     Adult   Romanian Longhorn                <NA>
## 255      D255 <NA>  Juvenile  Hungarian Horntail                <NA>
## 256      D256 <NA>  Juvenile Norwegian Ridgeback                <NA>
## 257      D257 <NA>  Juvenile Norwegian Ridgeback                <NA>
## 258      D258 <NA>  Juvenile Peruvian Vipertooth                <NA>
## 259      D259    F     Adult     Hebridean Black                <NA>
## 260      D260    F     Adult  Common Welsh Green                <NA>
## 261      D261 <NA>  Juvenile Norwegian Ridgeback                <NA>
## 262      D262    M     Adult Peruvian Vipertooth                <NA>
## 263      D263    M     Adult     Hebridean Black                <NA>
## 264      D264    F  Subadult Norwegian Ridgeback                <NA>
## 265      D265    M     Adult     Hebridean Black                <NA>
## 266      D266 <NA>  Juvenile Swedish Short-Snout                <NA>
## 267      D267 <NA>  Juvenile Norwegian Ridgeback                <NA>
## 268      D268    F     Adult   Romanian Longhorn                <NA>
## 269      D269    F     Adult Swedish Short-Snout                <NA>
## 270      D270 <NA>  Juvenile Norwegian Ridgeback                <NA>
## 271      D271    F     Adult   Romanian Longhorn                <NA>
## 272      D272    F     Adult   Romanian Longhorn                <NA>
## 273      D273 <NA>  Juvenile  Common Welsh Green                <NA>
## 274      D274    F     Adult Norwegian Ridgeback                <NA>
## 275      D275    F     Adult     Hebridean Black                <NA>
## 276      D276    M     Adult Norwegian Ridgeback                <NA>
## 277      D277    F  Subadult Swedish Short-Snout                <NA>
## 278      D278    F     Adult  Common Welsh Green                <NA>
## 279      D279 <NA>  Juvenile Norwegian Ridgeback                <NA>
## 280      D280    M     Adult Ukrainian Ironbelly                <NA>
## 281      D281    F     Adult   Romanian Longhorn                <NA>
## 282      D282    F     Adult Peruvian Vipertooth                <NA>
## 283      D283    F     Adult  Common Welsh Green                <NA>
## 284      D284    F     Adult   Romanian Longhorn                <NA>
## 285      D285 <NA>  Juvenile     Hebridean Black                <NA>
## 286      D286    F     Adult Ukrainian Ironbelly                <NA>
## 287      D287    M     Adult Norwegian Ridgeback                <NA>
## 288      D288    M  Subadult     Hebridean Black                <NA>
## 289      D289 <NA>  Juvenile Ukrainian Ironbelly                <NA>
## 290      D290 <NA>  Juvenile  Common Welsh Green                <NA>
## 291      D291    F     Adult     Hebridean Black                <NA>
## 292      D292 <NA>  Juvenile     Hebridean Black                <NA>
## 293      D293    F     Adult    Chinese Fireball                <NA>
## 294      D294    F     Adult  Common Welsh Green                <NA>
## 295      D295    M  Subadult  Antipodean Opaleye                <NA>
## 296      D296 <NA>  Juvenile  Hungarian Horntail                <NA>
## 297      D297 <NA>  Juvenile     Hebridean Black                <NA>
## 298      D298    M     Adult Peruvian Vipertooth                <NA>
## 299      D299    F     Adult   Romanian Longhorn                <NA>
## 300      D300    F     Adult Peruvian Vipertooth                <NA>
## 301      D301    F  Subadult  Common Welsh Green                <NA>
## 302      D302    F     Adult Ukrainian Ironbelly                <NA>
## 303      D303 <NA>  Juvenile  Antipodean Opaleye                <NA>
## 304      D304    F  Subadult   Romanian Longhorn                <NA>
## 305      D305    M     Adult  Common Welsh Green                <NA>
## 306      D306 <NA>  Juvenile  Common Welsh Green                <NA>
## 307      D307 <NA>  Juvenile Ukrainian Ironbelly                <NA>
## 308      D308    F     Adult   Romanian Longhorn                <NA>
## 309      D309 <NA>  Juvenile Peruvian Vipertooth                <NA>
## 310      D310 <NA>  Juvenile  Common Welsh Green                <NA>
## 311      D311    M     Adult Norwegian Ridgeback                <NA>
## 312      D312    M  Subadult    Chinese Fireball                <NA>
## 313      D313    F     Adult  Antipodean Opaleye                <NA>
## 314      D314    M     Adult   Romanian Longhorn                <NA>
## 315      D315    F     Adult  Common Welsh Green                <NA>
## 316      D316    M  Subadult     Hebridean Black                <NA>
## 317      D317    F     Adult  Antipodean Opaleye                <NA>
## 318      D318 <NA>  Juvenile Peruvian Vipertooth                <NA>
## 319      D319 <NA>  Juvenile Swedish Short-Snout                <NA>
## 320      D320 <NA>  Juvenile  Hungarian Horntail                <NA>
## 321      D321    M     Adult Norwegian Ridgeback                <NA>
## 322      D322 <NA>  Juvenile    Chinese Fireball                <NA>
## 323      D323    M     Adult   Romanian Longhorn                <NA>
## 324      D324 <NA>  Juvenile  Common Welsh Green                <NA>
## 325      D325    F     Adult Norwegian Ridgeback                <NA>
## 326      D326    F     Adult Ukrainian Ironbelly                <NA>
## 327      D327    M  Subadult   Romanian Longhorn                <NA>
## 328      D328 <NA>  Juvenile Norwegian Ridgeback                <NA>
## 329      D329    M     Adult   Romanian Longhorn                <NA>
## 330      D330    F     Adult Peruvian Vipertooth                <NA>
## 331      D331 <NA>  Juvenile Swedish Short-Snout                <NA>
## 332      D332    F  Subadult  Common Welsh Green                <NA>
## 333      D333 <NA>  Juvenile Norwegian Ridgeback                <NA>
## 334      D334 <NA>  Juvenile   Romanian Longhorn                <NA>
## 335      D335    F     Adult   Romanian Longhorn                <NA>
## 336      D336 <NA>  Juvenile  Common Welsh Green                <NA>
## 337      D337    M     Adult Peruvian Vipertooth                <NA>
## 338      D338 <NA>  Juvenile  Antipodean Opaleye                <NA>
## 339      D339 <NA>  Juvenile    Chinese Fireball                <NA>
## 340      D340 <NA>  Juvenile  Antipodean Opaleye                <NA>
## 341      D341    F     Adult Norwegian Ridgeback                <NA>
## 342      D342 <NA>  Juvenile Peruvian Vipertooth                <NA>
## 343      D343    F     Adult  Antipodean Opaleye                <NA>
## 344      D344 <NA>  Juvenile   Romanian Longhorn                <NA>
## 345      D345    M     Adult Swedish Short-Snout                <NA>
## 346      D346    F     Adult   Romanian Longhorn                <NA>
## 347      D347    F  Subadult  Common Welsh Green                <NA>
## 348      D348 <NA>  Juvenile Norwegian Ridgeback                <NA>
## 349      D349    M     Adult Peruvian Vipertooth                <NA>
## 350      D350    M     Adult     Hebridean Black                <NA>
## 351      D351    M     Adult Peruvian Vipertooth                <NA>
## 352      D352    M     Adult   Romanian Longhorn                <NA>
## 353      D353 <NA>  Juvenile     Hebridean Black                <NA>
## 354      D354    F     Adult Norwegian Ridgeback                <NA>
## 355      D355 <NA>  Juvenile  Hungarian Horntail                <NA>
## 356      D356    F     Adult   Romanian Longhorn                <NA>
## 357      D357    F  Subadult   Romanian Longhorn                <NA>
## 358      D358 <NA>  Juvenile  Common Welsh Green                <NA>
## 359      D359    F     Adult     Hebridean Black                <NA>
## 360      D360    M     Adult Norwegian Ridgeback                <NA>
## 361      D361    M     Adult Swedish Short-Snout                <NA>
## 362      D362    F  Subadult Norwegian Ridgeback                <NA>
## 363      D363 <NA>  Juvenile   Romanian Longhorn                <NA>
## 364      D364 <NA>  Juvenile  Common Welsh Green                <NA>
## 365      D365    M     Adult   Romanian Longhorn                <NA>
## 366      D366    M     Adult Norwegian Ridgeback                <NA>
## 367      D367    M     Adult Peruvian Vipertooth                <NA>
## 368      D368    F     Adult  Common Welsh Green                <NA>
## 369      D369    M     Adult Norwegian Ridgeback                <NA>
## 370      D370 <NA>  Juvenile  Common Welsh Green                <NA>
## 371      D371 <NA>  Juvenile Ukrainian Ironbelly                <NA>
## 372      D372    M     Adult Peruvian Vipertooth                <NA>
## 373      D373    F     Adult Peruvian Vipertooth                <NA>
## 374      D374    M     Adult Swedish Short-Snout                <NA>
## 375      D375    M     Adult Norwegian Ridgeback                <NA>
## 376      D376 <NA>  Juvenile Swedish Short-Snout                <NA>
## 377      D377    F     Adult     Hebridean Black                <NA>
## 378      D378    M     Adult   Romanian Longhorn                <NA>
## 379      D379    F     Adult  Antipodean Opaleye                <NA>
## 380      D380 <NA>  Juvenile Norwegian Ridgeback                <NA>
## 381      D381    M     Adult  Common Welsh Green                <NA>
## 382      D382    F     Adult Norwegian Ridgeback                <NA>
## 383      D383    F     Adult     Hebridean Black                <NA>
## 384      D384    M     Adult  Hungarian Horntail                <NA>
## 385      D385    F     Adult  Common Welsh Green                <NA>
## 386      D386 <NA>  Juvenile Norwegian Ridgeback                <NA>
## 387      D387    M     Adult     Hebridean Black                <NA>
## 388      D388    F  Subadult  Common Welsh Green                <NA>
## 389      D389    F     Adult     Hebridean Black                <NA>
## 390      D390    F     Adult  Common Welsh Green                <NA>
## 391      D391 <NA>  Juvenile Norwegian Ridgeback                <NA>
## 392      D392    M     Adult Peruvian Vipertooth                <NA>
## 393      D393    M     Adult     Hebridean Black                <NA>
## 394      D394    M     Adult Swedish Short-Snout                <NA>
## 395      D395 <NA>  Juvenile  Antipodean Opaleye                <NA>
## 396      D396    F  Subadult Norwegian Ridgeback                <NA>
## 397      D397 <NA>  Juvenile Ukrainian Ironbelly                <NA>
## 398      D398    F     Adult  Common Welsh Green                <NA>
## 399      D399    F     Adult   Romanian Longhorn                <NA>
## 400      D400 <NA>  Juvenile  Hungarian Horntail                <NA>
## 401      D401    M  Subadult Norwegian Ridgeback                <NA>
## 402      D402    F     Adult  Common Welsh Green                <NA>
## 403      D403 <NA>  Juvenile  Common Welsh Green                <NA>
## 404      D404    F     Adult  Common Welsh Green                <NA>
## 405      D405    M     Adult   Romanian Longhorn                <NA>
## 406      D406    M     Adult     Hebridean Black                <NA>
## 407      D407    M     Adult  Antipodean Opaleye                <NA>
## 408      D408    F     Adult  Common Welsh Green                <NA>
## 409      D409 <NA>  Juvenile Norwegian Ridgeback                <NA>
## 410      D410    F     Adult     Hebridean Black                <NA>
## 411      D411 <NA>  Juvenile Peruvian Vipertooth                <NA>
## 412      D412 <NA>  Juvenile   Romanian Longhorn                <NA>
## 413      D413    F     Adult   Romanian Longhorn                <NA>
## 414      D414    F     Adult Peruvian Vipertooth                <NA>
## 415      D415    F     Adult   Romanian Longhorn                <NA>
## 416      D416    F     Adult   Romanian Longhorn                <NA>
## 417      D417    M     Adult Norwegian Ridgeback                <NA>
## 418      D418    M  Subadult  Common Welsh Green                <NA>
## 419      D419    F     Adult Norwegian Ridgeback                <NA>
## 420      D420 <NA>  Juvenile  Common Welsh Green                <NA>
## 421      D421 <NA>  Juvenile  Common Welsh Green                <NA>
## 422      D422    F     Adult Norwegian Ridgeback                <NA>
## 423      D423    M     Adult  Common Welsh Green                <NA>
## 424      D424    F     Adult     Hebridean Black                <NA>
## 425      D425    M     Adult   Romanian Longhorn                <NA>
## 426      D426    F     Adult Norwegian Ridgeback                <NA>
## 427      D427    F     Adult Peruvian Vipertooth                <NA>
## 428      D428 <NA>  Juvenile  Common Welsh Green                <NA>
## 429      D429 <NA>  Juvenile  Common Welsh Green                <NA>
## 430      D430    F     Adult  Hungarian Horntail                <NA>
## 431      D431 <NA>  Juvenile Ukrainian Ironbelly                <NA>
## 432      D432    M     Adult  Antipodean Opaleye                <NA>
## 433      D433    M     Adult Peruvian Vipertooth                <NA>
## 434      D434 <NA>  Juvenile Peruvian Vipertooth                <NA>
## 435      D435 <NA>  Juvenile Peruvian Vipertooth                <NA>
## 436      D436 <NA>  Juvenile   Romanian Longhorn                <NA>
## 437      D437    M     Adult  Common Welsh Green                <NA>
## 438      D438 <NA>  Juvenile Norwegian Ridgeback                <NA>
## 439      D439    F  Subadult  Hungarian Horntail                <NA>
## 440      D440    M     Adult  Common Welsh Green                <NA>
## 441      D441    F     Adult Norwegian Ridgeback                <NA>
## 442      D442    M     Adult   Romanian Longhorn                <NA>
## 443      D443    F     Adult     Hebridean Black                <NA>
## 444      D444    M     Adult     Hebridean Black                <NA>
## 445      D445    M     Adult Norwegian Ridgeback                <NA>
## 446      D446    F     Adult Swedish Short-Snout                <NA>
## 447      D447 <NA>  Juvenile   Romanian Longhorn                <NA>
## 448      D448    F     Adult Peruvian Vipertooth                <NA>
## 449      D449    M     Adult    Chinese Fireball                <NA>
## 450      D450    F     Adult Peruvian Vipertooth                <NA>
## 451      D451    F     Adult  Common Welsh Green                <NA>
## 452      D452    F     Adult Swedish Short-Snout                <NA>
## 453      D453    F     Adult  Antipodean Opaleye                <NA>
## 454      D454    M     Adult   Romanian Longhorn                <NA>
## 455      D455    F     Adult  Common Welsh Green                <NA>
## 456      D456    F     Adult Peruvian Vipertooth                <NA>
## 457      D457    F     Adult  Antipodean Opaleye                <NA>
## 458      D458    F     Adult   Romanian Longhorn                <NA>
## 459      D459    F  Subadult Swedish Short-Snout                <NA>
## 460      D460    F     Adult Norwegian Ridgeback                <NA>
## 461      D461    M     Adult     Hebridean Black                <NA>
## 462      D462 <NA>  Juvenile  Common Welsh Green                <NA>
## 463      D463    M     Adult Peruvian Vipertooth                <NA>
## 464      D464    F     Adult     Hebridean Black                <NA>
## 465      D465    M     Adult  Hungarian Horntail                <NA>
## 466      D466 <NA>  Juvenile Peruvian Vipertooth                <NA>
## 467      D467 <NA>  Juvenile  Hungarian Horntail                <NA>
## 468      D468    M     Adult Peruvian Vipertooth                <NA>
## 469      D469    F     Adult   Romanian Longhorn                <NA>
## 470      D470    F     Adult  Hungarian Horntail                <NA>
## 471      D471    F     Adult Peruvian Vipertooth                <NA>
## 472      D472    M     Adult  Hungarian Horntail                <NA>
## 473      D473    M     Adult Peruvian Vipertooth                <NA>
## 474      D474    F     Adult Ukrainian Ironbelly                <NA>
## 475      D475    M  Subadult Norwegian Ridgeback                <NA>
## 476      D476    F     Adult Norwegian Ridgeback                <NA>
## 477      D477    M     Adult Norwegian Ridgeback                <NA>
## 478      D478    M     Adult  Common Welsh Green                <NA>
## 479      D479    F     Adult    Chinese Fireball                <NA>
## 480      D480    F     Adult     Hebridean Black                <NA>
## 481      D481    F     Adult  Hungarian Horntail                <NA>
## 482      D482    M  Subadult   Romanian Longhorn                <NA>
## 483      D483 <NA>  Juvenile     Hebridean Black                <NA>
## 484      D484    M     Adult     Hebridean Black                <NA>
## 485      D485    M     Adult     Hebridean Black                <NA>
## 486      D486 <NA>  Juvenile  Common Welsh Green                <NA>
## 487      D487    F     Adult  Hungarian Horntail                <NA>
## 488      D488 <NA>  Juvenile  Hungarian Horntail                <NA>
## 489      D489    F     Adult  Common Welsh Green                <NA>
## 490      D490    F     Adult   Romanian Longhorn                <NA>
## 491      D491    F     Adult     Hebridean Black                <NA>
## 492      D492    F     Adult  Hungarian Horntail                <NA>
## 493      D493    F     Adult     Hebridean Black                <NA>
## 494      D494    F     Adult    Chinese Fireball                <NA>
## 495      D495    M     Adult   Romanian Longhorn                <NA>
## 496      D496    F     Adult   Romanian Longhorn                <NA>
## 497      D497    F     Adult  Hungarian Horntail                <NA>
## 498      D498    F     Adult  Common Welsh Green                <NA>
## 499      D499    F     Adult Peruvian Vipertooth                <NA>
## 500      D500    F     Adult  Common Welsh Green                <NA>
## 501      D999    F     Adult Norwegian Ridgeback 2021-02-23 17:14:21

6.2.1 Limiting results

If the whole table is too big and we only want to take a look at the first, say, 10 rows, we can add a LIMIT clause:

SELECT * 
FROM dragons 
LIMIT 10;
##    dragon_id  sex age_class             species update_timestamp
## 1         D1    F  Subadult     Hebridean Black             <NA>
## 2         D2 <NA>  Juvenile   Romanian Longhorn             <NA>
## 3         D3    F     Adult     Hebridean Black             <NA>
## 4         D4    F     Adult Peruvian Vipertooth             <NA>
## 5         D5 <NA>  Juvenile Ukrainian Ironbelly             <NA>
## 6         D6    F     Adult Norwegian Ridgeback             <NA>
## 7         D7    M     Adult     Hebridean Black             <NA>
## 8         D8 <NA>  Juvenile Peruvian Vipertooth             <NA>
## 9         D9    F     Adult Norwegian Ridgeback             <NA>
## 10       D10    F     Adult  Common Welsh Green             <NA>

6.2.2 Sorting results

One thing to remember with SQL is that the data returned by a SELECT statement does not necessarily return the data in any specific order unless we tell it to. You can never assume the output is already sorted. To be explicit about how you want the data ordered, you can add an ORDER BY clause:

SELECT * 
FROM dragons 
ORDER BY species
LIMIT 10;
##    dragon_id  sex age_class            species update_timestamp
## 1        D39 <NA>  Juvenile Antipodean Opaleye             <NA>
## 2        D43    F  Subadult Antipodean Opaleye             <NA>
## 3        D63    F     Adult Antipodean Opaleye             <NA>
## 4        D68    M     Adult Antipodean Opaleye             <NA>
## 5        D74 <NA>  Juvenile Antipodean Opaleye             <NA>
## 6       D117 <NA>  Juvenile Antipodean Opaleye             <NA>
## 7       D137    F     Adult Antipodean Opaleye             <NA>
## 8       D143    F     Adult Antipodean Opaleye             <NA>
## 9       D153    M     Adult Antipodean Opaleye             <NA>
## 10      D212    F     Adult Antipodean Opaleye             <NA>

The default ordering will be ascending. If we want descending ordering instead, we can specify that as follows:

SELECT * 
FROM dragons 
ORDER BY species DESC
LIMIT 10;
##    dragon_id  sex age_class             species update_timestamp
## 1         D5 <NA>  Juvenile Ukrainian Ironbelly             <NA>
## 2        D21    F     Adult Ukrainian Ironbelly             <NA>
## 3        D32    M     Adult Ukrainian Ironbelly             <NA>
## 4        D35    F     Adult Ukrainian Ironbelly             <NA>
## 5        D41 <NA>  Juvenile Ukrainian Ironbelly             <NA>
## 6        D47    F     Adult Ukrainian Ironbelly             <NA>
## 7        D48    M     Adult Ukrainian Ironbelly             <NA>
## 8        D66 <NA>  Juvenile Ukrainian Ironbelly             <NA>
## 9       D122    F     Adult Ukrainian Ironbelly             <NA>
## 10      D127    M  Subadult Ukrainian Ironbelly             <NA>

6.2.3 Finding unique values

Say that we wanted to know what species of dragons are in the database. We can ask for unique values as follows:

SELECT DISTINCT species 
FROM dragons;
##                species
## 1      Hebridean Black
## 2    Romanian Longhorn
## 3  Peruvian Vipertooth
## 4  Ukrainian Ironbelly
## 5  Norwegian Ridgeback
## 6   Common Welsh Green
## 7  Swedish Short-Snout
## 8     Chinese Fireball
## 9   Hungarian Horntail
## 10  Antipodean Opaleye

6.2.4 Filtering

We can filter data based on any logical condition using a WHERE clause. For instance, say that we we only want Norwegian Ridgebacks:

SELECT * 
FROM dragons
WHERE species = 'Norwegian Ridgeback'
LIMIT 10;
##    dragon_id  sex age_class             species update_timestamp
## 1         D6    F     Adult Norwegian Ridgeback             <NA>
## 2         D9    F     Adult Norwegian Ridgeback             <NA>
## 3        D18    F     Adult Norwegian Ridgeback             <NA>
## 4        D22 <NA>  Juvenile Norwegian Ridgeback             <NA>
## 5        D27    F     Adult Norwegian Ridgeback             <NA>
## 6        D29    F     Adult Norwegian Ridgeback             <NA>
## 7        D34    F     Adult Norwegian Ridgeback             <NA>
## 8        D51    F  Subadult Norwegian Ridgeback             <NA>
## 9        D53 <NA>  Juvenile Norwegian Ridgeback             <NA>
## 10       D54    F     Adult Norwegian Ridgeback             <NA>

We can specify multiple conditions at once. For example, if we want only female Norwegian Ridgebacks:

SELECT * 
FROM dragons
WHERE species = 'Norwegian Ridgeback' AND sex = 'F'
LIMIT 10;
##    dragon_id sex age_class             species update_timestamp
## 1         D6   F     Adult Norwegian Ridgeback             <NA>
## 2         D9   F     Adult Norwegian Ridgeback             <NA>
## 3        D18   F     Adult Norwegian Ridgeback             <NA>
## 4        D27   F     Adult Norwegian Ridgeback             <NA>
## 5        D29   F     Adult Norwegian Ridgeback             <NA>
## 6        D34   F     Adult Norwegian Ridgeback             <NA>
## 7        D51   F  Subadult Norwegian Ridgeback             <NA>
## 8        D54   F     Adult Norwegian Ridgeback             <NA>
## 9        D56   F  Subadult Norwegian Ridgeback             <NA>
## 10       D70   F     Adult Norwegian Ridgeback             <NA>

In the example above, both conditions have to be satisfied at once – dragons have to be Norwegian Ridgeback and females to appear in the results. What if we want to return data where at least one of two conditions is satisfied? The following query will return dragons that are either Norwegian Ridgebacks or Peruvian Vipertooths:

SELECT * 
FROM dragons
WHERE species = 'Norwegian Ridgeback' OR species = 'Peruvian Vipertooth'
LIMIT 10;
##    dragon_id  sex age_class             species update_timestamp
## 1         D4    F     Adult Peruvian Vipertooth             <NA>
## 2         D6    F     Adult Norwegian Ridgeback             <NA>
## 3         D8 <NA>  Juvenile Peruvian Vipertooth             <NA>
## 4         D9    F     Adult Norwegian Ridgeback             <NA>
## 5        D11    F     Adult Peruvian Vipertooth             <NA>
## 6        D13 <NA>  Juvenile Peruvian Vipertooth             <NA>
## 7        D17    F     Adult Peruvian Vipertooth             <NA>
## 8        D18    F     Adult Norwegian Ridgeback             <NA>
## 9        D22 <NA>  Juvenile Norwegian Ridgeback             <NA>
## 10       D25    M     Adult Peruvian Vipertooth             <NA>

A more concise way to write the same query would be:

SELECT * 
FROM dragons
WHERE species IN ('Norwegian Ridgeback', 'Peruvian Vipertooth')
LIMIT 10;
##    dragon_id  sex age_class             species update_timestamp
## 1         D4    F     Adult Peruvian Vipertooth             <NA>
## 2         D6    F     Adult Norwegian Ridgeback             <NA>
## 3         D8 <NA>  Juvenile Peruvian Vipertooth             <NA>
## 4         D9    F     Adult Norwegian Ridgeback             <NA>
## 5        D11    F     Adult Peruvian Vipertooth             <NA>
## 6        D13 <NA>  Juvenile Peruvian Vipertooth             <NA>
## 7        D17    F     Adult Peruvian Vipertooth             <NA>
## 8        D18    F     Adult Norwegian Ridgeback             <NA>
## 9        D22 <NA>  Juvenile Norwegian Ridgeback             <NA>
## 10       D25    M     Adult Peruvian Vipertooth             <NA>

Logical conditions also include “greater than”, “less than”, “not equal to”. For instance, we can exclude Norwegian Ridgeback like so:

SELECT * 
FROM dragons
WHERE species != 'Norwegian Ridgeback'
LIMIT 10;
##    dragon_id  sex age_class             species update_timestamp
## 1         D1    F  Subadult     Hebridean Black             <NA>
## 2         D2 <NA>  Juvenile   Romanian Longhorn             <NA>
## 3         D3    F     Adult     Hebridean Black             <NA>
## 4         D4    F     Adult Peruvian Vipertooth             <NA>
## 5         D5 <NA>  Juvenile Ukrainian Ironbelly             <NA>
## 6         D7    M     Adult     Hebridean Black             <NA>
## 7         D8 <NA>  Juvenile Peruvian Vipertooth             <NA>
## 8        D10    F     Adult  Common Welsh Green             <NA>
## 9        D11    F     Adult Peruvian Vipertooth             <NA>
## 10       D12    F     Adult Swedish Short-Snout             <NA>

Let’s see some examples with numbers using the morphometrics table. The following query returns any dragons with wingspan greater than 8 meters (incidentally, you can include comments in SQL is by using a double dash):

SELECT * 
FROM morphometrics
WHERE wingspan_cm > 800 -- 800 cm is 8 meters
LIMIT 10;
##    measurement_id dragon_id       date total_body_length_cm wingspan_cm
## 1               1       D96 2012-10-10            1069.8965    1389.553
## 2               3      D316 2017-09-08             866.8935    1052.370
## 3               4      D317 2016-09-05            1146.9708    1356.808
## 4               5      D484 2016-12-04            1032.0520    1720.864
## 5               6      D149 2012-02-13             919.9908    1533.599
## 6               9      D283 2007-06-21            1698.1918    1387.194
## 7              11      D485 2002-10-04             957.4830    1780.262
## 8              16      D343 2016-04-08            1520.7012    1537.747
## 9              19      D237 2009-02-22            1204.7588    2120.408
## 10             20      D312 2001-06-16             927.1642    1267.669
##    tail_length_cm tarsus_length_cm claw_length_cm
## 1        595.2706        121.65175       15.59622
## 2        373.7619         68.16869       12.71970
## 3        542.5670        172.43663       14.80936
## 4        596.4419        114.05057       11.98567
## 5        563.9201        134.18051       11.00507
## 6        666.4246        147.44219       13.18923
## 7        560.8861        130.86021       16.52066
## 8        618.2842        151.93929       10.37198
## 9        655.5090        134.75780       11.13375
## 10       452.7417        108.84082       11.93288

We can also sort the output of the filter:

SELECT * 
FROM morphometrics
WHERE wingspan_cm > 800 
ORDER BY wingspan_cm DESC
LIMIT 10;
##    measurement_id dragon_id       date total_body_length_cm wingspan_cm
## 1              90      D494 2004-06-12             1307.224    3101.688
## 2              39       D99 2016-11-16             1334.744    2891.525
## 3              86      D378 2013-09-14             1311.868    2792.250
## 4             138      D414 2015-08-28             1721.825    2653.967
## 5             240      D228 2019-06-28             1789.812    2653.470
## 6             312      D476 2000-06-20             1391.101    2617.818
## 7             127       D70 2010-09-10             1601.945    2518.449
## 8              23      D499 2011-12-09             1268.471    2481.832
## 9             282      D496 2006-08-18             1536.888    2460.436
## 10            256      D321 2017-03-19             1423.789    2440.229
##    tail_length_cm tarsus_length_cm claw_length_cm
## 1        609.9900         131.2439      14.031487
## 2        596.7174         144.0813       7.619286
## 3        601.6576         107.9930      20.714964
## 4        619.6061         128.5858      18.447108
## 5        604.2838         162.4562       8.846004
## 6        599.2680         138.8046       9.720670
## 7        598.8351         135.7797       8.544782
## 8        609.8503         129.6169      10.313340
## 9        679.7031         136.7213      17.495741
## 10       619.7223         116.3764       7.266753

6.2.5 Calculations

The morphometric measurements are in centimeters, but we often define our filtering conditions using meters in our minds (see above, “wingspan greater than 8 meters”). Instead of converting the condition into centimeters, wouldn’t it be easier to have SQLite transform the data in meters and then evaluate the condition? We can do this calculation on the fly:

SELECT * 
FROM morphometrics
WHERE wingspan_cm/100 > 8
LIMIT 10;
##    measurement_id dragon_id       date total_body_length_cm wingspan_cm
## 1               1       D96 2012-10-10            1069.8965    1389.553
## 2               3      D316 2017-09-08             866.8935    1052.370
## 3               4      D317 2016-09-05            1146.9708    1356.808
## 4               5      D484 2016-12-04            1032.0520    1720.864
## 5               6      D149 2012-02-13             919.9908    1533.599
## 6               9      D283 2007-06-21            1698.1918    1387.194
## 7              11      D485 2002-10-04             957.4830    1780.262
## 8              16      D343 2016-04-08            1520.7012    1537.747
## 9              19      D237 2009-02-22            1204.7588    2120.408
## 10             20      D312 2001-06-16             927.1642    1267.669
##    tail_length_cm tarsus_length_cm claw_length_cm
## 1        595.2706        121.65175       15.59622
## 2        373.7619         68.16869       12.71970
## 3        542.5670        172.43663       14.80936
## 4        596.4419        114.05057       11.98567
## 5        563.9201        134.18051       11.00507
## 6        666.4246        147.44219       13.18923
## 7        560.8861        130.86021       16.52066
## 8        618.2842        151.93929       10.37198
## 9        655.5090        134.75780       11.13375
## 10       452.7417        108.84082       11.93288

6.2.6 Aggregate functions

Summary statistics are calculated using aggregate functions in SQL. Counting, summing, calculating the mean, minimum, maximum values are all examples of aggregate functions. Let’s see how they work.

If we want to know how many Norwegian Ridgebacks are in the database, we can use:

SELECT COUNT(*) 
FROM dragons
WHERE species = 'Norwegian Ridgeback';
##   COUNT(*)
## 1       87

The dragons table has no repeated IDs, so we shouldn’t have counted anybody twice. However, let’s double check if that’s true by specifying that we want to only count distinct IDs:

SELECT COUNT(DISTINCT dragon_id) 
FROM dragons
WHERE species = 'Norwegian Ridgeback';
##   COUNT(DISTINCT dragon_id)
## 1                        87

Now let’s see how to calculate some summary stats on dragon morphometrics. What is the mean total body length of dragons in our sample?

SELECT AVG(total_body_length_cm)
FROM morphometrics;
##   AVG(total_body_length_cm)
## 1                  961.3612

We can also calculate several different things at once:

SELECT AVG(total_body_length_cm), MIN(total_body_length_cm), MAX(total_body_length_cm)
FROM morphometrics;
##   AVG(total_body_length_cm) MIN(total_body_length_cm) MAX(total_body_length_cm)
## 1                  961.3612                  285.6049                  2250.789

6.2.7 Aliases

Aliases are temporary names that we can assign to a column (or a table) during the execution of a query. For example, we can use an alias to rename the output of our mean body length calculation:

SELECT AVG(total_body_length_cm) AS mean_body_length
FROM morphometrics;
##   mean_body_length
## 1         961.3612

6.2.8 Grouping

If we want to apply the same calculation to different groups within the data, we can use the GROUP BY clause. GROUP BY is used in conjunction with an aggregate function to return the computed values broken down by group. For instance, if we want to count how many dragons we have for each species (and sort them from the most numerous to the least numerous):

SELECT species, COUNT(*) AS n_dragons
FROM dragons
GROUP BY species
ORDER BY n_dragons DESC;
##                species n_dragons
## 1  Norwegian Ridgeback        87
## 2   Common Welsh Green        81
## 3    Romanian Longhorn        69
## 4      Hebridean Black        66
## 5  Peruvian Vipertooth        60
## 6   Hungarian Horntail        40
## 7  Swedish Short-Snout        27
## 8  Ukrainian Ironbelly        26
## 9   Antipodean Opaleye        25
## 10    Chinese Fireball        20

6.2.9 Filtering based on computed values

If we want to apply any filtering to our results based on calculated values, the WHERE clause is not going to work. Instead, a HAVING clause is used in combination with an aggregate function and a GROUP BY. HAVING does the same thing as WHERE except that it handles logical conditions on computed values. For example, say that we want to count how many individuals we have in our database and then only keep the species for which we have at least 50 individuals. The following query counts how many individual dragons we have for each species:

SELECT COUNT(DISTINCT dragon_id) AS n_dragons, species
FROM dragons
GROUP BY species;
##    n_dragons             species
## 1         25  Antipodean Opaleye
## 2         20    Chinese Fireball
## 3         81  Common Welsh Green
## 4         66     Hebridean Black
## 5         40  Hungarian Horntail
## 6         87 Norwegian Ridgeback
## 7         60 Peruvian Vipertooth
## 8         69   Romanian Longhorn
## 9         27 Swedish Short-Snout
## 10        26 Ukrainian Ironbelly

Because the count is a calculated value (the output of the aggregate function COUNT), if we try to apply a filter using WHERE we’ll get an error. Instead, we can use HAVING:

SELECT COUNT(DISTINCT dragon_id) AS n_dragons, species
FROM dragons
GROUP BY species
HAVING n_dragons > 50;
##   n_dragons             species
## 1        81  Common Welsh Green
## 2        66     Hebridean Black
## 3        87 Norwegian Ridgeback
## 4        60 Peruvian Vipertooth
## 5        69   Romanian Longhorn

6.2.10 Null values

Missing values in SQL are represented as NULL. We can filter (or filter out) these values using the IS NULL operator:

SELECT * 
FROM dragons
WHERE sex IS NULL
LIMIT 10;
##    dragon_id  sex age_class             species update_timestamp
## 1         D2 <NA>  Juvenile   Romanian Longhorn             <NA>
## 2         D5 <NA>  Juvenile Ukrainian Ironbelly             <NA>
## 3         D8 <NA>  Juvenile Peruvian Vipertooth             <NA>
## 4        D13 <NA>  Juvenile Peruvian Vipertooth             <NA>
## 5        D14 <NA>  Juvenile     Hebridean Black             <NA>
## 6        D15 <NA>  Juvenile    Chinese Fireball             <NA>
## 7        D22 <NA>  Juvenile Norwegian Ridgeback             <NA>
## 8        D24 <NA>  Juvenile  Common Welsh Green             <NA>
## 9        D26 <NA>  Juvenile  Common Welsh Green             <NA>
## 10       D31 <NA>  Juvenile  Common Welsh Green             <NA>

The query above returns all the dragons for which sex is unknown. If we want to exclude any individuals for which sex is unknown, then we can add the NOT operator to our statement:

SELECT * 
FROM dragons
WHERE sex IS NOT NULL
LIMIT 10;
##    dragon_id sex age_class             species update_timestamp
## 1         D1   F  Subadult     Hebridean Black             <NA>
## 2         D3   F     Adult     Hebridean Black             <NA>
## 3         D4   F     Adult Peruvian Vipertooth             <NA>
## 4         D6   F     Adult Norwegian Ridgeback             <NA>
## 5         D7   M     Adult     Hebridean Black             <NA>
## 6         D9   F     Adult Norwegian Ridgeback             <NA>
## 7        D10   F     Adult  Common Welsh Green             <NA>
## 8        D11   F     Adult Peruvian Vipertooth             <NA>
## 9        D12   F     Adult Swedish Short-Snout             <NA>
## 10       D16   F     Adult    Chinese Fireball             <NA>

6.2.11 Joins

So far, we have learned how to query data from a single table. But what if we need to combine information from multiple tables to get the result we want? Let’s make an example. Say that we want to calculate the average wingspan for male versus female dragons. The wingspan measurements are in the morphometrics table. However, the morphometrics table does not have a “sex” column. The information on sex is in the dragons table. To get our answer, we need to combine information from the dragons and morphometrics tables. To do so, we introduce SQL joins.

Joins are the heart of relational database use. Without joins, there is no point in a relational database because we can’t take advantage of the relations between tables. Joins exploit the links we established between tables via foreign keys to combine information across them.

There are several types of join. The most common types are left join, inner join, or full join. To understand this terminology, consider this: whenever you are joining two tables, the first table you mention (the one to which you’re joining) is called the left table, whereas the second table (the one you’re joining to the first) is called the right table. With a left join, you keep all the records in the left table and add information from the right table whenever there’s a matching row. A full join means that you retain all rows from both tables, matching them whenever possible. An inner join means that you only retain the rows that match between the two tables.

SQL Joins

Figure 6.1: SQL Joins

Let’s look at this in practice and it will make more sense. Our database contains a deployments table and a morphometrics table. Not all dragons that were tracked were also measured. If we do a left join using the deployments as the left table, we get in output the whole deployments table (all the dragons that were tracked) with the associated morphometric information whenever available (for the dragons that were also measured):

SELECT *
FROM deployments -- this is our left table
LEFT JOIN morphometrics -- this is our right table 
ON deployments.dragon_id = morphometrics.dragon_id -- this is the shared column
LIMIT 10;
##    deployment_id dragon_id tag_id start_deployment end_deployment
## 1              1      D486 N33529       2006-12-20     2007-07-04
## 2              2      D393 J47978       2008-01-13     2008-10-13
## 3              3       D88 R61684       2004-09-19     2004-12-01
## 4              4      D330 B35524       2015-11-21     2016-08-02
## 5              5      D478 T55954       2002-05-07     2002-10-26
## 6              6      D300 O31688       2006-01-20     2006-08-23
## 7              7      D380 B61925       2015-01-27     2015-10-21
## 8              8      D315 P47677       2006-03-03     2006-06-06
## 9              9      D209 P35987       2007-02-26     2007-05-31
## 10            10      D357 I41609       2013-09-09     2014-03-08
##    measurement_id dragon_id       date total_body_length_cm wingspan_cm
## 1              14      D486 2006-12-20             315.2255    616.5057
## 2             146      D393 2008-01-13            1275.6809    803.7776
## 3              NA      <NA>       <NA>                   NA          NA
## 4             252      D330 2015-11-21            1364.8418   1351.6375
## 5              NA      <NA>       <NA>                   NA          NA
## 6              NA      <NA>       <NA>                   NA          NA
## 7             260      D380 2015-01-27             327.5435    616.5409
## 8              NA      <NA>       <NA>                   NA          NA
## 9              82      D209 2007-02-26             325.2996    670.5136
## 10             31      D357 2013-09-09             730.9783    972.8464
##    tail_length_cm tarsus_length_cm claw_length_cm
## 1        117.3121         31.70299       4.980824
## 2        638.4500        133.38012      12.472582
## 3              NA               NA             NA
## 4        637.7992        140.29208      14.076526
## 5              NA               NA             NA
## 6              NA               NA             NA
## 7        102.2240         35.78315       4.716980
## 8              NA               NA             NA
## 9        118.8795         41.10574       4.926693
## 10       489.0859         93.98672      17.895960

Since the shared column is named the same in the two tables (dragon_id), I had to specify which table I was referring to each time in the ON clause, or it would have been ambiguous. The syntax to do so is table.column. Also note that, because the dragon ID column appears in both tables and we did not specify which table we wanted it from, SQLite is duplicating it. To avoid that, we need to get rid of the wildcard and spell out the name of each of the columns we want in output (also specifying the table when ambiguous):

SELECT deployments.dragon_id, date, tag_id, start_deployment, end_deployment,
total_body_length_cm, wingspan_cm, tail_length_cm, tarsus_length_cm, claw_length_cm
FROM deployments 
LEFT JOIN morphometrics 
ON deployments.dragon_id = morphometrics.dragon_id 
LIMIT 10;
##    dragon_id       date tag_id start_deployment end_deployment
## 1       D486 2006-12-20 N33529       2006-12-20     2007-07-04
## 2       D393 2008-01-13 J47978       2008-01-13     2008-10-13
## 3        D88       <NA> R61684       2004-09-19     2004-12-01
## 4       D330 2015-11-21 B35524       2015-11-21     2016-08-02
## 5       D478       <NA> T55954       2002-05-07     2002-10-26
## 6       D300       <NA> O31688       2006-01-20     2006-08-23
## 7       D380 2015-01-27 B61925       2015-01-27     2015-10-21
## 8       D315       <NA> P47677       2006-03-03     2006-06-06
## 9       D209 2007-02-26 P35987       2007-02-26     2007-05-31
## 10      D357 2013-09-09 I41609       2013-09-09     2014-03-08
##    total_body_length_cm wingspan_cm tail_length_cm tarsus_length_cm
## 1              315.2255    616.5057       117.3121         31.70299
## 2             1275.6809    803.7776       638.4500        133.38012
## 3                    NA          NA             NA               NA
## 4             1364.8418   1351.6375       637.7992        140.29208
## 5                    NA          NA             NA               NA
## 6                    NA          NA             NA               NA
## 7              327.5435    616.5409       102.2240         35.78315
## 8                    NA          NA             NA               NA
## 9              325.2996    670.5136       118.8795         41.10574
## 10             730.9783    972.8464       489.0859         93.98672
##    claw_length_cm
## 1        4.980824
## 2       12.472582
## 3              NA
## 4       14.076526
## 5              NA
## 6              NA
## 7        4.716980
## 8              NA
## 9        4.926693
## 10      17.895960

If we invert the tables (the morphometrics becomes the left table and the deployments becomes the right), we get in output the whole morphometric table (all the dragons that were measured) with the associated deployment information whenever available (for the dragons that were also tracked):

SELECT *
FROM morphometrics -- this is our left table
LEFT JOIN deployments -- this is our right table 
ON morphometrics.dragon_id = deployments.dragon_id 
LIMIT 10;
##    measurement_id dragon_id       date total_body_length_cm wingspan_cm
## 1               1       D96 2012-10-10            1069.8965   1389.5527
## 2               2      D400 2003-07-03             333.4600    634.9109
## 3               3      D316 2017-09-08             866.8935   1052.3702
## 4               4      D317 2016-09-05            1146.9708   1356.8084
## 5               5      D484 2016-12-04            1032.0520   1720.8641
## 6               6      D149 2012-02-13             919.9908   1533.5991
## 7               7      D285 2016-03-23             304.8285    698.8157
## 8               8      D256 2013-09-07             358.9701    652.0053
## 9               9      D283 2007-06-21            1698.1918   1387.1943
## 10             10      D213 2001-12-12             353.6952    670.5283
##    tail_length_cm tarsus_length_cm claw_length_cm deployment_id dragon_id
## 1        595.2706        121.65175      15.596219           274       D96
## 2        104.2241         38.10844       4.305086           260      D400
## 3        373.7619         68.16869      12.719697            NA      <NA>
## 4        542.5670        172.43663      14.809363            98      D317
## 5        596.4419        114.05057      11.985672           285      D484
## 6        563.9201        134.18051      11.005070            NA      <NA>
## 7        115.8964         42.18657       5.041634           126      D285
## 8        148.3656         43.57849       4.576640           291      D256
## 9        666.4246        147.44219      13.189233           145      D283
## 10       140.4342         43.60513       4.248564           128      D213
##    tag_id start_deployment end_deployment
## 1  H83101       2012-10-10     2013-06-02
## 2  S49621       2003-07-03     2004-01-17
## 3    <NA>             <NA>           <NA>
## 4  A49007       2016-09-05     2017-01-09
## 5  V63425       2016-12-04     2017-02-19
## 6    <NA>             <NA>           <NA>
## 7  B74989       2016-03-23     2016-11-08
## 8  O66678       2013-09-07     2013-11-10
## 9  G59298       2007-06-21     2007-11-14
## 10 F64333       2001-12-12     2002-02-05

An inner join only keeps the rows that match between two tables. In this case, that means we only get in output data for animals that were both tracked and measured:

SELECT *
FROM morphometrics
INNER JOIN deployments 
ON morphometrics.dragon_id = deployments.dragon_id 
LIMIT 10;
##    measurement_id dragon_id       date total_body_length_cm wingspan_cm
## 1               1       D96 2012-10-10            1069.8965   1389.5527
## 2               2      D400 2003-07-03             333.4600    634.9109
## 3               4      D317 2016-09-05            1146.9708   1356.8084
## 4               5      D484 2016-12-04            1032.0520   1720.8641
## 5               7      D285 2016-03-23             304.8285    698.8157
## 6               8      D256 2013-09-07             358.9701    652.0053
## 7               9      D283 2007-06-21            1698.1918   1387.1943
## 8              10      D213 2001-12-12             353.6952    670.5283
## 9              13       D41 2016-11-03             325.5946    626.2576
## 10             14      D486 2006-12-20             315.2255    616.5057
##    tail_length_cm tarsus_length_cm claw_length_cm deployment_id dragon_id
## 1       595.27056        121.65175      15.596219           274       D96
## 2       104.22414         38.10844       4.305086           260      D400
## 3       542.56700        172.43663      14.809363            98      D317
## 4       596.44195        114.05057      11.985672           285      D484
## 5       115.89640         42.18657       5.041634           126      D285
## 6       148.36555         43.57849       4.576640           291      D256
## 7       666.42455        147.44219      13.189233           145      D283
## 8       140.43424         43.60513       4.248564           128      D213
## 9        86.63968         39.26696       6.077829           114       D41
## 10      117.31209         31.70299       4.980824             1      D486
##    tag_id start_deployment end_deployment
## 1  H83101       2012-10-10     2013-06-02
## 2  S49621       2003-07-03     2004-01-17
## 3  A49007       2016-09-05     2017-01-09
## 4  V63425       2016-12-04     2017-02-19
## 5  B74989       2016-03-23     2016-11-08
## 6  O66678       2013-09-07     2013-11-10
## 7  G59298       2007-06-21     2007-11-14
## 8  F64333       2001-12-12     2002-02-05
## 9  X71699       2016-11-03     2017-05-10
## 10 N33529       2006-12-20     2007-07-04

A full join keeps all records from both the left and the right table, matching them whenever possible. That is, we get in output all the dragons, with blanks in the morphometrics table for those that were only tracked and blanks in the deployments table for those that were only measured. Unfortunately, SQLite does not support full joins, so we cannot demonstrate it here, but it’s important to know that a full join exists and what it does because you’ll find yourself using it in any other database management program or even in R (see Chapter 13).

6.2.12 Nested SELECT statements

Now that we know how to use joins, we can go back to our challenge of calculating the average wingspan for male versus female dragons. What we need to do is add the sex column to the morphometrics table. We can do this by looking at the dragon ID, which is the column the two tables have in common. So the join clause will look like this:

SELECT morphometrics.dragon_id, wingspan_cm, sex
FROM morphometrics -- this is our left table
LEFT JOIN dragons -- this is our left table 
ON morphometrics.dragon_id = dragons.dragon_id -- this is the shared column
LIMIT 10;
##    dragon_id wingspan_cm  sex
## 1        D96   1389.5527    F
## 2       D400    634.9109 <NA>
## 3       D316   1052.3702    M
## 4       D317   1356.8084    F
## 5       D484   1720.8641    M
## 6       D149   1533.5991    M
## 7       D285    698.8157 <NA>
## 8       D256    652.0053 <NA>
## 9       D283   1387.1943    F
## 10      D213    670.5283 <NA>

By using a left join, we are saying that we want to keep all rows from the left table (morphometrics) while adding information on sex whenever available. Now we can compute the average wingspan broken down by sex:

SELECT sex, AVG(wingspan_cm) AS mean_wingspan
FROM (
    SELECT morphometrics.dragon_id, wingspan_cm, sex
    FROM morphometrics 
    LEFT JOIN dragons 
    ON morphometrics.dragon_id = dragons.dragon_id
    )
GROUP BY sex; 
##    sex mean_wingspan
## 1 <NA>      645.3158
## 2    F     1550.0009
## 3    M     1549.1300

The one above is a nested query, which means there are two SELECT statements nested within one another. The inner SELECT statement (everything between the parentheses) is treated as if it were a table in the database. SQL will execute the query from the inside out, so even though that table physically does not exist, it gets temporarily created when the inner query is run and is then used in input to the second query.

6.2.13 Order of operations

The fact that SQL will execute nested queries from the inside out is one aspect of order of operations. But there’s also a logic to the order in which SQL executes clauses within a query: first, it gathers the data, then it filters it and aggregates it as needed, then it computes results, and finally it sorts them and truncate them if necessary. Let’s look at this step by step:

  1. First, SQL executes the FROM clause and any JOIN clauses, if present, to determine what tables are being queried.
  2. Then, it will execute the WHERE clause to filter the data.
  3. Then, it will GROUP BY the column/s of interest.
  4. Then it will calculate any derived values based on aggregate functions.
  5. Then, it will apply any filters specified by HAVING.
  6. Now, and only now, it will execute the SELECT clause!
  7. Once the computations are completed and the output of SELECT is ready, SQL can start refining it. First, it will discard any duplicates if DISTINCT is specified.
  8. Then, it will sort the results based on ORDER BY.
  9. Finally, it will truncate the output if a LIMIT is specified.

If you think about it, all of this makes logical sense. You can’t select a column if you’re not sure what table to look in. You can’t compute a per-group count before defining the groups. You can’t order results that you haven’t computed yet. But then why do we start our queries with SELECT? And does the order in which we write our queries matter?

The answer to both of those questions is that there is a difference between the logical order of operations and the lexical (or syntactical) order of operations, which is how we actually write queries. Even though these don’t correspond, writing a query in any other order than the following is incorrect (and will return an error):

SELECT 
FROM 
    JOIN 
      ON 
    WHERE 
    GROUP BY 
    HAVING 
    ORDER BY 
    LIMIT;

6.3 Building a database

6.3.1 Creating a new database in SQLite Studio

We are now going to see how to build a database by recreating the dragons database we have been practicing on. Let’s open up SQLite Studio and, on the Database tab, click “Add a database” (or use Ctrl/Cmd + O as a shortcut). Click on the green plus sign, navigate to the folder where you want to save the database, and choose a file name. Click “Save”. Now we need to also choose a name to use internally (this name will appear in the drop-down list of databases within SQLite, but it does not necessarily need to be the same as the filename.) Click “OK” and you should see your new database in the list on the left.

6.3.2 Creating tables

Now let’s open the SQL editor (Tools > SQL editor). Double check that you are working on the correct database (the name of the active database is shown in the toolbar). We are ready to create our first table!

To create a table, we need to specify a name for the table, as well as a name and a data type for each column. The basic format is as follows:


CREATE TABLE table_name (
column_1 data_type,
column_2 data_type,
column_3 data_type,
...
)

Let’s start by creating the dragons table. Remember that each table needs to have a primary key, i.e., a column that serves as the unique identifier of each row. The values in this column need to be unique and cannot be null. In this table, we can use the dragon ID as the primary key:


CREATE TABLE dragons (
dragon_id varchar(5),
sex char(1),
age_class varchar(8),
species varchar(50),
PRIMARY KEY (dragon_id)
);

Notice that I specified a number of digits for each character variable. Because the format of the dragon ID is “D” + number from 1 up to 500 (for now), setting the ID as a character string with varying size up to 5 digits allows for IDs from “D1” to “D9999”. So far, we only have 500 individuals, so strictly it would have been enough to set the limit to 4 digits only, but it’s good to have some forethought and allow for growth into the future. At the same time, it seems reasonable to assume we won’t catch 10 thousand dragons in this project, so 5 digits is a good compromise. In reality, space is rarely limiting to the point where 4 or 10 digits makes a difference, so when in doubt err on the side of more room for growth.

While dragon ID, age class, and species have a variable number of digits, sex always has one (it can be “M” or “F”). So we can make this a char instead of a varchar and specify it will have 1 digit.

6.3.3 Adding constraints

We can also add some constraints for the purpose of quality assurance. For example, because the dragon ID is the primary key, we can’t accept null values in this column. Sex is always one of “M” or “F” and age is always one of “Juvenile”, “Subadult”, or “Adult”. In other SQL dialects you can add these after the fact, but in SQLite the only way to add these constraints is at the same time as you create the table. We can delete the table and re-create it by modifying the above syntax as follows:


DROP TABLE dragons;

CREATE TABLE dragons (
dragon_id varchar(5) NOT NULL,
sex char(1) CHECK (sex IN ('M', 'F')),
age_class varchar(8) CHECK (age_class IN ('Juvenile', 'Subadult', 'Adult')),
species varchar(50),
PRIMARY KEY (dragon_id)
);

6.3.4 Order of table creation

Foreign keys are also specified as constraints. Because it’s not possible to enforce constraints as an afterthought, we need to plan the order in which we add tables carefully. If we try to add a foreign key that refers to a table that doesn’t exist yet, we won’t be able to. Let’s take another look at the diagram of our table relationships:

Diagram of the dragons database

Figure 6.2: Diagram of the dragons database

Any table that has one or more foreign keys needs to be added after the related table/s. In our case, there are three “root” tables that do not have any foreign keys: the dragons table, the tags table, and the capture sites table. We can add these three first and then all the others.

The tags table has a tag_id field which is unique and can therefore be used as a primary key. Note that when you make a column the primary key you automatically enforce a unique constraint on that column. However, SQLite does not enforce a not-null constraint on primary keys like most other SQL dialects. Because SQLite doesn’t, we have to do it ourselves:


CREATE TABLE tags (
tag_id char(6) NOT NULL PRIMARY KEY,
brand varchar(50),
status varchar(20)
);

The capture sites table has a site field which contains a 3-letter code that uniquely identifies the site. We can use that as a primary key. The UTM coordinates are numeric values for which we can use the data type called “double precision”, or just “double”. There are several possible choices for numeric data types. “Double precision” and “float” are both suitable data types for storing real numbers, but double precision stores numeric values with higher precision than float. Both of these allow for variable decimal places, whereas the data type “decimal” enforces the same number of decimal places across records. The choice between these data types depends on 1. whether we want the measurements to all have the same number of decimal places or not, and 2. how many significant digits do we expect/care about. Double precision can store up to 15 significant digits (unlike float which can store up to 8), but it also occupies double the space (64 bit instead of 32.) For UTM coordinates, the Y value has a minimum of 7 digits, and we want to keep decimal places because we need the highest precision possible on the position of animals in space. Therefore, we’ll go with double precision:

CREATE TABLE capture_sites (
site char(3) NOT NULL PRIMARY KEY,
utm_x double,
utm_y double
);

Note that I used an alternative syntax to specify the primary key. This is equivalent to the one I used above for the dragons table.

6.3.5 Populating the tables

Now that we added a few tables, we can start to populate the database by loading the data as .csv files. On the Tools tab, click “Import.”

Check that the database is correct and then select the table you want to import data into. For example, let’s start with the dragons table:

Click Next. Now, browse your directory to select the input file. Then check the box that says “First line represents CSV column names”, make sure the field separator is a comma, and set the NULL values to NA (which is how they are encoded in the .csv.) Click Finish.

To check if the table was imported correctly, go back into the query editor and try to look at the first 10 rows:

SELECT * FROM dragons LIMIT 10;
##    dragon_id  sex age_class             species update_timestamp
## 1         D1    F  Subadult     Hebridean Black             <NA>
## 2         D2 <NA>  Juvenile   Romanian Longhorn             <NA>
## 3         D3    F     Adult     Hebridean Black             <NA>
## 4         D4    F     Adult Peruvian Vipertooth             <NA>
## 5         D5 <NA>  Juvenile Ukrainian Ironbelly             <NA>
## 6         D6    F     Adult Norwegian Ridgeback             <NA>
## 7         D7    M     Adult     Hebridean Black             <NA>
## 8         D8 <NA>  Juvenile Peruvian Vipertooth             <NA>
## 9         D9    F     Adult Norwegian Ridgeback             <NA>
## 10       D10    F     Adult  Common Welsh Green             <NA>

If everything looks good, we can go ahead and import data in the other two tables.

6.3.6 Autoincrements as primary keys

Sometimes a table does not contain any columns with unique values. This is the case for all the tables we have left to add. What do we use as primary key in these situations? Adding a column with a serial number that gets automatically updated is a good option. SQLite has something called auto-increment, which is an integer column that automatically increases by 1 each time you add a row to a table. Because it’s incremental, the values of an auto-increment will always be unique, and because it’s automatic, they will never be null. Sounds like a perfect candidate for a primary key.

The problem with using an auto-increment as a primary key arises when you’re trying to import data from a .csv file. If the column with the auto-increment does not already exist in the .csv, SQLite won’t let you import the file into the table because the number of columns does not match. We don’t want to add the auto-increment into the .csv ourselves because that defeats the purpose of having SQLite doing it for us. But if we import the .csv as it is (without primary key) and then add the auto-increment column later, we won’t be able to make it the primary key because SQLite won’t let us. So, what do we do?

One workaround that people have found to solve this issue is to trick SQLite by using a temporary table. This is how the process works:

  1. Create the table the way we want it, with an auto-increment primary key plus all the columns we want to import from the .csv;
  2. Create a temporary table without primary key that only contains the columns from the .csv;
  3. Import the .csv into the temporary table;
  4. Populate the final table by pulling data from the temporary table;
  5. Delete the temporary table.

A little convoluted, but it does the job. Let’s demonstrate this on the captures table. First, we create the table like we want it to look in the end (note that I am adding foreign keys; we’ll go over that part in the next section):

CREATE TABLE captures (
capture_id INTEGER PRIMARY KEY AUTOINCREMENT,
dragon_id varchar(5),
date text,
site char(3),
FOREIGN KEY(dragon_id) REFERENCES dragons(dragon_id)
FOREIGN KEY(site) REFERENCES capture_sites(site)
);

Second, we create a temporary table without the primary key (no need to add foreign keys to this one as we are going to delete it anyway):

CREATE TABLE captures_temp (
dragon_id varchar(5),
date text,
site char(3));

Now on Tools > Import, we upload captures.csv into captures_temp. Then we can populate the final table as follows:

INSERT INTO captures(dragon_id, date, site) SELECT * FROM captures_temp;

And finally we delete our temporary table:

DROP TABLE captures_temp;

We better get familiar with this workflow because we are going to use it for all the other tables now.

6.3.7 Foreign keys

For each of the remaining tables, we will specify one or more foreign keys to enforce the relationships between tables. Each foreign key is the primary key of another table. See for example what we did above: the captures table we just imported contains information on when and where each dragon was captured. This means this table needs to have two foreign keys: the dragon_id column links it to the dragons table and the site column links it to the capture sites table. Now let’s apply the concept to the other tables.

The morphometrics table will have a single foreign key linking it to the dragons table (the dragon ID). The second column is a date – SQLite does not have a dedicated data type for dates. Instead, we stored this as a character string in ISO8601 format (“YYYY-MM-DD HH:MM:SS.SSS”). This time there is no need for double precision because we want to only retain up to the third decimal place and the numbers are not larger than ~3000, so 8 significant digits is sufficient. We’ll use float as the data type for the measurements. Because individuals may have been measured multiple times, none of the existing columns are unique. This means we’ll create a serial number to use as the primary key. We’ll use the same trick as above:

CREATE TABLE morphometrics (
measurement_id INTEGER PRIMARY KEY AUTOINCREMENT,
dragon_id varchar(5),
date text,
total_body_length_cm float,
wingspan_cm float,
tail_length_cm float,
tarsus_length_cm float,
claw_length_cm float,
FOREIGN KEY (dragon_id) REFERENCES dragons(dragon_id)
);

CREATE TABLE morphometrics_temp (
dragon_id varchar(5),
date text,
total_body_length_cm float,
wingspan_cm float,
tail_length_cm float,
tarsus_length_cm float,
claw_length_cm float
);

Now on Tools > Import, we upload morphometrics.csv into morphometrics_temp. Then we populate the final table:

INSERT INTO morphometrics(dragon_id, date, total_body_length_cm, wingspan_cm,
tail_length_cm, tarsus_length_cm, claw_length_cm) 
SELECT * FROM morphometrics_temp;

And delete our temporary table:

DROP TABLE morphometrics_temp;

The diet table contains repeated sample IDs and repeated item IDs within each sample. We’ll need a serial number here too because none of the columns are unique. The item ID is an integer so we are going to use a new numeric data type for it. The foreign key will be, again, the dragon ID referring to the dragons table:

CREATE TABLE diet (
diet_id INTEGER PRIMARY KEY AUTOINCREMENT,
dragon_id varchar(5),
sample_id varchar(8),
date text,
item_id integer,
item varchar(50),
FOREIGN KEY (dragon_id) REFERENCES dragons(dragon_id)
);

CREATE TABLE diet_temp (
dragon_id varchar(5),
sample_id varchar(8),
date text,
item_id integer,
item varchar(50)
);

Upload diet.csv into diet_temp.

INSERT INTO diet(dragon_id, sample_id, date, item_id, item) 
SELECT * FROM diet_temp;

DROP TABLE diet_temp;

The deployments table assigns a tag to each individual within a certain period of time. The dragon_id column will be the foreign key that links it to the dragons table, and the tag_id column will link it to the tags table. The start and end deployment dates will be stored as ISO8601 text. Again, we’ll need a serial number to use as a primary key:


CREATE TABLE deployments (
deployment_id INTEGER PRIMARY KEY AUTOINCREMENT,
dragon_id varchar(5),
tag_id char(6),
start_deployment text,
end_deployment text,
FOREIGN KEY(dragon_id) REFERENCES dragons(dragon_id)
FOREIGN KEY(tag_id) REFERENCES tags(tag_id)
);

CREATE TABLE deployments_temp (
dragon_id varchar(5),
tag_id char(6),
start_deployment text,
end_deployment text
);

Upload deployments.csv into deployments_temp.

INSERT INTO deployments(dragon_id, tag_id, start_deployment, end_deployment) 
SELECT * FROM deployments_temp;

DROP TABLE deployments_temp;

Now we can input the telemetry data. This table contains the raw tracking data as we download it from the tags. We’ll need a serial number to uniquely identify each record, and we’ll add the tag ID as the foreign key to the tags table:

CREATE TABLE gps_data_raw (
gps_id INTEGER PRIMARY KEY AUTOINCREMENT,
tag_id char(6),
timestamp text, 
utm_x double,
utm_y double,
FOREIGN KEY(tag_id) REFERENCES tags(tag_id)
);

CREATE TABLE gps_data_raw_temp (
tag_id char(6),
timestamp text, 
utm_x double,
utm_y double
);

Upload telemetry_raw.csv into gps_data_raw_temp.

INSERT INTO gps_data_raw(tag_id, timestamp, utm_x, utm_y) 
SELECT * FROM gps_data_raw_temp;

DROP TABLE gps_data_raw_temp;

6.3.8 Crossing existing information to derive new tables

The raw GPS data table does not give us any information about which animal each location corresponds to; all we know is the tag ID. Each tag does not correspond to an individual, because some tags are reused on multiple dragons. So how do we make these data usable? How do we know who is who? To associate each location to the correct animal, we need to know who was wearing the tag at the time that location was taken. There is a very elegant solution to our problem, as we can pull the range of dates an individual was wearing a certain tag from the deployments table, cross those dates with the dates in the raw GPS data, and create an updated telemetry table where each location is assigned to the correct individual. No manual work involved. Are you ready for some magic?! First, we create the table structure:


CREATE TABLE gps_data (
loc_id INTEGER PRIMARY KEY,
tag_id char(6),
dragon_id varchar(5),
timestamp text,
utm_x double,
utm_y double,
FOREIGN KEY (tag_id) REFERENCES tags(tag_id)
FOREIGN KEY (dragon_id) REFERENCES dragons(dragon_id)
);

And then we populate it by pulling information from the raw GPS and deployment tables. Locations are assigned to the individual that was wearing the tag at the time based on the WHERE clause:

INSERT INTO gps_data (
tag_id, dragon_id, timestamp, utm_x, utm_y)
SELECT
deployments.tag_id,
deployments.dragon_id,
gps_data_raw.timestamp,
gps_data_raw.utm_x,
gps_data_raw.utm_y
FROM deployments LEFT JOIN gps_data_raw USING (tag_id)
WHERE gps_data_raw.tag_id = deployments.tag_id AND
(
    (
    (strftime(gps_data_raw.timestamp) >= strftime(deployments.start_deployment)) AND
    (strftime(gps_data_raw.timestamp) <= strftime(deployments.end_deployment))
    )
OR 
    (
    (gps_data_raw.timestamp >= deployments.start_deployment) AND
    (deployments.end_deployment IS NULL)
    )
);

Note that, because we populated this table with data from other existing tables, we ended up using INSERT INTO anyway and there was no need to use our trick with a temporary table.

It should now be apparent that keeping the deployments table correctly filled out with no gaps or errors is of vital importance for the integrity of the whole telemetry database. Having the database set up this way means we never have to manually assign locations to animals, which would for sure lead to errors, but it also means that any analysis downstream hinges on keeping the deployments table up-to-date with new captures, tag retrievals, and deaths. This is just one example of how databases can save us time and ensure data integrity with minimal routine effort, but only if the data is curated with care in the first place.