The decimal point is 1 digit(s) to the left of the |
20 | 00000
19 | 0000000000
18 | 00
17 |
16 |
15 |
14 |
13 |
12 |
11 |
10 |
9 |
8 |
7 |
6 |
5 |
4 |
3 |
2 |
1 |
0 | 0000
2024-09-07
Week THREE
You shouldn’t ask me to look over your homework and make sure that everything is okay. That is because it amounts to pre-grading. If I say it looks okay and then you turn it in and then I take points off for something I didn’t previously notice, you will object, saying “But you said it was okay.” Therefore I don’t want to get into this kind of situation.
On the other hand, it’s okay to come to me with vague questions, like “I don’t get question two.” That opens up the possibility of explaining it better.
You should remove the instructions from the file you turn in. That means remove the first two paragraphs and the last sentence. Leave the questions in and interleave the questions with your answers.
Two people said that California did, even though California only leads in rentals.
One person formatted the document incorrectly, leading to a heading not appearing as a heading. You should always review the work before you turn it in.
I have a reason for this, which involves how I process your files. They should have been named week02exercises.qmd and week02exercises.html.
I will go over the homework on the Monday after it’s due, so I can’t accept late submissions after that. If I accept something between Friday and Monday, it will be with a substantial penalty.
Look at the solution file! There are a lot of tips there!
Loading the project data:
pacman::p_load(tidyverse)
df <- read_csv(paste0(Sys.getenv("STATS_DATA_DIR"),"/amesHousing2011.csv"))
# df <- read_csv("amesHousing2011.csv")
str(df)
spc_tbl_ [2,925 × 82] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
$ Order : num [1:2925] 1498 2738 2446 2667 2451 ...
$ PID : chr [1:2925] "0908154080" "0905427030" "0528320060" "0902400110" ...
$ MSSubClass : chr [1:2925] "020" "075" "060" "075" ...
$ MSZoning : chr [1:2925] "RL" "RL" "RL" "RM" ...
$ LotFrontage : num [1:2925] 123 60 118 90 114 87 NA 60 60 47 ...
$ LotArea : num [1:2925] 47007 19800 35760 22950 17242 ...
$ Street : chr [1:2925] "Pave" "Pave" "Pave" "Pave" ...
$ Alley : chr [1:2925] NA NA NA NA ...
$ LotShape : chr [1:2925] "IR1" "Reg" "IR1" "IR2" ...
$ LandContour : chr [1:2925] "Lvl" "Lvl" "Lvl" "Lvl" ...
$ Utilities : chr [1:2925] "AllPub" "AllPub" "AllPub" "AllPub" ...
$ LotConfig : chr [1:2925] "Inside" "Inside" "CulDSac" "Inside" ...
$ LandSlope : chr [1:2925] "Gtl" "Gtl" "Gtl" "Gtl" ...
$ Neighborhood : chr [1:2925] "Edwards" "Edwards" "NoRidge" "OldTown" ...
$ Condition1 : chr [1:2925] "Norm" "Norm" "Norm" "Artery" ...
$ Condition2 : chr [1:2925] "Norm" "Norm" "Norm" "Norm" ...
$ BldgType : chr [1:2925] "1Fam" "1Fam" "1Fam" "1Fam" ...
$ HouseStyle : chr [1:2925] "1Story" "2.5Unf" "2Story" "2.5Fin" ...
$ OverallQual : num [1:2925] 5 6 10 10 9 7 8 6 10 8 ...
$ OverallCond : num [1:2925] 7 8 5 9 5 9 9 7 5 5 ...
$ YearBuilt : num [1:2925] 1959 1935 1995 1892 1993 ...
$ YearRemod/Add: num [1:2925] 1996 1990 1996 1993 1994 ...
$ RoofStyle : chr [1:2925] "Gable" "Gable" "Hip" "Gable" ...
$ RoofMatl : chr [1:2925] "CompShg" "CompShg" "CompShg" "WdShngl" ...
$ Exterior1st : chr [1:2925] "Plywood" "BrkFace" "HdBoard" "Wd Sdng" ...
$ Exterior2nd : chr [1:2925] "Plywood" "Wd Sdng" "HdBoard" "Wd Sdng" ...
$ MasVnrType : chr [1:2925] "None" "None" "BrkFace" "None" ...
$ MasVnrArea : num [1:2925] 0 0 1378 0 738 ...
$ ExterQual : chr [1:2925] "TA" "TA" "Gd" "Gd" ...
$ ExterCond : chr [1:2925] "TA" "TA" "Gd" "Gd" ...
$ Foundation : chr [1:2925] "Slab" "BrkTil" "PConc" "BrkTil" ...
$ BsmtQual : chr [1:2925] NA "TA" "Ex" "TA" ...
$ BsmtCond : chr [1:2925] NA "TA" "TA" "TA" ...
$ BsmtExposure : chr [1:2925] NA "No" "Gd" "Mn" ...
$ BsmtFinType1 : chr [1:2925] NA "Rec" "GLQ" "Unf" ...
$ BsmtFinSF1 : num [1:2925] 0 425 1387 0 292 ...
$ BsmtFinType2 : chr [1:2925] NA "Unf" "Unf" "Unf" ...
$ BsmtFinSF2 : num [1:2925] 0 0 0 0 1393 ...
$ BsmtUnfSF : num [1:2925] 0 1411 543 1107 48 ...
$ TotalBsmtSF : num [1:2925] 0 1836 1930 1107 1733 ...
$ Heating : chr [1:2925] "GasA" "GasA" "GasA" "GasA" ...
$ HeatingQC : chr [1:2925] "TA" "Gd" "Ex" "Ex" ...
$ CentralAir : chr [1:2925] "Y" "Y" "Y" "Y" ...
$ Electrical : chr [1:2925] "SBrkr" "SBrkr" "SBrkr" "SBrkr" ...
$ 1stFlrSF : num [1:2925] 3820 1836 1831 1518 1933 ...
$ 2ndFlrSF : num [1:2925] 0 1836 1796 1518 1567 ...
$ LowQualFinSF : num [1:2925] 0 0 0 572 0 0 0 515 0 0 ...
$ GrLivArea : num [1:2925] 3820 3672 3627 3608 3500 ...
$ BsmtFullBath : num [1:2925] NA 0 1 0 1 0 0 0 0 1 ...
$ BsmtHalfBath : num [1:2925] NA 0 0 0 0 0 0 0 0 0 ...
$ FullBath : num [1:2925] 3 3 3 2 3 3 3 2 3 3 ...
$ HalfBath : num [1:2925] 1 1 1 1 1 0 1 0 1 1 ...
$ BedroomAbvGr : num [1:2925] 5 5 4 4 4 3 4 8 5 4 ...
$ KitchenAbvGr : num [1:2925] 1 1 1 1 1 1 1 2 1 1 ...
$ KitchenQual : chr [1:2925] "Ex" "Gd" "Gd" "Ex" ...
$ TotRmsAbvGrd : num [1:2925] 11 7 10 12 11 10 11 14 10 12 ...
$ Functional : chr [1:2925] "Typ" "Typ" "Typ" "Typ" ...
$ Fireplaces : num [1:2925] 2 2 1 2 1 1 2 0 1 1 ...
$ FireplaceQu : chr [1:2925] "Gd" "Gd" "TA" "TA" ...
$ GarageType : chr [1:2925] "Attchd" "Detchd" "Attchd" "Detchd" ...
$ GarageYrBlt : num [1:2925] 1959 1993 1995 1993 1993 ...
$ GarageFinish : chr [1:2925] "Unf" "Unf" "Fin" "Unf" ...
$ GarageCars : num [1:2925] 2 2 3 3 3 3 3 0 3 3 ...
$ GarageArea : num [1:2925] 624 836 807 840 959 ...
$ GarageQual : chr [1:2925] "TA" "TA" "TA" "Ex" ...
$ GarageCond : chr [1:2925] "TA" "TA" "TA" "TA" ...
$ PavedDrive : chr [1:2925] "Y" "Y" "Y" "Y" ...
$ WoodDeckSF : num [1:2925] 0 684 361 0 870 302 314 0 204 503 ...
$ OpenPorchSF : num [1:2925] 372 80 76 260 86 0 12 110 34 36 ...
$ EnclosedPorch: num [1:2925] 0 32 0 0 0 0 0 0 0 0 ...
$ 3SsnPorch : num [1:2925] 0 0 0 0 0 0 0 0 0 0 ...
$ ScreenPorch : num [1:2925] 0 0 0 410 210 0 0 0 0 210 ...
$ PoolArea : num [1:2925] 0 0 0 0 0 0 0 0 0 0 ...
$ PoolQC : chr [1:2925] NA NA NA NA ...
$ Fence : chr [1:2925] NA NA NA "GdPrv" ...
$ MiscFeature : chr [1:2925] NA NA NA NA ...
$ MiscVal : num [1:2925] 0 0 0 0 0 0 0 0 0 0 ...
$ MoSold : num [1:2925] 7 12 7 6 5 5 5 3 9 6 ...
$ YrSold : num [1:2925] 2008 2006 2006 2006 2006 ...
$ SaleType : chr [1:2925] "WD" "WD" "WD" "WD" ...
$ SaleCondition: chr [1:2925] "Normal" "Normal" "Normal" "Normal" ...
$ SalePrice : num [1:2925] 284700 415000 625000 475000 584500 ...
- attr(*, "spec")=
.. cols(
.. Order = col_double(),
.. PID = col_character(),
.. MSSubClass = col_character(),
.. MSZoning = col_character(),
.. LotFrontage = col_double(),
.. LotArea = col_double(),
.. Street = col_character(),
.. Alley = col_character(),
.. LotShape = col_character(),
.. LandContour = col_character(),
.. Utilities = col_character(),
.. LotConfig = col_character(),
.. LandSlope = col_character(),
.. Neighborhood = col_character(),
.. Condition1 = col_character(),
.. Condition2 = col_character(),
.. BldgType = col_character(),
.. HouseStyle = col_character(),
.. OverallQual = col_double(),
.. OverallCond = col_double(),
.. YearBuilt = col_double(),
.. `YearRemod/Add` = col_double(),
.. RoofStyle = col_character(),
.. RoofMatl = col_character(),
.. Exterior1st = col_character(),
.. Exterior2nd = col_character(),
.. MasVnrType = col_character(),
.. MasVnrArea = col_double(),
.. ExterQual = col_character(),
.. ExterCond = col_character(),
.. Foundation = col_character(),
.. BsmtQual = col_character(),
.. BsmtCond = col_character(),
.. BsmtExposure = col_character(),
.. BsmtFinType1 = col_character(),
.. BsmtFinSF1 = col_double(),
.. BsmtFinType2 = col_character(),
.. BsmtFinSF2 = col_double(),
.. BsmtUnfSF = col_double(),
.. TotalBsmtSF = col_double(),
.. Heating = col_character(),
.. HeatingQC = col_character(),
.. CentralAir = col_character(),
.. Electrical = col_character(),
.. `1stFlrSF` = col_double(),
.. `2ndFlrSF` = col_double(),
.. LowQualFinSF = col_double(),
.. GrLivArea = col_double(),
.. BsmtFullBath = col_double(),
.. BsmtHalfBath = col_double(),
.. FullBath = col_double(),
.. HalfBath = col_double(),
.. BedroomAbvGr = col_double(),
.. KitchenAbvGr = col_double(),
.. KitchenQual = col_character(),
.. TotRmsAbvGrd = col_double(),
.. Functional = col_character(),
.. Fireplaces = col_double(),
.. FireplaceQu = col_character(),
.. GarageType = col_character(),
.. GarageYrBlt = col_double(),
.. GarageFinish = col_character(),
.. GarageCars = col_double(),
.. GarageArea = col_double(),
.. GarageQual = col_character(),
.. GarageCond = col_character(),
.. PavedDrive = col_character(),
.. WoodDeckSF = col_double(),
.. OpenPorchSF = col_double(),
.. EnclosedPorch = col_double(),
.. `3SsnPorch` = col_double(),
.. ScreenPorch = col_double(),
.. PoolArea = col_double(),
.. PoolQC = col_character(),
.. Fence = col_character(),
.. MiscFeature = col_character(),
.. MiscVal = col_double(),
.. MoSold = col_double(),
.. YrSold = col_double(),
.. SaleType = col_character(),
.. SaleCondition = col_character(),
.. SalePrice = col_double()
.. )
- attr(*, "problems")=<externalptr>
The load() function creates a data frame: it restores the saved objects directly into your environment.
The read_csv() function returns a data frame.
With read_csv, you must read it into a variable, e.g., df <- read_csv("filename")! If you just say read_csv("filename"), it will display the data frame, but not save it.
It is a mistake to say df <- load("filename") or to say read_csv("filename") by itself.
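The difference between the two functions can be seen in a small sketch. The data frame `toy` and the temporary file paths are made up for illustration and are not part of the course files.

```r
library(readr)

# A tiny stand-in data frame (hypothetical, not the Ames data)
toy <- data.frame(id = 1:3, price = c(100, 200, 300))

# Save it twice: once as CSV, once in R's own binary format
csv_path   <- tempfile(fileext = ".csv")
rdata_path <- tempfile(fileext = ".RData")
write_csv(toy, csv_path)
save(toy, file = rdata_path)

# read_csv() RETURNS a data frame, so the result must be assigned
df_csv <- read_csv(csv_path, show_col_types = FALSE)

# load() CREATES the saved object(s) in the environment as a side effect;
# what it returns is only a character vector of the object names
rm(toy)
loaded_names <- load(rdata_path)
loaded_names    # "toy"
exists("toy")   # TRUE: the data frame is back under its original name
```

Assigning the result of load() therefore stores a name, not the data, while calling read_csv() without an assignment prints the data and then discards it.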
Some columns should simply be removed, such as Order and PID. Others are useful as factors. How to tell?
MSSubClass
 020  030  040  045  050  060  070  075  080  085  090  120  150  160  180  190
1078  139    6   18  287  571  128   23  118   48  109  192    1  129   17   61
Use it in conjunction with amesHousing2011doc.txt.
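One way to tell is to count distinct values, sketched here on a made-up data frame. The rows in `toy` are invented; only the column names are borrowed from the Ames data.

```r
# Toy stand-in for the Ames data; the values are made up for illustration
toy <- data.frame(
  Order      = 1:6,
  MSSubClass = c("020", "060", "020", "120", "060", "020"),
  GrLivArea  = c(1500, 2100, 1320, 1750, 1980, 1410)
)

# A coded column has few distinct values relative to the number of rows
table(toy$MSSubClass)
length(unique(toy$MSSubClass))           # 3 distinct codes in 6 rows

# Order takes a different value on every row, so it only identifies the row
length(unique(toy$Order)) == nrow(toy)   # TRUE

# Treat the codes as categories rather than as numbers or free text
toy$MSSubClass <- factor(toy$MSSubClass)
levels(toy$MSSubClass)                   # "020" "060" "120"
```

The counts alone do not say what the codes mean, which is where the documentation file comes in.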
The Tidyverse consists of packages, listed at https://www.tidyverse.org/packages/
The Tidyverse package website lists several sections for learning: Installation and use, Core tidyverse, Import, Wrangle, and others.
The dplyr package includes the following functions for data manipulation:
mutate() adds new columns that are functions of existing columns.
select() picks columns based on their names.
filter() picks rows based on their values.
summarise() reduces multiple values down to a single summary.
arrange() changes the ordering of the rows.
# A tibble: 2,925 × 80
MSSubClass MSZoning LotFrontage LotArea Street Alley LotShape LandContour
<chr> <chr> <dbl> <dbl> <chr> <chr> <chr> <chr>
1 020 RL 123 47007 Pave <NA> IR1 Lvl
2 075 RL 60 19800 Pave <NA> Reg Lvl
3 060 RL 118 35760 Pave <NA> IR1 Lvl
4 075 RM 90 22950 Pave <NA> IR2 Lvl
5 060 RL 114 17242 Pave <NA> IR1 Lvl
6 075 RM 87 18386 Pave <NA> Reg Lvl
7 050 RL NA 14100 Pave <NA> IR1 Lvl
8 190 RH 60 10896 Pave Pave Reg Bnk
9 060 RL 60 18062 Pave <NA> IR1 HLS
10 060 RL 47 53504 Pave <NA> IR2 HLS
# ℹ 2,915 more rows
# ℹ 72 more variables: Utilities <chr>, LotConfig <chr>, LandSlope <chr>,
# Neighborhood <chr>, Condition1 <chr>, Condition2 <chr>, BldgType <chr>,
# HouseStyle <chr>, OverallQual <dbl>, OverallCond <dbl>, YearBuilt <dbl>,
# `YearRemod/Add` <dbl>, RoofStyle <chr>, RoofMatl <chr>, Exterior1st <chr>,
# Exterior2nd <chr>, MasVnrType <chr>, MasVnrArea <dbl>, ExterQual <chr>,
# ExterCond <chr>, Foundation <chr>, BsmtQual <chr>, BsmtCond <chr>, …
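The tibble above has 80 columns rather than the original 82, which is consistent with dropping Order and PID. The five dplyr verbs listed earlier can be sketched on a toy tibble as follows. The rows are made up, and the derived column name `pricePerSqFt` is only an illustration.

```r
library(dplyr)

# Made-up rows that borrow a few Ames column names for illustration
toy <- tibble(
  Order      = 1:5,
  PID        = c("a1", "a2", "a3", "a4", "a5"),
  MSSubClass = c("020", "060", "020", "120", "060"),
  GrLivArea  = c(1500, 2100, 1320, 1750, 1980),
  SalePrice  = c(150000, 252000, 118800, 210000, 217800)
)

result <- toy |>
  select(-c(Order, PID)) |>                        # drop the identifier columns
  mutate(pricePerSqFt = SalePrice / GrLivArea) |>  # new column from existing ones
  filter(GrLivArea > 1400) |>                      # keep rows by value
  arrange(desc(pricePerSqFt))                      # reorder the rows

result

# summarise() collapses the remaining rows to a single row
result |> summarise(meanPricePerSqFt = mean(pricePerSqFt), n = n())
```

Each verb takes a data frame and returns a data frame, which is why they chain together with the pipe.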
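# classes.tsv supplies the subClassDescr column seen in the output
dfClasses <- read_tsv("classes.tsv")
# Average sale price and count per class, most expensive first;
# the outer parentheses print the result as well as assigning it
(dfPriceByClass <- df |>
  select(MSSubClass, SalePrice) |>
  group_by(MSSubClass) |>
  summarize(avPriceByClass = mean(SalePrice), n = n()) |>
  arrange(desc(avPriceByClass)) |>
  inner_join(dfClasses, by = "MSSubClass"))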
# A tibble: 16 × 4
MSSubClass avPriceByClass n subClassDescr
<chr> <dbl> <int> <chr>
1 060 237810. 571 2-STORY 1946 & NEWER
2 120 208019. 192 1-STORY PUD (Planned Unit Development) - 194…
3 075 199978. 23 2-1/2 STORY ALL AGES
4 020 187359. 1078 1-STORY 1946 & NEWER ALL STYLES
5 080 168009. 118 SPLIT OR MULTI-LEVEL
6 070 156526. 128 2-STORY 1945 & OLDER
7 085 149842. 48 SPLIT FOYER
8 150 148400 1 1-1/2 STORY PUD - ALL AGES
9 040 144917. 6 1-STORY W/FINISHED ATTIC ALL AGES
10 090 139809. 109 DUPLEX - ALL STYLES AND AGES
11 050 137433. 287 1-1/2 STORY FINISHED ALL AGES
12 160 137080. 129 2-STORY PUD - 1946 & NEWER
13 190 125870. 61 2 FAMILY CONVERSION - ALL STYLES AND AGES
14 045 111783. 18 1-1/2 STORY - UNFINISHED ALL AGES
15 180 107671. 17 PUD - MULTILEVEL - INCL SPLIT LEV/FOYER
16 030 96727. 139 1-STORY 1945 & OLDER
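The same group, summarize, and join pattern can be tried without the course files. In this sketch, `sales` and `classes` are made-up stand-ins for the Ames data and for classes.tsv, and the description strings are placeholders.

```r
library(dplyr)

# Made-up sales; only the column names mirror the Ames data
sales <- tibble(
  MSSubClass = c("020", "060", "020", "120", "060"),
  SalePrice  = c(150000, 250000, 130000, 210000, 230000)
)

# Hypothetical look-up table standing in for classes.tsv
classes <- tibble(
  MSSubClass    = c("020", "060", "120"),
  subClassDescr = c("placeholder description A",
                    "placeholder description B",
                    "placeholder description C")
)

priceByClass <- sales |>
  group_by(MSSubClass) |>                                # one group per class code
  summarize(avPriceByClass = mean(SalePrice), n = n()) |>
  arrange(desc(avPriceByClass)) |>                       # most expensive class first
  inner_join(classes, by = "MSSubClass")                 # attach the descriptions

priceByClass
```

An inner join keeps only the codes found in both tables, so a class missing from the look-up table would silently drop out of the result. A left join would keep it with a missing description instead.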
By default, Quarto files end in .qmd, although other extensions will work. When you feed a .qmd file to RStudio, it assumes that it’s a Quarto file and opens it accordingly.
A Quarto file just contains plain text, no binary information. It can be read by any text editor, although what the editor does with it depends on how Quarto-aware it is.
For example, an R chunk (prefaced by a blank line followed by three backticks and r in curly braces) is assumed to be R code. It is syntax-highlighted as R and, in some editors such as RStudio, can be independently executed. In RStudio this is done by clicking a green triangle to the right of the chunk.
Everything not in a code chunk is assumed to be Pandoc-flavored Markdown.
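To see the layout of a chunk, the sketch below writes a tiny Quarto file from R and prints it back as plain text. The file name `example.qmd` and its contents are made up, and the fence is built with strrep() only so that the three backticks do not appear literally inside this example.

```r
# Three backticks, built programmatically
fence <- strrep("`", 3)

# Lines of a minimal, hypothetical Quarto document
qmd_lines <- c(
  "## Summary",
  "",
  "Everything outside a chunk is Markdown.",
  "",
  paste0(fence, "{r}"),   # chunk opens: three backticks, then r in curly braces
  "x <- c(2, 4, 6)",
  "mean(x)",
  fence,                  # chunk closes: three backticks alone
  "",
  "Markdown resumes after the closing backticks."
)

qmd_path <- file.path(tempdir(), "example.qmd")
writeLines(qmd_lines, qmd_path)

# Print the file exactly as a plain text editor would show it
cat(readLines(qmd_path), sep = "\n")
```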
Since Markdown was invented around 2004, many flavors of it have developed. The one we’re using is the one interpreted by the program pandoc, documented at https://pandoc.org/.
Markdown was originally devised as a shorthand for HTML by John Gruber, who was tired of having to write out lengthy HTML constructs for his blog. He wanted something simpler but also readable on its own. By the way, the original Markdown description is still on the web after all these years at https://daringfireball.net/projects/markdown/, although there are many more descriptive sites. What happened in the years since was that (A) people wanted their own shorthand sets, and (B) it turned out to be really easy to write a converter from Markdown to HTML.
You have experienced some of Markdown’s features, such as a blank line followed by two hashtags followed by a space for a level two heading. You might have surrounded a word by asterisks to italicize it, or double asterisks to bold-face it. You might have used straight quotation marks and found them converted to typographical quotation marks (a different opening and closing mark).
You can write inline code in Markdown chunks! Use a single backtick, followed by an r in curly braces, then the code, then a single backtick. Hence, you can let the data speak instead of laboriously running the file and extracting results from R chunks and manually inserting them into a Markdown chunk.
URLs can be included in Markdown chunks by saying [displayname](actualURL). I usually make the displayname be the actual URL, but you can put anything you want in the displayname brackets.
Pictures can be included by saying ![caption](filename) on a line by itself.
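The Markdown features described above can be collected in one small file. This sketch writes the lines from R so the raw text is visible. The file name, the caption, `figure1.png`, and the reference to a data frame called `df` are all made up, and the inline expression is written out only as text here; it is not evaluated until the file is rendered.

```r
# Markdown lines for a hypothetical .qmd file
md_lines <- c(
  "",
  "## Results",                                   # level-two heading after a blank line
  "",
  "This is *italic* and this is **bold**.",       # single vs. double asterisks
  "",
  "The data frame has `{r} nrow(df)` rows.",      # inline R code in single backticks
  "",
  "[https://pandoc.org/](https://pandoc.org/)",   # [displayname](actualURL)
  "",
  "![A made-up caption](figure1.png)"             # picture on a line by itself
)

md_path <- file.path(tempdir(), "markdown-features.qmd")
writeLines(md_lines, md_path)

# Show the raw Markdown as a plain text editor would
cat(readLines(md_path), sep = "\n")
```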
END
This slideshow was produced using quarto
Fonts are Roboto Condensed Bold, JetBrains Mono Nerd Font, and STIX2