class: center, middle, inverse, title-slide

# R & Tidyverse
ggplot2
## An introduction to data visualization
### Paul Efren Santos Andrade
@PaulEfrenSantos
### 19 August, 2019

---
class: fullscreen, inverse, top, center, text-white

---
layout: false
class: inverse center middle text-white

![](https://user-images.githubusercontent.com/6571451/51793018-06c26380-216f-11e9-997a-7a08066ae4e2.gif)

---
layout: false
class: inverse center middle text-white

![](images/hex-ggplot2.png)

---
layout: true

# The packages

---

**Simple**: install the whole [tidyverse](http://tidyverse.org)

```r
install.packages('tidyverse')
```

**Intermediate**: install only `ggplot2`

```r
install.packages('ggplot2')
```

**Advanced**: install the development version from GitHub

```r
# requires the devtools package
devtools::install_github('tidyverse/ggplot2')
```

---
layout: false

## Tidyverse

```r
library(tidyverse)
```

```
## -- Attaching packages -------------------------------------------- tidyverse 1.2.1 --
```

```
## v ggplot2 3.2.1     v purrr   0.3.2
## v tibble  2.1.3     v dplyr   0.8.3
## v tidyr   0.8.3     v stringr 1.4.0
## v readr   1.3.1     v forcats 0.4.0
```

```
## -- Conflicts ----------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
```

---
layout: false
class: center

# Grammar of Graphics

![](images/ggraphic.jpg)

---
layout: true

# *ggplot2*

---

.left-column[
![](images/hadley.jpg)

__Hadley Wickham__
]

.right-column[
- Creating graphical representations of our data is a key step in communicating information and findings to others.

- ggplot2 is a system for creating graphics, based on [The Grammar of Graphics](https://www.amazon.com/Grammar-Graphics-Statistics-Computing/dp/0387245448/ref=as_li_ss_tl?ie=UTF8&qid=1477928463&sr=8-1&keywords=the+grammar+of+graphics&linkCode=sl1&tag=ggplot2-20&linkId=f0130e557161b83fbe97ba0e9175c431).

- You provide the data, tell ggplot2 how to map variables to aesthetics and which graphical elements to draw, and it takes care of the details.
]

---
layout: false
class: center

# ggplot2

![](images/ggplot2_book.jpg)

---

## Some reasons

- A .hl[structured] approach to visualization
    1. Organized (tidy) data
    2. Map the data to visual elements
    3. Scales, guides, axes, labels, theme
- An .hl[understandable] sequence for building plots
- Easy to .hl[replicate]
- A .hl[consistent] sequence

---
layout: false

# What are we dealing with?

<br>
`ggplot2` is a big package: .hl[philosophy] + .hl[functions]
<br>...very well organized

--

<br><br>
Many impressive visualization examples
<br>...are not in these slides

--

<br><br>
We may cover a lot of ideas very quickly
<br>...but you will know **where** to look for help

--

.img-right[![](images/ggplot2_exploratory.png)]

.footnote[<https://github.com/allisonhorst/stats-illustrations>]

---
layout: false
class: inverse center middle text-white

.font200[<br>Grammar of Graphics]

---

# A grammar of graphics?

.left-code[
"A clear idea of what you want to show guides how the plot is built."

.footnote[<http://vita.had.co.nz/papers/layered-grammar.pdf>]
]

.right-plot[
<img src="index_files/figure-html/guess-data-from-plot-0-1.png" width="100%" />
]

---
layout: true

# What data is behind the plot?

.left-code[
### MPG

- Manufacturer
- Car type (class)
- City MPG
- Highway MPG
]

---

.right-plot[
<img src="index_files/figure-html/guess-data-from-plot-2-1.png" width="100%" />
]

---

.right-plot[
<img src="index_files/figure-html/guess-data-from-plot-3-1.png" width="100%" />
]

---

.right-plot[
<img src="index_files/figure-html/guess-data-from-plot-1-1.png" width="100%" />
]

---

.right-plot[
<img src="index_files/figure-html/guess-data-from-plot-4-1.png" width="100%" />
]

---

.right-plot[
<img src="index_files/figure-html/guess-data-from-plot-5-1.png" width="100%" />
]

---

.right-plot[
<img src="index_files/figure-html/guess-data-from-plot-6-1.png" width="100%" />
]

---

.right-plot[

| manufacturer | class   | cty | hwy | model                  |
|:-------------|:--------|----:|----:|:-----------------------|
| audi         | compact |  18 |  27 | a4                     |
| audi         | compact |  15 |  25 | a4 quattro             |
| ford         | suv     |  12 |  18 | expedition 2wd         |
| ford         | suv     |  13 |  19 | explorer 4wd           |
| toyota       | suv     |  16 |  20 | 4runner 4wd            |
| toyota       | compact |  21 |  31 | camry solara           |
| toyota       | compact |  28 |  37 | corolla                |
| toyota       | suv     |  13 |  18 | land cruiser wagon 4wd |

]

---
layout: false

# Components

.font120[
- **Data**: the data to be plotted
]

--

.font120[
- **.hlb[Geom]etric objects**: the shapes that will be drawn
]

--

.font120[
- **.hlb[Aes]thetic mappings**: which variables are represented, and by which visual property
]

--

.font120[
- **.hlb[Stat]istics**: transformations applied to the data before it is drawn
]

--

.font120[
- **.hlb[Coord]inates**: how the elements are laid out (the coordinate system)
]

--

.font120[
- **.hlb[Scale]s**: the ranges over which each **aes()** varies
]

--

.font120[
- **.hlb[Facet]s**: produce multiple panels
]

---
layout: true

# Grammar of Graphics

.left-column[
### Data

```r
ggplot(data)
```
]

---

.right-column[
#### **Tidy** data

1. Each variable forms a .hl[column]
2. Each observation forms a .hl[row]
3. Each type of observational unit forms a table
]

---

.right-column[

| country       |       1997 |       2002 |       2007 |
|:--------------|-----------:|-----------:|-----------:|
| Canada        |   30.30584 |   31.90227 |   33.39014 |
| China         | 1230.07500 | 1280.40000 | 1318.68310 |
| United States |  272.91176 |  287.67553 |  301.13995 |

]

--

.right-column[
```r
# reshape wide -> long: one row per country and year
tidy_pop <- gather(messy_pop, 'year', 'pop', -country)
```

| country       | year |      pop |
|:--------------|:-----|---------:|
| Canada        | 1997 |   30.306 |
| China         | 1997 | 1230.075 |
| United States | 1997 |  272.912 |
| Canada        | 2002 |   31.902 |
| China         | 2002 | 1280.400 |
| United States | 2002 |  287.676 |
| Canada        | 2007 |   33.390 |
| China         | 2007 | 1318.683 |
| United States | 2007 |  301.140 |

]

---
layout: true

# Grammar of Graphics

.left-column[
### Data
### Aesthetics

```r
+ aes()
```
]

---

.right-column[
Information to be visualized

- year
- pop
- country
]

---

.right-column[
Information to be visualized

- year → **x**
- pop → **y**
- country → *shape*, *color*, etc.
]

---

.right-column[
Information to be visualized

```r
aes(
  x = year,
  y = pop,
  color = country
)
```
]

---
layout: true

# Grammar of Graphics

.left-column[
### Data
### Aesthetics
### Geoms

```r
+ geom_*()
```
]

---

.right-column[
Geometric shapes that will be drawn in the plot

<img src="index_files/figure-html/geom_demo-1.png" width="650px" />
]

---

.right-column[
[Most commonly used **geom_**s](https://eric.netlify.com/2017/08/10/most-popular-ggplot2-geoms/)

.font70.center[
| Type | Function |
|:----:|:--------:|
| Point | `geom_point()` |
| Line | `geom_line()` |
| Bar (counts) | `geom_bar()` |
| Column (values) | `geom_col()` |
| Histogram | `geom_histogram()` |
| Smoother (regression) | `geom_smooth()` |
| Boxplot | `geom_boxplot()` |
| Text | `geom_text()` |
| Vertical line | `geom_vline()` |
| Horizontal line | `geom_hline()` |
]
]

---

.right-column[
Visit <http://ggplot2.tidyverse.org/reference/> for more options

.font70[
```
##  [1] "geom_abline"     "geom_area"       "geom_bar"        "geom_bin2d"
##  [5] "geom_blank"      "geom_boxplot"    "geom_col"        "geom_contour"
##  [9] "geom_count"      "geom_crossbar"   "geom_curve"      "geom_density"
## [13] "geom_density_2d" "geom_density2d"  "geom_dotplot"    "geom_errorbar"
## [17] "geom_errorbarh"  "geom_freqpoly"   "geom_hex"        "geom_histogram"
## [21] "geom_hline"      "geom_jitter"     "geom_label"      "geom_line"
## [25] "geom_linerange"  "geom_map"        "geom_path"       "geom_point"
## [29] "geom_pointrange" "geom_polygon"    "geom_qq"         "geom_qq_line"
## [33] "geom_quantile"   "geom_raster"     "geom_rect"       "geom_ribbon"
## [37] "geom_rug"        "geom_segment"    "geom_sf"         "geom_sf_label"
## [41] "geom_sf_text"    "geom_smooth"     "geom_spoke"      "geom_step"
## [45] "geom_text"       "geom_tile"       "geom_violin"     "geom_vline"
```
]
]

--

.right-column[
<img src="images/geom.gif" width="200px" style="float: right; margin-right: 100px; margin-top: -25px;">
Or just type `geom_` in the RStudio console and let autocompletion list them
]

---
layout: true

# Our first plot!

---

.left-code[
```r
ggplot(tidy_pop)
```
]

.right-plot[
<img src="index_files/figure-html/first-plot1a-out-1.png" width="100%" />
]

---

.left-code[
```r
ggplot(tidy_pop) +
* aes(x = year,
*     y = pop)
```
]

.right-plot[
<img src="index_files/figure-html/first-plot1b-out-1.png" width="100%" />
]

---

.left-code[
```r
ggplot(tidy_pop) +
  aes(x = year,
      y = pop) +
* geom_point()
```
]

.right-plot[
<img src="index_files/figure-html/first-plot1c-out-1.png" width="100%" />
]

---

.left-code[
```r
ggplot(tidy_pop) +
  aes(x = year,
      y = pop,
*     color = country) +
  geom_point()
```
]

.right-plot[
<img src="index_files/figure-html/first-plot1-out-1.png" width="100%" />
]

---

.left-code[
```r
ggplot(tidy_pop) +
  aes(x = year,
      y = pop,
      color = country) +
  geom_point() +
* geom_line()
```

.font80[
```
geom_path: Each group consists of only one observation.
Do you need to adjust the group aesthetic?
```
]
]

.right-plot[
<img src="index_files/figure-html/first-plot2-fake-out-1.png" width="100%" />
]

---

.left-code[
```r
ggplot(tidy_pop) +
  aes(x = year,
      y = pop,
      color = country) +
  geom_point() +
  geom_line(
*   aes(group = country))
```
]

.right-plot[
<img src="index_files/figure-html/first-plot2-out-1.png" width="100%" />
]

---

.left-code[
```r
g <- ggplot(tidy_pop) +
  aes(x = year,
      y = pop,
      color = country) +
  geom_point() +
  geom_line(
    aes(group = country))

g
```
]

.right-plot[
<img src="index_files/figure-html/first-plot3-out-1.png" width="100%" />
]

---
layout: true

# Grammar of Graphics

.left-column[
### data
### aes()
### geom_*()

```r
+ geom_*()
```
]

---

.right-column[
```r
geom_*(mapping, data, stat, position)
```

- `data`: each geom can have its own data
    - all layers share a single coordinate system
- `mapping`: the geom's own aesthetics, given with aes()
    - a geom also inherits the global aes() from ggplot() (unless `inherit.aes = FALSE`)
- Some geoms have specific aesthetics
    - `geom_point` requires `x` and `y`; optional: `shape`, `color`, `size`, etc.
    - `geom_ribbon` requires `x`, `ymin` and `ymax`; optional: `fill` (see `?geom_ribbon`)
]

---

.right-column[
```r
geom_*(mapping, data, position)
```

- `position`: adjusts the position of objects, e.g. so they do not overlap
    - `'dodge'`, `'stack'`, `'jitter'`
]

---
layout: true

# Grammar of Graphics

.left-column[
### Data
### Aesthetics
### Geoms
### Facet

```r
+ facet_wrap()
+ facet_grid()
```
]

---

.right-column[
```r
g + facet_wrap(~ country)
```

<img src="index_files/figure-html/geom_facet-1.png" width="90%" />
]

---

.right-column[
```r
g + facet_grid(continent ~ country)
```

<img src="index_files/figure-html/geom_grid-1.png" width="90%" />
]

---
layout: true

# Grammar of Graphics

.left-column[
### Data
### Aesthetics
### Geoms
### Facet
### Labels

```r
+ labs()
```
]

---

.right-column[
```r
g + labs(x = "Year", y = "Population")
```

<img src="index_files/figure-html/labs-ex-1.png" width="90%" />
]

---
layout: true

# Grammar of Graphics

.left-column[
### Data
### Aesthetics
### Geoms
### Facet
### Labels
### Coords

```r
+ coord_*()
```
]

---

.right-column[
```r
g + coord_flip()
```

<img src="index_files/figure-html/coord-ex-1.png" width="90%" />
]

---

.right-column[
```r
g + coord_polar()
```

<img src="index_files/figure-html/coord-ex2-1.png" width="90%" />
]

---
layout: true

# Grammar of Graphics

.left-column[
### Data
### Aesthetics
### Geoms
### Facet
### Labels
### Coords
### Scales

```r
+ scale_*_*()
```
]

---

.right-column[
`scale` + `_` + `<aes>` + `_` + `<type>` + `()`

Which aesthetic do you want to modify? → `<aes>`
<br>
What kind of scale does it need?
→ `<type>`

- Discrete x axis<br>`scale_x_discrete()`
- Put an axis on a logarithmic scale<br>`scale_y_log10()`
- Default fill colors for a discrete variable<br>`scale_fill_discrete()`
- Set colors manually<br>`scale_color_manual()`
]

---

.right-column[
```r
g + scale_color_manual(values = c("peru", "pink", "plum"))
```

<img src="index_files/figure-html/scale_ex1-1.png" width="90%" />
]

---

.right-column[
```r
g + scale_y_log10()
```

<img src="index_files/figure-html/scale_ex2-1.png" width="90%" />
]

---

.right-column[
```r
g + scale_x_discrete(labels = c("MCMXCVII", "MMII", "MMVII"))
```

<img src="index_files/figure-html/scale_ex4-1.png" width="90%" />
]

---
layout: true

# Grammar of Graphics

.left-column[
### Data
### Aesthetics
### Geoms
### Facet
### Labels
### Coords
### Scales
### Theme

```r
+ theme()
```
]

---

.right-column[
You can change the overall look of the plot.

Some themes built into `ggplot2`:

- `g + theme_bw()`
- `g + theme_dark()`
- `g + theme_gray()`
- `g + theme_light()`
- `g + theme_minimal()`
]

---

.right-column[
There are many parameters,<br>grouped by the part of the plot they affect:

- Global options: `line`, `rect`, `text`, `title`
- `axis`: x-, y-, title, ticks, lines
- `legend`: the legend
- `panel`: the area where the data is drawn
- `plot`: the whole plot
- `strip`: the labels of the facet subpanels
]

---

.right-column[
To modify these elements, use:

- `element_blank()` removes the element
- `element_line()` for lines
- `element_rect()` for borders and backgrounds
- `element_text()` for text
]

---

.right-column[
```r
g + theme_bw()
```

<img src="index_files/figure-html/unnamed-chunk-1-1.png" width="90%" />
]

---

.right-column[
.font80[
```r
g + theme_minimal() +
  theme(text = element_text(family = "Palatino"))
```

<img src="index_files/figure-html/unnamed-chunk-2-1.png" width="90%" />
]
]

---

.right-column[
You can set a theme as the default with `theme_set()`

```r
my_theme <- theme_bw() +
  theme(
    text = element_text(family = "Palatino", size = 12),
    panel.border = element_rect(colour = 'grey80'),
    panel.grid.minor = element_blank()
  )

theme_set(my_theme)
```

All plots will now use this theme!
]

---

.right-column[
```r
g
```

<img src="index_files/figure-html/unnamed-chunk-3-1.png" width="90%" />
]

---

.right-column[
```r
g + theme(legend.position = 'bottom')
```

<img src="index_files/figure-html/unnamed-chunk-4-1.png" width="90%" />
]

---
layout: false

# Saving plots

Use the **ggsave()** function:

```r
ggsave(
  filename = "my_plot.png",
  plot = g,      # plot to save; defaults to the last plot displayed
  width = 10,    # width and height are in inches by default
  height = 8,
  dpi = 100,
  device = "png"
)
```

---
layout: false

# install.packages("ggalt")

.left-code[
```r
library(ggalt)

df <- tibble(trt = LETTERS[1:10],
             value = seq(100, 10, by = -10))

ggplot(df, aes(trt, value)) +
  geom_lollipop()
```
]

.right-plot[
<img src="index_files/figure-html/unnamed-chunk-6-1.png" width="100%" />
]

---
layout: false

# install.packages("qqplotr")

.left-code[
```r
library(qqplotr)

set.seed(0)
smp <- data.frame(norm = rnorm(100))

# Normal Q-Q plot of Normal data
gg <- ggplot(data = smp, mapping = aes(sample = norm)) +
  stat_qq_band() +
  stat_qq_line() +
  stat_qq_point()

gg + labs(x = "Theoretical Quantiles", y = "Sample Quantiles")
```
]

.right-plot[
<img src="index_files/figure-html/unnamed-chunk-8-1.png" width="100%" />
]

---

### Where to find help

- **ggplot2 docs:** <http://ggplot2.tidyverse.org/>
- **R4DS - Data visualization:** <http://r4ds.had.co.nz/data-visualisation.html>
- **Hadley Wickham's ggplot2 book:** <https://www.amazon.com/dp/0387981403/>

---
class: inverse, center, middle

# Thank you!
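
---
layout: false

# Appendix: a self-contained sketch

The slides use `messy_pop` and `tidy_pop` without showing where they come from. This sketch rebuilds `messy_pop` by hand from the values in the *Tidy data* table (apparently population in millions), reshapes it with `gather()` and builds the same `g` as in "Our first plot!". The hand-built `messy_pop` is a hypothetical reconstruction for illustration, not necessarily the author's original data preparation.

.font80[
```r
library(tidyverse)

# Hypothetical reconstruction of `messy_pop`: one column per year
messy_pop <- tibble(
  country = c("Canada", "China", "United States"),
  `1997`  = c(30.30584, 1230.07500, 272.91176),
  `2002`  = c(31.90227, 1280.40000, 287.67553),
  `2007`  = c(33.39014, 1318.68310, 301.13995)
)

# Wide -> long: one row per country and year (`year` is character)
tidy_pop <- gather(messy_pop, 'year', 'pop', -country)

# Same layers as the "Our first plot!" slides
g <- ggplot(tidy_pop) +
  aes(x = year, y = pop, color = country) +
  geom_point() +
  geom_line(aes(group = country))

g
```
]

This reconstruction has no `continent` column, so `facet_grid(continent ~ country)` will not run on it. With tidyr 1.0.0 or later, `pivot_longer(messy_pop, -country, names_to = "year", values_to = "pop")` performs the same reshape; `gather()` still works but is superseded.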