This notebook is an example of using R to tackle a prediction problem with machine learning.
The aim is to fit a model that predicts a vehicle’s fuel efficiency, measured in miles per gallon (MPG), from the known values of the vehicle’s other characteristics.
This particular example was most immediately inspired by a case study in the excellent free online course Supervised Machine Learning Case Studies in R, developed by Julia Silge.
While I consider myself a proficient Python programmer in general, I have used R much more in the recent past. R is my preferred language for data-centric tasks, but I would like to improve my fluency with the tools in the Python data analysis ecosystem.
In particular, I intend to make a concerted effort to learn the scikit-learn library for machine learning in Python. In this notebook, I work through simple examples of using scikit-learn to do linear regression.
Earlier this week, I made a calendar visualization of some data that I came across while participating in TidyTuesday in December.
The data set was the set of tickets issued for parking violations in the city of Philadelphia, Pennsylvania, in 2017. Originally, I had made a few other visualizations, and a friend of mine mentioned that they would be interested in seeing visualizations of the revenue the city generated from these parking tickets.
The sf package
Using read_sf to read in a GeoJSON file
A note on using shapefiles
Using ggplot2 with geom_sf
First map with geom_sf
Adding labels with geom_sf_text
Changing the theme of a map
Getting some other data
Joining an sf object to another dataframe
Making a choropleth map
Changing the scale
Using the viridis scale
Manually setting a continuous scale
Using a divergent scale
Using a discrete scale
Plotting points on the map
Interactive maps with plotly

This post is about using the ggplot2 package to make simple maps in R.
R packages used
Choosing a portfolio
Getting the data (from Yahoo Finance)
Breaking down the code above
Getting monthly returns
Filtering the data
Calculating returns
Adding portfolio weights
Ok, now for some useful calculations
Standard Deviation of returns
Table of Downside Risks
Table of Drawdowns
SemiDeviation
Sharpe ratio
Returns summary
Visualizing portfolio performance
Combining all returns into one object
Plotting portfolio returns
Visualizing each component compared to the portfolio
Using highcharter
Computing rolling standard deviations
Reading data and data manipulation
Plot 1: Histogram of ticket issue times (hour)
Adding new variables
Making the plot
Annotating the plot
Plot 2: Day of week details
Finding total tickets per day
Making a faceted line chart
Alternative way to make a very similar chart
Making an interactive chart for the web with plotly
Plot 3: Mapping parking violations by zip code
Calculating tickets per day and grouping zip codes
Reading in shapefile (.