Linear Regression in R and Python
The goal of this presentation is to showcase the power of WebAssembly (WASM) in data science education by allowing real-time code execution, visualization, and exercises directly within the slide deck.
We do this by exploring the concept of linear regression using both R and Python code snippets.
Linear regression is a fundamental statistical technique used to model the relationship between a dependent variable and one or more independent variables.
This presentation will cover:
Linear regression aims to find the best-fitting straight line through the data points.
The general form of a simple linear regression model is:
\[Y = \beta_0 + \beta_1X + \epsilon\]
Where \(Y\) is the dependent variable, \(X\) is the independent variable, \(\beta_0\) is the intercept, \(\beta_1\) is the slope, and \(\epsilon\) is the error term.
Let’s look at how to implement linear regression in R and Python, starting by simulating some data.
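As a sketch of the simulation step in Python (the sample size, true coefficients, and noise level below are illustrative values, not taken from the slides):

```python
import numpy as np

# Hypothetical simulation parameters: true intercept, slope, and noise sd
rng = np.random.default_rng(42)
n = 100
beta_0, beta_1, sigma = 30.0, 2.0, 10.0

# Simulate x uniformly, then y from the linear model plus Gaussian noise
x = rng.uniform(0, 50, size=n)
y = beta_0 + beta_1 * x + rng.normal(0, sigma, size=n)
```

The same idea in R would use `runif()` and `rnorm()` with `set.seed()` for reproducibility.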
Try to fit a linear regression model by hand through manipulating coefficients below:
The linear regression with \(\beta_0 =\) and \(\beta_1 =\) is:
import {Tangle} from "@mbostock/tangle"
// Setup Tangle reactive inputs
viewof beta_0 = Inputs.input(0);
viewof beta_1 = Inputs.input(1);
beta_0_Tgl = Inputs.bind(Tangle({min: -30, max: 300, minWidth: "1em", step: 1}), viewof beta_0);
beta_1_Tgl = Inputs.bind(Tangle({min: -5, max: 5, minWidth: "1em", step: 0.25}), viewof beta_1);
// draw plot in R
regression_plot(beta_0, beta_1)
Now that we have our data, let’s fit a linear regression model to it:
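A minimal fitting sketch in Python, assuming simulated data like the above (in R this would be `lm(y ~ x)`); `np.polyfit` with `deg=1` performs the least-squares line fit:

```python
import numpy as np

# Simulated data (illustrative parameters, as before)
rng = np.random.default_rng(42)
x = rng.uniform(0, 50, size=100)
y = 30.0 + 2.0 * x + rng.normal(0, 10.0, size=100)

# np.polyfit returns coefficients highest degree first: [slope, intercept]
slope, intercept = np.polyfit(x, y, deg=1)
print(f"intercept ~ {intercept:.2f}, slope ~ {slope:.2f}")
```

With enough data, the fitted coefficients should land close to the true values used in the simulation.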
We can visualize the data and the regression line to see how well the model fits the data using ggplot2 in R and Matplotlib in Python.
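A matplotlib sketch of this visualization (the data and fitted coefficients are the illustrative simulated ones, not from the slides):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for scripted use
import matplotlib.pyplot as plt
import numpy as np

# Illustrative simulated data and fit
rng = np.random.default_rng(42)
x = rng.uniform(0, 50, size=100)
y = 30.0 + 2.0 * x + rng.normal(0, 10.0, size=100)
slope, intercept = np.polyfit(x, y, deg=1)

# Scatter the data and overlay the fitted regression line
fig, ax = plt.subplots()
ax.scatter(x, y, alpha=0.6, label="data")
xs = np.linspace(x.min(), x.max(), 100)
ax.plot(xs, intercept + slope * xs, color="red", label="fitted line")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()
fig.savefig("regression.png")
```

The ggplot2 equivalent would layer `geom_point()` with `geom_smooth(method = "lm")`.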
We can use our linear regression model to make predictions on new data:
Create a new data frame with x values 10, 30, and 60, then use the model to predict the corresponding y values.
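The exercise above can be sketched in Python as follows (using illustrative simulated data and a line fit standing in for the slides' fitted model; in R this would be `predict(model, newdata)`):

```python
import numpy as np

# Illustrative simulated data and fit
rng = np.random.default_rng(42)
x = rng.uniform(0, 50, size=100)
y = 30.0 + 2.0 * x + rng.normal(0, 10.0, size=100)
slope, intercept = np.polyfit(x, y, deg=1)

# Predict y for the new x values from the exercise
new_x = np.array([10, 30, 60])
predictions = intercept + slope * new_x
print(dict(zip(new_x.tolist(), predictions.round(1).tolist())))
```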
We can evaluate the performance of our linear regression model using various metrics:
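Two common metrics, RMSE and \(R^2\), can be computed by hand as a sketch (again on the illustrative simulated data):

```python
import numpy as np

# Illustrative simulated data, fit, and fitted values
rng = np.random.default_rng(42)
x = rng.uniform(0, 50, size=100)
y = 30.0 + 2.0 * x + rng.normal(0, 10.0, size=100)
slope, intercept = np.polyfit(x, y, deg=1)
y_hat = intercept + slope * x

# RMSE: typical size of a residual, in the units of y
residuals = y - y_hat
rmse = np.sqrt(np.mean(residuals ** 2))

# R-squared: fraction of the variance in y explained by the model
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot
print(f"RMSE = {rmse:.2f}, R^2 = {r_squared:.3f}")
```

In R, `summary(model)` reports \(R^2\) directly.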
Linear regression relies on several assumptions:
Let’s look at some diagnostic plots:
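Two of the standard diagnostics, residuals-vs-fitted and a normal Q-Q plot, can be sketched in Python as follows (assuming matplotlib and SciPy are available; in R, `plot(model)` produces these directly):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for scripted use
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

# Illustrative simulated data, fit, and residuals
rng = np.random.default_rng(42)
x = rng.uniform(0, 50, size=100)
y = 30.0 + 2.0 * x + rng.normal(0, 10.0, size=100)
slope, intercept = np.polyfit(x, y, deg=1)
fitted = intercept + slope * x
residuals = y - fitted

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Residuals vs fitted: patterns suggest non-linearity or heteroscedasticity
ax1.scatter(fitted, residuals, alpha=0.6)
ax1.axhline(0, color="red", linestyle="--")
ax1.set_xlabel("Fitted values")
ax1.set_ylabel("Residuals")
ax1.set_title("Residuals vs Fitted")

# Normal Q-Q: points near the line suggest normally distributed errors
stats.probplot(residuals, plot=ax2)
ax2.set_title("Normal Q-Q")

fig.savefig("diagnostics.png")
```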