Demo: Data Science Education with WebAssembly

Linear Regression in R and Python

Overview

This presentation shows how WebAssembly (WASM) can support data science education by running code, rendering visualizations, and hosting exercises in real time, directly within the slide deck.

We do this by exploring the concept of linear regression using both R and Python code snippets.

Introduction

Linear regression is a fundamental statistical technique used to model the relationship between a dependent variable and one or more independent variables.

This presentation will cover:

Basic Concepts
Implementation in R and Python
Model Evaluation
Assumptions and Diagnostics

Basic Concepts

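Linear regression aims to find the best-fitting straight line through the data points, where "best-fitting" usually means the line that minimizes the sum of squared vertical distances (residuals) between the observed points and the line.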

The general form of a simple linear regression model is:

\[Y = \beta_0 + \beta_1X + \epsilon\]

Where:

\(Y\) is the dependent variable
\(X\) is the independent variable
\(\beta_0\) is the y-intercept
\(\beta_1\) is the slope
\(\epsilon\) is the error term

Generating Data

Let’s look at how to implement linear regression in R and Python, starting by simulating some data:

R
Python
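
A minimal Python sketch of the simulation step, using NumPy. The sample size, coefficients (intercept 50, slope 2), noise level, and seed are illustrative assumptions, not values taken from the deck:

```python
import numpy as np

# Assumed simulation settings (illustrative, not from the original deck):
# 100 points, true intercept 50, true slope 2, Gaussian noise with sd 20.
rng = np.random.default_rng(42)
n = 100
x = rng.uniform(0, 100, size=n)        # independent variable
epsilon = rng.normal(0, 20, size=n)    # error term
y = 50 + 2 * x + epsilon               # Y = beta_0 + beta_1 * X + epsilon

print(x[:5])
print(y[:5])
```

Fixing the seed keeps the simulated data identical across runs, which makes later results easier to compare.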

Guessing the Coefficients

Try to fit a linear regression model by hand by adjusting the coefficients below:

The regression line for the selected values of \(\beta_0\) and \(\beta_1\) is:

import {Tangle} from "@mbostock/tangle"

// Setup Tangle reactive inputs
viewof beta_0 = Inputs.input(0);
viewof beta_1 = Inputs.input(1);
beta_0_Tgl = Inputs.bind(Tangle({min: -30, max: 300, minWidth: "1em", step: 1}), viewof beta_0);
beta_1_Tgl = Inputs.bind(Tangle({min: -5, max: 5, minWidth: "1em", step: 0.25}), viewof beta_1);

// draw plot in R
regression_plot(beta_0, beta_1)
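
A guess can also be scored numerically rather than judged by eye. The sketch below is one way to do that in Python; the helper `sum_squared_residuals` is a hypothetical name, and the simulated data (intercept 50, slope 2, noise sd 20) are assumed for illustration:

```python
import numpy as np

def sum_squared_residuals(beta_0, beta_1, x, y):
    """Sum of squared vertical distances between y and the line beta_0 + beta_1 * x."""
    residuals = y - (beta_0 + beta_1 * x)
    return float(np.sum(residuals ** 2))

# Assumed simulated data (illustrative settings, not from the original deck)
rng = np.random.default_rng(42)
x = rng.uniform(0, 100, size=100)
y = 50 + 2 * x + rng.normal(0, 20, size=100)

guess_a = sum_squared_residuals(0, 1, x, y)    # the inputs' starting values
guess_b = sum_squared_residuals(50, 2, x, y)   # the coefficients used to simulate
print(guess_a, guess_b)
```

A smaller value means the guessed line sits closer to the data; least squares picks the pair of coefficients that makes this quantity as small as possible.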

Fit Linear Regression Model

Now that we have our data, let’s fit a linear regression model to it:

R
Python
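
A minimal Python sketch of the fitting step, assuming the same illustrative simulated data. It uses `np.polyfit` with degree 1 and cross-checks the result against the closed-form least-squares estimates; the deck itself may use a different library:

```python
import numpy as np

# Assumed simulated data (illustrative settings, not from the original deck)
rng = np.random.default_rng(42)
x = rng.uniform(0, 100, size=100)
y = 50 + 2 * x + rng.normal(0, 20, size=100)

# Degree-1 polynomial fit; coefficients come back highest power first
slope, intercept = np.polyfit(x, y, deg=1)

# Closed-form least-squares estimates for comparison
slope_cf = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
intercept_cf = y.mean() - slope_cf * x.mean()

print(f"polyfit:     intercept={intercept:.3f}, slope={slope:.3f}")
print(f"closed form: intercept={intercept_cf:.3f}, slope={slope_cf:.3f}")
```

Both routes should agree up to floating-point error, and the estimates should land near the coefficients used to simulate the data.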

Visualize the Results

We can visualize the data and the regression line to see how well the model fits the data using ggplot2 in R and Matplotlib in Python.

R
Python
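
A minimal Matplotlib sketch of the scatter plot with the fitted line overlaid, assuming the same illustrative simulated data and a NumPy fit. The `Agg` backend line is only there so the sketch also runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit this line in an interactive session
import matplotlib.pyplot as plt
import numpy as np

# Assumed simulated data (illustrative settings, not from the original deck)
rng = np.random.default_rng(42)
x = rng.uniform(0, 100, size=100)
y = 50 + 2 * x + rng.normal(0, 20, size=100)
slope, intercept = np.polyfit(x, y, deg=1)

fig, ax = plt.subplots()
ax.scatter(x, y, alpha=0.6, label="Simulated data")

# Draw the fitted line across the observed range of x
x_line = np.linspace(x.min(), x.max(), 100)
ax.plot(x_line, intercept + slope * x_line, color="red", label="Fitted line")

ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()
# In an interactive session, call plt.show() to display the figure.
```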

Predicting New Values

We can use our linear regression model to make predictions on new data:

R
Python
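
A minimal Python sketch of prediction with a NumPy fit, assuming the same illustrative simulated data. The new x values (20, 40, 80) are arbitrary examples:

```python
import numpy as np

# Assumed simulated data (illustrative settings, not from the original deck)
rng = np.random.default_rng(42)
x = rng.uniform(0, 100, size=100)
y = 50 + 2 * x + rng.normal(0, 20, size=100)
slope, intercept = np.polyfit(x, y, deg=1)

# Predict by plugging new x values into the fitted line
new_x = np.array([20, 40, 80])
predictions = intercept + slope * new_x

for xi, yi in zip(new_x, predictions):
    print(f"x = {xi}: predicted y = {yi:.2f}")
```

Predictions are most trustworthy inside the range of x values used to fit the model.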

Your Turn: Predict New Values!

Create a new data frame with x values 10, 30, and 60, then use the model to predict the corresponding y values.

R
Python

Model Evaluation

We can evaluate the performance of our linear regression model using various metrics:

R
Python
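
A minimal Python sketch that computes three common metrics by hand: R-squared, root mean squared error (RMSE), and mean absolute error (MAE). The choice of metrics and the simulated data are assumptions for illustration; the deck may report others:

```python
import numpy as np

# Assumed simulated data (illustrative settings, not from the original deck)
rng = np.random.default_rng(42)
x = rng.uniform(0, 100, size=100)
y = 50 + 2 * x + rng.normal(0, 20, size=100)
slope, intercept = np.polyfit(x, y, deg=1)

fitted = intercept + slope * x
residuals = y - fitted

ss_res = np.sum(residuals ** 2)          # unexplained variation
ss_tot = np.sum((y - y.mean()) ** 2)     # total variation around the mean
r_squared = 1 - ss_res / ss_tot
rmse = np.sqrt(np.mean(residuals ** 2))
mae = np.mean(np.abs(residuals))

print(f"R-squared: {r_squared:.3f}")
print(f"RMSE:      {rmse:.3f}")
print(f"MAE:       {mae:.3f}")
```

R-squared is unitless, while RMSE and MAE are in the units of y, so they are easier to interpret as a typical prediction error.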

Assumptions

Linear regression relies on several assumptions:

Linearity
Independence
Homoscedasticity
Normality of residuals

Checking Assumptions with Diagnostic Plots

Let’s look at some diagnostic plots:

R
Python
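
A minimal Python sketch of two common diagnostic plots: residuals versus fitted values (for linearity and homoscedasticity) and a normal Q-Q plot (for normality of residuals). It assumes the same illustrative simulated data and builds the Q-Q quantiles with the standard library's `statistics.NormalDist`; the deck may rely on a dedicated statistics package instead:

```python
from statistics import NormalDist

import matplotlib
matplotlib.use("Agg")  # headless backend; omit this line in an interactive session
import matplotlib.pyplot as plt
import numpy as np

# Assumed simulated data (illustrative settings, not from the original deck)
rng = np.random.default_rng(42)
n = 100
x = rng.uniform(0, 100, size=n)
y = 50 + 2 * x + rng.normal(0, 20, size=n)
slope, intercept = np.polyfit(x, y, deg=1)

fitted = intercept + slope * x
residuals = y - fitted

# Theoretical standard-normal quantiles at plotting positions (i - 0.5) / n
probs = (np.arange(1, n + 1) - 0.5) / n
theoretical = np.array([NormalDist().inv_cdf(p) for p in probs])
# Sorted residuals scaled by the residual standard error (n - 2 degrees of freedom)
standardized = np.sort(residuals / residuals.std(ddof=2))

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Residuals vs fitted: look for curvature or a funnel shape
ax1.scatter(fitted, residuals, alpha=0.6)
ax1.axhline(0, color="red", linestyle="--")
ax1.set_xlabel("Fitted values")
ax1.set_ylabel("Residuals")
ax1.set_title("Residuals vs Fitted")

# Normal Q-Q: points near the reference line suggest roughly normal residuals
ax2.scatter(theoretical, standardized, alpha=0.6)
ax2.plot(theoretical, theoretical, color="red", linestyle="--")
ax2.set_xlabel("Theoretical quantiles")
ax2.set_ylabel("Standardized residuals")
ax2.set_title("Normal Q-Q")

fig.tight_layout()
# In an interactive session, call plt.show() to display the figure.
```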

Conclusion

Linear regression is a powerful tool for modeling relationships between variables.
Both R and Python offer robust implementations and diagnostic tools.
Always check assumptions and perform diagnostics to ensure the validity of your model.
Consider more advanced techniques (e.g., multiple regression, polynomial regression) for complex relationships.