Trees
Tree-based methods are well-known modeling techniques used for both regression and classification. The general idea is to segment the feature space into separate regions (subspaces) and make a simple prediction within each one. The rules for segmenting the data into their respective subspaces can be summarized in a tree, which is why these methods are sometimes called decision tree methods. Tree methods come in several variants, including ensemble approaches such as bagging, boosting, and random forests.
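To make the idea of segmenting the feature space concrete, here is a minimal sketch of a one-split regression tree (often called a "stump") in plain Python. It is an illustration only, not how any particular library does it: the toy data and the function names `best_split` and `predict` are made up for this example, and real implementations handle many features, repeated splits, and stopping rules.

```python
# Minimal sketch: a one-split regression tree ("stump") on a single feature.
# The data and all function names here are hypothetical, for illustration only.

def mean(values):
    return sum(values) / len(values)

def sse(values):
    """Sum of squared errors of the values around their own mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def best_split(xs, ys):
    """Return (threshold, left_mean, right_mean) that minimizes total SSE."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate identical x values
        # Candidate threshold: midpoint between two neighboring x values.
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            best = (cost, threshold, mean(left), mean(right))
    if best is None:
        raise ValueError("need at least two distinct x values to split")
    return best[1:]

def predict(x, threshold, left_mean, right_mean):
    """Predict with the mean response of the region that x falls into."""
    return left_mean if x <= threshold else right_mean

# Toy data with two obvious groups.
xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
ys = [5.0, 6.0, 7.0, 20.0, 21.0, 22.0]

threshold, left_mean, right_mean = best_split(xs, ys)
print(threshold, left_mean, right_mean)
print(predict(2.5, threshold, left_mean, right_mean))
```

The single rule "is x at or below the threshold?" splits the feature space into two regions, and each region predicts the average response of its training points. A full tree repeats this search inside each region, and methods such as bagging, boosting, and random forests combine many such trees.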
A Brief History
Tree-based methods were first published in the early 1960s and have since grown into a remarkable diversity of techniques and approaches. That growth was aided by free software and cheaper hardware, which made computations that are challenging to do by hand relatively easy to carry out. Tree methods are also sometimes used to enhance traditional models such as least squares and logistic regression. If you are interested in a technical overview of the various approaches and a more in-depth history, see Loh (2014).
Code Examples
All of the code examples are written in Python, unless otherwise noted.
Containers
These code examples are Jupyter notebooks that run in a container, which comes with all the data, libraries, and code you’ll need to run them. Click here to learn why you should be using containers, along with how to do so.
Boosting
Explore gradient boosting, a tree-based method, using XGBoost to analyze hotel customer data.
Quickstart: Download Docker, then run the commands below in a terminal.
# Pull the container (only needs to be run once)
docker pull ghcr.io/thedatamine/starter-guides:boosting
# Run the container, publishing port 8888 to the host
docker run -p 8888:8888 -it ghcr.io/thedatamine/starter-guides:boosting
Need help implementing any of this code? Feel free to reach out to datamine-help@purdue.edu and we can help!
Resources
All resources are chosen by Data Mine staff to be of decent quality, and most if not all content is free.
Books
An Introduction to Statistical Learning (also known as the "machine learning bible"; see Chapter 8 for tree-based methods)