Chapter 4 Random Forest

4.1 Model 1

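The third model is a random forest. A random forest trains many decision trees, each on a bootstrap sample of the training data (rows drawn with replacement) and considering only a random subset of the features at each split. It then combines them: for classification, the forest predicts the class that receives the most votes across its trees.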
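To make the bagging-and-voting idea concrete, here is a minimal sketch in plain Python. It is not the code used for this chapter, and it is simplified on purpose: each "tree" is a one-split decision stump rather than a full tree. The function names (`fit_forest`, `predict_forest`, and so on) and the toy data are hypothetical.

```python
import random
from collections import Counter


def majority(labels):
    """Most common label (ties go to the label counted first)."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(rows, labels, feature_ids):
    """Fit a depth-1 tree: the single threshold split with the fewest errors."""
    fallback = majority(labels)
    best = None
    for f in feature_ids:
        for threshold in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            # Each side predicts its majority label (overall majority if empty).
            left_label = majority(left) if left else fallback
            right_label = majority(right) if right else fallback
            errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    return best


def predict_stump(stump, row):
    _, f, threshold, left_label, right_label = stump
    return left_label if row[f] <= threshold else right_label


def fit_forest(rows, labels, n_trees=25, max_features=1, seed=0):
    """Bagged stumps, each restricted to a random subset of the features."""
    rng = random.Random(seed)
    n_rows, n_features = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        # Bootstrap sample: n_rows draws with replacement.
        idx = [rng.randrange(n_rows) for _ in range(n_rows)]
        # A stump has only one split, so drawing one feature subset per tree
        # is equivalent to redrawing the subset at every split.
        feature_ids = rng.sample(range(n_features), max_features)
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feature_ids))
    return forest


def predict_forest(forest, row):
    """Majority vote over the individual trees' predictions."""
    return majority([predict_stump(stump, row) for stump in forest])


# Toy data: feature 0 is informative, feature 1 is constant.
rows = [(x, 0) for x in range(12)]
labels = [x > 5 for x in range(12)]
forest = fit_forest(rows, labels, n_trees=15, max_features=2, seed=1)
print(predict_forest(forest, (0, 0)), predict_forest(forest, (11, 0)))
```

Full implementations grow deep trees instead of stumps, but the two sources of randomness (bootstrap rows and feature subsets) and the final vote work the same way.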

##             is.public.domain
## is.highlight False True
##        False  843   648
##        True   303  1180

## [1] "The accuracy on the training dataset is: 0.835238735709482"

## [1] "The accuracy on the testing dataset is: 0.719086021505376"

The random forest model improves on the accuracy of both the decision tree and the OneR model. It has an accuracy of .84 on the training set and .72 on the test set.
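Both accuracy figures are the share of rows whose predicted label matches the true label. A minimal sketch of that calculation, with hypothetical helper names and made-up toy labels, also shows how the correct predictions sit on the diagonal of a confusion table:

```python
from collections import Counter


def accuracy(actual, predicted):
    """Share of rows whose predicted label matches the true label."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def confusion_table(actual, predicted):
    """Count each (actual, predicted) pair, like a two-way table of labels."""
    return Counter(zip(actual, predicted))


# Toy labels for illustration only.
actual = [True, False, True, True, False]
predicted = [True, True, True, False, False]
print("The accuracy is:", accuracy(actual, predicted))
print(confusion_table(actual, predicted))
```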

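This plot shows how the error rate changes as trees are added, up to all 500 trees in the forest. For a classification forest, the black line is the overall out-of-bag (OOB) error rate, while the red and green lines are the error rates for the False (not a highlight) and True (highlight) classes, respectively.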
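The error curves in this kind of plot can be computed from out-of-bag (OOB) votes: each row is scored only by trees whose bootstrap sample left it out, and the overall and per-class error rates are recomputed after each tree is added. The sketch below assumes that is what the plotted lines represent. `oob_error_curve`, `error_rate`, and the toy bags and predictions are hypothetical.

```python
from collections import Counter


def error_rate(pairs):
    """Fraction of (actual, predicted) pairs that disagree; NaN if there are none."""
    if not pairs:
        return float("nan")
    return sum(a != p for a, p in pairs) / len(pairs)


def oob_error_curve(in_bag, tree_predictions, labels):
    """OOB error, overall and per class, after each tree is added.

    in_bag[t]: set of row indices in tree t's bootstrap sample.
    tree_predictions[t][i]: tree t's predicted label for row i.
    """
    classes = sorted(set(labels))
    votes = [Counter() for _ in labels]
    curve = []
    for bag, preds in zip(in_bag, tree_predictions):
        for i, pred in enumerate(preds):
            if i not in bag:  # a row only gets votes from trees that never saw it
                votes[i][pred] += 1
        # Majority OOB vote for every row with at least one vote so far
        # (ties go to the label counted first; real implementations may differ).
        scored = [(labels[i], v.most_common(1)[0][0]) for i, v in enumerate(votes) if v]
        point = {"OOB": error_rate(scored)}
        for c in classes:
            point[c] = error_rate([(a, p) for a, p in scored if a == c])
        curve.append(point)
    return curve


# Toy example: two trees, four rows.
labels = [False, False, True, True]
in_bag = [{0, 2}, {1, 3}]
tree_predictions = [[False, True, True, False], [False, False, True, True]]
curve = oob_error_curve(in_bag, tree_predictions, labels)
print(curve)
```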

4.2 Feature Importance

How is random forest making these decisions?

Using the feature importance function, which calculates the mean decrease in accuracy when a feature is permuted (randomly shuffled), we find that department is the most important factor in whether an artwork is a highlight, followed by classification and public domain status. This differs from our previous models, which relied mainly on public domain status and classification.
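The permutation idea behind this importance measure can be sketched in a model-agnostic way: shuffle one feature column, re-score the model, and average the drop in accuracy over several shuffles. This is a simplified sketch under that assumption. R's randomForest, for example, computes its version per tree on out-of-bag rows and may scale the result, which this code does not do. The function names and the toy model are hypothetical.

```python
import random


def accuracy(actual, predicted):
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def permutation_importance(predict, rows, labels, n_repeats=10, seed=0):
    """Mean decrease in accuracy when one feature column is randomly shuffled.

    predict: any fitted model as a function row -> label (hypothetical stand-in).
    rows: list of equal-length tuples of feature values.
    """
    rng = random.Random(seed)
    baseline = accuracy(labels, [predict(row) for row in rows])
    importances = []
    for f in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[f] for row in rows]
            rng.shuffle(column)  # break the link between feature f and the label
            permuted = [row[:f] + (value,) + row[f + 1:] for row, value in zip(rows, column)]
            drops.append(baseline - accuracy(labels, [predict(row) for row in permuted]))
        importances.append(sum(drops) / n_repeats)
    return importances


# Toy model that only looks at feature 0, so feature 1 should score zero.
rows = [(x, x % 3) for x in range(12)]
labels = [x > 5 for x in range(12)]
importances = permutation_importance(lambda row: row[0] > 5, rows, labels)
print(importances)
```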

4.3 Model 2

Random Forest - Smaller, Cleaned Dataset

## [1] "The accuracy on the training dataset is: 0.915265635507734"

## [1] "The accuracy on the testing dataset is: 0.866935483870968"

4.4 LIME

The description of the first artwork is as follows: This is one of a small group of distinctive vessels from the Central Piedmont region of North Carolina. With its exuberant slip decoration and ample size, it is one of the finest of the known surviving pots from that area.

The description of the second artwork is as follows: Baule peoples and their neighbors to the West, the Guro, are famous as weavers, and are known for their fine indigo-and-white cotton fabrics. Used on the traditional narrow-band loom, heddle pulleys are functional objects used to ease the movements of the heddles while separating the warp threads and allowing the shuttle to seamlessly pass through the layers of thread. Like many other carved objects used in everyday activities among the Baule, these pulleys were often embellished for the weaver’s delight. Scholars have suggested that the prominent display of pulleys, hanging over the weaver’s loom in the public place, afforded artists their best opportunity to showcase their carving skills, in the hope to attract commissions for figures and masks. This example, distinctive for its Janus representation and conical hairdos, demonstrates the efforts put by Baule carvers into beautifying the simplest functional object.

It is interesting to compare these two artworks. The first, the American sugar pot, is a highlight, while the second, the Baule heddle pulley, is not; both are old, sculpted works in the Met collection. Neither artwork has a value for city, and both are by unknown artists. As the LIME plots show, the main feature that differs between the two works in the classification is their date. The first was made between the 1820s and 1840s, whereas the second has a much wider possible date range: from 1800 to 2000.

While this does not directly show discrimination based on region, such as American art versus African art, it does point to discrimination based on an artwork's perceived age and the width of its date range. This matters because art from societies with less precise record-keeping, or whose works the Met cannot date precisely, may be less likely to be predicted as a highlight.
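LIME explains one prediction at a time: it perturbs the artwork's features, asks the model for predictions on those perturbed copies, and fits a simple weighted linear model in the neighborhood of the original. The bars in a LIME plot are that local model's weights. The following is a simplified sketch of the idea in plain Python, not the `lime` package itself. Features are treated as kept-or-swapped indicators, the kernel width and ridge penalty are arbitrary choices, and all names (including the toy model) are hypothetical.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def lime_explain(predict_proba, instance, background, n_samples=500,
                 kernel_width=0.75, ridge=1e-3, seed=0):
    """Simplified LIME: fit a weighted linear surrogate around one instance.

    Returns (intercept, weights): one weight per feature, the local effect on the
    predicted probability of keeping that feature at the instance's value.
    """
    rng = random.Random(seed)
    d = len(instance)
    design, targets, sample_weights = [], [], []
    for s in range(n_samples):
        # z[j] = 1 keeps the instance's value; z[j] = 0 swaps in the value from a
        # random background row (which may happen to equal the original value).
        z = [1] * d if s == 0 else [rng.randint(0, 1) for _ in range(d)]
        donor = rng.choice(background)
        perturbed = tuple(instance[j] if z[j] else donor[j] for j in range(d))
        # Perturbations closer to the instance (fewer swaps) get larger weights.
        distance = (d - sum(z)) / d
        sample_weights.append(math.exp(-(distance ** 2) / kernel_width ** 2))
        design.append([1.0] + [float(v) for v in z])  # leading 1.0 is the intercept
        targets.append(predict_proba(perturbed))
    k = d + 1
    # Normal equations of weighted ridge regression (intercept not penalized).
    xtwx = [[sum(w * x[i] * x[j] for x, w in zip(design, sample_weights)) for j in range(k)]
            for i in range(k)]
    xtwy = [sum(w * x[i] * y for x, y, w in zip(design, targets, sample_weights))
            for i in range(k)]
    for i in range(1, k):
        xtwx[i][i] += ridge
    coef = solve(xtwx, xtwy)
    return coef[0], coef[1:]


def toy_model(row):
    # Hypothetical model: the "highlight" probability depends only on feature 0.
    return 0.3 + 0.5 * (row[0] == "A")


intercept, weights = lime_explain(toy_model, ("A", "p"), [("B", "q"), ("C", "r")])
print(round(intercept, 3), [round(w, 3) for w in weights])
```

Because the toy model ignores feature 1, its local weight comes out near zero, which is how a LIME plot signals that a feature did not drive a particular prediction.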
