Curious Nacho

Predicting genres of 45,000 Project Gutenberg books using NLP - BoW Approach

Project Gutenberg is a website that offers more than 58,000 free eBooks whose U.S. copyright has expired. It is an appealing dataset for Natural Language Processing (NLP): a huge body of text with fairly reliable labels such as genre, author, and publication year. Here, I’ll process approximately 45,000 English books from Project Gutenberg to find patterns between words and book genres using a Bag-of-Words (BoW) approach.

Jupyter Notebook driven blog post

Jupyter Notebook is a great tool for data science, and with effective use of nbconvert it is also great for publishing blog posts as static pages, for example with Jekyll.

How are people riding Divvy bikes in Chicago on a Monday? (Part 2)

In Part 1, we looked at some basic stats for Divvy stations and how the bike counts changed over time on Monday, October 8th, 2018.

How are people riding Divvy bikes in Chicago on a Monday? (Part 1)

Divvy is Chicago’s bike share program, which works the same way as NYC’s Citi Bike and Washington DC’s Capital Bikeshare. For a fee, you undock a bike from a station, ride it for up to 30 minutes, and dock it back at any station.

How I got my Gmail inbox count down by 50% using Python

My Gmail inbox count was close to 30,000. Not a pretty number at all, so I decided to do something about it.