9:00 am: Yo!
9:20 am: Unicode. Is it really a nightmare? Nah. Well maybe a little bit.
9:40 am: Pickling: Save your python objects, as easy as 1-2-3.
9:50 am: Short break
10:00 am: More Web Scraping: Drive your browser with Selenium, fill forms, click buttons, go crazy
10:30 am: Pandas Challenges: Learn pandas by getting your hands dirty with it (one-on-ones continue from here on as well)
12:00 pm: Hunger Games: Eating Frenzy
1:30 pm: Continue with Pandas challenges and scraping data
Tentative: Linear Regression packages in python
(depending on pandas challenge progression, we'll have this later today or tomorrow)
5:00 pm: The curtain falls
Python HOW TO tutorial for dealing with unicode
Unicode in Wikipedia
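If you want something to poke at while reading those, here is a minimal sketch of the encode/decode round trip. It assumes Python 3, where `str` is Unicode text and `bytes` is the encoded form; Python 2 names these types differently, so treat it as an illustration rather than a drop-in for the class notebooks.

```python
# A str holds Unicode code points; bytes hold one particular encoding of them.
text = "na\u00efve caf\u00e9"  # "naive cafe" with accents, written with escapes

utf8_bytes = text.encode("utf-8")        # str -> bytes
round_trip = utf8_bytes.decode("utf-8")  # bytes -> str

# Accented characters take two bytes each in UTF-8, so the lengths differ.
print(len(text), len(utf8_bytes))
print(round_trip == text)

# Decoding with a codec that cannot represent the bytes raises an error.
decode_error = None
try:
    utf8_bytes.decode("ascii")
except UnicodeDecodeError as err:
    decode_error = err
    print("ascii cannot decode:", err.reason)
```

Most of the "nightmare" comes from mixing up the two types, or from decoding with a different codec than the one used to encode.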
w2d2_Web_Scraping_2_Selenium_Webdriver.ipynb (9.5 KB) Here is the IPython notebook for web scraping with Selenium WebDriver. Open it and follow along (you won't be typing along)
To start with selenium, you need to install it:
pip install selenium
You will also need to put the chromedriver file in the same directory as the ipython notebook. Download the zip file for your platform (chromedriver_mac32.zip for most of you), unzip it, and move the chromedriver file into that directory.
You can use this XPATH selector tutorial when you need to construct an xpath selector.
You can also look here.
(source): There are various strategies to locate elements in a page. You can use the most appropriate one for your case. Selenium provides the following methods to locate elements in a page:
<button class="wmd-button" id="wmd-quote-post" title="Quote whole post"></button>
button = driver.find_element('id', 'wmd-quote-post')  # Selenium 4 removed find_element_by_id; this form works in old and new versions
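To practice writing selectors without launching a browser, here is a rough sketch that runs an XPath query against that same button using only the standard library. Two caveats: `xml.etree.ElementTree` supports only a limited subset of XPath (Selenium's support is fuller), and the wrapping `<div>` and the second button are made up for illustration.

```python
import xml.etree.ElementTree as ET

# The first button is the one from the notes; the div wrapper and the
# second button are hypothetical, added so there is something to filter out.
html = (
    '<div>'
    '<button class="wmd-button" id="wmd-quote-post" title="Quote whole post"></button>'
    '<button class="wmd-button" id="other-button" title="Something else"></button>'
    '</div>'
)

root = ET.fromstring(html)

# Select a single element by its id attribute.
button = root.find(".//button[@id='wmd-quote-post']")
print(button.get("title"))

# Select every element that shares a class attribute value.
all_buttons = root.findall(".//button[@class='wmd-button']")
print(len(all_buttons))
```

The same XPath strings should carry over to Selenium's XPath locator, though real pages are rarely well-formed XML, so this trick only works on small hand-written snippets.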
w2d2_Pickling_Python_Objects.ipynb (2.5 KB)
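If you just want the 1-2-3 version before opening the notebook, the sketch below dumps an object to a file and loads it back with the standard `pickle` module. The dictionary contents and the file name are placeholders, not anything from the notebook.

```python
import os
import pickle
import tempfile

# Any ordinary Python object works; this dict is placeholder data.
movie = {"title": "Example Movie", "runtime_min": 120, "genres": ["drama", "comedy"]}

# Write into a temporary directory so nothing in your project is overwritten.
path = os.path.join(tempfile.mkdtemp(), "movie.pkl")

# 1. Dump the object to a file opened in binary write mode.
with open(path, "wb") as f:
    pickle.dump(movie, f)

# 2. Load it back from a file opened in binary read mode.
with open(path, "rb") as f:
    restored = pickle.load(f)

# 3. The restored object is an equal copy, not the same object.
print(restored == movie)
print(restored is movie)
```

Only unpickle files you trust: loading a pickle can execute arbitrary code.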
Pandas: just getting started? Read this guide: [10 minutes to pandas](http://pandas.pydata.org/pandas-docs/stable/10min.html)
You don't have movie data to work on yet?
Here, you can use some data on the top grossing 100 movies from 2013 (scraped from box office mojo):
2013_movies.csv (7.6 KB)
Challenge 1: plot domestic total gross over time
Challenge 2: plot runtime vs domestic total gross
Challenge 3: group your data by Rating and find the average runtime and domestic total gross at each level of Rating
Challenge 4: make one figure with N subplots (N = the number of MPAA ratings in your data), and in each subplot plot release date vs domestic total gross for that rating
Challenge 5: which director in your dataset has the highest gross per movie?
Challenge 6: bin your dataset into years (if applicable) and make a bar graph with error bars of gross each year
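If the grouping in Challenge 3 is new to you, the group-then-average pattern looks roughly like the sketch below. The column names (`Rating`, `Runtime`, `DomesticTotalGross`) are assumptions about how your data is labeled, and the rows are placeholder values, not real box office numbers.

```python
import pandas as pd

# Placeholder rows with assumed column names; in class you would load
# your own data instead, e.g. with pd.read_csv on your movies file.
movies = pd.DataFrame({
    "Rating": ["PG", "PG", "R", "R"],
    "Runtime": [90, 110, 100, 120],
    "DomesticTotalGross": [10.0, 30.0, 20.0, 40.0],
})

# Split the rows by Rating, then average the two numeric columns per group.
averages = movies.groupby("Rating")[["Runtime", "DomesticTotalGross"]].mean()
print(averages)
```

The result is a new DataFrame indexed by rating, which you can plot or sort like any other.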