Web Scraping Tutorial | William Miller's Projects

Sat 14 April 2018
Web Scraping
William Miller
#tutorial, #web scraping, #BeautifulSoup, #Politifact.com, #Trump, #fact checking

In preparation for a project comparing the veracity of U.S. politicians, I scraped some data from www.PolitiFact.com. Since this is a fairly basic data scraping project, I thought I would take a little time to turn it into a tutorial on the subject. My goal here is to write some code that will:

Take a list of names of any politicians covered on PolitiFact.com
Retrieve the data from PolitiFact for: the date they made a statement, the text of that statement, the PolitiFact rating of that statement
Store the resulting data in a dataframe and export to a CSV file

Libraries

First, there are several libraries that I need to import for just about any data scraping project.

"Beautiful Soup" makes navigating HTML much more convenient and is nearly indispensable for this purpose.
I will be using the "get" function from the "requests" library to retrieve the HTML from specific URLs.
Almost any web scraping project is going to require searching or processing text using regular expressions, so importing the "re" library is also necessary.
Any time I am requesting a lot of data from a website, I need to space out my requests so that I don't look like I'm instigating a DDoS attack and get my IP address temporarily banned. I will import the "sleep" function from the "time" library and "randint" from "random" to let me wait for random time intervals between requests.
As with basically any data project, it is very likely, if not definite, that I will need functions from the "pandas" and "numpy" libraries. I will go ahead and import both.
While this is specific to this particular project, I know that I will be parsing some dates as a part of this, so I will go ahead and load the "parser" module from the "dateutil" library.
Another library I'm importing that's not always necessary is "sys" for the "stdout.write" function. This is due to a quirk of Jupyter notebooks which will not execute a carriage return ('\r') within a print statement. I'm just using this for the sake of convenience, in that I wish to make a progress tracking function that will not gradually fill my screen with progress statements, but will instead overwrite the last progress statement.

from bs4 import BeautifulSoup
import requests
import re

from time import sleep
from random import randint

import numpy as np
import pandas as pd

from dateutil import parser
import sys

Function to show progress

I mentioned above that I will be retrieving a lot of data from a specific website, waiting random amounts of time between requests. This means that there will be a fair amount of waiting time involved in this process as I step through lists of URLs to retrieve the data. Because of this, I'm going to set up a quick progress-tracking function. It simply takes an element and the list that contains it as input, and returns a string giving the position of that element compared to the whole. For instance, if I passed in "c" and ["a", "b", "c", "d", "e"], this function would return "3 of 5". This gives a rough idea of how long the scraping process will take and lets me know it's doing something.

def show_progress(part, whole, addstr = '', **kwargs):
    """
    Input:
    part = element of list
    whole = list
    ---------
    Function:
    Find the number of element "part" is within "whole"
    ---------
    Output:
    Return the string "[nth list element] of [list length]"
    """
    path = len(whole)
    step = whole.index(part) + 1
    progress = str(step) + " of " + str(path)
    return progress
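
As a quick check, the sketch below reproduces the "3 of 5" example from the text and then shows the carriage-return trick in a small loop. The sample list is made up for illustration, and the function body is a shortened restatement of the one above rather than a replacement for it.

```python
import sys

def show_progress(part, whole):
    """Return "[position of part] of [length of whole]" as a string."""
    return str(whole.index(part) + 1) + " of " + str(len(whole))

letters = ["a", "b", "c", "d", "e"]   # illustrative data only
example = show_progress("c", letters)
print(example)

# Overwrite the same line on each pass instead of printing a new one.
for letter in letters:
    sys.stdout.write("\r" + "step " + show_progress(letter, letters))
sys.stdout.write("\n")
```

Because list.index returns the first match, the count is only reliable when the list contains no duplicate elements.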

Recognize and exploit patterns in URLs

At this point, I had to go through a process I cannot really show here. I went to the PolitiFact website, looked at which contemporary politicians had fact-checking data there, and selected a few of them. I then investigated how to get the total number of fact checks for each person, and looked for patterns in the URLs I would need to request.

For instance, I noticed if I clicked on "personalities", then "Donald Trump", and then selected "See all" for statements by Trump, it returned 28 pages of fact checks. I noticed that clicking through these pages returned different URLs, so page 3 had a URL of "http://www.politifact.com/personalities/donald-trump/statements/?page=3&list=speaker". I ensured that I could plug in any page number to the right of "page=", and it would go to that page.

Looking at other politicians, I noticed that their fact-check pages followed the same convention regarding their names, where Donald Trump's page contained "personalities/donald-trump", Barack Obama's page contained "personalities/barack-obama". "Firstname Lastname" was consistently formatted as "firstname-lastname". I can combine this information with the above to return a complete list of fact-checks for any politician listed on PolitiFact.
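
The two patterns described above can be combined in a few lines. This is only a sketch of the idea: the names BASE_URL and build_statement_url are my own for illustration, and the URL layout is the one observed at the time of writing, so it may not hold if the site changes.

```python
# Hypothetical helper names; the URL pattern is the one described in the text.
BASE_URL = "http://www.politifact.com/personalities/"

def build_statement_url(full_name, page):
    """Build the statements-list URL for a person and a page number."""
    url_name = full_name.lower().replace(" ", "-")
    return BASE_URL + url_name + "/statements/?page=" + str(page) + "&list=speaker"

trump_page_3 = build_statement_url("Donald Trump", 3)
print(trump_page_3)
```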

Format politician names for URLs, set up URL lookup data organization

To use this info, I will make a list of the names of the politicians I want ratings from. My aim is to do this in such a way that I could add the name of any person with fact checks on PolitiFact, and it will retrieve the data on them. I will then write some code to format that list in accordance with the URL name convention I mentioned.

One thing that can be immensely useful in any data scraping project is to set up a dictionary to store information in based on what you're looking up. I will therefore create the dictionary "person_lookup_dict" with each politician's name as the key. I will then initialize a dictionary stored under each of their names that I can add the URL-formatted names to. Later, I will add additional lookup-related information to this dictionary.

While it can be somewhat trickier at times to add and retrieve information from a dictionary rather than a bunch of separate lists, this will help immensely to keep all of the information I need to retrieve the correct URLs organized. It will also help ensure that I can add names to the list of people whose fact-check data I wish to retrieve without modifying my code.

# People to retrieve fact check data for from PolitiFact.com.
person_list = ['Donald Trump', 'Barack Obama', 'Mike Pence', 'Paul Ryan',
               'Nancy Pelosi', 'Mitch McConnell', 'Charles Schumer']

# Initialize a dictionary with the names above as keys.
person_lookup_dict = dict.fromkeys(person_list, {})

# Initialize a dictionary under each of these keys containing the URL formatting of each name.
for person in person_lookup_dict:
    person_lookup = person.lower()
    person_lookup = person_lookup.replace(" ", "-")

    person_lookup_dict[person] = {'urlname':person_lookup}

# Show the result.
person_lookup_dict

{'Barack Obama': {'urlname': 'barack-obama'},
 'Charles Schumer': {'urlname': 'charles-schumer'},
 'Donald Trump': {'urlname': 'donald-trump'},
 'Mike Pence': {'urlname': 'mike-pence'},
 'Mitch McConnell': {'urlname': 'mitch-mcconnell'},
 'Nancy Pelosi': {'urlname': 'nancy-pelosi'},
 'Paul Ryan': {'urlname': 'paul-ryan'}}

Retrieve number of pages per person for URL lookup

At this point, I need to delve into the HTML on page 1 for at least a couple of the people I'm retrieving data on, with the aim of writing code to scrape the total number of pages of fact-checks each person has. This will save me from having to enter this number manually for each person, and from updating it by hand whenever I run this code again after the number of pages may have changed.

I start by pulling the HTML source from page 1 for each person. Looking at one of these webpages in my browser, I can see the text "Page 1 of ??" at the bottom of the screen, and it's a safe bet that this is encoded in the HTML and can be retrieved using BeautifulSoup. Looking at the HTML directly (it's easiest to visit the URL in a browser, right-click, and then hit "View Page Source") and searching for that text, I find that it is contained in an element with the class "step-links__current". Searching for this class, I find that there are two instances of it per page. Checking other pages, I find that this is consistently the case.

The plan therefore becomes the following:

Request the data for page 1 for each person in our list. (Waiting between each request.)
Find the string "Page 1 of ??"
Process that string to get only the integer for the maximum number of pages.
Store that value in the dictionary under each person's name.
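
The string-processing step in this plan can be tried offline before making any requests. The sketch below runs the same kind of regular expression on a made-up snippet; the sample markup is an assumption for illustration and the real element may look different.

```python
import re

# Hypothetical text of the pagination element; the real markup may differ.
sample = '<span class="step-links__current">Page 1 of 28</span>'

# Group 1 is the current page, group 2 is the total number of pages.
match = re.search(r'(\d+) of (\d+)', sample)
max_pages = int(match.group(2)) if match else None
print(max_pages)
```

Checking for a failed match before calling group() gives a None value instead of an exception if the page layout ever changes.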

for person in person_lookup_dict:
    # Request data from page 1 for each person in list.
    person_url=person_lookup_dict[person]['urlname']
    start_page = requests.get("http://www.politifact.com/personalities/" + person_url + "/statements/?page=1&list=speaker")
    start_soup = BeautifulSoup(start_page.text, 'html.parser')

    # Wait a random amount of time between 10 and 20 seconds.
    #If an error is returned, state the status code and break the loop.
    sleep(randint(10,20))
    if start_page.status_code != 200:
        print('We may have a problem here.', start_page.status_code)
        break

    # Find the string "Page 1 of ??" (contained within tags "class_=...").
    # Process down to integer value of max pages.
    num_page_str = start_soup.find(class_="step-links__current").find_next(class_="step-links__current")
    num_page_sub_str = re.search( r'(\d+) of (\d+)', str(num_page_str), re.M)
    person_lookup_dict[person]['urlpages']= int(num_page_sub_str.group(2))

    # Show progress.
    sys.stdout.write('\r'+ 'step ' + show_progress(person, list(person_lookup_dict.keys())))    

# Show results.
person_lookup_dict

step 7 of 7




{'Barack Obama': {'urlname': 'barack-obama', 'urlpages': 31},
 'Charles Schumer': {'urlname': 'charles-schumer', 'urlpages': 1},
 'Donald Trump': {'urlname': 'donald-trump', 'urlpages': 28},
 'Mike Pence': {'urlname': 'mike-pence', 'urlpages': 3},
 'Mitch McConnell': {'urlname': 'mitch-mcconnell', 'urlpages': 2},
 'Nancy Pelosi': {'urlname': 'nancy-pelosi', 'urlpages': 2},
 'Paul Ryan': {'urlname': 'paul-ryan', 'urlpages': 5}}

Generate URL list

I now have everything I need to generate a list of URLs for each person in the list. I can now iterate through each key (politician name) in the dictionary, retrieve the URL-formatted name and the number of pages for each person, and then generate a list of URLs from that info. I will store the URLs in the same person_lookup_dict that the other data is stored in, just to keep it all consistent and neat.

# For each key in person_lookup_dict, retrieve that data to build correct URLs, then build URLs.
for person in person_lookup_dict:
    person_lookup_dict[person]['urllist'] = []
    person_name_url = person_lookup_dict[person]['urlname']
    for i in range(1, person_lookup_dict[person]['urlpages'] + 1):
        url = "http://www.politifact.com/personalities/" + person_name_url\
        + "/statements/?page="+ str(i) +"&list=speaker"

        # Store URLs in person_lookup_dict.
        person_lookup_dict[person]['urllist'].append(url)

#Show results.
person_lookup_dict

{'Barack Obama': {'urllist': ['http://www.politifact.com/personalities/barack-obama/statements/?page=1&list=speaker',
   'http://www.politifact.com/personalities/barack-obama/statements/?page=2&list=speaker',
   'http://www.politifact.com/personalities/barack-obama/statements/?page=3&list=speaker',
   'http://www.politifact.com/personalities/barack-obama/statements/?page=4&list=speaker',
   'http://www.politifact.com/personalities/barack-obama/statements/?page=5&list=speaker',
   'http://www.politifact.com/personalities/barack-obama/statements/?page=6&list=speaker',
   'http://www.politifact.com/personalities/barack-obama/statements/?page=7&list=speaker',
   'http://www.politifact.com/personalities/barack-obama/statements/?page=8&list=speaker',
   'http://www.politifact.com/personalities/barack-obama/statements/?page=9&list=speaker',
   'http://www.politifact.com/personalities/barack-obama/statements/?page=10&list=speaker',
   'http://www.politifact.com/personalities/barack-obama/statements/?page=11&list=speaker',
   'http://www.politifact.com/personalities/barack-obama/statements/?page=12&list=speaker',
   'http://www.politifact.com/personalities/barack-obama/statements/?page=13&list=speaker',
   'http://www.politifact.com/personalities/barack-obama/statements/?page=14&list=speaker',
   'http://www.politifact.com/personalities/barack-obama/statements/?page=15&list=speaker',
   'http://www.politifact.com/personalities/barack-obama/statements/?page=16&list=speaker',
   'http://www.politifact.com/personalities/barack-obama/statements/?page=17&list=speaker',
   'http://www.politifact.com/personalities/barack-obama/statements/?page=18&list=speaker',
   'http://www.politifact.com/personalities/barack-obama/statements/?page=19&list=speaker',
   'http://www.politifact.com/personalities/barack-obama/statements/?page=20&list=speaker',
   'http://www.politifact.com/personalities/barack-obama/statements/?page=21&list=speaker',
   'http://www.politifact.com/personalities/barack-obama/statements/?page=22&list=speaker',
   'http://www.politifact.com/personalities/barack-obama/statements/?page=23&list=speaker',
   'http://www.politifact.com/personalities/barack-obama/statements/?page=24&list=speaker',
   'http://www.politifact.com/personalities/barack-obama/statements/?page=25&list=speaker',
   'http://www.politifact.com/personalities/barack-obama/statements/?page=26&list=speaker',
   'http://www.politifact.com/personalities/barack-obama/statements/?page=27&list=speaker',
   'http://www.politifact.com/personalities/barack-obama/statements/?page=28&list=speaker',
   'http://www.politifact.com/personalities/barack-obama/statements/?page=29&list=speaker',
   'http://www.politifact.com/personalities/barack-obama/statements/?page=30&list=speaker',
   'http://www.politifact.com/personalities/barack-obama/statements/?page=31&list=speaker'],
  'urlname': 'barack-obama',
  'urlpages': 31},
 'Charles Schumer': {'urllist': ['http://www.politifact.com/personalities/charles-schumer/statements/?page=1&list=speaker'],
  'urlname': 'charles-schumer',
  'urlpages': 1},
 'Donald Trump': {'urllist': ['http://www.politifact.com/personalities/donald-trump/statements/?page=1&list=speaker',
   'http://www.politifact.com/personalities/donald-trump/statements/?page=2&list=speaker',
   'http://www.politifact.com/personalities/donald-trump/statements/?page=3&list=speaker',
   'http://www.politifact.com/personalities/donald-trump/statements/?page=4&list=speaker',
   'http://www.politifact.com/personalities/donald-trump/statements/?page=5&list=speaker',
   'http://www.politifact.com/personalities/donald-trump/statements/?page=6&list=speaker',
   'http://www.politifact.com/personalities/donald-trump/statements/?page=7&list=speaker',
   'http://www.politifact.com/personalities/donald-trump/statements/?page=8&list=speaker',
   'http://www.politifact.com/personalities/donald-trump/statements/?page=9&list=speaker',
   'http://www.politifact.com/personalities/donald-trump/statements/?page=10&list=speaker',
   'http://www.politifact.com/personalities/donald-trump/statements/?page=11&list=speaker',
   'http://www.politifact.com/personalities/donald-trump/statements/?page=12&list=speaker',
   'http://www.politifact.com/personalities/donald-trump/statements/?page=13&list=speaker',
   'http://www.politifact.com/personalities/donald-trump/statements/?page=14&list=speaker',
   'http://www.politifact.com/personalities/donald-trump/statements/?page=15&list=speaker',
   'http://www.politifact.com/personalities/donald-trump/statements/?page=16&list=speaker',
   'http://www.politifact.com/personalities/donald-trump/statements/?page=17&list=speaker',
   'http://www.politifact.com/personalities/donald-trump/statements/?page=18&list=speaker',
   'http://www.politifact.com/personalities/donald-trump/statements/?page=19&list=speaker',
   'http://www.politifact.com/personalities/donald-trump/statements/?page=20&list=speaker',
   'http://www.politifact.com/personalities/donald-trump/statements/?page=21&list=speaker',
   'http://www.politifact.com/personalities/donald-trump/statements/?page=22&list=speaker',
   'http://www.politifact.com/personalities/donald-trump/statements/?page=23&list=speaker',
   'http://www.politifact.com/personalities/donald-trump/statements/?page=24&list=speaker',
   'http://www.politifact.com/personalities/donald-trump/statements/?page=25&list=speaker',
   'http://www.politifact.com/personalities/donald-trump/statements/?page=26&list=speaker',
   'http://www.politifact.com/personalities/donald-trump/statements/?page=27&list=speaker',
   'http://www.politifact.com/personalities/donald-trump/statements/?page=28&list=speaker'],
  'urlname': 'donald-trump',
  'urlpages': 28},
 'Mike Pence': {'urllist': ['http://www.politifact.com/personalities/mike-pence/statements/?page=1&list=speaker',
   'http://www.politifact.com/personalities/mike-pence/statements/?page=2&list=speaker',
   'http://www.politifact.com/personalities/mike-pence/statements/?page=3&list=speaker'],
  'urlname': 'mike-pence',
  'urlpages': 3},
 'Mitch McConnell': {'urllist': ['http://www.politifact.com/personalities/mitch-mcconnell/statements/?page=1&list=speaker',
   'http://www.politifact.com/personalities/mitch-mcconnell/statements/?page=2&list=speaker'],
  'urlname': 'mitch-mcconnell',
  'urlpages': 2},
 'Nancy Pelosi': {'urllist': ['http://www.politifact.com/personalities/nancy-pelosi/statements/?page=1&list=speaker',
   'http://www.politifact.com/personalities/nancy-pelosi/statements/?page=2&list=speaker'],
  'urlname': 'nancy-pelosi',
  'urlpages': 2},
 'Paul Ryan': {'urllist': ['http://www.politifact.com/personalities/paul-ryan/statements/?page=1&list=speaker',
   'http://www.politifact.com/personalities/paul-ryan/statements/?page=2&list=speaker',
   'http://www.politifact.com/personalities/paul-ryan/statements/?page=3&list=speaker',
   'http://www.politifact.com/personalities/paul-ryan/statements/?page=4&list=speaker',
   'http://www.politifact.com/personalities/paul-ryan/statements/?page=5&list=speaker'],
  'urlname': 'paul-ryan',
  'urlpages': 5}}

Retrieve and parse data

Here is the real substance of what I'm doing. For the sake of organization, I created a function that parses the data I want to retrieve from the HTML I requested using the generated URLs. This part of the process involves more HTML investigation like that done for finding the page number data above, except here I'm looking for tags that allow me to locate the date a statement was made, the text of that statement, and the truth rating assigned to it.

I found that each element with the class "statement__source" containing the politician's name (the version in the list prior to URL formatting) sits inside a "statement" element that holds all the data I was looking for. Below, I put the matching HTML elements into a list with one element per statement. I then pass that list to a function that steps through it, scrapes the data I want, and appends only that data to a dictionary (truth_data).
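
One step in the function that follows is easy to misread: the truth rating is pulled out of the "meter" element by collecting every double-quoted string in its HTML and taking the third one. The sketch below shows that idea on a made-up snippet. The markup and attribute order here are assumptions for illustration, not a copy of the real page.

```python
import re

# Hypothetical markup standing in for str(statement_meter[0]); the real
# element may carry different attributes or a different ordering.
meter_html = ('<div class="meter"><a href="/example/">'
              '<img alt="Mostly False" src="/example.png"/></a></div>')

# Each match is a (with_quotes, without_quotes) tuple, in source order.
quoted = re.findall(r'(\"(.+?)\")', meter_html)
rating = quoted[2][1]
print(rating)
```

Since the position depends on how many quoted attributes come before the rating, the index would need rechecking if the page markup changes.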

# See code below before parsing function code.
def truth_extractor(person, fact_checks, truth_data):
    """
    Input:
    person = Name of person making the statement in question.

    fact_checks = list of HTML elements containing the fact-check data to be scraped

    truth_data = dictionary of data scraped so far to be appended and returned.
    ---------
    Function:
    Step through list of HTML elements containing desired data, locate data, parse into desired format,
    append to dictionary.
    ---------
    Output:
    truth_data = dictionary with scraped data appended
    """
    # Iterate over items stored in fact_checks list.
    for check in fact_checks:
        # From this item, locate the enclosing element with class_='statement'.
        statement = check.find_parent(class_='statement')

        # Locate the data for statement date, truth rating (under meter), and text using associated classes.
        statement_date = statement.find_all("span", class_="article__meta")
        statement_meter = statement.find_all(class_="meter")
        statement_text = statement.find_all(class_="statement__text")

        #Perform first parsing of each string retrieved above. Text needs no parsing.
        parse_date_1 = statement_date[0].text
        parse_meter_1 = re.findall( r'(\"(.+?)\")', str(statement_meter[0]), re.M)
        parse_text_final = statement_text[0].text

        #Perform further parsing of date, final parsing of truth rating string.
        parse_date_2 = parse_date_1.replace("on ", "")
        parse_meter_final = str(parse_meter_1[2][0].replace('"', ''))

        #Perform final parsing of date
        parse_date_final = parser.parse(parse_date_2).date()

        #Append scraped data to dictionary initialized below.
        truth_data['Person'].append(person)
        truth_data['Date'].append(parse_date_final)
        truth_data['Veracity'].append(parse_meter_final)
        truth_data['Text'].append(parse_text_final)

    return truth_data

# Set up a dictionary to contain scraped data.
truth_data = {'Person':[], 'Date':[], 'Veracity':[], 'Text':[]}

# create a list of politician names from the lookup dictionary.
person_list = list(person_lookup_dict.keys())

# for each item in the list, return the HTML element containing the data I wish to scrape.
for person in person_list:
    # Retrieve the list of URLs generated earlier of each person in the list
    url_list = person_lookup_dict[person]['urllist']

    # Iterate through the list of URLs.
    for url in url_list:
        # Retrieve the webpage for each URL.
        page = requests.get(url)

        # Wait a random amount of time between 10 and 20 seconds.
        # If an error is returned, state the status code and break the loop.
        sleep(randint(10,20))
        if page.status_code != 200:
            print('We may have a problem here.', page.status_code)
            break

        # Parse the HTML using BeautifulSoup.
        soup = BeautifulSoup(page.text, 'html.parser')

        # In the PolitiFact.com HTML, Nancy Pelosi's name is formatted with two spaces
        # between "Nancy" and "Pelosi". Because of this, I had to create a special case
        # when searching for statements made by her.
        if person == 'Nancy Pelosi':
            fact_checks = soup.find_all(class_="statement__source", string='Nancy  Pelosi')
        else:
            fact_checks = soup.find_all(class_="statement__source", string=person)

        # Pass that list to the function "truth_extractor".
        truth_extractor(person, fact_checks, truth_data)   

        # Show progress.
        sys.stdout.write('\r'+ 'step ' + show_progress(person,person_list) + ', substep: '\
                         + show_progress(url,url_list) + '         ')

step 7 of 7, substep: 1 of 1
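
The special case for Nancy Pelosi in the loop above could probably be generalized by ignoring extra whitespace when comparing names. The sketch below shows one way to do that comparison; normalize_spaces and name_matches are hypothetical helper names, and I have not run this against the live site.

```python
def normalize_spaces(text):
    """Collapse any run of whitespace into a single space and trim the ends."""
    return " ".join(text.split())

def name_matches(tag_text, person):
    """True when the tag text equals the person's name, ignoring extra whitespace."""
    return tag_text is not None and normalize_spaces(tag_text) == person

print(name_matches("Nancy  Pelosi", "Nancy Pelosi"))
print(name_matches("Donald Trump", "Nancy Pelosi"))
```

Beautiful Soup's find_all accepts a function as its string filter, so passing something like a lambda that calls name_matches might replace the if/else branch, assuming the only difference in the markup is whitespace.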

# Convert dictionary of scraped data to dataframe.
truth_df = pd.DataFrame.from_dict(truth_data, orient="columns")

# Show final product.
truth_df

	Date	Person	Text	Veracity
0	2018-04-10	Donald Trump	\nEPA administrator Scott Pruitt's short-term ...	Mostly False
1	2018-04-10	Donald Trump	\nSays Scott Pruitt’s security spending was "s...	Mostly False
2	2018-04-09	Donald Trump	\n"When a car is sent to the United States fro...	Mostly True
3	2018-04-09	Donald Trump	\n"This will be the last time — April — that y...	Mostly False
4	2018-04-06	Donald Trump	\n"In many places, like California, the same p...	Pants on Fire!
5	2018-04-04	Donald Trump	\n"We’ve started building the wall."\r\n\n	Mostly False
6	2018-04-02	Donald Trump	\nSays caravans of people are coming to cross ...	Half-True
7	2018-04-02	Donald Trump	\nMexico has "very strong border laws -- ours ...	Mostly False
8	2018-04-02	Donald Trump	\n"Only fools, or worse, are saying that our m...	False
9	2018-03-28	Donald Trump	\n"Last year we lost $500 billion on trade wit...	Mostly False
10	2018-03-22	Donald Trump	\nSays Conor Lamb "ran on a campaign that said...	False
11	2018-03-21	Donald Trump	\nRobert Mueller’s investigative team has "13 ...	Half-True
12	2018-03-16	Donald Trump	\nSays Democratic obstruction is the reason wh...	Half-True
13	2018-03-15	Donald Trump	\nIn Japan, "they take a bowling ball from 20 ...	False
14	2018-03-15	Donald Trump	\n"We do have a Trade Deficit with Canada, as ...	Mostly False
15	2018-03-14	Donald Trump	\nSays China and Singapore impose the death pe...	True
16	2018-03-13	Donald Trump	\n"The state of California is begging us to bu...	Pants on Fire!
17	2018-03-13	Donald Trump	\nSays the U.S. steel and aluminum industry is...	Mostly True
18	2018-03-12	Donald Trump	\nThe last private rocket launch "cost $80 mil...	Half-True
19	2018-03-09	Donald Trump	\nAmerican aluminum and steel "are vital to ou...	Half-True
20	2018-03-09	Donald Trump	\n"When I was campaigning, I was talking about...	False
21	2018-03-07	Donald Trump	\n"Democrats are nowhere to be found on DACA."...	False
22	2018-03-06	Donald Trump	\nThe 2018 Academy Awards show was the "lowest...	True
23	2018-03-01	Donald Trump	\n"You take Pulse nightclub. If you had one pe...	False
24	2018-02-20	Donald Trump	\n"I have been much tougher on Russia than Oba...	Mostly False
25	2018-02-19	Donald Trump	\n"I never said Russia did not meddle in the e...	Pants on Fire!
26	2018-02-08	Donald Trump	\n"The Democrats are pushing for Universal Hea...	Mostly False
27	2018-02-07	Donald Trump	\nMany gang members have taken advantage of "g...	Half-True
28	2018-02-06	Donald Trump	\nAt the State of the Union address, Democrats...	Pants on Fire!
29	2018-02-02	Donald Trump	\n"Instead of two for one, we have cut 22 burd...	Mostly False
...	...	...	...	...
1359	2013-05-28	Mitch McConnell	\n\r\n\tSays Health and Human Services Secreta...	Mostly True
1360	2010-06-14	Mitch McConnell	\n"A major part" of the climate change bill sp...	False
1361	2010-04-20	Mitch McConnell	\nNew financial regulation "actually guarantee...	False
1362	2010-02-19	Mitch McConnell	\nThe stimulus includes "$219,000 to study t...	Half-True
1363	2010-02-19	Mitch McConnell	\n"$100,000 in stimulus funds (were) used for...	Mostly True
1364	2010-02-01	Mitch McConnell	\nOn a bipartisan task force on ways to improv...	Full Flop
1365	2009-12-02	Mitch McConnell	\nThe Senate health care bill does not contain...	True
1366	2009-06-01	Mitch McConnell	\n"The Department of Justice, under the Obama ...	Half-True
1367	2009-05-19	Mitch McConnell	\nA public option for health care would end pr...	Mostly False
1368	2009-03-03	Mitch McConnell	\n"In just one month, the Democrats have spent...	False
1369	2009-02-03	Mitch McConnell	\n To give the proposed economic stimulus plan...	True
1370	2009-01-05	Mitch McConnell	\nIf Obama's economic plan creates 600,000 new...	Mostly True
1371	2017-03-27	Charles Schumer	\n"In fact, if you add up the net wealth of hi...	Mostly True
1372	2017-10-08	Charles Schumer	\nTrump’s tax plan is "completely focused on t...	False
1373	2017-10-06	Charles Schumer	\n"The Republicans are proposing to pay for th...	Half-True
1374	2017-07-25	Charles Schumer	\n"When the price for oil goes up on the marke...	False
1375	2017-05-18	Charles Schumer	\n"President Obama became the first president ...	True
1376	2017-03-27	Charles Schumer	\n"In fact, if you add up the net wealth of hi...	Mostly True
1377	2017-01-27	Charles Schumer	\nSays Rex "Tillerson won't divest from Exxon....	Pants on Fire!
1378	2017-01-10	Charles Schumer	\nSays Donald Trump campaigned on not cutting ...	Mostly True
1379	2016-06-15	Charles Schumer	\nLast year, "244 suspected terrorists walked ...	Mostly True
1380	2015-05-18	Charles Schumer	\n"It is simply a fact that insufficient fundi...	Half-True
1381	2015-03-08	Charles Schumer	\n\r\n"The State Department asked all secretar...	Mostly False
1382	2014-12-04	Charles Schumer	\nIn 2010, uninsured voters made up "about 5 p...	True
1383	2014-05-08	Charles Schumer	\nIf you work 40 hours a week at the proposed ...	Half-True
1384	2013-10-08	Charles Schumer	\nBecause of the 2011 debt ceiling fight, "the...	Mostly True
1385	2010-08-04	Charles Schumer	\n\r\n\t"Eight of the nine justices in the Sup...	Mostly True
1386	2010-04-13	Charles Schumer	\n"No one questioned that she (Judge Sotomayor...	False
1387	2010-01-22	Charles Schumer	\n"With a stroke of a pen, the (U.S. Supreme C...	Mostly False
1388	2009-03-12	Charles Schumer	\n\r\n\t"No Bridge to Nowhere could occur."\n	False

1389 rows × 4 columns

# Save dataframe to CSV file.
truth_df.to_csv('politic_truth.csv', index=False)