William Miller's Projects

Searching for Patterns that Matter

Web Scraping Tutorial


In preparation for a project comparing the veracity of U.S. politicians, I scraped some data from www.PolitiFact.com. Since this is a fairly basic data scraping project, I thought I would take some time to turn it into a tutorial on the subject. My goal here is to write some code that will:

  1. Take a list of names of politicians who appear on PolitiFact.com
  2. Retrieve the data from PolitiFact for each statement they made: the date of the statement, its text, and its PolitiFact rating
  3. Store the resulting data in a dataframe and export to a CSV file

      Libraries

      First, there are several libraries that I need to import for just about any data scraping project.

      • "Beautiful Soup" makes navigating HTML much more convenient and is nearly indispensable for this purpose.
      • I will be using the "get" function from the "requests" library to retrieve the HTML from specific URLs.
      • Almost any web scraping project requires searching or processing text with regular expressions, so importing the "re" library is also necessary.
      • Any time I am requesting a lot of data from a website, I need to space out my requests so that I don't look like I'm instigating a DDoS attack and get my IP address temporarily banned. I will import the "sleep" function from the "time" library and "randint" from "random" to let me wait for random time intervals between requests.
      • As with basically any data project, I will almost certainly need functions from the "pandas" and "numpy" libraries. I will go ahead and import both.
      • While this is specific to this particular project, I know that I will be parsing some dates, so I will go ahead and load the "parser" module from the "dateutil" library.
      • Another library I'm importing that's not always necessary is "sys" for the "stdout.write" function. This is due to a quirk of Jupyter notebooks which will not execute a carriage return ('\r') within a print statement. I'm just using this for the sake of convenience, in that I wish to make a progress tracking function that will not gradually fill my screen with progress statements, but will instead overwrite the last progress statement.

      from bs4 import BeautifulSoup
      import requests
      import re
      
      from time import sleep
      from random import randint
      
      import numpy as np
      import pandas as pd
      
      from dateutil import parser
      import sys
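
As a rough illustration of the last two ideas in the list (random waits between requests and a progress line that overwrites itself), here is a minimal sketch. The helper names `polite_pause` and `write_progress` are hypothetical and are not used elsewhere in this tutorial, and the delays are kept short so the example finishes quickly.

```python
import sys
from random import randint
from time import sleep


def polite_pause(min_seconds, max_seconds):
    """Sleep for a random whole number of seconds in [min_seconds, max_seconds]."""
    delay = randint(min_seconds, max_seconds)
    sleep(delay)
    return delay


def write_progress(step, total):
    """Overwrite the current console line with 'step N of M'."""
    message = "step " + str(step) + " of " + str(total)
    # '\r' moves the cursor back to the start of the line, so the next
    # write replaces this message instead of starting a new line.
    sys.stdout.write("\r" + message)
    sys.stdout.flush()
    return message


if __name__ == "__main__":
    for step in range(1, 4):
        polite_pause(0, 1)  # a real scraper would wait longer, e.g. 10 to 20 seconds
        write_progress(step, 3)
    sys.stdout.write("\n")
```

Using a random interval rather than a fixed one is a design choice: it spreads the requests out while making the traffic look less mechanical.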
      

      Function to show progress

      I mentioned above that I will be retrieving a lot of data from a specific website, waiting random amounts of time between requests. This means that there will be a fair amount of waiting involved in this process as I step through lists of URLs to retrieve the data. Because of this, I'm going to set up a quick progress-tracking function. It simply takes an element and the list containing it as input, and returns a string giving the position of that element relative to the length of the list. For instance, if I passed in "c" and ["a", "b", "c", "d", "e"], this function would return "3 of 5". This gives a rough idea of how long the scraping process will take and lets me know it's doing something.

      def show_progress(part, whole, addstr = '', **kwargs):
          """
          Input:
          part = element of list
          whole = list
          ---------
          Function:
          Find the number of element "part" is within "whole"
          ---------
          Output:
          Return the string "[nth list element] of [list length]"
          """
          path = len(whole)
          step = whole.index(part) + 1
          progress = str(step) + " of " + str(path)
          return progress
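
To make the behavior concrete, here is a small usage sketch with a trimmed copy of the function (the unused extra arguments are left out). It also shows one limitation worth knowing: because the lookup relies on `list.index`, a list with repeated elements always reports the position of the first match. The `enumerate` loop at the end is an alternative I am suggesting for that case, not something the rest of this tutorial uses.

```python
def show_progress(part, whole):
    """Return '[position of part] of [length of whole]' (same logic as above)."""
    return str(whole.index(part) + 1) + " of " + str(len(whole))


letters = ["a", "b", "c", "d", "e"]
print(show_progress("c", letters))  # 3 of 5

# list.index returns the first match, so repeated elements always report
# the position of their first occurrence.
repeated = ["x", "y", "x"]
print([show_progress(item, repeated) for item in repeated])

# enumerate tracks the position directly and avoids the repeated lookup.
for step, item in enumerate(repeated, start=1):
    print(str(step) + " of " + str(len(repeated)))
```

The lists in this tutorial (names and URLs) contain no duplicates, so the simple version is sufficient here.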
      

      Recognize and exploit patterns in URLs

      At this point, I had to go through a process I cannot really show here. I went to the PolitiFact website, looked at which contemporary politicians had fact-checking data there, and selected a few of them. I then investigated how to get the total number of fact checks for each person, and looked for patterns in the URLs I would need to request.

      For instance, I noticed if I clicked on "personalities", then "Donald Trump", and then selected "See all" for statements by Trump, it returned 28 pages of fact checks. I noticed that clicking through these pages returned different URLs, so page 3 had a URL of "http://www.politifact.com/personalities/donald-trump/statements/?page=3&list=speaker". I ensured that I could plug in any page number to the right of "page=", and it would go to that page.

      Looking at other politicians, I noticed that their fact-check pages followed the same convention regarding their names, where Donald Trump's page contained "personalities/donald-trump", Barack Obama's page contained "personalities/barack-obama". "Firstname Lastname" was consistently formatted as "firstname-lastname". I can combine this information with the above to return a complete list of fact-checks for any politician listed on PolitiFact.
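
The two observations above (the "firstname-lastname" convention and the "page=" parameter) can be combined into a single helper. This is only a sketch: the function name `statements_url` is hypothetical, and the URL pattern is the one I observed at the time of writing, so it may no longer match the live site.

```python
def statements_url(full_name, page):
    """Build a statements-list URL for a speaker and page number."""
    # "Firstname Lastname" becomes "firstname-lastname".
    url_name = full_name.lower().replace(" ", "-")
    return ("http://www.politifact.com/personalities/" + url_name
            + "/statements/?page=" + str(page) + "&list=speaker")


print(statements_url("Donald Trump", 3))
```

For "Donald Trump" and page 3, this reproduces the URL quoted earlier.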

      Format politician names for URLs, set up URL lookup data organization

      To use this info, I will make a list of the names of the politicians I want ratings for. My aim is to do this in such a way that I could add the name of any person with fact checks on PolitiFact, and the code would retrieve the data on them. I will then write some code to format that list in accordance with the URL name convention I mentioned.

      One thing that can be immensely useful in any data scraping project is a dictionary that stores information keyed by what you're looking up. I will therefore create the dictionary "person_lookup_dict" with each politician's name as the key. I will then initialize a dictionary stored under each of their names that I can add the URL-formatted names to. Later, I will add additional lookup-related information to this dictionary.

      While it can be somewhat trickier at times to add and retrieve information from a dictionary rather than a bunch of separate lists, this will help immensely to keep all of the information I need to retrieve the correct URLs organized. It will also help ensure that I can add names to the list of people whose fact-check data I wish to retrieve without modifying my code.

      # People to retrieve fact check data for from PolitiFact.com.
      person_list = ['Donald Trump', 'Barack Obama', 'Mike Pence', 'Paul Ryan',
                     'Nancy Pelosi', 'Mitch McConnell', 'Charles Schumer']
      
      # Initialize a dictionary with the names above as keys.
      # A comprehension gives each key its own empty dictionary (dict.fromkeys would share one).
      person_lookup_dict = {person: {} for person in person_list}
      
      # Initialize a dictionary under each of these keys containing the URL formatting of each name.
      for person in person_lookup_dict:
          person_lookup = person.lower()
          person_lookup = person_lookup.replace(" ", "-")
      
          person_lookup_dict[person] = {'urlname':person_lookup}
      
      # Show the result.
      person_lookup_dict
      
      {'Barack Obama': {'urlname': 'barack-obama'},
       'Charles Schumer': {'urlname': 'charles-schumer'},
       'Donald Trump': {'urlname': 'donald-trump'},
       'Mike Pence': {'urlname': 'mike-pence'},
       'Mitch McConnell': {'urlname': 'mitch-mcconnell'},
       'Nancy Pelosi': {'urlname': 'nancy-pelosi'},
       'Paul Ryan': {'urlname': 'paul-ryan'}}
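
A general caveat when initializing a dictionary of dictionaries: `dict.fromkeys` with a mutable default hands the same object to every key. The loop above sidesteps this because it assigns a brand-new dictionary to each key, but mutating the initial values in place would not be safe. The short sketch below, using two names for illustration, shows the difference.

```python
names = ["Donald Trump", "Barack Obama"]

# dict.fromkeys reuses the SAME default object for every key.
shared = dict.fromkeys(names, {})
shared["Donald Trump"]["urlname"] = "donald-trump"
print(shared["Barack Obama"])  # {'urlname': 'donald-trump'}

# A dict comprehension creates a fresh inner dictionary per key.
separate = {name: {} for name in names}
separate["Donald Trump"]["urlname"] = "donald-trump"
print(separate["Barack Obama"])  # {}
```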
      

      Retrieve number of pages per person for URL lookup

      At this point, I need to delve into the HTML data on page 1 for at least a couple of the people I'm retrieving data on, with the aim of writing code to scrape the total number of pages of fact-checks each person has. This will save me from having to enter this number manually for each person, and from having to update it by hand every time I rerun this code after the number of pages may have changed.

      I start by pulling the HTML source from page 1 for each person. Looking at one of these webpages in my browser, I can see the text "Page 1 of ??" at the bottom of the screen, and it's a safe bet that this is encoded in the HTML and can be retrieved using BeautifulSoup. Looking at the HTML of the page directly (it's easiest to visit the URL in a browser, right-click, and then hit "View Page Source") and searching for that text, I find that it is contained in an element with the class "step-links__current". Searching for this class, I find that there are two instances of it per page. Checking other pages, I find that this is consistently the case.

      The plan therefore becomes the following:

      1. Request the data for page 1 for each person in our list. (Waiting between each request.)
      2. Find the string "Page 1 of ??"
      3. Process that string to get only the integer for the maximum number of pages.
      4. Store that value in the lookup dictionary under each person's name.
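
Steps 2 and 3 of the plan can be tried without making any web requests. The sketch below applies a regular expression to a sample string; the markup in `sample` is an assumption made up for illustration (only the class name and the "Page 1 of ??" text come from the page), and the helper name `max_pages` is hypothetical.

```python
import re


def max_pages(page_label):
    """Return the total page count from text like 'Page 1 of 28', or None."""
    match = re.search(r"(\d+) of (\d+)", page_label)
    if match is None:
        # No pagination text found; the caller decides how to handle this.
        return None
    # Group 1 is the current page, group 2 is the last page.
    return int(match.group(2))


sample = '<span class="step-links__current">Page 1 of 28</span>'
print(max_pages(sample))           # 28
print(max_pages("no pagination"))  # None
```

Returning None when nothing matches makes a changed page layout visible as a missing value rather than as an exception in the middle of a long scraping run.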

      for person in person_lookup_dict:
          # Request data from page 1 for each person in list.
          person_url=person_lookup_dict[person]['urlname']
          start_page = requests.get("http://www.politifact.com/personalities/" + person_url + "/statements/?page=1&list=speaker")
          start_soup = BeautifulSoup(start_page.text, 'html.parser')
      
          # Wait a random amount of time between 10 and 20 seconds.
          #If an error is returned, state the status code and break the loop.
          sleep(randint(10,20))
          if start_page.status_code != 200:
              print('We may have a problem here.', start_page.status_code)
              break
      
          # Find the string "Page 1 of ??" (contained within tags "class_=...").
          # Process down to integer value of max pages.
          num_page_str = start_soup.find(class_="step-links__current").find_next(class_="step-links__current")
          num_page_sub_str = re.search( r'(\d+) of (\d+)', str(num_page_str), re.M)
          person_lookup_dict[person]['urlpages']= int(num_page_sub_str.group(2))
      
          # Show progress.
          sys.stdout.write('\r'+ 'step ' + show_progress(person, list(person_lookup_dict.keys())))    
      
      # Show results.
      person_lookup_dict
      
      step 7 of 7
      
      
      
      
      {'Barack Obama': {'urlname': 'barack-obama', 'urlpages': 31},
       'Charles Schumer': {'urlname': 'charles-schumer', 'urlpages': 1},
       'Donald Trump': {'urlname': 'donald-trump', 'urlpages': 28},
       'Mike Pence': {'urlname': 'mike-pence', 'urlpages': 3},
       'Mitch McConnell': {'urlname': 'mitch-mcconnell', 'urlpages': 2},
       'Nancy Pelosi': {'urlname': 'nancy-pelosi', 'urlpages': 2},
       'Paul Ryan': {'urlname': 'paul-ryan', 'urlpages': 5}}
      

      Generate URL list

      I now have everything I need to generate a list of URLs for each person in the list. I can now iterate through each key (politician name) in the dictionary, retrieve the URL-formatted name and the number of pages for each person, and then generate a list of URLs from that info. I will store the URLs in the same person_lookup_dict that the other data is stored in, just to keep it all consistent and neat.

      # For each key in person_lookup_dict, retrieve that data to build correct URLs, then build URLs.
      for person in person_lookup_dict:
          person_lookup_dict[person]['urllist'] = []
          person_name_url = person_lookup_dict[person]['urlname']
          for i in range(1, person_lookup_dict[person]['urlpages'] + 1):
              url = "http://www.politifact.com/personalities/" + person_name_url\
              + "/statements/?page="+ str(i) +"&list=speaker"
      
              # Store URLs in person_lookup_dict.
              person_lookup_dict[person]['urllist'].append(url)
      
      #Show results.
      person_lookup_dict
      
      {'Barack Obama': {'urllist': ['http://www.politifact.com/personalities/barack-obama/statements/?page=1&list=speaker',
         'http://www.politifact.com/personalities/barack-obama/statements/?page=2&list=speaker',
         'http://www.politifact.com/personalities/barack-obama/statements/?page=3&list=speaker',
         'http://www.politifact.com/personalities/barack-obama/statements/?page=4&list=speaker',
         'http://www.politifact.com/personalities/barack-obama/statements/?page=5&list=speaker',
         'http://www.politifact.com/personalities/barack-obama/statements/?page=6&list=speaker',
         'http://www.politifact.com/personalities/barack-obama/statements/?page=7&list=speaker',
         'http://www.politifact.com/personalities/barack-obama/statements/?page=8&list=speaker',
         'http://www.politifact.com/personalities/barack-obama/statements/?page=9&list=speaker',
         'http://www.politifact.com/personalities/barack-obama/statements/?page=10&list=speaker',
         'http://www.politifact.com/personalities/barack-obama/statements/?page=11&list=speaker',
         'http://www.politifact.com/personalities/barack-obama/statements/?page=12&list=speaker',
         'http://www.politifact.com/personalities/barack-obama/statements/?page=13&list=speaker',
         'http://www.politifact.com/personalities/barack-obama/statements/?page=14&list=speaker',
         'http://www.politifact.com/personalities/barack-obama/statements/?page=15&list=speaker',
         'http://www.politifact.com/personalities/barack-obama/statements/?page=16&list=speaker',
         'http://www.politifact.com/personalities/barack-obama/statements/?page=17&list=speaker',
         'http://www.politifact.com/personalities/barack-obama/statements/?page=18&list=speaker',
         'http://www.politifact.com/personalities/barack-obama/statements/?page=19&list=speaker',
         'http://www.politifact.com/personalities/barack-obama/statements/?page=20&list=speaker',
         'http://www.politifact.com/personalities/barack-obama/statements/?page=21&list=speaker',
         'http://www.politifact.com/personalities/barack-obama/statements/?page=22&list=speaker',
         'http://www.politifact.com/personalities/barack-obama/statements/?page=23&list=speaker',
         'http://www.politifact.com/personalities/barack-obama/statements/?page=24&list=speaker',
         'http://www.politifact.com/personalities/barack-obama/statements/?page=25&list=speaker',
         'http://www.politifact.com/personalities/barack-obama/statements/?page=26&list=speaker',
         'http://www.politifact.com/personalities/barack-obama/statements/?page=27&list=speaker',
         'http://www.politifact.com/personalities/barack-obama/statements/?page=28&list=speaker',
         'http://www.politifact.com/personalities/barack-obama/statements/?page=29&list=speaker',
         'http://www.politifact.com/personalities/barack-obama/statements/?page=30&list=speaker',
         'http://www.politifact.com/personalities/barack-obama/statements/?page=31&list=speaker'],
        'urlname': 'barack-obama',
        'urlpages': 31},
       'Charles Schumer': {'urllist': ['http://www.politifact.com/personalities/charles-schumer/statements/?page=1&list=speaker'],
        'urlname': 'charles-schumer',
        'urlpages': 1},
       'Donald Trump': {'urllist': ['http://www.politifact.com/personalities/donald-trump/statements/?page=1&list=speaker',
         'http://www.politifact.com/personalities/donald-trump/statements/?page=2&list=speaker',
         'http://www.politifact.com/personalities/donald-trump/statements/?page=3&list=speaker',
         'http://www.politifact.com/personalities/donald-trump/statements/?page=4&list=speaker',
         'http://www.politifact.com/personalities/donald-trump/statements/?page=5&list=speaker',
         'http://www.politifact.com/personalities/donald-trump/statements/?page=6&list=speaker',
         'http://www.politifact.com/personalities/donald-trump/statements/?page=7&list=speaker',
         'http://www.politifact.com/personalities/donald-trump/statements/?page=8&list=speaker',
         'http://www.politifact.com/personalities/donald-trump/statements/?page=9&list=speaker',
         'http://www.politifact.com/personalities/donald-trump/statements/?page=10&list=speaker',
         'http://www.politifact.com/personalities/donald-trump/statements/?page=11&list=speaker',
         'http://www.politifact.com/personalities/donald-trump/statements/?page=12&list=speaker',
         'http://www.politifact.com/personalities/donald-trump/statements/?page=13&list=speaker',
         'http://www.politifact.com/personalities/donald-trump/statements/?page=14&list=speaker',
         'http://www.politifact.com/personalities/donald-trump/statements/?page=15&list=speaker',
         'http://www.politifact.com/personalities/donald-trump/statements/?page=16&list=speaker',
         'http://www.politifact.com/personalities/donald-trump/statements/?page=17&list=speaker',
         'http://www.politifact.com/personalities/donald-trump/statements/?page=18&list=speaker',
         'http://www.politifact.com/personalities/donald-trump/statements/?page=19&list=speaker',
         'http://www.politifact.com/personalities/donald-trump/statements/?page=20&list=speaker',
         'http://www.politifact.com/personalities/donald-trump/statements/?page=21&list=speaker',
         'http://www.politifact.com/personalities/donald-trump/statements/?page=22&list=speaker',
         'http://www.politifact.com/personalities/donald-trump/statements/?page=23&list=speaker',
         'http://www.politifact.com/personalities/donald-trump/statements/?page=24&list=speaker',
         'http://www.politifact.com/personalities/donald-trump/statements/?page=25&list=speaker',
         'http://www.politifact.com/personalities/donald-trump/statements/?page=26&list=speaker',
         'http://www.politifact.com/personalities/donald-trump/statements/?page=27&list=speaker',
         'http://www.politifact.com/personalities/donald-trump/statements/?page=28&list=speaker'],
        'urlname': 'donald-trump',
        'urlpages': 28},
       'Mike Pence': {'urllist': ['http://www.politifact.com/personalities/mike-pence/statements/?page=1&list=speaker',
         'http://www.politifact.com/personalities/mike-pence/statements/?page=2&list=speaker',
         'http://www.politifact.com/personalities/mike-pence/statements/?page=3&list=speaker'],
        'urlname': 'mike-pence',
        'urlpages': 3},
       'Mitch McConnell': {'urllist': ['http://www.politifact.com/personalities/mitch-mcconnell/statements/?page=1&list=speaker',
         'http://www.politifact.com/personalities/mitch-mcconnell/statements/?page=2&list=speaker'],
        'urlname': 'mitch-mcconnell',
        'urlpages': 2},
       'Nancy Pelosi': {'urllist': ['http://www.politifact.com/personalities/nancy-pelosi/statements/?page=1&list=speaker',
         'http://www.politifact.com/personalities/nancy-pelosi/statements/?page=2&list=speaker'],
        'urlname': 'nancy-pelosi',
        'urlpages': 2},
       'Paul Ryan': {'urllist': ['http://www.politifact.com/personalities/paul-ryan/statements/?page=1&list=speaker',
         'http://www.politifact.com/personalities/paul-ryan/statements/?page=2&list=speaker',
         'http://www.politifact.com/personalities/paul-ryan/statements/?page=3&list=speaker',
         'http://www.politifact.com/personalities/paul-ryan/statements/?page=4&list=speaker',
         'http://www.politifact.com/personalities/paul-ryan/statements/?page=5&list=speaker'],
        'urlname': 'paul-ryan',
        'urlpages': 5}}
      

      Retrieve and parse data

      Here is the real substance of what I'm doing. For the sake of organization, I created a function that parses the data I want to retrieve from the HTML I requested using the generated URLs. This part of the process involves more HTML investigation like that done for finding the page number data above, except here I'm looking for tags that allow me to locate the date a statement was made, the text of that statement, and the truth rating assigned to it.

      I found that the elements with the class "statement__source" whose text is the politician's name (the version in the list prior to URL formatting) sit inside the block containing all the data I was looking for. Below, I put those HTML elements into a list, one per statement. I then pass that list to a function that steps through it, scrapes the data I want, and appends only that data to a dictionary (truth_data).
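
Two string-cleaning steps do most of the work in the parsing function: turning the date text into a date, and pulling the rating out of the "meter" markup. The sketch below tries both on sample strings. Everything about the samples is an assumption: the raw date format and the meter markup are invented for illustration, the helper names are hypothetical, and the date step uses the standard library's `strptime` in place of the dateutil parser so that the example needs no third-party packages.

```python
import re
from datetime import datetime


def clean_date(raw):
    """Convert text like 'on Tuesday, April 10th, 2018' into a date object."""
    text = raw.strip()
    if text.startswith("on "):
        text = text[3:]
    # Drop ordinal suffixes (1st, 2nd, 3rd, 4th) so strptime can read the day.
    text = re.sub(r"(\d+)(st|nd|rd|th)", r"\1", text)
    return datetime.strptime(text, "%A, %B %d, %Y").date()


def rating_from_meter(meter_html):
    """Return the third double-quoted value found in the markup."""
    quoted = re.findall(r'"(.+?)"', meter_html)
    return quoted[2]


# Assumed sample inputs, not copied from the real site.
sample_date = "on Tuesday, April 10th, 2018"
sample_meter = ('<div class="meter"><a href="/truth-o-meter/statements/example/">'
                '<img alt="Mostly False" src="mostly-false.png"></a></div>')

print(clean_date(sample_date))          # 2018-04-10
print(rating_from_meter(sample_meter))  # Mostly False
```

Picking the third quoted value depends on the order of attributes in the markup, so this approach is fragile: if the page layout changes, the index will need to be rechecked.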

      # See code below before parsing function code.
      def truth_extractor(person, fact_checks, truth_data):
          """
          Input:
          person = Name of person making the statement in question.
      
          fact_checks = list of HTML elements containing the fact-check data to be scraped
      
          truth_data = dictionary of data scraped so far to be appended and returned.
          ---------
          Function:
          Step through list of HTML elements containing desired data, locate data, parse into desired format,
          append to dictionary.
          ---------
          Output:
          truth_data = dictionary with scraped data appended
          """
          # Iterate over items stored in fact_checks list.
          for check in fact_checks:
              # From this item, find the enclosing element with class 'statement'.
              statement = check.find_parent(class_='statement')
      
              # Locate the data for statement date, truth rating (under meter), and text using associated tags.
              statement_date = statement.find_all("span", class_="article__meta")
              statement_meter = statement.find_all(class_="meter")
              statement_text = statement.find_all(class_="statement__text")
      
              #Perform first parsing of each string retrieved above. Text needs no parsing.
              parse_date_1 = statement_date[0].text
              parse_meter_1 = re.findall( r'(\"(.+?)\")', str(statement_meter[0]), re.M)
              parse_text_final = statement_text[0].text
      
              #Perform further parsing of date, final parsing of truth rating string.
              parse_date_2 = parse_date_1.replace("on ", "")
              parse_meter_final = str(parse_meter_1[2][0].replace('"', ''))
      
              #Perform final parsing of date
              parse_date_final = parser.parse(parse_date_2).date()
      
              #Append scraped data to dictionary initialized below.
              truth_data['Person'].append(person)
              truth_data['Date'].append(parse_date_final)
              truth_data['Veracity'].append(parse_meter_final)
              truth_data['Text'].append(parse_text_final)
      
          return truth_data
      
      # Set up a dictionary to contain scraped data.
      truth_data = {'Person':[], 'Date':[], 'Veracity':[], 'Text':[]}
      
      # create a list of politician names from the lookup dictionary.
      person_list = list(person_lookup_dict.keys())
      
      # for each item in the list, return the HTML element containing the data I wish to scrape.
      for person in person_list:
          # Retrieve the list of URLs generated earlier of each person in the list
          url_list = person_lookup_dict[person]['urllist']
      
          # Iterate through the list of URLs.
          for url in url_list:
              # Retrieve the webpage for each URL.
              page = requests.get(url)
      
              # Wait a random amount of time between 10 and 20 seconds.
              # If an error is returned, state the status code and break the loop.
              sleep(randint(10,20))
              if page.status_code != 200:
                  print('We may have a problem here.', page.status_code)
                  break
      
              # Parse the HTML using BeautifulSoup.
              soup = BeautifulSoup(page.text, 'html.parser')
      
              # In PolitiFact.com HTML tags, Nancy Pelosi's name is formatted with two spaces between
              # "Nancy" and "Pelosi". Because of this, searching for her statements needs a special case.
              if person == 'Nancy Pelosi':
                  fact_checks = soup.find_all(class_="statement__source", text='Nancy  Pelosi')
              else:
                  fact_checks = soup.find_all(class_="statement__source", text=person)
      
              # Pass that list to the function "truth_extractor".
              truth_extractor(person, fact_checks, truth_data)   
      
              # Show progress.
              sys.stdout.write('\r'+ 'step ' + show_progress(person,person_list) + ', substep: '\
                               + show_progress(url,url_list) + '         ')
      
      step 7 of 7, substep: 1 of 1
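
The special case for Nancy Pelosi in the loop above handles a doubled space in the site's markup. One possible generalization, offered here as an assumption rather than as what the code above does, is to normalize whitespace on both sides before comparing names, so that no per-person exception is needed. The helper names below are hypothetical.

```python
def normalize_name(name):
    """Collapse runs of whitespace to single spaces and trim the ends."""
    return " ".join(name.split())


def same_person(listed_name, page_name):
    """Compare two names while ignoring differences in spacing."""
    return normalize_name(listed_name) == normalize_name(page_name)


print(same_person("Nancy Pelosi", "Nancy  Pelosi"))  # True
print(same_person("Nancy Pelosi", "Paul Ryan"))      # False
```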
      
      # Convert dictionary of scraped data to dataframe.
      truth_df = pd.DataFrame.from_dict(truth_data, orient="columns")
      
      # Show final product.
      truth_df
      
      Date Person Text Veracity
      0 2018-04-10 Donald Trump \nEPA administrator Scott Pruitt's short-term ... Mostly False
      1 2018-04-10 Donald Trump \nSays Scott Pruitt’s security spending was "s... Mostly False
      2 2018-04-09 Donald Trump \n"When a car is sent to the United States fro... Mostly True
      3 2018-04-09 Donald Trump \n"This will be the last time — April — that y... Mostly False
      4 2018-04-06 Donald Trump \n"In many places, like California, the same p... Pants on Fire!
      5 2018-04-04 Donald Trump \n"We’ve started building the wall."\r\n\n Mostly False
      6 2018-04-02 Donald Trump \nSays caravans of people are coming to cross ... Half-True
      7 2018-04-02 Donald Trump \nMexico has "very strong border laws -- ours ... Mostly False
      8 2018-04-02 Donald Trump \n"Only fools, or worse, are saying that our m... False
      9 2018-03-28 Donald Trump \n"Last year we lost $500 billion on trade wit... Mostly False
      10 2018-03-22 Donald Trump \nSays Conor Lamb "ran on a campaign that said... False
      11 2018-03-21 Donald Trump \nRobert Mueller’s investigative team has "13 ... Half-True
      12 2018-03-16 Donald Trump \nSays Democratic obstruction is the reason wh... Half-True
      13 2018-03-15 Donald Trump \nIn Japan, "they take a bowling ball from 20 ... False
      14 2018-03-15 Donald Trump \n"We do have a Trade Deficit with Canada, as ... Mostly False
      15 2018-03-14 Donald Trump \nSays China and Singapore impose the death pe... True
      16 2018-03-13 Donald Trump \n"The state of California is begging us to bu... Pants on Fire!
      17 2018-03-13 Donald Trump \nSays the U.S. steel and aluminum industry is... Mostly True
      18 2018-03-12 Donald Trump \nThe last private rocket launch "cost $80 mil... Half-True
      19 2018-03-09 Donald Trump \nAmerican aluminum and steel "are vital to ou... Half-True
      20 2018-03-09 Donald Trump \n"When I was campaigning, I was talking about... False
      21 2018-03-07 Donald Trump \n"Democrats are nowhere to be found on DACA."... False
      22 2018-03-06 Donald Trump \nThe 2018 Academy Awards show was the "lowest... True
      23 2018-03-01 Donald Trump \n"You take Pulse nightclub. If you had one pe... False
      24 2018-02-20 Donald Trump \n"I have been much tougher on Russia than Oba... Mostly False
      25 2018-02-19 Donald Trump \n"I never said Russia did not meddle in the e... Pants on Fire!
      26 2018-02-08 Donald Trump \n"The Democrats are pushing for Universal Hea... Mostly False
      27 2018-02-07 Donald Trump \nMany gang members have taken advantage of "g... Half-True
      28 2018-02-06 Donald Trump \nAt the State of the Union address, Democrats... Pants on Fire!
      29 2018-02-02 Donald Trump \n"Instead of two for one, we have cut 22 burd... Mostly False
      ... ... ... ... ...
      1359 2013-05-28 Mitch McConnell \n\r\n\tSays Health and Human Services Secreta... Mostly True
      1360 2010-06-14 Mitch McConnell \n"A major part" of the climate change bill sp... False
      1361 2010-04-20 Mitch McConnell \nNew financial regulation "actually guarantee... False
      1362 2010-02-19 Mitch McConnell \nThe stimulus includes "$219,000 to study t... Half-True
      1363 2010-02-19 Mitch McConnell \n"$100,000 in stimulus funds (were) used for... Mostly True
      1364 2010-02-01 Mitch McConnell \nOn a bipartisan task force on ways to improv... Full Flop
      1365 2009-12-02 Mitch McConnell \nThe Senate health care bill does not contain... True
      1366 2009-06-01 Mitch McConnell \n"The Department of Justice, under the Obama ... Half-True
      1367 2009-05-19 Mitch McConnell \nA public option for health care would end pr... Mostly False
      1368 2009-03-03 Mitch McConnell \n"In just one month, the Democrats have spent... False
      1369 2009-02-03 Mitch McConnell \n To give the proposed economic stimulus plan... True
      1370 2009-01-05 Mitch McConnell \nIf Obama's economic plan creates 600,000 new... Mostly True
      1371 2017-03-27 Charles Schumer \n"In fact, if you add up the net wealth of hi... Mostly True
      1372 2017-10-08 Charles Schumer \nTrump’s tax plan is "completely focused on t... False
      1373 2017-10-06 Charles Schumer \n"The Republicans are proposing to pay for th... Half-True
      1374 2017-07-25 Charles Schumer \n"When the price for oil goes up on the marke... False
      1375 2017-05-18 Charles Schumer \n"President Obama became the first president ... True
      1376 2017-03-27 Charles Schumer \n"In fact, if you add up the net wealth of hi... Mostly True
      1377 2017-01-27 Charles Schumer \nSays Rex "Tillerson won't divest from Exxon.... Pants on Fire!
      1378 2017-01-10 Charles Schumer \nSays Donald Trump campaigned on not cutting ... Mostly True
      1379 2016-06-15 Charles Schumer \nLast year, "244 suspected terrorists walked ... Mostly True
      1380 2015-05-18 Charles Schumer \n"It is simply a fact that insufficient fundi... Half-True
      1381 2015-03-08 Charles Schumer \n\r\n"The State Department asked all secretar... Mostly False
      1382 2014-12-04 Charles Schumer \nIn 2010, uninsured voters made up "about 5 p... True
      1383 2014-05-08 Charles Schumer \nIf you work 40 hours a week at the proposed ... Half-True
      1384 2013-10-08 Charles Schumer \nBecause of the 2011 debt ceiling fight, "the... Mostly True
      1385 2010-08-04 Charles Schumer \n\r\n\t"Eight of the nine justices in the Sup... Mostly True
      1386 2010-04-13 Charles Schumer \n"No one questioned that she (Judge Sotomayor... False
      1387 2010-01-22 Charles Schumer \n"With a stroke of a pen, the (U.S. Supreme C... Mostly False
      1388 2009-03-12 Charles Schumer \n\r\n\t"No Bridge to Nowhere could occur."\n False

      1389 rows × 4 columns

      # Save dataframe to CSV file.
      truth_df.to_csv('politic_truth.csv', index=False)