materi-praktikum/Praktikum Python Code/00-Python-Text-Basics/04-Python-Text-Basics-Assessment-Solutions.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "___\n",
    "\n",
    "<a href='http://www.pieriandata.com'> <img src='../Pierian_Data_Logo.png' /></a>\n",
    "___"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "# Python Text Basics Assessment - Solutions\n",
    "\n",
    "Welcome to your assessment! Complete the tasks described in bold below by typing the relevant code in the cells."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## f-Strings\n",
    "#### 1. Print an f-string that displays `NLP stands for Natural Language Processing` using the variables provided."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "NLP stands for Natural Language Processing\n"
     ]
    }
   ],
   "source": [
    "abbr = 'NLP'\n",
    "full_text = 'Natural Language Processing'\n",
    "\n",
    "# Enter your code here:\n",
    "print(f'{abbr} stands for {full_text}')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Files\n",
    "#### 2. Create a file in the current working directory called `contacts.txt` by running the cell below:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Overwriting contacts.txt\n"
     ]
    }
   ],
   "source": [
    "%%writefile contacts.txt\n",
    "First_Name Last_Name, Title, Extension, Email"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 3. Open the file and use .read() to save the contents of the file to a string called `fields`.  Make sure the file is closed at the end."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'First_Name Last_Name, Title, Extension, Email'"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Write your code here:\n",
    "with open('contacts.txt') as c:\n",
    "    fields = c.read()\n",
    "\n",
    "    \n",
    "# Run fields to see the contents of contacts.txt:\n",
    "fields"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Working with PDF Files\n",
    "#### 4. Use PyPDF2 to open the file `Business_Proposal.pdf`. Extract the text of page 2."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "AUTHORS:\n",
      " \n",
      "Amy Baker, Finance Chair, x345, abaker@ourcompany.com\n",
      " \n",
      "Chris Donaldson, Accounting Dir., x621, cdonaldson@ourcompany.com\n",
      " \n",
      "Erin Freeman, Sr. VP, x879, efreeman@ourcompany.com\n",
      " \n",
      "\n"
     ]
    }
   ],
   "source": [
    "# Perform import\n",
    "import PyPDF2\n",
    "\n",
    "# Open the file as a binary object\n",
    "f = open('Business_Proposal.pdf','rb')\n",
    "\n",
    "# Use PyPDF2 to read the text of the file\n",
    "pdf_reader = PyPDF2.PdfFileReader(f)\n",
    "\n",
    "\n",
    "# Get the text from page 2 (CHALLENGE: Do this in one step!)\n",
    "page_two_text = pdf_reader.getPage(1).extractText()\n",
    "\n",
    "\n",
    "\n",
    "# Close the file\n",
    "f.close()\n",
    "\n",
    "# Print the contents of page_two_text\n",
    "print(page_two_text)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 5. Open the file `contacts.txt` in append mode. Add the text of page 2 from above to `contacts.txt`.\n",
    "\n",
    "#### CHALLENGE: See if you can remove the word \"AUTHORS:\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "First_Name Last_Name, Title, Extension, EmailAUTHORS:\n",
      " \n",
      "Amy Baker, Finance Chair, x345, abaker@ourcompany.com\n",
      " \n",
      "Chris Donaldson, Accounting Dir., x621, cdonaldson@ourcompany.com\n",
      " \n",
      "Erin Freeman, Sr. VP, x879, efreeman@ourcompany.com\n",
      " \n",
      "\n"
     ]
    }
   ],
   "source": [
    "# Simple Solution:\n",
    "with open('contacts.txt','a+') as c:\n",
    "    c.write(page_two_text)\n",
    "    c.seek(0)\n",
    "    print(c.read())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "First_Name Last_Name, Title, Extension, Email\n",
      " \n",
      "Amy Baker, Finance Chair, x345, abaker@ourcompany.com\n",
      " \n",
      "Chris Donaldson, Accounting Dir., x621, cdonaldson@ourcompany.com\n",
      " \n",
      "Erin Freeman, Sr. VP, x879, efreeman@ourcompany.com\n",
      " \n",
      "\n"
     ]
    }
   ],
   "source": [
    "# CHALLENGE Solution (re-run the %%writefile cell above to obtain an unmodified contacts.txt file):\n",
    "with open('contacts.txt','a+') as c:\n",
    "    c.write(page_two_text[8:])\n",
    "    c.seek(0)\n",
    "    print(c.read())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Regular Expressions\n",
    "#### 6. Using the `page_two_text` variable created above, extract any email addresses that were contained in the file `Business_Proposal.pdf`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['abaker@ourcompany.com',\n",
       " 'cdonaldson@ourcompany.com',\n",
       " 'efreeman@ourcompany.com']"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import re\n",
    "\n",
    "# Enter your regex pattern here. This may take several tries!\n",
    "pattern = r'\\w+@\\w+.\\w{3}'\n",
    "\n",
    "re.findall(pattern, page_two_text)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Great job!"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}