{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "___\n", "\n", " \n", "___" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "# Python Text Basics Assessment - Solutions\n", "\n", "Welcome to your assessment! Complete the tasks described in bold below by typing the relevant code in the cells." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## f-Strings\n", "#### 1. Print an f-string that displays `NLP stands for Natural Language Processing` using the variables provided." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NLP stands for Natural Language Processing\n" ] } ], "source": [ "abbr = 'NLP'\n", "full_text = 'Natural Language Processing'\n", "\n", "# Enter your code here:\n", "print(f'{abbr} stands for {full_text}')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Files\n", "#### 2. Create a file in the current working directory called `contacts.txt` by running the cell below:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Overwriting contacts.txt\n" ] } ], "source": [ "%%writefile contacts.txt\n", "First_Name Last_Name, Title, Extension, Email" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 3. Open the file and use .read() to save the contents of the file to a string called `fields`. Make sure the file is closed at the end." 
] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'First_Name Last_Name, Title, Extension, Email'" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Write your code here:\n", "with open('contacts.txt') as c:\n", " fields = c.read()\n", "\n", " \n", "# Run fields to see the contents of contacts.txt:\n", "fields" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Working with PDF Files\n", "#### 4. Use PyPDF2 to open the file `Business_Proposal.pdf`. Extract the text of page 2." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "AUTHORS:\n", " \n", "Amy Baker, Finance Chair, x345, abaker@ourcompany.com\n", " \n", "Chris Donaldson, Accounting Dir., x621, cdonaldson@ourcompany.com\n", " \n", "Erin Freeman, Sr. VP, x879, efreeman@ourcompany.com\n", " \n", "\n" ] } ], "source": [ "# Perform import\n", "import PyPDF2\n", "\n", "# Open the file as a binary object\n", "f = open('Business_Proposal.pdf','rb')\n", "\n", "# Use PyPDF2 to read the text of the file\n", "pdf_reader = PyPDF2.PdfFileReader(f)\n", "\n", "\n", "# Get the text from page 2 (CHALLENGE: Do this in one step!)\n", "page_two_text = pdf_reader.getPage(1).extractText()\n", "\n", "\n", "\n", "# Close the file\n", "f.close()\n", "\n", "# Print the contents of page_two_text\n", "print(page_two_text)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 5. Open the file `contacts.txt` in append mode. 
Add the text of page 2 from above to `contacts.txt`.\n", "\n", "#### CHALLENGE: See if you can remove the word \"AUTHORS:\"" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "First_Name Last_Name, Title, Extension, EmailAUTHORS:\n", " \n", "Amy Baker, Finance Chair, x345, abaker@ourcompany.com\n", " \n", "Chris Donaldson, Accounting Dir., x621, cdonaldson@ourcompany.com\n", " \n", "Erin Freeman, Sr. VP, x879, efreeman@ourcompany.com\n", " \n", "\n" ] } ], "source": [ "# Simple Solution:\n", "with open('contacts.txt','a+') as c:\n", " c.write(page_two_text)\n", " c.seek(0)\n", " print(c.read())" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "First_Name Last_Name, Title, Extension, Email\n", " \n", "Amy Baker, Finance Chair, x345, abaker@ourcompany.com\n", " \n", "Chris Donaldson, Accounting Dir., x621, cdonaldson@ourcompany.com\n", " \n", "Erin Freeman, Sr. VP, x879, efreeman@ourcompany.com\n", " \n", "\n" ] } ], "source": [ "# CHALLENGE Solution (re-run the %%writefile cell above to obtain an unmodified contacts.txt file):\n", "with open('contacts.txt','a+') as c:\n", " c.write(page_two_text[8:])\n", " c.seek(0)\n", " print(c.read())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Regular Expressions\n", "#### 6. Using the `page_two_text` variable created above, extract any email addresses that were contained in the file `Business_Proposal.pdf`." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['abaker@ourcompany.com',\n", " 'cdonaldson@ourcompany.com',\n", " 'efreeman@ourcompany.com']" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import re\n", "\n", "# Enter your regex pattern here. 
This may take several tries!\n", "# The dot is escaped so it matches a literal '.' rather than any character.\n", "pattern = r'\\w+@\\w+\\.\\w{3}'\n", "\n", "re.findall(pattern, page_two_text)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Great job!" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.2" } }, "nbformat": 4, "nbformat_minor": 2 }