{ "cells": [ { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "# Python Text Basics Assessment\n", "\n", "Welcome to your assessment! Complete the tasks described in bold below by typing the relevant code in the cells.
\n", "You can compare your answers to the Solutions notebook provided in this folder." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## f-Strings\n", "#### 1. Print an f-string that displays `NLP stands for Natural Language Processing` using the variables provided." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NLP stands for Natural Language Processing\n" ] } ], "source": [ "abbr = 'NLP'\n", "full_text = 'Natural Language Processing'\n", "\n", "# Enter your code here:\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Files\n", "#### 2. Create a file in the current working directory called `contacts.txt` by running the cell below:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Overwriting contacts.txt\n" ] } ], "source": [ "%%writefile contacts.txt\n", "First_Name Last_Name, Title, Extension, Email" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 3. Open the file and use `.read()` to save the contents of the file to a string called `fields`. Make sure the file is closed at the end." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'First_Name Last_Name, Title, Extension, Email'" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Write your code here:\n", "\n", "\n", "\n", " \n", "# Run fields to see the contents of contacts.txt:\n", "fields" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Working with PDF Files\n", "#### 4. Use PyPDF2 to open the file `Business_Proposal.pdf`. Extract the text of page 2." 
] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "AUTHORS:\n", " \n", "Amy Baker, Finance Chair, x345, abaker@ourcompany.com\n", " \n", "Chris Donaldson, Accounting Dir., x621, cdonaldson@ourcompany.com\n", " \n", "Erin Freeman, Sr. VP, x879, efreeman@ourcompany.com\n", " \n", "\n" ] } ], "source": [ "# Perform import\n", "\n", "\n", "# Open the file as a binary object\n", "\n", "\n", "# Use PyPDF2 to read the text of the file\n", "\n", "\n", "\n", "# Get the text from page 2 (CHALLENGE: Do this in one step!)\n", "page_two_text = \n", "\n", "\n", "\n", "# Close the file\n", "\n", "\n", "# Print the contents of page_two_text\n", "print(page_two_text)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 5. Open the file `contacts.txt` in append mode. Add the text of page 2 from above to `contacts.txt`.\n", "\n", "#### CHALLENGE: See if you can remove the word \"AUTHORS:\"" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "First_Name Last_Name, Title, Extension, EmailAUTHORS:\n", " \n", "Amy Baker, Finance Chair, x345, abaker@ourcompany.com\n", " \n", "Chris Donaldson, Accounting Dir., x621, cdonaldson@ourcompany.com\n", " \n", "Erin Freeman, Sr. VP, x879, efreeman@ourcompany.com\n", " \n", "\n" ] } ], "source": [ "# Simple Solution:\n", "\n", "\n", "\n" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "First_Name Last_Name, Title, Extension, Email\n", " \n", "Amy Baker, Finance Chair, x345, abaker@ourcompany.com\n", " \n", "Chris Donaldson, Accounting Dir., x621, cdonaldson@ourcompany.com\n", " \n", "Erin Freeman, Sr. 
VP, x879, efreeman@ourcompany.com\n", " \n", "\n" ] } ], "source": [ "# CHALLENGE Solution (re-run the %%writefile cell above to obtain an unmodified contacts.txt file):\n", "\n", "\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Regular Expressions\n", "#### 6. Using the `page_two_text` variable created above, extract any email addresses that were contained in the file `Business_Proposal.pdf`." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['abaker@ourcompany.com',\n", " 'cdonaldson@ourcompany.com',\n", " 'efreeman@ourcompany.com']" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import re\n", "\n", "# Enter your regex pattern here. This may take several tries!\n", "pattern = \n", "\n", "re.findall(pattern, page_two_text)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Great job!" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.2" } }, "nbformat": 4, "nbformat_minor": 2 }