materi-praktikum/Praktikum Python Code/00-Python-Text-Basics/04-Python-Text-Basics-Assessment-Solutions.ipynb

{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"___\n",
"\n",
"<a href='http://www.pieriandata.com'> <img src='../Pierian_Data_Logo.png' /></a>\n",
"___"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"# Python Text Basics Assessment - Solutions\n",
"\n",
"Welcome to your assessment! Complete the tasks described in bold below by typing the relevant code in the cells."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## f-Strings\n",
"#### 1. Print an f-string that displays `NLP stands for Natural Language Processing` using the variables provided."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NLP stands for Natural Language Processing\n"
]
}
],
"source": [
"abbr = 'NLP'\n",
"full_text = 'Natural Language Processing'\n",
"\n",
"# Enter your code here:\n",
"print(f'{abbr} stands for {full_text}')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Files\n",
"#### 2. Create a file in the current working directory called `contacts.txt` by running the cell below:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Overwriting contacts.txt\n"
]
}
],
"source": [
"%%writefile contacts.txt\n",
"First_Name Last_Name, Title, Extension, Email"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 3. Open the file and use `.read()` to save its contents to a string called `fields`. Make sure the file is closed at the end."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'First_Name Last_Name, Title, Extension, Email'"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Write your code here:\n",
"# The with statement closes the file automatically when the block ends\n",
"with open('contacts.txt') as c:\n",
"    fields = c.read()\n",
"\n",
"# Run fields to see the contents of contacts.txt:\n",
"fields"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Working with PDF Files\n",
"#### 4. Use PyPDF2 to open the file `Business_Proposal.pdf`. Extract the text of page 2."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"AUTHORS:\n",
" \n",
"Amy Baker, Finance Chair, x345, abaker@ourcompany.com\n",
" \n",
"Chris Donaldson, Accounting Dir., x621, cdonaldson@ourcompany.com\n",
" \n",
"Erin Freeman, Sr. VP, x879, efreeman@ourcompany.com\n",
" \n",
"\n"
]
}
],
"source": [
"# Perform import\n",
"import PyPDF2\n",
"\n",
"# Open the file as a binary object\n",
"f = open('Business_Proposal.pdf','rb')\n",
"\n",
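"# Use PyPDF2 to read the text of the file\n",
"# (PyPDF2 3.0+ renames these calls to PdfReader, reader.pages[i] and extract_text())\n",
"pdf_reader = PyPDF2.PdfFileReader(f)\n",
"\n",
"\n",
"# Get the text from page 2 (CHALLENGE: Do this in one step!)\n",
"# Pages are zero-indexed, so page 2 is index 1\n",
"page_two_text = pdf_reader.getPage(1).extractText()\n",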
"\n",
"\n",
"\n",
"# Close the file\n",
"f.close()\n",
"\n",
"# Print the contents of page_two_text\n",
"print(page_two_text)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 5. Open the file `contacts.txt` in append mode. Add the text of page 2 from above to `contacts.txt`.\n",
"\n",
"#### CHALLENGE: See if you can remove the word \"AUTHORS:\""
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"First_Name Last_Name, Title, Extension, EmailAUTHORS:\n",
" \n",
"Amy Baker, Finance Chair, x345, abaker@ourcompany.com\n",
" \n",
"Chris Donaldson, Accounting Dir., x621, cdonaldson@ourcompany.com\n",
" \n",
"Erin Freeman, Sr. VP, x879, efreeman@ourcompany.com\n",
" \n",
"\n"
]
}
],
"source": [
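"# Simple Solution:\n",
"# 'a+' opens the file for appending and reading; seek(0) rewinds to the start so read() returns the whole file\n",
"with open('contacts.txt','a+') as c:\n",
"    c.write(page_two_text)\n",
"    c.seek(0)\n",
"    print(c.read())"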
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"First_Name Last_Name, Title, Extension, Email\n",
" \n",
"Amy Baker, Finance Chair, x345, abaker@ourcompany.com\n",
" \n",
"Chris Donaldson, Accounting Dir., x621, cdonaldson@ourcompany.com\n",
" \n",
"Erin Freeman, Sr. VP, x879, efreeman@ourcompany.com\n",
" \n",
"\n"
]
}
],
"source": [
"# CHALLENGE Solution (re-run the %%writefile cell above to obtain an unmodified contacts.txt file):\n",
"# 'AUTHORS:' is 8 characters long, so slicing from index 8 skips it\n",
"with open('contacts.txt','a+') as c:\n",
"    c.write(page_two_text[8:])\n",
"    c.seek(0)\n",
"    print(c.read())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Regular Expressions\n",
"#### 6. Using the `page_two_text` variable created above, extract any email addresses that were contained in the file `Business_Proposal.pdf`."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['abaker@ourcompany.com',\n",
" 'cdonaldson@ourcompany.com',\n",
" 'efreeman@ourcompany.com']"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import re\n",
"\n",
"# Enter your regex pattern here. This may take several tries!\n",
"# The dot is escaped so it matches a literal '.' rather than any character\n",
"pattern = r'\\w+@\\w+\\.\\w{3}'\n",
"\n",
"re.findall(pattern, page_two_text)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Great job!"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.2"
}
},
"nbformat": 4,
"nbformat_minor": 2
}