{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "___\n",
    "\n",
    "<a href='http://www.pieriandata.com'> <img src='../Pierian_Data_Logo.png' /></a>\n",
    "___"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "# Python Text Basics Assessment - Solutions\n",
    "\n",
    "Welcome to your assessment! Complete the tasks described in bold below by typing the relevant code in the cells."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## f-Strings\n",
    "#### 1. Print an f-string that displays `NLP stands for Natural Language Processing` using the variables provided."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "NLP stands for Natural Language Processing\n"
     ]
    }
   ],
   "source": [
    "abbr = 'NLP'\n",
    "full_text = 'Natural Language Processing'\n",
    "\n",
    "# Enter your code here:\n",
    "print(f'{abbr} stands for {full_text}')"
   ]
  },
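  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Optional aside: the braces in an f-string accept any Python expression, not just a variable name, and an optional format spec can follow a colon. A minimal sketch that reuses the two variables above (redefined here so the cell runs on its own):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "abbr = 'NLP'\n",
    "full_text = 'Natural Language Processing'\n",
    "\n",
    "# Any expression can go inside the braces\n",
    "print(f'{abbr} is {len(abbr)} letters long')\n",
    "\n",
    "# A format spec after the colon: left-align abbr in a 6-character field\n",
    "print(f'{abbr:<6}| {full_text.upper()}')"
   ]
  },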
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Files\n",
    "#### 2. Create a file in the current working directory called `contacts.txt` by running the cell below:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Overwriting contacts.txt\n"
     ]
    }
   ],
   "source": [
    "%%writefile contacts.txt\n",
    "First_Name Last_Name, Title, Extension, Email"
   ]
  },
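  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`%%writefile` is an IPython cell magic, so it only works inside Jupyter or IPython. A rough plain-Python equivalent (a sketch, not what IPython runs internally) creates the same file with `open()` in `'w'` mode, which truncates any existing file. Nothing is written after `Email`, so the file has no trailing newline. This is why `Email` and `AUTHORS:` run together in the first output of task 5."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Plain-Python sketch of what the %%writefile cell above produces\n",
    "with open('contacts.txt', 'w') as f:\n",
    "    f.write('First_Name Last_Name, Title, Extension, Email')"
   ]
  },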
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 3. Open the file and use `.read()` to save the contents of the file to a string called `fields`. Make sure the file is closed at the end."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'First_Name Last_Name, Title, Extension, Email'"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Write your code here:\n",
    "# The with-statement closes the file automatically when the block ends\n",
    "with open('contacts.txt') as c:\n",
    "    fields = c.read()\n",
    "\n",
    "# Run fields to see the contents of contacts.txt:\n",
    "fields"
   ]
  },
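  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To check that the `with` block really closed the file, look at the file object's `closed` attribute after the block has ended. A short sketch:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "with open('contacts.txt') as c:\n",
    "    fields = c.read()\n",
    "\n",
    "# Outside the with-block the file object is already closed\n",
    "print(c.closed)  # True"
   ]
  },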
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Working with PDF Files\n",
    "#### 4. Use PyPDF2 to open the file `Business_Proposal.pdf`. Extract the text of page 2."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "AUTHORS:\n",
      " \n",
      "Amy Baker, Finance Chair, x345, abaker@ourcompany.com\n",
      " \n",
      "Chris Donaldson, Accounting Dir., x621, cdonaldson@ourcompany.com\n",
      " \n",
      "Erin Freeman, Sr. VP, x879, efreeman@ourcompany.com\n",
      " \n",
      "\n"
     ]
    }
   ],
   "source": [
    "# Perform import\n",
    "import PyPDF2\n",
    "\n",
    "# Open the file as a binary object\n",
    "f = open('Business_Proposal.pdf','rb')\n",
    "\n",
    "# Create a reader to access the pages of the PDF\n",
    "# (PyPDF2 3.0 removed PdfFileReader, getPage and extractText;\n",
    "#  PdfReader, .pages and .extract_text() are the current names)\n",
    "pdf_reader = PyPDF2.PdfReader(f)\n",
    "\n",
    "# Get the text from page 2 (CHALLENGE: Do this in one step!)\n",
    "# Pages are zero-indexed, so page 2 is pages[1]\n",
    "page_two_text = pdf_reader.pages[1].extract_text()\n",
    "\n",
    "# Close the file\n",
    "f.close()\n",
    "\n",
    "# Print the contents of page_two_text\n",
    "print(page_two_text)"
   ]
  },
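  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A more compact variant, assuming PyPDF2 2.x or 3.x is installed. Opening the PDF in a `with` block replaces the explicit `f.close()`. Creating the reader, selecting the page and extracting its text fit on one line. The extraction has to happen inside the block, while the file is still open."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import PyPDF2\n",
    "\n",
    "# The PDF is closed automatically when the with-block ends\n",
    "with open('Business_Proposal.pdf', 'rb') as f:\n",
    "    page_two_text = PyPDF2.PdfReader(f).pages[1].extract_text()\n",
    "\n",
    "print(page_two_text)"
   ]
  },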
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 5. Open the file `contacts.txt` in append mode. Add the text of page 2 from above to `contacts.txt`.\n",
    "\n",
    "#### CHALLENGE: See if you can remove the word \"AUTHORS:\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "First_Name Last_Name, Title, Extension, EmailAUTHORS:\n",
      " \n",
      "Amy Baker, Finance Chair, x345, abaker@ourcompany.com\n",
      " \n",
      "Chris Donaldson, Accounting Dir., x621, cdonaldson@ourcompany.com\n",
      " \n",
      "Erin Freeman, Sr. VP, x879, efreeman@ourcompany.com\n",
      " \n",
      "\n"
     ]
    }
   ],
   "source": [
    "# Simple Solution:\n",
    "# 'a+' opens the file for appending and reading; seek(0) rewinds so read() returns the whole file\n",
    "with open('contacts.txt','a+') as c:\n",
    "    c.write(page_two_text)\n",
    "    c.seek(0)\n",
    "    print(c.read())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "First_Name Last_Name, Title, Extension, Email\n",
      " \n",
      "Amy Baker, Finance Chair, x345, abaker@ourcompany.com\n",
      " \n",
      "Chris Donaldson, Accounting Dir., x621, cdonaldson@ourcompany.com\n",
      " \n",
      "Erin Freeman, Sr. VP, x879, efreeman@ourcompany.com\n",
      " \n",
      "\n"
     ]
    }
   ],
   "source": [
    "# CHALLENGE Solution (re-run the %%writefile cell above to obtain an unmodified contacts.txt file):\n",
    "# page_two_text[8:] skips the 8 characters of 'AUTHORS:'\n",
    "with open('contacts.txt','a+') as c:\n",
    "    c.write(page_two_text[8:])\n",
    "    c.seek(0)\n",
    "    print(c.read())"
   ]
  },
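  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Slicing with `[8:]` assumes the text begins with exactly the eight characters `AUTHORS:`. An alternative that does not depend on that position uses `str.replace()` with a count of `1`, which removes only the first occurrence. This sketch assumes `page_two_text` from task 4 is still defined and that you have reset `contacts.txt` by re-running the `%%writefile` cell."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Remove only the first occurrence of the heading, wherever it appears\n",
    "cleaned = page_two_text.replace('AUTHORS:', '', 1)\n",
    "\n",
    "with open('contacts.txt', 'a+') as c:\n",
    "    c.write(cleaned)\n",
    "    c.seek(0)\n",
    "    print(c.read())"
   ]
  },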
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Regular Expressions\n",
    "#### 6. Using the `page_two_text` variable created above, extract any email addresses that were contained in the file `Business_Proposal.pdf`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['abaker@ourcompany.com',\n",
       " 'cdonaldson@ourcompany.com',\n",
       " 'efreeman@ourcompany.com']"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import re\n",
    "\n",
    "# Enter your regex pattern here. This may take several tries!\n",
    "# \\. matches a literal dot; an unescaped . would match any character\n",
    "pattern = r'\\w+@\\w+\\.\\w{3}'\n",
    "\n",
    "re.findall(pattern, page_two_text)"
   ]
  },
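  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The pattern above is enough for these three addresses. However, `\\w+` cannot match the dots or hyphens in an address such as `jo.lee@my-company.co.uk`, and `\\w{3}` assumes a three-letter ending such as `com`. A sketch of a looser pattern, run on a small sample string in which the second address is hypothetical (it is not in the PDF):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import re\n",
    "\n",
    "# Hypothetical sample; jo.lee@my-company.co.uk is invented for illustration\n",
    "sample = 'Amy Baker, abaker@ourcompany.com; Jo Lee, jo.lee@my-company.co.uk'\n",
    "\n",
    "# Local part: word characters, dots, plus and hyphen; domain: dot-separated labels\n",
    "pattern = r'[\\w.+-]+@[\\w-]+(?:\\.[\\w-]+)+'\n",
    "\n",
    "print(re.findall(pattern, sample))  # ['abaker@ourcompany.com', 'jo.lee@my-company.co.uk']"
   ]
  },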
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Great job!"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}