\u001b[1;34m()\u001b[0m\n\u001b[1;32m----> 1\u001b[1;33m \u001b[0mmyfile\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mopen\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;34m'whoops.txt'\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[1;31mFileNotFoundError\u001b[0m: [Errno 2] No such file or directory: 'whoops.txt'"
]
}
],
"source": [
"myfile = open('whoops.txt')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
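"To avoid this error, make sure your .txt file is saved in the same folder as your notebook. A bare file name like `'whoops.txt'` is looked up relative to the current working directory, which for a notebook is normally the folder it lives in. To check that location, use **pwd**:"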
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'C:\\\\Users\\\\Mike\\\\NLP-Bootcamp\\\\00-Python-Text-Basics'"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pwd"
]
},
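{
"cell_type": "markdown",
"metadata": {},
"source": [
"`pwd` is an IPython magic, so it only works inside Jupyter or IPython. In a plain Python script, a rough equivalent is the standard library's `os.getcwd()`. The minimal sketch below also checks whether a file exists before you try to open it. `is_in_working_dir` is a hypothetical helper written for this example, not part of any library:\n",
"\n",
"```python\n",
"import os\n",
"\n",
"# Hypothetical helper: True if the file exists in the current working directory\n",
"def is_in_working_dir(filename):\n",
"    # Relative paths are resolved against os.getcwd()\n",
"    return os.path.isfile(os.path.join(os.getcwd(), filename))\n",
"\n",
"print(os.getcwd())                      # the same directory that pwd reports\n",
"print(is_in_working_dir('whoops.txt'))  # False unless you create whoops.txt\n",
"```"
]
},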
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Alternatively, to open a file from any location on your computer, pass in its full file path.**\n",
"\n",
"On Windows, use double backslashes so Python doesn't treat a single `\\` as the start of an escape sequence (such as `\\n`). A file path takes the form:\n",
"\n",
"    myfile = open(\"C:\\\\Users\\\\YourUserName\\\\Home\\\\Folder\\\\myfile.txt\")\n",
"\n",
"On macOS and Linux, paths use forward slashes instead:\n",
"\n",
"    myfile = open(\"/Users/YourUserName/Folder/myfile.txt\")"
]
},
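{
"cell_type": "markdown",
"metadata": {},
"source": [
"Two alternatives can make paths less error-prone. A raw string (prefix `r`) lets you write a Windows path with single backslashes, and the standard library's `pathlib` module joins path parts with the right separator for the operating system you are on. A minimal sketch, where the folder and file names are placeholders just like in the examples above:\n",
"\n",
"```python\n",
"from pathlib import Path, PureWindowsPath\n",
"\n",
"# A raw string and an escaped string spell the same Windows path\n",
"raw = r'C:\\Users\\YourUserName\\Home\\Folder\\myfile.txt'\n",
"escaped = 'C:\\\\Users\\\\YourUserName\\\\Home\\\\Folder\\\\myfile.txt'\n",
"print(raw == escaped)  # True\n",
"\n",
"# PureWindowsPath parses Windows-style paths on any operating system\n",
"print(PureWindowsPath(raw).name)  # myfile.txt\n",
"\n",
"# Path joins parts with the separator of the OS you are running on;\n",
"# 'Folder' and 'myfile.txt' are placeholder names\n",
"path = Path.home() / 'Folder' / 'myfile.txt'\n",
"print(path)\n",
"```\n",
"\n",
"Since `open()` accepts `Path` objects, you can pass `path` to it directly."
]
},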
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
"# Open the test.txt file we created earlier\n",
"my_file = open('test.txt')"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<_io.TextIOWrapper name='test.txt' mode='r' encoding='cp1252'>"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"my_file"
]
},
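{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `encoding='cp1252'` shown above is this Windows machine's default text encoding. On macOS and Linux the default is usually UTF-8. Because the default varies, a file written on one system can be misread on another, especially if it contains accented letters or other non-ASCII characters. A minimal sketch of passing `encoding='utf-8'` explicitly, using a throwaway temporary file instead of `test.txt` (the `with` block, explained later in this notebook, closes the file automatically):\n",
"\n",
"```python\n",
"import os\n",
"import tempfile\n",
"\n",
"# Throwaway file in a temporary directory, so test.txt is left untouched\n",
"path = os.path.join(tempfile.mkdtemp(), 'utf8_demo.txt')\n",
"\n",
"with open(path, 'w', encoding='utf-8') as f:\n",
"    f.write('café, naïve, résumé')\n",
"\n",
"# Reading back with the same explicit encoding returns the text unchanged\n",
"with open(path, encoding='utf-8') as f:\n",
"    print(f.encoding)  # utf-8\n",
"    text = f.read()\n",
"\n",
"print(text)  # café, naïve, résumé\n",
"```"
]
},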
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`my_file` is now an open file object. We'll perform some reading and writing exercises, and then close the file, which releases the operating system's file handle (and, after writing, makes sure any buffered data is saved to disk).\n",
"\n",
"### .read() and .seek()"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'Hello, this is a quick test file.\\nThis is the second line of the file.'"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# We can now read the file\n",
"my_file.read()"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"''"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# But what happens if we try to read it again?\n",
"my_file.read()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This happens because the reading \"cursor\" (the current position in the file) sits at the end of the file after the first read, so there is nothing left to read. We can move the cursor back to the start with `.seek()`:"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Seek to the start of file (index 0)\n",
"my_file.seek(0)"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'Hello, this is a quick test file.\\nThis is the second line of the file.'"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Now read again\n",
"my_file.read()"
]
},
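{
"cell_type": "markdown",
"metadata": {},
"source": [
"To check where the cursor currently is, use `.tell()`. A minimal sketch using `io.StringIO`, an in-memory object that behaves like an open text file (its contents here simply mirror `test.txt`), follows the cursor as we read:\n",
"\n",
"```python\n",
"import io\n",
"\n",
"f = io.StringIO('Hello, this is a quick test file.\\nThis is the second line of the file.')\n",
"\n",
"print(f.tell())        # 0: the cursor starts at the beginning\n",
"first = f.read(5)      # read only the next 5 characters\n",
"print(first)           # Hello\n",
"print(f.tell())        # 5\n",
"rest = f.read()        # read everything after the cursor\n",
"print(repr(f.read()))  # '': the cursor is already at the end\n",
"print(f.seek(0))       # 0: seek() returns the new position\n",
"```\n",
"\n",
"For real text files opened with `open()`, `.tell()` returns an opaque number that is only meant to be passed back to `.seek()`, so don't rely on it being a character count."
]
},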
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### .readlines()\n",
"`.readlines()` reads the entire file and returns a list with one string per line. Use caution with large files, since every line is held in memory at once. We will see how to iterate over a file one line at a time later in this notebook."
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['Hello, this is a quick test file.\\n', 'This is the second line of the file.']"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Readlines returns a list of the lines in the file\n",
"my_file.seek(0)\n",
"my_file.readlines()"
]
},
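{
"cell_type": "markdown",
"metadata": {},
"source": [
"Notice that `.readlines()` keeps the newline character `\\n` at the end of each line (the last line has none here because the file doesn't end with a newline). A minimal sketch, again using `io.StringIO` as an in-memory stand-in for `test.txt`, compares it with `.readline()`, which returns one line per call, and with `.read().splitlines()`, which drops the newline characters:\n",
"\n",
"```python\n",
"import io\n",
"\n",
"f = io.StringIO('Hello, this is a quick test file.\\nThis is the second line of the file.')\n",
"\n",
"first = f.readline()       # one line per call, newline included\n",
"remaining = f.readlines()  # a list of the lines after the cursor\n",
"print(repr(first))         # 'Hello, this is a quick test file.\\n'\n",
"print(remaining)           # ['This is the second line of the file.']\n",
"\n",
"f.seek(0)\n",
"stripped = f.read().splitlines()  # the same lines, without newline characters\n",
"print(stripped)\n",
"```"
]
},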
{
"cell_type": "markdown",
"metadata": {},
"source": [
"When you have finished using a file, it is always good practice to close it."
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [],
"source": [
"my_file.close()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Writing to a File\n",
"\n",
"By default, the `open()` function will only allow us to read the file. We need to pass the argument `'w'` to write over the file. For example:"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [],
"source": [
"# Add a second argument to the function: 'w', which stands for write.\n",
"# Passing 'w+' lets us both read from and write to the file\n",
"\n",
"my_file = open('test.txt','w+')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Use caution!**\n",
"\n",
"Opening a file with `'w'` or `'w+'` *truncates the original*, meaning that anything that was in the original file **is deleted**!"
]
},
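{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you want to create a new file but never clobber an existing one, open it with mode `'x'` (exclusive creation): it raises `FileExistsError` instead of truncating. A minimal sketch, run in a temporary directory so that `test.txt` is not affected:\n",
"\n",
"```python\n",
"import os\n",
"import tempfile\n",
"\n",
"path = os.path.join(tempfile.mkdtemp(), 'new_file.txt')\n",
"\n",
"# The file doesn't exist yet, so 'x' creates it\n",
"with open(path, 'x') as f:\n",
"    f.write('Original contents')\n",
"\n",
"# Now it exists, so 'x' refuses to open it and nothing is deleted\n",
"try:\n",
"    with open(path, 'x') as f:\n",
"        f.write('Replacement contents')\n",
"    overwritten = True\n",
"except FileExistsError:\n",
"    overwritten = False\n",
"\n",
"with open(path) as f:\n",
"    print(f.read())  # Original contents\n",
"print(overwritten)   # False\n",
"```"
]
},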
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"24"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Write to the file\n",
"my_file.write('This is a new first line')"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'This is a new first line'"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Read the file\n",
"my_file.seek(0)\n",
"my_file.read()"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [],
"source": [
"my_file.close() # always do this when you're done with a file"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Appending to a File\n",
"Passing the argument `'a'` opens the file and puts the cursor at the end, so anything written is appended and the existing contents are kept. Like `'w+'`, `'a+'` lets us both read and write, but unlike `'w+'` it does not truncate the file. If the file does not exist, it will be created."
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"23"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"my_file = open('test.txt','a+')\n",
"my_file.write('\\nThis line is being appended to test.txt')\n",
"my_file.write('\\nAnd another line here.')"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"This is a new first line\n",
"This line is being appended to test.txt\n",
"And another line here.\n"
]
}
],
"source": [
"my_file.seek(0)\n",
"print(my_file.read())"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [],
"source": [
"my_file.close()"
]
},
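{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the difference between `'w'` and `'a'` concrete, here is a minimal sketch on a temporary file (so `test.txt` keeps the contents used in the rest of this notebook):\n",
"\n",
"```python\n",
"import os\n",
"import tempfile\n",
"\n",
"path = os.path.join(tempfile.mkdtemp(), 'append_demo.txt')\n",
"\n",
"with open(path, 'w') as f:  # creates the file\n",
"    f.write('line 1')\n",
"\n",
"with open(path, 'a') as f:  # keeps 'line 1' and writes after it\n",
"    f.write('\\nline 2')\n",
"with open(path) as f:\n",
"    after_append = f.read()\n",
"\n",
"with open(path, 'w') as f:  # truncates the file, then writes\n",
"    f.write('line 3')\n",
"with open(path) as f:\n",
"    after_write = f.read()\n",
"\n",
"print(repr(after_append))  # 'line 1\\nline 2'\n",
"print(repr(after_write))   # 'line 3'\n",
"```"
]
},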
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Appending with `%%writefile`\n",
"Jupyter notebook users can do the same thing with the IPython `%%writefile` cell magic; the `-a` flag appends to the file instead of overwriting it:"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Appending to test.txt\n"
]
}
],
"source": [
"%%writefile -a test.txt\n",
"\n",
"This is more text being appended to test.txt\n",
"And another line here."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Start the cell body with a blank line if you want the appended text to begin on its own line, since `%%writefile` writes the cell contents literally and won't interpret escape sequences like `\\n`."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Aliases and Context Managers\n",
"A `with` statement (a context manager) binds the open file to a temporary name, or alias (here `txt`), and closes the file automatically when the block ends:"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"This is a new first line\n",
"\n"
]
}
],
"source": [
"with open('test.txt','r') as txt:\n",
" first_line = txt.readlines()[0]\n",
" \n",
"print(first_line)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that the `with ... as ...:` block automatically closed `test.txt` after assigning the first line of text to `first_line`, so trying to read from `txt` now raises an error:"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"ename": "ValueError",
"evalue": "I/O operation on closed file.",
"output_type": "error",
"traceback": [
"\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[1;31mValueError\u001b[0m Traceback (most recent call last)",
"\u001b[1;32m\u001b[0m in \u001b[0;36m\u001b[1;34m()\u001b[0m\n\u001b[1;32m----> 1\u001b[1;33m \u001b[0mtxt\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mread\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[1;31mValueError\u001b[0m: I/O operation on closed file."
]
}
],
"source": [
"txt.read()"
]
},
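{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can also check this without triggering an error through the file object's `.closed` attribute. The minimal sketch below (on a temporary file, not `test.txt`) shows the main reason to prefer `with`: the file is closed even if an exception is raised inside the block.\n",
"\n",
"```python\n",
"import os\n",
"import tempfile\n",
"\n",
"path = os.path.join(tempfile.mkdtemp(), 'ctx_demo.txt')\n",
"\n",
"with open(path, 'w') as f:\n",
"    print(f.closed)  # False: still inside the with block\n",
"    f.write('some text')\n",
"print(f.closed)      # True: closed when the block ended\n",
"\n",
"try:\n",
"    with open(path) as g:\n",
"        g.read()\n",
"        raise RuntimeError('something went wrong mid-read')\n",
"except RuntimeError:\n",
"    pass\n",
"print(g.closed)      # True: closed even though an exception occurred\n",
"```"
]
},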
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Iterating through a File"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"This is a new first line\n",
"This line is being appended to test.txt\n",
"And another line here.\n",
"This is more text being appended to test.txt\n",
"And another line here."
]
}
],
"source": [
"with open('test.txt','r') as txt:\n",
"    for line in txt:\n",
"        print(line, end='')  # each line already ends with '\\n', so end='' stops print() adding another"
]
},
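{
"cell_type": "markdown",
"metadata": {},
"source": [
"Looping over the file object yields one line at a time, so unlike `.readlines()` it doesn't need the whole file in memory, which makes it the usual choice for large files. A minimal sketch that numbers each line with `enumerate()` and removes the trailing newline with `.rstrip('\\n')` instead of passing `end=''`. It uses an in-memory `io.StringIO` with made-up contents, but a file opened in a `with` block works the same way:\n",
"\n",
"```python\n",
"import io\n",
"\n",
"fake_file = io.StringIO('first line\\nsecond line\\nthird line')\n",
"\n",
"numbered = []\n",
"for number, line in enumerate(fake_file, start=1):\n",
"    clean = line.rstrip('\\n')  # drop the newline each line ends with\n",
"    numbered.append(f'{number}: {clean}')\n",
"\n",
"print('\\n'.join(numbered))\n",
"print(len(numbered))  # 3\n",
"```"
]
},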
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Great! Now you should be familiar with formatted string literals and working with text files.\n",
"## Next up: Working with PDF Text"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.7"
}
},
"nbformat": 4,
"nbformat_minor": 2
}