284 lines
7.2 KiB
Plaintext
284 lines
7.2 KiB
Plaintext
{
|
|
"cells": [
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {
|
|
"collapsed": true
|
|
},
|
|
"source": [
|
|
"___\n",
|
|
"\n",
|
|
"<a href='http://www.pieriandata.com'> <img src='../Pierian_Data_Logo.png' /></a>\n",
|
|
"___"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"# Sentiment Analysis Assessment - Solution\n",
|
|
"\n",
|
|
"## Task #1: Perform vector arithmetic on your own words\n",
|
|
"Write code that evaluates vector arithmetic on your own set of related words. The goal is to come as close to an expected word as possible. Please feel free to share success stories in the Q&A Forum for this section!"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 1,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# Import spaCy and load the language library. Remember to use a larger model!\n",
|
|
"import spacy\n",
|
|
"nlp = spacy.load('en_core_web_md')"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 2,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# Choose the words you wish to compare, and obtain their vectors\n",
|
|
"word1 = nlp.vocab['wolf'].vector\n",
|
|
"word2 = nlp.vocab['dog'].vector\n",
|
|
"word3 = nlp.vocab['cat'].vector"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 3,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# Import spatial and define a cosine_similarity function\n",
|
|
"from scipy import spatial\n",
|
|
"\n",
|
|
"cosine_similarity = lambda x, y: 1 - spatial.distance.cosine(x, y)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 4,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# Write an expression for vector arithmetic\n",
|
|
"# For example: new_vector = word1 - word2 + word3\n",
|
|
"new_vector = word1 - word2 + word3"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 5,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"['maned', 'wolfs', 'wolf', 'lynx', 'wolve', 'yotes', 'canids', 'boars', 'foxes', 'wolfdogs']\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"# List the top ten closest vectors in the vocabulary to the result of the expression above\n",
|
|
"computed_similarities = []\n",
|
|
"\n",
|
|
"for word in nlp.vocab:\n",
|
|
" if word.has_vector:\n",
|
|
" if word.is_lower:\n",
|
|
" if word.is_alpha:\n",
|
|
" similarity = cosine_similarity(new_vector, word.vector)\n",
|
|
" computed_similarities.append((word, similarity))\n",
|
|
"\n",
|
|
"computed_similarities = sorted(computed_similarities, key=lambda item: -item[1])\n",
|
|
"\n",
|
|
"print([w[0].text for w in computed_similarities[:10]])"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"#### CHALLENGE: Write a function that takes in 3 strings, performs a-b+c arithmetic, and returns a top-ten result"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 6,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"def vector_math(a,b,c):\n",
|
|
" new_vector = nlp.vocab[a].vector - nlp.vocab[b].vector + nlp.vocab[c].vector\n",
|
|
" computed_similarities = []\n",
|
|
"\n",
|
|
" for word in nlp.vocab:\n",
|
|
" if word.has_vector:\n",
|
|
" if word.is_lower:\n",
|
|
" if word.is_alpha:\n",
|
|
" similarity = cosine_similarity(new_vector, word.vector)\n",
|
|
" computed_similarities.append((word, similarity))\n",
|
|
"\n",
|
|
" computed_similarities = sorted(computed_similarities, key=lambda item: -item[1])\n",
|
|
"\n",
|
|
" return [w[0].text for w in computed_similarities[:10]]"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 7,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/plain": [
|
|
"['king',\n",
|
|
" 'queen',\n",
|
|
" 'commoner',\n",
|
|
" 'highness',\n",
|
|
" 'prince',\n",
|
|
" 'sultan',\n",
|
|
" 'maharajas',\n",
|
|
" 'princes',\n",
|
|
" 'kumbia',\n",
|
|
" 'kings']"
|
|
]
|
|
},
|
|
"execution_count": 7,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"# Test the function on known words:\n",
|
|
"vector_math('king','man','woman')"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Task #2: Perform VADER Sentiment Analysis on your own review\n",
|
|
"Write code that returns a set of SentimentIntensityAnalyzer polarity scores based on your own written review."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 8,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# Import SentimentIntensityAnalyzer and create an sid object\n",
|
|
"from nltk.sentiment.vader import SentimentIntensityAnalyzer\n",
|
|
"\n",
|
|
"sid = SentimentIntensityAnalyzer()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 9,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# Write a review as one continuous string (multiple sentences are ok)\n",
|
|
"review = 'This movie portrayed real people, and was based on actual events.'"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 10,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/plain": [
|
|
"{'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.0}"
|
|
]
|
|
},
|
|
"execution_count": 10,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"# Obtain the sid scores for your review\n",
|
|
"sid.polarity_scores(review)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### CHALLENGE: Write a function that takes in a review and returns a score of \"Positive\", \"Negative\" or \"Neutral\""
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 11,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"def review_rating(string):\n",
|
|
" scores = sid.polarity_scores(string)\n",
|
|
" if scores['compound'] == 0:\n",
|
|
" return 'Neutral'\n",
|
|
" elif scores['compound'] > 0:\n",
|
|
" return 'Positive'\n",
|
|
" else:\n",
|
|
" return 'Negative'"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 12,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/plain": [
|
|
"'Neutral'"
|
|
]
|
|
},
|
|
"execution_count": 12,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"# Test the function on your review above:\n",
|
|
"review_rating(review)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Great job!"
|
|
]
|
|
}
|
|
],
|
|
"metadata": {
|
|
"kernelspec": {
|
|
"display_name": "Python 3",
|
|
"language": "python",
|
|
"name": "python3"
|
|
},
|
|
"language_info": {
|
|
"codemirror_mode": {
|
|
"name": "ipython",
|
|
"version": 3
|
|
},
|
|
"file_extension": ".py",
|
|
"mimetype": "text/x-python",
|
|
"name": "python",
|
|
"nbconvert_exporter": "python",
|
|
"pygments_lexer": "ipython3",
|
|
"version": "3.6.7"
|
|
}
|
|
},
|
|
"nbformat": 4,
|
|
"nbformat_minor": 2
|
|
}
|