Let’s create an AI project focused on Natural Language Processing (NLP). We’ll build a sentiment analysis model using Python and the popular NLP library NLTK (Natural Language Toolkit).
1. Project Setup:
- Create a new Python project or script.
- Install the required library:
pip install nltk
2. Data Preparation:
- Download a dataset for sentiment analysis. You can bring your own labeled data or fetch a ready-made corpus through NLTK. For simplicity, let’s use the NLTK movie reviews corpus, which contains 2,000 reviews labeled 'pos' or 'neg' (1,000 of each):
import nltk
from nltk.corpus import movie_reviews
# Download the movie reviews dataset
nltk.download('movie_reviews')
# Get the movie reviews and their labels
documents = [(list(movie_reviews.words(fileid)), category)
             for category in movie_reviews.categories()
             for fileid in movie_reviews.fileids(category)]
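Each entry of documents is a (list_of_tokens, label) pair. Before extracting features it is worth sanity-checking the class balance and document length. The sketch below runs that check on a small hypothetical stand-in list so it needs no download; on the real corpus, the same Counter call should report 1,000 reviews for each of 'neg' and 'pos'.

```python
from collections import Counter

# Hypothetical toy stand-in for the `documents` list built above:
# each item is a (list_of_tokens, label) pair, like the movie_reviews data.
documents = [
    (["a", "wonderful", "film"], "pos"),
    (["dull", "and", "boring"], "neg"),
    (["great", "acting"], "pos"),
    (["a", "waste", "of", "time"], "neg"),
]

# Count how many documents carry each label to check class balance
label_counts = Counter(label for _, label in documents)
print(label_counts)  # Counter({'pos': 2, 'neg': 2})

# Average document length in tokens
avg_len = sum(len(tokens) for tokens, _ in documents) / len(documents)
print(f"Average length: {avg_len:.1f} tokens")  # Average length: 3.0 tokens
```

A balanced dataset matters here because it means plain accuracy is a fair headline metric: a classifier that always guesses one label would score only about 50%.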
3. Data Preprocessing:
- Preprocess the data by extracting features and splitting it into training and testing sets:
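import random
# Shuffle the documents so the train/test split mixes positive and negative reviews
random.shuffle(documents)
# Build the vocabulary: the 2,000 most frequent words in the whole corpus
all_words = nltk.FreqDist(w.lower() for w in movie_reviews.words())
word_features = [w for w, _ in all_words.most_common(2000)]
# Define feature extraction function: one True/False "contains word" feature per vocabulary word
def document_features(document):
    words = set(w.lower() for w in document)
    features = {word: (word in words) for word in word_features}
    return features
# Extract features for each document
featuresets = [(document_features(d), c) for (d, c) in documents]
# Split the dataset: 1,600 reviews for training, the remaining 400 for testing
train_set, test_set = featuresets[:1600], featuresets[1600:]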
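To see what the feature extractor produces, here is a minimal sketch on a hypothetical three-review corpus with a two-word vocabulary. It uses collections.Counter in place of nltk.FreqDist (FreqDist is a Counter subclass, so most_common behaves the same way); the data and the vocabulary size N are assumptions for illustration only.

```python
from collections import Counter

# Hypothetical toy corpus of (tokens, label) pairs
documents = [
    (["great", "film", "great", "cast"], "pos"),
    (["boring", "film"], "neg"),
    (["great", "story"], "pos"),
]

# Rank words by corpus frequency and keep the top N as the vocabulary
N = 2
all_words = Counter(w.lower() for tokens, _ in documents for w in tokens)
word_features = [w for w, _ in all_words.most_common(N)]

def document_features(document):
    """Map a token list to {vocab_word: True/False} presence features."""
    words = set(w.lower() for w in document)
    return {word: (word in words) for word in word_features}

print(word_features)                                 # ['great', 'film']
print(document_features(["A", "GREAT", "story"]))    # {'great': True, 'film': False}
```

Every document yields the same set of keys (one per vocabulary word), and words outside the vocabulary, such as "story" here, are simply ignored. This is the bag-of-words presence representation the classifier trains on.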
4. Model Training:
- Train a simple Naive Bayes classifier using NLTK:
from nltk import NaiveBayesClassifier
# Train Naive Bayes classifier
classifier = NaiveBayesClassifier.train(train_set)
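For intuition about what training does, the sketch below is a simplified from-scratch Naive Bayes over True/False features; it is not NLTK's code. It counts label frequencies and per-label (feature, value) frequencies, then picks the label with the highest log P(label) + sum of log P(feature = value | label). The add-0.5 smoothing is similar in spirit to NLTK's default ELEProbDist estimator, and the toy training data is hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_nb(train_set):
    """Count label frequencies and (feature, value) frequencies per label."""
    label_counts = Counter(label for _, label in train_set)
    feat_counts = defaultdict(Counter)  # label -> Counter of (feature, value)
    for feats, label in train_set:
        for fname, fval in feats.items():
            feat_counts[label][(fname, fval)] += 1
    return label_counts, feat_counts

def classify_nb(model, feats):
    """Return the label maximising log P(label) + sum log P(f=v | label)."""
    label_counts, feat_counts = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in label_counts.items():
        score = math.log(n / total)
        for fname, fval in feats.items():
            # Add-0.5 smoothing over the two possible values (True/False)
            count = feat_counts[label][(fname, fval)]
            score += math.log((count + 0.5) / (n + 1.0))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy training data
train = [
    ({"great": True, "boring": False}, "pos"),
    ({"great": True, "boring": False}, "pos"),
    ({"great": False, "boring": True}, "neg"),
    ({"great": False, "boring": True}, "neg"),
]
model = train_nb(train)
print(classify_nb(model, {"great": True, "boring": False}))  # pos
```

Working in log space avoids numeric underflow when multiplying thousands of small probabilities, which matters with a 2,000-word vocabulary.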
5. Model Evaluation:
- Evaluate the model on the test set:
# Check the accuracy of the classifier
accuracy = nltk.classify.accuracy(classifier, test_set)
print(f"Accuracy: {accuracy * 100:.2f}%")
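nltk.classify.accuracy reports the fraction of test documents whose predicted label matches the gold label. A single number hides which way the errors go, so a confusion count is a useful companion. The sketch below computes both from plain label lists; the gold and predicted lists are hypothetical, and the comments show how they could be built from test_set and classifier. (NLTK also ships nltk.metrics.ConfusionMatrix for the same purpose.)

```python
from collections import Counter

def evaluate(gold, predicted):
    """Return accuracy and a Counter of (gold, predicted) label pairs."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    pairs = Counter(zip(gold, predicted))
    correct = sum(n for (g, p), n in pairs.items() if g == p)
    return correct / len(gold), pairs

# Hypothetical labels; with the trained model these could come from
#   gold = [c for _, c in test_set]
#   predicted = [classifier.classify(f) for f, _ in test_set]
gold = ["pos", "pos", "neg", "neg", "neg"]
predicted = ["pos", "neg", "neg", "neg", "pos"]

accuracy, confusion = evaluate(gold, predicted)
print(f"Accuracy: {accuracy * 100:.2f}%")  # Accuracy: 60.00%
for (g, p), n in sorted(confusion.items()):
    print(f"gold={g:<3} predicted={p:<3} count={n}")
```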
6. Prediction:
- Use the trained model to analyze the sentiment of new text:
# Example: Predict sentiment for a new text
new_text = "This movie is fantastic!"
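# Lower-case and split off punctuation so the tokens match the corpus vocabulary
from nltk.tokenize import wordpunct_tokenize
features = document_features(wordpunct_tokenize(new_text.lower()))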
sentiment = classifier.classify(features)
print(f"Predicted Sentiment: {sentiment}")
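New text has to be tokenized the same way as the training data. The movie_reviews tokens are lower-case with punctuation split into separate tokens, so plain str.split() leaves words like "This" and "fantastic!" that never match the vocabulary. The sketch below shows the mismatch using a small regex tokenizer; the vocabulary list and the tokenize helper are hypothetical illustrations.

```python
import re

# Hypothetical vocabulary: lower-case, punctuation-free words
word_features = ["this", "movie", "fantastic", "boring"]

def tokenize(text):
    """Lower-case the text and keep runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())

text = "This movie is fantastic!"
naive = set(text.split())
clean = set(tokenize(text))

print(sorted(w for w in word_features if w in naive))  # ['movie']
print(sorted(w for w in word_features if w in clean))  # ['fantastic', 'movie', 'this']
```

With naive splitting the most telling word, "fantastic", is invisible to the classifier, so the prediction would rest on neutral words alone.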
7. Project Conclusion:
- Summarize the project’s goals, outcomes, and potential improvements.
- Include any insights gained from analyzing the results.
This project builds a simple baseline sentiment classifier with NLTK. From here, you can explore more advanced NLP techniques, use larger datasets, and experiment with other classifiers. Visualizations and a closer look at misclassified reviews would also give a more complete picture of the model’s performance.
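A starting point for misclassification analysis is simply listing the documents the model gets wrong. The sketch below takes any classify function over token lists; the keyword classifier and test documents are hypothetical stand-ins. With the NLTK model, one option would be classify = lambda tokens: classifier.classify(document_features(tokens)) and labeled_docs = documents[1600:].

```python
def misclassified(classify, labeled_docs):
    """Return (tokens, gold, predicted) for every document the classifier gets wrong."""
    errors = []
    for tokens, gold in labeled_docs:
        predicted = classify(tokens)
        if predicted != gold:
            errors.append((tokens, gold, predicted))
    return errors

# Hypothetical stand-in classifier: "pos" if any positive cue word appears
def keyword_classifier(tokens):
    return "pos" if {"great", "fantastic"} & set(tokens) else "neg"

test_docs = [
    (["a", "great", "film"], "pos"),
    (["not", "great", "at", "all"], "neg"),
    (["dull", "plot"], "neg"),
]
errs = misclassified(keyword_classifier, test_docs)
for tokens, gold, predicted in errs:
    print(f"gold={gold} predicted={predicted}: {' '.join(tokens)}")
# gold=neg predicted=pos: not great at all
```

The single error here is a negation ("not great"), a typical weakness of word-presence features, since they ignore word order.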