Initial commit - NLP text classification project

202210715288 FATAH SABILA ROSYAD 2026-01-19 18:18:20 +07:00
commit d53f7ec5a6
5 changed files with 54 additions and 0 deletions

20
README.md Normal file

@@ -0,0 +1,20 @@
# News Topic Classification Using NLP
This project applies Natural Language Processing (NLP)
to classify Indonesian-language news text into
three categories: Politics, Sports, and Technology.
## Methods
- Text preprocessing
- TF-IDF
- Multinomial Naive Bayes
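
The three steps above can be sketched as a single scikit-learn pipeline. This is a minimal illustration, not the project's actual training code: the toy corpus, the `clf` name, and the use of `make_pipeline` are assumptions, and the cleaning function simply mirrors the one in `app.py`.

```python
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline


def preprocess_text(text):
    """Lowercase, drop URLs, keep letters only, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"http\S+", "", text)
    text = re.sub(r"[^a-z\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


# Hypothetical toy corpus; the project's real training data is not shown.
train_texts = [
    "presiden menteri kabinet pemilu",
    "partai parlemen pemilu kampanye",
    "timnas gol pertandingan liga",
    "pelatih pemain liga stadion",
    "aplikasi ponsel perangkat internet",
    "startup data internet komputer",
]
train_labels = [
    "Politik", "Politik",
    "Olahraga", "Olahraga",
    "Teknologi", "Teknologi",
]

# Preprocessing -> TF-IDF features -> Multinomial Naive Bayes classifier.
clf = make_pipeline(
    TfidfVectorizer(preprocessor=preprocess_text),
    MultinomialNB(),
)
clf.fit(train_texts, train_labels)

predicted = clf.predict(["Timnas menang di pertandingan liga!"])[0]
print(predicted)
```

The app itself loads the vectorizer and the classifier from two separate pickle files; a pipeline is used here only to keep the sketch short.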
## Tools
- Python
- Scikit-learn
- Streamlit
## How to Run
1. Install dependencies
   pip install -r requirements.txt
2. Run the application
   streamlit run app.py

28
app.py Normal file

@@ -0,0 +1,28 @@
import streamlit as st
import joblib
import re

# Load the trained model and the fitted TF-IDF vectorizer
model = joblib.load("model_nb.pkl")
vectorizer = joblib.load("tfidf_vectorizer.pkl")

st.title("📰 News Topic Classification (NLP)")
st.write("Enter Indonesian-language news text")
text = st.text_area("News Text", height=200)


def preprocess_text(text):
    """Lowercase, strip URLs and non-letters, and collapse whitespace."""
    text = text.lower()
    text = re.sub(r"http\S+", "", text)
    text = re.sub(r"[^a-zA-Z\s]", " ", text)
    text = re.sub(r"\s+", " ", text).strip()
    return text


if st.button("Classify"):
    if text.strip() == "":
        st.warning("Text must not be empty!")
    else:
        # Apply the same cleaning, then vectorize and predict
        clean_text = preprocess_text(text)
        text_tfidf = vectorizer.transform([clean_text])
        prediction = model.predict(text_tfidf)[0]
        st.success(f"Predicted topic: **{prediction}**")
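
To make the effect of the cleaning step concrete, the function can be run on its own outside Streamlit. The sample sentence and the `example.com` URL below are made up for illustration; the function body is copied from `app.py`.

```python
import re


def preprocess_text(text):
    text = text.lower()                       # normalize case
    text = re.sub(r"http\S+", "", text)       # drop URLs
    text = re.sub(r"[^a-zA-Z\s]", " ", text)  # replace non-letters with spaces
    text = re.sub(r"\s+", " ", text).strip()  # collapse repeated whitespace
    return text


# Hypothetical input mixing capitals, digits, punctuation, and a URL.
sample = "Timnas Indonesia MENANG 3-1! Baca: https://example.com/berita"
cleaned = preprocess_text(sample)
print(cleaned)  # timnas indonesia menang baca
```

Note that digits are discarded along with punctuation, so the score "3-1" does not reach the vectorizer. Whether that matters depends on how the training data was cleaned, which this commit does not show.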

BIN
model_nb.pkl Normal file

Binary file not shown.

6
requirements.txt Normal file

@@ -0,0 +1,6 @@
streamlit
scikit-learn
pandas
numpy
joblib
Sastrawi

BIN
tfidf_vectorizer.pkl Normal file

Binary file not shown.