CS3TM: Text Mining and Natural Language Processing

Website: CS3TM

This repository contains the coursework for the CS3TM module. The coursework consists of two parts: a report and a Jupyter notebook. The report presents the analysis of the results and the conclusions of the project. The notebook contains the code for data gathering and processing, building the logistic regression classifier, and the final data analysis.

The data comes from fetch_20newsgroups in sklearn.datasets. The dataset covers 20 newsgroups, each representing a different topic. The goal of this project is to build a logistic regression classifier that predicts which newsgroup a post belongs to from its text. As per the coursework requirements, I use only two newsgroups for the classification task: comp.sys.ibm.pc.hardware and alt.atheism.
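The workflow described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the TF-IDF features, the `remove` setting, `max_iter=1000`, and the helper names `load_split`, `build_classifier`, and `main` are all assumptions made for the example.

```python
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# The two newsgroups named in this README.
CATEGORIES = ["comp.sys.ibm.pc.hardware", "alt.atheism"]


def load_split(subset):
    """Return (texts, labels) for the two newsgroups.

    `subset` is "train" or "test". The data is downloaded on first use.
    Stripping headers, footers, and quotes is an assumption of this sketch.
    """
    data = fetch_20newsgroups(
        subset=subset,
        categories=CATEGORIES,
        remove=("headers", "footers", "quotes"),
    )
    return data.data, data.target


def build_classifier():
    """TF-IDF features followed by logistic regression (assumed setup)."""
    return make_pipeline(
        TfidfVectorizer(stop_words="english"),
        LogisticRegression(max_iter=1000),
    )


def main():
    """Train on the training split and report accuracy on the test split."""
    X_train, y_train = load_split("train")
    X_test, y_test = load_split("test")
    clf = build_classifier()
    clf.fit(X_train, y_train)
    print(f"Test accuracy: {clf.score(X_test, y_test):.3f}")
```

Calling `main()` downloads the dataset on the first run and prints the test accuracy. Wrapping the vectorizer and the classifier in one pipeline keeps the vocabulary fitted on the training split only, which avoids leaking test data into the features.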

Resources