
πŸ’¬ NLP Projects

Python TensorFlow NLTK License

A collection of Natural Language Processing projects demonstrating expertise in Text Generation, Sequence Modeling, and Language Understanding using TensorFlow, NLTK, and modern NLP techniques.


πŸ“‹ Table of Contents


πŸš€ Projects Overview

  #   Project             Task                 Notebook                      Technique
  1   Text Generator      Language Modeling    01_text_generator.ipynb       RNN/LSTM Sequence Generation
  2   NLP Final Project   Comprehensive NLP    02_nlp_final_project.ipynb    Multiple NLP Tasks

πŸ› οΈ Technologies Used

Core NLP Libraries

  • TensorFlow/Keras - Deep learning for NLP
  • NLTK - Natural Language Toolkit
  • spaCy - Industrial-strength NLP
  • Transformers - State-of-the-art models (optional)

Text Processing

  • Tokenization - Word and sentence splitting
  • Lemmatization & Stemming - Word normalization
  • Stop Words Removal - Text cleaning
  • Word Embeddings - Word2Vec, GloVe

Deep Learning for NLP

  • RNNs - Recurrent Neural Networks
  • LSTMs - Long Short-Term Memory
  • GRUs - Gated Recurrent Units
  • Attention Mechanisms - Focus on relevant parts

πŸ“¦ Installation

Prerequisites

  • Python 3.8 or higher

Setup Instructions

  1. Clone the repository

    git clone https://github.com/uzi-gpu/nlp-projects.git
    cd nlp-projects
  2. Create a virtual environment

    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install dependencies

    pip install -r requirements.txt
  4. Download NLTK data (run in a Python shell, if needed)

    import nltk
    nltk.download('punkt')
    nltk.download('stopwords')
    nltk.download('wordnet')
  5. Launch Jupyter Notebook

    jupyter notebook

πŸ“Š Project Details

1. πŸ“ Text Generator

File: 01_text_generator.ipynb

Objective: Build a character-level or word-level text generator using Recurrent Neural Networks

Task: Language Modeling & Text Generation

Architecture:

  • Input: Sequences of characters/words
  • Model: LSTM/GRU layers
  • Output: Next character/word prediction

Implementation:

1. Data Preprocessing:

  • βœ… Text corpus loading
  • βœ… Tokenization (character or word-level)
  • βœ… Sequence creation
  • βœ… Vocabulary building
  • βœ… One-hot encoding or embeddings

2. Model Architecture:

Model: Sequential
β”œβ”€β”€ Embedding Layer (word-level) OR Input Layer (char-level)
β”œβ”€β”€ LSTM/GRU Layers (stacked)
β”œβ”€β”€ Dropout (regularization)
β”œβ”€β”€ Dense Layer
└── Softmax (probability distribution)
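
The diagram corresponds roughly to a Keras Sequential model along these lines (layer sizes are placeholders, not necessarily the values used in the notebook):

    from tensorflow.keras import Sequential
    from tensorflow.keras.layers import Embedding, LSTM, Dropout, Dense

    vocab_size = 5000  # placeholder; in practice len(tokenizer.word_index) + 1

    model = Sequential([
        Embedding(input_dim=vocab_size, output_dim=128),  # word-level input
        LSTM(256, return_sequences=True),                 # stacked recurrent layers
        LSTM(256),
        Dropout(0.2),                                     # regularization
        Dense(vocab_size, activation='softmax'),          # next-token distribution
    ])

A GRU variant simply swaps the LSTM layers for GRU layers with the same arguments.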

3. Training:

  • βœ… Teacher forcing
  • βœ… Cross-entropy loss
  • βœ… Adam optimizer
  • βœ… Perplexity tracking

4. Text Generation:

  • βœ… Seed text input
  • βœ… Sampling strategies (greedy, temperature, top-k)
  • βœ… Beam search (optional)
  • βœ… Diverse output generation

Key Features:

  • Character-level generation for creative text
  • Word-level generation for coherent sentences
  • Temperature-controlled creativity
  • Sequence padding and batching

Applications:

  • Creative writing assistance
  • Code generation
  • Poetry/story generation
  • Chatbot responses

2. πŸŽ“ NLP Final Project

File: 02_nlp_final_project.ipynb

Objective: Comprehensive NLP project covering multiple language processing tasks

Tasks Covered:

1. Text Preprocessing Pipeline:

  • βœ… Tokenization
  • βœ… Lowercasing
  • βœ… Stop words removal
  • βœ… Punctuation handling
  • βœ… Lemmatization/Stemming
  • βœ… Text normalization

2. Feature Extraction:

  • βœ… Bag of Words (BoW)
  • βœ… TF-IDF (Term Frequency-Inverse Document Frequency)
  • βœ… N-grams
  • βœ… Word embeddings (Word2Vec, GloVe)

3. NLP Tasks:

  • Text Classification
  • Sentiment Analysis
  • Named Entity Recognition (NER)
  • Part-of-Speech (POS) Tagging
  • Text Summarization
  • Language Translation (if applicable)

4. Advanced Techniques:

  • βœ… Sequence-to-Sequence models
  • βœ… Attention mechanisms
  • βœ… Transfer learning with pre-trained models
  • βœ… Fine-tuning BERT/GPT (optional)

Pipeline:

Raw Text β†’ Preprocessing β†’ Feature Extraction β†’ Model Training β†’ Evaluation β†’ Deployment

Evaluation Metrics:

  • Classification: Accuracy, Precision, Recall, F1-Score
  • Generation: BLEU, ROUGE, Perplexity
  • NER: Entity-level F1
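
For example, a sentence-level BLEU score can be computed with NLTK's built-in implementation (the reference and candidate tokens below are illustrative):

    from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

    reference = [["the", "cat", "is", "on", "the", "mat"]]  # list of reference token lists
    candidate = ["the", "cat", "sat", "on", "the", "mat"]   # generated tokens

    score = sentence_bleu(reference, candidate,
                          smoothing_function=SmoothingFunction().method1)
    print(f"BLEU: {score:.3f}")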

πŸ“š Key NLP Concepts Demonstrated

Text Preprocessing

  1. Tokenization - Breaking text into words/sentences
  2. Normalization - Lowercasing, stemming, lemmatization
  3. Stop Words - Removing common words
  4. Special Characters - Cleaning punctuation

Feature Engineering

  1. Bag of Words - Simple word frequency
  2. TF-IDF - Term importance weighting
  3. Word Embeddings - Dense vector representations
  4. Contextual Embeddings - BERT, ELMo

Sequence Modeling

  1. RNNs - Recurrent architectures
  2. LSTMs - Long-term dependencies
  3. GRUs - Gated mechanisms
  4. Bidirectional RNNs - Context from both directions

Advanced NLP

  1. Attention Mechanisms - Focus on relevant parts
  2. Transformer Architecture - Self-attention
  3. Transfer Learning - Pre-trained models
  4. Fine-tuning - Task-specific adaptation
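
As one illustration of transfer learning, a pre-trained sentiment model can be loaded through the optional Transformers library mentioned under Technologies Used; no task-specific training is required, and the input text and printed output here are only examples:

    from transformers import pipeline  # optional dependency

    classifier = pipeline("sentiment-analysis")  # downloads a pre-trained model on first use
    print(classifier("This repository made LSTMs finally click for me!"))
    # e.g. [{'label': 'POSITIVE', 'score': 0.99}]

Fine-tuning BERT/GPT follows the same idea: start from pre-trained weights and continue training on the task-specific dataset.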

πŸ† Results

Text Generator

  • Perplexity: Achieved low perplexity, indicating effective language modeling
  • Coherence: Generated text shows grammatical structure
  • Creativity: Temperature parameter controls diversity
  • Quality: Longer sequences maintain context

NLP Final Project

  • Classification Accuracy: High performance on text classification tasks
  • Feature Engineering: TF-IDF outperforms BoW
  • Model Comparison: Deep learning models excel on complex tasks
  • Pipeline: End-to-end NLP workflow successfully implemented

πŸŽ“ Learning Outcomes

Through these projects, I have demonstrated proficiency in:

  1. NLP Fundamentals

    • Text preprocessing and cleaning
    • Tokenization strategies
    • Feature extraction techniques
    • Vocabulary management
  2. Deep Learning for NLP

    • Recurrent architectures (RNN, LSTM, GRU)
    • Sequence-to-sequence models
    • Attention mechanisms
    • Loss functions for language tasks
  3. Practical NLP

    • Data pipeline creation
    • Model training and evaluation
    • Text generation strategies
    • Real-world application development
  4. Advanced Topics

    • Transfer learning in NLP
    • Word embeddings
    • Language modeling
    • Evaluation metrics (BLEU, perplexity)

πŸ“§ Contact

Uzair Mubasher - BSAI Graduate

LinkedIn Email GitHub


πŸ“„ License

This project is licensed under the MIT License - see the LICENSE file for details.


πŸ™ Acknowledgments

  • NLTK and spaCy communities
  • TensorFlow/Keras documentation
  • NLP course instructors and resources

⭐ If you found this repository helpful, please consider giving it a star!
