Bitcoin Price Prediction

Python · TensorFlow · scikit-learn · XGBoost · NLTK


The Problem

Cryptocurrency markets are notoriously volatile and influenced by both quantitative financial indicators and qualitative sentiment from news and social media. Pure time-series models miss the sentiment signal, while pure NLP approaches ignore the financial fundamentals. An effective prediction system needs both.

The Approach

This project builds an NLP-driven Bitcoin price prediction system that combines sentiment analysis on news headlines with financial indicators to forecast price movements.

Sentiment Pipeline

News headlines are scored for sentiment with NLTK. The scores capture market mood (fear, optimism, uncertainty), which can shift before prices move.
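To make the idea of per-headline sentiment scoring concrete, here is a minimal sketch in plain Python. It is not the project's NLTK pipeline: the tiny `LEXICON`, the `score_headline` and `daily_sentiment` helpers, and the sample headlines are all assumptions made up for illustration. It only shows the general shape of the step, which is to score each headline and then average the scores per day.

```python
from collections import defaultdict

# Toy lexicon: an illustrative assumption, not the lexicon NLTK ships with.
LEXICON = {
    "surge": 1.0, "rally": 1.0, "gain": 0.5, "adoption": 0.5,
    "crash": -1.0, "ban": -1.0, "hack": -1.0, "fear": -0.5,
}

def score_headline(headline: str) -> float:
    """Mean lexicon score over matched words; 0.0 when nothing matches."""
    words = [w.strip(".,!?:;\"'").lower() for w in headline.split()]
    hits = [LEXICON[w] for w in words if w in LEXICON]
    return sum(hits) / len(hits) if hits else 0.0

def daily_sentiment(headlines):
    """Average headline scores per date; `headlines` is (date, text) pairs."""
    by_day = defaultdict(list)
    for date, text in headlines:
        by_day[date].append(score_headline(text))
    return {day: sum(scores) / len(scores) for day, scores in by_day.items()}

# Made-up sample headlines, for illustration only.
sample = [
    ("2021-01-01", "Bitcoin rally continues as adoption grows"),
    ("2021-01-01", "Exchange hack sparks fear"),
    ("2021-01-02", "Regulators weigh ban on crypto trading"),
]
print(daily_sentiment(sample))
```

Averaging per day gives one sentiment value per date, so the scores can later be lined up with daily price data.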

Feature Engineering

Financial indicators (moving averages, RSI, MACD, trading volume) are computed alongside the sentiment features, creating a rich multi-modal feature set for each prediction window.
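The indicators named above can be sketched in a few lines of plain Python. This is an illustrative version under stated assumptions, not the project's code: the window lengths (14 for RSI, 12/26/9 for MACD) are the conventional defaults, the RSI uses simple averages of gains and losses rather than Wilder's smoothing, and the function names are hypothetical. In a pandas-based pipeline the same quantities would more likely come from rolling and exponentially weighted windows.

```python
def sma(values, window):
    """Simple moving average; None until `window` values are available."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window : i + 1]) / window)
    return out

def ema(values, span):
    """Exponential moving average, alpha = 2 / (span + 1), seeded with values[0]."""
    alpha = 2.0 / (span + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out

def rsi(values, window=14):
    """RSI from simple averages of gains and losses over the last `window` changes."""
    out = [None] * len(values)
    deltas = [b - a for a, b in zip(values, values[1:])]
    for i in range(window, len(values)):
        recent = deltas[i - window : i]
        gain = sum(d for d in recent if d > 0) / window
        loss = sum(-d for d in recent if d < 0) / window
        if gain == 0 and loss == 0:
            out[i] = 50.0   # flat prices: treat as neutral
        elif loss == 0:
            out[i] = 100.0  # only gains in the window
        else:
            out[i] = 100.0 - 100.0 / (1.0 + gain / loss)
    return out

def macd(values, fast=12, slow=26, signal=9):
    """MACD line (fast EMA minus slow EMA) and its signal line (EMA of the MACD line)."""
    line = [f - s for f, s in zip(ema(values, fast), ema(values, slow))]
    return line, ema(line, signal)

# Made-up closing prices, for illustration only.
closes = [100.0, 102.0, 101.0, 105.0, 107.0, 106.0]
print(sma(closes, 3))
print(rsi(closes, 3))
```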

Model Benchmarking

The system benchmarks 18 different ML models across classical and deep learning approaches:

  • Classical ML — Linear Regression, Random Forest, XGBoost, LightGBM, SVM, KNN, and more
  • Deep Learning — CNN and LSTM architectures designed for sequential financial data
  • Ensemble methods — combining multiple model predictions
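A head-to-head benchmark of this kind can be sketched as a loop that fits every model on the same split and ranks them by one error metric. The sketch below makes several assumptions: the two toy models (`MeanModel`, `OneFeatureLinear`) stand in for the real scikit-learn, XGBoost, and Keras estimators, the split is chronological with no shuffling to avoid look-ahead leakage, and mean absolute error is the ranking metric. None of these choices is confirmed by the project description.

```python
class MeanModel:
    """Baseline stand-in: always predicts the training mean."""
    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]

class OneFeatureLinear:
    """Stand-in regressor: least-squares line on the first feature only."""
    def fit(self, X, y):
        xs = [row[0] for row in X]
        mx, my = sum(xs) / len(xs), sum(y) / len(y)
        var = sum((x - mx) ** 2 for x in xs)
        cov = sum((x - mx) * (t - my) for x, t in zip(xs, y))
        self.slope_ = cov / var if var else 0.0
        self.intercept_ = my - self.slope_ * mx
        return self

    def predict(self, X):
        return [self.intercept_ + self.slope_ * row[0] for row in X]

def mean_abs_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def benchmark(models, X, y, train_fraction=0.8):
    """Fit each model on the same chronological split; rank by test error."""
    cut = int(len(X) * train_fraction)
    X_train, X_test = X[:cut], X[cut:]
    y_train, y_test = y[:cut], y[cut:]
    scores = {
        name: mean_abs_error(y_test, model.fit(X_train, y_train).predict(X_test))
        for name, model in models.items()
    }
    return sorted(scores.items(), key=lambda item: item[1])

# Synthetic data (y = 2x + 1), for illustration only.
X = [[float(i)] for i in range(10)]
y = [2.0 * i + 1.0 for i in range(10)]
ranking = benchmark({"mean": MeanModel(), "linear": OneFeatureLinear()}, X, y)
print(ranking)
```

Because every model exposes the same `fit`/`predict` interface, adding another candidate is a matter of adding one entry to the `models` dictionary.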

Key Features

  • 18 ML models benchmarked head-to-head on the same dataset
  • NLP sentiment analysis from real news headlines using NLTK
  • Multi-modal features — financial indicators + sentiment scores
  • CNN/LSTM architectures for capturing temporal patterns
  • Comprehensive evaluation with multiple regression metrics
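The project description does not list which regression metrics are used. As an assumption, the sketch below computes three common ones (MAE, RMSE, and R²) in a single hypothetical `regression_report` helper, so each model can be compared on more than one measure of error.

```python
import math

def regression_report(y_true, y_pred):
    """Return MAE, RMSE, and R^2 for paired true and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)            # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": math.sqrt(ss_res / n),
        # R^2 is undefined when the true values are constant.
        "r2": 1.0 - ss_res / ss_tot if ss_tot else float("nan"),
    }

# Made-up prices and predictions, for illustration only.
report = regression_report([100.0, 110.0, 120.0], [102.0, 108.0, 123.0])
print(report)
```

MAE and RMSE are in the same units as the price, with RMSE penalizing large misses more heavily, while R² is unitless and compares the model against always predicting the mean.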

Architecture

graph TD
    A[News Headlines] --> B[Sentiment Analysis]
    C[Price History] --> D[Financial Indicators]
    B --> E[Feature Matrix]
    D --> E
    E --> F[Classical ML Models]
    E --> G[CNN/LSTM]
    F --> H[Ensemble Prediction]
    G --> H
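The two joining steps in the graph, building the feature matrix and combining model outputs, might look like the sketch below. It rests on assumptions the description does not confirm: inputs are keyed by date, the target is the next day's close, days without headlines get a neutral sentiment of 0.0, and the ensemble is an unweighted average. The function names and sample values are hypothetical.

```python
def build_feature_matrix(closes, indicators, sentiment):
    """Join per-date inputs into rows; the target is the next date's close."""
    dates = sorted(closes)
    X, y = [], []
    for today, tomorrow in zip(dates, dates[1:]):
        # Days without headlines fall back to a neutral sentiment of 0.0.
        X.append([closes[today], indicators[today], sentiment.get(today, 0.0)])
        y.append(closes[tomorrow])
    return X, y

def ensemble_average(predictions):
    """Element-wise mean of several models' prediction lists."""
    return [sum(column) / len(column) for column in zip(*predictions)]

# Made-up values, for illustration only.
closes = {"2021-01-01": 100.0, "2021-01-02": 104.0, "2021-01-03": 101.0}
rsi_values = {"2021-01-01": 55.0, "2021-01-02": 60.0, "2021-01-03": 48.0}
sentiment = {"2021-01-01": 0.5}

X, y = build_feature_matrix(closes, rsi_values, sentiment)
print(X, y)
print(ensemble_average([[1.0, 3.0], [3.0, 5.0]]))
```

Pairing each row only with the following day's close keeps future information out of the features.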

Tech Stack

| Component | Technology |
|---|---|
| NLP | NLTK |
| Deep Learning | TensorFlow, Keras |
| Classical ML | scikit-learn, XGBoost |
| Data Processing | Pandas, NumPy |
| Visualization | Matplotlib, Seaborn |