Symptom-Disease Healthcare Chatbot

Project Description

The Symptom-Disease Healthcare Chatbot is a Python-based tool designed to predict potential diseases based on user-reported symptoms. Using a Naive Bayes classifier, the chatbot provides immediate diagnosis suggestions, helping users understand possible health conditions. The project involves data preprocessing, model training, and evaluation, all managed through a simple command-line interface.

Role and Contributions
  • Developed the idea for a healthcare chatbot to assist users in identifying possible diseases based on reported symptoms.
  • Collected and curated a dataset of symptoms and associated diseases.
  • Performed data preprocessing to clean and prepare the data for model training.
  • Implemented a Naive Bayes classifier to train the model on the symptom-disease dataset.
  • Created Python scripts for data loading, model training, and prediction functionalities.
  • Developed a command-line interface to interact with the chatbot and provide symptom-based predictions.
  • Wrote and ran unit tests with pytest to check the accuracy and reliability of the model predictions.
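
The contributions above (data loading, training, and prediction) can be illustrated with a minimal sketch. It is not the project's actual code: the symptom vocabulary, the toy training records, and the helper names `encode`, `train`, and `predict` are all hypothetical, and scikit-learn's `BernoulliNB` is assumed as the Naive Bayes variant because each symptom is treated as present or absent.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Hypothetical symptom vocabulary and toy records (not the project's dataset).
SYMPTOMS = ["fever", "cough", "headache", "rash", "nausea"]

TRAINING_DATA = [
    (["fever", "cough"], "flu"),
    (["fever", "cough", "headache"], "flu"),
    (["headache", "nausea"], "migraine"),
    (["headache"], "migraine"),
    (["rash", "fever"], "measles"),
    (["rash"], "measles"),
]


def encode(symptoms):
    """Turn a list of symptom names into a 0/1 vector over SYMPTOMS."""
    present = set(symptoms)
    return [1 if name in present else 0 for name in SYMPTOMS]


def train(data):
    """Fit a Bernoulli Naive Bayes model on (symptoms, disease) pairs."""
    X = np.array([encode(symptoms) for symptoms, _ in data])
    y = np.array([disease for _, disease in data])
    model = BernoulliNB()  # default Laplace smoothing (alpha=1.0)
    model.fit(X, y)
    return model


def predict(model, symptoms):
    """Return the most probable disease label for the given symptoms."""
    return str(model.predict(np.array([encode(symptoms)]))[0])


model = train(TRAINING_DATA)
print(predict(model, ["fever", "cough"]))
```

A command-line loop would wrap `predict` by reading a comma-separated line of symptoms from the user and printing the returned label.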

Outcomes and Results
  • Successfully developed a chatbot that provides disease predictions based on user-reported symptoms.
  • Integrated a Naive Bayes classifier to analyze symptoms and suggest possible diseases with reasonable accuracy.
  • Improved the accuracy of the final model relative to the initial version, demonstrating the effectiveness of the implemented classification algorithm.

Technologies Used
  • Python: Used for developing the chatbot, implementing machine learning models, and data processing.
  • scikit-learn: For building and evaluating the Naive Bayes classification model.
  • pandas: For data manipulation and preprocessing.
  • NumPy: For numerical operations and handling arrays.
  • pytest: Used for writing and running unit tests to validate the chatbot’s functionality.

Challenges Faced and Solutions
  • Challenge: The initial training dataset had few examples and did not cover a broad range of symptoms and diseases, leading to poor model performance and inaccurate predictions.
    Solution: Expanded the dataset with a more comprehensive list of symptoms and diseases, giving the model a richer set of examples. Improved data preprocessing to clean and standardize the data, making it more suitable for training.
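
As an illustration of the cleaning and standardizing step, here is a minimal pandas sketch. The column names `symptoms` and `disease`, the raw records, and the helpers `clean_symptoms` and `preprocess` are assumptions made for this example; the project's real schema and cleaning rules may differ.

```python
import pandas as pd

# Hypothetical raw records with inconsistent casing, spacing, and separators.
raw = pd.DataFrame({
    "symptoms": ["Fever, Cough ", "fever,cough", "HEADACHE;  nausea", None, "rash"],
    "disease": ["Flu", "flu", "Migraine", "Flu", " Measles "],
})


def clean_symptoms(text):
    """Lowercase, split on commas or semicolons, trim, dedupe, and sort."""
    parts = text.replace(";", ",").split(",")
    names = {p.strip().lower().replace(" ", "_") for p in parts if p.strip()}
    return ",".join(sorted(names))


def preprocess(df):
    """Drop incomplete rows, standardize both columns, remove duplicates."""
    df = df.dropna(subset=["symptoms", "disease"]).copy()
    df["symptoms"] = df["symptoms"].map(clean_symptoms)
    df["disease"] = df["disease"].str.strip().str.lower()
    return df.drop_duplicates().reset_index(drop=True)


clean = preprocess(raw)
print(clean)
```

Sorting the symptom names inside each record means that two rows listing the same symptoms in a different order collapse into one after `drop_duplicates`.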

  • Challenge: The Naive Bayes model initially showed low accuracy, as evidenced by its performance metrics and confusion matrix.
    Solution: Fine-tuned the model parameters and re-evaluated the training process to improve accuracy. Investigated other classification algorithms, such as Logistic Regression and Support Vector Machines (SVM), as possible ways to enhance performance.
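
One way such a comparison could be run is sketched below with scikit-learn's cross-validation utilities. The data is synthetic (three hypothetical diseases, eight binary symptoms, 10% random noise) and stands in for the real dataset, so the scores it prints say nothing about the project's actual results.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import StratifiedKFold, cross_val_predict, cross_val_score
from sklearn.naive_bayes import BernoulliNB
from sklearn.svm import SVC

# Synthetic stand-in data: each row of `profiles` marks the symptoms that
# typify one hypothetical disease; samples are noisy copies of a profile.
rng = np.random.default_rng(0)
profiles = np.array([
    [1, 1, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1, 0, 0],
    [0, 0, 1, 0, 0, 1, 1, 1],
])
y = np.repeat(np.arange(3), 40)             # 40 samples per disease
flip = rng.random((len(y), 8)) < 0.1        # flip about 10% of symptom flags
X = np.where(flip, 1 - profiles[y], profiles[y])

candidates = {
    "naive_bayes": BernoulliNB(),
    "logistic_regression": LogisticRegression(max_iter=1000),
    "svm": SVC(kernel="linear"),
}

# Same stratified folds for every model so the mean accuracies are comparable.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = {
    name: cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
    for name, model in candidates.items()
}
for name, score in scores.items():
    print(f"{name}: {score:.3f}")

# Confusion matrix for the Naive Bayes model from out-of-fold predictions.
nb_matrix = confusion_matrix(y, cross_val_predict(BernoulliNB(), X, y, cv=cv))
print(nb_matrix)
```

For Naive Bayes itself, the main tunable parameter in scikit-learn is the smoothing strength `alpha`, which could be searched over with the same cross-validation folds.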