Scala CLI app to query COVID data, visualized with Tableau & Zeppelin. Features a secure user/admin system via Spark SQL and bcrypt password hashing.

NewyorkMengHer/COVID-19-Data-Visualization

COVID-19 Data Analysis & Visualization

Introduction

The "COVID-19 Data Analysis & Visualization" project is a Spark application for querying and analyzing large COVID-19 datasets. Through an interactive Command Line Interface (CLI), users can run queries that surface trends, patterns, and correlations in the global impact of the virus.

Objective

The primary goal of this project is to offer a tool that supports a deeper understanding of COVID-19 data. By identifying trends and patterns, we aim to give a clearer picture of the pandemic's progression and its impacts. To that end, our team wrote 10 analytical queries, each examining a different aspect of the data.

Features

  • Agile Scrum: The team followed the Agile Scrum methodology. A Scrum Master served as team lead, ran the daily scrum meetings, and reported blockers and completed tasks at the end of each day.
  • Interactive CLI: A user-friendly interface to query and analyze COVID-19 data.
  • User/Admin System: A user and admin system integrated within the Scala console, which restricts access to the data and supports CRUD operations. Passwords are stored as bcrypt hashes rather than in plain text.
  • Visualization: Leveraging tools like Zeppelin (or Tableau), the project visualizes the analyzed data, making it easier to interpret and understand.
  • Analytical Queries: Our team developed 10 analytical queries covering different aspects of the data. They are described in the Analytical Queries section.
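
The password handling mentioned under User/Admin System can be sketched roughly as follows. bcrypt is a one-way hashing function, so a stored value is verified against a login attempt rather than decrypted. This is only an illustration: the README does not say which bcrypt implementation the project uses, so the jBCrypt library, the `PasswordSketch` object, and the work factor of 12 are all assumptions.

```scala
// Assumption: the jBCrypt library (org.mindrot:jbcrypt) is on the classpath.
// The project's actual bcrypt dependency is not named in this README.
import org.mindrot.jbcrypt.BCrypt

object PasswordSketch {
  // Hash a plaintext password with a fresh random salt.
  // The work factor (12) is an illustrative choice, not taken from the project.
  def hash(plaintext: String): String =
    BCrypt.hashpw(plaintext, BCrypt.gensalt(12))

  // Check a login attempt against a previously stored hash.
  def verify(candidate: String, storedHash: String): Boolean =
    BCrypt.checkpw(candidate, storedHash)
}
```

Only the output of `hash` would be written to the user table; at login, `verify` compares the entered password with that stored value.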

Analytical Queries

Our team developed 10 specific analytical queries to dive deep into the data. These queries aim to uncover meaningful insights into various aspects of the COVID-19 pandemic. You can explore each query in detail using the links below:

Each query provides a unique perspective on the data, offering insights that can aid in understanding the pandemic's progression and impact.
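
To give a sense of what one such query might look like in Spark SQL, here is a minimal sketch. It is not one of the project's ten queries: the CSV path, the column names `country` and `confirmed`, and the `QuerySketch` object are hypothetical, and the session runs in local mode rather than on YARN so the example can be tried on a single machine.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object QuerySketch {
  // Hypothetical query: the ten countries with the highest total confirmed cases.
  // Column names are assumptions; the real dataset schema is not given here.
  def topCountries(spark: SparkSession, csvPath: String): DataFrame = {
    val covid = spark.read
      .option("header", "true")       // first line holds column names
      .option("inferSchema", "true")  // let Spark detect numeric columns
      .csv(csvPath)

    // Register the data so it can be queried with plain SQL.
    covid.createOrReplaceTempView("covid")

    spark.sql(
      """SELECT country, SUM(confirmed) AS total_confirmed
        |FROM covid
        |GROUP BY country
        |ORDER BY total_confirmed DESC
        |LIMIT 10""".stripMargin)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("covid-query-sketch")
      .master("local[*]")             // local mode for the sketch only
      .getOrCreate()
    try topCountries(spark, args(0)).show()
    finally spark.stop()
  }
}
```

The resulting DataFrame can be displayed in the console with `show()` or handed to a notebook such as Zeppelin for charting.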

Challenges

  • Data Cleaning: One of the most significant challenges was cleaning the dataset, which comprised over 200,000 rows. Every query result depends on the accuracy and relevance of the cleaned data, so this step was critical to the project's success.
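
A cleaning pass of this kind could be sketched with the Spark DataFrame API as shown below. The steps are typical ones, not a record of what the team actually did: the column names (`country`, `date`, `confirmed`), the rule that missing counts become zero, and the `CleaningSketch` object are assumptions made for illustration.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

object CleaningSketch {
  // Hypothetical column names; the actual dataset schema is not given in this README.
  def clean(raw: DataFrame): DataFrame =
    raw
      .na.drop(Seq("country", "date"))    // drop rows missing a key field
      .na.fill(0L, Seq("confirmed"))      // treat a missing count as zero (a policy choice)
      .filter(col("confirmed") >= 0)      // discard impossible negative counts
      .dropDuplicates("country", "date")  // keep one row per country and day
}
```

Keeping the rules in one function makes it easy to apply the same cleaning to every query's input and to adjust a rule in a single place.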

Technologies Used

  • Apache Spark
  • Spark SQL
  • YARN
  • HDFS
  • Scala 2.12.10
  • Git + GitHub
  • Zeppelin (or Tableau)

Conclusion

This project serves as a testament to the power of data analysis and visualization in understanding complex scenarios like a global pandemic. Whether you're a researcher, data analyst, or someone keen on understanding the nuances of COVID-19, this tool provides a comprehensive platform for exploration and discovery.

Contributors

This project was made possible thanks to the dedicated efforts of the following contributors:

We appreciate the hard work and collaboration of each team member in bringing this project to life.
