COVID-19 Data Analysis & Visualization

Introduction

The "COVID-19 Data Analysis & Visualization" project is a comprehensive Spark application designed to provide deep insights into the vast datasets related to the COVID-19 pandemic. With an interactive Command Line Interface (CLI), users can seamlessly query and analyze the data, uncovering trends, patterns, and correlations that shed light on the global impact of the virus.

Objective

The primary goal of this project is to offer a tool that facilitates a deeper understanding of COVID-19 data. By identifying trends and patterns, we aim to provide a clearer picture of the pandemic's progression and its multifaceted impacts. Our team has crafted 10 analytical queries to delve into various aspects of the data, aiming to uncover meaningful insights.

Features

Agile Scrum: The team followed the Agile Scrum methodology. A Scrum Master served as team lead, ran the daily scrum meetings, and reported blockers and completed tasks at the end of each day.
Interactive CLI: A user-friendly interface to query and analyze COVID-19 data.
User/Admin System: A user and admin system integrated within the Scala console that restricts access to the data and supports CRUD operations. Passwords are stored as bcrypt hashes rather than in plain text.
Visualization: Leveraging tools like Zeppelin (or Tableau), the project visualizes the analyzed data, making it easier to interpret and understand.
Analytical Queries: Our team developed 10 specific analytical queries to dive deep into the data. They are described in the Analytical Queries section.
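
As an illustration of the Interactive CLI and User/Admin System features listed above, the following is a minimal sketch of how a console command might be checked against a user's role before it runs. It is not the project's actual code: the role names, the command names (`create-user`, `delete-user`, `query N`), and the `handle` function are all hypothetical, and password handling is left out entirely.

```scala
object CliSketch {
  // Hypothetical roles; the project's real user model may differ.
  sealed trait Role
  case object Admin extends Role
  case object Basic extends Role

  final case class User(name: String, role: Role)

  // Assumed CRUD-style commands that only an admin may run.
  private val adminOnly = Set("create-user", "delete-user")

  /** Maps one line of console input to a response, enforcing the role check. */
  def handle(user: User, line: String): String = {
    val command = line.trim.toLowerCase
    command match {
      case "" =>
        "Please enter a command."
      case c if adminOnly(c) && user.role != Admin =>
        s"Permission denied: '$c' requires an admin account."
      case c if adminOnly(c) =>
        s"Running admin command '$c'."
      case c if c.startsWith("query ") =>
        s"Running analytical query ${c.stripPrefix("query ")}."
      case other =>
        s"Unknown command: '$other'."
    }
  }

  /** Reads commands until "exit" or end of input. */
  def main(args: Array[String]): Unit = {
    val user = User("demo", Basic) // placeholder account for the sketch
    Iterator
      .continually(scala.io.StdIn.readLine("> "))
      .takeWhile(line => line != null && line.trim.toLowerCase != "exit")
      .foreach(line => println(handle(user, line)))
  }
}
```

Keeping `handle` as a pure function from input to response makes the permission logic easy to test without driving the console loop.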

Analytical Queries

Our team developed 10 specific analytical queries to dive deep into the data. These queries aim to uncover meaningful insights into various aspects of the COVID-19 pandemic.

Each query provides a unique perspective on the data, offering insights that can aid in understanding the pandemic's progression and impact.
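
To give a sense of what one such query could look like in Spark SQL, here is a small sketch that totals deaths per country. It is an illustration only and not one of the team's 10 queries: the table name `covid_cases`, the column names, and the toy rows are assumptions, and the session runs in local mode rather than on YARN. It assumes the `spark-sql` library is on the classpath.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object QuerySketch {
  /** Total deaths per country, highest first. Column names are assumptions. */
  def deathsByCountry(spark: SparkSession, cases: DataFrame): DataFrame = {
    // Register the DataFrame so it can be referenced by name in SQL.
    cases.createOrReplaceTempView("covid_cases")
    spark.sql(
      """SELECT country, SUM(deaths) AS total_deaths
        |FROM covid_cases
        |GROUP BY country
        |ORDER BY total_deaths DESC""".stripMargin)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("covid-query-sketch")
      .master("local[*]") // local mode for the sketch only
      .getOrCreate()
    import spark.implicits._

    // Toy rows with placeholder values, not real figures.
    val cases = Seq(
      ("CountryA", "2020-03-01", 10L),
      ("CountryA", "2020-03-02", 5L),
      ("CountryB", "2020-03-01", 7L)
    ).toDF("country", "date", "deaths")

    deathsByCountry(spark, cases).show()
    spark.stop()
  }
}
```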

Challenges

Data Cleaning: One of the significant challenges faced was cleaning the extensive dataset, which comprised over 200,000 rows. Ensuring accuracy and relevance was paramount to the project's success.
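
The kind of cleaning described above can be sketched in plain Scala as follows. This is a simplified illustration, not the project's cleaning code: the three-column record layout (country, date, confirmed count) and the rules applied (drop malformed rows, drop negative counts, drop exact duplicates) are assumptions, and the naive comma split does not handle quoted fields.

```scala
import scala.util.Try

object CleaningSketch {
  // Hypothetical record layout: country, ISO date, confirmed case count.
  final case class Record(country: String, date: String, confirmed: Long)

  /** Parses one comma-separated line; returns None for malformed rows. */
  def parse(line: String): Option[Record] =
    line.split(",", -1).map(_.trim) match {
      case Array(country, date, confirmed) if country.nonEmpty && date.nonEmpty =>
        // Reject non-numeric and negative counts.
        Try(confirmed.toLong).toOption.filter(_ >= 0).map(Record(country, date, _))
      case _ => None
    }

  /** Drops malformed rows and exact duplicates, keeping first occurrences. */
  def clean(lines: Seq[String]): Seq[Record] =
    lines.flatMap(parse).distinct
}
```

For example, `CleaningSketch.clean(Seq("CountryA,2020-03-01,10", "CountryA,2020-03-01,10", "bad row"))` keeps a single record for CountryA and discards the rest.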

Technologies Used

Apache Spark
Spark SQL
YARN
HDFS
Scala 2.12.10
Git + GitHub
Zeppelin (or Tableau)

Conclusion

This project serves as a testament to the power of data analysis and visualization in understanding complex scenarios like a global pandemic. Whether you're a researcher, data analyst, or someone keen on understanding the nuances of COVID-19, this tool provides a comprehensive platform for exploration and discovery.

Contributors

This project was made possible thanks to the dedicated efforts of the following contributors:

Jaceguai De Magalhaes - Scrum Master / Data Visualization with Zeppelin
Newyork Her - Analytical Queries
Brandon Cho - Data Cleaning / Encryption
Jack Nguyen - User/Admin System
Aaron Schomer - Data Visualization with Tableau

We appreciate the hard work and collaboration of each team member in bringing this project to life.

NewyorkMengHer/COVID-19-Data-Visualization
