ELT for New York City (NYC) Collision Dataset.
Please visit my profile 😀
I originally conceived this project in 2022, while I was working at a data analytics company, but other commitments kept me from completing it.
I have been thinking about it a lot lately, so I decided to finish it.
It fascinates me and serves as a good illustration of how a straightforward data integration may be carried out.
I sincerely hope you find this as fascinating as I do, and any help would be appreciated.
First and foremost, I am aware that a simple Jupyter notebook would have sufficed for this project. However, the goal here is to build a more elaborate data integration process.
Although I have previous experience in a data analysis environment, I must admit that I do not have extensive knowledge of data integration, so I recommend consulting additional resources to get a comprehensive understanding of the subject :)
When designing this project, I considered two approaches: ETL (Extract, Transform, Load) and ELT (Extract, Load, Transform).
The key difference is where the transformation happens. ELT loads the raw data into the data warehouse first and transforms it there, whereas ETL transforms the data in a separate staging step before loading it into the warehouse.
Given my preference to avoid managing multiple systems for data storage, I decided to stick to the ELT approach.
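To make the ELT idea concrete, here is a minimal sketch that uses an in-memory SQLite database as a stand-in for the data warehouse. It is not this project's actual pipeline: the table names (`raw_collisions`, `injuries_by_borough`), the column names, and the sample rows are all hypothetical, loosely modeled on the NYC collision data for illustration only. The point is the order of the steps: the raw rows are loaded untouched, and the cleaning and aggregation run as SQL inside the same database.

```python
import sqlite3

# Hypothetical raw records (crash_date, borough, persons_injured), made up for
# illustration; real collision data would come from the source dataset.
RAW_ROWS = [
    ("2023-06-01", "BROOKLYN", "2"),
    ("2023-06-01", "QUEENS", "0"),
    ("2023-06-02", "BROOKLYN", "1"),
    ("2023-06-02", "", "3"),  # missing borough, as raw data often has
]


def extract():
    """Extract: in a real pipeline this would read from the source (API, CSV, ...)."""
    return RAW_ROWS


def load(conn, rows):
    """Load: store the rows as-is (everything as TEXT) in a raw table in the warehouse."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_collisions "
        "(crash_date TEXT, borough TEXT, persons_injured TEXT)"
    )
    conn.executemany("INSERT INTO raw_collisions VALUES (?, ?, ?)", rows)


def transform(conn):
    """Transform: clean and aggregate with SQL, inside the same database."""
    conn.execute("DROP TABLE IF EXISTS injuries_by_borough")
    conn.execute(
        """
        CREATE TABLE injuries_by_borough AS
        SELECT COALESCE(NULLIF(borough, ''), 'UNKNOWN') AS borough,
               SUM(CAST(persons_injured AS INTEGER)) AS total_injured
        FROM raw_collisions
        GROUP BY 1
        """
    )


def run_elt(conn):
    """Run Extract -> Load -> Transform and return the transformed table."""
    load(conn, extract())
    transform(conn)
    return conn.execute(
        "SELECT borough, total_injured FROM injuries_by_borough ORDER BY borough"
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(run_elt(conn))
    conn.close()
```

In an ETL variant, the cleaning done by `transform` would instead happen in Python (or a separate staging system) before `load`, so only the final table would reach the warehouse. Keeping both the raw and the transformed tables in one database is what avoids managing a second storage system.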
I will be using Docker to set up the development environment, since I am familiar with it and really like it.
Contributions are very welcome, and I will do my best to provide mentorship and support. If you are looking for an issue to tackle, take a look at the issues labeled "Good first issue".
Get more details in the Contributing Guide.
Please do not open a regular issue to report a security vulnerability.
See the Security Policy for details on the reporting procedure.
June 17th, 2023.