Data Warehouse:
proposal & implementation
- Architect
- Data Engineer
In today's data-driven world, organizations rely heavily on accurate and accessible data to make informed decisions and gain a competitive edge. As the only Data Scientist at Sonova AC US, I led a data engineering project that aimed to transform how data was managed and utilized within the Alpaca Audiology brand of Sonova AC US. This endeavor addressed critical challenges surrounding data reliability, integrity, accessibility, and Alpaca's reliance on a third-party analytics provider, ultimately resulting in a robust data warehouse solution.
Before embarking on this project, Sonova AC US, particularly the Alpaca Audiology brand, faced many issues that hindered its ability to leverage data effectively. Data was scattered across multiple sources and systems, making the integration and consolidation process complex and error-prone. Moreover, the organization heavily relied on external partners for data analytics and business intelligence, leading to limited control and frequent delays in accessing crucial insights.
Recognizing the pressing need for a robust data infrastructure, I built a comprehensive data warehouse to revolutionize how Alpaca Audiology managed and utilized its data. The overarching objective was to establish a centralized repository to serve as the authoritative source of truth, ensuring data reliability, integrity, and accessibility throughout the organization.
Play-by-play
January / 2023
Planning future Analytics for US Marketing
After the success with the Data Warehouse and the integrations with global dashboards, I began to work with the Marketing Team in the US to develop new processes that would allow us to create on-demand dashboards that are always up-to-date by being integrated directly with the Data Warehouse.
Integrations with Global Dashboards
After deploying the resources, we began integrating the new Data Warehouse with the global Sonova infrastructure to allow for on-demand dashboards.
Cloud Access
After many months of waiting, I finally got the required permissions to deploy the Data Warehouse onto Sonova's Azure account.
November / 2022
Building
With access to the data, I started implementing schemas and the code to process the data using dbt, Prefect, and Airbyte.
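To give a flavor of the kind of processing involved, the sketch below mimics a typical staging-layer transform in plain Python: deduplicating raw records on a key (keeping the most recently updated row, as a dbt staging model would do with `row_number()`) and normalizing a text field. The table and column names (`appointment_id`, `clinic_code`, etc.) are hypothetical examples, not Alpaca's actual schema.

```python
# Illustrative staging transform: the sort of cleaning a dbt staging
# model expresses in SQL, written here as plain Python.
# All record/column names are hypothetical, not the real schema.

def stage_appointments(rows):
    """Deduplicate raw appointment records and normalize fields.

    Keeps the most recently updated record per appointment_id,
    then trims and upper-cases the clinic code.
    """
    latest = {}
    for row in rows:
        key = row["appointment_id"]
        # Keep only the newest version of each record.
        if key not in latest or row["updated_at"] > latest[key]["updated_at"]:
            latest[key] = row
    # Normalize: strip whitespace and upper-case clinic codes.
    return [
        {**r, "clinic_code": r["clinic_code"].strip().upper()}
        for r in latest.values()
    ]
```

In the actual warehouse this logic lived in version-controlled dbt models, which also gave us the self-documenting lineage mentioned below.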
October / 2022
Partial Access
After requesting access, I received access to the data but without the needed infrastructure to fully deploy the planned data warehouse.
September / 2022
Requesting access to the data and infrastructure
To kickstart the project, a crucial first step was requesting access to the necessary data sources and infrastructure. Given that our data in the US was stored and managed by our external partners, I had to request access to the data and obtain the appropriate Azure permissions to implement the planned changes.
August / 2022
Research
Guided by the goal of building a scalable and efficient Data Infrastructure, with the restriction that it had to be hosted through Microsoft Azure, I began researching how to best create a Data Warehouse given our limitations.
After careful evaluation, I determined that implementing an Extract-Load-Transform (ELT) process through the combination of dbt, Prefect, Airbyte, Azure Data Factory, and Azure Synapse would provide the best solution. This approach offered several advantages, including developing reliable, self-documented code and establishing robust data replication mechanisms with the ability to iterate on Machine Learning models and Dashboards quickly through Synapse Analytics.
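The ordering that defines ELT, loading raw data into the warehouse first and transforming it in place afterwards, can be sketched as a minimal orchestration flow. The function bodies below are stand-ins: in the real pipeline each step was a Prefect task that triggered an Airbyte connection sync and a `dbt run` respectively, and the connection and model-selection names are hypothetical.

```python
# Minimal sketch of the ELT ordering: extract-and-load (Airbyte) first,
# transform (dbt) second. Bodies are stubs; the real tasks called
# Airbyte's sync API and the dbt CLI. Names are illustrative only.

def airbyte_sync(connection_id):
    """Stand-in for triggering an Airbyte sync and waiting for it."""
    # Real version: POST to Airbyte's connection-sync endpoint and poll.
    return {"connection_id": connection_id, "status": "succeeded"}

def dbt_run(select):
    """Stand-in for invoking dbt on a selection of models."""
    # Real version: subprocess.run(["dbt", "run", "--select", select], check=True)
    return {"select": select, "status": "success"}

def elt_pipeline():
    """Load raw data into the warehouse, then transform it in place."""
    load = airbyte_sync(connection_id="raw_crm")   # Extract + Load
    if load["status"] != "succeeded":
        raise RuntimeError("load failed; skipping transforms")
    return dbt_run(select="staging+")              # Transform
```

Because transformation happens inside the warehouse rather than in a separate ETL server, failed transforms can be re-run against the already-loaded raw data without re-extracting from source systems.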
Delays and Unreliable Data
After repeated delays, numerous bugs, deliveries of undocumented data, and recurring issues with the integrity and quality of the data we received from our third-party analytics providers, I decided to take the initiative and work on centralizing, cleaning, and operationalizing our data.