Data Engineer (ASDE2)
Key Responsibilities
- Data Collection
- Data Storage
- Data Transformation
- Data Quality Assurance
Summary
- Developed Python and Selenium scripts to scrape Amazon retail and advertising data, automating reporting.
- Built data pipelines on GCP using PySpark, BigQuery, and Cloud Composer for efficient data transformation.
- Optimized PySpark jobs and cluster resource allocation to reduce job runtime, decreasing Dataproc costs by 17%.
- Improved BigQuery performance with data partitioning and clustering, lowering query costs by 30%.
- Created a rule-based data validation framework with PySpark, saving 15 hours of manual validation effort per week.
- Implemented a Looker dashboard to monitor data quality, enabling timely defect resolution and helping meet SLAs.
- Led knowledge-sharing sessions and created workflow documentation on Confluence, simplifying onboarding for new team members.