Data Engineer (ASDE2)


Key Responsibilities


  • Data Collection
  • Data Storage
  • Data Transformation
  • Data Quality Assurance

Summary


  • Developed Python and Selenium scripts to scrape Amazon Retail and Advertising data, automating recurring reports.
  • Built data pipelines on GCP using PySpark, BigQuery, and Cloud Composer for efficient data transformation.
  • Optimized PySpark jobs and cluster resource allocation, reducing job runtimes and cutting Dataproc costs by 17%.
  • Improved BigQuery query performance through table partitioning and clustering, lowering query costs by 30%.
  • Created a rule-based data validation framework in PySpark, saving 15 hours of manual validation effort per week.
  • Implemented a Looker dashboard to monitor data quality, enabling timely defect resolution and helping the team meet SLAs.
  • Led knowledge-sharing sessions and created workflow documentation on Confluence, simplifying onboarding for new team members.