We're looking for a talented Data Engineer to help us in our data-driven mission to reshape the world of music and the spoken word. You will work on a highly collaborative team of engineers, alongside data scientists and analysts, to distill existing data processes, import new external data sources, and create complex data mashups. Your work will provide valuable insights and power important music data products. Expect to build high-throughput data pipelines and improve the existing big data infrastructure. You will also improve performance, squash bugs, and increase visibility across the data ecosystem. You will have end-to-end ownership of your code, though ideally you also relish reviewing a good pull request. If you enjoy working with large sets of data, and the challenges that come with them, this is the role for you!
Working within an Agile development methodology and owning data-driven solutions end-to-end
Experimenting with various frameworks in the Big Data ecosystem to identify the optimal approach for extracting insights from our datasets
Identifying performance bottlenecks in data pipelines and architecting faster, more efficient solutions when necessary
Creating new data warehouse solutions, and defining and demonstrating best practices in schema and table design across varied databases such as Hive, Redshift, and Spectrum
Developing end-to-end batch and real-time pipelines that move large data sets into our Hadoop/Spark clusters and bring summarized results back into a data warehouse for downstream business analysis
Increasing efficiency and automating processes by collaborating with our SRE team to update existing data infrastructure (data model, hardware, cloud services, etc.)
Designing, building, launching and maintaining efficient and reliable data pipelines in production
Designing, developing, and owning new systems and tools to enable our consumers to understand and analyze the data more quickly
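To give a flavor of the batch summarization work described above, here is a minimal, self-contained sketch in Python. All names are hypothetical, and a real pipeline would of course run at scale on the Hadoop/Spark clusters mentioned; this only illustrates the shape of the problem: raw events in, warehouse-ready summary rows out.

```python
from collections import defaultdict

def summarize_plays(events):
    """Aggregate raw play events into per-track play counts.

    `events` is an iterable of (track_id, user_id) tuples -- a
    stand-in for rows read from a real upstream data source.
    """
    counts = defaultdict(int)
    for track_id, _user_id in events:
        counts[track_id] += 1
    # Summary rows sorted by play count, ready to load into a warehouse table.
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)

events = [("trk1", "u1"), ("trk2", "u1"), ("trk1", "u2")]
print(summarize_plays(events))  # [('trk1', 2), ('trk2', 1)]
```

In production this aggregation step would typically be expressed as a Spark job, with the summary written back to the warehouse for analysts to query.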
Experience ingesting, processing, storing, and querying large datasets
The ability to write well-abstracted, reusable code components in Python, Scala or similar language(s)
The ability to investigate data issues across a large and complex system by working across multiple departments and systems
The drive to own the products and pipelines you develop
Experience working in a Hadoop/Spark ecosystem, especially with AWS technologies (S3, Redshift, EC2, RDS, EMR, DynamoDB), is a plus
Experience with configuration management tools (Ansible, Chef, Puppet, etc.) is a plus
Experience with Spark, Kafka, or similar technologies is a plus
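As an illustration of the "well-abstracted, reusable code components" called for above, here is a small hypothetical sketch in Python: generic pipeline steps composed into a single cleaning function. The step names and record shape are invented for the example.

```python
from functools import reduce

def compose(*steps):
    """Chain single-argument transformation steps into one callable."""
    return lambda data: reduce(lambda acc, step: step(acc), steps, data)

# Hypothetical steps: each takes and returns a list of record dicts.
def drop_nulls(rows):
    return [r for r in rows if r.get("track_id") is not None]

def dedupe(rows):
    # Keep one row per (track_id, user_id) pair.
    return list({(r["track_id"], r["user_id"]): r for r in rows}.values())

clean = compose(drop_nulls, dedupe)
rows = [
    {"track_id": "t1", "user_id": "u1"},
    {"track_id": None, "user_id": "u2"},
    {"track_id": "t1", "user_id": "u1"},
]
print(clean(rows))  # [{'track_id': 't1', 'user_id': 'u1'}]
```

Small composable steps like these are easy to unit-test in isolation and to reuse across batch and real-time pipelines.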