We are looking for a talented Data Engineer to help us in our data-driven mission to reshape the world of music and the spoken word. You will work in a highly collaborative team of engineers, alongside data scientists and analysts, to distill existing data processes, import new external data sources, and create complex data mashups. Your work will provide valuable insights and power important music data products. Expect to build high-throughput data pipelines and improve the existing big data infrastructure. You will also improve performance, squash bugs, and increase visibility across the data ecosystem. You will have end-to-end ownership of your code, though ideally you also relish reviewing a good pull request. If you enjoy working with large sets of data and the challenges that come with them, this is the role for you!
You Like:

Working in an Agile development methodology and owning data-driven solutions end-to-end

Experimenting with various frameworks in the Big Data ecosystem to identify the optimal approach for extracting insights from our datasets

Identifying performance bottlenecks in data pipelines and architecting faster, more efficient solutions when necessary

Creating new data warehouse solutions, as well as defining and demonstrating best practices in schema and table design across varied data stores such as Hive, Redshift, and Spectrum

Developing end-to-end batch and real-time pipelines that move large data sets into our Hadoop/Spark clusters and bring summarized results back into a data warehouse for downstream business analysis

Increasing efficiency and automating processes by collaborating with our SRE team to update existing data infrastructure (data model, hardware, cloud services, etc.)

Designing, building, launching and maintaining efficient and reliable data pipelines in production

Designing, developing, and owning new systems and tools to enable our consumers to understand and analyze the data more quickly

You Have:

Experience ingesting, processing, storing, and querying large datasets

The ability to write well-abstracted, reusable code components in Python, Scala, or a similar language

The ability to investigate data issues across a large and complex system, working with multiple departments and their systems

The drive to own the products and pipelines you develop

Experience working in a Hadoop/Spark ecosystem, especially with AWS technologies (S3, Redshift, EC2, RDS, EMR, DynamoDB), is a plus

Experience with configuration management tools (Ansible, Chef, Puppet, etc.) is a plus

Experience with Spark, Kafka, or similar technologies is a plus