Scrapinghub is looking for a Senior Backend Engineer to develop and grow a new web crawling and extraction SaaS.
The new SaaS will include our recently released AutoExtract, which provides an API for automated e-commerce and article extraction from web pages using Machine Learning. AutoExtract is a distributed application written in Java, Scala and Python; components communicate via Apache Kafka and HTTP, and are orchestrated using Kubernetes.
You will design and implement distributed systems: a large-scale web crawling platform, Deep Learning-based web data extraction components, queue algorithms, large datasets, a development platform for other company departments, and more – this is going to be a challenging journey for any backend engineer!
As a Senior Backend Engineer, you will have a large impact on the system we’re building: the new SaaS is still in the early stages of development.
Job Responsibilities:
Work on the core platform: develop and troubleshoot a Kafka-based distributed application; write and modify components implemented in Java, Scala and Python.
Work on new features, including design and implementation. You should be able to own the complete lifecycle of your features and code.
Solve distributed systems problems, such as scalability, transparency, failure handling, security, and multi-tenancy.
Requirements:
3+ years of experience building large-scale data processing systems or high-load services.
Strong background in algorithms and data structures.
Strong track record in at least two of these technologies: Java, Scala, Python, C++. 3+ years of experience with at least one of them.
Experience working with Linux and Docker.
Good communication skills in English.
Computer Science or other engineering degree.
Bonus points for:
Kubernetes experience
Apache Kafka experience
Experience building event-driven architectures
Understanding of web browser internals
Good knowledge of at least one RDBMS
Knowledge of today’s cloud provider offerings: GCP, Amazon AWS, etc.
Web data extraction experience: web crawling, web scraping
Experience with web data processing tasks: finding similar items, mining data streams, link analysis, etc.
History of open source contributions