Java Engineer for Big Data Search

last updated June 20, 2021 1:05 UTC

HQ: Remote

Full-Time
Full-Stack Programming

Spinn3r is a social media and analytics company looking for a talented Java big data engineer. We’re primarily interested in someone with experience delivering high-quality, accurate data derived from web content.

Spinn3r provides high quality weblog and social media data for analytics, search, and social media monitoring companies. We’ve been in business for over 7 years now and just recently completed a large business pivot and a relaunch.

We’re in the process of shipping new products so it’s an exciting time to jump on board!

Remote Work

This is a remote position. The main team operates from UTC-08:00 in San Francisco, CA, on the West Coast of the United States. You must be willing to work with at least 4 hours of overlap.

This works best for anyone in Central Europe or South America, as those regions are closer to our main time zone.

Additionally you must be highly motivated and able to work independently.

Ideal Candidate

We’re interested in someone comfortable with a generalist and devops role. You should be knowledgeable about standard system administration tasks and have a firm understanding of the role of load balancers and cluster architecture. It’s 100x harder to write code if you don’t know how the underlying operating system works.

We’re looking for someone with a legitimate passion for technology, big data, and analyzing vast amounts of content.

We are also looking for people outside of the U.S. and Canada to maximize our time zone distribution. Ideally there should be at least a 4-hour overlap with the Pacific Standard Time zone (PST / UTC-8). We’re based out of San Francisco but are expanding internationally. If you don’t have a natural time overlap with UTC-8, you should be willing to work evenings to be able to communicate easily with the rest of the team.

Culturally, we’re a remote company and want to embrace it as a way to reward our employees. We are fine with you working in remote locations as long as you’re generally available for communication and are productive.

We want someone to come in full time in a contractor role. I suspect we will need about 40 hours from you per week.

Job Responsibilities:

Understanding our crawler infrastructure and ensuring top quality metadata for our customers. There’s a significant batch job component to analyze the output from the crawl to ensure top quality data.

Making sure our infrastructure is fast, reliable, fault tolerant, etc. At times this may involve diving into the source of tools like ActiveMQ and Cassandra to understand how the internals work. We contribute a LOT to Open Source development whenever our changes need to be given back to the community.

Building out new products and technology that will directly interface with customers. This includes cool features like full-text search, analytics, etc. It’s extremely rewarding to build something from the ground up and push it to customers directly.

Architecture:

Our infrastructure consists of Java on Linux (Debian/Ubuntu) with the stack running on ActiveMQ, Cassandra, Zookeeper, and Jetty. We use Ansible to manage our boxes. We have a full-text search engine based on Elasticsearch and store our firehose API data within Cassandra.

We have a totally new stack and infrastructure at this point. We recently did a full-stack rewrite and moved all the old code to our new infrastructure. This means we have very little legacy cruft to deal with.

Here’s all the cool stuff you get to play with:

Large Linux / Ubuntu cluster, with the OS versioned using both Ansible and our own Debian packages for software distribution.

Massive amount of data indexed from the web and social media. We index 5-20 TB of data per month and want to expand to 100 TB of data per month.

Large Cassandra install on SSD.

Solr / Elasticsearch migration / install. We’re experimenting with bringing this up now so it would be valuable to get your feedback.

Technical Skills:

Here’s where you shine! We’re looking for someone who meets a number of the following requirements:

Linux. Linux. Linux. Did I say Linux? We like Linux.

Experience in modern Java development and associated tools: Maven, IntelliJ IDEA, and Guice (dependency injection).

A passion for testing, continuous integration, and continuous delivery.

Cassandra. Stores content indexed by our crawler.

ActiveMQ. Powers our queue server for scheduling crawl work.

A general understanding and passion for distributed systems.

Ansible or equivalent experience with configuration management.

Standard web API use and design (HTTP, JSON, XML, HTML, etc.).

Cultural Fit:

We’re a lean startup and very driven by our interaction with customers, as well as their happiness and satisfaction. Our philosophy is that you shouldn’t be afraid to throw away a week’s worth of work if our customers aren’t interested in moving in that direction.

We hold the position that our customers are 1000x smarter than we are, and we try to listen to them intently and consistently.

Proficiency in English is a requirement. Since you will have colleagues in various countries with different primary languages, we all need to use English as our common company language. You must also be able to work with email, draft proposals, etc. Internally we work like a large distributed Open Source project and use tools like email, Slack, Google Hangouts, and Skype.

Familiarity working with a remote team and ability (and desire) to work for a virtual company. Should have a home workstation, fast Internet access, etc.

Must be able to manage your own time and your own projects. Self-motivated employees will fit in well with the rest of the team.

It goes without saying but being friendly and a team player is very important.

Compensation:

Salary based on experience. We’re willing to be competitive and aim to be a great company to work for.

Ability to work remotely from home. Work-life balance is a must.