Web Scraping Engineer
Signalor bridges the gap between culture and intelligence — aggregating public sentiment from Reddit, TikTok, YouTube, and beyond into actionable insights for investors, brands, and agencies. We're a small, fast-moving team and we're building the data infrastructure that powers it all. If you care deeply about the mechanics of how data moves across the internet and want to see your code run against millions of real posts, this is the role.
What you'll do
- Design and maintain scrapers for Reddit (PRAW), TikTok (Apify), YouTube (Data API), and other platforms, handling rate limits, retries, deduplication, and quota management at scale
- Build and extend headless browser automations using Playwright for platforms without public APIs
- Normalize and ingest raw scraped data into our PostgreSQL schema and BullMQ job pipeline — ensuring clean, deduplicated, schema-consistent records across 10+ platforms
- Build monitoring and alerting so scraper downtime is caught before it becomes customer churn
- Model per-customer API cost at scale — Reddit, TikTok, and YouTube all have billing implications and you'll own the cost model for our data layer
- Evaluate and integrate new data sources as the platform expands — Instagram, news sites, forums, Glassdoor, and more
What you'll learn
- How web scraping works at production scale — real rate limits, anti-bot measures, session management, and proxy strategies
- How to build and own a scraping toolchain from scratch, rather than only using existing libraries
- Data infrastructure design: PostgreSQL schema design, Redis caching, job queues (BullMQ), and vector storage (Pinecone)
- How real AI pipelines consume and process unstructured text at scale — you'll work directly adjacent to our sentiment and NLP layer
Required skills
- Python (proficient)
- Web scraping (hands-on experience)
- SQL / PostgreSQL
- Databases and data modeling
Nice-to-haves
- Playwright or Selenium
- FastAPI or a similar async Python framework
- Redis / job queues
- Experience with Reddit, YouTube, or TikTok APIs