Incremental Web Crawler
Jan 1, 2025
·
1 min read

Project Overview
During my time at JLL, I developed a robust data pipeline to monitor commercial real estate asset data in near real time.
Key Achievements
- High-Volume Data Collection: Developed Python-based web crawlers to collect more than 20,000 data points from various real estate sources.
- Real-time Monitoring: Implemented Redis for incremental weekly updates and real-time monitoring, ensuring data freshness.
- Data Quality Control: Built an automated cleaning program that identified limitations in token-based classification and proposed a geo-coordinate cross-verification improvement.
- Impact: Corrected over 700 inaccuracies in a company database of 13,000+ entries.
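The incremental-update idea above can be sketched as fingerprint-based deduplication: each crawled record is hashed, and Redis's `SADD` (which returns 1 only for members not already in the set) decides whether the record is new or changed. This is a minimal illustration, not the production pipeline; the `InMemoryRedis` class below is a hypothetical stand-in for a real Redis connection so the snippet runs without a server.

```python
import hashlib

class InMemoryRedis:
    """Hypothetical stand-in mimicking Redis SADD semantics, for illustration only.
    In production this would be a redis.Redis client."""
    def __init__(self):
        self._sets = {}

    def sadd(self, key, member):
        # Redis SADD returns the number of members actually added (0 if already present).
        s = self._sets.setdefault(key, set())
        if member in s:
            return 0
        s.add(member)
        return 1

def fingerprint(record):
    """Hash the fields relevant to change detection into a stable digest."""
    blob = "|".join(f"{k}={record[k]}" for k in sorted(record))
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()

def new_or_changed(records, store, key="asset:fingerprints"):
    """Keep only records whose fingerprint has not been seen in a prior run."""
    return [r for r in records if store.sadd(key, fingerprint(r)) == 1]
```

On a weekly run, unchanged listings hash to fingerprints already in the set and are skipped, so only new or modified records flow downstream for cleaning and storage.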
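The geo-coordinate cross-verification mentioned above can be illustrated with a haversine distance check: if an entry's recorded coordinates fall too far from the coordinates geocoded from its address, the entry is flagged for review. The function names, record shape, and 1 km threshold here are assumptions for the sketch, not the actual quality-control rules.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))  # mean Earth radius ~6371 km

def flag_mismatches(entries, max_km=1.0):
    """entries: (asset_id, recorded_coords, geocoded_coords) tuples.
    Returns ids whose recorded and geocoded locations disagree by more than max_km."""
    return [
        asset_id
        for asset_id, recorded, geocoded in entries
        if haversine_km(*recorded, *geocoded) > max_km
    ]
```

Unlike token-based classification of address strings, this check catches entries whose text fields look plausible but whose coordinates point somewhere else entirely.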
Technologies Used
- Python (Scrapy, Selenium, Pandas)
- Redis
- SQL