[1] Data Collection
Using the Python libraries Selenium and Beautiful Soup to build web scraping tools, our team collected over 600k unique resumes and 1.1 million salary records from the web, representing over 5,500 cities across 30 years.

[2] Data Transformation
We built data ingestion pipelines to standardize the inputs from the various sources.

Job Titles
Job titles vary widely from resume to resume, so we created a heuristic to convert each job title into a standard format.
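
The rules of the job title heuristic are not spelled out here, so the following is only a minimal sketch of what such a normalization step might look like. The function name `normalize_job_title` and the `ABBREVIATIONS` table are hypothetical and chosen for illustration; they are not taken from the project code.

```python
import re

# Hypothetical abbreviation table; the project's actual rules are not given.
ABBREVIATIONS = {
    "sr": "senior",
    "jr": "junior",
    "mgr": "manager",
    "eng": "engineer",
    "engr": "engineer",
    "dev": "developer",
}


def normalize_job_title(raw: str) -> str:
    """Map a free-text job title to a lowercase, abbreviation-free form."""
    text = raw.lower()
    # Replace every run of characters other than ASCII letters, digits,
    # and spaces (punctuation, slashes, hyphens) with a single space.
    text = re.sub(r"[^a-z0-9 ]+", " ", text)
    # Split on whitespace and expand known abbreviations token by token.
    tokens = [ABBREVIATIONS.get(token, token) for token in text.split()]
    return " ".join(tokens)


print(normalize_job_title("Sr. Software Eng."))
print(normalize_job_title("  JR Data-Analyst "))
```

Under these assumptions, variants such as "Sr. Software Eng." and "senior software engineer" collapse to the same string, which is what lets titles from different resumes be grouped together. A real pipeline would likely need a much larger table and extra rules for seniority levels and word order.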