The Models



[1] Document Similarity Model

We rank career recommendations by how similar each job is to a job seeker’s current job experience. The model is built on resume information from across the US. This product is a proof of concept and covers a small subset of possible jobs (114 unique job titles). The list of jobs was determined by the availability of salary and resume data: we selected every job with at least 100 salary records in the past 5 years and at least 500 resume job summaries in the past 10 years.



Model Parameters

Number of Classes:
    114

Train-Test Split:
    222,978 / 11,736 (95% / 5%)

TF-IDF Vectorizer:
    Min_df: 5
    Ngram_range: [1,3]

Undersampling:
    More common jobs were undersampled to 2500 records based on job recency.

Oversampling:
    Less common jobs were oversampled to 2500 records using the regular Synthetic Minority Over-sampling Technique (SMOTE).

Multinomial Naive Bayes Classifier:
    Alpha: 0.02
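To make the ranking step concrete, here is a minimal pure-Python sketch of a multinomial naive Bayes classifier with additive smoothing (the Alpha parameter above) that returns job titles ordered from most to least probable. It is an illustration under assumptions, not the project's code: the source does not name a library, the model described here is trained on TF-IDF vectors while this sketch uses raw token counts for brevity, and the function names and toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels, alpha=0.02):
    """Fit multinomial naive Bayes on tokenized docs with additive smoothing."""
    vocab = sorted({tok for doc in docs for tok in doc})
    class_docs = Counter(labels)
    token_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        token_counts[label].update(doc)
    model = {}
    for label, n_docs in class_docs.items():
        # Smoothed denominator: class token total plus alpha per vocabulary entry.
        total = sum(token_counts[label].values()) + alpha * len(vocab)
        model[label] = {
            "log_prior": math.log(n_docs / len(docs)),
            "log_prob": {
                tok: math.log((token_counts[label][tok] + alpha) / total)
                for tok in vocab
            },
        }
    return model

def rank_classes(model, doc):
    """Return class labels sorted from most to least probable for `doc`."""
    scores = {
        label: params["log_prior"]
        # Tokens outside the training vocabulary are ignored for every class.
        + sum(params["log_prob"].get(tok, 0.0) for tok in doc)
        for label, params in model.items()
    }
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical toy training data (tokens stand in for n-gram features).
docs = [
    ["android", "kotlin", "apps"],
    ["android", "java", "apps"],
    ["ledger", "invoices", "audit"],
    ["invoices", "payroll", "ledger"],
]
labels = ["android engineer", "android engineer", "accountant", "accountant"]

model = train_nb(docs, labels, alpha=0.02)
ranking = rank_classes(model, ["android", "apps", "gradle"])
print(ranking)
```

A small alpha such as 0.02 keeps unseen n-grams from zeroing out a class while still letting observed n-grams dominate the score.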


Model Performance


Performance

Metric          | Result
Accuracy        | 48.86%
F1 Score        | 46.14%
Precision Score | 47.60%
Recall Score    | 47.55%
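For readers who want to reproduce metrics of this kind, the sketch below computes accuracy and per-class-averaged precision, recall, and F1 from lists of actual and predicted labels. It assumes macro averaging, which the source does not state. The reported F1 (46.14%) is lower than the harmonic mean of the reported precision and recall (about 47.57%), which is consistent with F1 being averaged per class rather than derived from the averaged precision and recall. The function name and toy labels are hypothetical.

```python
def macro_scores(actual, predicted):
    """Return (accuracy, macro precision, macro recall, macro F1)."""
    labels = sorted(set(actual) | set(predicted))
    pairs = list(zip(actual, predicted))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(a == label and p == label for a, p in pairs)
        fp = sum(a != label and p == label for a, p in pairs)
        fn = sum(a == label and p != label for a, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    accuracy = sum(a == p for a, p in pairs) / len(pairs)
    n = len(labels)
    return accuracy, sum(precisions) / n, sum(recalls) / n, sum(f1s) / n

# Hypothetical toy labels.
actual = ["manager", "manager", "cashier", "cashier"]
predicted = ["manager", "cashier", "cashier", "cashier"]
accuracy, precision, recall, f1 = macro_scores(actual, predicted)
print(accuracy, precision, recall, f1)
```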


Lowest/Highest Precision

Class                      | Precision | Support
consultant                 | 9.09%     | 131
business consultant        | 11.76%    | 48
director                   | 16.67%    | 119
manager                    | 21.21%    | 127
it specialist              | 22.22%    | 64
build and release engineer | 82.22%    | 131
database administrator     | 83.47%    | 123
caregiver                  | 86.67%    | 66
android engineer           | 88.89%    | 61
ios engineer               | 89.58%    | 52


Lowest/Highest Recall

Class                      | Recall | Support
consultant                 | 1.53%  | 131
business consultant        | 4.17%  | 48
manager                    | 5.51%  | 127
engineer                   | 7.63%  | 118
director                   | 10.92% | 119
build and release engineer | 84.73% | 131
salesforce engineer        | 85.09% | 114
android engineer           | 91.80% | 61
net engineer               | 91.82% | 110
technical recruiter        | 94.95% | 99


Lowest/Highest F1-Score

Class                      | F1-Score | Support
consultant                 | 2.61%    | 131
business consultant        | 6.15%    | 48
manager                    | 8.75%    | 127
engineer                   | 12.77%   | 118
director                   | 13.20%   | 119
database administrator     | 82.79%   | 123
build and release engineer | 83.46%   | 131
technical recruiter        | 85.45%   | 99
ios engineer               | 86.00%   | 52
android engineer           | 90.32%   | 61


Most Confused

Actual Class           | Predicted Class            | Cases
accountant             | staff accountant           | 38
marketing director     | marketing manager          | 35
java software engineer | j2ee engineer              | 32
associate              | cashier                    | 31
network administrator  | network engineer           | 30
manager                | general manager            | 28
assistant              | administrative assistant   | 26
design engineer        | mechanical design engineer | 26
process engineer       | manufacturing engineer     | 25
project manager        | it project manager         | 25
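A list of most-confused pairs like the one above can be produced by counting every (actual, predicted) pair in which the two labels differ. The sketch below shows one way to do that; the function name and the toy labels are hypothetical.

```python
from collections import Counter

def most_confused(actual, predicted, top=10):
    """Return the `top` most frequent (actual, predicted) misclassifications."""
    pairs = Counter((a, p) for a, p in zip(actual, predicted) if a != p)
    return pairs.most_common(top)

# Hypothetical toy labels.
actual = ["accountant", "accountant", "accountant", "manager", "cashier"]
predicted = ["staff accountant", "staff accountant", "accountant",
             "general manager", "cashier"]
confused = most_confused(actual, predicted, top=10)
print(confused)
```

Many of the pairs reported here differ only by a qualifier (for example, manager and general manager), so a count like this helps separate near-synonym confusions from genuine errors.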


PCA Analysis

We ran a principal component analysis (PCA) to check how well the TF-IDF vectors of job summaries cluster. The vectors were reduced to a 3-dimensional space and loaded into the Embedding Projector website. The results below show that different job summaries for the same job share much of the same skills and language.


  1. Below are the 500 closest job summaries to a randomly selected Android Engineer job summary. Many of them have the job title Android Engineer, indicating that Android Engineers view and describe their roles similarly.



  2. Below are the 500 closest job summaries to a randomly selected Consultant job summary. Many different job titles are represented in this group, which suggests that the role of a Consultant is much more variable than that of an Android Engineer.




  3. Just for fun, we selected a random Data Scientist job summary. The Data Scientist role appears to be at the intersection of data analysis, engineering, product management, and research. ;)
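The "closest job summaries" in the three examples above amount to a nearest-neighbor lookup in the reduced vector space. The sketch below shows such a lookup, assuming cosine similarity; the source does not say which distance measure was used, and the function names and toy vectors are hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def nearest_neighbors(query_index, vectors, k=500):
    """Indices of the k vectors most similar to vectors[query_index]."""
    query = vectors[query_index]
    scored = [
        (cosine_similarity(query, vec), i)
        for i, vec in enumerate(vectors)
        if i != query_index
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scored[:k]]

# Hypothetical 3-dimensional vectors, one per job summary.
vectors = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.5, 0.5, 0.0],
]
neighbors = nearest_neighbors(0, vectors, k=2)
print(neighbors)
```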

   

To explore the data yourself, load our vector file and metadata file into the Embedding Projector at projector.tensorflow.org
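For anyone preparing similar files from their own data, the Embedding Projector loads tab-separated files: one vector per line, plus a metadata file with one label per line. The sketch below writes both. It assumes a single-column metadata file, which is written without a header row; the function name, file names, and toy values are hypothetical and are not the project's actual files.

```python
import csv
import os
import tempfile

def write_projector_files(vectors, titles, vector_path, metadata_path):
    """Write vectors and single-column metadata as tab-separated files."""
    with open(vector_path, "w", newline="") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        writer.writerows(vectors)
    with open(metadata_path, "w", newline="") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        for title in titles:
            # Single-column metadata: one label per line, no header row.
            writer.writerow([title])

# Hypothetical reduced vectors and their job titles.
vectors = [[0.1, 0.2, 0.3], [-0.4, 0.5, 0.0]]
titles = ["android engineer", "consultant"]

with tempfile.TemporaryDirectory() as tmp:
    vec_path = os.path.join(tmp, "vectors.tsv")
    meta_path = os.path.join(tmp, "metadata.tsv")
    write_projector_files(vectors, titles, vec_path, meta_path)
    with open(vec_path) as f:
        vector_lines = f.read().splitlines()
    with open(meta_path) as f:
        metadata_lines = f.read().splitlines()

print(vector_lines)
print(metadata_lines)
```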


Example Output




[2] Job Skills Model

To produce useful skills information, we created a model like the Document Similarity Model with a few differences.



Model Parameter Differences

Train-Test Split:
    325,787 / 17,147

TF-IDF Vectorizer:
    Ngram_range: [3,3]

Undersampling:
    Common job undersampling to 5000 records

Oversampling:
    SMOTE oversampling to 5000 records
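The two balancing steps can be sketched as follows. Undersampling here keeps the most recent records, which is one reading of "based on job recency" and is an assumption. The oversampling function follows the basic SMOTE idea: pick a sample, pick one of its k nearest same-class neighbors, and create a synthetic point on the line segment between them. The function names, the record layout, and k=5 are hypothetical choices, not the project's implementation.

```python
import random

def undersample_by_recency(records, target):
    """Keep the `target` most recent records; each record is (year, payload)."""
    return sorted(records, key=lambda r: r[0], reverse=True)[:target]

def smote(samples, target, k=5, seed=0):
    """Grow `samples` to `target` vectors by interpolating between neighbors."""
    if len(samples) < 2:
        raise ValueError("SMOTE needs at least two samples in the class")
    rng = random.Random(seed)
    synthetic = []
    while len(samples) + len(synthetic) < target:
        base = rng.choice(samples)
        # k nearest same-class neighbors by squared Euclidean distance.
        neighbors = sorted(
            (s for s in samples if s is not base),
            key=lambda s: sum((a - b) ** 2 for a, b in zip(base, s)),
        )[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment base -> neighbor
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbor)])
    return samples + synthetic

# Hypothetical toy data.
records = [(2010, "a"), (2018, "b"), (2015, "c")]
kept = undersample_by_recency(records, target=2)

minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
grown = smote(minority, target=6, k=2)
print(kept, len(grown))
```

Because every synthetic vector lies between two real vectors of the same class, oversampling adds no n-grams that were absent from the original resumes; it only changes their relative weights.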


Model Performance

Metric          | Result
Accuracy        | 39.30%
F1 Score        | 35.57%
Precision Score | 36.70%
Recall Score    | 36.12%


Example Outputs




[3] Time-Based Model (Not Productionized)

The resumes we collected contained a large number of software engineering roles, so we built a classification model to identify the most important software engineering skills in each time period. Below are the highest weighted skills for each period. This model is not included in the app.



Model Parameters

Number of Classes:
    5

Train-Test Split:
    62,440 / 3,287 (95% / 5%)

TF-IDF Vectorizer:
    Min_df: 0
    Ngram_range: [2,4]

Oversampling:
    Classes were balanced using the regular Synthetic Minority Over-sampling Technique (SMOTE).

Multinomial Naive Bayes Classifier:
    Alpha: 0.1
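The Ngram_range and Min_df settings decide which phrases become features. The sketch below extracts word n-grams for a given range and keeps only those that appear in at least min_df documents. With the range [2,4] used here, every feature is a phrase of two to four words. The parameter names resemble those of scikit-learn's TfidfVectorizer, but the source does not name a library, so treat this as an illustration; the function names and toy documents are hypothetical.

```python
from collections import Counter

def word_ngrams(tokens, lo=2, hi=4):
    """Return every contiguous word n-gram with lo <= n <= hi."""
    grams = []
    for n in range(lo, hi + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams

def build_vocabulary(docs, ngram_range=(2, 4), min_df=0):
    """Keep n-grams that occur in at least min_df distinct documents."""
    lo, hi = ngram_range
    doc_freq = Counter()
    for doc in docs:
        # set() so each document counts an n-gram at most once.
        doc_freq.update(set(word_ngrams(doc.lower().split(), lo, hi)))
    return sorted(g for g, df in doc_freq.items() if df >= min_df)

# Hypothetical toy job summaries.
docs = [
    "built rest api using asp net",
    "rest api code review",
    "maintained cobol ii batch jobs",
]
example = word_ngrams("using asp net".split(), 2, 4)
all_phrases = build_vocabulary(docs, ngram_range=(2, 4), min_df=0)
shared_phrases = build_vocabulary(docs, ngram_range=(2, 4), min_df=2)
print(example)
print(shared_phrases)
```

Raising min_df, as the Document Similarity Model does with a value of 5, drops phrases that appear in too few job summaries to be useful.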


Model Performance

Class        | Precision | Recall | F1-Score | Support
Earlier-1998 | 67.66%    | 57.38% | 62.10%   | 474
1998-2003    | 46.68%    | 42.27% | 44.37%   | 466
2003-2008    | 45.39%    | 40.85% | 43.00%   | 639
2008-2013    | 43.86%    | 56.57% | 49.41%   | 776
2013-2018    | 70.46%    | 67.06% | 68.72%   | 932


Results

Technology is one of the fastest-changing industries. From this model, we can see different software engineering tools rise to prominence and fall away over time.

Earlier-1998           | 1998-2003              | 2003-2008              | 2008-2013              | 2013-2018
development team       | database design        | unit testing           | management system      | version control
test plan              | rational rose          | crystal report         | html cs javascript     | unit testing
window nt              | client server          | java j2ee              | test plan              | web application using
software design        | vb net                 | net asp net            | data access            | visual studio
customer support       | crystal report         | web base               | front end              | full stack
tcp ip                 | using visual basic     | software development   | sql query              | develop maintain
programmer analyst     | store procedure        | oracle 9i              | design implement       | store procedure
cobol ii               | software application   | html cs                | java j2ee              | technology use
ibm mainframe          | tcp ip                 | develop test           | visual studio          | new feature
client server          | pl sql                 | store procedure        | new feature            | code review
system use             | technology use         | role responsibility    | using asp net          | agile scrum
assembly language      | user interface         | window xp              | unit testing           | rest api
user interface         | data warehouse         | team member            | application use        | design implement
device driver          | development team       | design implementation  | user interface         | using asp net
develop implement      | html javascript        | business logic         | web base               | entity framework
using visual basic     | responsibility include | visual basic           | software engineer      | management system
vax vms                | operating system       | management system      | business logic         | development team
pl sql                 | technical support      | user interface         | code review            | software engineer
responsibility include | web service            | net sql server         | responsibility involve | sql server
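A per-period list of highest weighted skills like the one above can be read off a trained classifier by sorting each class's feature weights. The sketch below assumes the weights are per-class values such as the log-probabilities a naive Bayes model assigns to each n-gram; the source does not say exactly how the weights were obtained, and the function name and toy numbers are hypothetical.

```python
def top_features(class_weights, n=3):
    """Map each class to its n highest-weighted n-grams, best first."""
    return {
        label: [
            gram
            for gram, _ in sorted(
                weights.items(), key=lambda kv: kv[1], reverse=True
            )[:n]
        ]
        for label, weights in class_weights.items()
    }

# Hypothetical log-probabilities; higher (closer to zero) means more typical.
class_weights = {
    "Earlier-1998": {"cobol ii": -2.1, "ibm mainframe": -2.5, "rest api": -9.0},
    "2013-2018": {"cobol ii": -8.7, "ibm mainframe": -8.9, "rest api": -1.8},
}
top = top_features(class_weights, n=2)
print(top)
```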




[4] Next Steps

  1. Currently, these models treat similar ngrams such as "accounts payable balance" and "organize accounts payable" as completely different phrases. We want to explore other vectorization models, like Doc2Vec and OpenAI Transformer, that place similar ngrams closer to each other in the vector space.

  2. We will expand the number of classes used in the model.

  3. We are interested in incorporating a time component into the app. Can we model changes in job skills over time?
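The limitation in the first item can be seen in a few lines. With exact-match trigram features, as in the Job Skills Model's [3,3] range, the two example phrases share no feature at all even though they share two of their three words. This is a small illustration; the function name is hypothetical.

```python
def trigrams(text):
    """Return the set of word trigrams in `text`."""
    tokens = text.split()
    return {" ".join(tokens[i:i + 3]) for i in range(len(tokens) - 2)}

first = "accounts payable balance"
second = "organize accounts payable"

shared_trigrams = trigrams(first) & trigrams(second)
shared_words = set(first.split()) & set(second.split())
print(shared_trigrams, shared_words)
```

An embedding-based vectorizer would aim to map both phrases to nearby points instead of to unrelated features.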
