
Project: Applying Machine Learning to Improve Employment-Based Visa Decisions


Abstract


Employment-based visa certification is a high-stakes administrative decision process in which errors impose asymmetric costs on applicants, employers, and regulators. As application volumes increase, manual review systems face persistent challenges related to scalability, consistency, and decision latency. This project, EasyVisa, investigates the application of supervised machine learning as a decision-support mechanism for employment-based visa certification, grounded in principles of statistical decision theory and cost-sensitive learning.


Using historical visa application data, the task is formulated as a binary classification problem characterized by class imbalance and non-uniform misclassification costs. Multiple tree-based and ensemble models are evaluated within reproducible preprocessing pipelines designed to prevent data leakage and preserve training–serving parity. Model selection emphasizes decision-theoretic criteria by optimizing precision–recall metrics and enforcing recall constraints for positive (certified) outcomes, reflecting the higher regulatory and societal cost associated with false denials. Feature engineering includes wage normalization, derived employer attributes, and explicit feature schema definitions to ensure comparability and stability across cases.


Beyond predictive performance, the study addresses system-level considerations critical to deploying machine learning in regulated decision environments. A human-in-the-loop framework is proposed in which probabilistic model outputs support triage and prioritization rather than automate determinations. A reference production architecture and lifecycle model illustrate how monitoring, drift detection, and periodic retraining can sustain decision quality over time. The results identify key economic and organizational factors associated with certification outcomes and demonstrate how cost-aware machine learning can augment administrative decision processes while preserving transparency, accountability, and human judgment.


Summary


This paper presents a production-oriented machine learning system for employment-based visa decision support, emphasizing system architecture, model governance, and operational economics rather than algorithmic novelty. The EasyVisa framework applies supervised classification with rigorous pipeline discipline, imbalance-aware evaluation, and human-in-the-loop controls to improve the throughput, consistency, and auditability of visa case review processes.


The intended audience includes technical leaders, system architects, data scientists, and platform engineers responsible for designing, deploying, and governing machine learning systems in regulated environments. This work was completed as part of the University of Texas Post Graduate Program in Artificial Intelligence and Machine Learning.


Business Context and Problem


Market and Regulatory Environment


The U.S. Immigration and Nationality Act permits employers to hire foreign workers when qualified domestic labor is unavailable, while protecting U.S. workers through wage and labor safeguards. These programs are administered by the Office of Foreign Labor Certification (OFLC).


Application volumes have grown steadily year over year, reaching hundreds of thousands of cases annually. Each case must be evaluated against statutory requirements, labor market conditions, and employer attributes. This creates three structural challenges:

  • Scalability: Human review does not scale linearly with application volume.

  • Consistency: Similar cases can receive different outcomes due to reviewer variability.

  • Latency: Long processing times increase uncertainty for employers and applicants.


Business Objective


EasyVisa was engaged to determine whether historical data could be used to:

  • Improve the efficiency of visa case review.

  • Identify key factors that materially influence certification outcomes.

  • Support analysts and adjudicators with explainable, data-driven recommendations.


The goal is not automated approval or denial, but decision support that prioritizes effort and highlights risk.


Data Foundation


Data Scope


The dataset contains historical employment-based visa applications, including attributes of both employers and applicants. Key feature categories include:

  • Applicant characteristics (education, experience, training requirements)

  • Employer characteristics (company size, age)

  • Job characteristics (region, wage level, full-time status)

  • Outcome label (certified vs. denied)


Workforce Distribution by Continent


Claim supported: Asia dominates the dataset; geography is structurally imbalanced.

Interpretation: Asia accounts for ~66% of cases, with Europe and North America far smaller. This explains why continent-level signals exist but are secondary and must be treated carefully to avoid proxy bias.


Distribution of applicant workforce by continent, illustrating strong regional dominance and motivating cautious interpretation of geographic features.


Education Distribution


Claim supported: Dataset is heavily skewed toward higher education.

Interpretation: ~78% of applicants hold Bachelor’s or Master’s degrees, reinforcing why education emerges as a strong predictor and why ordinal treatment is appropriate.


Education level distribution of applicants, showing concentration in undergraduate and graduate degrees.


Region of Employment Distribution


Claim supported: Geographic variation exists but is diffuse.

Interpretation: Regional employment is relatively balanced across major U.S. regions, explaining moderate but non-dominant geographic effects.


Regional distribution of employment locations across U.S. regions.



Certification Probability by Education Level


Claim supported: Monotonic gradient in education.

Interpretation: Certification probability increases consistently with education level—from ~34% (High School) to ~87% (Doctorate). This is a textbook example of a stable, interpretable signal.



Certification probability by education level, demonstrating a strong monotonic relationship.


Feature Engineering


Several transformations were applied to make the data analytically useful and policy-aligned:

  • Wage normalization to annualized values across hourly, weekly, monthly, and yearly units.

  • Log transformations for skewed monetary variables.

  • Derived employer age from year of establishment.

  • Explicit separation of numeric, categorical, and binary features to prevent leakage.


These steps ensure comparability across cases and support robust model training.
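
As a concrete illustration, these transformations can be captured in one small function. The following is a minimal sketch assuming a pandas DataFrame with hypothetical column names (prevailing_wage, unit_of_wage, yr_of_estab); the actual dataset schema may differ.

import numpy as np
import pandas as pd

# Assumed annualization multipliers for wages quoted in different units.
ANNUALIZATION = {"Hour": 2080, "Week": 52, "Month": 12, "Year": 1}

def engineer_features(df: pd.DataFrame, current_year: int = 2024) -> pd.DataFrame:
    """Wage annualization, log transform, and derived employer age."""
    out = df.copy()
    # Normalize hourly/weekly/monthly/yearly wages to annualized values.
    out["annual_wage"] = out["prevailing_wage"] * out["unit_of_wage"].map(ANNUALIZATION)
    # Log-transform the skewed monetary variable (log1p handles zeros safely).
    out["log_annual_wage"] = np.log1p(out["annual_wage"])
    # Derive employer age from the year of establishment.
    out["company_age"] = current_year - out["yr_of_estab"]
    return out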


Analytical Approach


Modeling Philosophy


EasyVisa follows three core principles:

  1. Imbalance-aware learning: Visa denials are less frequent than certifications. Models were selected and evaluated using precision-recall metrics rather than accuracy alone.

  2. Pipeline integrity: All preprocessing, sampling, and modeling steps are encapsulated in reproducible pipelines to avoid data leakage.

  3. Explainability by design: Models are evaluated not only on performance but on interpretability and stability.


Class Imbalance in Employment-Based Visa Outcomes


Illustrative class imbalance in employment-based visa outcomes, motivating the use of imbalance-aware learning and precision–recall–based evaluation.


What it shows:

A heavily imbalanced outcome distribution (≈85% certified, ≈15% denied).


Why it matters:

Visually justifies why accuracy is misleading and why cost-sensitive, PR-based evaluation is required. A model that simply labeled every case certified would score roughly 85% accuracy while providing no signal at all on the denied minority class.

“Binary classification problem characterized by class imbalance and non-uniform misclassification costs.”

Model Candidates


Multiple supervised classification algorithms were evaluated, including:

  • Decision Trees

  • Random Forests

  • Bagging and Boosting ensembles

  • Gradient Boosting and XGBoost



Hyperparameter tuning was performed using cross-validation, with precision–recall AUC (PR-AUC) as the primary selection metric so that tuning reflects performance under class imbalance.
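
In scikit-learn terms, this tuning loop might look like the sketch below. The column lists and parameter grid are illustrative, scoring="average_precision" serves as the PR-AUC objective, and placing SMOTE inside an imbalanced-learn pipeline ensures resampling touches only the training folds.

from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline  # sampler-aware pipeline
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Illustrative feature lists; the real feature contract enumerates these explicitly.
numeric_cols = ["log_annual_wage", "company_age"]
categorical_cols = ["continent", "education_of_employee", "region_of_employment"]

preprocess = ColumnTransformer([
    ("num", StandardScaler(), numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

# SMOTE lives inside the pipeline, so resampling happens only when fitting,
# never on validation or test data.
pipe = Pipeline([
    ("prep", preprocess),
    ("smote", SMOTE(random_state=42)),
    ("model", RandomForestClassifier(random_state=42)),
])

search = GridSearchCV(
    pipe,
    param_grid={"model__n_estimators": [200, 400], "model__max_depth": [6, 10, None]},
    scoring="average_precision",  # PR-AUC proxy, robust under imbalance
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=42),
    n_jobs=-1,
)
# search.fit(X_train, y_train)  # leakage-free resampling within each fold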


Evaluation Framework


Models were compared using a consistent evaluation stack:

  • Precision, recall, F1 score

  • ROC-AUC and PR-AUC

  • Confusion matrices for operational interpretation


Relative suitability of common evaluation metrics under class imbalance, highlighting the importance of PR-AUC in cost-sensitive decision contexts.


What it shows:

Relative importance of Accuracy vs ROC-AUC vs PR-AUC under imbalance.


Why it matters:

Reinforces the decision-theoretic framing and explains metric choice without equations.

“Model selection emphasizes decision-theoretic criteria by optimizing precision–recall metrics and enforcing recall constraints…”

Selection constraints ensured that recall for certified cases did not fall below defined thresholds, aligning with policy sensitivity.
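
Stated as code, this is a deterministic filter-then-sort. In the minimal sketch below, the Bagging metrics come from the results reported later in this post; the second candidate's numbers are purely illustrative.

R_MIN = 0.60  # policy-defined recall floor for certified cases

def select_winner(candidates: list[dict]) -> dict:
    """Keep candidates meeting the recall constraint, then take the highest
    validation PR-AUC (the policy's primary metric)."""
    eligible = [c for c in candidates if c["recall_val"] >= R_MIN]
    if not eligible:
        raise ValueError("No candidate satisfies the recall constraint")
    return max(eligible, key=lambda c: c["pr_auc_val"])

candidates = [
    {"name": "Bagging (Decision Tree)", "pr_auc_val": 0.8759, "recall_val": 0.8719},
    {"name": "Random Forest (SMOTE)", "pr_auc_val": 0.8745, "recall_val": 0.8800},  # illustrative
]
print(select_winner(candidates)["name"])  # -> Bagging (Decision Tree)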



Key Insights


Across models, several drivers consistently influenced certification likelihood:

  • Prevailing wage level: Higher, market-aligned wages strongly correlate with certification.

  • Employer maturity and size: Established employers with larger workforces show higher approval rates.

  • Education and experience alignment: Applicants whose education and experience match role expectations perform better.

  • Full-time positions: Full-time roles exhibit materially higher certification likelihood.


These insights are as valuable as the predictions themselves, informing employer guidance and policy review.


Governance and Risk Considerations


Human-in-the-Loop Design


EasyVisa is designed as a decision-support system:

  • Final determinations remain with human reviewers.

  • Model outputs are probabilistic, not deterministic.

  • Explanations accompany predictions to support review.


Bias and Fairness


Explicit safeguards are embedded to reduce unintended bias:

  • No protected personal attributes are used.

  • Feature selection is reviewed for proxy risk.

  • Performance is monitored across segments to detect drift.


Auditability


All model runs are reproducible, versioned, and auditable—critical for regulatory environments.


Business Value


Operational Efficiency


  • Faster triage of high- and low-risk cases.

  • Reduced manual review burden.

  • Improved throughput without proportional staffing increases.


Consistency and Transparency


  • Standardized evaluation across cases.

  • Clear articulation of decision drivers.

  • Improved trust among stakeholders.


Strategic Insight


  • Data-driven guidance for employers on application quality.

  • Evidence-based policy analysis for regulators.

  • Foundation for continuous improvement as new data arrives.


Production Architecture and Operating Model


From Concept to Production


The EasyVisa notebook is intentionally structured to mirror production concerns rather than purely exploratory analysis. Several code-level patterns directly support operationalization:

  • Explicit feature contracts: Categorical, numeric, and binary columns are enumerated to prevent schema drift and silent leakage.

  • Pipeline encapsulation: Preprocessing, sampling (SMOTE / undersampling), and model training are composed into unified pipelines, ensuring training–serving parity.

  • Deterministic execution: Centralized random-state control and configuration dictionaries support reproducibility and auditability.

  • Metric-driven selection: Model choice is governed by imbalance-aware metrics (PR-AUC, recall thresholds) rather than headline accuracy.


These elements translate cleanly into a deployable service without refactoring core logic.
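
For example, a feature contract can be as simple as an explicit column manifest validated at both training and inference time. A minimal sketch, with hypothetical column names:

import pandas as pd

# Explicit feature contract: every column the model may see, grouped by type.
FEATURE_CONTRACT = {
    "numeric": ["log_annual_wage", "company_age"],
    "categorical": ["continent", "education_of_employee", "region_of_employment"],
    "binary": ["has_job_experience", "requires_job_training", "full_time_position"],
}

def validate_schema(df: pd.DataFrame) -> None:
    """Fail fast if incoming data deviates from the training-time contract."""
    expected = {col for cols in FEATURE_CONTRACT.values() for col in cols}
    missing = expected - set(df.columns)
    extra = set(df.columns) - expected
    if missing:
        raise ValueError(f"Schema violation, missing columns: {sorted(missing)}")
    if extra:
        # Unexpected columns are rejected rather than silently passed downstream.
        raise ValueError(f"Schema violation, unexpected columns: {sorted(extra)}")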


Reference Production Architecture


A representative production architecture consists of the following layers:


  1. Ingestion Layer

    • Secure intake of employer and applicant data from OFLC systems.

    • Schema validation aligned to the feature contract used in training.

  2. Feature Engineering Service

    • Wage normalization and derived features (company age, annualized wages).

    • Versioned transformations shared between training and inference.

  3. Model Service

    • Containerized inference endpoint hosting the selected classifier.

    • Probabilistic outputs (certification likelihood) returned with confidence scores.

  4. Explanation Layer

    • Local explanation generation (e.g., feature contribution summaries) to support reviewer understanding.

    • Logged alongside predictions for audit purposes.

  5. Human Review Interface

    • Queue-based triage prioritizing cases by predicted risk.

    • Reviewer override and feedback captured as labeled data.

  6. Monitoring and Governance

    • Drift detection on feature distributions and outcomes.

    • Periodic recertification of models against updated data.


This architecture preserves human authority while compressing decision latency.


Operating Model


Operationally, EasyVisa functions as a decision-support capability, not an automated adjudicator:

  • Data Science owns model evolution, evaluation, and bias monitoring.

  • Operations and Policy define thresholds, escalation rules, and acceptable error trade-offs.

  • Technology maintains shared pipelines, deployment, and observability.


Model retraining follows a controlled cadence (e.g., quarterly), with emergency retraining triggered by drift or policy change.


Quantifying Business Value and ROI


Baseline Assumptions


To illustrate value, consider conservative assumptions aligned to historical OFLC volumes:

  • Annual application volume: 800,000 cases

  • Average manual review time: 45 minutes per case

  • Fully loaded reviewer cost: $75 per hour


Baseline annual review cost:

  • 800,000 × 0.75 hours × $75 ≈ $45M annually



Cycle-Time Reduction


If EasyVisa enables triage such that:

  • 30% of cases are classified as low-risk and fast-tracked

  • Average review time for those cases drops by 50%


Then annual hours saved:

  • 800,000 × 30% × 0.375 hours ≈ 90,000 hours


Cost savings:

  • 90,000 × $75 ≈ $6.75M annually
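
The same arithmetic, expressed as a few lines of Python so the assumptions can be varied:

volume = 800_000        # annual applications
hours_per_case = 0.75   # 45 minutes of manual review
rate = 75               # fully loaded reviewer cost, $/hour

baseline_cost = volume * hours_per_case * rate       # = $45.0M per year
hours_saved = volume * 0.30 * hours_per_case * 0.50  # = 90,000 hours
savings = hours_saved * rate                         # = $6.75M per year
print(f"Baseline ${baseline_cost/1e6:.1f}M; savings ${savings/1e6:.2f}M per year")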


Capacity and Throughput Gains


Alternatively, agencies can reinvest saved capacity:


  • Shorter queues and faster employer response times

  • More thorough review of high-risk cases

  • Absorption of volume growth without linear staffing increases


Secondary Value


Additional benefits compound ROI:

  • Reduced rework from inconsistent decisions

  • Improved employer guidance, lowering denial rates over time

  • Better policy insight from quantified decision drivers


Even under conservative assumptions, EasyVisa delivers a strong economic case while improving service quality.


System Mechanics and Control Surfaces


Training Loop (Pseudo-code)

INPUT: Historical labeled cases D
CONFIG: Feature schema, sampling policy, metric constraints

1. Validate schema(D) against feature contract
2. Split D → train / validation / test (stratified, locked test)
3. Build preprocessing pipeline:
   - Imputation
   - Encoding (categorical/binary)
   - Scaling (numeric)
4. Apply imbalance handling inside pipeline (SMOTE / undersample)
5. For each candidate model M:
     a. Perform cross-validation
     b. Optimize hyperparameters on PR-AUC
     c. Enforce recall constraints
6. Select best model M*
7. Calibrate probabilities (optional)
8. Persist artifacts:
     - Model weights
     - Preprocessor
     - Metrics
     - Feature metadata
OUTPUT: Versioned, deployable model bundle
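
Steps 7 and 8 (persisting the versioned bundle) might look like the following sketch using joblib; the file layout and metadata fields are assumptions, not the project's actual artifact format.

import json
from pathlib import Path
import joblib

def persist_bundle(model, preprocessor, metrics: dict, feature_contract: dict,
                   version: str, out_dir: str = "artifacts") -> Path:
    """Write model weights, preprocessor, metrics, and feature metadata
    as one versioned, auditable bundle."""
    bundle_dir = Path(out_dir) / version
    bundle_dir.mkdir(parents=True, exist_ok=True)
    joblib.dump(model, bundle_dir / "model.joblib")
    joblib.dump(preprocessor, bundle_dir / "preprocessor.joblib")
    (bundle_dir / "metrics.json").write_text(json.dumps(metrics, indent=2))
    (bundle_dir / "features.json").write_text(json.dumps(feature_contract, indent=2))
    return bundle_dir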

Inference Path (Sequence)

INPUT: New visa application A

1. Validate A schema
2. Apply feature transformations (shared with training)
3. Score using deployed model
4. Produce:
     - P(certified)
     - Risk band (low / medium / high)
5. Generate local explanation
6. Route case to review queue based on threshold policy
OUTPUT: Scored case + explanation
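
Steps 2 through 4 of this path reduce to a few lines once the bundle is loaded. The band cutoffs below are illustrative; as discussed under thresholding, they would live in policy-controlled configuration, not code.

def score_case(model, preprocessor, case_df, low_cut=0.85, high_cut=0.40):
    """Score one application and map P(certified) to a triage risk band."""
    X = preprocessor.transform(case_df)
    p_certified = float(model.predict_proba(X)[0, 1])
    if p_certified >= low_cut:
        band = "low"      # fast-track candidate
    elif p_certified >= high_cut:
        band = "medium"   # standard review
    else:
        band = "high"     # enhanced review
    return {"p_certified": p_certified, "risk_band": band}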

Model Lifecycle and MLOps Integration


Lifecycle Flow


  • Data Ingestion: Collect raw data from various sources (SQL, APIs, S3). Ensures a steady, automated stream of information.

  • Feature Engineering: Transform raw data into model signals (e.g., scaling, encoding). Versioned features ensure training data matches live production data exactly.

  • Model Training: Run algorithms to find patterns. This is where the actual learning happens.

  • Validation & Checks: Test against hold-out data and enforce metric constraints. Prevents broken or biased models from reaching users.

  • Deployment: Move the model into a production environment via CI/CD. Automates the transition from "it works on my machine" to "it works for the world."

  • Monitoring: Watch for drift (data changing over time) and performance drops. Flags the moment the model starts decaying or losing its edge.

  • Retraining Trigger: Close the loop by sending the system back to the start. Keeps the model fresh without manual intervention.

Alignment with Common MLOps Stacks


  • Kubeflow

    • Pipelines: training, evaluation, retraining

    • KFServing: model inference endpoints

  • AWS SageMaker

    • Processing jobs for feature engineering

    • Training jobs with built-in tuning

    • Model Registry for version control

  • Azure ML

    • Pipelines for end-to-end orchestration

    • Managed online endpoints for inference

    • Data drift and model monitoring services


The EasyVisa design maps cleanly to all three due to explicit pipelines and artifact boundaries.


Failure Modes and Mitigations


Data Drift


  • Risk: Wage distributions, employer mix, or regions shift over time.

  • Mitigation: Monitor feature statistics and PSI; trigger retraining on thresholds.
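
A minimal PSI (Population Stability Index) check for a single numeric feature might look like this; the 0.2 alert level is a common rule of thumb, not a project-specific setting.

import numpy as np

def psi(reference: np.ndarray, live: np.ndarray, bins: int = 10) -> float:
    """Population Stability Index between training-time reference values and
    live feature values; larger values indicate stronger drift."""
    edges = np.quantile(reference, np.linspace(0, 1, bins + 1))
    # Clip live values into the reference range so edge buckets absorb outliers.
    live_clipped = np.clip(live, edges[0], edges[-1])
    ref_pct = np.histogram(reference, edges)[0] / len(reference)
    live_pct = np.histogram(live_clipped, edges)[0] / len(live)
    ref_pct = np.clip(ref_pct, 1e-6, None)    # avoid log(0)
    live_pct = np.clip(live_pct, 1e-6, None)
    return float(np.sum((live_pct - ref_pct) * np.log(live_pct / ref_pct)))

# Rule of thumb: PSI > 0.2 signals meaningful drift and a retraining review.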


Label Delay


  • Risk: Certification outcomes lag application intake.

  • Mitigation: Use delayed supervision; separate training windows from inference windows.


Policy Change


  • Risk: Regulatory updates invalidate learned patterns.

  • Mitigation: Treat policy change as a hard retraining event; version policies alongside models.


Class Collapse


  • Risk: Extreme imbalance skews learning or evaluation.

  • Mitigation: Enforce minimum recall constraints and monitor class ratios continuously.


Thresholding and Operating Strategy


Static Probability Cutoffs


  • Simple to implement

  • Brittle under drift or volume changes


Operating Point Thresholds


Preferred approach:

  • Select thresholds on validation data to meet operational goals

  • Example:

    • High recall for certified cases

    • Controlled precision loss acceptable for triage


Thresholds are policy-controlled, not hard-coded, allowing adjustment without retraining.


Technical Appendix


Metrics


  • PR-AUC (primary): robust under imbalance

  • Recall constraint: protects against false denials

  • Secondary: F1, ROC-AUC for diagnostics


Sampling Rationale


  • SMOTE applied only within training folds

  • Prevents leakage and preserves test integrity

  • Undersampling used selectively for stress testing


Threshold Math


Let:

  • p = model probability output

  • τ = operating threshold


Decision rule:

If p ≥ τ → fast-track / low-risk
Else → standard or enhanced review

τ is chosen to satisfy:

Recall(p ≥ τ) ≥ R_min

Where R_min is policy-defined.
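
On validation data, τ can be chosen with scikit-learn's precision_recall_curve: take the largest threshold whose recall still satisfies R_min, which maximizes fast-track selectivity subject to the constraint. A minimal sketch:

from sklearn.metrics import precision_recall_curve

def choose_tau(y_val, p_val, r_min: float = 0.60) -> float:
    """Largest tau such that Recall(p >= tau) >= r_min on validation data."""
    precision, recall, thresholds = precision_recall_curve(y_val, p_val)
    # recall[i] corresponds to thresholds[i]; the final recall entry has no threshold.
    feasible = recall[:-1] >= r_min
    if not feasible.any():
        raise ValueError("No threshold satisfies the recall floor R_min")
    return float(thresholds[feasible].max())

# tau = choose_tau(y_val, model.predict_proba(X_val)[:, 1], r_min=0.60)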


EasyVisa Winner Selection: Deterministic Outcome


Selected Model

  • Regime: smote

  • Model: Bagging (Decision Tree)

  • Validation PR AUC (tuned): 0.8759

  • BestCV PR AUC: 0.8723

  • Recall (validation): 0.8719

  • F1 (validation): 0.8228

  • ROC AUC (validation): 0.7819


Why This Model Wins

  • Highest validation pr_auc among tuned candidates (policy primary metric).

  • Recall comfortably exceeds the 0.60 constraint.

  • Cross-validation variance remains extremely low.

  • Deterministic selection rule based on sorted pr_auc_val_TUNED.


Delta Review

  • Δ pr_auc_val: +0.0000 (effectively unchanged from baseline)

  • Minor declines in f1, recall, precision, and accuracy.

  • Indicates the baseline Bagging configuration was already near optimal.


Competitive Context

  • Random Forest (SMOTE) and Gradient Boosting (SMOTE) showed stronger tuning lift.

  • However, neither exceeded Bagging's final validation pr_auc.

  • Performance spread is minimal (<0.002), indicating model space convergence.


Interpretation

  • The margin of victory is extremely small.

  • All three top candidates are statistically close.

  • Selection is justified under strict primary-metric determinism.


Next step: lock this model configuration and proceed to one-shot test evaluation under TEST_LOCK discipline.


Conclusion


This project demonstrates that supervised machine learning can be effectively applied to high-stakes, regulated decision processes when grounded in decision-theoretic principles, cost-sensitive evaluation, and strong governance controls. Rather than focusing solely on predictive performance, EasyVisa emphasizes the integration of production-grade modeling practices, reproducible pipelines, and human-in-the-loop oversight to address real-world constraints such as class imbalance, asymmetric error costs, and policy sensitivity.


The results indicate that probabilistic classification models, when evaluated using imbalance-aware metrics and constrained by domain-specific recall requirements, can provide meaningful decision support without supplanting human judgment. By incorporating explainability, auditability, and lifecycle management into the system design, the approach aligns technical rigor with institutional accountability. Collectively, these elements demonstrate how machine learning systems can scale complex administrative decisions while maintaining transparency, consistency, and regulatory compliance.


Generalized Approach for Application in Other Business Contexts


The methodological framework developed in EasyVisa is broadly applicable to other organizational and business contexts characterized by high decision volume, asymmetric risk, and governance requirements. Examples include credit underwriting, insurance claims adjudication, fraud detection, compliance review, healthcare triage, and procurement risk assessment. The generalized approach can be summarized as follows:


  1. Problem Formulation as a Cost-Sensitive Decision Task

    Define the decision problem in terms of expected risk rather than raw accuracy, explicitly identifying asymmetric costs associated with different types of errors. Translate these costs into metric selection, constraints, or thresholding strategies.

  2. Data and Feature Governance

    Establish explicit feature contracts, schema validation, and versioned transformations to prevent data leakage and ensure training–serving parity. Emphasize domain-aligned feature engineering that reflects institutional rules and economic incentives.

  3. Imbalance-Aware Modeling and Evaluation

    Select modeling techniques and evaluation metrics appropriate for skewed outcome distributions, such as precision–recall–based measures and constraint-driven selection criteria. Avoid overreliance on aggregate metrics that obscure minority-class performance.

  4. Human-in-the-Loop System Design

    Position machine learning outputs as probabilistic inputs into human workflows rather than automated decisions. Design interfaces and thresholds that support prioritization, escalation, and reviewer override.

  5. Operationalization and Lifecycle Management

    Integrate models into production architectures with monitoring for data drift, performance degradation, and policy change. Treat retraining and recertification as governance events, not purely technical updates.

  6. Economic and Organizational Alignment

    Quantify value in terms of capacity, cycle-time reduction, and decision consistency rather than solely financial savings. Align technical decisions with organizational objectives and regulatory obligations.


By abstracting machine learning systems as governed decision-support capabilities rather than prediction engines, this approach provides a reusable blueprint for responsibly deploying AI in complex business environments.

