
Project: Applying Machine Learning to Improve Employment-Based Visa Decisions


Abstract


Employment-based visa certification is a high-stakes administrative decision process in which errors impose asymmetric costs on applicants, employers, and regulators. As application volumes increase, manual review systems face persistent challenges related to scalability, consistency, and decision latency. This project, EasyVisa, investigates the application of supervised machine learning as a decision-support mechanism for employment-based visa certification, grounded in principles of statistical decision theory and cost-sensitive learning.


Using historical visa application data, the task is formulated as a binary classification problem characterized by class imbalance and non-uniform misclassification costs. Multiple tree-based and ensemble models are evaluated within reproducible preprocessing pipelines designed to prevent data leakage and preserve training–serving parity. Model selection emphasizes decision-theoretic criteria by optimizing precision–recall metrics and enforcing recall constraints for positive (certified) outcomes, reflecting the higher regulatory and societal cost associated with false denials. Feature engineering includes wage normalization, derived employer attributes, and explicit feature schema definitions to ensure comparability and stability across cases.


Beyond predictive performance, the study addresses system-level considerations critical to deploying machine learning in regulated decision environments. A human-in-the-loop framework is proposed in which probabilistic model outputs support triage and prioritization rather than automate determinations. A reference production architecture and lifecycle model illustrate how monitoring, drift detection, and periodic retraining can sustain decision quality over time. The results identify key economic and organizational factors associated with certification outcomes and demonstrate how cost-aware machine learning can augment administrative decision processes while preserving transparency, accountability, and human judgment.


Summary


This paper presents a production-oriented machine learning system for employment-based visa decision support, emphasizing system architecture, model governance, and operational economics rather than algorithmic novelty. The EasyVisa framework applies supervised classification with rigorous pipeline discipline, imbalance-aware evaluation, and human-in-the-loop controls to improve the throughput, consistency, and auditability of visa case review processes.


The intended audience includes technical leaders, system architects, data scientists, and platform engineers responsible for designing, deploying, and governing machine learning systems in regulated environments. This work was completed as part of the University of Texas Post Graduate Program in Artificial Intelligence and Machine Learning.


Business Context and Problem


Market and Regulatory Environment


The U.S. Immigration and Nationality Act permits employers to hire foreign workers when qualified domestic labor is unavailable, while protecting U.S. workers through wage and labor safeguards. These programs are administered by the Office of Foreign Labor Certification (OFLC).


Application volumes have grown steadily year over year, reaching hundreds of thousands of cases annually. Each case must be evaluated against statutory requirements, labor market conditions, and employer attributes. This creates three structural challenges:

  • Scalability: Human review does not scale linearly with application volume.

  • Consistency: Similar cases can receive different outcomes due to reviewer variability.

  • Latency: Long processing times increase uncertainty for employers and applicants.


Business Objective


EasyVisa was engaged to determine whether historical data could be used to:

  • Improve the efficiency of visa case review.

  • Identify key factors that materially influence certification outcomes.

  • Support analysts and adjudicators with explainable, data-driven recommendations.


The goal is not automated approval or denial, but decision support that prioritizes effort and highlights risk.


Data Foundation


Data Scope


The dataset contains historical employment-based visa applications, including attributes of both employers and applicants. Key feature categories include:

  • Applicant characteristics (education, experience, training requirements)

  • Employer characteristics (company size, age)

  • Job characteristics (region, wage level, full-time status)

  • Outcome label (certified vs. denied)


Workforce Distribution by Continent


Claim supported: Asia dominates the dataset; geography is structurally imbalanced.

Interpretation: Asia accounts for ~66% of cases, with Europe and North America far smaller. This explains why continent-level signals exist but are secondary and must be treated carefully to avoid proxy bias.


Distribution of applicant workforce by continent, illustrating strong regional dominance and motivating cautious interpretation of geographic features.


Education Distribution


Claim supported: Dataset is heavily skewed toward higher education.

Interpretation: ~78% of applicants hold Bachelor’s or Master’s degrees, reinforcing why education emerges as a strong predictor and why ordinal treatment is appropriate.


Education level distribution of applicants, showing concentration in undergraduate and graduate degrees.


Region of Employment Distribution


Claim supported: Geographic variation exists but is diffuse.

Interpretation: Regional employment is relatively balanced across major U.S. regions, explaining moderate but non-dominant geographic effects.


Regional distribution of employment locations across U.S. regions.



Certification Probability by Education Level


Claim supported: Monotonic gradient in education.

Interpretation: Certification probability increases consistently with education level—from ~34% (High School) to ~87% (Doctorate). This is a textbook example of a stable, interpretable signal.



Certification probability by education level, demonstrating a strong monotonic relationship.


Feature Engineering


Several transformations were applied to make the data analytically useful and policy-aligned:

  • Wage normalization to annualized values across hourly, weekly, monthly, and yearly units.

  • Log transformations for skewed monetary variables.

  • Derived employer age from year of establishment.

  • Explicit separation of numeric, categorical, and binary features to prevent leakage.


These steps ensure comparability across cases and support robust model training.
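
As a concrete illustration, these transformations can be captured in one small function. The following is a minimal sketch assuming a pandas DataFrame with hypothetical column names (prevailing_wage, unit_of_wage, yr_of_estab); the actual dataset schema may differ.

import numpy as np
import pandas as pd

# Assumed annualization multipliers for wages quoted in different units.
ANNUALIZATION = {"Hour": 2080, "Week": 52, "Month": 12, "Year": 1}

def engineer_features(df: pd.DataFrame, current_year: int = 2024) -> pd.DataFrame:
    """Wage annualization, log transform, and derived employer age."""
    out = df.copy()
    # Normalize hourly/weekly/monthly/yearly wages to annualized values.
    out["annual_wage"] = out["prevailing_wage"] * out["unit_of_wage"].map(ANNUALIZATION)
    # Log-transform the skewed monetary variable (log1p handles zeros safely).
    out["log_annual_wage"] = np.log1p(out["annual_wage"])
    # Derive employer age from the year of establishment.
    out["company_age"] = current_year - out["yr_of_estab"]
    return out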


Analytical Approach


Modeling Philosophy


EasyVisa follows three core principles:

  1. Imbalance-aware learning: Visa denials are less frequent than certifications. Models were selected and evaluated using precision-recall metrics rather than accuracy alone.

  2. Pipeline integrity: All preprocessing, sampling, and modeling steps are encapsulated in reproducible pipelines to avoid data leakage.

  3. Explainability by design: Models are evaluated not only on performance but on interpretability and stability.


Class Imbalance in Employment-Based Visa Outcomes


Illustrative class imbalance in employment-based visa outcomes, motivating the use of imbalance-aware learning and precision–recall–based evaluation.


What it shows:

A heavily imbalanced outcome distribution (≈85% certified, ≈15% denied).


Why it matters:

Visually justifies why accuracy is misleading and why cost-sensitive, PR-based evaluation is required. A model that simply labeled every case certified would score roughly 85% accuracy while providing no signal at all on the denied minority class.

“Binary classification problem characterized by class imbalance and non-uniform misclassification costs.”

Model Candidates


Multiple supervised classification algorithms were evaluated, including:

  • Decision Trees

  • Random Forests

  • Bagging and Boosting ensembles

  • Gradient Boosting and XGBoost



Hyperparameter tuning was performed using cross-validation, with precision–recall AUC (PR-AUC) as the primary selection metric so that tuning reflects performance under class imbalance.
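
In scikit-learn terms, this tuning loop might look like the sketch below. The column lists and parameter grid are illustrative, scoring="average_precision" serves as the PR-AUC objective, and placing SMOTE inside an imbalanced-learn pipeline ensures resampling touches only the training folds.

from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline  # sampler-aware pipeline
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Illustrative feature lists; the real feature contract enumerates these explicitly.
numeric_cols = ["log_annual_wage", "company_age"]
categorical_cols = ["continent", "education_of_employee", "region_of_employment"]

preprocess = ColumnTransformer([
    ("num", StandardScaler(), numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

# SMOTE lives inside the pipeline, so resampling happens only when fitting,
# never on validation or test data.
pipe = Pipeline([
    ("prep", preprocess),
    ("smote", SMOTE(random_state=42)),
    ("model", RandomForestClassifier(random_state=42)),
])

search = GridSearchCV(
    pipe,
    param_grid={"model__n_estimators": [200, 400], "model__max_depth": [6, 10, None]},
    scoring="average_precision",  # PR-AUC proxy, robust under imbalance
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=42),
    n_jobs=-1,
)
# search.fit(X_train, y_train)  # leakage-free resampling within each fold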


Evaluation Framework


Models were compared using a consistent evaluation stack:

  • Precision, recall, F1 score

  • ROC-AUC and PR-AUC

  • Confusion matrices for operational interpretation


Relative suitability of common evaluation metrics under class imbalance, highlighting the importance of PR-AUC in cost-sensitive decision contexts.


What it shows:

Relative importance of Accuracy vs ROC-AUC vs PR-AUC under imbalance.


Why it matters:

Reinforces the decision-theoretic framing and explains metric choice without equations.

“Model selection emphasizes decision-theoretic criteria by optimizing precision–recall metrics and enforcing recall constraints…”

Selection constraints ensured that recall for certified cases did not fall below defined thresholds, aligning with policy sensitivity.
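
Stated as code, this is a deterministic filter-then-sort. In the minimal sketch below, the Bagging metrics come from the results reported later in this post; the second candidate's numbers are purely illustrative.

R_MIN = 0.60  # policy-defined recall floor for certified cases

def select_winner(candidates: list[dict]) -> dict:
    """Keep candidates meeting the recall constraint, then take the highest
    validation PR-AUC (the policy's primary metric)."""
    eligible = [c for c in candidates if c["recall_val"] >= R_MIN]
    if not eligible:
        raise ValueError("No candidate satisfies the recall constraint")
    return max(eligible, key=lambda c: c["pr_auc_val"])

candidates = [
    {"name": "Bagging (Decision Tree)", "pr_auc_val": 0.8759, "recall_val": 0.8719},
    {"name": "Random Forest (SMOTE)", "pr_auc_val": 0.8745, "recall_val": 0.8800},  # illustrative
]
print(select_winner(candidates)["name"])  # -> Bagging (Decision Tree)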



Key Insights


Across models, several drivers consistently influenced certification likelihood:

  • Prevailing wage level: Higher, market-aligned wages strongly correlate with certification.

  • Employer maturity and size: Established employers with larger workforces show higher approval rates.

  • Education and experience alignment: Applicants whose education and experience match role expectations perform better.

  • Full-time positions: Full-time roles exhibit materially higher certification likelihood.


These insights are as valuable as the predictions themselves, informing employer guidance and policy review.


Governance and Risk Considerations


Human-in-the-Loop Design


EasyVisa is designed as a decision-support system:

  • Final determinations remain with human reviewers.

  • Model outputs are probabilistic, not deterministic.

  • Explanations accompany predictions to support review.


Bias and Fairness


Explicit safeguards are embedded to reduce unintended bias:

  • No protected personal attributes are used.

  • Feature selection is reviewed for proxy risk.

  • Performance is monitored across segments to detect drift.


Auditability


All model runs are reproducible, versioned, and auditable—critical for regulatory environments.


Business Value


Operational Efficiency


  • Faster triage of high- and low-risk cases.

  • Reduced manual review burden.

  • Improved throughput without proportional staffing increases.


Consistency and Transparency


  • Standardized evaluation across cases.

  • Clear articulation of decision drivers.

  • Improved trust among stakeholders.


Strategic Insight


  • Data-driven guidance for employers on application quality.

  • Evidence-based policy analysis for regulators.

  • Foundation for continuous improvement as new data arrives.


Production Architecture and Operating Model


From Concept to Production


The EasyVisa notebook is intentionally structured to mirror production concerns rather than purely exploratory analysis. Several code-level patterns directly support operationalization:

  • Explicit feature contracts: Categorical, numeric, and binary columns are enumerated to prevent schema drift and silent leakage.

  • Pipeline encapsulation: Preprocessing, sampling (SMOTE / undersampling), and model training are composed into unified pipelines, ensuring training–serving parity.

  • Deterministic execution: Centralized random-state control and configuration dictionaries support reproducibility and auditability.

  • Metric-driven selection: Model choice is governed by imbalance-aware metrics (PR-AUC, recall thresholds) rather than headline accuracy.


These elements translate cleanly into a deployable service without refactoring core logic.
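
For example, a feature contract can be as simple as an explicit column manifest validated at both training and inference time. A minimal sketch, with hypothetical column names:

import pandas as pd

# Explicit feature contract: every column the model may see, grouped by type.
FEATURE_CONTRACT = {
    "numeric": ["log_annual_wage", "company_age"],
    "categorical": ["continent", "education_of_employee", "region_of_employment"],
    "binary": ["has_job_experience", "requires_job_training", "full_time_position"],
}

def validate_schema(df: pd.DataFrame) -> None:
    """Fail fast if incoming data deviates from the training-time contract."""
    expected = {col for cols in FEATURE_CONTRACT.values() for col in cols}
    missing = expected - set(df.columns)
    extra = set(df.columns) - expected
    if missing:
        raise ValueError(f"Schema violation, missing columns: {sorted(missing)}")
    if extra:
        # Unexpected columns are rejected rather than silently passed downstream.
        raise ValueError(f"Schema violation, unexpected columns: {sorted(extra)}")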


Reference Production Architecture


A representative production architecture consists of the following layers:


  1. Ingestion Layer

    • Secure intake of employer and applicant data from OFLC systems.

    • Schema validation aligned to the feature contract used in training.

  2. Feature Engineering Service

    • Wage normalization and derived features (company age, annualized wages).

    • Versioned transformations shared between training and inference.

  3. Model Service

    • Containerized inference endpoint hosting the selected classifier.

    • Probabilistic outputs (certification likelihood) returned with confidence scores.

  4. Explanation Layer

    • Local explanation generation (e.g., feature contribution summaries) to support reviewer understanding.

    • Logged alongside predictions for audit purposes.

  5. Human Review Interface

    • Queue-based triage prioritizing cases by predicted risk.

    • Reviewer override and feedback captured as labeled data.

  6. Monitoring and Governance

    • Drift detection on feature distributions and outcomes.

    • Periodic recertification of models against updated data.


This architecture preserves human authority while compressing decision latency.


Operating Model


Operationally, EasyVisa functions as a decision-support capability, not an automated adjudicator:

  • Data Science owns model evolution, evaluation, and bias monitoring.

  • Operations and Policy define thresholds, escalation rules, and acceptable error trade-offs.

  • Technology maintains shared pipelines, deployment, and observability.


Model retraining follows a controlled cadence (e.g., quarterly), with emergency retraining triggered by drift or policy change.


Quantifying Business Value and ROI


Baseline Assumptions


To illustrate value, consider conservative assumptions aligned to historical OFLC volumes:

  • Annual application volume: 800,000 cases

  • Average manual review time: 45 minutes per case

  • Fully loaded reviewer cost: $75 per hour


Baseline annual review cost:

  • 800,000 × 0.75 hours × $75 ≈ $45M annually



Cycle-Time Reduction


If EasyVisa enables triage such that:

  • 30% of cases are classified as low-risk and fast-tracked

  • Average review time for those cases drops by 50%


Then annual hours saved:

  • 800,000 × 30% × 0.375 hours ≈ 90,000 hours


Cost savings:

  • 90,000 × $75 ≈ $6.75M annually
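
The same arithmetic, expressed as a few lines of Python so the assumptions can be varied:

volume = 800_000        # annual applications
hours_per_case = 0.75   # 45 minutes of manual review
rate = 75               # fully loaded reviewer cost, $/hour

baseline_cost = volume * hours_per_case * rate       # = $45.0M per year
hours_saved = volume * 0.30 * hours_per_case * 0.50  # = 90,000 hours
savings = hours_saved * rate                         # = $6.75M per year
print(f"Baseline ${baseline_cost/1e6:.1f}M; savings ${savings/1e6:.2f}M per year")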


Capacity and Throughput Gains


Alternatively, agencies can reinvest saved capacity:


  • Shorter queues and faster employer response times

  • More thorough review of high-risk cases

  • Absorption of volume growth without linear staffing increases


Secondary Value


Additional benefits compound ROI:

  • Reduced rework from inconsistent decisions

  • Improved employer guidance, lowering denial rates over time

  • Better policy insight from quantified decision drivers


Even under conservative assumptions, EasyVisa delivers a strong economic case while improving service quality.


System Mechanics and Control Surfaces


Training Loop (Pseudo-code)

INPUT: Historical labeled cases D
CONFIG: Feature schema, sampling policy, metric constraints

1. Validate schema(D) against feature contract
2. Split D → train / validation / test (stratified, locked test)
3. Build preprocessing pipeline:
   - Imputation
   - Encoding (categorical/binary)
   - Scaling (numeric)
4. Apply imbalance handling inside pipeline (SMOTE / undersample)
5. For each candidate model M:
     a. Perform cross-validation
     b. Optimize hyperparameters on PR-AUC
     c. Enforce recall constraints
6. Select best model M*
7. Calibrate probabilities (optional)
8. Persist artifacts:
     - Model weights
     - Preprocessor
     - Metrics
     - Feature metadata
OUTPUT: Versioned, deployable model bundle
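
Steps 7 and 8 (persisting the versioned bundle) might look like the following sketch using joblib; the file layout and metadata fields are assumptions, not the project's actual artifact format.

import json
from pathlib import Path
import joblib

def persist_bundle(model, preprocessor, metrics: dict, feature_contract: dict,
                   version: str, out_dir: str = "artifacts") -> Path:
    """Write model weights, preprocessor, metrics, and feature metadata
    as one versioned, auditable bundle."""
    bundle_dir = Path(out_dir) / version
    bundle_dir.mkdir(parents=True, exist_ok=True)
    joblib.dump(model, bundle_dir / "model.joblib")
    joblib.dump(preprocessor, bundle_dir / "preprocessor.joblib")
    (bundle_dir / "metrics.json").write_text(json.dumps(metrics, indent=2))
    (bundle_dir / "features.json").write_text(json.dumps(feature_contract, indent=2))
    return bundle_dir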

Inference Path (Sequence)

INPUT: New visa application A

1. Validate A schema
2. Apply feature transformations (shared with training)
3. Score using deployed model
4. Produce:
     - P(certified)
     - Risk band (low / medium / high)
5. Generate local explanation
6. Route case to review queue based on threshold policy
OUTPUT: Scored case + explanation
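
Steps 2 through 4 of this path reduce to a few lines once the bundle is loaded. The band cutoffs below are illustrative; as discussed under thresholding, they would live in policy-controlled configuration, not code.

def score_case(model, preprocessor, case_df, low_cut=0.85, high_cut=0.40):
    """Score one application and map P(certified) to a triage risk band."""
    X = preprocessor.transform(case_df)
    p_certified = float(model.predict_proba(X)[0, 1])
    if p_certified >= low_cut:
        band = "low"      # fast-track candidate
    elif p_certified >= high_cut:
        band = "medium"   # standard review
    else:
        band = "high"     # enhanced review
    return {"p_certified": p_certified, "risk_band": band}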

Model Lifecycle and MLOps Integration


Lifecycle Flow


  • Data Ingestion: Collect raw data from various sources (SQL, APIs, S3). Ensures a steady, automated stream of information.

  • Feature Engineering: Transform raw data into model signals (e.g., scaling, encoding). Versioned features ensure training data matches live production data exactly.

  • Model Training: Run algorithms to find patterns. This is where the actual learning happens.

  • Validation & Checks: Test against hold-out data and enforce metric constraints. Prevents broken or biased models from reaching users.

  • Deployment: Move the model into a production environment via CI/CD. Automates the transition from "it works on my machine" to "it works for the world."

  • Monitoring: Watch for drift (data changing over time) and performance drops. Flags the moment the model starts decaying or losing its edge.

  • Retraining Trigger: Close the loop by sending the system back to the start. Keeps the model fresh without manual intervention.

Alignment with Common MLOps Stacks


  • Kubeflow

    • Pipelines: training, evaluation, retraining

    • KFServing: model inference endpoints

  • AWS SageMaker

    • Processing jobs for feature engineering

    • Training jobs with built-in tuning

    • Model Registry for version control

  • Azure ML

    • Pipelines for end-to-end orchestration

    • Managed online endpoints for inference

    • Data drift and model monitoring services


The EasyVisa design maps cleanly to all three due to explicit pipelines and artifact boundaries.


Failure Modes and Mitigations


Data Drift


  • Risk: Wage distributions, employer mix, or regions shift over time.

  • Mitigation: Monitor feature statistics and PSI; trigger retraining on thresholds.
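
A minimal PSI (Population Stability Index) check for a single numeric feature might look like this; the 0.2 alert level is a common rule of thumb, not a project-specific setting.

import numpy as np

def psi(reference: np.ndarray, live: np.ndarray, bins: int = 10) -> float:
    """Population Stability Index between training-time reference values and
    live feature values; larger values indicate stronger drift."""
    edges = np.quantile(reference, np.linspace(0, 1, bins + 1))
    # Clip live values into the reference range so edge buckets absorb outliers.
    live_clipped = np.clip(live, edges[0], edges[-1])
    ref_pct = np.histogram(reference, edges)[0] / len(reference)
    live_pct = np.histogram(live_clipped, edges)[0] / len(live)
    ref_pct = np.clip(ref_pct, 1e-6, None)    # avoid log(0)
    live_pct = np.clip(live_pct, 1e-6, None)
    return float(np.sum((live_pct - ref_pct) * np.log(live_pct / ref_pct)))

# Rule of thumb: PSI > 0.2 signals meaningful drift and a retraining review.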


Label Delay


  • Risk: Certification outcomes lag application intake.

  • Mitigation: Use delayed supervision; separate training windows from inference windows.


Policy Change


  • Risk: Regulatory updates invalidate learned patterns.

  • Mitigation: Treat policy change as a hard retraining event; version policies alongside models.


Class Collapse


  • Risk: Extreme imbalance skews learning or evaluation.

  • Mitigation: Enforce minimum recall constraints and monitor class ratios continuously.


Thresholding and Operating Strategy


Static Probability Cutoffs


  • Simple to implement

  • Brittle under drift or volume changes


Operating Point Thresholds


Preferred approach:

  • Select thresholds on validation data to meet operational goals

  • Example:

    • High recall for certified cases

    • Controlled precision loss acceptable for triage


Thresholds are policy-controlled, not hard-coded, allowing adjustment without retraining.


Technical Appendix


Metrics


  • PR-AUC (primary): robust under imbalance

  • Recall constraint: protects against false denials

  • Secondary: F1, ROC-AUC for diagnostics


Sampling Rationale


  • SMOTE applied only within training folds

  • Prevents leakage and preserves test integrity

  • Undersampling used selectively for stress testing


Threshold Math


Let:

  • p = model probability output

  • τ = operating threshold


Decision rule:

If p ≥ τ → fast-track / low-risk
Else → standard or enhanced review

τ is chosen to satisfy:

Recall(p ≥ τ) ≥ R_min

Where R_min is policy-defined.
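
On validation data, τ can be chosen with scikit-learn's precision_recall_curve: take the largest threshold whose recall still satisfies R_min, which maximizes fast-track selectivity subject to the constraint. A minimal sketch:

from sklearn.metrics import precision_recall_curve

def choose_tau(y_val, p_val, r_min: float = 0.60) -> float:
    """Largest tau such that Recall(p >= tau) >= r_min on validation data."""
    precision, recall, thresholds = precision_recall_curve(y_val, p_val)
    # recall[i] corresponds to thresholds[i]; the final recall entry has no threshold.
    feasible = recall[:-1] >= r_min
    if not feasible.any():
        raise ValueError("No threshold satisfies the recall floor R_min")
    return float(thresholds[feasible].max())

# tau = choose_tau(y_val, model.predict_proba(X_val)[:, 1], r_min=0.60)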


EasyVisa Winner Selection: Deterministic Outcome


Selected Model

  • Regime: smote

  • Model: Bagging (Decision Tree)

  • Validation PR AUC (tuned): 0.8759

  • BestCV PR AUC: 0.8723

  • Recall (validation): 0.8719

  • F1 (validation): 0.8228

  • ROC AUC (validation): 0.7819


Why This Model Wins

  • Highest validation pr_auc among tuned candidates (policy primary metric).

  • Recall comfortably exceeds the 0.60 constraint.

  • Cross-validation variance remains extremely low.

  • Deterministic selection rule based on sorted pr_auc_val_TUNED.


Delta Review

  • Δ pr_auc_val: +0.0000 (effectively unchanged from baseline)

  • Minor declines in f1, recall, precision, and accuracy.

  • Indicates the baseline Bagging configuration was already near optimal.


Competitive Context

  • Random Forest (SMOTE) and Gradient Boosting (SMOTE) showed stronger tuning lift.

  • However, neither exceeded Bagging's final validation pr_auc.

  • Performance spread is minimal (<0.002), indicating model space convergence.


Interpretation

  • The margin of victory is extremely small.

  • All three top candidates are statistically close.

  • Selection is justified under strict primary-metric determinism.


Next step: lock this model configuration and proceed to one-shot test evaluation under TEST_LOCK discipline.


Conclusion


This project demonstrates that supervised machine learning can be effectively applied to high-stakes, regulated decision processes when grounded in decision-theoretic principles, cost-sensitive evaluation, and strong governance controls. Rather than focusing solely on predictive performance, EasyVisa emphasizes the integration of production-grade modeling practices, reproducible pipelines, and human-in-the-loop oversight to address real-world constraints such as class imbalance, asymmetric error costs, and policy sensitivity.


The results indicate that probabilistic classification models, when evaluated using imbalance-aware metrics and constrained by domain-specific recall requirements, can provide meaningful decision support without supplanting human judgment. By incorporating explainability, auditability, and lifecycle management into the system design, the approach aligns technical rigor with institutional accountability. Collectively, these elements demonstrate how machine learning systems can scale complex administrative decisions while maintaining transparency, consistency, and regulatory compliance.


Generalized Approach for Application in Other Business Contexts


The methodological framework developed in EasyVisa is broadly applicable to other organizational and business contexts characterized by high decision volume, asymmetric risk, and governance requirements. Examples include credit underwriting, insurance claims adjudication, fraud detection, compliance review, healthcare triage, and procurement risk assessment. The generalized approach can be summarized as follows:


  1. Problem Formulation as a Cost-Sensitive Decision Task

    Define the decision problem in terms of expected risk rather than raw accuracy, explicitly identifying asymmetric costs associated with different types of errors. Translate these costs into metric selection, constraints, or thresholding strategies.

  2. Data and Feature Governance

    Establish explicit feature contracts, schema validation, and versioned transformations to prevent data leakage and ensure training–serving parity. Emphasize domain-aligned feature engineering that reflects institutional rules and economic incentives.

  3. Imbalance-Aware Modeling and Evaluation

    Select modeling techniques and evaluation metrics appropriate for skewed outcome distributions, such as precision–recall–based measures and constraint-driven selection criteria. Avoid overreliance on aggregate metrics that obscure minority-class performance.

  4. Human-in-the-Loop System Design

    Position machine learning outputs as probabilistic inputs into human workflows rather than automated decisions. Design interfaces and thresholds that support prioritization, escalation, and reviewer override.

  5. Operationalization and Lifecycle Management

    Integrate models into production architectures with monitoring for data drift, performance degradation, and policy change. Treat retraining and recertification as governance events, not purely technical updates.

  6. Economic and Organizational Alignment

    Quantify value in terms of capacity, cycle-time reduction, and decision consistency rather than solely financial savings. Align technical decisions with organizational objectives and regulatory obligations.


By abstracting machine learning systems as governed decision-support capabilities rather than prediction engines, this approach provides a reusable blueprint for responsibly deploying AI in complex business environments.

