v4 Breakthrough (Target 0.80+)
Kaggle Notebook
Current Status
- v3 LB: 0.78947
- Goal: break the 0.80 threshold
- Gap: ~0.01, i.e. about 4–5 of the 418 test passengers
Candidate Optimization Directions
Direction 1: Fare=0 Outlier Handling (Expected +0.005)
# Some test-set passengers have Fare=0, likely crew or family on free tickets
# FareLog mishandles them: log1p(0)=0 ranks them as the cheapest paid fare, but Fare=0 really means "no fare", a different category
full['FareIsZero'] = (full['Fare'] == 0).astype(int)
# Re-fill Fare=0 passengers with Pclass median
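A minimal toy demonstration of the refill step, using hypothetical fares (the real values come from the concatenated train/test `full` frame):

```python
# Toy data: two Fare=0 passengers, one in each of Pclass 1 and 3 (made up).
import pandas as pd

full = pd.DataFrame({'Pclass': [1, 1, 3, 3, 3],
                     'Fare':   [80.0, 0.0, 7.9, 8.1, 0.0]})

# Flag BEFORE refilling, so the signal survives the imputation
full['FareIsZero'] = (full['Fare'] == 0).astype(int)

# Median of the *positive* fares per Pclass, then map onto the zero rows
medians = full[full['Fare'] > 0].groupby('Pclass')['Fare'].median()
zero = full['Fare'] == 0
full.loc[zero, 'Fare'] = full.loc[zero, 'Pclass'].map(medians)

print(full['Fare'].tolist())  # → [80.0, 80.0, 7.9, 8.1, 8.0]
```

Computing the median over positive fares only matters here: otherwise the zeros being imputed would drag the median down.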
Direction 2: Model Swap — GradientBoosting (Expected +0.005~0.01)
from sklearn.ensemble import GradientBoostingClassifier
model = GradientBoostingClassifier(
    n_estimators=200,
    max_depth=3,
    learning_rate=0.1,
    random_state=42
)
GBDT is sometimes more stable than RF on small datasets: shallow trees fitted sequentially at a low learning rate regularize well. (Strictly, boosting tends to be more, not less, sensitive to outliers than bagging, so verify any gain with cross-validation before submitting.)
Direction 3: Amplify Title=Master Strong Signal (Expected +0.003~0.005)
# Master (boys) have extremely high survival in train, but RF may not have learned it fully
# Try separating Master from Mr as an independent strong feature
full['IsMaster'] = (full['Title'] == 'Master').astype(int)
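To sanity-check that the flag is worth adding, compare survival rates by `IsMaster` in train. Toy sketch with hypothetical rows (the real check runs on the Kaggle train CSV):

```python
# Extract Title from Name, then compare survival by the IsMaster flag.
import pandas as pd

train = pd.DataFrame({
    'Name': ['Allen, Master. Hudson', 'Moor, Master. Meier',
             'Braund, Mr. Owen', 'Cumings, Mrs. John'],
    'Survived': [1, 1, 0, 1],
})
train['Title'] = train['Name'].str.extract(r' ([A-Za-z]+)\.', expand=False)
train['IsMaster'] = (train['Title'] == 'Master').astype(int)
print(train.groupby('IsMaster')['Survived'].mean())
```

If the gap between the two groups is large on the real data, the binary flag gives the trees a one-split shortcut instead of relying on the full Title encoding.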
Direction 4: Pclass * HasCabin Interaction (Expected +0.003~0.005)
# First Class having Cabin is normal; Third Class having Cabin is very unusual
full['Pclass_HasCabin'] = full['Pclass'].astype(str) + '_' + full['HasCabin'].astype(str)
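The concatenated feature is a string, and sklearn tree models need numeric input, so it has to be encoded before fitting. A minimal sketch with hypothetical rows, using one-hot encoding via `pd.get_dummies`:

```python
# Build the interaction feature, then one-hot encode it for sklearn.
import pandas as pd

full = pd.DataFrame({'Pclass': [1, 3, 3], 'HasCabin': [1, 1, 0]})
full['Pclass_HasCabin'] = (full['Pclass'].astype(str) + '_'
                           + full['HasCabin'].astype(str))

dummies = pd.get_dummies(full['Pclass_HasCabin'], prefix='PcCab')
full = pd.concat([full, dummies], axis=1)
print(list(dummies.columns))  # → ['PcCab_1_1', 'PcCab_3_0', 'PcCab_3_1']
```

The rare combinations (e.g. third class with a cabin) become their own columns, which is exactly the "unusual" signal the direction is after.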
Direction 5: Multi-Model Hard Voting (Expected +0.005~0.01)
from sklearn.ensemble import VotingClassifier
voting = VotingClassifier(
    estimators=[
        ('rf', RandomForestClassifier(...)),
        ('gb', GradientBoostingClassifier(...)),
        ('xgb', XGBClassifier(...))
    ],
    voting='hard',  # hard voting is usually more stable than soft on small data
    n_jobs=-1
)
Recommended Execution Order
- Test Direction 2 alone first (swap to GBDT): if LB jumps to 0.80+, RF has hit a bottleneck on this feature set
- Then try Directions 1+3+4 combined: feature fine-tuning, expected additional 0.005~0.01
- Finally try Direction 5 (Hard Voting): if single models reach 0.80, ensemble may push to 0.81~0.82
Validation Strategy
Change one variable at a time and submit quickly to verify:
- If improved → keep it, continue with the next optimization
- If degraded → revert, record why it was ineffective
Goal: reach 0.80+ from 0.78947 in 2–3 iterations.