v4 Breakthrough (Target 0.80+)
Kaggle Notebook
Current Status
- v3 LB: 0.78947
- Goal: break the 0.80 threshold
- Gap: ~0.01, i.e. about 4–5 of the 418 test passengers
Candidate Optimization Directions
Direction 1: Fare=0 Outlier Handling (Expected +0.005)
# Some test-set passengers have Fare=0, likely crew or family on free tickets
# FareLog mishandles them: log1p(0)=0 ranks them as the cheapest paid fare, but Fare=0 really means "no fare", a different category
full['FareIsZero'] = (full['Fare'] == 0).astype(int)
# Re-fill Fare=0 passengers with Pclass median
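A minimal toy demonstration of the refill step, using hypothetical fares (the real values come from the concatenated train/test `full` frame):

```python
# Toy data: two Fare=0 passengers, one in each of Pclass 1 and 3 (made up).
import pandas as pd

full = pd.DataFrame({'Pclass': [1, 1, 3, 3, 3],
                     'Fare':   [80.0, 0.0, 7.9, 8.1, 0.0]})

# Flag BEFORE refilling, so the signal survives the imputation
full['FareIsZero'] = (full['Fare'] == 0).astype(int)

# Median of the *positive* fares per Pclass, then map onto the zero rows
medians = full[full['Fare'] > 0].groupby('Pclass')['Fare'].median()
zero = full['Fare'] == 0
full.loc[zero, 'Fare'] = full.loc[zero, 'Pclass'].map(medians)

print(full['Fare'].tolist())  # → [80.0, 80.0, 7.9, 8.1, 8.0]
```

Computing the median over positive fares only matters here: otherwise the zeros being imputed would drag the median down.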
Direction 2: Model Swap — GradientBoosting (Expected +0.005~0.01)
from sklearn.ensemble import GradientBoostingClassifier
model = GradientBoostingClassifier(
    n_estimators=200,
    max_depth=3,
    learning_rate=0.1,
    random_state=42
)
GBDT is sometimes more stable than RF on small datasets: shallow trees fitted sequentially at a low learning rate regularize well. (Strictly, boosting tends to be more, not less, sensitive to outliers than bagging, so verify any gain with cross-validation before submitting.)
Direction 3: Amplify Title=Master Strong Signal (Expected +0.003~0.005)
# Master (boys) have extremely high survival in train, but RF may not have learned it fully
# Try separating Master from Mr as an independent strong feature
full['IsMaster'] = (full['Title'] == 'Master').astype(int)
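To sanity-check that the flag is worth adding, compare survival rates by `IsMaster` in train. Toy sketch with hypothetical rows (the real check runs on the Kaggle train CSV):

```python
# Extract Title from Name, then compare survival by the IsMaster flag.
import pandas as pd

train = pd.DataFrame({
    'Name': ['Allen, Master. Hudson', 'Moor, Master. Meier',
             'Braund, Mr. Owen', 'Cumings, Mrs. John'],
    'Survived': [1, 1, 0, 1],
})
train['Title'] = train['Name'].str.extract(r' ([A-Za-z]+)\.', expand=False)
train['IsMaster'] = (train['Title'] == 'Master').astype(int)
print(train.groupby('IsMaster')['Survived'].mean())
```

If the gap between the two groups is large on the real data, the binary flag gives the trees a one-split shortcut instead of relying on the full Title encoding.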
Direction 4: Pclass * HasCabin Interaction (Expected +0.003~0.005)
# First Class having Cabin is normal; Third Class having Cabin is very unusual
full['Pclass_HasCabin'] = full['Pclass'].astype(str) + '_' + full['HasCabin'].astype(str)
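The concatenated feature is a string, and sklearn tree models need numeric input, so it has to be encoded before fitting. A minimal sketch with hypothetical rows, using one-hot encoding via `pd.get_dummies`:

```python
# Build the interaction feature, then one-hot encode it for sklearn.
import pandas as pd

full = pd.DataFrame({'Pclass': [1, 3, 3], 'HasCabin': [1, 1, 0]})
full['Pclass_HasCabin'] = (full['Pclass'].astype(str) + '_'
                           + full['HasCabin'].astype(str))

dummies = pd.get_dummies(full['Pclass_HasCabin'], prefix='PcCab')
full = pd.concat([full, dummies], axis=1)
print(list(dummies.columns))  # → ['PcCab_1_1', 'PcCab_3_0', 'PcCab_3_1']
```

The rare combinations (e.g. third class with a cabin) become their own columns, which is exactly the "unusual" signal the direction is after.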
Direction 5: Multi-Model Hard Voting (Expected +0.005~0.01)
from sklearn.ensemble import VotingClassifier
voting = VotingClassifier(
    estimators=[
        ('rf', RandomForestClassifier(...)),
        ('gb', GradientBoostingClassifier(...)),
        ('xgb', XGBClassifier(...))
    ],
    voting='hard',  # hard voting is usually more stable than soft on small data
    n_jobs=-1
)
Recommended Execution Order
- Test Direction 2 alone first (swap to GBDT): if LB jumps to 0.80+, RF has hit a bottleneck on this feature set
- Then try Directions 1+3+4 combined: feature fine-tuning, expected additional 0.005~0.01
- Finally try Direction 5 (Hard Voting): if single models reach 0.80, ensemble may push to 0.81~0.82
Validation Strategy
Change one variable at a time and submit quickly to verify:
- If improved → keep it, continue with the next optimization
- If degraded → revert, record why it was ineffective
Goal: reach 0.80+ from 0.78947 in 2–3 iterations.