Blog — Deflated

it took four SFT rounds to teach IDK-1 to count to three

106M params, 4,810 instruction pairs, val loss 1.3670. IDK-1-Instruct is live on HuggingFace.

40 million speakers, zero dedicated LMs. so i built one — 9M params, 122M tokens, nanoGPT from scratch.

CulturaX taught IDK-1 nothing. so i built Cleanesia — 17,761 docs, 19M tokens, open API.

the loss wasn't flat because of learning rate. it was flat because i was training on scrambled tokens.

47.5% done. loss still flat. ran inference for the first time. here's what came out.

val loss 10.7 → 7.79. free compute, real numbers, zero drama.

DFD-1 wasn't a failure. it was a tradeoff. here's the math.

630K params, character-level tokenizer, val loss 1.03. it works if you squint.

i'm a digital PR major. here's what three models taught me that no course would.

1,966 Q&A pairs, 30 minutes on Kaggle, and it now cites specific law numbers correctly.