2026-06-29

it took four SFT rounds to teach IDK-1 to count to three

106M params, 4,810 instruction pairs, val loss 1.3670. IDK-1-Instruct is live on HuggingFace.

2026-06-25

i trained the first sundanese language model from scratch and it took 83 minutes

40 million speakers, zero dedicated LMs. so i built one — 9M params, 122M tokens, nanoGPT from scratch.

2026-06-23

i built a clean indonesian text corpus because every existing one is broken

CulturaX taught IDK-1 nothing. so i built Cleanesia — 17,761 docs, 19M tokens, open API.

2026-06-23

i cancelled 53,000 steps of training because of one line of code

the loss wasn't flat because of learning rate. it was flat because i was training on scrambled tokens.

2026-06-22

IDK-1 at 47,500 steps: what the output actually looks like

47.5% done. loss still flat. ran inference for the first time. here's what came out.

2026-06-21

IDK-1 week 1: what 8,000 steps of training actually looks like

val loss 10.7 → 7.79. free compute, real numbers, zero drama.

2026-06-20

why i went from 500M to 100M parameters (it's not what you think)

DFD-1 wasn't a failure. it was a tradeoff. here's the math.

2026-06-20

i trained a crisis PR model and it said 'bitch investigations' once

630K params, character-level tokenizer, val loss 1.03. it works if you squint.

2026-06-20

what i learned building an indonesian slm as a non-cs student

i'm a digital PR major. here's what three models taught me that no course would.

2026-06-20

i fine-tuned qwen3-4b on indonesian government documents and it actually works

1,966 Q&A pairs, 30 minutes on kaggle, and it now cites specific law numbers correctly.