Training Guide

How to train your own model

Training a custom model lets VolkCell adapt to your cells, your imaging conditions, and your counting preferences. The whole flow takes about 10–15 minutes of your time, plus a few minutes of unattended GPU compute. Watch the walkthrough below, then follow the step-by-step guide.

Walkthrough video

Video coming soon (~5 minutes). The walkthrough will appear here.

Tip: enable captions and play at 1.25× if you've already done a training run before.

The seven steps

Every training run follows the same flow. Each step has a clear "what you do" and "what's happening behind the scenes" — read the second one only if you want to know the why.

  1. Upload your images

    Pick 5–15 representative chamber images covering the range of densities, lighting, and cell shapes you actually deal with. More variety beats more images.

    Behind the scenes: VolkCell deduplicates and validates each file, then runs the default model to give you a starting segmentation you can correct.

  2. Wait for segmentation

    The pipeline runs the base model on every uploaded image so you have masks to correct. This usually takes 30–90 seconds per image on the GPU.

    Behind the scenes: Segmentation runs on the same Modal A10G GPU that handles production inference. You'll see live progress; nothing else needs your attention.

  3. Correct the masks

    For each image, open the editor and fix what the base model got wrong: add missed cells, remove false positives, split touching cells. Click Save corrected mask when you're happy.

    Behind the scenes: Your corrections become the ground truth for training. Every saved mask is stored as a labeled instance map alongside the original image.

  4. Start training

    Once at least one corrected mask is saved (more is better), the Start Training button unlocks. Pick a name for your model and click go. You can close the tab — training continues in the background.

    Behind the scenes: VolkCell builds an approved-only training dataset, then fine-tunes the default model backbone on it. A GPU job spins up on Modal: no setup, no Python.

  5. Test on new images

    When training finishes, run your new model on a fresh image (one it hasn't seen) and eyeball the result. Does it count what you'd count?

    Behind the scenes: The trained checkpoint is loaded into the same inference path your team uses in production — what you see here is what they'll see.

  6. Review & iterate

    If the results look good, save the model. If not, go back to step 3, correct a few more masks (especially on the failure cases), and retrain. Needing 2–3 iterations is normal.

    Behind the scenes: Each iteration uses every saved correction across the session, not just the new ones. Your dataset grows monotonically.

  7. Save & deploy

    Hit Save model. It's now selectable in the model dropdown on the analysis page — both for you and anyone else on your team.

    Behind the scenes: The checkpoint is registered in the model registry and copied to the shared models directory. Production traffic can use it immediately.
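
The behind-the-scenes notes in steps 1, 4, and 6 describe a simple data flow: uploads are deduplicated, only images with a saved correction enter the training dataset, and each retrain draws on every correction saved so far. The Python sketch below illustrates that idea only; it is not VolkCell's code. Content hashing as the deduplication method, the `Session` class, and every method and field name are assumptions made for illustration.

```python
import hashlib
from dataclasses import dataclass, field


def content_key(data: bytes) -> str:
    """Fingerprint a file by its bytes (an assumed dedup strategy)."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class Session:
    """Hypothetical training session: uploads plus saved corrections."""
    images: dict = field(default_factory=dict)       # content key -> filename
    corrections: dict = field(default_factory=dict)  # content key -> corrected mask

    def upload(self, filename: str, data: bytes) -> bool:
        """Store an image; return False if identical bytes were already uploaded."""
        key = content_key(data)
        if key in self.images:
            return False
        self.images[key] = filename
        return True

    def save_corrected_mask(self, data: bytes, mask) -> None:
        """Record (or overwrite) the corrected mask for an uploaded image."""
        key = content_key(data)
        if key not in self.images:
            raise KeyError("image was never uploaded")
        self.corrections[key] = mask

    def training_dataset(self) -> list:
        """Pair each corrected mask with its image; uncorrected images are left out."""
        return [(self.images[k], m) for k, m in self.corrections.items()]


session = Session()
session.upload("a.png", b"image-a")
session.upload("a_copy.png", b"image-a")  # same bytes, so it is skipped
session.upload("b.png", b"image-b")

session.save_corrected_mask(b"image-a", [[0, 1], [0, 2]])
first_run = session.training_dataset()    # one image/mask pair

session.save_corrected_mask(b"image-b", [[1, 1], [0, 0]])
second_run = session.training_dataset()   # the earlier pair plus the new one
```

In this sketch the second run contains everything the first run did, which is the sense in which the dataset only grows between iterations.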

Tips for a strong model

Diversity beats volume

Ten images covering low / medium / high density and clean / messy fields will beat thirty near-identical ones every time.

Correct the failures

When iterating, prioritise correcting the images where the previous model did worst. That's where the model learns the most.

Be ruthless about edges

Touching cells are the hardest case. A few well-corrected clusters teach the model far more than a hundred isolated cells.

Don't over-correct

If a cell is genuinely ambiguous, leave it alone. Forcing a label you'd disagree with tomorrow makes the model worse, not better.

Help

How to read the validation benchmark

Start with the fresh test image overlay. If that overlay looks wrong, the model is not ready no matter what the numbers say. The benchmark scores are there to quantify what you see, not to replace your judgment.

F1 balances misses and false positives. Higher is better.
IoU measures overlap with the corrected mask. Higher is better.
Precision tells you how many predicted cells/pixels were actually right. Higher is better.
Recall tells you how many real cells/pixels were found. Higher is better.
Count error shows how far the predicted count is from the corrected count. Lower is better.

Important: these scores currently benchmark the model on images you already corrected during training. They are useful for measuring improvement, but they are not as strong as a true held-out test set with fresh human corrections.
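
As a rough illustration of how these five scores relate, the Python sketch below computes them for one tiny image. It assumes pixel-level precision, recall, F1, and IoU on the foreground, and a count error defined as the absolute difference in cell counts; VolkCell's exact definitions (for example, matching at the cell level) may differ. The `benchmark` function and the toy masks are hypothetical. Each mask is an instance map: 0 is background and every cell has its own positive integer.

```python
def benchmark(predicted, corrected):
    """Compare two instance maps given as lists of rows of ints (0 = background)."""
    tp = fp = fn = 0
    for pred_row, true_row in zip(predicted, corrected):
        for p, t in zip(pred_row, true_row):
            if p and t:
                tp += 1   # foreground in both masks
            elif p:
                fp += 1   # predicted foreground, corrected background
            elif t:
                fn += 1   # corrected foreground that was missed

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0

    def count(mask):
        # Number of distinct non-zero labels, i.e. number of cells.
        return len({v for row in mask for v in row if v})

    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "iou": iou,
        "count_error": abs(count(predicted) - count(corrected)),
    }


# Corrected mask: two cells (labels 1 and 2).
corrected = [
    [1, 1, 0, 0],
    [1, 1, 0, 2],
    [0, 0, 0, 2],
]
# Prediction: finds cell 1 with one extra pixel, misses cell 2 entirely.
predicted = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
]

scores = benchmark(predicted, corrected)
```

For these toy masks, precision is 0.8 (one of five predicted pixels is wrong), recall is about 0.67 (two of six real pixels are missed), IoU is about 0.57, and the count error is 1. The prediction looks precise but undercounts, which is why it helps to read the scores together rather than one at a time.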

Common questions

How many images do I really need?
One is the minimum to unlock training, but in practice 5–10 corrected images give you a usable model and 15–20 give you a great one.
How long does training take?
On the default "fast" preset, 1–3 minutes for a small dataset. The "balanced" preset takes 5–10 minutes and usually generalises better.
Can I keep using the app while it trains?
Yes. Training runs on a separate GPU container — analysis traffic isn't blocked. You can even close the tab and come back later.
My new model is worse than the default. What now?
That usually means too few corrections, or that the corrections themselves have inconsistencies. Go back to step 3, audit your saved masks for mistakes, add a few more diverse images, and retrain.
Can I delete a model I no longer want?
Yes — head to the model registry from the dashboard and remove it. Existing analyses that already used it are unaffected.

Ready to start?

You can come back to this page any time from the training screen.

Start training