Skipilot Methodology

How we score every ski.

A note on the math behind Skipilot's recommendation engine: the twelve dimensions we score, how the weights shift to your inputs, and what we deliberately leave out.

Micah Gao · Founder · April 2026

The honest truth about ski reviews is that almost none of them are about you. A typical buyer's guide cycles through hundreds of skis a season. Each one gets a few hours with three or four testers, the notes get averaged, and the magazine crowns winners. The winners are the skis that did well across an averaged range of testers. That's a useful signal if you happen to be average. It's not very useful if you're five-foot-four, ski intermediate trees in Vermont, and got into the sport last year.

Skipilot was built around a different premise. Instead of asking which skis are best, we ask which ski is best for you. Not in a feature-flag, “filter by skill level” sense, but on twelve specific dimensions calibrated to your answers, your region, and your stated priorities.

The system has two layers. Today, Claude (Anthropic's Sonnet model) scores candidate skis against a twelve-dimension rubric encoded in our system prompt and returns the top three. Over the course of 2026 we're moving the deterministic parts of that scoring (regional weighting, priority weighting, skill-ceiling logic) out of the prompt and into pure functions that run before the model is called. The model's job will narrow to writing the reasoning paragraphs, not deciding the rank order. The piece below describes the combined system: the rubric that's live today, and the deterministic weighting layer landing this year.

The twelve dimensions

Every ski gets a 0 to 100 score on twelve dimensions, grouped into four categories.

Body fit covers length, waist width, flex pattern, and rocker profile. These four answer a single question: does the ski's geometry match the body that's going to ski it?

Performance covers edge grip, forgiveness, stability at speed, and damping. These answer how the ski actually behaves in motion.

Terrain match covers float in soft snow, maneuverability, and versatility. These answer whether the ski suits the place you ski.

Progression is one dimension on its own: skill ceiling. It answers whether you'll outgrow the ski.

Each category does a job. Body fit covers the things you can measure. Performance covers the things you feel. Terrain match covers the things your geography decides for you. Skill ceiling covers what you'll regret in two years.
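As a minimal sketch, the rubric's shape could be written down as a mapping from category to dimension names. The dimension identifiers reuse the labels from the score tables in the worked example; the `RUBRIC` name, the category keys, and the `validate_scores` helper are assumptions for illustration, not Skipilot's actual schema.

```python
# Hypothetical layout: category -> dimension identifiers.
RUBRIC = {
    "body_fit": ["length_fit", "waist_width_fit", "flex_pattern", "rocker_profile"],
    "performance": ["edge_grip", "forgiveness", "stability", "damping"],
    "terrain_match": ["powder_float", "maneuverability", "versatility"],
    "progression": ["skill_ceiling"],
}

# Flattened list of all twelve dimensions, in rubric order.
ALL_DIMENSIONS = [dim for dims in RUBRIC.values() for dim in dims]


def validate_scores(scores: dict[str, float]) -> None:
    """Raise ValueError unless every dimension has a score from 0 to 100."""
    missing = set(ALL_DIMENSIONS) - set(scores)
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    for dim in ALL_DIMENSIONS:
        if not 0 <= scores[dim] <= 100:
            raise ValueError(f"{dim} out of range: {scores[dim]}")
```

Keeping the category grouping separate from the flat list makes it easy to show a per-category breakdown without changing how the twelve scores are averaged.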

A worked example: an East Coast carver, 8 days a year

The deterministic weighting layer described in this section is the system we're building toward. Today, Claude executes scoring directly from the rubric in our system prompt; the modifiers below describe how the math will run end to end once the deterministic layer ships later this year.

Say you tell us you ski 8 days a year, mostly groomers in Vermont, you describe yourself as an intermediate, and your top priority is edge grip. A candidate ski (call it ski A) gets these raw scores from its spec sheet:

length_fit         88
waist_width_fit    92
flex_pattern       78
rocker_profile     75
edge_grip          95
forgiveness        70
stability          88
damping            90
powder_float       60
maneuverability    78
versatility        82
skill_ceiling      88

Those raw scores are computed from the ski's specs against your inputs. They're not the final story. Three modifiers rescale some of them before the scores are averaged; any dimension no modifier touches keeps its raw score.

Regional weighting. East Coast skiers spend most of their season on hardpack and ice. Edge grip matters more there than in soft-snow regions, so we multiply it by 1.4. Powder float matters less, so we multiply it by 0.5. The same ski scored for a Pacific Northwest skier would get the reverse treatment: float weighted up, edge grip weighted down.

Ability weighting. An intermediate is still building consistency, so forgiveness moves to a 1.3 weight. Skill ceiling also gets a slight boost at this level (a 1.1 weight), because a year-one intermediate often improves faster than they expect, and a ski with a low ceiling stops paying off in 18 months.

Priority weighting. You said edge grip was your top priority, which stacks another 1.2 multiplier on top of the regional one. Edge grip on this ski now reads: 95 raw, multiplied by 1.4 for region, multiplied by 1.2 for priority, which comes to 159.6. That overshoots 100, which is fine; we cap each weighted score at 100 before averaging.

After modifiers, ski A looks like:

length_fit         88
waist_width_fit    92
flex_pattern       78
rocker_profile     75
edge_grip         100   (capped from 159.6)
forgiveness        91
stability          88
damping            90
powder_float       30
maneuverability    78
versatility        82
skill_ceiling      96.8

Average: 82.4. That's the match score you see on the recommendation card.
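The arithmetic above can be reproduced in a few lines. This is a sketch of the worked example only, not the production scoring code: the function names, the dictionary layout, and the choice to give every unlisted dimension a weight of 1.0 are assumptions. The raw scores and multipliers are the ones quoted in this section.

```python
# Raw scores for ski A, copied from the table above.
RAW = {
    "length_fit": 88, "waist_width_fit": 92, "flex_pattern": 78,
    "rocker_profile": 75, "edge_grip": 95, "forgiveness": 70,
    "stability": 88, "damping": 90, "powder_float": 60,
    "maneuverability": 78, "versatility": 82, "skill_ceiling": 88,
}

REGION_WEIGHTS = {"edge_grip": 1.4, "powder_float": 0.5}      # East Coast
ABILITY_WEIGHTS = {"forgiveness": 1.3, "skill_ceiling": 1.1}  # intermediate
PRIORITY_WEIGHT = 1.2                                         # stated top priority
CAP = 100.0


def weighted_scores(raw, region, ability, priority):
    """Apply region, ability, and priority multipliers, then cap at 100."""
    out = {}
    for dim, score in raw.items():
        # Assumption: dimensions without a listed modifier use a weight of 1.0.
        weight = region.get(dim, 1.0) * ability.get(dim, 1.0)
        if dim == priority:
            weight *= PRIORITY_WEIGHT
        out[dim] = min(CAP, score * weight)
    return out


def match_score(weighted):
    """Plain average of the capped, weighted dimension scores."""
    return sum(weighted.values()) / len(weighted)


weighted = weighted_scores(RAW, REGION_WEIGHTS, ABILITY_WEIGHTS, "edge_grip")
print(round(match_score(weighted), 1))  # 82.4
```

Because each function depends only on its arguments, the same inputs always produce the same match score, which is the property the deterministic layer is meant to provide.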

A different ski with raw scores tilted toward powder and away from edge grip would lose ground here, even if its unweighted average looked similar. That's the whole point: a ski's “objective” quality is meaningless without the skier in front of it.

Why these weights, specifically

The weights aren't arbitrary. They come from three places.

The geography. East Coast and European Alps skiers spend most of their season on hardpack and ice. The Rockies and Pacific Northwest invert that distribution. Weighting edge grip and float reciprocally by region is the simplest way to honor what skiers actually ski on rather than treating every resort as the same.

The progression curve. Beginner-to-intermediate is the steepest skill jump in the sport, and ski choices made at the beginner tier often feel sluggish a year later. We let the skill-ceiling weight ramp up at the intermediate tier specifically to discount skis you'll outgrow fastest.

The priorities you state. Asking what someone wants is more reliable than inferring it from inputs. We multiply your stated priority by a flat 1.2, no more. The ceiling matters: priorities should bias the ranking, not dictate it. A skier who says they want forgiveness most but skis primarily groomers in Vermont still needs edge grip in the mix, because a ski that fails on ice will feel terrible regardless of how forgiving it is.
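To make the "bias, not dictate" point concrete, here is a small sketch of how the multipliers might combine for that Vermont skier who prioritizes forgiveness. It assumes the skier is an intermediate, as in the worked example, and that the three modifiers simply multiply together; `effective_weights` and the constant names are hypothetical.

```python
EAST_COAST = {"edge_grip": 1.4, "powder_float": 0.5}
INTERMEDIATE = {"forgiveness": 1.3, "skill_ceiling": 1.1}
PRIORITY_MULTIPLIER = 1.2  # flat, regardless of which dimension is chosen


def effective_weights(priority, region=EAST_COAST, ability=INTERMEDIATE):
    """Combined multiplier for every dimension that any modifier touches."""
    dims = set(region) | set(ability) | {priority}
    weights = {}
    for dim in dims:
        weight = region.get(dim, 1.0) * ability.get(dim, 1.0)
        if dim == priority:
            weight *= PRIORITY_MULTIPLIER
        weights[dim] = weight
    return weights


weights = effective_weights("forgiveness")
# Forgiveness rises to roughly 1.56 (1.3 x 1.2), while edge grip keeps
# its regional 1.4 even though the skier did not ask for it.
```

Under these assumptions the stated priority lifts forgiveness above edge grip, but only slightly, so a ski that scores badly on ice still pays for it in the average.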

What we deliberately do not score

The longer the list of variables, the easier it is to feel like the answer is rigorous. Most of what could be added to this list would make the scoring worse, not better.

Brand reputation. The same brand makes great skis and disposable skis; brand-level scores would penalize good models from middling brands and reward middling models from prestige brands.
Topsheet graphics. The aesthetics matter to you, but they don't help us match you to a ski. We surface graphics on the recommendation card so you can decide.
Press reviews. Magazine and YouTube reviews have to please an editor and an algorithm. Their scores correlate with click-through, not with fit.
Popularity. The most-sold ski is not the most-fit ski. It's the ski with the best retail placement.
Price by itself. Your budget is a hard constraint; skis above it don't appear at all. Within budget, a $700 ski and a $1,200 ski compete on fit, not on which one is cheaper.
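The budget rule in the last item can be sketched as a filter that runs before ranking. The `Candidate` type, the `shortlist` function, and the sample skis are hypothetical; only the hard-constraint behavior, the top-three cutoff, and the $700 and $1,200 price points come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    price_usd: float
    match_score: float  # headline score, 0 to 100


def shortlist(candidates, budget_usd, top_n=3):
    """Drop skis above budget, then rank the rest on fit alone."""
    in_budget = [c for c in candidates if c.price_usd <= budget_usd]
    # Price is not part of the sort key: within budget, only fit matters.
    return sorted(in_budget, key=lambda c: c.match_score, reverse=True)[:top_n]


# Illustrative data only; the names and the scores for ski B and ski C are made up.
skis = [
    Candidate("ski A", 700, 82.4),
    Candidate("ski B", 1200, 85.0),
    Candidate("ski C", 1500, 99.0),
]
picks = shortlist(skis, budget_usd=1200)
```

With a $1,200 budget, ski C never appears no matter how well it scores, and ski B ranks ahead of ski A purely on fit.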

The honest limits

A few things you should know.

First, today the scoring runs end to end inside Claude. The match scores you see now are the model's read of the rubric in our system prompt, not a pure-function calculation. The deterministic weighting layer described above is what we're building toward, not what runs in production. Once the catalog ingestion and the scoring functions land later this year, the score itself becomes deterministic and the model's job narrows to writing the reasoning text.

Second, raw spec scores are opinionated. Translating “this ski has 92mm underfoot, a 17m turn radius, and a flat tail” into a 0 to 100 forgiveness score is a judgment call informed by published spec sheets, ski-shop tester notes, and time on snow. Two reasonable people would not give identical raw scores. The weighting layer sits on top of that judgment, not in place of it.

Third, no scoring system replaces a session with a real boot fitter. If you have a complicated foot, a recent injury, or the budget for one in-person fitting, do that and use Skipilot to narrow the shortlist before you go.

What you do with the score

The match score is a starting point, not a verdict. The recommendation card shows the per-dimension breakdown alongside the headline number, and the why-not-others section names the runners-up and what beat them. Pay attention to the breakdown more than the average: a ski at 84 with a low forgiveness score might be a worse match than a ski at 81 that's strong everywhere you care about.

Then go skiing.

Micah Gao is the founder of Skipilot.

April 30, 2026
