10-review minimum before any rating is shown publicly — with a bootstrapping question to answer
Ratings with small sample sizes mislead in both directions: a new app with two 5★ reviews from friends looks great, while one dissatisfied early user can tank an otherwise good app. The default instinct is to show something, even a provisional badge. The research argues against it. HubSpot uses a hard 10-review minimum for its B2B marketplace, which this prototype adopts.
The risk is launch-day silence. Kit's App Store has a relatively small number of apps and many will have modest install bases. If most apps don't cross 10 reviews for months, the feature could make the browse experience worse rather than better, with a grid full of "Not yet rated" labels. Before committing to 10, we need to model this against actual install data: what percentage of Kit apps would show a rating at launch? At 6 months? If the answer is "very few," the right response is one of three options: a lower provisional threshold with a clear caveat (e.g. "Based on 4 reviews"), a proactive seeding campaign at launch, or a phased rollout starting with the highest-install apps.
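The coverage question can be roughed out before real data is pulled. The sketch below is a minimal illustration, not Kit's data model: `projected_review_counts`, `rating_coverage`, the `reviews_per_thousand` conversion rate, and the install figures are all hypothetical placeholders to be swapped for actual install records.

```python
from typing import Iterable, List


def projected_review_counts(installs: Iterable[int], reviews_per_thousand: int) -> List[int]:
    """Estimate reviews per app from install counts.

    reviews_per_thousand is an assumed conversion rate (reviews left per
    1,000 installs); it is a modelling input, not a measured Kit figure.
    """
    return [n * reviews_per_thousand // 1000 for n in installs]


def rating_coverage(review_counts: Iterable[int], min_reviews: int = 10) -> float:
    """Fraction of apps that would show a public rating at a given threshold."""
    counts = list(review_counts)
    if not counts:
        return 0.0
    shown = sum(1 for c in counts if c >= min_reviews)
    return shown / len(counts)


# Illustrative install counts only; replace with real per-app install data.
sample_installs = [5000, 1200, 400, 150, 60]
sample_reviews = projected_review_counts(sample_installs, reviews_per_thousand=10)

coverage_at_10 = rating_coverage(sample_reviews, min_reviews=10)
coverage_at_4 = rating_coverage(sample_reviews, min_reviews=4)
```

Running the same function at several thresholds and time horizons (launch, 6 months) would show how much of the grid is left with "Not yet rated" labels under each option.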
HubSpot App Marketplace · B2B rating standards
Install-verified badge + usage duration on every review card
The core trust problem with review systems isn't fake 1★ attacks. It's inflated 5★ reviews from developers promoting their own apps, or from users who installed an app briefly and never really used it. Both patterns produce ratings that don't reflect actual experience.
Shopify shows a "Verified install" indicator on every review and a usage duration stamp ("Over 6 years using the app") that communicates depth of experience without requiring the reviewer to say anything. This is structural anti-gaming — the system validates the review at write-time — rather than relying on moderation or report buttons after the fact. The prototype implements both: "Verified install · Using for X months" appears on every review card. Kit has this data from its install records; no additional verification step is required from the reviewer.
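As a rough sketch of how the badge text could be derived from install records, the function below builds the "Verified install · Using for X months" label from two dates. The name `usage_badge`, its parameters, and the wording for durations under one month or over one year are assumptions made for illustration; the prototype only specifies the months form.

```python
from datetime import date
from typing import Optional


def usage_badge(installed_on: Optional[date], reviewed_on: date) -> Optional[str]:
    """Return the review-card badge, or None when no valid install record exists."""
    if installed_on is None or installed_on > reviewed_on:
        return None  # no verified install, so no badge

    # Whole calendar months between install and review.
    months = (reviewed_on.year - installed_on.year) * 12 + (
        reviewed_on.month - installed_on.month
    )
    if reviewed_on.day < installed_on.day:
        months -= 1

    if months < 1:
        duration = "less than a month"  # assumed wording
    elif months == 1:
        duration = "1 month"
    elif months < 12:
        duration = f"{months} months"
    else:
        years = months // 12
        duration = "1 year" if years == 1 else f"{years} years"  # assumed wording

    return f"Verified install · Using for {duration}"
```

Because both dates would come from platform records rather than from the reviewer, the badge cannot be self-declared, which is the point of the structural approach.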
Shopify App Store review cards
Reviewer identity: creator type + Kit tenure, not company name
B2B review platforms (G2, Capterra, TrustRadius) show reviewer role and company size on every review because professional buyers need to know if feedback is relevant to their context. A 500-person enterprise's experience with an integration is meaningfully different from a solo creator's.
Kit doesn't have verified company-size data, and asking for it is friction most reviewers won't tolerate. The prototype adapts the pattern to Kit's context: reviewers self-select a creator type at submission (Solo creator / Agency / SMB), and their Kit tenure is surfaced automatically from account data ("Kit member 4 years"). This gives readers enough context to judge relevance — a solo creator can filter mentally to reviews from people like them — without requiring Kit to build a verification system or handle sensitive business data.
G2 / Capterra review card design · UX research on B2B review systems
30-day calendar trigger as the default, with a usage-milestone alternative worth exploring
Most review prompts arrive too early — immediately after install, before the user has done anything meaningful. HubSpot sends an automated 30-day post-install email, which is a reasonable heuristic: if you've had something installed for a month, you've either found value or you haven't.
The weakness is that calendar time doesn't track usage. A creator who installed Canva 35 days ago but hasn't synced a single asset yet has no experience to share. For Kit, a stronger signal might be milestone-based: after the creator has sent their first broadcast using an image from the app, or after an automation using the integration has triggered for the first time. The prototype implements the 30-day banner (simpler, known to work), but the question for the team is whether Kit has the usage event data to support milestone-based triggering and whether the engineering cost is justified.
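The two trigger strategies can be compared side by side in a few lines. This is a minimal sketch under stated assumptions: `should_prompt`, `first_milestone_on`, and `require_milestone` are hypothetical names, and the milestone is treated as a single date supplied by whatever usage event Kit can actually record.

```python
from datetime import date, timedelta
from typing import Optional

PROMPT_DELAY = timedelta(days=30)  # the calendar heuristic described above


def should_prompt(
    installed_on: date,
    today: date,
    first_milestone_on: Optional[date] = None,
    require_milestone: bool = False,
) -> bool:
    """Decide whether to show the review prompt.

    Calendar mode: prompt once the app has been installed for 30 days.
    Milestone mode: prompt only after a first meaningful usage event
    (for example, a first broadcast sent using the integration).
    """
    if not require_milestone:
        return today - installed_on >= PROMPT_DELAY
    return first_milestone_on is not None and first_milestone_on <= today


# A creator who installed 35 days ago but has not used the app yet.
installed = date(2024, 1, 1)
now = installed + timedelta(days=35)
calendar_result = should_prompt(installed, now)
milestone_result = should_prompt(installed, now, require_milestone=True)
```

In this example the calendar trigger fires while the milestone trigger stays silent, which is the gap between the two approaches that the team would be paying engineering cost to close.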
HubSpot App Marketplace · Shopify developer rating guidance
Developer response to reviews — a V1 decision, not a future-work deferral
The prototype covers the creator-facing experience thoroughly but says nothing about what happens when a developer receives a 1-star review they believe is unfair, inaccurate, or based on a misunderstanding. This isn't a minor edge case — it's the first question any app developer will ask when they hear Kit is adding ratings, and it directly affects the developer goodwill the Extensibility team depends on.
Shopify and HubSpot both let developers post a public reply to reviews. This serves two functions: it gives developers a legitimate channel to address criticism, and it signals to readers that the developer is engaged and responsive. The prototype does not include a developer reply UI, and V1 probably shouldn't, since it adds surface area and moderation complexity. But this needs to be an explicit decision with stated reasoning, not an omission. The open question is: if a developer receives a damaging review they believe is wrong, what recourse do they have at launch? A report/flag mechanism is the minimum viable answer. Without it, the feature risks going live with no developer safety valve.
Shopify App Store developer responses · HubSpot App Marketplace review replies
Weighted sort score vs. raw average for "Top rated" ordering
A new app with a single 5★ review would rank above Canva with 4.8★ from 1,200 reviews if sorted by raw average. This is the canonical gaming problem for any rating-sorted list, and it makes "Top rated" untrustworthy within days of launch.
Shopify's algorithm pulls each app's own average toward a platform-wide baseline, weighted by review count: the fewer reviews an app has, the more its sort score is drawn toward the global mean. An app with 1 review at 5.0★ might be scored as 4.1★ for sorting purposes; once it has 50+ reviews the algorithm steps back and lets the true average dominate. The prototype displays the unmodified average (e.g. 5.0★) on the app's own detail page for transparency, but sorts the browse grid by the weighted score. This distinction between display score and sort score is a decision the engineering team will want to weigh in on.
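One common way to get this behaviour is a Bayesian average, sketched below. It is not Shopify's actual formula, which is not reproduced here. The baseline `GLOBAL_MEAN` and the weight `PRIOR_WEIGHT` are hypothetical values, chosen only so that a single 5.0★ review lands at 4.1★ as in the example above; real values would be fitted to Kit's own review data.

```python
from typing import List, Tuple

GLOBAL_MEAN = 4.0   # assumed platform-wide average rating
PRIOR_WEIGHT = 9    # assumed number of "phantom" reviews at the global mean


def sort_score(average: float, review_count: int) -> float:
    """Bayesian average: blend an app's own mean with the global mean.

    With few reviews the prior dominates; as review_count grows, the
    score converges on the app's own average.
    """
    total = PRIOR_WEIGHT * GLOBAL_MEAN + review_count * average
    return total / (PRIOR_WEIGHT + review_count)


def top_rated(apps: List[Tuple[str, float, int]]) -> List[str]:
    """Order app names by weighted sort score, highest first."""
    ranked = sorted(apps, key=lambda a: sort_score(a[1], a[2]), reverse=True)
    return [name for name, _, _ in ranked]


# (name, raw average shown on the detail page, review count)
apps = [("NewApp", 5.0, 1), ("Canva", 4.8, 1200)]
ordering = top_rated(apps)  # Canva ranks first despite the lower raw average
```

The raw average is still what a detail page would show; only the browse-grid ordering uses `sort_score`.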
Shopify App Store ranking algorithm · Statistical modelling (Bayesian average)