
App Store Ratings

Kit's App Store has a quality signal problem. Today, creators browse apps that look identical — same card layout, same description format — with no way to know whether an integration is well-maintained and loved, or whether it was abandoned after a few commits. The only feedback loop is installing, waiting, and hoping. This creates decision anxiety that slows installs and, for low-quality apps, generates churn that Kit can neither see nor attribute.

This prototype explores what it would look like to add creator ratings and reviews to the App Store: on browse cards, on app detail pages, and through a submission flow that captures meaningful signal without interrupting the experience. We want to validate three things: whether ratings on cards help creators decide while browsing, whether the submission timing and flow feel right, and whether the social proof section on the detail page actually increases installs.

What's in this prototype
Ratings on browse cards
Every app card in the browse grid now shows a one-decimal rating and review count (e.g. ★ 4.3 · 847 reviews) beneath the description. Apps below the 10-review threshold show nothing (not a zero, not a placeholder) to avoid misleading signals from small samples. The sort dropdown also gains a "Top rated" option, which reorders the grid by a weighted score rather than a raw average.
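A minimal sketch of the card label rule, assuming the prototype's 10-review threshold and the "★ 4.3 · 847 reviews" format. The function name and the `None` return for hidden ratings are illustrative choices, not Kit's code; returning `None` rather than a placeholder string lets the card render nothing at all.

```python
from typing import Optional

MIN_REVIEWS_TO_DISPLAY = 10  # threshold from the prototype


def card_rating_label(average: float, review_count: int) -> Optional[str]:
    """Return the browse-card rating label, or None to render nothing.

    Below the threshold there is deliberately no "0.0" and no placeholder.
    """
    if review_count < MIN_REVIEWS_TO_DISPLAY:
        return None
    # One decimal place for the score; thousands separator for the count.
    return f"★ {average:.1f} · {review_count:,} reviews"
```

For example, `card_rating_label(4.8, 1200)` produces "★ 4.8 · 1,200 reviews", while any app with nine or fewer reviews gets `None`.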
Filterable rating histogram
The app detail page gets a rating section above Key Features: a large aggregate score, five distribution bars (5★ to 1★) with review counts, and a row of filter chips (All · Most recent · Most helpful · Critical). Each histogram bar is clickable; tapping the 1★ bar filters the reviews below to show only 1★ reviews. A J-shaped distribution (many 4–5★ reviews, a small tail of low ratings) reads as authentic, while a flat distribution or an all-5★ spike invites suspicion of gaming.
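The histogram and the bar-click filter can be sketched as below. The `Review` shape is hypothetical, and the sketch covers only the star-level filter; how the "Most helpful" and "Critical" chips rank or select reviews isn't specified in the prototype, so they're left out.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Review:
    stars: int  # 1-5
    text: str


def histogram(reviews: list[Review]) -> dict[int, int]:
    """Review count per star level, ordered 5★ to 1★, including empty levels."""
    counts = Counter(r.stars for r in reviews)
    return {level: counts[level] for level in range(5, 0, -1)}


def bar_widths(hist: dict[int, int]) -> dict[int, float]:
    """Each bar's share of all reviews (0.0-1.0), used as its rendered length."""
    total = sum(hist.values())
    return {level: (n / total if total else 0.0) for level, n in hist.items()}


def reviews_for_bar(reviews: list[Review], level: int) -> list[Review]:
    """What clicking a histogram bar does: keep only reviews at that star level."""
    return [r for r in reviews if r.stars == level]


def aggregate_score(reviews: list[Review]) -> float:
    """The large headline number: the plain mean of all star ratings."""
    if not reviews:
        raise ValueError("no reviews to aggregate")
    return sum(r.stars for r in reviews) / len(reviews)
```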
Creator reviews with identity context
Each review card shows: avatar initials, creator type (Solo creator / Agency / SMB), how long they've been on Kit, a "Verified install" badge, usage duration ("Using for 8 months"), star rating, and review text. The B2B pattern from G2 and Capterra shows that context matters more than volume — a solo creator and an agency both need to see reviews from accounts like theirs before trusting the signal.
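One way to model the identity context on a review card, as a sketch: every field and method name here is hypothetical. Creator type is optional because the submission flow makes identity opt-in, while tenure and install data are assumed to come from Kit's account and install records.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class CreatorType(Enum):
    SOLO = "Solo creator"
    AGENCY = "Agency"
    SMB = "SMB"


def plural(n: int, unit: str) -> str:
    """'1 year' but '4 years'."""
    return f"{n} {unit}" if n == 1 else f"{n} {unit}s"


@dataclass
class ReviewCard:
    reviewer_name: str
    creator_type: Optional[CreatorType]  # self-selected; identity is opt-in
    kit_tenure_years: int                # assumed to come from account data
    verified_install: bool               # assumed to come from install records
    months_using: int                    # assumed to come from install records
    stars: int
    text: str

    def initials(self) -> str:
        """Avatar initials from up to the first two words of the name."""
        return "".join(word[0].upper() for word in self.reviewer_name.split()[:2])

    def identity_line(self) -> str:
        """e.g. 'Agency · Kit member 4 years'; creator type omitted if not given."""
        parts = []
        if self.creator_type is not None:
            parts.append(self.creator_type.value)
        parts.append(f"Kit member {plural(self.kit_tenure_years, 'year')}")
        return " · ".join(parts)

    def trust_line(self) -> str:
        """e.g. 'Verified install · Using for 8 months'."""
        parts = []
        if self.verified_install:
            parts.append("Verified install")
        parts.append(f"Using for {plural(self.months_using, 'month')}")
        return " · ".join(parts)
```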
Rating submission flow
A three-step modal: (1) five large interactive stars — hover highlights, tap sets, no submit button visible until a star is clicked; (2) optional text area and a creator-type selector (self-categorisation feeds the reviewer identity card); (3) confirmation screen. Stars come first and are the minimum viable submission — text and identity are opt-in so friction doesn't suppress the signal.
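The three-step flow maps onto a small state machine. This is a sketch under assumptions: the class and method names are hypothetical, and it assumes a rating can be submitted straight from step 1 (stars only) or after the optional step 2.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class Step(Enum):
    STARS = auto()      # step 1: pick a star rating
    DETAILS = auto()    # step 2: optional text and creator type
    CONFIRMED = auto()  # step 3: confirmation screen


@dataclass
class RatingModal:
    step: Step = Step.STARS
    hovered: Optional[int] = None
    stars: Optional[int] = None
    text: str = ""
    creator_type: Optional[str] = None

    @staticmethod
    def _check(n: int) -> int:
        if not 1 <= n <= 5:
            raise ValueError("star rating must be between 1 and 5")
        return n

    def hover(self, n: Optional[int]) -> None:
        """Preview highlight only (None when the pointer leaves); commits nothing."""
        self.hovered = None if n is None else self._check(n)

    def tap(self, n: int) -> None:
        """Commit the rating; this is what makes the submit button appear."""
        self.stars = self._check(n)

    def highlighted(self) -> int:
        """Stars lit right now: the hover preview wins over the committed value."""
        return self.hovered if self.hovered is not None else (self.stars or 0)

    def submit_visible(self) -> bool:
        return self.stars is not None

    def continue_to_details(self) -> None:
        if not self.submit_visible():
            raise RuntimeError("pick a star rating first")
        self.step = Step.DETAILS

    def submit(self) -> dict:
        """Stars are the minimum viable submission; text and identity may be empty."""
        if not self.submit_visible():
            raise RuntimeError("pick a star rating first")
        self.step = Step.CONFIRMED
        return {
            "stars": self.stars,
            "text": self.text.strip() or None,
            "creator_type": self.creator_type,
        }
```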
Post-use rating prompt
A dismissible amber banner on the app detail page, shown to creators who have had the app installed for 30+ days: "You've been using Canva for a month. How has it been so far?" with inline star taps and a "Not now" dismiss. The prompt is non-blocking and appears only on the detail page, never as a modal that interrupts a workflow. The prototype shows the 30-day version; one open question is whether a usage-milestone trigger would be better.
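The 30-day eligibility check might look like this. The function and parameter names are hypothetical, and treating a "Not now" dismissal as permanent is an assumption; the prototype doesn't say whether the banner comes back later.

```python
from datetime import date, timedelta
from typing import Optional

PROMPT_AFTER = timedelta(days=30)  # the 30-day version shown in the prototype


def should_show_rating_banner(
    installed_on: Optional[date],
    today: date,
    has_rated: bool,
    dismissed_on: Optional[date],
    on_detail_page: bool,
) -> bool:
    # Only on the app detail page, never as an interrupt elsewhere.
    if not on_detail_page:
        return False
    # Nothing to ask if the app isn't installed or the creator already rated it.
    if installed_on is None or has_rated:
        return False
    # Assumption: "Not now" hides the banner for good.
    if dismissed_on is not None:
        return False
    return today - installed_on >= PROMPT_AFTER
```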
Top-rated sort
Adds "Top rated" to the browse sort dropdown alongside Popular, A–Z, and Newest. Apps without a qualifying rating (fewer than 10 reviews) are pushed to the end rather than excluded, so they remain discoverable while well-rated apps surface first. The prototype sorts by weighted score, not raw average, so a new app with a single 5★ review can't top the list above Canva with 1,200+ reviews.
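A sketch of the "pushed to the end, not excluded" ordering. The weighted score is passed in as a function so the partition logic stays separate from how the score is computed; `App` and `sort_top_rated` are hypothetical names. Python's `sorted` is stable, so unrated apps keep their incoming order.

```python
from dataclasses import dataclass
from typing import Callable

MIN_REVIEWS_FOR_RANKING = 10  # same threshold the cards use


@dataclass
class App:
    name: str
    average: float
    review_count: int


def sort_top_rated(apps: list[App], weighted_score: Callable[[App], float]) -> list[App]:
    """Rated apps first, by descending weighted score; unrated apps after them."""

    def key(app: App) -> tuple[int, float]:
        if app.review_count >= MIN_REVIEWS_FOR_RANKING:
            return (0, -weighted_score(app))
        # Unrated apps share one key, so the stable sort keeps their incoming order.
        return (1, 0.0)

    return sorted(apps, key=key)
```

Passing the raw average as `weighted_score` is only useful for checking the partition; in practice it would be the blended sort score.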
Design & product decisions
10-review minimum before any rating is shown publicly — with a bootstrapping question to answer
Ratings with small sample sizes are misleading in both directions: a new app with two 5★ reviews from friends looks great, while one dissatisfied early user can tank an otherwise good app. The default instinct is to show something, even a provisional badge, but the research behind this prototype argues against it. HubSpot uses a hard 10-review minimum for its B2B marketplace, and this prototype adopts the same threshold.
The risk is launch-day silence. Kit's App Store has a relatively small number of apps, and many will have modest install bases. If most apps don't cross 10 reviews for months, the feature could make the browse experience worse, not better: a grid where most cards show no rating at all. Before committing to 10, we need to model this against actual install data: what percentage of Kit apps would show a rating at launch? At 6 months? If the answer is "very few," the options are a lower provisional threshold with a clear caveat (e.g. "Based on 4 reviews"), a proactive seeding campaign at launch, or a phased rollout starting with the highest-install apps.
HubSpot App Marketplace · B2B rating standards
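The launch-coverage question takes only a few lines once per-app review counts are available. This sketch assumes a plain list of review counts, one per app (a hypothetical input); answering the 6-month question would mean feeding in projected counts, which aren't modeled here. It reports the share of apps that would show a rating at each candidate threshold.

```python
def rating_coverage(review_counts: list[int], threshold: int) -> float:
    """Share of apps (0.0-1.0) whose review count meets the threshold."""
    if not review_counts:
        return 0.0
    return sum(count >= threshold for count in review_counts) / len(review_counts)


def coverage_by_threshold(review_counts: list[int], thresholds: list[int]) -> dict[int, float]:
    """Compare candidates, e.g. the 10-review minimum vs. a lower provisional one."""
    return {t: rating_coverage(review_counts, t) for t in thresholds}
```

With made-up counts `[0, 2, 12, 150]`, a threshold of 10 gives 0.5: half the grid would show a rating.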
Install-verified badge + usage duration on every review card
The core trust problem with review systems isn't fake 1★ attacks — it's inflated 5★ reviews from developers promoting their own apps, or from users who installed briefly and never really used it. Both patterns produce ratings that don't reflect actual experience.
Shopify shows a "Verified install" indicator on every review and a usage duration stamp ("Over 6 years using the app") that communicates depth of experience without requiring the reviewer to say anything. This is structural anti-gaming — the system validates the review at write-time — rather than relying on moderation or report buttons after the fact. The prototype implements both: "Verified install · Using for X months" appears on every review card. Kit has this data from its install records; no additional verification step is required from the reviewer.
Shopify App Store review cards
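A sketch of write-time verification, assuming install records keyed by account and app (the `InstallRecord` shape and function names are hypothetical). Two further assumptions, not stated in the prototype: creators without an install record can't submit a review, and the usage duration is stamped when the review is written rather than recomputed on display.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class InstallRecord:
    """Hypothetical shape of one row of Kit's install data."""
    account_id: str
    app_id: str
    installed_on: date


def whole_months_between(start: date, end: date) -> int:
    """Completed calendar months from start to end (never negative)."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1  # the current month isn't complete yet
    return max(months, 0)


def stamp_review(
    account_id: str,
    app_id: str,
    stars: int,
    installs: dict[tuple[str, str], InstallRecord],
    today: date,
) -> dict:
    """Attach install-verified context to a review at write time.

    Assumption: a review without a matching install record is rejected,
    so every stored review can carry the "Verified install" badge.
    """
    record = installs.get((account_id, app_id))
    if record is None:
        raise PermissionError("no install record for this account and app")
    return {
        "account_id": account_id,
        "app_id": app_id,
        "stars": stars,
        "verified_install": True,
        "months_using": whole_months_between(record.installed_on, today),
    }
```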
Reviewer identity: creator type + Kit tenure, not company name
B2B review platforms (G2, Capterra, TrustRadius) show reviewer role and company size on every review because professional buyers need to know if feedback is relevant to their context. A 500-person enterprise's experience with an integration is meaningfully different from a solo creator's.
Kit doesn't have verified company-size data, and asking for it is friction most reviewers won't tolerate. The prototype adapts the pattern to Kit's context: reviewers self-select a creator type at submission (Solo creator / Agency / SMB), and their Kit tenure is surfaced automatically from account data ("Kit member 4 years"). This gives readers enough context to judge relevance (a solo creator can mentally filter for reviews from people like them) without requiring Kit to build a verification system or handle sensitive business data.
G2 / Capterra review card design · UX research on B2B review systems
30-day calendar trigger as the default, with a usage-milestone alternative worth exploring
Most review prompts arrive too early — immediately after install, before the user has done anything meaningful. HubSpot sends an automated 30-day post-install email, which is a reasonable heuristic: if you've had something installed for a month, you've either found value or you haven't.
The weakness is that calendar time doesn't track usage. A creator who installed Canva 35 days ago but hasn't synced a single asset yet has no experience to share. For Kit, a stronger signal might be milestone-based: after the creator has sent their first broadcast using an image from the app, or after an automation using the integration has triggered for the first time. The prototype implements the 30-day banner (simpler, and modeled on HubSpot's 30-day heuristic), but the question for the team is whether Kit has the usage event data to support milestone-based triggering, and whether the engineering cost is justified.
HubSpot App Marketplace · Shopify developer rating guidance
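To make the calendar-versus-milestone comparison concrete, here is a sketch of both triggers side by side. The event names in `MILESTONE_EVENTS` and the `UsageEvent` shape are hypothetical stand-ins for the milestones described above; whether Kit records such events is exactly the open question.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical event names standing in for the milestones described above.
MILESTONE_EVENTS = {"broadcast_sent_with_app_image", "app_automation_first_triggered"}


@dataclass(frozen=True)
class UsageEvent:
    app_id: str
    name: str
    at: datetime


def calendar_trigger(installed_at: datetime, now: datetime, days: int = 30) -> bool:
    """The prototype's default: prompt once the app has been installed for N days."""
    return now - installed_at >= timedelta(days=days)


def milestone_trigger(app_id: str, events: list[UsageEvent]) -> bool:
    """Alternative: prompt only after a meaningful usage event for this app."""
    return any(e.app_id == app_id and e.name in MILESTONE_EVENTS for e in events)
```

The 35-day Canva case shows the gap: `calendar_trigger` fires, while `milestone_trigger` stays false until a milestone event is recorded.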
Developer response to reviews — a V1 decision, not a future-work deferral
The prototype covers the creator-facing experience thoroughly but says nothing about what happens when a developer receives a 1-star review they believe is unfair, inaccurate, or based on a misunderstanding. This isn't a minor edge case — it's the first question any app developer will ask when they hear Kit is adding ratings, and it directly affects the developer goodwill the Extensibility team depends on.
Shopify and HubSpot both let developers post a public reply to reviews. This serves two functions: it gives developers a legitimate channel to address criticism, and it signals to readers that the developer is engaged and responsive. The prototype does not include a developer reply UI, and V1 probably shouldn't — it adds surface area and moderation complexity. But this needs to be an explicit decision with reasoning, not a silence. The open question is: if a developer receives a damaging review they believe is wrong, what recourse do they have at launch? A report/flag mechanism is the minimum viable answer. Without it, the feature risks going live with no developer safety valve.
Shopify App Store developer responses · HubSpot App Marketplace review replies
Weighted sort score vs. raw average for "Top rated" ordering
A new app with a single 5★ review would rank above Canva with 4.8★ from 1,200 reviews if sorted by raw average. This is the canonical gaming problem for any rating-sorted list, and it makes "Top rated" untrustworthy within days of launch.
Shopify's algorithm blends the app's own average toward a platform-wide baseline, weighted by review count: the fewer reviews an app has, the more its sort score is pulled toward the global mean. An app with a single 5.0★ review might be ranked as if it had 4.1★; once it has 50+ reviews, the blend steps back and the app's own average dominates. The prototype displays the unmodified average (e.g. 5.0★) on the app's own detail page for transparency, but sorts the browse grid by the weighted score. This distinction between display score and sort score is a decision the engineering team will want to weigh in on.
Shopify App Store ranking algorithm · Statistical modelling (Bayesian average)
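A Bayesian average is one standard way to implement this kind of blend; it is not necessarily Shopify's exact formula. In the sketch below, the prior mean (4.0) and prior weight (10 reviews) are hypothetical values chosen for illustration. With them, a single 5.0★ review scores about 4.1, while Canva's 4.8★ from 1,200 reviews scores about 4.79, so Canva ranks first.

```python
def bayesian_average(
    average: float,
    review_count: int,
    prior_mean: float = 4.0,     # hypothetical platform-wide mean rating
    prior_weight: float = 10.0,  # hypothetical: acts like 10 "virtual" reviews at the mean
) -> float:
    """Sort score: pull an app's average toward the platform mean.

    score = (prior_weight * prior_mean + review_count * average)
            / (prior_weight + review_count)

    With few reviews the prior dominates; as review_count grows,
    the app's own average takes over.
    """
    if review_count < 0:
        raise ValueError("review_count must be non-negative")
    return (prior_weight * prior_mean + review_count * average) / (prior_weight + review_count)


def scores(average: float, review_count: int) -> dict[str, float]:
    """Display score (raw average, shown on the detail page) vs. sort score."""
    return {"display": average, "sort": bayesian_average(average, review_count)}
```

Tuning `prior_weight` controls how many reviews an app needs before its own average dominates; at 50 reviews with a weight of 10, the prior contributes about a sixth of the score.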
Questions for the team
Things to explore and validate