Building the Clixbee Score

💡 This is the second post in a short series covering my adventure into vibe-coding a photo app. Check out the first post for more context.

When I started testing Clixbee against real photo batches, something felt off.

The app produced scores, but analysis was slow and the results were inconsistent. Sharpness alone could take several seconds per photo. Composition was disabled and returning fallback values. Some analyzers had placeholder strings wired into the UI. Under heavier loads, concurrency issues caused intermittent crashes. The engine worked in principle, but not in practice.

This was a great opportunity to lean into Claude Code's CLI. After an assessment, Claude identified sharpness as the most obvious bottleneck. The analyzer was doing far too much work: it ran redundant Vision requests, executed a Laplacian convolution that required a GPU readback, and iterated across millions of pixels without sampling. There was also a vertical gradient bug that zeroed out half of the edge calculation. These are pain points that would have taken me weeks or months to find and understand on my own.

The fix was not clever either. Claude suggested I remove what was not contributing meaningful signal. Redundant Vision passes went away, then the Laplacian path was deleted. Sharpness analysis dropped from several seconds to a few dozen milliseconds. More importantly, the score started to stabilize.

This improvement pushed me to tackle the other analyzers with Claude next. Composition needed structural changes because Vision requests were running synchronously on Swift’s cooperative thread pool, which caused stalls on complex images. Claude suggested I move that work onto dedicated queues, batch related requests into a single perform() call with timeouts, and isolate contour detection so it could degrade gracefully. It also corrected a bias in the leading lines score by shifting the sigmoid curve. All of this happened automatically, with me reviewing and approving each change.
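The sigmoid fix is easiest to see in isolation. A minimal sketch, with hypothetical midpoint and steepness values rather than the app's actual curve: if the sigmoid's midpoint sits below where it should, mediocre line strengths already map to high scores, so recentering the midpoint removes the bias.

```python
import math

def leading_lines_score(strength, midpoint=0.5, steepness=10.0):
    """Map a raw line-strength value in [0, 1] to a score in (0, 1).

    With the midpoint at 0.5, an average input maps to an average
    score. A midpoint biased low (e.g. 0.3) inflates everything.
    """
    return 1.0 / (1.0 + math.exp(-steepness * (strength - midpoint)))
```

With the recentered curve, `leading_lines_score(0.5)` returns exactly 0.5, while the biased variant (`midpoint=0.3`) would score the same input well above that.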

Needless to say, I didn't stop there. With Claude chugging along, we moved on to exposure and color. Claude found concurrency issues, a noise normalization error, and weak execution rules. Lightweight timing logs now flag any analysis that exceeds 150 milliseconds, making performance regressions easier to catch early.

To avoid relying on large real images for testing, Claude helped me create regression tests using programmatically generated images. Gradients, noise fields, dark frames, and flat colors help verify score bands and relative ordering. They run quickly and expose unintended shifts in weighting.
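A sketch of what such fixtures can look like, again in illustrative Python with a toy stand-in metric (not Clixbee's analyzers): each generator is deterministic, so the tests can assert relative ordering, e.g. that a noise field always "scores" above a smooth gradient, which in turn scores above a flat color.

```python
import random

def make_flat(w, h, value=128):
    """Flat color frame."""
    return [value] * (w * h)

def make_gradient(w, h):
    """Horizontal gradient, 0..255 left to right."""
    return [x * 255 // (w - 1) for y in range(h) for x in range(w)]

def make_noise(w, h, seed=42):
    """Noise field; seeded so regression tests stay deterministic."""
    rng = random.Random(seed)
    return [rng.randrange(256) for _ in range(w * h)]

def mean_abs_gradient(pixels, w, h):
    """Toy detail metric: mean |difference| between horizontal neighbours."""
    total = sum(abs(pixels[y * w + x + 1] - pixels[y * w + x])
                for y in range(h) for x in range(w - 1))
    return total / (h * (w - 1))
```

Since no image files are involved, a full suite of these runs in milliseconds, and any change to analyzer weighting that flips the expected ordering fails immediately.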

The best part, in my opinion, is that the engine stays entirely on device. It evaluates each image and returns a score without uploading anything to the cloud. Calling back to the original premise of Clixbee: the photos are already good. Let's just find out which one is the best.

Next up: finalizing the design, preparing the App Store build, and releasing. Stay tuned!