One of the things we’re trying hard to do with fuzzy.io through our private beta is to illustrate the uses of our platform with example applications. We spend a lot of time working with customers, helping them use fuzzy.io to solve problems in their businesses, and we want clearer ways to help them understand it.
We thought it would be fun to set aside a day to build something that would be useful for us and serve as a good example of how fuzzy.io can be used. Matt and I both enjoy using Product Hunt to discover new products, sites, and apps. So as our demo app, we decided to make a web site that rates and ranks products on Product Hunt according to different criteria for each user. Here’s the project: Personal Hunt
When we launched it on Product Hunt, we were featured in the daily Top 10.
We’ve found that fuzzy.io works really well for personalization and recommendations, so this seemed like a good exercise to see how quickly we could build out a custom recommendation engine.
Deciding on outputs
With any fuzzy analysis, there are several steps you have to take. The first is deciding what kind of outputs you want to generate. For Personal Hunt, we were most interested in presenting the most relevant products to the user, in order from most relevant to least. So, it was clear that “relevance” was the output we were looking for. Since we didn’t have a fixed set of units, we went with 0-100 as a unitless scale: 0 being completely irrelevant, 100 being totally relevant.
Once we knew which outputs we were trying to generate, we had to choose some inputs that would influence those outputs. We decided to limit ourselves, for the time being, to data that we could get from the Product Hunt API itself. Product Hunt’s application is closely tied to Twitter, and we think there may be some more data we could obtain from connecting to a user’s Twitter stream, but we decided to see what value we could get from PH data first.
Once we’d reviewed the API documentation, we decided to concentrate our inputs on three main areas:
- Your behavior
  - Whether you voted up a related product
  - Whether you commented on a related product
- Your friends’ behavior
  - How many of the people you follow voted up the product
  - How many of the people you follow commented on the product
  - Whether any of the people you follow hunted the product
  - Whether any of the people you follow made the product
- The entire Product Hunt community
  - Total number of votes on a product
  - Total number of comments on a product
For each of these inputs, we set up five fuzzy sets mapping to very low, low, medium, high, and very high values of the input. We used the data from a single week to get a rough idea of what each fuzzy set should be. So, say, for the total number of votes on a product, we looked at how many votes the top products got, and made that “very high”. We looked at how many votes the lowest products got, and made that “very low”. And we defined roughly linear sets in between.
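Those five sets can be sketched in code with the standard triangular membership functions, using shoulder sets at the extremes. The breakpoint numbers below are made-up stand-ins for the values we actually derived from the week of data:

```javascript
// Triangular membership: degree (0..1) of x in the set peaking at b,
// rising from a and falling back to zero at c.
function triangle(x, a, b, c) {
  if (x <= a || x >= c) return 0;
  return (x < b) ? (x - a) / (b - a) : (c - x) / (c - b);
}

// Shoulder sets for the extremes: fully in the set past the flat edge.
function leftShoulder(x, b, c) {
  if (x <= b) return 1;
  if (x >= c) return 0;
  return (c - x) / (c - b);
}
function function_unused() {} // placeholder removed
function rightShoulder(x, a, b) {
  if (x >= b) return 1;
  if (x <= a) return 0;
  return (x - a) / (b - a);
}

// Illustrative sets for "total votes on a product"; the breakpoints are
// invented for this sketch, not the real ones from the PH data.
const totalVotes = {
  veryLow:  (x) => leftShoulder(x, 5, 50),
  low:      (x) => triangle(x, 5, 50, 150),
  medium:   (x) => triangle(x, 50, 150, 400),
  high:     (x) => triangle(x, 150, 400, 800),
  veryHigh: (x) => rightShoulder(x, 400, 800),
};
```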
Finally, we defined rules to link the inputs to the outputs. Since there weren’t clear interactions between the input areas, we decided to just make linear relationships between each input and the relevance output. So, if the number of your friends who voted up the product was medium, it would make the relevance medium (barring any other input).
For some of the inputs, we felt that they should only have a positive effect. For example, it’s relatively rare, even for people who work in tech, to have a friend be one of the makers of a product. So we didn’t want a “very low” value of “number of friends who made this product” to drag down the overall relevance of the product. In these cases, we only added fuzzy sets and rules for the “high” and “very high” values, and left the rest undefined.
With most of our customers, it’s important for us to weight rules according to their importance to the outputs. Rule weights in fuzzy systems can go from 0% (no influence) to 100% (full influence). We typically tell customers to use values of “low” (25%), “normal” (50%), and “high” (100%). After initial configuration, the rule weights can be optimized using a feedback training system.
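To illustrate how weighted rules can combine into a single 0-100 relevance score, here’s a weighted-average (singleton) defuzzification. This is a common textbook simplification, not the actual fuzzy.io evaluation engine, and the labels and numbers are illustrative:

```javascript
// Representative relevance value for each output set.
const RELEVANCE = { veryLow: 0, low: 25, medium: 50, high: 75, veryHigh: 100 };

// rules: [{ degree: how strongly the input matched the rule's set (0..1),
//           weight: rule weight (0..1),
//           output: a RELEVANCE label }]
// Each rule fires at degree * weight and pulls the score toward its
// output's representative value; the result is their weighted average.
function score(rules) {
  let num = 0, den = 0;
  for (const { degree, weight, output } of rules) {
    const strength = degree * weight;
    num += strength * RELEVANCE[output];
    den += strength;
  }
  return den === 0 ? 0 : num / den;
}
```

So a fully-fired “medium” rule alone yields 50, and adding a half-fired “very high” rule at full weight pulls the average up toward 100.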
In this case, we wanted to let the end users set the importance of each rule. We’d done something similar in the Best Team in Baseball example, but in this case we wanted to make it fully configurable.
So, each Personal Hunt user has their own fuzzy agent in the fuzzy.io system. Its rule weights are set based on user input, and that agent is used to score each product. This has the upside of giving each Personal Hunt user highly customized scores for each product. The downside is that tuning these scores is up to the user, rather than up to an automated optimization process. However, we felt that for a demo app that made sense.
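The per-user part amounts to defaults plus overrides. A minimal sketch, assuming hypothetical rule names (the real agent configuration lives in fuzzy.io; this only illustrates the idea):

```javascript
// Default rule weights every new Personal Hunt user starts from.
// Rule names here are invented for the sketch.
const DEFAULT_WEIGHTS = {
  friendVotes: 0.5,
  friendComments: 0.5,
  communityVotes: 0.25,
  friendMade: 1.0,
};

// Build the weights for one user's agent: known rules the user has tuned
// override the defaults (clamped to 0..1); unknown names are ignored.
function agentWeightsFor(userPrefs) {
  const weights = { ...DEFAULT_WEIGHTS };
  for (const [rule, w] of Object.entries(userPrefs)) {
    if (rule in weights) weights[rule] = Math.min(1, Math.max(0, w));
  }
  return weights;
}
```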
We got a version of the app ready for a local event, and the presentation went well. We’ve also used the app for demos to tech-savvy audiences since then, since we figured it would make sense to folks familiar with Product Hunt. We also got some product feedback from our friend Chris Messina, who gave us a few suggestions about the app:
- We were originally just re-ranking the current day’s products. He suggested that for active Hunters, it made sense to show recent products, say from the last week, that the user hasn’t voted or commented on. That way, we could show “products you may have missed” and provide some real value.
- The 0-100% fuzzy.io relevance score didn’t make any sense to him, so he suggested a letter-grade system (A-F) or a scale of 1-10.
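The grade mapping itself is simple to sketch. The cutoffs below are illustrative, not necessarily the ones we shipped:

```javascript
// Map a 0-100 relevance score to a letter grade. Cutoffs are the usual
// school-grade boundaries, chosen here for illustration.
function letterGrade(score) {
  if (score >= 90) return 'A';
  if (score >= 80) return 'B';
  if (score >= 70) return 'C';
  if (score >= 60) return 'D';
  return 'F';
}

// Or a 1-10 scale, keeping a floor of 1 so nothing scores a zero.
function tenPoint(score) {
  return Math.max(1, Math.round(score / 10));
}
```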
We implemented these and found out that we had a lot of optimizing to do. Our original version fetched all the products and rated them automatically, but fetching a week’s worth of posts and rating those was taking too long. Not to mention that, to get each post’s full votes and comments, we needed to make paged requests to the Product Hunt API (which only returns a few sample votes when you retrieve a post).
We weren’t satisfied with the linear fuzzy sets for our inputs, so we did some fuzzy c-means clustering on the Product Hunt data for the previous week. This gave us some more accurate fuzzy sets that reflected the reality of Product Hunt data. Our intuition had been close, but we were glad to have more accurate fuzzy sets in there.
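For the curious, one-dimensional fuzzy c-means is compact enough to show in full. This is the textbook algorithm with fuzzifier m = 2, not the exact analysis we ran; the cluster centers it finds become the peaks of the fuzzy sets:

```javascript
// Fuzzy c-means in one dimension: partition `data` into c overlapping
// clusters. Each point has a membership in every cluster; centers are
// membership-weighted means, and memberships come from relative distances.
function fuzzyCMeans(data, c, m = 2, iterations = 100) {
  const lo = Math.min(...data), hi = Math.max(...data);
  // Spread initial centers evenly across the data range.
  const centers = Array.from({ length: c }, (_, j) => lo + ((j + 0.5) * (hi - lo)) / c);
  const memberships = data.map(() => new Array(c).fill(0));

  for (let iter = 0; iter < iterations; iter++) {
    // Update memberships: u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1)).
    for (let i = 0; i < data.length; i++) {
      for (let j = 0; j < c; j++) {
        const dij = Math.abs(data[i] - centers[j]) || 1e-9;
        let sum = 0;
        for (let k = 0; k < c; k++) {
          const dik = Math.abs(data[i] - centers[k]) || 1e-9;
          sum += Math.pow(dij / dik, 2 / (m - 1));
        }
        memberships[i][j] = 1 / sum;
      }
    }
    // Update centers: c_j = sum_i u_ij^m * x_i / sum_i u_ij^m.
    for (let j = 0; j < c; j++) {
      let num = 0, den = 0;
      for (let i = 0; i < data.length; i++) {
        const um = Math.pow(memberships[i][j], m);
        num += um * data[i];
        den += um;
      }
      centers[j] = num / den;
    }
  }
  return centers.sort((a, b) => a - b);
}
```

Run on a week of vote counts with c = 5, the sorted centers give data-driven peaks for the very-low through very-high sets.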
We ended up changing our app to sync data globally from PH once every 15 minutes. This makes Web requests quick, but it also means that if you’ve voted or commented on a product in the last 15 minutes, we might still show it to you in the list. The app also caches the scores for each product. Finally, we use a boxcar method to score a batch of products at once, to save on round-trip time over HTTPS.
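The boxcar idea can be sketched as a cache-aware batch scorer. Here `evaluateMany` is a stand-in for the actual fuzzy.io request, which we’re assuming for the sake of the example:

```javascript
// Score a page of products with at most one round trip: serve what we can
// from the cache, batch everything else into a single evaluateMany call,
// and cache the new scores on the way out.
function scoreBatch(products, evaluateMany, cache) {
  const unscored = products.filter((p) => !cache.has(p.id));
  if (unscored.length > 0) {
    // One round trip for the whole batch instead of one per product.
    const scores = evaluateMany(unscored.map((p) => p.inputs));
    unscored.forEach((p, i) => cache.set(p.id, scores[i]));
  }
  return products.map((p) => cache.get(p.id));
}
```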
Overall, we’re happy with the results. The app is pretty quick, and it gives some good results. If we were going to make changes, we’d probably do a more thorough statistical analysis of the data from Product Hunt to rebalance the models and get an even 0-10 score for each product. Right now, the scores tend to cluster between 2 and 6, with only a few exceptional products getting higher than that. We think there’s probably some work we could do to improve that spread.
All the code for Personal Hunt is available at https://github.com/fuzzy-io/personalhunt. It shows the signs of being a hack-day project with a few layers of additional functionality and desperate optimization, but overall it’s probably worthwhile code to review for anyone wanting to implement similar functionality.
Last but not least, feel free to sign up for our private beta.