Build Fraud Detection Into Your Apps

Plenty of fraud detection tools are available on the market, but some companies need greater control over how the fraud scoring algorithms work and want them to keep learning and improving through machine learning. Our team has put together a demo and some sample code to show how you can build this yourself.

[Image: fraud score]

For this example, let’s assume a company runs an online store where customers order products. After running their business for a year, they’ve noticed that each of the following factors often indicates a fraudulent transaction:

  • Transactions from new users
  • Transactions much larger than the online store’s average size
  • Mismatches and very large distances between the billing address, shipping address, phone number location and IP address location

They also noticed some weaker indicators of fraud:

  • Choosing the most expensive shipping option
  • Multiple credit cards used from the same IP address
  • Multiple credit cards shipping to the same address
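
To make the two tiers of indicators concrete, here is a minimal Python sketch that combines them into a single score. It is a plain weighted average, not the fuzzy agent behind the demo, and every name, threshold, and weight in it (`Transaction`, the 1000 km cutoff, the 0.4 weight for weaker signals) is an assumption chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    is_new_user: bool
    amount: float
    max_location_gap_km: float   # largest distance between billing, shipping, phone and IP locations
    expensive_shipping: bool
    cards_from_same_ip: int
    cards_to_same_address: int


def clamp01(x: float) -> float:
    return max(0.0, min(1.0, x))


def fraud_score(tx: Transaction, average_amount: float) -> float:
    """Return a score from 0 (low risk) to 100 (high risk)."""
    # Strong indicators, each scaled to 0..1 (thresholds are assumptions).
    strong = [
        1.0 if tx.is_new_user else 0.0,
        clamp01((tx.amount / average_amount - 1.0) / 4.0),  # 5x the average or more -> 1.0
        clamp01(tx.max_location_gap_km / 1000.0),           # 1000 km or more -> 1.0
    ]
    # Weaker indicators count for less.
    weak = [
        1.0 if tx.expensive_shipping else 0.0,
        clamp01((tx.cards_from_same_ip - 1) / 3.0),         # 4 or more cards -> 1.0
        clamp01((tx.cards_to_same_address - 1) / 3.0),
    ]
    strong_weight, weak_weight = 1.0, 0.4
    total = strong_weight * sum(strong) + weak_weight * sum(weak)
    max_total = strong_weight * len(strong) + weak_weight * len(weak)
    return 100.0 * total / max_total
```

A real agent would tune these thresholds against the store’s own history rather than hard-coding them.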


You could quickly build an agent based on these rules and get a fraud score in real time for each transaction. In fact, we built one and created a demo interface you can use to try it out:

Visit our Fraud Detection demo site

We’ve also made all of the code behind this demo available. Check it out on GitHub.

Using a Chatbot to Offer Dynamic Promo Codes

Chatbots have become a really popular way for companies to interact with customers. Most chatbot experiences, though, are pretty static. As a developer, you need to set up a messaging flow with content that doesn’t change much based on user interaction.

Some of our users have discovered that you can pair our API with most chatbot frameworks to create a much more personalized experience for your customers. Examples include offering dynamic promotions based on user behavior and recommending the most appropriate products.

To show this in action, we’ve built a sample bot that we call the Promobot. You can chat with Promobot on Facebook Messenger and, based on how you answer the questions it asks, Promobot will offer you a personalized discount on your subscription. Promobot is built using the Botkit toolkit.

Here’s a sample chat with Promobot:

[Image: dynamic chatbot coupon]

Behind the scenes, an agent takes the data from the chat responses and predicts the discount most likely to result in a purchase. To get started, the agent powering the discount decision is a pretty simple one. It’s based on these 5 rules:

IF hasAccount IS no THEN discount IS high
IF hasAccount IS yes THEN discount IS low
IF tutorial IS no THEN discount IS high
IF tutorial IS yes THEN discount IS low
lastAPICall INCREASES discount

And the discount that Promobot offers in our example ranges from 0% to 20% based on the user’s responses.

Once Promobot offers the user a discount code, it then uses machine learning to automatically optimize its rules and offer the lowest effective discount possible.

[Image: chatbot machine learning]

In our example, Promobot does this by asking the user if they plan on using the discount code and then sending that feedback to the API. In real implementations, you might send this feedback once the user actually makes a purchase.

The feedback Promobot provides is the percentage of full price paid by the customer. So, if Promobot offers a 15% discount and the customer says they plan to use it, Promobot sends “85” (100% minus 15%). If the customer says they don’t plan to use it, we consider it a lost sale and Promobot sends “0” (zero). The machine learning algorithms seek to maximize this value and will automatically optimize the rules to get the highest possible result.
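
The feedback arithmetic is small enough to show directly. This Python sketch only computes the value described above; the function name is hypothetical, and how the value is actually sent to the API is left out.

```python
def feedback_value(discount_percent: float, will_use_code: bool) -> float:
    """Percentage of full price paid: 100 minus the discount, or 0 for a lost sale."""
    if not 0 <= discount_percent <= 100:
        raise ValueError("discount_percent must be between 0 and 100")
    return 100 - discount_percent if will_use_code else 0
```

For the example in the text, `feedback_value(15, True)` gives 85 and `feedback_value(15, False)` gives 0.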

This is a simple example, but a more complex one could include dozens of questions and external data to create a truly personalized experience.

Please feel free to chat with Promobot on Facebook Messenger and check out the project’s code on GitHub to fork it for your own bots.


How to Create Custom User Scores for Intercom

Some of our Google Sheets Add-on beta users have been using it to do some really interesting stuff. One of my favorites is the company that identifies its best customers by creating custom user scores from Intercom user data.


Here’s how you can build something similar:

Gather Data

First, export user data from Intercom (Instructions here) and upload it to a new Google Sheet. It should look something like this:

[Image: intercom data]

Given all of the data available in Intercom, you can really easily do a lot here. To get started, let’s narrow it down to a few columns:

[Image: sample columns]

With these columns, we can easily get a few useful facts about each customer:

  • How long has it been since they signed up
  • How long since their last login
  • How many times they’ve logged in
  • Whether they’re on a free or paid plan

You can add columns and use an easy formula in Google Sheets to calculate the number of days between today and the Signed up and Last seen dates:


And an if/then statement to show whether or not the user is on a paid plan:

=if(P2 = "free", 0, 1)
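
If you would rather prepare these columns outside the spreadsheet, the same derived values can be sketched in Python. The function names are hypothetical, and the case-insensitive match on "free" is an assumption about how plan names appear in the export.

```python
from datetime import date


def days_since(earlier: date, today: date) -> int:
    """Number of whole days between an earlier date and today."""
    return (today - earlier).days


def paid_plan_flag(plan: str) -> int:
    """Mirror of =if(P2 = "free", 0, 1): 0 for the free plan, 1 for anything else."""
    return 0 if plan.lower() == "free" else 1
```

Applied to each row’s Signed up and Last seen dates, `days_since` gives the two day counts, and `paid_plan_flag` gives the paid plan column.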

The result is a sheet like this:

[Image: columns with formulas]

(To help you get started quickly, here’s a copy of this spreadsheet. Just go to the File > Make a Copy menu to make a copy of it in your own Google account.)

Set Up Your Rules

If you don’t already have an account, you can sign up here to try it for free.

Next, let’s decide on a few rules to start things off. These are a pretty obvious place to start:

  1. The longer it’s been since a user signed up, the better, so a larger number of days here is good.
  2. The shorter it’s been since a user last logged in, the better, so a smaller number of days is best.
  3. The more times a user has logged in, the better, so a larger number here is also good.
  4. A user on a paid plan is better than one on a free plan.

Let’s also say that rules 2, 3, and 4 are much stronger signals than rule 1, so we can give rule 1 a lower weight.
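
As a rough illustration of how the four rules and the lower weight on rule 1 interact, here is a Python sketch that uses a simple weighted average in place of the fuzzy agent. The scaling constants (365 days, 30 days, 100 sessions) and the 0.25 weight are assumptions, not values from the agent.

```python
def clamp01(x: float) -> float:
    return max(0.0, min(1.0, x))


def user_score(days_since_signup: float, days_since_last_seen: float,
               web_sessions: float, paid_plan: int) -> float:
    """Return a 0..100 score; higher means a better customer."""
    signals = [
        (clamp01(days_since_signup / 365.0), 0.25),         # rule 1: longer is better, lower weight
        (1.0 - clamp01(days_since_last_seen / 30.0), 1.0),  # rule 2: more recent is better
        (clamp01(web_sessions / 100.0), 1.0),               # rule 3: more logins is better
        (1.0 if paid_plan else 0.0, 1.0),                   # rule 4: paid beats free
    ]
    total_weight = sum(weight for _, weight in signals)
    return 100.0 * sum(value * weight for value, weight in signals) / total_weight
```

Because rule 1 carries a quarter of the weight of the others, a long-standing account that has gone quiet still scores below a newer account that logs in often.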

Here’s what those rules might look like:

[Image: intercom scoring rules]

(Follow this link and click on Create to make a copy of this agent in your own account)

Take note of your API key and the Agent ID of the agent you created above. You’ll need them for the next step.

Calculate User Scores

If you haven’t already made a copy of the sample spreadsheet above, you can do that here.

Next, install the Google Sheets add-on.

Once the add-on is installed, go back to the Add-ons menu > > Settings and enter your API key and Agent ID.

With the Google Sheets add-on, you can use the function =FUZZYAI() to send data to the API and display the result. It expects inputs in the following format:

=FUZZYAI(<name of input 1>, <value of input 1>, <name of input 2>, <value of input 2>, etc)

So for this example, paste this formula into cell B2 (in the User Score column):

=FUZZYAI("daysSinceSignup", D2, "daysSinceLastSeen", F2, "webSessions", G2, "paidPlan", I2)

A score should appear in that column. Copy and paste that formula into cells B3 through B10 to get the rest of the scores.

Set This Up For Yourself in 5 Steps

  1. Create a sample User Score agent in your account by following this link and clicking on Create. (If you don’t already have an account, sign up here to try it for free.)
  2. Take note of your API key (found on your dashboard) and the Agent ID of the User Score agent you created in step 1.
  3. Go to this sample Google Sheet and make a copy of it in your own Google account.
  4. Add the Google Sheets add-on to that sheet.
  5. Copy the FUZZYAI formula shown above to cell B2.

And with that, you should have a working version of this example! Try changing some of the data in the spreadsheet to see how it affects the score.

Other Ideas

The goal here, of course, is to show a pretty simple example of how to do this. We’ve seen users do similar things to create lead scoring, do churn risk analysis, and much more.

Some of our users have built agents to do this type of user score and then send that score back to Intercom daily as custom user attributes.


How to Build Your Own Product Recommendation Engine

A common application is powering custom recommendation engines. For many companies, generic solutions don’t offer enough flexibility (or require too much work manually setting up links between all of the different products in the catalog), and building a custom recommendation engine from scratch requires way too much time and effort.

To show how easily it can be done, we’ve put together an open source Product Recommendation plugin for Drupal Commerce stores that lets anyone spin up their own product recommendation engine.

You can see this recommendation engine in action on our demo store:


How Do The Recommendations Work?

When a user is looking at a product page on an online store, the goal of this recommendation agent is to identify the other products in the catalog that might be relevant.

Where we differ from other machine learning platforms is that we let developers encode their own knowledge about how a system should work as a set of rules. The API uses those rules to provide recommendations. Over time, as feedback is sent on how well the rules performed, the API learns and improves automatically.

In our sample recommendation agent, we identified a few rules that might show affinity between the current product and each of the other products in the catalog:

  • If the current product and another product are in the same category, it’s likely to be a good recommendation
  • If the current product and another product are not in the same category, it’s less likely to be a good recommendation
  • If the current product and another product have very different prices, it’s less likely to be a good recommendation (i.e. it decreases affinity)
  • If many customers who bought the current product also bought another product, that’s likely to be a very good recommendation (i.e. it increases affinity)
  • If the current product and another product have many of the same words in their titles, it’s likely to be a good recommendation (i.e. it increases affinity)
  • If the current product and another product have many of the same words in their descriptions, it’s likely to be a good recommendation (i.e. it increases affinity)
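
Each of these rules needs a numeric input for every pair of products. The Python sketch below shows one way those inputs might be computed. It is not the plugin’s code: the input names (`sameCategory`, `priceDifference`, and so on) are hypothetical, and Jaccard similarity is just one reasonable choice for measuring shared words.

```python
import re


def words(text: str) -> set:
    """Lowercased set of alphanumeric words in a text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def word_overlap(a: str, b: str) -> float:
    """Jaccard similarity: shared words divided by all distinct words."""
    wa, wb = words(a), words(b)
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)


def price_difference(p1: float, p2: float) -> float:
    """Relative price gap from 0 (same price) to 1 (very different)."""
    higher = max(p1, p2)
    return 0.0 if higher == 0 else abs(p1 - p2) / higher


def affinity_inputs(current: dict, other: dict, bought_together: int) -> dict:
    """Build the inputs for one (current product, candidate product) pair."""
    return {
        "sameCategory": 1 if current["category"] == other["category"] else 0,
        "priceDifference": price_difference(current["price"], other["price"]),
        "boughtTogether": bought_together,
        "titleOverlap": word_overlap(current["title"], other["title"]),
        "descriptionOverlap": word_overlap(current["description"], other["description"]),
    }
```

Scoring every other product in the catalog with these inputs and sorting by the resulting affinity gives the recommendation list for the current page.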

Here’s what those rules look like when created as an agent:


Keep in mind that these rules were purposely chosen because they can work generically for many online stores. Specific stores may have other rules that make sense. For example, a clothing store may want to show products for the same season, and so might add rules like:

  • If sameSeason is true then affinity is high
  • If sameSeason is false then affinity is very low

What’s Next?

If you want to try it for yourself, sign up for an account and follow the instructions in the Drupal Commerce Recommendation Engine plugin on GitHub.

This project is meant as a starting point. Every developer will likely have their own modifications to make to the rules and integration in their ecommerce platform.

In an upcoming blog post, we’ll talk about how to add feedback into this recommendation engine so that it can learn based on actual customer behavior over time.

Build Your Own AI-Powered Twitter Feed

One of our favorite quick demos of the platform’s capabilities is to show how easy it is to take your own Twitter feed, score each of the Tweets, and surface the most relevant ones. It’s one of the first new agent templates we built for the platform and it’s a lot of fun to try out.

To show how easy it can be, we put together a Ruby on Rails project to help you get started and get more acquainted with the platform. You can find that project on GitHub.

Getting Started

Getting your Tweet Relevance agent set up will take just a couple of minutes. Once you’ve signed up for a free account, go to your Dashboard and take note of the API key shown on the top left-hand corner of the page. You’ll need it later.

Next step is to create your Tweet Relevance agent. There are two ways to do this. The easiest way is to use this link and just click on Create.

Alternatively, you can create it manually by logging into your dashboard and clicking the ADD AN AGENT button. From there, select the TWEET RELEVANCE template:


That will automatically create a Tweet Relevance agent like the one below:


This initial template starts off with just 3 rules that should be pretty easy to understand:

  • tweets with more likes are more relevant
  • tweets with more shares are more relevant
  • older tweets are less relevant

Take note of your agent ID, which is found just above the TWEET RELEVANCE title on this page. Later in this post we suggest a few things you can add to this, but this is a good starting point.

Installing the Rails App

The next step is to clone our Tweet Relevance Ruby on Rails project from GitHub and follow the installation instructions included in the repository. From there you’ll be guided through the next steps of setting up the app.

Once you’ve got things set up, your app will show you the tweets that are most relevant based on the rules we defined earlier. Each tweet will be scored like this:

[Image: TweetRelevance]

What’s Next?

Now that you’ve got this simple app working, what else can you do? If you want to play around with the app, here are some other things you could try:

  • Add new rules that take into account your friends’ behavior: how many people you follow liked a tweet, how many people you follow shared a tweet.
  • Try combining different rules, for example if you want to identify tweets that are liked by your friends but not a lot of other people, that rule could be: IF number of likes by friends IS very high AND number of likes IS low THEN relevance IS very high.
  • Set up rules to increase relevance of tweets that include keywords you’re interested in and decrease relevance of tweets that include keywords you’re not interested in.
  • Add a feedback metric to train and improve the results: add thumbs up and thumbs down buttons next to each tweet that send positive or negative feedback to the API based on which tweets you find most or least relevant.
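
To see how a combined rule like the “liked by friends but not by many others” example behaves, here is a small Python sketch. It assumes the common fuzzy-logic convention of taking the minimum of the two memberships for AND. The ramp boundaries (5 to 20 friend likes, 50 to 500 total likes) are made up for illustration and are not part of the template agent.

```python
def ramp_up(x: float, lo: float, hi: float) -> float:
    """Membership that rises linearly from 0 at lo to 1 at hi."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)


def ramp_down(x: float, lo: float, hi: float) -> float:
    """Membership that falls linearly from 1 at lo to 0 at hi."""
    return 1.0 - ramp_up(x, lo, hi)


def friends_only_rule(friend_likes: float, total_likes: float) -> float:
    """Degree to which 'relevance IS very high' fires for this tweet."""
    friend_likes_very_high = ramp_up(friend_likes, 5, 20)
    total_likes_low = ramp_down(total_likes, 50, 500)
    # Fuzzy AND: the rule is only as true as its weakest condition.
    return min(friend_likes_very_high, total_likes_low)
```

A tweet with 20 friend likes and only 50 likes overall fires the rule fully, while the same tweet with 500 or more total likes does not fire it at all.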


The Best Team in Baseball

On Saturday the Montreal tech community came together at Notman House for Montreal Baseball Hackday. The event was organized by Plank Design to celebrate Montreal’s baseball heritage and the love of the game. The fact that baseball junkies love data and stats makes baseball and hacking a natural pair.

I showed up at the event with a few ideas for using fuzzy logic in a project. My friend Aran Rasmussen suggested we form a team to take on one of the challenge projects: “Prove that the 1994 Expos were the best team in baseball.” We were joined by Reda Lofti who helped us out with HTML and CSS for the project. It was Reda’s first hackday ever, and he’d just learned HTML last month.

For Montreal baseball fans, 1994 is the championship year that never was. The season ended in August due to the crippling baseball strike that led to the first cancellation of the World Series in almost a century. The Montreal Expos, who’d shown strong results for the previous few years, led the league at the time of the shut-down. And many people think that they would have been world champions if the season hadn’t ended prematurely.

But a lot happens in baseball between August and October. Wins and losses mid-season don’t really count. If you’re trying to argue that the Expos were the best team in baseball in 1994, how do you do it?

Fuzzy scoring

We took on the challenge by reframing it this way: what is the combination of statistics, and weights, that when presented to a fuzzy logic agent, give the Expos as the number 1 team? While Aran scoured the Web for stats from 1994, I started putting together a fuzzy agent strategy that would work.

If you’re unfamiliar with fuzzy logic, here’s the short description: a fuzzy agent accepts a number of input variables and maps them onto fuzzy sets — intuitive terms from the problem domain. It then uses a set of fuzzy rules to reason about the input variables and produce output fuzzy memberships. The output fuzzy values are then defuzzified into a single crisp score.

For our project, I decided to use as an output a score between 0 and 10, showing how “good” a team in 1994 was. We’ve found a few problem domains where this kind of unitless output is helpful.

For input values, Aran managed to come up with seven important stats that baseball aficionados use to compare teams and predict future performance:

  • Run differential
  • ERA
  • OPS
  • Speed score
  • Strikeout-to-walk percentage
  • WHIP
  • RA9-WAR

Some of these are familiar to any baseball fan; others are only relevant to the most hardcore SABR fanatic. But we wanted to pick numbers that were commonly used to say who’s the best team.

Aran boiled down the stats to a single table that we used for input to the fuzzy agent. I then broke down each input variable into 5 fuzzy sets — “veryLow”, “low”, “medium”, “high”, and “veryHigh”. Casual review showed each statistic varied linearly, so I just broke down the stats in 5 sets of equal size.

Some of the sets, like ERA, varied inversely with our output score. A low ERA shows a better team, and a high ERA shows a worse team. But most of them varied proportionally — a higher run differential shows a better team. So I mapped each of the inputs to an output using simple rules.
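
The set-building and rule-mapping steps can be sketched in a few lines of Python. This is a reconstruction under assumptions, not the hackday code: it treats “5 sets of equal size” as triangular sets with evenly spaced peaks, and the function names are hypothetical.

```python
SET_NAMES = ["veryLow", "low", "medium", "high", "veryHigh"]


def memberships(x: float, lo: float, hi: float) -> dict:
    """Membership of x in five triangular sets with evenly spaced peaks on [lo, hi]."""
    step = (hi - lo) / (len(SET_NAMES) - 1)
    x = max(lo, min(hi, x))
    result = {}
    for i, name in enumerate(SET_NAMES):
        peak = lo + i * step
        result[name] = max(0.0, 1.0 - abs(x - peak) / step)
    return result


def rules_for(input_name: str, inverse: bool = False) -> list:
    """One rule per input set; inverse=True flips the mapping for stats like ERA."""
    outputs = list(reversed(SET_NAMES)) if inverse else SET_NAMES
    return [(input_name, in_set, out_set) for in_set, out_set in zip(SET_NAMES, outputs)]
```

With `inverse=True`, a “veryLow” ERA maps to a “veryHigh” score, while a stat like run differential keeps the direct mapping.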

Varying the weights

The point of our project, however, was to help a user pick their argument points to show that the 94 Expos were the best. There are techniques to optimize the weights of your fuzzy rules to come out with expected scores based on training data. However, we wanted to give the user an interface to vary their own weights.

So we put together some radio buttons labelled: “Ignore”, “Low”, “Medium”, and “High”. (We changed them later, but you get the idea.) Each button represented a relative weight for rules based on that input: 0%, 25%, 50%, and 100%. When the user changes an input, we post the new weights to the back end; it then makes a new fuzzy agent with those rule weights, and scores each of the teams from the 1994 season — with corresponding data. It returns the values to the front-end, which then shows them in a sorted table.
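
The re-scoring loop is easy to picture with a simplified stand-in. In this Python sketch a weighted average replaces the fuzzy agent, each stat is assumed to be pre-normalized to 0..1 where higher is better, and the team names and numbers in the usage note are placeholders, not 1994 data.

```python
WEIGHTS = {"Ignore": 0.0, "Low": 0.25, "Medium": 0.5, "High": 1.0}


def rank_teams(teams: dict, choices: dict) -> list:
    """Score each team from 0 to 10 and return (team, score) pairs, best first.

    teams:   {team name: {stat name: normalized value in 0..1}}
    choices: {stat name: one of the radio button labels}
    """
    weights = {stat: WEIGHTS[label] for stat, label in choices.items()}
    total = sum(weights.values())
    scores = {}
    for team, stats in teams.items():
        if total == 0:
            scores[team] = 0.0
        else:
            scores[team] = 10.0 * sum(w * stats[s] for s, w in weights.items()) / total
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)
```

Calling `rank_teams` with two placeholder teams and switching a stat from “High” to “Ignore” is enough to see the table reorder, which is exactly the effect the radio buttons had.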

The results are available online, and all the code is on GitHub. You’ll need your own API key to make the code work.

I had a lot of fun doing this project. Aran and Reda were fun to work with, and the baseball hackday was a blast. (A bag of Cracker Jack in the 9th inning was what I needed to get through the day!) Baseball is a good example of a mix between hard data and user wisdom, which is an area where fuzzy logic shines. I’m looking forward to seeing what other ways we can apply fuzzy logic to baseball stats.

Adaptive Pricing with Fuzzy Logic

One of the most important uses for machine intelligence in e-commerce applications is adaptive pricing – that is, dynamically changing the price of a product in an online store based on characteristics of the user, the product, and the store. It’s a technique that’s used extensively by big on-line retailers like Amazon, but it’s less common for smaller companies.

Partly this is because of the risk of getting prices wrong. If you offer an online user a price that’s too high, you’re going to fail to convert that user. If you offer one that’s too low, you unnecessarily cut into your margin.

Another reason is the difficulty of implementation. Traditional procedural code can be OK for the most trivial adaptive pricing, but if you have more than a couple of factors that affect price, the code can become complex and unmaintainable. It’s also hard to test its accuracy, and unhandled edge cases can cause devastating or embarrassing outcomes (free cars, billion-dollar paper-clips).

Machine learning algorithms, on the other hand, require a sizable corpus of data to show successes and failures. Unless you’re willing to offer some cohort of customers an array of random prices for a product in order to determine its optimal price, machine learning for pricing can present some challenges.

One technique that’s worked well for companies with a two-sided market, like Uber, is to have a market-based bid-and-ask system. Uber, for example, slowly raises the end user’s “bid” for a ride until some driver chooses to respond. This can be really useful, but if your two-sided market isn’t real-time (Etsy providers can only knit so fast, after all), then bid-and-ask won’t be very useful.

In this post, I go over a simple technique for adaptive pricing using fuzzy logic.

Why fuzzy logic?

Fuzzy logic is a machine intelligence technique based on multi-valued truth values: a statement can be partly true rather than only true or false. A fuzzy logic agent works by mapping exact or “crisp” values into data structures representing membership in a so-called “fuzzy set”. So a user who’s been to your site 2 times might not be exactly a new user, but they’re kind of a new user and kind of an experienced user. Fuzzy logic assigns a value to their membership in each set: say, they’re 75% new and 40% experienced.
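
Here is a minimal Python sketch of what those two memberships might look like as functions. The shapes are assumptions picked so that 2 visits reproduces the 75% and 40% figures above. Real sets would be tuned to your own traffic.

```python
def new_user(visits: float) -> float:
    """Membership in the 'new' set: 1.0 at zero visits, falling to 0.0 at 8 visits."""
    return max(0.0, 1.0 - visits / 8.0)


def experienced_user(visits: float) -> float:
    """Membership in the 'experienced' set: 0.0 at zero visits, rising to 1.0 at 5 visits."""
    return min(1.0, visits / 5.0)
```

Note that the two memberships don’t have to add up to 100%: at 2 visits the user is 0.75 new and 0.4 experienced at the same time.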

Fuzzy logic then allows you to define rules to reason about these sets — something like, “IF the user is new, THEN the discount will be high.” If the user is only partially a member of the set, the output is only partially a member of its set. You can define multiple rules, and it’s OK if they’re potentially conflicting.

A fuzzy agent then converts these fuzzy output membership measurements to a “crisp” single value which can be used for further processing in a procedural language.

If you have base fuzzy logic software, or you use a fuzzy logic API, the steps for developing a new fuzzy logic agent are pretty simple:

  • Define your output variables and their fuzzy sets.
  • Define your input variables and their fuzzy sets.
  • Define the rules that link your inputs to your outputs. Usually, this is a one-to-one match of one input fuzzy set to one output fuzzy set.

Fuzzy logic is a good fit for adaptive pricing for a number of reasons:

  • Like procedural methods, it uses explicit rule definition by the programmer to direct its calculations.
  • It doesn’t require much data to get started. In fact, it depends on your own experience with your customers.
  • It has smoothly varying output as you vary an input. Procedural code tends to have discontinuities as you hit different thresholds (“if (userAge >= 2) { … }”).
  • It handles contradictory inputs well. You can have one factor that suggests a high price, and another that suggests a low price, and a fuzzy logic agent will synthesize the two and come out with a reasonable middle ground.
  • It handles new inputs well. You can get started on an adaptive pricing algorithm with only a few inputs, and then add new inputs as they come up.
  • It handles missing data well. If you don’t know the number of times the user has been on the site for some reason, you can do a calculation with the other factors you do know, and you’ll get a reasonable output.
  • It’s relatively easy to audit. Figuring out why a discount was higher or lower than you intuitively expected can be done just by looking at the fuzzified versions of the inputs and outputs.

There are some restrictions to using fuzzy logic, too. The algorithm requires numerical inputs, so if you have input data like “user is in North Dakota”, you need to reframe it to something like “User is in a state with 0.1% market penetration”. Usually it’s relatively easy to do.

Defining output variables

For adaptive pricing, there are a couple of different ways to define an output. One is to define the price that will be offered to the customer. That’s reasonable, but if you have multiple products available in your store, with widely varying prices, it will mean you need to have a different fuzzy agent for each product. That can be hard to develop and maintain.

Instead, I recommend modelling your adaptive pricing as an adaptive discount. Here, your output is a discount in percent values. Full price would be a discount of 0%, and a free product would be a discount of 100%. The advantage here is that you can use the same fuzzy agent for all your products. You also avoid having out-of-bounds pricing that shocks customers (compare Uber’s problems with unexpectedly high “surge pricing”). You can also set a maximum value of the discount to match your profit margin (assuming your margin is about the same on all your products!). You may be willing to sell a product below margin to activate a new customer, but if not, you can use the discount’s maximum value to prevent losing money on your adaptive pricing algorithm.

[Image: discount fuzzy sets]

In this diagram, I’ve defined a discount output variable and its five corresponding fuzzy sets: “veryLow”, “low”, “moderate”, “high”, and “veryHigh”. This is a pretty common pattern in fuzzy engineering: input and output variables usually come with 3, 5, 7, or occasionally 9 fuzzy sets.

The diagram shows each fuzzy set, with the variable value on the x-axis and the percentage of membership on the y-axis. In this example, a discount of 20% is “moderate”, and a discount of 5% has a 50% membership in the “veryLow” set and a 50% membership in the “low” set.

I chose 45% as the maximum discount, and then mapped out the other fuzzy sets from there. All of these numbers are based on more or less intuitive reasoning on my part. In a real-world situation, you can do customer surveys or talk to people in your company to get their ideas of what a “moderate” discount is. There are also mechanisms for varying these parameters based on experience (see below).

One thing to note is that at each output value, the membership in all the fuzzy sets adds up to about 100%. This isn’t strictly required, but it tends to give smoother-varying outputs.
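
For readers who want to play with the numbers, here is a Python sketch of discount sets that match the figures quoted above: 5% splits evenly between “veryLow” and “low”, 20% is fully “moderate”, and 45% is the maximum. The peaks for “high” (30) and “veryHigh” (45) are guesses, since the diagram itself isn’t reproduced here.

```python
# (set name, discount at which membership peaks). The last two peaks are assumptions.
PEAKS = [("veryLow", 0.0), ("low", 10.0), ("moderate", 20.0), ("high", 30.0), ("veryHigh", 45.0)]


def discount_memberships(discount: float) -> dict:
    """Membership of a discount (in percent) in each fuzzy set."""
    d = max(PEAKS[0][1], min(PEAKS[-1][1], discount))
    result = {name: 0.0 for name, _ in PEAKS}
    # Walk adjacent pairs of peaks and interpolate linearly between them.
    for (left_name, left), (right_name, right) in zip(PEAKS, PEAKS[1:]):
        if left <= d <= right:
            share = (d - left) / (right - left)
            result[left_name] = 1.0 - share
            result[right_name] = share
            break
    return result
```

Because each value is shared between at most two neighboring sets, the memberships always add up to 100% in this sketch, which is the smooth-output property mentioned above.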

It’s worth noting that this model is going to give a non-zero discount for almost any input values. You may want to set your base prices slightly higher to accommodate this.

Choosing inputs

There are a lot of different input variables you can use for an adaptive pricing algorithm. I’ve chosen a few here to give some ideas.

[Image: product popularity]

One important input is product popularity. How well is the current product selling? If it’s not selling very well, it makes sense to offer steeper discounts. If it’s selling very well, little or no discount makes sense. To model popularity, I’ve used the number of sales per week of the product. (I use sales per week to smooth over discrepancies between weekend and weekday Web traffic.) The fuzzy sets above show typical values for a medium-sized e-commerce site; it’s relatively easy to expand or contract them. As your e-commerce site grows, varying these input parameters makes a lot of sense.

It’s probably also worth noting that calculating the current number of sales per week of a product at runtime is too compute-intensive. This kind of input data should really be pre-calculated, probably on a weekly basis.

Another thing to note is that the number of fuzzy sets in this input is 5 — exactly the same number of fuzzy sets in the discount output variable. This is a good rule-of-thumb, since it makes mapping input variables to output variables in your rules much easier.

A similar input would be product category popularity. There may be value in incenting users to purchase products in underperforming categories, even if the product itself is doing well.

[Image: category popularity]

Again, popularity is measured in sales per week. The shape of the category popularity fuzzy sets is roughly the same as for product popularity, but the numbers are somewhat higher, representing a typical ecommerce site with a few products per category, a few high performers, and mostly average performers.

Another input would be store-wide performance. How have numbers been lately? If they’ve been low, it might make sense to juice the numbers with some bigger discounts.

[Image: site performance]

Here, I’ve chosen another 5 fuzzy sets, denominated by annual revenue in US dollars. Obviously, what’s “ok” for a mid-sized ecommerce site is “terrible” for Apple or Amazon, so these numbers need to be tuned for the company running the site. Especially in periods of high growth, what’s “good” in February may be “bad” by April.

For other site-wide inputs, adapting the price by time of day also makes sense. Depending on their market, most ecommerce sites see sales drop off dramatically at night and on weekends. Giving users an incentive here can boost those numbers.

[Image: sales per hour of the week]

Note that providing this input requires a) figuring out what hour of the week it is (like 3-4PM on Tuesday) and b) looking up the sales for that hour of the week in a database. If sales per hour vary by more than 1-2 orders of magnitude, it may make sense to use a logarithmic scale here.
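
Both steps are short in Python. In this sketch a plain dictionary stands in for the database lookup, the hour numbering (0 for Monday midnight through 167 for Sunday 11PM) is an assumption, and the function names are hypothetical.

```python
import math
from datetime import datetime


def hour_of_week(moment: datetime) -> int:
    """Hour index within the week: 0 is Monday 00:00-01:00, 167 is Sunday 23:00-24:00."""
    return moment.weekday() * 24 + moment.hour


def sales_input(moment: datetime, sales_by_hour: dict, use_log: bool = False) -> float:
    """Look up typical sales for this hour of the week, optionally on a log10 scale."""
    sales = sales_by_hour.get(hour_of_week(moment), 0)
    # Adding 1 before taking the log keeps hours with zero sales well-defined.
    return math.log10(sales + 1) if use_log else float(sales)
```

Tuesday from 3 to 4PM comes out as hour 39 with this numbering.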

Finally, what about varying the discount based on aspects of the user? One important user metric is recency — how long the user has been signed up. You may want to get new users activated, so providing a discount makes sense.

[Image: user recency]

Here, we’re measuring user recency in terms of hours since first acquisition (usually either arriving on a landing page or sign up). The concentration is on very recent users (less than 5 days). Anything larger will get a veryLongTerm membership (and thus, a veryLow discount).

[Image: user sales]

User sales are linked to, but not proportional to, user recency. Long-time users can fail to activate for a long time. In this case, I’ve broken down the user sales into “unactivated”, “activated”, “frequent”, “veryFrequent” and “whale”. Note that number of sales per user is an integer, so a lot of these values are impossible. But it’s still worthwhile to map the input onto a smoothly-continuous set.

There are a number of other user characteristics you can key prices to. One that’s especially helpful is market penetration. This is a measure of your company’s penetration of a given geographical market, measured in numbers per million. It requires, again, two important lookups: first, identifying the user’s geographical location, either based on information they provided you, or on their IP address, or on data from the browser. That will give a latitude/longitude pair, which in turn has to be mapped into a “market” as you define it — a city, county, US state or Canadian province, or country. You’ll then need some information on your previous sales to that market.

[Image: market penetration]

Here, I’ve only defined three fuzzy sets: “low”, “medium”, and “high”. Most of the heavy lifting on market penetration happens well before the fuzzy agent gets the input.


The last step in fuzzy agent design is defining rules to link inputs to outputs. Because we’ve mostly used inputs with the same number of fuzzy sets as the output, it’s relatively easy to define our rules.

For some inputs, like productPopularity, we map the output sets inversely to the input sets. So, if productPopularity is low, the discount is high, and if productPopularity is veryHigh, the discount is veryLow.

For others, like userSales, there’s more of a bow shape. Users who have not activated often get high discounts, and top customers also get high discounts. The ones in the middle get low or moderate discounts.
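
The two rule shapes can be written down as simple lookup tables. This Python sketch is illustrative only: the “medium” set name for productPopularity, the exact middle entries of the bow-shaped mapping, and the discount peak values are all assumptions, and the defuzzification is a weighted average of peaks rather than a full centroid calculation.

```python
# Inverse mapping: the more popular the product, the lower the discount.
POPULARITY_RULES = {
    "veryLow": "veryHigh", "low": "high", "medium": "moderate",
    "high": "low", "veryHigh": "veryLow",
}

# Bow-shaped mapping: both ends of the range get high discounts.
USER_SALES_RULES = {
    "unactivated": "high", "activated": "moderate", "frequent": "low",
    "veryFrequent": "moderate", "whale": "high",
}

# Hypothetical discount (in percent) at which each output set peaks.
DISCOUNT_PEAKS = {"veryLow": 0.0, "low": 10.0, "moderate": 20.0, "high": 30.0, "veryHigh": 45.0}


def apply_rules(input_memberships: dict, rules: dict) -> dict:
    """Carry each input set's membership over to the output set its rule names."""
    output = {}
    for in_set, degree in input_memberships.items():
        out_set = rules[in_set]
        output[out_set] = max(output.get(out_set, 0.0), degree)
    return output


def defuzzify(output_memberships: dict) -> float:
    """Collapse output memberships to one discount: a weighted average of set peaks."""
    total = sum(output_memberships.values())
    if total == 0:
        return 0.0
    return sum(DISCOUNT_PEAKS[name] * degree for name, degree in output_memberships.items()) / total
```

A product that is mostly “veryLow” in popularity ends up with a discount near the top of the range, while an unactivated user and a whale both land on the “high” discount set.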


Next steps

If you’re interested in trying out this example, you can sign up and set up a new fuzzy agent with a Web API.

The next step, after designing a fuzzy agent, is integrating it into the store. This can happen either through a plugin or with custom code. There are a number of open source libraries for connecting to the service, for example.

Another important step is varying the fuzzy set boundaries and inputs based on performance. This process, called fuzzy learning, requires a feedback mechanism, showing the performance of each discount value. Performance, here, would be profit on the sale, if any. This feedback can be used to tune the inputs.