How to create a production-ready Recommender System

You probably visit e-commerce websites every day, or read articles from blogs, news sites, and Medium publications.

From your perspective as a user or reader, what is the common pain point when looking at all of those things?

One simple answer:

There are a lot of things available to see, and you often get lost when trying to discover something.

Yes, with that huge number of items or articles on those websites, users need a solution that simplifies their discovery journey.

If you are operating an e-commerce website or a blog, you might ask: why bother?

Well, have you heard about the funnel?

The shorter a user's funnel, the higher the conversion. That is a basic rule of user experience. So, if reducing the number of steps can increase your page views or even revenue, why not?

How can a recommender system help?

In simple terms, a recommender system is a discovery system. The system learns from data and provides recommendations to users. Without the user specifically searching for an item, the system surfaces it automatically.

Sounds like magic.

And this magic has been used by Amazon and Netflix for decades.

How awesome is it when you open Spotify and it already gives you a list of songs to listen to? (Discover Weekly; I am amazed at how it picks songs I have never heard before that I end up liking.)

In-depth about recommender systems

In general terms, there are two kinds of recommender systems known to us humans. Well, not all humans.

1. Content-based filtering

This is the type of recommender system that can easily be digested by our brains, without any sign of short-circuiting or exploding.

For example, you are an avid novel reader, and you like “And Then There Were None” by Agatha Christie. You bought it from an online bookstore.

It makes sense for the bookstore to show you “The ABC Murders” the next time you open the website.

Why?

Because both of them were written by Agatha Christie.

Hence, a content-based filtering model will recommend that title to you.

Wow, so easy! Let’s use that!

Wait…

While content-based filtering is easily digested by our brains and looks simple, it can fail to capture the real behavior of the user.

For example, I don’t like Hercule Poirot, but I like the other detectives in her novels. In that case, “The ABC Murders” should not be recommended to me.

2. Collaborative filtering

This type overcomes the previous problem. Essentially, the system records all of a user's previous interactions on the website and provides recommendations based on them.

How does it work?

Take a look at this scenario.

There are two users, A and B.

A bought item 1

A bought item 2

A bought item 3

B bought item 1

B bought item 3

Collaborative filtering will recommend item 2 to B, since another user who bought items 1 and 3 also bought item 2.

You might say: wow, they could have been bought together by sheer coincidence.

But what if there are 100 users with the same behavior as user A?

That is the so-called power of crowds.
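To make the idea concrete, here is a minimal sketch of that exact scenario in Python. The toy purchase matrix below is just the A/B example above, and the co-occurrence trick it uses is the same one we will apply at scale later in this post.

import numpy as np

# Toy purchase log: rows are users A and B, columns are items 1, 2, 3.
purchases = np.array([
    [1, 1, 1],   # A bought items 1, 2, and 3
    [1, 0, 1],   # B bought items 1 and 3
])

# How often each pair of items was bought together (item-item co-occurrence).
cooc = purchases.T @ purchases
np.fill_diagonal(cooc, 0)

# Score items for B by how strongly they co-occur with what B already bought,
# then ignore the items B already owns.
scores = purchases[1] @ cooc
scores[purchases[1] == 1] = 0
print(scores)  # item 2 (the middle column) gets the highest score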

So, why wait? Let’s just start creating a collaborative filtering system in your production environment!

Hold your horses, mate!

While it performs extremely well, it has several serious issues, especially when you are trying to create a production-ready system.

The downside of Collaborative filtering

  1. It doesn’t know about context. In contrast with content-based filtering, which recommends similar items, collaborative filtering does not recommend based on similarity. If that is a concern for you, the solution is to go hybrid: combine both methods.
  2. It needs huge hardware resources, since you need to store a user-item matrix. Imagine your e-commerce website has 100K users and, at the same time, serves 10K products. You would need a 10K x 100K matrix, with each element holding a 4-byte integer. Yep, about 4GB of memory just to store the matrix, before doing anything else (see the quick check after this list).
  3. Cold start. A new user will not get any benefit from the system, since you know nothing about them yet.
  4. The unchangeable. If you don’t do anything on the website, the results of the recommender system will stay the same. Users will think there is nothing new on the website, and they will leave.
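Here is the quick back-of-envelope check promised in point 2, using the hypothetical user and product counts from that example:

users = 100_000
products = 10_000
bytes_per_cell = 4  # one 4-byte integer per user-item cell

matrix_bytes = users * products * bytes_per_cell
print(matrix_bytes / 1024 ** 3)  # roughly 3.7, i.e. about 4GB just for the dense matrix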

While problem no. 1 is easily solved by going hybrid, the others remain a headache.

Well, solving numbers 2, 3, and 4 is the reason for this post.

Let’s just start.

The definitive guide to making your recommender system production-ready

I might have been in the same spot as you. I was really confused about how to make this possible. With the limitations of my machine, and of course common sense, I couldn’t deploy a huge service just for this one requirement.

For a production-ready system, you might not need the best accuracy the model could possibly reach.

A somewhat inaccurate yet acceptable result most often works in real-world use cases.

The most interesting parts of how you can do that are:

  1. Batch computation of the general recommendation indicators.
  2. Real-time queries that don't use the user-item matrix, but instead take the user's several latest interactions and query them against the system.

Let me explain while we build the system.

Recommender system with Python

Why Python? Well, Python is one of the easiest languages to learn. It will take you just a couple of hours to understand the syntax.

for item in the_bag:
    print(item)

And you can print all the items in the bag.

That easy.

Go to Python website to download and install it according to your Operating System.

For this tutorial, you need several packages

pip install numpy
pip install scipy
pip install pandas
pip install jupyter
pip install requests

Numpy and Scipy are Python packages for mathematical computation; you will need them for the matrices. Pandas is used to handle your data. Requests is for HTTP calls. And Jupyter is a web app that lets you run your Python code interactively.

Type jupyter notebook in your terminal and you will see something like this:

Jupyter Notebook

Write the code on the cells provided, and the code will be run interactively.

Before we begin, you need two more tools.

  1. Elasticsearch. An open-source search engine that lets you search your documents really fast. You will need it to store the computed indicators so that you can query them in real time.
  2. Postman. An API development tool. You will need it to simulate queries against Elasticsearch, since Elasticsearch is accessed over HTTP.

Download and install both of them, and you are ready to go.

The data

For this tutorial, let’s use a dataset from Kaggle: the Retailrocket recommender system dataset. Download it and extract the data into your Jupyter notebook directory.

Your notebook directory should now contain the extracted files, including events.csv.

Among those files, you only need events.csv for this tutorial.

That file contains millions of user actions on items of the e-commerce website.

Let’s explore the data!

import pandas as pd
import numpy as np

Write those imports in the Jupyter notebook, and you are ready to go.

df = pd.read_csv('events.csv')
df.shape

It will print (2756101, 5), which means you have 2.7M rows and 5 columns.

Let’s check it out.

df.head()

It has five columns.

  1. Timestamp, the timestamp of the event
  2. Visitorid, the id of the user
  3. Itemid, the id of the item
  4. Event, the type of event
  5. Transactionid, the id of the transaction if the event is a transaction

Let’s check what kinds of events are available:

df.event.unique()

You will get three events: view, addtocart, and transaction.

For the sake of simplicity, you don't want to deal with all of the events. For this tutorial, you will only play with transactions.

So, let’s filter the transactions only.

trans = df[df['event'] == 'transaction']
trans.shape

It will return (22457, 5)

You now have 22K transactions to play with. I think that is good enough for newbies like us.

Let’s take a deeper look at the data.

visitors = trans['visitorid'].unique()
items = trans['itemid'].unique()

print(visitors.shape)
print(items.shape)

You will get 11,719 unique visitors and 12,025 unique items.

The rule of thumb for creating a simple yet effective recommender system is to downsample the data without losing quality. That means you can take only, say, the 50 latest transactions for each user and still get the quality you want, because behavior changes over time.

trans2 = trans.groupby(['visitorid']).head(50)
trans2.shape

Now you only have 19,939 transactions, which means around 2.5K transactions were dropped.

Because the visitor ids and item ids are huge numbers, they are hard to remember and hard to work with.

trans2['visitors'] = trans2['visitorid'].apply(lambda x: np.argwhere(visitors == x)[0][0])
trans2['items'] = trans2['itemid'].apply(lambda x: np.argwhere(items == x)[0][0])

trans2

These new columns are 0-based indices. You will see something like this.

It’s cleaner. Now you can use only the visitors and items columns for all of the next steps.
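As a side note, the apply with np.argwhere scans the whole visitors or items array once per row, which gets slow on bigger data. A dictionary lookup does the same 0-based mapping much faster; a small sketch, assuming the visitors and items arrays defined above:

visitor_index = {v: i for i, v in enumerate(visitors)}
item_index = {v: i for i, v in enumerate(items)}

trans2['visitors'] = trans2['visitorid'].map(visitor_index)
trans2['items'] = trans2['itemid'].map(item_index)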

The next step: Create the user-item matrix

The nightmare is coming…

You have 11,719 unique visitors and 12,025 items, so you would need around 500MB of memory just to store the matrix densely.

Sparse matrix comes to the rescue.

A sparse matrix is a matrix in which most elements are zero. That makes sense here, since not every user buys every item; most of the connections will be zero.

from scipy.sparse import csr_matrix

Scipy has the thing.

occurences = csr_matrix((visitors.shape[0], items.shape[0]), dtype='int8')

def set_occurences(visitor, item):
    occurences[visitor, item] += 1

trans2.apply(lambda row: set_occurences(row['visitors'], row['items']), axis=1)

occurences

Apply the set_occurences function on each row in the data you have.

It will print something like this

<11719x12025 sparse matrix of type '<class 'numpy.int8'>'
with 18905 stored elements in Compressed Sparse Row format>

From those 140 million cells in the matrix, only 18,905 are filled with non-zero.

So basically you only need to store those 18,905 values in memory: a 99.99% improvement in efficiency.

The downside of a sparse matrix is that it is computationally more expensive to retrieve data from it in real time. So, you should not stop at this step.
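One more side note: incrementing a CSR matrix cell by cell works, but scipy will warn you that it is inefficient. The same occurrence matrix can be built in one shot from (value, (row, column)) triples; a sketch, assuming the visitors and items columns created earlier (duplicate pairs are summed automatically):

occurences = csr_matrix(
    (np.ones(trans2.shape[0], dtype='int8'),
     (trans2['visitors'], trans2['items'])),
    shape=(visitors.shape[0], items.shape[0])
)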

Co-occurrence is a better occurrence

Let’s construct an item-item matrix where each element is the number of times two items were bought together by the same user. Call it the co-occurrence matrix.

To create the co-occurrence matrix, you take the dot product of the transpose of the occurrence matrix with itself.

I tried it without a sparse matrix and my computer suddenly stopped working. So, let’s not do that.

cooc = occurences.transpose().dot(occurences)
cooc.setdiag(0)

It finished instantly with a sparse matrix. And I am happy.

The setdiag call sets the diagonal to 0, meaning you don’t count how often item 1 appears together with item 1, since they are the same item.

Anomalous behavior is better

The co-occurrence matrix contains the number of times each pair of items was bought together.

But there is a chance that some item gets bought regardless of the user's behavior. It might be a flash sale, or something like that.

In reality, you want to capture the real behavior of the user, clean from things like that flash sale, because it is not the behavior you are interested in.

To remove the effect of such items, you need to penalize the scores in the co-occurrence matrix.

Ted Dunning, in the book mentioned earlier, describes an algorithm for this called the Log-Likelihood Ratio, or LLR.

def xLogX(x):
    return x * np.log(x) if x != 0 else 0.0

def entropy(x1, x2=0, x3=0, x4=0):
    return xLogX(x1 + x2 + x3 + x4) - xLogX(x1) - xLogX(x2) - xLogX(x3) - xLogX(x4)

def LLR(k11, k12, k21, k22):
    rowEntropy = entropy(k11 + k12, k21 + k22)
    columnEntropy = entropy(k11 + k21, k12 + k22)
    matrixEntropy = entropy(k11, k12, k21, k22)
    if rowEntropy + columnEntropy < matrixEntropy:
        return 0.0
    return 2.0 * (rowEntropy + columnEntropy - matrixEntropy)

def rootLLR(k11, k12, k21, k22):
    llr = LLR(k11, k12, k21, k22)
    sqrt = np.sqrt(llr)
    if k11 * 1.0 / (k11 + k12) < k21 * 1.0 / (k21 + k22):
        sqrt = -sqrt
    return sqrt

The LLR function computes the likelihood that two events, A and B, appear together more often than you would expect by chance.

The parameters are:

  1. k11, the number of times both events appeared together
  2. k12, the number of times B appeared without A
  3. k21, the number of times A appeared without B
  4. k22, the number of times something else appeared, without either of them
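To get a feel for what these numbers mean, you can call rootLLR on a couple of made-up contingency tables (the counts below are invented purely for illustration):

# Bought together 10 times and rarely apart: a large positive score, a strong indicator.
print(rootLLR(10, 5, 4, 1000))

# Each item is popular on its own but they almost never appear together:
# the score comes out negative, so this pair is not a useful indicator.
print(rootLLR(1, 500, 400, 1000))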

Now calculate the LLR for every pair and save the results to the pp_score matrix.

row_sum = np.sum(cooc, axis=0).A.flatten()
column_sum = np.sum(cooc, axis=1).A.flatten()
total = np.sum(row_sum, axis=0)

pp_score = csr_matrix((cooc.shape[0], cooc.shape[1]), dtype='double')
cx = cooc.tocoo()
for i, j, v in zip(cx.row, cx.col, cx.data):
    if v != 0:
        k11 = v
        k12 = row_sum[i] - k11
        k21 = column_sum[j] - k11
        k22 = total - k11 - k12 - k21
        pp_score[i, j] = rootLLR(k11, k12, k21, k22)

Sort the results so that the highest LLR score of each item appears in the first column of its row.

result = np.flip(np.sort(pp_score.A, axis=1), axis=1)
result_indices = np.flip(np.argsort(pp_score.A, axis=1), axis=1)
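One thing to be aware of: pp_score.A materializes the full dense item-item matrix (about 12,025 x 12,025 doubles, roughly 1.1 GB), and np.sort and np.argsort each allocate another array of the same size. That was fine on my machine, but if it is not on yours, you can extract only the top 50 scores per item directly from the sparse matrix, which is all the later steps need. A rough sketch:

top_k = 50
pp_csr = pp_score.tocsr()
n_items = pp_csr.shape[0]

result = np.zeros((n_items, top_k))
result_indices = np.zeros((n_items, top_k), dtype=np.int64)

for i in range(n_items):
    row = pp_csr.getrow(i)                        # stored LLR scores for item i
    order = np.argsort(row.data)[::-1][:top_k]    # highest scores first
    result[i, :order.shape[0]] = row.data[order]
    result_indices[i, :order.shape[0]] = row.indices[order]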

Indicators, how to recommend

The first entries in each row of the result matrix, if high enough, can be considered indicators for that item.

Let’s take a look at one of the results:

result[8456]

You will get

array([15.33511076, 14.60017668,  3.62091635, ...,  0.        ,
        0.        ,  0.        ])

And looking at the indices

result_indices[8456]

That will give you:

array([8682,  380, 8501, ..., 8010, 8009,    0], dtype=int64)

You can safely say that, with their high LLR scores, items 8682 and 380 are indicators for item 8456, while item 8501, whose score is not that big, might not be an indicator for item 8456.

It means that if someone bought items 8682 and 380, you can recommend item 8456 to them.

Easy.

But, as a rule of thumb, you might want to set a lower limit on the LLR score, so that insignificant indicators are removed.

minLLR = 5
indicators = result[:, :50]
indicators[indicators < minLLR] = 0.0

indicators_indices = result_indices[:, :50]

max_indicator_indices = (indicators == 0).argmax(axis=1)
max = max_indicator_indices.max()

indicators = indicators[:, :max + 1]
indicators_indices = indicators_indices[:, :max + 1]

Now you are ready to push these indicators into Elasticsearch, so you can query the recommendations in real time.

import requests
import json

Okay, now you are ready to put the data into the Elasticsearch instance you prepared earlier.

But be careful. If you try to add the data one document at a time using the /_create/<id> API, it will take forever. Of course you can, but you would need maybe half an hour to an hour just to move our 12,025 items into Elasticsearch.

I did it once, so please, don’t repeat my mistake.

So what’s the solution?

Bulk update

Fortunately, Elasticsearch has a bulk API that can easily send multiple documents at once.

So, create a new index (items2; I used items for my previous mistake) and let’s try it.

actions = []
for i in range(indicators.shape[0]):
    length = indicators[i].nonzero()[0].shape[0]
    real_indicators = items[indicators_indices[i, :length]].astype("int").tolist()
    id = items[i]

    action = { "index" : { "_index" : "items2", "_id" : str(id) } }

    data = {
        "id": int(id),
        "indicators": real_indicators
    }

    actions.append(json.dumps(action))
    actions.append(json.dumps(data))

    if len(actions) == 200:
        actions_string = "\n".join(actions) + "\n"
        actions = []

        url = "http://127.0.0.1:9200/_bulk/"
        headers = {
            "Content-Type" : "application/x-ndjson"
        }
        requests.post(url, headers=headers, data=actions_string)

if len(actions) > 0:
    actions_string = "\n".join(actions) + "\n"
    actions = []

    url = "http://127.0.0.1:9200/_bulk/"
    headers = {
        "Content-Type" : "application/x-ndjson"
    }
    requests.post(url, headers=headers, data=actions_string)

And voila, it will finish within several seconds.
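As an aside, if you install the official Elasticsearch client for Python (pip install elasticsearch), its bulk helper wraps all of this batching for you. A sketch, assuming Elasticsearch is running on 127.0.0.1:9200 and using the same variables as above:

from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("http://127.0.0.1:9200")

def generate_docs():
    # One document per item, same shape as the bulk payload above.
    for i in range(indicators.shape[0]):
        length = indicators[i].nonzero()[0].shape[0]
        yield {
            "_index": "items2",
            "_id": str(items[i]),
            "_source": {
                "id": int(items[i]),
                "indicators": items[indicators_indices[i, :length]].astype("int").tolist(),
            },
        }

helpers.bulk(es, generate_docs())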

Hit this API in Postman

127.0.0.1:9200/items2/_count

You will see that your data is already stored:

{
    "count": 12025,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    }
}

Let’s check the data of one of your items with /items2/_doc/240708.

{
    "id": 240708,
    "indicators": [
        305675,
        346067,
        312728
    ]
}

id is the id of the item, while indicators holds the other items that serve as indicators for recommending this item.
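You can run the same check from Python instead of Postman; a sketch, assuming Elasticsearch 7 or later, where single documents live under the _doc endpoint:

resp = requests.get("http://127.0.0.1:9200/items2/_doc/240708")
print(resp.json()["_source"])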

Real-time query

The best part of what we have created is the real-time query:

{
    "query": {
        "bool": {
            "should": [
                { "terms": {"indicators" : [240708], "boost": 2}}
            ]
        }
    }
}

Post the request to 127.0.0.1:9200/items2/_search

And you will get three results: 312728, 305675, and 346067. Exactly the three items that were bought together with item 240708.
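In an application, the same query would be issued from your backend code rather than from Postman. Here is a sketch of what that could look like with requests; recent_items is a hypothetical list of the items the user just bought:

def recommend(recent_items, size=10):
    # Items whose indicators match the user's recent purchases score highest.
    query = {
        "size": size,
        "query": {
            "bool": {
                "should": [
                    {"terms": {"indicators": recent_items, "boost": 2}}
                ]
            }
        }
    }
    resp = requests.post("http://127.0.0.1:9200/items2/_search", json=query)
    return [hit["_source"]["id"] for hit in resp.json()["hits"]["hits"]]

print(recommend([240708]))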

Great! So the problem of needing huge resources is no longer a factor. What about the other two problems?

Before that, rest your eyes for a while.

The cold start problem: I don’t know you

A very common problem when building recommender systems is the cold start problem: a new user has none of their behavior recorded in the system yet.

So, what should the system recommend to them?

Let’s take a look at the recommendation system we just built. Do you notice anything strange about the result?

Yep, the result returns only 3 recommended items. Just 3. How do you plan to display that to the customer?

Let’s display the other, non-recommended items at the end of the list, just for the sake of a good user experience.

{
    "query": {
        "bool": {
            "should": [
                { "terms": {"indicators" : [240708]}},
                { "constant_score": {"filter" : {"match_all": {}}, "boost" : 0.000001}}
            ]
        }
    }
}

You can use a constant_score clause to return all the other items.

But you still need to rank those non-recommended items, so that the things a user will probably like come first, even if they are not captured in the user's behavior.

In many cases, popular items work really well.

How do you calculate item popularity?

popular = np.zeros(items.shape[0])

def inc_popular(index):
    popular[index] += 1

trans2.apply(lambda row: inc_popular(row['items']), axis=1)

Simple: count each item's appearances one by one. The item with the highest value is the most popular.
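If you prefer a vectorized version, pandas can produce the same counts in a couple of lines; a sketch using the same trans2 and items as above:

counts = trans2['items'].value_counts()   # how many transactions touch each item index
popular = np.zeros(items.shape[0])
popular[counts.index] = counts.values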

Let’s create another index called items3 and bulk insert again.

actions = []
for i in range(indicators.shape[0]):
    length = indicators[i].nonzero()[0].shape[0]
    real_indicators = items[indicators_indices[i, :length]].astype("int").tolist()
    id = items[i]

    action = { "index" : { "_index" : "items3", "_id" : str(id) } }

    # url = "http://127.0.0.1:9200/items/_create/" + str(id)
    data = {
        "id": int(id),
        "indicators": real_indicators,
        "popular": popular[i]
    }

    actions.append(json.dumps(action))
    actions.append(json.dumps(data))

    if len(actions) == 200:
        actions_string = "\n".join(actions) + "\n"
        actions = []

        url = "http://127.0.0.1:9200/_bulk/"
        headers = {
            "Content-Type" : "application/x-ndjson"
        }
        requests.post(url, headers=headers, data=actions_string)

if len(actions) > 0:
    actions_string = "\n".join(actions) + "\n"
    actions = []

    url = "http://127.0.0.1:9200/_bulk/"
    headers = {
        "Content-Type" : "application/x-ndjson"
    }
    requests.post(url, headers=headers, data=actions_string)

In this indexing phase, you include the popular field, so your data will look like this:

{
    "id": 240708,
    "indicators": [
        305675,
        346067,
        312728
    ],
    "popular": 3.0
}

You now have three fields: id and indicators, like before, plus the popular field, which counts how many times that item was bought by users.

Let’s add the popular field to our previous query.

Function score, the way to combine scores

So, you now have multiple sources of score, i.e. the indicator matches and the popularity. How do you combine the scores?

Elasticsearch has function_score for exactly that.

{
    "query": {
        "function_score":{
            "query": {
                "bool": {
                    "should": [
                        { "terms": {"indicators" : [240708], "boost": 2}},
                        { "constant_score": {"filter" : {"match_all": {}}, "boost" : 0.000001}}
                    ]
                }
            },
            "functions":[
                {
                    "filter": {"range": {"popular": {"gt": 0}}},
                    "script_score" : {
                        "script" : {
                            "source": "doc['popular'].value * 0.1"
                        }
                    }
                }
            ],
            "score_mode": "sum",
            "min_score" : 0
        }
    }
}

Rework your query to add a function score that adds 0.1 times the popular value on top of the constant score you have above. You don’t have to stick with 0.1; you can use another function, even the natural logarithm, like this:

Math.log(doc['popular'].value)

Now you will see your most popular item, 461686, placed fourth, just below the recommended items.

And the other popular items follow below it.

The unchangeable, static recommendation

As you can see, the result stays the same every time we run the real-time query. That might be good, because our technique is reproducible, but at the same time the user might not be happy about it.

Ted Dunning, in the same book, says that the click-through rate of recommendations falls off sharply after the 20th result. That means any item we recommend beyond that point will never become known to the user.

How to solve this?

There is a technique called dithering. It adds random noise at query time to bring up less recommended items, while still keeping the strongly recommended items at the top.

{
    "query": {
        "function_score":{
            "query": {
                "bool": {
                    "should": [
                        { "terms": {"indicators" : [240708], "boost": 2}},
                        { "constant_score": {"filter" : {"match_all": {}}, "boost" : 0.000001}}
                    ]
                }
            },
            "functions":[
                {
                    "filter": {"range": {"popular": {"gt": 1}}},
                    "script_score" : {
                        "script" : {
                            "source": "0.1 * Math.log(doc['popular'].value)"
                        }
                    }
                },
                {
                    "filter": {"match_all": {}},
                    "random_score": {}
                }
            ],
            "score_mode": "sum",
            "min_score" : 0
        }
    }
}

The random_score function gives all of your items uniformly distributed random noise. The noise is minuscule, so the top recommendations will not drop down.

Hit that query and look at the result.

The positive note is: your users no longer have to scroll to the second or third page. They just need to hit the refresh button in the browser, and they will be served new content.

Just like magic.

Conclusion

Building a production-ready recommender system is not that hard. And current technology allows us to do that.

Create the system with your data and get ready to deploy it to production.
