Swiss “Bank” secrets

The last 10 months I have been involved in a very nice project, which tries to show to the world the most interesting benches in Switzerland. Many languages use the same word for “bench” and for “bank” and the website title is a funny pun. Not in English, though.

Users of the web application (www.bankgeheimnisse.ch) can place new benches in the map, add comments and pictures to existing ones, say which ones they have visited and add some metadata (accessible by wheelchair, paper bins, etc.).

I developed the site from scratch and, although there is still a lot of work to do, we are getting some media attention (easily recognizable in the charts shown below).

I didn’t have time until now to quantify the “success”. Finally I have. And I wanted to get started with some Jupyter notebooks and Pandas. This is it.

Let’s start by importing all necessary libraries. The first line makes the charts appear in the notebook automatically when the plot function of a Pandas Dataframe is invoked.

%matplotlib notebook
import psycopg2 as pg
import pandas.io.sql as psql
import pandas as pd
import numpy as np
import os

I am keeping a trace of the http requests we receive in a table called app.trace on the database. Let’s connect to it:

connection = pg.connect(f"dbname={os.environ['DBNAME']} user={os.environ['DBUSER']} password={os.environ['DBPASSWORD']} host={os.environ['DBHOST']} port={os.environ['DBPORT']}")

You can start jupyter with this command in order to set the environment variables:

env DBHOST=xxx DBPORT=5432 DBNAME=xxx DBUSER=xxx DBPASSWORD=xxx jupyter notebook

Page views per day

Let’s see how many page views we get per day since we started. The app.trace table has 4M rows so it took a while to bring all the data into Pandas:

trace = psql.read_sql_query("SELECT * FROM app.trace", connection)
page_views_per_day = trace.groupby(pd.Grouper(key="instant", freq='D'))["id"].count()
page_views_per_day.plot()
errors = trace.loc[trace["headers"].astype("int32") >= 500]
errors_per_day = errors.groupby(pd.Grouper(key="instant", freq='D'))["id"].count()
errors_per_day.plot()

The previous chart shows the number of queries along with the number of those that returned an error (HTTP code 500). We had some memory problems and Amazon has been restarting the application regularly whenever there was some load. However, the percentage of errors is really small, to the point where they show as a straight line at zero. We did however have errors. We can see them if we show them in their own chart:

errors_per_day.plot()

Benches added per week

These were visits to the site. Let’s see how much of this attention resulted in people getting involved and actually adding benches.

bench_dates = psql.read_sql_query("SELECT id, creation FROM app.bench", connection)

bench_per_week = bench_dates.groupby(pd.Grouper(key="creation", freq='W')).count()

bench_per_week.plot()

Last two months were quite a change. We got around 2000 new benches all of a sudden. We will see how much of this is due to holidays. Theoretically Autumn is also high season for people walking and finding nice benches.

Number of users in time

Well, those benches were added by new users. In the last two months we tripled the number of users.

users = psql.read_sql_query("SELECT id, creation FROM app.user", connection)

users_in_time = users.groupby(pd.Grouper(key="creation", freq='D')).count().transform(np.cumsum)

users_in_time.plot()

That’s it. I am happy some people is using the result of my work.

You can go and take a look at the benches, some pictures are very nice!