About This Website

This is a personal project created by Niall Power.

It houses some tools and services I've built just for my own use (internal visibility only), but it also hosts a database, backend and frontend I built for querying Irish transport information. I plan to continue upgrading and adding more features over time.

I am primarily focused on the backend over the frontend, but I've used this project as a training ground to brush up on a bit of JS and CSS. On this page I have documented a few brief notes on the different technologies I've used to create it.


I have incorporated:

Frontend (HTML, CSS, JS, HTMX)

Backend (Python, SQL)

Logging integration

Relational database management and migrations

Server side templating

Nginx hosting

Domain management

Email services

Account management and various external APIs

The Transport Data

My database consists of two primary components that store all the transport-related data.

The Defined Schedules

Transport for Ireland uses the open General Transit Feed Specification (GTFS) standard, originally developed by Google, which covers bus stops, bus schedules and routes.

I have compiled all this data (which is supplied in massive txt files) into a relational database that my application can query on demand.

I did this by building my database structure from class-based models in Python, then writing an import script that reads the txt files and maps each CSV row to a class object. I had to establish many supporting indexes to enable additional joins beyond the usual primary keys.

Some of the files have millions of lines, so for those I wrote the SQL directly for speed. Altogether this comes to 5-10 million rows of data, which I can fully reimport from scratch in minutes whenever there is an update. The reimport is triggered automatically each night by a cron job and a bash script, keeping my database up to date.
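
As a rough illustration, a cut-down model plus importer for a single GTFS file might look like the sketch below; this is illustrative only, not my full schema.

```python
# A cut-down sketch: one GTFS file (stops.txt) mapped to a model and imported.
import csv

from sqlalchemy import Column, Float, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class Stop(Base):
    __tablename__ = "stops"
    stop_id = Column(String, primary_key=True)   # GTFS primary key
    stop_name = Column(String, index=True)       # supporting index for searches/joins
    stop_lat = Column(Float)
    stop_lon = Column(Float)

engine = create_engine("sqlite:///transport.db")
Base.metadata.create_all(engine)

def import_stops(path: str) -> None:
    """Map each row of the GTFS stops.txt (a CSV) to a Stop object."""
    with open(path, newline="", encoding="utf-8") as f, Session(engine) as session:
        for row in csv.DictReader(f):
            session.add(Stop(
                stop_id=row["stop_id"],
                stop_name=row["stop_name"],
                stop_lat=float(row["stop_lat"]),
                stop_lon=float(row["stop_lon"]),
            ))
        session.commit()
```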

The RealTime Updates

My server queries the RealTime data every minute, which matches the update rate available via the API. You cannot specify which data to return; the API simply returns everything as a total system update. The available fields are limited, but they cover the core requirements for RealTime.

To stop my application from eating into my API quota, I created additional database tables for the RealTime information. Every 60 seconds another importer job I wrote updates the data in the DB, transforming the large JSON API response into relational tables of RealTime updates. My application queries these tables and performs transformations and calculations on the data to return a precise RealTime update to the closest minute.
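
A simplified sketch of that importer loop is below; the API URL, JSON field names and table layout are placeholders rather than the real feed details.

```python
# Simplified 60-second importer: fetch the full feed, rebuild the table.
import requests
from sqlalchemy import text

API_URL = "https://api.example.com/gtfsr/tripupdates?format=json"  # placeholder

def refresh_realtime(engine, api_key: str) -> None:
    """Replace the realtime tables with the latest total system update."""
    resp = requests.get(API_URL, headers={"x-api-key": api_key}, timeout=30)
    resp.raise_for_status()
    entities = resp.json().get("entity", [])
    with engine.begin() as conn:                            # one transaction per refresh
        conn.execute(text("DELETE FROM realtime_updates"))  # full refresh each minute
        for e in entities:
            trip_update = e.get("trip_update", {})
            trip_id = trip_update.get("trip", {}).get("trip_id")
            for stu in trip_update.get("stop_time_update", []):
                conn.execute(
                    text("INSERT INTO realtime_updates (trip_id, stop_id, delay) "
                         "VALUES (:t, :s, :d)"),
                    {"t": trip_id,
                     "s": stu.get("stop_id"),
                     "d": stu.get("departure", {}).get("delay", 0)},
                )
```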

The structure of the returned data forces me to calculate many things manually. In some areas the result is better than what the Dublin Bus / TFI apps offer, but they have access to data that I do not, and they can query it as often as they like.

The Extra Stop Features

I've imported the PTIMS database to complement my stop information. PTIMS is the government's tool for managing its stop network and contains a lot of information about the facilities available at every stop. Using the common AtcoCode present in both datasets, I was able to link PTIMS to the GTFS database.

PTIMS is split across four datasets; I import and relate them into a single object per bus stop in my database, which can then be joined to the GTFS objects. This is particularly useful for information regarding accessibility, the presence of a bus shelter, and the presence of RealTime displays.
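
Roughly, the link looks like this in SQLAlchemy; the models and facility columns are illustrative stand-ins, not my actual schema.

```python
# Illustrative stand-ins for the two sides of the AtcoCode link.
from sqlalchemy import Boolean, Column, String, select
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class Stop(Base):                     # GTFS side
    __tablename__ = "stops"
    stop_id = Column(String, primary_key=True)
    atco_code = Column(String, index=True)

class PtimsStop(Base):                # PTIMS side, one object per bus stop
    __tablename__ = "ptims_stops"
    atco_code = Column(String, primary_key=True)
    has_shelter = Column(Boolean)
    has_rtpi_display = Column(Boolean)
    wheelchair_accessible = Column(Boolean)

def stop_with_facilities(session: Session, atco_code: str):
    """Join a GTFS stop to its PTIMS facility record via the shared AtcoCode."""
    stmt = (
        select(Stop, PtimsStop)
        .join(PtimsStop, PtimsStop.atco_code == Stop.atco_code)
        .where(Stop.atco_code == atco_code)
    )
    return session.execute(stmt).first()
```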

The Website

Here is a quick snapshot of some of the technologies powering the website component.

Flask

The framework I am using is Flask and this happens to be the first application I've built with it. Previously, I had worked with Django, but I wanted to experiment with something new.

Thankfully, there were plenty of tutorials available online, which made the process much more enjoyable. Whenever I encountered an issue, I found that someone out there had already come up with a solution. Flask lets me write the majority of the code for this site in Python, the language I'm most comfortable with.

Speed from idea to deployment is nice and agile, with broad enough feature support to incorporate pretty much anything I could ask of a Python web framework. High levels of concurrency were not a major concern, so I opted not to use FastAPI or Starlite.
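
For a sense of scale, the core of a Flask app is tiny; a page route needs little more than this (the query helper is hypothetical):

```python
# The basic shape of a Flask page route.
from flask import Flask, render_template

app = Flask(__name__)

@app.route("/stop/<stop_id>")
def stop_page(stop_id: str):
    stop = get_stop(stop_id)  # hypothetical query helper
    return render_template("stop.html", stop=stop)  # server-side rendering

if __name__ == "__main__":
    app.run(debug=True)
```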

Jinja

The Jinja templating language is what allows me to generate HTML using embedded logic like if statements and for loops. I use it for programmatically constructing elements like tables, deciding whether elements are hidden or shown, passing in variables, and so on. I had no prior knowledge of the possible languages for this; Jinja came up first and seemed well supported and well suited to my needs.
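
A toy example of that embedded logic, rendered here with jinja2 directly and invented departure data:

```python
# Jinja's for loop and inline conditional, rendered outside Flask for clarity.
from jinja2 import Template

template = Template("""
<table>
{% for dep in departures %}
  <tr class="{{ 'due' if dep.mins <= 1 else '' }}">
    <td>{{ dep.route }}</td><td>{{ dep.mins }} min</td>
  </tr>
{% endfor %}
</table>
""")

# Invented departure data just to show the loop and conditional in action.
print(template.render(departures=[
    {"route": "46A", "mins": 1},
    {"route": "145", "mins": 7},
]))
```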

Javascript

I have used JavaScript for some browser functions: calling endpoints, sorting tables and loading some external elements. I kept the JS I needed as minimal as reasonable.

I would much rather keep as much code as possible in one place (Python) to improve my own maintainability, which means almost everything is server-side rendered HTML. I opted not to use any opinionated JS frameworks, as I don't want to spend my time learning and maintaining a framework like React for what is a fairly simple UI.

HTMX

HTMX caught my eye because it enables dynamic functionality without writing any JavaScript. It lets you do things like reload sections of a page upon user interaction. I've used it for some of the more dynamic elements on the site, such as the search functionality for stops and routes LINK.

I have found HTMX incredible for building powerful interactive elements with minimal effort, totally bypassing complicated JS for the common web patterns. When adding new features I will reach for HTMX first and only fall back to JavaScript when I have to.
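
The server side of an HTMX interaction is just a normal route that returns an HTML fragment. A sketch of what a stop search endpoint could look like, reusing the app from the Flask sketch above (the attribute values, template and helper are made up):

```python
# Server side of an HTMX search box. The input element would carry
# attributes like:
#   hx-get="/search/stops" hx-trigger="keyup changed delay:300ms" hx-target="#results"
from flask import render_template, request

@app.route("/search/stops")
def search_stops():
    query = request.args.get("q", "")
    stops = find_stops_by_name(query)  # hypothetical DB lookup
    # Return only an HTML fragment; HTMX swaps it into the #results element.
    return render_template("partials/stop_results.html", stops=stops)
```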

Python

The entire backend is Python, using the Flask framework. I've written all the functions for serving requests and pages, processing data and interacting with the DB in Python. I'm using SQLAlchemy to interface between Python and my DB, which has made writing the many queries much more manageable.

I had good Python knowledge beforehand, so this made the most sense for me. I started out looking at Python web frameworks, which is what led me to Flask. I wanted to make developing the more complicated functions as easy as possible for myself; this is a passion project, and I didn't want to fight the syntax of a new language I didn't know.

I felt performance concerns were not relevant for this use case. If anything, the structure of my code would be the bottleneck long before the language was. I'm caching many of my queries where the output is not expected to change, which greatly improves performance. After the initial request populates the cache, basically all site latency is Python, measured in dozens of ms at worst (the Cloudflare proxy adds about 80ms).
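
As an illustration of that caching pattern, here is one way it can be done with Flask-Caching; I'm not documenting my exact setup, so treat the names and timeout as placeholders.

```python
# Caching stable query results with Flask-Caching, on the app from the
# earlier sketch; names and timeout are placeholders.
from flask_caching import Cache

cache = Cache(app, config={"CACHE_TYPE": "SimpleCache"})

@cache.memoize(timeout=24 * 60 * 60)  # schedules only change on the nightly import
def stops_for_route(route_id: str):
    # Expensive schedule join; computed once, then served from the cache.
    return get_stops_for_route(route_id)  # hypothetical query helper
```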

I love the community support around Python; there are a million examples and tutorials for just about everything, which suits me as I always want to learn more Python.

Pytest - Automated Testing

I have written a small suite of automated tests using Pytest. I wanted to incorporate this for some core areas of the code just to make it easier for myself when refactoring or adding new features.

There are two ways I use these tests.
1. On my CLI while developing. This gives me instant feedback to ensure my changes are not breaking anything.
2. Automatically via GitHub Actions. Upon every commit my test suite is triggered and runs through the tests automatically.

For testing I have an alternate application config: it creates (and can populate) a totally separate database (SQLite) and uses separate environment variables for the tests. This allows me to set consistent and repeatable behaviour using specific data/users and have certain features disabled for testing.
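
A conftest.py in roughly this shape captures the idea, assuming a standard app-factory pattern; the package, route and data are illustrative.

```python
# Illustrative test setup: a throwaway SQLite DB per test run.
import pytest
from myapp import create_app, db  # hypothetical application package

@pytest.fixture
def app():
    app = create_app({
        "TESTING": True,
        "SQLALCHEMY_DATABASE_URI": "sqlite:///:memory:",  # separate test DB
    })
    with app.app_context():
        db.create_all()   # build the schema fresh for every test
        yield app
        db.drop_all()

@pytest.fixture
def client(app):
    return app.test_client()

def test_stop_page(client):
    assert client.get("/stop/8220DB000001").status_code == 200
```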

SQLite3 - Database

I'm using SQLite3 to run my database. Given my use case it ticks all the boxes, and there was plenty of existing documentation and guides written in relation to Flask, which made the setup process basically seamless. I interact with the DB using SQLAlchemy, combining object-relational mapping with raw SQL when I need more speed.

I found it to have excellent latency and performance for even my heaviest queries. I'm savvy enough with SQL and the relational model that any time I hit an issue it doesn't take long to identify the problem and come up with a solution. Starting up additional instances of my database for testing or development is very manageable with SQLite, which is great; I don't need to worry about hosting cores in the cloud, and I can always hop into the CLI to fix any schema issues.
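
When the ORM is the bottleneck, SQLAlchemy's text() lets me drop straight to SQL. An illustrative example against GTFS-style tables (the column names are assumptions):

```python
# Raw SQL through SQLAlchemy's text() when the ORM is too slow.
from sqlalchemy import text

def next_departures(conn, stop_id: str, now: str):
    """Next ten departures from a stop, straight from the stop_times table."""
    rows = conn.execute(
        text("""
            SELECT st.departure_time, t.trip_headsign
            FROM stop_times st
            JOIN trips t ON t.trip_id = st.trip_id
            WHERE st.stop_id = :stop_id AND st.departure_time >= :now
            ORDER BY st.departure_time
            LIMIT 10
        """),
        {"stop_id": stop_id, "now": now},
    )
    return rows.all()
```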

Alembic - Database Migrations

I'm using flask-migrate (simply a wrapper over Alembic) to manage my migrations. This was a learning area for me, as I hadn't had to do much of it properly before, but I'm very glad I spent some time on it.

Using Alembic I'm able to make adjustments to my models in Python, such as adding a new column or table. I create the migration script, which automatically constructs the necessary SQL ALTERs, review the generated script, and then it's ready to apply. The migration script is added to my repo in a commit, and I can propagate it over to my server with little hassle.
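
Sketched with an invented column, the workflow looks like this; the flask db commands are flask-migrate's standard ones.

```python
# An illustrative model change (the new column is invented). flask-migrate
# then turns the diff into a migration script:
#   flask db migrate -m "add shelter flag"   # autogenerates the ALTERs
#   flask db upgrade                         # applies it after review
from flask_sqlalchemy import SQLAlchemy

db = SQLAlchemy()

class Stop(db.Model):
    __tablename__ = "stops"
    stop_id = db.Column(db.String, primary_key=True)
    has_shelter = db.Column(db.Boolean, default=False)  # the newly added column
```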

The Server Tech

Linux

My server is running Ubuntu Linux, which I'm quite familiar with, hosted locally on my own hardware. I have experience hosting other services locally, such as media servers, my own DNS server and domain-level ad blocking (Pi-hole), so it made sense to host this locally as well. I have no running costs associated with this website outside of a tiny amount of electricity and my own time.

I did also fully deploy this application to Google App Engine (cloud), which I had some experience with in the past. It all ran fine and fit within the requirements of the free hosting tier. The primary issue was that Google doesn't allow you to write to disk on the hosted VMs, which some parts of my app require. I also couldn't find a free tier of database hosting that would allow the high bandwidth I need (mostly for importing data), but I deployed it for the experience anyway. (No longer available for demo due to cost.)

GoAccess - HTTP Logging

I'm using GoAccess, an open-source log analyser, for HTTP logging through my Nginx server, running it via the command line and with cron jobs. My web server Nginx outputs web traffic logs into files on my disk.

On a schedule, cron jobs run GoAccess over sections of these logs to generate HTML report files for me (1/3/7/30 day, etc.). This allows me to view traffic to particular endpoints, static files, request counts, bandwidth and so on, broken down by time period. The jobs generate HTML files on my disk, so I can open them in a web browser.

BetterStack - Application Logging

For application-level Python logs I'm using BetterStack, which is a sleek log management platform. This was my first time using it, but I'm already very familiar with log management, so there was no learning curve with Logtail. I have configured separate log environments for development, testing and production.

I log custom attributes with my log messages. For example, I can filter where User.ID = X to see all logs containing that matching attribute. Setting this up in the Python code was very straightforward and allows me to set custom attributes depending on the context.
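
The hookup is a standard Python logging handler. A sketch using the logtail-python package, with a placeholder token and invented attributes:

```python
# BetterStack/Logtail via a standard logging handler; token and attribute
# names here are placeholders.
import logging
from logtail import LogtailHandler

logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)
logger.addHandler(LogtailHandler(source_token="YOUR_SOURCE_TOKEN"))

# Custom attributes ride along in `extra` and become filterable in Logtail.
logger.info("stop page viewed", extra={"user_id": 42, "stop_id": "8220DB000001"})
```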

In Logtail I then have custom views for saving particular filters, and I can create alerts based on certain conditions being met. This gives me great visibility over the actions being taken in my application. Their free tier is plenty for my needs at the moment, and I've really enjoyed using it.

Cloudflare - Domain Management

I'm routing traffic to my domain through Cloudflare, which then routes to my actual web server. This seemed like an obvious choice, as the free tier includes lots of features I wanted and set-up was straightforward.

Cloudflare handles bot management and attacks on the domain (there are hundreds per day) and operates a caching layer in front of my site. The cache is not heavily used, as I have few static files, but this may be useful in future.

This additional layer in front of my web server also allows me to hide any sensitive details, as all traffic goes to and comes from Cloudflare; the only thing an attacker can see is Cloudflare's servers. I am also able to geo-block countries in Cloudflare, which helps cut down on bot traffic.

The Maps

I have embedded two different versions of mapping data. Folium is probably the only one enabled at the moment.

Folium - Map Generation Library

I'm using Folium, a mapping library for generating map data in various formats. My server renders the HTML map directly into the page, so it's very light for clients to run.

This is totally free to serve, which is ideal for my use case. So far I'm placing markers/clusters (stop and vehicle locations), drawing polylines (for bus routes) and creating layer controls for toggling certain overlays, but there's much more I can add in future.
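
A small example of those Folium patterns, with invented coordinates:

```python
# Markers in a cluster, a polyline and a layer control, rendered server-side.
import folium
from folium.plugins import MarkerCluster

m = folium.Map(location=[53.3498, -6.2603], zoom_start=13)  # centred on Dublin

stops = folium.FeatureGroup(name="Stops")
cluster = MarkerCluster().add_to(stops)
folium.Marker([53.3499, -6.2610], popup="Example stop").add_to(cluster)
stops.add_to(m)

route = folium.FeatureGroup(name="Route")
folium.PolyLine([(53.3499, -6.2610), (53.3530, -6.2650)], weight=4).add_to(route)
route.add_to(m)

folium.LayerControl().add_to(m)   # lets users toggle the overlays
html = m.get_root().render()      # server-rendered HTML, embedded into the page
```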

Plotly - Graphing Library

I'm using Plotly, a graphing and mapping library for visualising data. Information isn't that useful unless you can explore and understand it, and Plotly makes it easy for me to generate graphs of any type to represent the datasets I have.

I wanted to graph some real data in order to learn more about using these tools, so this seemed like the perfect opportunity. TODO: Generate graphing for trends over time, per operator, delays and ghost trips.
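
For a flavour of what those graphs might look like, here is a Plotly Express sketch with made-up delay data:

```python
# A possible future delay-trend graph; the dataframe values are invented.
import pandas as pd
import plotly.express as px

df = pd.DataFrame({
    "hour": [7, 8, 9, 17, 18],
    "avg_delay_mins": [2.1, 4.5, 3.2, 5.0, 4.1],
})
fig = px.line(df, x="hour", y="avg_delay_mins", title="Average delay by hour of day")
fig.write_html("delay_trend.html")  # embeddable server-rendered output
```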

Geoapify - Geolocating / Address Autocompletion

I'm using the Geoapify API for address auto-completion and geolocation on the nearby stops page.

When you start typing an address, that string is queried and you are returned a list of matching locations. Selecting and submitting a location gets the latitude and longitude of the address. I have all of the stop locations saved in the database, so I can then return a map of the closest ones within a radius, using the Haversine formula to compare the distances.

All stops with a location inside the radius are compiled into a Folium map and returned as the final result. The Google equivalent service is paid with no free tier, while Geoapify has a very generous free tier of 3K requests a day, which made it ideal for my use case.
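
The Haversine calculation itself is short; in Python it looks like this:

```python
# Haversine distance for the nearby-stops comparison.
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two lat/lon points, in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

# e.g. keeping only stops inside the search radius:
# nearby = [s for s in stops if haversine_km(lat, lon, s.lat, s.lon) <= radius_km]
```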