About This Website
This is a personal project created by Niall Power. It houses some tools and services I've built just for my own use (internal visibility only), but I also built a database, backend and frontend for querying Irish transport information. I plan to continue upgrading and adding more features over time.
I am primarily focused on the backend over the frontend, but I've used this project as a training ground to brush up on a bit of JS and CSS. On this page I have documented a few brief notes on the different technologies I've used to create it.
I have incorporated:
Frontend (HTML, CSS, JS, HTMX)
Backend (Python, SQL)
Logging integration
Relational database management and migrations
Server side templating
Nginx hosting
Domain management
Email services
Account management and various external APIs
The Transport Data
My database consists of two primary components that store all the transport-related data.
The Defined Schedules
Transport for Ireland uses the open General Transit Feed Specification (GTFS)
standard developed by Google, which contains information on bus stops, bus schedules and routes.
I have compiled all this data (which is supplied in massive txt files) into a relational database that my
application can query on demand.
I did this by defining my database structure with class-based models in Python, then reading all the txt files with an import script I wrote that maps each row in the CSV files to a class object. I also had to establish many supporting indexes to enable joins beyond the usual primary keys.
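As a rough illustration, here is a minimal sketch of the model-plus-importer approach. The field and file names follow the standard GTFS stops.txt layout, but this is not my exact schema:

    import csv

    from sqlalchemy import Column, Float, Index, String
    from sqlalchemy.orm import Session, declarative_base

    Base = declarative_base()

    class Stop(Base):
        __tablename__ = "stops"
        stop_id = Column(String, primary_key=True)
        name = Column(String)
        lat = Column(Float)
        lon = Column(Float)
        # a supporting index for joins/lookups beyond the primary key
        __table_args__ = (Index("ix_stops_name", "name"),)

    def import_stops(session: Session, path: str = "stops.txt") -> None:
        # GTFS files are CSVs with a header row; map each row to a model object
        with open(path, newline="", encoding="utf-8") as f:
            for row in csv.DictReader(f):
                session.add(Stop(
                    stop_id=row["stop_id"],
                    name=row["stop_name"],
                    lat=float(row["stop_lat"]),
                    lon=float(row["stop_lon"]),
                ))
        session.commit()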
Some of the files have millions of lines, so for those I wrote the SQL directly to import them for speed. This comes to 5-10 million rows of data, which I can fully import from scratch in minutes when there is an update. The update is triggered automatically each night by a scheduled cron job and a bash script, keeping my database current.
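For the biggest files, a batched raw-SQL insert along these lines is dramatically faster than creating one ORM object per row. The table and column names here are illustrative, not my real schema:

    import csv
    import sqlite3

    def bulk_import_stop_times(db_path: str = "transport.db", path: str = "stop_times.txt") -> None:
        conn = sqlite3.connect(db_path)
        with open(path, newline="", encoding="utf-8") as f:
            reader = csv.DictReader(f)
            # executemany batches the inserts inside a single transaction
            conn.executemany(
                "INSERT INTO stop_times (trip_id, stop_id, arrival_time) VALUES (?, ?, ?)",
                ((r["trip_id"], r["stop_id"], r["arrival_time"]) for r in reader),
            )
        conn.commit()
        conn.close()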
The RealTime Updates
RealTime data is queried every minute by my server, which matches the refresh rate available via the API. You cannot specify what data to return; it just returns everything in a total system update. There are limited fields available, but it's enough for the core requirements of realtime.
To prevent my application from eating into my API quota, I decided to create additional database tables for the realtime information. Every 60 seconds the data in the DB is refreshed using another importer job I created.
It transforms the large JSON API response into a relational table of realtime updates. My application queries these tables and performs some transformations and calculations on the data to return a RealTime update precise to the closest minute.
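A stripped-down sketch of what an importer like this can look like; the endpoint, header and field names below are placeholders, not the real TFI API details:

    import sqlite3
    import requests

    API_URL = "https://api.example.com/gtfsr"  # placeholder endpoint
    HEADERS = {"x-api-key": "MY_KEY"}          # placeholder auth header

    def refresh_realtime(db_path: str = "transport.db") -> None:
        # the API returns the whole system state, so replace the table wholesale
        data = requests.get(API_URL, headers=HEADERS, timeout=30).json()
        rows = [
            (e["trip_id"], e["stop_id"], e["delay"])
            for e in data.get("entity", [])
        ]
        conn = sqlite3.connect(db_path)
        conn.execute("DELETE FROM realtime_updates")
        conn.executemany(
            "INSERT INTO realtime_updates (trip_id, stop_id, delay) VALUES (?, ?, ?)",
            rows,
        )
        conn.commit()
        conn.close()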
I am forced to calculate many things manually due to the structure of the data returned. In some areas the result is better than what the Dublin Bus app / TFI app offer, but they have access to data that I do not, and they can query it as often as they like.
The Extra Stop Features
I've imported the PTIMS
database to complement my stop information. PTIMS is the government's tool for managing its stop
network and contains a lot of information about the facilities available at every stop. Using
the common AtcoCode present in both datasets, I was able to link PTIMS to the GTFS database.
I import and relate the four datasets within PTIMS into a single object per bus stop in my database, which can be joined to the GTFS objects. This is particularly useful for information regarding accessibility, the presence of a bus shelter, and the presence of RealTime displays.
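In SQLAlchemy terms the link is essentially a join on the shared code. A hedged sketch, reusing the models sketched earlier and assuming the GTFS stop_id carries the AtcoCode (the column names are illustrative):

    from sqlalchemy import Boolean, Column, String
    from sqlalchemy.orm import Session

    class StopFacilities(Base):  # reuses the declarative Base from the earlier sketch
        __tablename__ = "stop_facilities"
        atco_code = Column(String, primary_key=True)
        wheelchair_accessible = Column(Boolean)
        has_shelter = Column(Boolean)
        has_realtime_display = Column(Boolean)

    def stop_with_facilities(session: Session, atco_code: str):
        # one PTIMS-derived facilities row per stop, joined on the shared AtcoCode
        return (
            session.query(Stop, StopFacilities)
            .join(StopFacilities, Stop.stop_id == StopFacilities.atco_code)
            .filter(StopFacilities.atco_code == atco_code)
            .one_or_none()
        )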
The Website
Here is a quick snapshot of some of the technologies powering the website component.
Flask
The framework I am using is Flask, and this happens to be the first application I've built with it. Previously I had worked with Django, but I wanted to experiment with something new. Thankfully, there were plenty of tutorials available online, which made the process much more enjoyable; whenever I encountered an issue, someone out there had already come up with a solution. Flask lets me write the majority of the code for this site in Python, which I am most comfortable with. Speed from idea to deployment is nice and agile, with broad enough feature support to incorporate pretty much anything I could ask for in a Python web framework. High levels of concurrency were not a major concern, so I opted not to use FastAPI or Starlite.
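For flavour, the shape of a typical route in a Flask app like this is something along these lines; the URL pattern, helper function and template name are hypothetical:

    from flask import Flask, render_template

    app = Flask(__name__)

    @app.route("/stops/<stop_id>")
    def stop_page(stop_id: str):
        # fetch the stop from the DB, then let Jinja render the page server-side
        stop = get_stop(stop_id)  # hypothetical DB helper
        return render_template("stop.html", stop=stop)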
Jinja
The Jinja templating language is what allows me to generate HTML using embedded logic like if statements and for loops. This is used for programmatically constructing elements like tables, deciding whether elements are hidden or shown, passing variables, etc. I had no prior knowledge of the possible languages to use for this; Jinja just came up first and seemed well supported and well suited to my needs.
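A small example of the kind of template logic involved (the variable names are illustrative):

    <table>
      {% for departure in departures %}
        <tr{% if departure.cancelled %} class="cancelled"{% endif %}>
          <td>{{ departure.route }}</td>
          <td>{{ departure.due_time }}</td>
        </tr>
      {% endfor %}
    </table>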
Javascript
I have used JavaScript for some browser functions: calling endpoints, sorting tables and loading some external elements. I kept the JS I needed as minimal as reasonable. I would much rather keep as much code in one place (Python) as possible to improve my own maintainability, which means almost entirely server-side rendered HTML. I opted not to use any opinionated JS frameworks, as I don't want to spend my time learning and maintaining a framework like React for what is a fairly simple UI.
HTMX
HTMX caught my eye as it enables JavaScript-style functionality without writing any JavaScript. It allows you to add dynamic behaviour to your site, such as reloading sections of a page upon user interaction. I've used it for some of the more dynamic elements on the site, such as the search functionality for stops and routes LINK.
I have found HTMX incredible for building powerful interactive elements with minimal effort, totally bypassing complicated JS for the common web patterns. I will try to reach for HTMX first when adding new features, and only fall back to JavaScript when I have to.
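As an example of the pattern, a live search box in HTMX is little more than a few attributes on an input; the endpoint and element IDs here are illustrative:

    <input type="search" name="q"
           hx-get="/search/stops"
           hx-trigger="keyup changed delay:300ms"
           hx-target="#results">
    <div id="results"></div>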
Python
Python is the entire backend, using the Flask framework. I've written all the functions for serving requests and pages, processing data and interacting with the DB in Python. I'm using SQLAlchemy to interface between Python and my DB, which has made writing the many queries much more manageable.
I had good knowledge of Python beforehand, so this made the most sense for me; I started out by looking at Python web frameworks, which is what led me to Flask. I wanted to make the development of the more complicated functions as easy as possible for myself, as this is a passion project and I didn't want to fight the syntax of a language I didn't know.
Performance concerns were not relevant for this use case, I felt; if anything, the structure of my code would be the bottleneck long before the language was. I'm caching many of my queries where the output is not expected to change, which greatly improves performance. After the initial request populates the cache, basically all site latency is Python, which is measured in dozens of ms at worst (the Cloudflare proxy adds about 80ms).
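One common way to do this kind of query caching is with the Flask-Caching extension; a minimal sketch, assuming the Flask app, a Flask-SQLAlchemy db handle and the Stop model from earlier:

    from flask_caching import Cache

    cache = Cache(app, config={"CACHE_TYPE": "SimpleCache"})

    @cache.memoize(timeout=3600)
    def stop_names():
        # schedule data only changes when the nightly import runs,
        # so this result can safely be served from cache
        return [s.name for s in db.session.query(Stop).all()]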
I love the community support around Python; there are a million examples and tutorials for just about everything, which suits me as I always want to learn more Python.
Pytest - Automated Testing
I have written a small suite of automated tests using Pytest. I wanted to incorporate this for some core areas of the code, just to make it easier for myself when refactoring or adding new features. There are two ways I use these tests.
1. On my CLI while developing. This gives me instant feedback as I work, ensuring my changes are not breaking anything.
2. Automatically via GitHub Actions. On every commit my test suite is triggered and runs through the tests automatically.
For testing I have an alternate application config. It creates (and can populate) a totally separate SQLite database and uses separate environment variables for the tests. This allows me to set consistent, repeatable behaviour using specific data/users, and to have certain features disabled for testing.
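In Pytest this takes the rough shape below; the create_app factory and TestConfig name are assumptions about my setup rather than the exact code:

    import pytest
    from myapp import create_app, db  # hypothetical app factory and db handle

    @pytest.fixture
    def client():
        # TestConfig points at a separate SQLite file and test env vars
        app = create_app("TestConfig")
        with app.app_context():
            db.create_all()          # fresh, repeatable schema per test run
            yield app.test_client()
            db.drop_all()

    def test_homepage(client):
        assert client.get("/").status_code == 200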
SQLite3 - Database
I'm using SQLite3 to run my database. Given my use case it ticks all the boxes, and it had plenty of existing documentation and guides written in relation to Flask, which made the setup process basically seamless. I interact with the DB using SQLAlchemy, combining Object Relational Mapping with raw SQL when I need more speed.
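Mixing the two is straightforward in SQLAlchemy. A sketch, assuming an existing session (the stop ID and table name are placeholders):

    from sqlalchemy import text

    # the ORM for everyday queries...
    stop = session.query(Stop).filter_by(stop_id="STOP123").first()

    # ...and raw SQL where the ORM overhead isn't worth it
    rows = session.execute(
        text("SELECT trip_id, arrival_time FROM stop_times WHERE stop_id = :sid"),
        {"sid": "STOP123"},
    ).fetchall()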
I found it to have excellent latency and performance for even my heaviest queries. I'm savvy enough at SQL and relational database management that any time I hit an issue it doesn't take long to identify the problem and come up with a solution. Spinning up additional instances of my database for testing or development is very manageable with SQLite, which is great: I don't need to worry about hosting cores in the cloud, and I can always hop into the CLI to fix any schema issues.
Alembic - Database Migrations
I'm using Flask-Migrate (a thin wrapper over Alembic) to manage my migrations. This was a learning area for me, as I hadn't properly had to do much of this before, but I'm very glad I spent some time on it.
Using Alembic I'm able to make adjustments to my models in Python, such as adding a new column or table. I generate the migration script, which automatically constructs the necessary SQL alterations, review the generated script, and then it's ready to apply. The migration script is committed to my repo, and I can propagate it over to my server with little hassle.
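In practice the workflow boils down to a couple of Flask-Migrate commands (the message text is just an example):

    flask db migrate -m "add shelter column to stops"   # autogenerate the migration script
    # review the new file under migrations/versions/ by hand
    flask db upgrade                                    # apply it to the database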
The Server Tech
Linux
My server runs Ubuntu Linux, which I'm quite familiar with, hosted locally on my own hardware. I have experience hosting other services locally, such as media servers, my own DNS server and domain-level ad blocking (Pi-hole), so it made sense to host this locally as well. I have no running costs associated with this website beyond a tiny amount of electricity and my own time.
I did also fully deploy this application to Google App Engine (cloud), which I had some experience with in the past. It all ran fine and fit within the requirements of the free hosting tier. The primary issue was that Google doesn't allow you to write to the disk on the hosted VMs, which is necessary for some parts of my app. I also couldn't find free-tier database hosting that would allow the high bandwidth I need (mostly for importing data), but I deployed it for the experience anyway. (No longer available for demo due to cost.)
GoAccess - HTTP Logging
I'm using GoAccess, an open source log analyser, for HTTP logging through my Nginx server, running it via the command line and cron jobs. Nginx writes web traffic logs to files on my disk, and on a schedule my cron jobs run GoAccess over sections of those logs to generate HTML report files (1/3/7/30 day, etc.). This allows me to view traffic to particular endpoints, static files, request counts, bandwidth and so on, broken down by time period. The jobs write the HTML reports to my disk so I can open them in a web browser.
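Each cron entry boils down to a GoAccess one-liner along these lines (the paths are illustrative):

    # nightly: regenerate the 1-day report from the current access log
    goaccess /var/log/nginx/access.log --log-format=COMBINED -o /var/www/reports/1day.html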
BetterStack - Application Logging
For application-level Python logs I'm using BetterStack, a sleek log management platform. This was my first time using it, but I'm very familiar with log management already, so there was no learning curve with Logtail.
I have configured separate log environments for development, testing and production.
I log custom attributes with my log messages; for example, I can filter where User.ID = X to see all logs carrying that attribute. Setting this up in the Python code was very straightforward and allows me to set custom attributes depending on the context.
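With the logtail-python handler this amounts to standard Python logging plus an extra dict; the source token and attribute names below are placeholders:

    import logging
    from logtail import LogtailHandler

    handler = LogtailHandler(source_token="MY_SOURCE_TOKEN")  # per-environment token
    logger = logging.getLogger(__name__)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)

    # custom attributes ride along with the message and are filterable in Logtail
    logger.info("route lookup", extra={"user": {"id": 42}, "route": "145"})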
In Logtail I then have custom views that save particular filters, and I can create alerts based on certain conditions being met. This gives me great visibility over actions being taken in my application. The free tier is plenty for my needs at the moment, and I've really enjoyed using it.
Cloudflare - Domain Management
I'm routing traffic to my domain through Cloudflare, which then routes it on to my actual web server. This seemed like an obvious choice, as the free tier included lots of features I wanted and set-up was straightforward.
Cloudflare handles bot management and attacks on the domain (there are hundreds per day) and operates a caching layer in front of my site. The cache is not heavily used, as I have few static files, but it may be useful in future.
This additional layer in front of my web server also allows me to hide any sensitive details of it, as all traffic goes to and comes from Cloudflare; the only thing an attacker can see is Cloudflare's servers. I am also able to geo-block countries in Cloudflare, which helps cut down on bot traffic.
The Maps
I have embedded two different versions of mapping data; Folium is probably the only one enabled at the moment.
Folium - Map Generation Library
I'm using Folium, a mapping library for generating map data in various formats. My server renders the HTML map directly into the page, so it's very light for clients to run, and it's totally free to serve, which is ideal for my use case. So far I'm placing markers/clusters (stop and vehicle locations), drawing polylines (for bus routes) and adding layer controls for toggling certain overlays, but there's much more I can add in future.
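A condensed sketch of those pieces in Folium (the coordinates are illustrative, not real stop data):

    import folium
    from folium.plugins import MarkerCluster

    m = folium.Map(location=[53.35, -6.26], zoom_start=12)  # Dublin-ish centre
    cluster = MarkerCluster(name="Stops").add_to(m)
    folium.Marker([53.349, -6.260], popup="Example stop").add_to(cluster)
    folium.PolyLine([(53.34, -6.28), (53.35, -6.26)], weight=4).add_to(m)  # route shape
    folium.LayerControl().add_to(m)
    html = m.get_root().render()  # server-side HTML, embedded into the page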
Plotly - Graphing Library
I'm using Plotly, a graphing and mapping library for visualising data. Information isn't that useful unless you can explore and understand it, and Plotly makes it easy for me to generate graphs of any type to represent any of the data sets I have. I wanted to graph some real data in order to learn more about using these tools, so this seemed like the perfect opportunity. TODO: Generate graphing for trends over time, per operator, delays and ghost trips.
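As a taste of the API, a hedged sketch using made-up numbers rather than my real dataset:

    import plotly.express as px

    # hypothetical per-operator average delays, in minutes
    fig = px.bar(
        x=["Operator A", "Operator B", "Operator C"],
        y=[2.4, 3.1, 1.7],
        labels={"x": "Operator", "y": "Average delay (min)"},
        title="Average delay per operator (example data)",
    )
    html = fig.to_html(full_html=False)  # embeddable fragment for a template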
Geoapify - Geolocating / Address Autocompletion
I'm using the Geoapify API for address auto-completion and geolocation on the nearby stops page. When you start typing an address, that string is queried and you are returned a list of matching locations. You can select and submit a location, which is then used to get the latitude and longitude of the address. I have the locations of all the stops saved in the database, so I'm able to return a map of the closest ones within a radius, using the Haversine formula to compare the distances. All stops with a location inside the radius are compiled into a Folium map and returned as the final result. The Google equivalent service is paid with no free tier, while Geoapify has a very generous free tier of 3K requests a day, which made it ideal for my use case.
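For reference, the Haversine distance itself is only a few lines of Python:

    from math import asin, cos, radians, sin, sqrt

    EARTH_RADIUS_KM = 6371.0

    def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
        # great-circle distance between two lat/lon points, in kilometres
        lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
        a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
        return 2 * EARTH_RADIUS_KM * asin(sqrt(a))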