Calculating a Levenshtein Distance in Python and Swift

Imagine you’re writing a mobile app, and your user searches for the word kitten. Unfortunately, the only search terms you expected them to enter were these:

smitten
mitten
kitty
fitting
written

How do we figure out which word the user meant to type?

Levenshtein Distances

A Levenshtein distance is a distance between two sequences a and b. If a and b are strings, the Levenshtein distance is the minimum number of single-character edits needed to change one string into the other. Three types of edits are allowed:

  • Insertion: a character is added to a
  • Deletion: a character is removed from a
  • Substitution: a character in a is replaced with a different character

For example, if a = 'abc' and b = 'abc', the Levenshtein distance between the two strings is 0 because a and b are equal. If a = 'abcd' and b = 'a', the distance is 3 (delete b, c, and d). If a = 'abcd' and b = 'aacc', the distance is 2 (substitute the b with a and the d with c).

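The Levenshtein distance lev_{a,b}(i, j) between the first i characters of a string a and the first j characters of a string b is defined as follows (the distance between the full strings is lev_{a,b}(|a|, |b|)):

$$
\operatorname{lev}_{a,b}(i, j) =
\begin{cases}
\max(i, j) & \text{if } \min(i, j) = 0, \\
\min \begin{cases}
\operatorname{lev}_{a,b}(i-1, j) + 1 \\
\operatorname{lev}_{a,b}(i, j-1) + 1 \\
\operatorname{lev}_{a,b}(i-1, j-1) + 1_{(a_i \neq b_j)}
\end{cases} & \text{otherwise.}
\end{cases}
$$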

This definition is recursive. The first case, max(i, j) if min(i, j) = 0, covers the base cases where either the first string or the second string is empty.

The indicator function 1_(a_i ≠ b_j) in the third element of the minimum is the substitution cost: if a_i ≠ b_j, the cost is 1; otherwise the cost is 0. The first element of the minimum is a deletion from a, the second is an insertion, and the third is a substitution.
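The recurrence can be transcribed almost literally into code. The sketch below is written in Python (like distance.py later in this post) and, unlike the two implementations in this post, caches each lev(i, j) result with functools.lru_cache (top-down memoization), so every pair of prefixes is computed only once. The names lev and d are illustrative; because the recursion depth grows with len(a) + len(b), this version is only meant for short strings.

```python
from functools import lru_cache


def lev(a, b):
    """Levenshtein distance computed directly from the recursive definition."""

    @lru_cache(maxsize=None)
    def d(i, j):
        # d(i, j) is the distance between a[:i] and b[:j]
        if min(i, j) == 0:
            # Base case: one prefix is empty
            return max(i, j)
        # Indicator 1_(a_i != b_j); Python strings are 0-indexed, so a_i is a[i - 1]
        cost = 0 if a[i - 1] == b[j - 1] else 1
        return min(
            d(i - 1, j) + 1,         # delete a_i
            d(i, j - 1) + 1,         # insert b_j
            d(i - 1, j - 1) + cost,  # substitute a_i with b_j (free if equal)
        )

    return d(len(a), len(b))


print(lev("abcd", "a"), lev("abcd", "aacc"))  # prints: 3 2
```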

A Naive Implementation

First, let’s write a straightforward recursive implementation in Swift. We’ll create a function named levenshtein_distance and start with the base cases, which check whether either string is empty:

func levenshtein_distance(a: String, b: String) -> Int {
    // If either string is empty, return the length of the other string
    if (a.count == 0) {
        return b.count
    }
    if (b.count == 0) {
        return a.count
    }
}

Then we add the recursive case. We calculate the substitution cost, then take the minimum over the three possible edits (deletion, insertion, or substitution):

func levenshtein_distance(a: String, b: String) -> Int {
    // ...

    // Check whether the last items are the same before testing the other items
    let cost = (a.last == b.last) ? 0 : 1

    let a_dropped = String(a.dropLast())
    let b_dropped = String(b.dropLast())

    return min(
        // Find the distance if an item in a is removed
        levenshtein_distance(a: a_dropped, b: b) + 1,
        // Find the distance if an item is removed from b (i.e. added to a)
        levenshtein_distance(a: a, b: b_dropped) + 1,
        // Find the distance if an item is removed from a and b (i.e. substituted)
        levenshtein_distance(a: a_dropped, b: b_dropped) + cost
    )
}

Let’s test our distance function with a simple test case:

print(levenshtein_distance(a: "123", b: "12"))
1

More test cases can be found in the final files below. Now we can compute the distance from kitten to each of our expected search terms to figure out which word the user probably meant to type:

// Print out the distances for our test case
let first_word = "kitten"
let test_words = ["smitten", "mitten", "kitty", "fitting", "written"]

for word in test_words {
    let dist = levenshtein_distance(a: first_word, b: word)
    print("Distance between \(first_word) and \(word): \(dist)")
}
Distance between kitten and smitten: 2
Distance between kitten and mitten: 1
Distance between kitten and kitty: 2
Distance between kitten and fitting: 3
Distance between kitten and written: 2

The user probably meant to type mitten instead of kitten!

An Improved Implementation

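The recursive implementation of the Levenshtein distance above won’t scale well to larger strings. Every call on two non-empty strings makes three more recursive calls, and the same pairs of prefixes are recomputed over and over, so the running time grows exponentially with the lengths of the strings. What if we needed to find the distance between a thousand strings, each with hundreds of characters?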
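To see how badly the naive recursion scales, the sketch below instruments the same algorithm (written in Python, mirroring levenshtein_distance from distance.py) to count how many times the function is called. naive_distance_with_count is a hypothetical helper name. Because every call on two non-empty sequences makes exactly three recursive calls, the count depends only on the lengths, not on the contents.

```python
def naive_distance_with_count(a, b):
    """Naive recursive Levenshtein distance that also reports how many calls it made."""
    calls = 0

    def dist(a, b):
        nonlocal calls
        calls += 1
        # Base cases: one sequence is empty
        if not a:
            return len(b)
        if not b:
            return len(a)
        cost = 0 if a[-1] == b[-1] else 1
        # Three recursive calls every time, whether or not the last items match
        return min(
            dist(a[:-1], b) + 1,
            dist(a, b[:-1]) + 1,
            dist(a[:-1], b[:-1]) + cost,
        )

    result = dist(a, b)
    return result, calls


for n in range(1, 8):
    distance, calls = naive_distance_with_count("x" * n, "y" * n)
    print(f"length {n}: distance {distance}, {calls} calls")
```

For two strings of equal length, the number of calls more than quadruples with each extra character (4 calls for length 1, 19 for length 2, 94 for length 3), even though only (len(a) + 1) × (len(b) + 1) distinct pairs of prefixes exist.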

A better way to calculate a Levenshtein distance is to use a matrix that “remembers” previously calculated distances, so each pair of prefixes is computed only once. First, the distance function should check for empty strings. Then, we’ll create a matrix to hold the distance calculations:

func opti_leven_distance(a: String, b: String) -> Int {
    // Check for empty strings first
    if (a.count == 0) {
        return b.count
    }
    if (b.count == 0) {
        return a.count
    }

    // Create an empty distance matrix with dimensions len(a)+1 x len(b)+1
    var dists = Array(repeating: Array(repeating: 0, count: b.count+1), count: a.count+1)
}

Every cell in the matrix starts at zero. As an initialization step, we fill the first column with 0 through the length of a, representing deleting each character of a to get to an empty string, and the first row with 0 through the length of b, representing adding (or inserting) each character to get to the value of b:

func opti_leven_distance(a: String, b: String) -> Int {
    //...

    // a's default distances are calculated by removing each character
    for i in 1...(a.count) {
        dists[i][0] = i
    }
    // b's default distances are calculated by adding each character
    for j in 1...(b.count) {
        dists[0][j] = j
    }
}
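Here is the initialization step on its own, as a short Python sketch that mirrors the Swift code above and stops before the main loop (initial_matrix is a hypothetical name):

```python
def initial_matrix(a, b):
    """Distance matrix after the initialization step, before the main loop."""
    # (len(a) + 1) rows by (len(b) + 1) columns, all zeros
    dists = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    # First column: deleting i characters of a reaches the empty string
    for i in range(1, len(a) + 1):
        dists[i][0] = i
    # First row: inserting j characters builds b from the empty string
    for j in range(1, len(b) + 1):
        dists[0][j] = j
    return dists


for row in initial_matrix("ab", "abc"):
    print(row)
```

For a = "ab" and b = "abc", this prints [0, 1, 2, 3], [1, 0, 0, 0], and [2, 0, 0, 0]: the first row and first column hold the base cases, and the remaining zeros are placeholders that the main loop overwrites.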

Similar to our naive implementation, we’ll now fill in the remaining cells of the distance matrix. This time, however, we’ll use the values already stored in the matrix to calculate each minimum distance rather than recursively calling the distance function. The final distance is the last element in the distance matrix (at the bottom right). Note that a[i-1] relies on the Int subscript extension on String defined in the final Distance.playground file below:

func opti_leven_distance(a: String, b: String) -> Int {
    //...

    // Find the remaining distances using previous distances
    for i in 1...(a.count) {
        for j in 1...(b.count) {
            // Calculate the substitution cost
            let cost = (a[i-1] == b[j-1]) ? 0 : 1

            dists[i][j] = min(
                // Removing a character from a
                dists[i-1][j] + 1,
                // Inserting a character into a
                dists[i][j-1] + 1,
                // Substituting a character from a to b
                dists[i-1][j-1] + cost
            )
        }
    }

    return dists.last!.last!
}
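One reason to keep the whole matrix, rather than only the final number, is that it records how each distance was reached. The Python sketch below (edit_script is a hypothetical name, not part of the original post) fills the same matrix as opti_leven_distance and then walks back from the bottom-right cell, at each step moving to a neighboring cell whose value explains the current one, to recover one optimal sequence of edits. When several moves tie, it prefers the diagonal one, so other equally short edit sequences may exist.

```python
def edit_script(a, b):
    """Return (distance, edits), where edits is one optimal list of operations
    turning a into b. Each edit is (operation, item_from_a, item_from_b)."""
    m, n = len(a), len(b)

    # Same matrix as opti_leven_distance
    dists = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dists[i][0] = i
    for j in range(1, n + 1):
        dists[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dists[i][j] = min(dists[i - 1][j] + 1,
                              dists[i][j - 1] + 1,
                              dists[i - 1][j - 1] + cost)

    # Walk back from the bottom-right cell to the top-left cell
    edits = []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            cost = 0 if a[i - 1] == b[j - 1] else 1
            if dists[i][j] == dists[i - 1][j - 1] + cost:
                # Diagonal move: keep a matching item or substitute a different one
                edits.append(("keep" if cost == 0 else "substitute", a[i - 1], b[j - 1]))
                i, j = i - 1, j - 1
                continue
        if i > 0 and dists[i][j] == dists[i - 1][j] + 1:
            # Upward move: delete a[i - 1]
            edits.append(("delete", a[i - 1], None))
            i -= 1
        else:
            # Leftward move: insert b[j - 1]
            edits.append(("insert", None, b[j - 1]))
            j -= 1

    edits.reverse()  # the walk produced the edits from last to first
    return dists[m][n], edits


distance, edits = edit_script("kitten", "fitting")
print(distance)
for edit in edits:
    if edit[0] != "keep":
        print(edit)
```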

We can use our test cases again to verify that our improved implementation is correct:

print(opti_leven_distance(a: "123", b: "12"))

// Print out the distances for our test case
let first_word = "kitten"
let test_words = ["smitten", "mitten", "kitty", "fitting", "written"]

for word in test_words {
    let dist = opti_leven_distance(a: first_word, b: word)
    print("Distance between \(first_word) and \(word): \(dist)")
}
1
Distance between kitten and smitten: 2
Distance between kitten and mitten: 1
Distance between kitten and kitty: 2
Distance between kitten and fitting: 3
Distance between kitten and written: 2
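If only the distance is needed, the full matrix is not: each row depends only on the row above it. A common refinement, sketched here in Python (levenshtein_two_rows is a hypothetical name), keeps just two rows, which reduces memory from O(len(a) × len(b)) to O(min(len(a), len(b))) while keeping the same O(len(a) × len(b)) running time. Swapping the arguments is safe because the Levenshtein distance is symmetric.

```python
def levenshtein_two_rows(a, b):
    """Levenshtein distance keeping only two rows of the distance matrix."""
    # The distance is symmetric, so make b the shorter sequence to keep rows small
    if len(a) < len(b):
        a, b = b, a

    # Row 0: reaching each prefix of b from an empty prefix takes j insertions
    previous = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        # Column 0: deleting all i items of the current prefix of a
        current = [i] + [0] * len(b)
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            current[j] = min(
                previous[j] + 1,         # deletion (cell above)
                current[j - 1] + 1,      # insertion (cell to the left)
                previous[j - 1] + cost,  # substitution (cell diagonally above)
            )
        previous = current

    return previous[-1]


for word in ["smitten", "mitten", "kitty", "fitting", "written"]:
    print(f"Distance between kitten and {word}: {levenshtein_two_rows('kitten', word)}")
```

This prints the same five distances as the matrix version above.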

Swift and Python Implementations

Distance.playground:


import Foundation


/// Calculates the Levenshtein distance between two strings
/// - Parameter a: The first string
/// - Parameter b: The second string
func levenshtein_distance(a: String, b: String) -> Int {
    // If either string is empty, return the length of the other string
    if (a.count == 0) {
        return b.count
    }
    if (b.count == 0) {
        return a.count
    }

    // Check whether the last items are the same before testing the other items
    let cost = (a.last == b.last) ? 0 : 1

    let a_dropped = String(a.dropLast())
    let b_dropped = String(b.dropLast())

    return min(
        // Find the distance if an item in a is removed
        levenshtein_distance(a: a_dropped, b: b) + 1,
        // Find the distance if an item is removed from b (i.e. added to a)
        levenshtein_distance(a: a, b: b_dropped) + 1,
        // Find the distance if an item is removed from a and b (i.e. substituted)
        levenshtein_distance(a: a_dropped, b: b_dropped) + cost
    )
}

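/// String extension to subscript a String by an Int offset (such as a[i-1])
/// Note: index(_:offsetBy:) walks the string from the start, so each lookup is O(n)
extension String {
    subscript (i: Int) -> Character {
        return self[index(startIndex, offsetBy: i)]
    }
}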

/// A more optimized version of the Levenshtein distance function using an array of previously calculated distances
/// - Parameter a: The first string
/// - Parameter b: The second string
func opti_leven_distance(a: String, b: String) -> Int {
    // Check for empty strings first
    if (a.count == 0) {
        return b.count
    }
    if (b.count == 0) {
        return a.count
    }

    // Create an empty distance matrix with dimensions len(a)+1 x len(b)+1
    var dists = Array(repeating: Array(repeating: 0, count: b.count+1), count: a.count+1)

    // a's default distances are calculated by removing each character
    for i in 1...(a.count) {
        dists[i][0] = i
    }
    // b's default distances are calculated by adding each character
    for j in 1...(b.count) {
        dists[0][j] = j
    }

    // Find the remaining distances using previous distances
    for i in 1...(a.count) {
        for j in 1...(b.count) {
            // Calculate the substitution cost
            let cost = (a[i-1] == b[j-1]) ? 0 : 1

            dists[i][j] = min(
                // Removing a character from a
                dists[i-1][j] + 1,
                // Inserting a character into a
                dists[i][j-1] + 1,
                // Substituting a character from a to b
                dists[i-1][j-1] + cost
            )
        }
    }

    return dists.last!.last!
}

/// Function to test whether the distance function is working correctly
/// - Parameter a: The first test string
/// - Parameter b: The second test string
/// - Parameter answer: The expected answer to be returned by the distance function
@discardableResult
func test_distance(a: String, b: String, answer: Int) -> Bool {
    let d = opti_leven_distance(a: a, b: b)

    if (d != answer) {
        print("a: \(a)")
        print("b: \(b)")
        print("expected: \(answer)")
        print("distance: \(d)")
        return false
    } else {
        return true
    }
}

// Test the distance function with many different examples
test_distance(a: "", b: "", answer: 0)
test_distance(a: "1", b: "1", answer: 0)
test_distance(a: "1", b: "2", answer: 1)
test_distance(a: "12", b: "12", answer: 0)
test_distance(a: "123", b: "12", answer: 1)
test_distance(a: "1234", b: "1", answer: 3)
test_distance(a: "1234", b: "1233", answer: 1)
test_distance(a: "1248", b: "1349", answer: 2)
test_distance(a: "", b: "12345", answer: 5)
test_distance(a: "5677", b: "1234", answer: 4)
test_distance(a: "123456", b: "12345", answer: 1)
test_distance(a: "13579", b: "12345", answer: 4)
test_distance(a: "123", b: "", answer: 3)
test_distance(a: "kitten", b: "mittens", answer: 2)

print(opti_leven_distance(a: "123", b: "12"))

// Print out the distances for our test case
let first_word = "kitten"
let test_words = ["smitten", "mitten", "kitty", "fitting", "written"]

for word in test_words {
    let dist = opti_leven_distance(a: first_word, b: word)
    print("Distance between \(first_word) and \(word): \(dist)")
}

Here’s a Python implementation of the Swift code above, saved as distance.py. The Python version also works with lists, not just strings:

# Calculates the Levenshtein distance between two strings
def levenshtein_distance(a, b):
    # If either array is empty, return the length of the other array
    if not len(a):
        return len(b)
    if not len(b):
        return len(a)

    # Check whether the last items are the same before testing the other items
    if a[-1] == b[-1]:
        cost = 0
    else:
        cost = 1

    return min(
        # Find the distance if an item in a is removed
        levenshtein_distance(a[:-1], b) + 1,
        # Find the distance if an item is removed from b (i.e. added to a)
        levenshtein_distance(a, b[:-1]) + 1,
        # Find the distance if an item is removed from a and b (i.e. substituted)
        levenshtein_distance(a[:-1], b[:-1]) + cost
    )

# A more optimized version of the Levenshtein distance function using an array of previously calculated distances
def opti_leven_distance(a, b):
    # Create an empty distance matrix with dimensions len(a)+1 x len(b)+1
    dists = [ [0 for _ in range(len(b)+1)] for _ in range(len(a)+1) ]

    # a's default distances are calculated by removing each character
    for i in range(1, len(a)+1):
        dists[i][0] = i
    # b's default distances are calculated by adding each character
    for j in range(1, len(b)+1):
        dists[0][j] = j

    # Find the remaining distances using previous distances
    for i in range(1, len(a)+1):
        for j in range(1, len(b)+1):
            # Calculate the substitution cost
            if a[i-1] == b[j-1]:
                cost = 0
            else:
                cost = 1

            dists[i][j] = min(
                # Removing a character from a
                dists[i-1][j] + 1,
                # Inserting a character into a
                dists[i][j-1] + 1,
                # Substituting a character from a to b
                dists[i-1][j-1] + cost
            )

    return dists[-1][-1]

# Function to test whether the distance function is working correctly
def test_distance(a, b, answer):
    dist = opti_leven_distance(a, b)

    if dist != answer:
        print('a:', a)
        print('b:', b)
        print('expected:', answer)
        print('distance:', dist)
        print()

if __name__ == '__main__':
    # Test the distance function with many different examples
    test_distance('', '', 0)
    test_distance('1', '1', 0)
    test_distance('1', '2', 1)
    test_distance('12', '12', 0)
    test_distance('123', '12', 1)
    test_distance('1234', '1', 3)
    test_distance('1234', '1233', 1)
    test_distance([1, 2, 4, 8], [1, 3, 4, 16], 2)
    test_distance('', '12345', 5)
    test_distance([5, 6, 7, 7], [1, 2, 3, 4], 4)
    test_distance([1, 2, 3, 4, 5, 6], [1, 2, 3, 4, 5], 1)
    test_distance([1, 3, 5, 7, 9], [1, 2, 3, 4, 5], 4)
    test_distance([1, 2, 3], [], 3)
    test_distance('kitten', 'mittens', 2)

    # Print out the distances for our test case
    first_word = 'kitten'
    test_words = ['smitten', 'mitten', 'kitty', 'fitting', 'written']
    for word in test_words:
        dist = opti_leven_distance(first_word, word)
        print(f'Distance between {first_word} and {word}: {dist}')

Announcing Darkscreen - A Dark App

I’m so excited to announce that my first iOS app, Darkscreen - A Dark App, has a public beta on TestFlight! Ever since I was given my first iPod (all the way back in 7th grade!), I’ve dreamed of creating something that millions of people can enjoy, and I can’t express how excited I am. Here’s the official description:

Darkscreen allows you to use other iPad apps in Split View without any distractions, no hassle.

Darkscreen provides multiple themes, including:

  • Dark
  • Light
  • 80s
  • 90s
  • Outrun

Download using TestFlight today!

Why Darkscreen?

I really love using Apollo for Reddit by Christian Selig, but he hasn’t gotten a chance to create a true iPad experience for his Reddit client yet. I use Darkscreen next to Apollo in Split View so that Apollo can be in an iPhone-sized container while keeping the rest of the screen black.

For example, posts shown in Apollo don’t quite look right when in full horizontal mode on iPad:

Apollo in full horizontal mode

Now with Darkscreen, I can browse Apollo in its intended view size without being distracted by other apps:

Apollo in Split View with Darkscreen

Switching to a new theme in Darkscreen automatically updates the table view as well as the root view underneath:

Darkscreen switching themes

My next goal, of course, is for Darkscreen to respond to the system-wide Dark Mode setting.

Why Open Source?

I found it an interesting challenge to modify the appearance of all views in the app immediately after a user selects a theme in a UITableView, and I hope this brief example can help other developers implement their own theme systems.

Even though iOS 13 introduces system-wide Dark Mode, this example app can be helpful to support any custom themes that go beyond the default dark and light styles.

How to Update the Theme for a View

I’ve implemented the theme system using a Settings Bundle, so the BaseViewController can subscribe to settings (theme) changes:

func registerForSettingsChange() {
    NotificationCenter.default.addObserver(self,
                                            selector: #selector(BaseViewController.settingsChanged),
                                            name: UserDefaults.didChangeNotification,
                                            object: nil)
}

A Theme groups a name, a status bar style, and the colors used throughout the UI:

class Theme {

    // ...

    init(_ name: String, statusBar: UIStatusBarStyle, background: UIColor, primary: UIColor, secondary: UIColor) {
        self.name = name
        statusBarStyle = statusBar
        backgroundColor = background
        primaryColor = primary
        secondaryColor = secondary
    }
}

When a setting changes, BaseViewController updates its UI elements:

@objc func settingsChanged() {
    updateTheme()
}

func updateTheme() {
    // Status bar
    setNeedsStatusBarAppearanceUpdate()

    // Background color
    self.view.backgroundColor = Settings.shared.theme.backgroundColor

    // Navigation bar
    self.navigationController?.navigationBar.updateTheme()
}

And UINavigationBar is extended to support theme switching:

public extension UINavigationBar {
    func updateTheme() {
        // Background color
        barTintColor = Settings.shared.theme.backgroundColor

        // Bar item color
        tintColor = Settings.shared.theme.secondaryColor

        // Title text color
        titleTextAttributes = [NSAttributedString.Key.foregroundColor: Settings.shared.theme.secondaryColor]

        // Status bar style
        barStyle = Settings.shared.theme.navbarStyle

        // Tell the system to update it
        setNeedsDisplay()
    }
}
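The flow above (a setting changes, a notification fires, and each view controller re-applies the current theme) is an instance of the observer pattern. As a language-neutral illustration, here is a minimal Python model of that flow. Theme, Settings, ThemedView, set_theme, and the hex color strings are all hypothetical stand-ins, not UIKit or Darkscreen APIs.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Theme:
    """A named set of colors (hex strings stand in for UIColor here)."""
    name: str
    background: str
    primary: str
    secondary: str


class Settings:
    """Holds the current theme and notifies observers when it changes,
    similar in spirit to observing UserDefaults.didChangeNotification."""

    def __init__(self, theme: Theme) -> None:
        self._theme = theme
        self._observers: List[Callable[[], None]] = []

    @property
    def theme(self) -> Theme:
        return self._theme

    def add_observer(self, callback: Callable[[], None]) -> None:
        self._observers.append(callback)

    def set_theme(self, theme: Theme) -> None:
        self._theme = theme
        for callback in self._observers:
            callback()


class ThemedView:
    """Stands in for BaseViewController: re-reads the theme on every change."""

    def __init__(self, settings: Settings) -> None:
        self.settings = settings
        self.background = ""
        self.bar_tint = ""
        settings.add_observer(self.update_theme)  # like registerForSettingsChange()
        self.update_theme()

    def update_theme(self) -> None:
        theme = self.settings.theme
        self.background = theme.background  # like view.backgroundColor
        self.bar_tint = theme.secondary     # like navigationBar.tintColor


# Placeholder colors, not Darkscreen's actual palette
dark = Theme("Dark", background="#000000", primary="#FFFFFF", secondary="#AAAAAA")
outrun = Theme("Outrun", background="#2B0F54", primary="#FF2975", secondary="#F222FF")

settings = Settings(dark)
view = ThemedView(settings)
settings.set_theme(outrun)
print(view.background)
```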

How to Build a Song Recommender Using Create ML MLRecommender

Beta Warning

This example was written using macOS Catalina Version 10.15 Beta and Xcode Version 11.0 beta 5. Changes may have been made to the MLRecommender constructor since this article was written (October 2019).

Objective

By the end of this post, we’ll learn how to use the Create ML MLRecommender to recommend a song to a user given their listening history. We’ll also learn how to parse and prepare an MLDataTable using Python and data from a third party.

Introduction to MLRecommender

A personalized recommendation system can be used in many different applications, such as a music player, video player, or social media site. A machine learning recommendation system compares a user’s past activity to a large library of activity from many other users. For example, if Spotify wanted to recommend you a new Daily Mix, their ML recommendation system might look at your listening history for the past few weeks and compare that history to your friends’ history. Our goal today is to create an MLRecommender to recommend songs to a user given their listening history.
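To make the idea of comparing one user’s activity with everyone else’s more concrete, here is a minimal sketch of item-based collaborative filtering in Python, one common way to build such a recommender. It is not a description of how MLRecommender works internally; the function name, the choice of Jaccard similarity, and the toy listening data below are all assumptions for illustration.

```python
from collections import defaultdict


def recommend(history, user, k=3):
    """Item-based collaborative filtering with Jaccard similarity.

    history: iterable of (user_id, item_id) pairs.
    Returns up to k items the user hasn't heard, ranked by their summed
    similarity to the items the user already listened to.
    """
    listeners = defaultdict(set)  # item -> users who listened to it
    heard = defaultdict(set)      # user -> items they listened to
    for u, item in history:
        listeners[item].add(u)
        heard[u].add(item)

    def jaccard(x, y):
        # Shared listeners divided by all listeners of either item
        union = listeners[x] | listeners[y]
        return len(listeners[x] & listeners[y]) / len(union) if union else 0.0

    scores = {}
    for candidate in listeners:
        if candidate in heard[user]:
            continue  # don't recommend songs the user already knows
        scores[candidate] = sum(jaccard(candidate, seen) for seen in heard[user])

    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, score in ranked[:k] if score > 0]


# Made-up listening data using song titles from this post
history = [
    ("ann", "The Cove"), ("ann", "Stronger"),
    ("bob", "The Cove"), ("bob", "Stronger"), ("bob", "Entre Dos Aguas"),
    ("cat", "Stronger"), ("cat", "Tanssi vaan"),
    ("dan", "Silent Night"),
]
print(recommend(history, "ann"))
```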

The constructor for MLRecommender is:

init(trainingData: MLDataTable, userColumn: String, itemColumn: String, ratingColumn: String? = nil, parameters: MLRecommender.ModelParameters = ModelParameters()) throws

Creating the Data Tables

The first step is to create the trainingData in the form of an MLDataTable. In this case, our training data is the listening history of many different users from the Million Song Dataset, which holds the metadata for a million songs along with listen counts from many users.

We’ll use two files from the dataset. The first is 10000.txt, which contains the user id, song id, and listen count for each listening record (2,000,000 rows). We’ll call that history.txt from now on. The second is song_data.csv, which contains the song id, title, release (album), artist name, and year. We’ll call that songs.csv from now on. All of the complete files for this tutorial can be found at the end of the post.

Here’s what our input files look like. Notice that songs.csv has a header row while history.txt does not:

# history.txt

b80344d063b5ccb3212f76538f3d9e43d87dca9e	SOAKIMP12A8C130995	1
b80344d063b5ccb3212f76538f3d9e43d87dca9e	SOBBMDR12A8C13253B	2
b80344d063b5ccb3212f76538f3d9e43d87dca9e	SOBXHDL12A81C204C0	1
...
# songs.csv

song_id,title,release,artist_name,year
SOQMMHC12AB0180CB8,"Silent Night","Monster Ballads X-Mas","Faster Pussy cat",2003
SOVFVAK12A8C1350D9,"Tanssi vaan","Karkuteillä",Karkkiautomaatti,1995
SOGTUKN12AB017F4F1,"No One Could Ever",Butter,"Hudson Mohawke",2006
...

We’ll be using the pandas Python library to handle our CSV data. First, download the files above and name them history.txt and songs.csv, and we’ll load them:

import csv
import pandas as pd

history_file = 'history.txt' # 'https://static.turi.com/datasets/millionsong/10000.txt'
songs_metadata_file = 'songs.csv' # 'https://static.turi.com/datasets/millionsong/song_data.csv'

# Import the files
history_df = pd.read_table(history_file, header=None)
history_df.columns = ['user_id', 'song_id', 'listen_count']
metadata_df = pd.read_csv(songs_metadata_file)

songs.csv already has the column headers in the file, so we didn’t need to add those like we did with history_df. This is what our dataframes now look like:

# history_df

                                    user_id             song_id  listen_count
0  b80344d063b5ccb3212f76538f3d9e43d87dca9e  SOAKIMP12A8C130995             1
1  b80344d063b5ccb3212f76538f3d9e43d87dca9e  SOBBMDR12A8C13253B             2
2  b80344d063b5ccb3212f76538f3d9e43d87dca9e  SOBXHDL12A81C204C0             1
...
# metadata_df
# (The '\' means that the row continues onto the next lines)

              song_id              title                release  \
0  SOQMMHC12AB0180CB8       Silent Night  Monster Ballads X-Mas
1  SOVFVAK12A8C1350D9        Tanssi vaan            Karkuteillä
2  SOGTUKN12AB017F4F1  No One Could Ever                 Butter

        artist_name  year
0  Faster Pussy cat  2003
1  Karkkiautomaatti  1995
2    Hudson Mohawke  2006
...

Next, to create a single listening history for all users, we’ll merge the song metadata in metadata_df into the listening history in history_df and save the result as a CSV to use in Swift. Let’s also add a column that combines the song title with the artist name so that each song is easier to read when we print the dataframe:

# Merge the files into a single csv
song_df = pd.merge(history_df, metadata_df.drop_duplicates(['song_id']), on="song_id", how="left")
song_df.to_csv('merged_listen_data.csv', quoting=csv.QUOTE_NONNUMERIC)

# Add a "Title - Name" column for easier printing later
song_df['song'] = song_df['title'] + ' - ' + song_df['artist_name']

Here’s what our combined song dataframe now looks like:

# song_df

                                    user_id             song_id  listen_count  \
0  b80344d063b5ccb3212f76538f3d9e43d87dca9e  SOAKIMP12A8C130995             1
1  b80344d063b5ccb3212f76538f3d9e43d87dca9e  SOBBMDR12A8C13253B             2
2  b80344d063b5ccb3212f76538f3d9e43d87dca9e  SOBXHDL12A81C204C0             1

             title              release    artist_name  year  \
0         The Cove   Thicker Than Water   Jack Johnson     0
1  Entre Dos Aguas  Flamenco Para Niños  Paco De Lucia  1976
2         Stronger           Graduation     Kanye West  2007

                              song
0          The Cove - Jack Johnson
1  Entre Dos Aguas - Paco De Lucia
2            Stronger - Kanye West
...
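For readers less familiar with pandas, the merge above can be sketched in plain Python. This hypothetical left_join helper mimics pd.merge(..., how="left") after drop_duplicates(['song_id']): every history row is kept, the first metadata row per song id wins, and history rows without matching metadata simply get no extra fields (pandas would fill them with NaN instead). The rows in the usage example, including the unmatched song id, are made up.

```python
def left_join(left_rows, right_rows, key):
    """Plain-Python sketch of pd.merge(left, right.drop_duplicates([key]),
    on=key, how="left") for lists of dicts."""
    # Keep only the first right-hand row for each key, like drop_duplicates
    lookup = {}
    for row in right_rows:
        lookup.setdefault(row[key], row)

    merged = []
    for row in left_rows:
        # Every left row survives; unmatched rows just get no extra fields
        extra = lookup.get(row[key], {})
        merged.append({**extra, **row})
    return merged


history = [
    {"user_id": "user1", "song_id": "SOAKIMP12A8C130995", "listen_count": 1},
    {"user_id": "user1", "song_id": "SO_NOT_IN_METADATA", "listen_count": 2},
]
metadata = [
    {"song_id": "SOAKIMP12A8C130995", "title": "The Cove", "artist_name": "Jack Johnson"},
    {"song_id": "SOAKIMP12A8C130995", "title": "The Cove (duplicate row)", "artist_name": "Jack Johnson"},
]

for row in left_join(history, metadata, key="song_id"):
    print(row)
```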

As of the time of writing, MLRecommender requires that the item id column in trainingData contain consecutive integers starting at 0. In other words, if our trainingData included only three songs, merged_listen_data.csv would have song ids like SOQMMHC12AB0180CB8, SOVFVAK12A8C1350D9, and SOGTUKN12AB017F4F1, but we need song ids of 0, 1, and 2. Let’s add a new column to the CSV that uses incremental song ids from 0 to N - 1, where N is the number of unique songs:

# Find the unique song ids
song_ids = metadata_df.song_id.unique()

# Create a new dataframe of the unique song ids and a new incremental
# id for each one
incremental_id_df = pd.DataFrame({'song_id': song_ids})
incremental_id_df['incremental_song_id'] = incremental_id_df.index

# Merge the original song metadata with the incremental ids
new_song_id_df = pd.merge(metadata_df, incremental_id_df, on='song_id', how='left')
new_song_id_df.to_csv('songs_incremental_id.csv', quoting=csv.QUOTE_NONNUMERIC)

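# Create a new merged history and song metadata CSV with incremental ids.
# merged_listen_data.csv (saved above) already holds the song metadata;
# reading it back turns its saved index into an "Unnamed: 0" column.
merged_df = pd.read_csv('merged_listen_data.csv')
new_history_df = pd.merge(merged_df, incremental_id_df, on='song_id', how='inner')
new_history_df.to_csv('merged_listen_data_incremental_song_id.csv', quoting=csv.QUOTE_NONNUMERIC)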

Here’s what our new song CSV file looks like. Notice the new incremental_song_id column at the end, which numbers the songs starting from 0 (the unnamed first column is the index that to_csv writes by default):

# songs_incremental_id.csv

"","song_id","title","release","artist_name","year","incremental_song_id"
0,"SOQMMHC12AB0180CB8","Silent Night","Monster Ballads X-Mas","Faster Pussy cat",2003,0
1,"SOVFVAK12A8C1350D9","Tanssi vaan","Karkuteillä","Karkkiautomaatti",1995,1
2,"SOGTUKN12AB017F4F1","No One Could Ever","Butter","Hudson Mohawke",2006,2
...

And here’s what our final merged listening data looks like with the incremental ids, ready to be read by the MLRecommender:

# merged_listen_data_incremental_song_id.csv

"","Unnamed: 0","user_id","song_id","listen_count","title","release","artist_name","year","incremental_song_id"
0,0,"b80344d063b5ccb3212f76538f3d9e43d87dca9e","SOAKIMP12A8C130995",1,"The Cove","Thicker Than Water","Jack Johnson",0,397069
1,18887,"7c86176941718984fed11b7c0674ff04c029b480","SOAKIMP12A8C130995",1,"The Cove","Thicker Than Water","Jack Johnson",0,397069
2,21627,"76235885b32c4e8c82760c340dc54f9b608d7d7e","SOAKIMP12A8C130995",3,"The Cove","Thicker Than Water","Jack Johnson",0,397069
...
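The incremental ids are just a mapping from each distinct song id to its position in order of first appearance, which is what metadata_df.song_id.unique() (it preserves order of appearance) plus the default 0-based DataFrame index produce. Here is a minimal stdlib-only sketch of the same mapping; incremental_ids and song_id_for are hypothetical names.

```python
def incremental_ids(song_ids):
    """Map each distinct song id to 0, 1, 2, ... in order of first appearance."""
    mapping = {}
    for song_id in song_ids:
        if song_id not in mapping:
            mapping[song_id] = len(mapping)
    return mapping


ids = incremental_ids([
    "SOQMMHC12AB0180CB8",
    "SOVFVAK12A8C1350D9",
    "SOQMMHC12AB0180CB8",  # a repeated id reuses its first number
    "SOGTUKN12AB017F4F1",
])
print(ids)

# Reverse lookup: incremental id back to the original song id
song_id_for = {number: song_id for song_id, number in ids.items()}
print(song_id_for[2])
```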

Now we’re ready to load it into the recommender!

Using MLRecommender

Create a new Swift Playground, and add the two CSVs merged_listen_data_incremental_song_id.csv and songs_incremental_id.csv as resources to your Playground. For help on adding resources to a Swift Playground, check out this post. Make sure your Swift Playground is a blank macOS Playground and not an iOS Playground, because the CreateML framework was only available on macOS at the time of writing. Since our MLRecommender will only give us the user id and incremental song id when generating recommendations, we’ll use the second CSV to look up the song titles.

First, let’s load the merged listening history with incremental ids:

import Foundation
import CreateML

// Create an MLDataTable from the merged CSV data
let history_csv = Bundle.main.url(forResource: "merged_listen_data_incremental_song_id", withExtension: "csv")!
let history_table = try MLDataTable(contentsOf: history_csv)
print(history_table)
Columns:
    X1	string
    Unnamed: 0	integer
    user_id	string
    song_id	string
    listen_count	integer
    title	string
    release	string
    artist_name	string
    year	integer
    incremental_song_id	integer
Rows: 2000000
Data:
+----------------+----------------+----------------+----------------+----------------+
| X1             | Unnamed: 0     | user_id        | song_id        | listen_count   |
+----------------+----------------+----------------+----------------+----------------+
| 0              | 0              | b80344d063b5...| SOAKIMP12A8C...| 1              |
| 1              | 18887          | 7c8617694171...| SOAKIMP12A8C...| 1              |
| 2              | 21627          | 76235885b32c...| SOAKIMP12A8C...| 3              |
| 3              | 27714          | 250c0fa2a77b...| SOAKIMP12A8C...| 1              |
| 4              | 34428          | 3f73f44560e8...| SOAKIMP12A8C...| 6              |
| 5              | 34715          | 7a4b8e7d2905...| SOAKIMP12A8C...| 6              |
| 6              | 55885          | b4a678fb729b...| SOAKIMP12A8C...| 2              |
| 7              | 65683          | 33280fc74b16...| SOAKIMP12A8C...| 1              |
| 8              | 75029          | be21ec120193...| SOAKIMP12A8C...| 1              |
| 9              | 105313         | 6fbb9ff93663...| SOAKIMP12A8C...| 2              |
+----------------+----------------+----------------+----------------+----------------+
+----------------+----------------+----------------+----------------+---------------------+
| title          | release        | artist_name    | year           | incremental_song_id |
+----------------+----------------+----------------+----------------+---------------------+
| The Cove       | Thicker Than...| Jack Johnson   | 0              | 397069              |
| The Cove       | Thicker Than...| Jack Johnson   | 0              | 397069              |
| The Cove       | Thicker Than...| Jack Johnson   | 0              | 397069              |
| The Cove       | Thicker Than...| Jack Johnson   | 0              | 397069              |
| The Cove       | Thicker Than...| Jack Johnson   | 0              | 397069              |
| The Cove       | Thicker Than...| Jack Johnson   | 0              | 397069              |
| The Cove       | Thicker Than...| Jack Johnson   | 0              | 397069              |
| The Cove       | Thicker Than...| Jack Johnson   | 0              | 397069              |
| The Cove       | Thicker Than...| Jack Johnson   | 0              | 397069              |
| The Cove       | Thicker Than...| Jack Johnson   | 0              | 397069              |
+----------------+----------------+----------------+----------------+---------------------+
[2000000 rows x 10 columns]

From there, we can create an MLRecommender. Our trainingData is the MLDataTable loaded from the merged listening history CSV, the userColumn is the user_id column name, and the itemColumn is the incremental_song_id column name. The user_id b80344d063b5ccb3212f76538f3d9e43d87dca9e was picked at random from the merged CSV data.

// Generate recommendations
let recommender = try MLRecommender(trainingData: history_table, userColumn: "user_id", itemColumn: "incremental_song_id")
let recs = try recommender.recommendations(fromUsers: ["b80344d063b5ccb3212f76538f3d9e43d87dca9e"])
print(recs)
Columns:
    user_id	string
    incremental_song_id	integer
    score	float
    rank	integer
Rows: 10
Data:
+----------------+---------------------+----------------+----------------+
| user_id        | incremental_song_id | score          | rank           |
+----------------+---------------------+----------------+----------------+
| b80344d063b5...| 114557              | 0.0461493      | 1              |
| b80344d063b5...| 834311              | 0.0436045      | 2              |
| b80344d063b5...| 939015              | 0.043068       | 3              |
| b80344d063b5...| 955047              | 0.0427589      | 4              |
| b80344d063b5...| 563380              | 0.0426116      | 5              |
| b80344d063b5...| 677759              | 0.0423951      | 6              |
| b80344d063b5...| 689170              | 0.0418951      | 7              |
| b80344d063b5...| 333053              | 0.041788       | 8              |
| b80344d063b5...| 381319              | 0.0403042      | 9              |
| b80344d063b5...| 117491              | 0.0400819      | 10             |
+----------------+---------------------+----------------+----------------+
[10 rows x 4 columns]

But we want to know the song metadata associated with each recommended incremental_song_id. Let’s load the song metadata table and join the recommendations with the song metadata using the incremental id:

// Use the songs data CSV to print the recommended song titles
let songs_csv = Bundle.main.url(forResource: "songs_incremental_id", withExtension: "csv")!
let songs_table = try MLDataTable(contentsOf: songs_csv)
print(songs_table)

let song_title_recs = recs.join(with: songs_table, on: "incremental_song_id")
print(song_title_recs)
Columns:
    X1	string
    song_id	string
    title	undefined
    release	string
    artist_name	string
    year	integer
    incremental_song_id	integer
Rows: 1000000
Data:
+----------------+----------------+----------------+----------------+----------------+
| X1             | song_id        | title          | release        | artist_name    |
+----------------+----------------+----------------+----------------+----------------+
| 0              | SOQMMHC12AB0...| Silent Night   | Monster Ball...| Faster Pussy...|
| 1              | SOVFVAK12A8C...| Tanssi vaan    | Karkuteillä   | Karkkiautoma...|
| 2              | SOGTUKN12AB0...| No One Could...| Butter         | Hudson Mohawke |
| 3              | SOBNYVR12A8C...| Si Vos Querés | De Culo        | Yerba Brava    |
| 4              | SOHSBXH12A8C...| Tangle Of As...| Rene Ablaze ...| Der Mystic     |
| 5              | SOZVAPQ12A8C...| Symphony No....| Berwald: Sym...| David Montgo...|
| 6              | SOQVRHI12A6D...| We Have Got ...| Strictly The...| Sasha / Turb...|
| 7              | SOEYRFT12AB0...| 2 Da Beat Ch...| Da Bomb        | Kris Kross     |
| 8              | SOPMIYT12A6D...| Goodbye        | Danny Boy      | Joseph Locke   |
| 9              | SOJCFMH12A8C...| Mama_ mama c...| March to cad...| The Sun Harb...|
+----------------+----------------+----------------+----------------+----------------+
+----------------+---------------------+
| year           | incremental_song_id |
+----------------+---------------------+
| 2003           | 0                   |
| 1995           | 1                   |
| 2006           | 2                   |
| 2003           | 3                   |
| 0              | 4                   |
| 0              | 5                   |
| 0              | 6                   |
| 1993           | 7                   |
| 0              | 8                   |
| 0              | 9                   |
+----------------+---------------------+
[1000000 rows x 7 columns]


Columns:
    user_id	string
    incremental_song_id	integer
    score	float
    rank	integer
    X1	string
    song_id	string
    title	undefined
    release	string
    artist_name	string
    year	integer
Rows: 11
Data:
+----------------+---------------------+----------------+----------------+----------------+
| user_id        | incremental_song_id | score          | rank           | X1             |
+----------------+---------------------+----------------+----------------+----------------+
| b80344d063b5...| 114557              | 0.0461493      | 1              | 114578         |
| b80344d063b5...| 117491              | 0.0400819      | 10             | 117512         |
| b80344d063b5...| 333053              | 0.041788       | 8              | 333174         |
| b80344d063b5...| 381319              | 0.0403042      | 9              | 381465         |
| b80344d063b5...| 381319              | 0.0403042      | 9              | 444615         |
| b80344d063b5...| 563380              | 0.0426116      | 5              | 563705         |
| b80344d063b5...| 677759              | 0.0423951      | 6              | 678222         |
| b80344d063b5...| 689170              | 0.0418951      | 7              | 689654         |
| b80344d063b5...| 834311              | 0.0436045      | 2              | 834983         |
| b80344d063b5...| 939015              | 0.043068       | 3              | 939863         |
+----------------+---------------------+----------------+----------------+----------------+
+----------------+----------------+----------------+----------------+----------------+
| song_id        | title          | release        | artist_name    | year           |
+----------------+----------------+----------------+----------------+----------------+
| SOHENSJ12AAF...| Great Indoors  | Room For Squ...| John Mayer     | 0              |
| SOOGZYY12A67...| Crying Shame   | In Between D...| Jack Johnson   | 2005           |
| SOGFKJE12A8C...| Sun It Rises   | Fleet Foxes    | Fleet Foxes    | 2008           |
| SOECLAD12AAF...| St. Patrick'...| Room For Squ...| John Mayer     | 0              |
| SOECLAD12AAF...| St. Patrick'...| Room For Squ...| John Mayer     | 0              |
| SOAYTRA12A8C...| All At Once    | Sleep Throug...| Jack Johnson   | 2008           |
| SOKLVUI12A67...| If I Could     | In Between D...| Jack Johnson   | 2005           |
| SOYIJIL12A67...| Posters        | Brushfire Fa...| Jack Johnson   | 2000           |
| SORKFWO12A8C...| Quiet Houses   | Fleet Foxes    | Fleet Foxes    | 2008           |
| SOJAMXH12A8C...| Meadowlarks    | Fleet Foxes    | Fleet Foxes    | 2008           |
+----------------+----------------+----------------+----------------+----------------+
[11 rows x 10 columns]

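The last table printed has our recommended songs, and the top-ranked one (rank 1) is “Great Indoors”! The joined table has 11 rows rather than 10 because incremental_song_id 381319 appears twice in the song metadata, so that recommendation shows up twice. We can now use our MLRecommender for other user IDs.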

Wrap Up

First, we took a look at the MLRecommender constructor. Then, we gathered song data from the Million Song Dataset. We modified the dataset to increase legibility and added incremental ids for the song metadata. We loaded the song metadata and listening history into a Swift Playground, created an MLRecommender from the listening history and generated recommended songs. Then, we used the song metadata to join the recommended songs to their titles and artists.

Source Files

Each of the files mentioned in this tutorial can be found here, including:

  • songs.csv: Metadata for one million songs
  • history.txt: Song listening history for multiple users
  • data-parser.py: Python code to manipulate the Million Song Dataset
  • merged_listed_data.csv: Merged dataset of song metadata and listening history
  • merged_listed_data_incremental_song_id.csv: merged_listed_data.csv with incremental ids added
  • songs_incremental_id.csv: songs.csv with incremental ids added
  • MusicRecommender.playground: Swift Playground for creating the MLRecommender

This blog post was inspired by Eric Le’s How to build a simple song recommender system.

How to Change the Status Bar Style in iOS 12

For some iOS apps, it may be helpful to change the color of the status bar at the top of the screen. For example, if I have a dark background, the default status bar style is hard to read:

Dark iPhone app

To change the appearance of the status bar within a view controller, first add “View controller-based status bar appearance” as an item to your Info.plist with a value of YES:

Info.plist

Then, in any view controller, override the preferredStatusBarStyle property:

override var preferredStatusBarStyle: UIStatusBarStyle {
    return .lightContent
}

And if you ever need to update the status bar color, call setNeedsStatusBarAppearanceUpdate(). Now the full view controller looks like this:

import UIKit

class ViewController: UIViewController {
    override func viewDidLoad() {
        super.viewDidLoad()
        // Do any additional setup after loading the view,
        // typically from a nib.
        setNeedsStatusBarAppearanceUpdate()
    }

    override var preferredStatusBarStyle: UIStatusBarStyle {
        return .lightContent
    }
}

Running this view controller, we get a light status bar!

Dark iPhone app with light status bar

Compiling Cub for Shortcuts

Earlier this week I heard about Shortcuts JS, which uses JavaScript to generate Shortcuts. Here’s an example from their homepage:

// We'll use this later to reference the output of a calculation
let calcVar = actionOutput();

// Define a list of actions
const actions = [
  comment({
    text: 'Hello, world!',
  }),
  number({
    number: 42,
  }),
  calculate({
    operand: 3,
    operation: '/',
  }, calcVar),
  showResult({
    // Use the Magic Variable
    text: withVariables`Total is ${calcVar}!`,
  }),
];

While this is a good first step toward writing “real code” to make Shortcuts, specifying operands and other parameters this way is clunky. I wondered how easy it would be to build the Shortcut from a syntax tree instead, and Will Richardson has done exactly that for Cub in a blog post:

All I have to do was traverse the syntax tree and generate a Shortcuts file. In terms of code this is fairly straightforward - just add an extension to each AST node that generates the corresponding code.

I’m not familiar enough with iOS app development or Swift to do it myself, but it would be really interesting to write an app that can use something like swift-ast to generate Shortcuts. Who knows what power iOS users could get if they could program advanced Shortcuts using Swift?

State of the Apps 2018

Inspired by Cortex’s annual State of the Apps discussion, I thought it would be fun to start documenting the apps I use most on my phone each year. Below are my most-used third-party apps of the year.

Productivity

1Password
I moved from LastPass to 1Password earlier this year and I couldn’t be happier. With the Password AutoFill API, 1Password integrates with the iOS keyboard to fill in logins with only one tap, and the app even copies one-time authentication codes to the clipboard. 1Password’s integration with iOS 12 has been stellar, and I can’t recommend it enough.

Todoist
A cross-platform task management system, Todoist is how I keep track of all of my projects, from TA grading deadlines to senior design final deliverables. While Todoist’s UI doesn’t have a native app feel, it’s clean and consistent.

Slack
Aptly categorized by Federico Viticci as a barely passable iOS client designed to access a web app, Slack was the app I used almost every day to communicate with several groups at my university. Slack supports some newer iOS features such as grouped notifications, but the UI mostly resembles its desktop Electron client and often hides away features in non-obvious places, such as touching and holding a message to add a reaction.

Social & Entertainment

Instagram
While I’m not a fan of the company behind Instagram for many, many, many reasons, Instagram is where I keep up with my friends via Stories and posts, and I’ve found it to be much more positive overall than other social networks this year.

Apollo
Reddit is one of the primary social networks I use every day, and Apollo is hands down my favorite way to experience Reddit on any platform. The design is intuitive and easy to personalize, and it has a fantastic night mode. I can’t stop recommending the app to friends who use the first-party client.

lire
A few months ago I was using IFTTT applets to monitor RSS feeds and push them to a specific project in Todoist, but it became untenable after I added too many feeds. I moved to Inoreader as my RSS service and lire as my RSS reader. The app has a clean, native-looking design, and it uses its own extractor to display a story’s full text. I wish the per-feed options were more straightforward and easier to access, but overall I’m very happy using it as my primary RSS reader.

Overcast
Overcast has been my go-to podcast player since I started using it over 3 years ago. Smart Speed and Voice Boost are still industry-leading features that I can’t live without, and new features keep arriving, such as full-text search in version 5.0.

GroupMe
Another app in the “barely passable” category, GroupMe is what I use every day at my university for group chats ranging from friend groups to club event announcements. The app is missing many basic features I expect from a communication app, including read-message syncing across devices and platforms, and I cannot wait until I can delete it from my phone.

Health & Finance

AutoSleep
While the Apple Health app tracks basic sleep information, AutoSleep provides in-depth detail on sleep trends and day-to-day stats. I check it often to look at my readiness for the day, cumulative sleep debt, and overall sleep time consistency. It was especially interesting to compare one particularly gruesome senior design week of little sleep with the rest of the semester’s averages.

YNAB
While I had toyed with budgeting on Mint in the past, YNAB (You Need a Budget) is a great way to manage your savings and expenses. The service requires a subscription, but its features, such as Bank Syncing and Goal Tracking, and its straightforward usage make it an excellent deal. YNAB has given me a clear way to know exactly what I’m spending every month.

Venmo
A digital wallet app owned by PayPal, Venmo allows me to send and receive payments from other people. Similar apps like Apple Pay and Cash App are available, but Venmo is what nearly everyone in my social circle has settled on. Further, with the Venmo card option released in the summer, Venmo has made it easy to handle group events and split payments.

Utilities

Deliveries
This app has allowed me to neurotically check my packages from every online retailer. All I need to do is copy the tracking number from USPS, FedEx, or UPS and paste it into Deliveries, and then I get push notifications whenever the package status changes. Deliveries is so good it encourages me to buy more things from Amazon, if that’s even possible.

CARROT Weather
I switched from Dark Sky to CARROT Weather in February, mostly because of its sadistic messages that manage to be both funny and relevant. At the same time, the app has seen numerous updates this year to support multiple weather locations, highly customizable Apple Watch complications, and more.

1Blocker X
For any website that doesn’t respect its users and decides to use popups, newsletter signup prompts, and auto-playing video ads, 1Blocker X is great at preventing them in the background. The app does what a utility should do: work without me even remembering I have a content blocker turned on.

Google Maps
Whenever I need to travel more than 10 minutes, I turn on Google Maps for live traffic updates and possible shorter route recommendations. In addition to retailer busy times and reviews, Google Maps is a great way to gather details about locations around me.

Due
For smaller tasks that I want to be reminded of over and over, I put them in Due, which is great at spamming me with notifications until I mark the task complete. Plus, a recent update added custom snooze times that you can set right from a notification.

Google Photos
I use Google Photos both as a secondary backup to iCloud Photos and for more powerful photo analytics than what Apple provides. For now, I’ll gladly trade Google using my aggregated photo data for an easy way to search photos by person, place, or thing across platforms. I also share albums and pictures through Google Photos with groups that have a mix of Android and iOS devices.

Tailor
A nifty utility app that stitches multiple screenshots into one vertical image, Tailor provides an easy way to send more readable conversations to other people, whether they’re from Messages, Slack, or other apps.

Home Screen

Finally, here’s a picture of my home screen at the end of 2018:

My home screen

The Case of the 500-Mile Email

Trey Harris, writing to sage-members:

I was working in a job running the campus email system some years ago when I got a call from the chairman of the statistics department.

“We’re having a problem sending email out of the department.”

“What’s the problem?” I asked.

“We can’t send mail more than 500 miles,” the chairman explained.

I choked on my latte. “Come again?”

“We can’t send mail farther than 500 miles from here,” he repeated. “A little bit more, actually. Call it 520 miles. But no farther.”

While it’s an older story, it’s a fantastic one. And it’s a great reminder to check your software versions.

Web API and Templates with Python requests and Jinja2

Introduction

Web APIs are becoming an increasingly popular method to retrieve and store data, and they can be an extremely powerful tool for any programmer. In this walkthrough, you will use GroupMe’s public API to retrieve the last few messages in a group and display them using a custom HTML template in Python 3.7.1:

Output Website

While we’ll be using specific GroupMe API endpoints, this walkthrough will help you to learn the basics of working with web APIs in general.

Set up

Before we begin, you need to have the following modules installed:

  • requests (for connecting to the API)
  • jinja2 (for adding data to our template)

Using a Web API

Working with requests

The requests library is great for creating HTTP requests, and it has fantastic documentation. We’ll be using requests.get(url) to get the information we need:

>>> import requests
>>> # Use JSONPlaceHolder as an example
>>> req = requests.get('https://jsonplaceholder.typicode.com/todos/1')
>>> print(req.text)
{
  "userId": 1,
  "id": 1,
  "title": "delectus aut autem",
  "completed": false
}

Create an Application

To use GroupMe’s API, we need to first register an application to get our API key. Log into the developer application page and click “Create Application”. You’ll be taken to this page:

Create Application Page

We won’t be using the Callback URL, so set that to any valid URL. Fill in the developer information and submit the form, and you’ll be taken to the application’s detail page:

Application Page

Copy the Access Token at the bottom of the application page. That’ll be our API key.

Find the Group ID

To get messages in a group, we’ll first need to get the group’s ID. GroupMe’s documentation says to use the base url https://api.groupme.com/v3 and has the following example using curl:

$ curl -X POST -H "Content-Type: application/json" -d '{"name": "Family"}' https://api.groupme.com/v3/groups?token=YOUR_ACCESS_TOKEN

From this, we know that the URL we use with requests.get() will be in the form¹:

https://api.groupme.com/v3/...?token=YOUR_ACCESS_TOKEN
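If a request ever needs several query parameters (as in note 1), building the query string by hand gets error-prone. Here’s a minimal sketch using the standard library’s urllib.parse.urlencode; build_url is a hypothetical helper name, and it simply appends the token last to match the form above. (requests can also do this for you via requests.get(url, params={...}).)

```python
from urllib.parse import urlencode

BASE_URL = 'https://api.groupme.com/v3'

def build_url(endpoint, token, **params):
    """Build BASE_URL + endpoint + '?' + query string (hypothetical helper).

    Extra keyword arguments become query parameters, and the access
    token is added last, e.g. ...?limit=10&token=YOUR_ACCESS_TOKEN.
    """
    query = dict(params)
    query['token'] = token
    # urlencode percent-escapes each value and joins the pairs with '&'
    return f'{BASE_URL}{endpoint}?{urlencode(query)}'

print(build_url('/groups', 'YOUR_ACCESS_TOKEN', limit=10))
# https://api.groupme.com/v3/groups?limit=10&token=YOUR_ACCESS_TOKEN
```

Letting urlencode handle the escaping means a value containing spaces or & won’t silently break the URL.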

Looking at the groups API, we can use GET /groups to retrieve a list of our most recent groups:

>>> import requests
>>> API_KEY = 'YOUR_API_KEY'
>>> base_url = 'https://api.groupme.com/v3'
>>> req = requests.get(f'{base_url}/groups?token={API_KEY}')
>>> req.content
b'{"response":[{"id":"1111","group_id":"2222","name":"My Group Name","phone_number":"+1 5555555555","type":"private","description":"Group description goes here",
...

First, we construct the URL for the API request and pass it as the argument to requests.get(). Then, we display the raw body of the response, stored in req.content.

The response we get from GroupMe is a JSON-formatted string, so we’ll move our script into its own file and parse the string using Python’s standard json library:

import json
import requests

API_KEY = 'YOUR_API_KEY'
BASE_URL = 'https://api.groupme.com/v3'

def api_request(request):
    '''Returns the data from a request (eg /groups)'''
    url = f'{BASE_URL}{request}?token={API_KEY}'
    req = requests.get(url)

    if not req.content:
        return None

    # We only want the data associated with the "response" key
    return json.loads(req.content)['response']

if __name__ == '__main__':
    groups = api_request('/groups')
    print(len(groups), 'group(s)')
    for group in groups:
        print(f'ID {group["group_id"]}: {group["name"]}')

The function api_request does the work of creating the final URL string for us. Then, it makes the request and checks that something was returned by GroupMe’s servers. If something was sent back to us, the content is converted² from a string into a Python object using json.loads(). Finally, we return the data associated with the key response, because the rest is metadata unimportant to us.
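Because api_request() mixes the HTTP call with the parsing, it can help to pull the parsing step into its own function so it can be tried without an API key. The sketch below is a minimal, offline version of those same checks; parse_response is a hypothetical name, and the sample body is made up (its meta key is an assumed stand-in for the metadata we throw away).

```python
import json

def parse_response(content):
    """Return the "response" payload of a GroupMe-style body, or None if empty.

    Mirrors the checks in api_request() without touching the network.
    """
    if not content:
        return None
    # json.loads accepts str, bytes, or bytearray input
    return json.loads(content)['response']

# Made-up body shaped like the /groups output above
sample = b'{"meta": {"code": 200}, "response": [{"group_id": "2222", "name": "My Group Name"}]}'
groups = parse_response(sample)
print(len(groups), 'group(s)')  # 1 group(s)
print(groups[0]['name'])        # My Group Name
```

In the real script, api_request() could then call parse_response(req.content); requests’ own req.json() is another way to do the decoding step.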

When we run the script, our most recent groups are returned (as a JSON object decoded into a Python object). The result will tell us the group names and their group IDs:

3 group(s)
ID 11111111: Python Tips and Tricks
ID 22222222: University Friend Group
ID 33333333: GitHub Chat

Get Messages for a Group

We have a list of our group IDs, so we can use the following API to get a list of recent messages for one group:

GET /groups/<group_id>/messages

Let’s add this endpoint to our script as get_messages_for_group(group_id):

import json
import requests

API_KEY = 'YOUR_API_KEY'
BASE_URL = 'https://api.groupme.com/v3'

def api_request(request):
    '''Returns the data from a request (eg /groups)'''
    url = f'{BASE_URL}{request}?token={API_KEY}'
    req = requests.get(url)

    if not req.content:
        return None

    # We only want the data associated with the "response" key
    return json.loads(req.content)['response']

def get_messages_for_group(group_id):
    response = api_request(f'/groups/{group_id}/messages')

    # Just return the messages (and none of the metadata)
    return response['messages']

if __name__ == '__main__':
    messages = get_messages_for_group(YOUR_GROUP_ID)
    print(messages[0])

Our script will get the messages for a group (fill in YOUR_GROUP_ID) and print the most recent one. Running it will print something like:

{'attachments': [], 'avatar_url': None, 'created_at': 1544810700, 'favorited_by': [], 'group_id': '11112233', 'id': '882882828288288282', 'name': 'Johnny Test', 'sender_id': '22558899', 'sender_type': 'user', 'source_guid': 'android-11111111-3eee-4444-9999-aaaabbbbcccc', 'system': False, 'text': "Hello everyone!", 'user_id': '55551111'}

We can see from the message’s data that the sender’s name was “Johnny Test” and the text was “Hello everyone!” Next, we should organize our API results as Python objects so they’re easier to build on.

Creating Classes for API Objects

Now that we’re ready to start processing the data from the API, it’s a good time to create objects to represent our API objects. With Python classes, we can keep only the data we need and begin to process our own information. We’ll initialize our API objects by passing them the decoded Python object from api_request(request). This way, we can more easily add class properties without needing to change our request function.

Let’s make two classes, Group and Message:

class Message:
    def __init__(self, json):
        self.user_id = json['user_id']
        self.name = json['name']
        self.text = json['text']

class Group:
    def __init__(self, json):
        self.id = json['group_id']
        self.name = json['name']
        self.messages = []

Then we can add a method to Group to fetch its recent messages:

def get_recent_messages(self):
    messages = get_messages_for_group(self.id)

    # Convert each message to our object
    for message in messages:
        new_message_object = Message(message)
        self.messages.append(new_message_object)

And then we can use our script to print out the messages for a group:

import json
import requests

API_KEY = 'YOUR_API_KEY'
BASE_URL = 'https://api.groupme.com/v3'

def api_request(request):
    '''Returns the data from a request (eg /groups)'''
    url = f'{BASE_URL}{request}?token={API_KEY}'
    req = requests.get(url)

    if not req.content:
        return None

    # We only want the data associated with the "response" key
    return json.loads(req.content)['response']

def get_messages_for_group(group_id):
    response = api_request(f'/groups/{group_id}/messages')

    # Just return the messages and none of the metadata
    return response['messages']

class Message:
    def __init__(self, json):
        self.user_id = json['user_id']
        self.name = json['name']
        self.text = json['text']

class Group:
    def __init__(self, json):
        self.id = json['group_id']
        self.name = json['name']
        self.messages = []
        self.get_recent_messages()

    def get_recent_messages(self):
        messages = get_messages_for_group(self.id)

        # Convert each message to our object
        for message in messages:
            new_message_object = Message(message)
            self.messages.append(new_message_object)

if __name__ == '__main__':
    groups_json = api_request('/groups')
    my_group = Group(groups_json[0])

    for message in my_group.messages:
        print(message.text)
        print(f'-- {message.name}')
        print()

The result is the most recent messages for our most recent group:

Hello everyone!
-- Johnny Test

Hi guys I had a question about using @classmethod
-- Alexa Jones

Wow great work!
-- Katie Alendra

We have the data in a manageable format, so it’s time to start formatting it in a readable form.
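The Message class keeps only three fields, but the raw message also has a created_at value. As a hedged variation (not the approach used in the rest of this walkthrough), a dataclass with a from_json constructor can carry that timestamp too. This sketch assumes created_at is a Unix timestamp in seconds, which fits the sample value above; naming the parameter data instead of json also avoids shadowing the json module imported at the top of the script.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class Message:
    user_id: str
    name: str
    text: str
    created_at: datetime

    @classmethod
    def from_json(cls, data):
        """Build a Message from one decoded message dict."""
        return cls(
            user_id=data['user_id'],
            name=data['name'],
            text=data['text'],
            # Assumption: created_at is seconds since the Unix epoch (UTC)
            created_at=datetime.fromtimestamp(data['created_at'], tz=timezone.utc),
        )

# Trimmed-down copy of the sample message printed earlier
sample = {'created_at': 1544810700, 'name': 'Johnny Test',
          'text': 'Hello everyone!', 'user_id': '55551111'}
message = Message.from_json(sample)
print(message.created_at.isoformat())  # 2018-12-14T18:05:00+00:00
```

The dataclass also gives us a readable repr and equality checks for free, which is handy when debugging API results.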

Using Jinja Templates

We’ve come a long way so far! First, we learned how to make HTTP GET requests to a server. Then, we used GroupMe’s API docs to fetch data about different groups and messages, and then we created Python classes to better organize our information. Let’s create a Jinja template to print out our data.

Create the Template

First, I’ll make a group.html file with the framework of what I want the web page to look like:

<body>
    <h1>GROUP NAME</h1>
    <br />

    <!-- Repeat for every message -->
    <p><b>MESSAGE CONTENT</b> &mdash; NAME</p>
</body>

With Jinja, variables are inserted into the template using {{ variable_name }}, and logic statements have a form such as:


{% if should_display %}
    <p>This message should be displayed</p>
{% endif %}

If we assume that we’ll pass a Group() instance into our Jinja template with the variable name group, we can rewrite group.html as:


<body>
    <h1>{{ group.name }}</h1>
    <br />

    <!-- Repeat for every message -->
    {% for message in group.messages %}
    <p><b>{{ message.text }}</b> &mdash;{{ message.name }}</p>
    {% endfor %}
</body>

Note the {% endif %} and {% endfor %} in the above snippets; they’re required for all conditionals and loops.

Populate the Template

With our template written, let’s go back to our script and add a section that loads and fills in our template using jinja2:

with open('group.html', 'r') as f:
    contents = f.read()

template = jinja2.Template(contents)
filled_template = template.render(group=my_group)

with open('output.html', 'w') as f:
    f.write(filled_template)

First, we read the contents of our template file. Because we’re only going to use one file, we can just load the text of our template into jinja2.Template, and then we can render the template by passing our my_group variable (from our main script) as group. Finally, we write the contents to output.html to view it in a browser.

Now we have our full script:

import json
import jinja2
import requests

API_KEY = 'YOUR_API_KEY'
BASE_URL = 'https://api.groupme.com/v3'

def api_request(request):
    '''Returns the data from a request (eg /groups)'''
    url = f'{BASE_URL}{request}?token={API_KEY}'
    req = requests.get(url)

    if not req.content:
        return None

    # We only want the data associated with the "response" key
    return json.loads(req.content)['response']

def get_messages_for_group(group_id):
    response = api_request(f'/groups/{group_id}/messages')

    # Just return the messages and none of the metadata
    return response['messages']

class Message:
    def __init__(self, json):
        self.user_id = json['user_id']
        self.name = json['name']
        self.text = json['text']

class Group:
    def __init__(self, json):
        self.id = json['group_id']
        self.name = json['name']
        self.messages = []
        self.get_recent_messages()

    def get_recent_messages(self):
        messages = get_messages_for_group(self.id)

        # Convert each message to our object
        for message in messages:
            new_message_object = Message(message)
            self.messages.append(new_message_object)

if __name__ == '__main__':
    groups_json = api_request('/groups')
    my_group = Group(groups_json[0])

    with open('group.html', 'r') as f:
        contents = f.read()

    template = jinja2.Template(contents)
    filled_template = template.render(group=my_group)

    with open('output.html', 'w') as f:
        f.write(filled_template)

Once run, we can view our output.html in a browser:

<body>
    <h1>Python Tips and Tricks</h1>
    <br />

    <!-- Repeat for every message -->
    <p><b>Hello everyone!</b> &mdash;Johnny Test</p>
    <p><b>Hi guys I had a question about using @classmethod</b> &mdash;Alexa Jones</p>
    <p><b>Wow great work!</b> &mdash;Katie Alendra</p>
</body>

Output Website

Conclusion

We’ve walked through how to access and parse a web API using the requests library, how to represent and organize the API data using Python classes, and how to render the information in a custom format using a Jinja template. Now go create your own cool stuff using APIs!


  1. We won’t need to in this walkthrough, but if we needed to pass multiple parameters in the URL, it would look like v3/...?limit=10&another_param=1000&token=YOUR_ACCESS_TOKEN

  2. Typically, json.loads() returns a dict that can contain more dicts, lists, and values like None, strings, and integers. Check out the Python docs for examples. 

Expand Tilde Paths in Bash and Python

Sometimes it’s necessary to reference files in a script using ~. For example, if you want to schedule a cron job to run a script in a folder and place the results in the same folder, it’s helpful to reference the files by absolute paths in the script.

Bash

Here’s my first attempt to append to a file:

$ ./my_folder/run.sh >> "~/my_folder/output.txt"
-bash: ~/my_folder/output.txt: No such file or directory

The issue with the above line is that the ~ is not expanded to the home directory (such as /home/username/) because it is inside the quotes, and Bash doesn’t perform tilde expansion on quoted text. To fix this, move the ~ outside of the quotes. Quoting the rest of the path is optional (it’s only needed for characters like spaces), and the . in the extension doesn’t need escaping:

$ ./my_folder/run.sh >> ~/my_folder/'output.txt'

Python

I encountered a similar issue in Python:

>>> with open('~/my_folder/output.txt', 'r') as f:
...   contents = f.read()
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
FileNotFoundError: [Errno 2] No such file or directory: '~/my_folder/output.txt'

This can be fixed using os.path.expanduser(path):

>>> import os
>>> filename = os.path.expanduser('~/my_folder/output.txt')
>>> with open(filename, 'r') as f:
...   contents = f.read()
...
>>> print(contents)
10
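For comparison, pathlib offers the same expansion as a method on Path objects. Here’s a minimal sketch showing that the two approaches agree; it only builds the path and doesn’t assume my_folder/output.txt actually exists.

```python
import os
from pathlib import Path

raw = '~/my_folder/output.txt'

# os.path.expanduser replaces a leading ~ with the current user's home directory
expanded = os.path.expanduser(raw)

# Path.expanduser performs the same expansion and returns a Path
expanded_path = Path(raw).expanduser()

print(expanded)
print(expanded_path == Path(expanded))  # True
```

Note that neither function expands environment variables such as $HOME inside a string; os.path.expandvars handles that case.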

iPad Screenshots

Dr. Drang:

On the iPad, ⇧⌘3 captures the whole screen, just like the Mac (and just like capturing with the top and volume up buttons). The ⇧⌘4 shortcut also captures the whole screen, but in a neat analogy to the Mac, it immediately puts you into editing mode so you can crop the capture down to a smaller size.

I don’t find these keyboard shortcuts surprising, but I am surprised that I never thought to try them on an iPad. With the new screenshot tool in macOS Mojave, I wonder what other features will reach parity between macOS and iOS in the future.

Removing Local Git Branches That Aren't 'master'

Every so often, I’ll want to delete all of my local branches for a repository that aren’t the master branch. An easy command to do this is:

$ git branch | grep -v "master" | xargs git branch -d

(If you want to keep multiple branches, such as master and develop, you can chain them together using grep -v "master\|develop")

git branch lists all of the local branches for the repo, grep -v prints every line from the previous command that doesn’t match “master”, and xargs passes the remaining branch names as arguments to git branch -d.

I recommend using -d rather than -D: -d refuses to delete a branch that hasn’t been fully merged, while -D deletes it regardless.
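The filtering step of that pipeline can also be sketched in Python as a pure function over the text that git branch prints. This is an illustration, not a replacement for the one-liner, and branches_to_delete is a hypothetical name. It assumes each line starts with a two-character marker (“* ” for the checked-out branch, two spaces otherwise). Unlike grep -v "master", it compares whole names, so a branch like master-fix is not accidentally kept.

```python
def branches_to_delete(git_branch_output, keep=('master',)):
    """Return branch names from `git branch` output, minus kept/current ones."""
    names = []
    for line in git_branch_output.splitlines():
        # Split off the two-character marker that precedes each branch name
        marker, name = line[:2], line[2:].strip()
        if not name or name.startswith('('):
            continue  # skip blank lines and "(HEAD detached at ...)"
        if marker.strip() == '*':
            continue  # git won't delete the checked-out branch anyway
        if name in keep:
            continue  # exact match, not a substring match like grep -v
        names.append(name)
    return names

sample_output = """  feature/login
* master
  old-experiment
"""
print(branches_to_delete(sample_output))
# ['feature/login', 'old-experiment']
```

To act on the result, the names could be passed to subprocess.run(['git', 'branch', '-d', *names]); just like the one-liner, -d still refuses to delete unmerged branches.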

How To Change a Git Repo's Authentication Protocol

HTTPS to SSH Key

Often I need to change a git repository to authenticate with the remote server using an SSH key instead of my username and password. To do so, type the following in the repository’s folder on your machine:

$ git config remote.origin.url git@github.com:username/repository_name.git

(Make sure to include the .git at the end of the repository name.)

SSH Key to HTTPS

To do the reverse, type:

$ git config remote.origin.url https://github.com/username/repository
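If you switch repos back and forth often, the two URL forms above can be converted mechanically. Here’s a minimal sketch with hypothetical helper names; it only handles GitHub URLs in exactly the shapes shown above (https://github.com/username/repository and git@github.com:username/repository_name.git).

```python
import re

def https_to_ssh(url):
    """https://github.com/user/repo[.git] -> git@github.com:user/repo.git"""
    match = re.fullmatch(r'https://github\.com/([^/]+)/([^/]+?)(?:\.git)?/?', url)
    if match is None:
        raise ValueError(f'not a GitHub HTTPS URL: {url}')
    user, repo = match.groups()
    return f'git@github.com:{user}/{repo}.git'

def ssh_to_https(url):
    """git@github.com:user/repo[.git] -> https://github.com/user/repo"""
    match = re.fullmatch(r'git@github\.com:([^/]+)/([^/]+?)(?:\.git)?', url)
    if match is None:
        raise ValueError(f'not a GitHub SSH URL: {url}')
    user, repo = match.groups()
    return f'https://github.com/{user}/{repo}'

print(https_to_ssh('https://github.com/username/repository'))
# git@github.com:username/repository.git
```

The converted URL can then be applied with the git config remote.origin.url command above (git remote set-url origin <url> does the same job).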

Auto Login for PuTTY (Windows)

Often I find myself wanting to have an easy way to SSH into a server on a Windows PC. Unfortunately, SSH keys on Windows can often be a challenge, but there’s an easy way to have PuTTY connect without needing to type in a password every time.

To create a shortcut that automatically logs in with PuTTY, you only need two things: the name of the PuTTY profile that has the connection and appearance settings, and the password to your account on the server. Right-click the desktop, choose New > Shortcut, and enter the following as the location:

"C:\Program Files (x86)\PuTTY\putty.exe" -load "<profileName>" -pw "<password>"

If you installed PuTTY somewhere other than Program Files (x86), change the path to putty.exe in the command above to match.
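If you set up this shortcut for several profiles or machines, the target string can be generated rather than typed. This Python sketch assumes the default install path above and that PuTTY splits its command line using the standard Windows quoting rules that subprocess.list2cmdline follows; putty_shortcut_target is a hypothetical helper name.

```python
import subprocess

# Default install path from the command above (assumption: change it if PuTTY lives elsewhere).
PUTTY_PATH = r"C:\Program Files (x86)\PuTTY\putty.exe"


def putty_shortcut_target(profile, password, putty_path=PUTTY_PATH):
    """Build the shortcut target, quoting only the arguments that need it."""
    return subprocess.list2cmdline([putty_path, "-load", profile, "-pw", password])
```

For a profile named My Server, putty_shortcut_target("My Server", "hunter2") returns "C:\Program Files (x86)\PuTTY\putty.exe" -load "My Server" -pw hunter2; list2cmdline only wraps arguments in quotes when they need it, for example when they contain spaces.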

Once you’ve created the shortcut, you can pin it to the taskbar or the start menu for easy access!

These instructions were inspired by the ones on the Purdue ECE 264 course page.

Pointing a GitHub Pages Repo to a Hover Domain

My blog is currently hosted on GitHub Pages (a great way to host a static site or blog for free) and linked to a custom domain I purchased through Hover. While both services are amazing, connecting the two required many open tabs and several waiting periods. This post explains the steps needed to point a GitHub Pages repo to a custom domain on Hover.

Preflight Check

Before connecting GitHub Pages to a custom domain, I first updated my blog in my repository nickymarino.github.io and checked that it displayed properly at its default address (normally https://nickymarino.github.io).

First, update your repository with your custom domain: in the repo’s settings, enter the domain in the “Custom domain” field of the GitHub Pages section.

GitHub Pages settings for the repo

A Records on Hover

The next step is to configure Hover. Find GitHub’s current list of IP addresses for A records on GitHub’s help pages; you’ll create one DNS record on Hover for each. At the time of writing, these are:

185.199.108.153
185.199.109.153
185.199.110.153
185.199.111.153

Then, go to your Hover account, select your domain, and open the DNS tab. Delete any existing DNS records that have an “A” under “Records”.

For each IP address on GitHub’s help pages, add a DNS record with the “Type” set to A and the “Hostname” set to @; the “TTL” can be left at its default value.

Hover DNS settings

It may take several hours (or up to about a day) for the changes to take effect. Take a break, get some sleep, and then come back to your domain to make sure everything’s working. Once it is, we can enforce HTTPS!
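To check whether the new records have propagated, a small Python sketch can compare what your resolver returns with GitHub’s addresses. It assumes the four IPs listed above are still current and that your machine’s resolver isn’t serving a cached answer; resolved_ipv4_addresses and dns_status are hypothetical helper names built on the standard socket module.

```python
import socket

# GitHub Pages A-record addresses listed above (assumption: still current).
GITHUB_PAGES_IPS = {
    "185.199.108.153",
    "185.199.109.153",
    "185.199.110.153",
    "185.199.111.153",
}


def resolved_ipv4_addresses(domain):
    """Return the set of IPv4 addresses the local resolver gives for `domain`."""
    _hostname, _aliases, addresses = socket.gethostbyname_ex(domain)
    return set(addresses)


def dns_status(resolved, expected=GITHUB_PAGES_IPS):
    """Report expected addresses that are missing and extra ones that shouldn't be there."""
    return {
        "missing": sorted(expected - resolved),
        "unexpected": sorted(resolved - expected),
    }
```

Running print(dns_status(resolved_ipv4_addresses("example.com"))) with your own domain in place of example.com should eventually show empty missing and unexpected lists. Leftover entries under unexpected can point to an old A record that still needs deleting, or to a cached answer.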

Create HTTPS certificate

If you head back to your repo’s settings page to enforce HTTPS, you might see the following “not yet available” error:

GitHub Pages HTTPS error

Per GitHub’s troubleshooting page, you need to remove and then re-add your custom domain for your repository. Wait around 24 hours for the certificate to be generated, and you should be good to go!
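Once the certificate has been generated, you can confirm it is being served with a short Python sketch using the standard ssl module. fetch_certificate and days_until_expiry are hypothetical helper names; the sketch assumes your site is reachable on port 443 and relies on the default SSL context to verify both the certificate chain and the hostname.

```python
import socket
import ssl
import time


def fetch_certificate(domain, port=443, timeout=10.0):
    """Open a TLS connection and return the server's verified certificate.

    The default context checks the chain and the hostname, so this raises
    ssl.SSLCertVerificationError if the served certificate doesn't cover `domain`.
    """
    context = ssl.create_default_context()
    with socket.create_connection((domain, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=domain) as tls:
            return tls.getpeercert()


def days_until_expiry(cert, now=None):
    """Return the number of days left before the certificate's notAfter date."""
    expires = ssl.cert_time_to_seconds(cert["notAfter"])
    now = time.time() if now is None else now
    return (expires - now) / 86400
```

Until the new certificate is in place, fetch_certificate("example.com") (with your own domain) is expected to raise ssl.SSLCertVerificationError; once it returns, days_until_expiry(cert) shows how long the certificate remains valid.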

A New Look

I’ve owned this domain (nickymarino.com) for roughly three years now, and so far I’ve only used it as a resume/portfolio site. I’ve finally found a theme that I both appreciate and can spend time modifying to fit my needs.

My goal is to start writing (and podcasting!) more often, whether it’s a technical detail I found interesting, a challenge I’ve overcome and want to record for the next time I encounter it, or anything else I can think of.

I have a few ideas up my sleeve.

You can stay updated via the site feed or on Twitter.