Playing Wordle in R

[This article was first published on R – Statistical Odds & Ends, and kindly contributed to R-bloggers.]

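The game Wordle has taken the world (or at least my Facebook feed) by storm. It’s a really simple word game that’s a lot like the classic Mastermind. The rules, as given on the Wordle website, are roughly as follows: you get six tries to guess a five-letter word, and after each guess every tile is colored green (right letter, right position), yellow (the letter is in the word but in a different position), or gray (the letter is not in the word).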

The logic behind the game is pretty simple, so I thought I’d code up an R version so that those of you who can’t get enough of it can play it on your own! The full code is available here.

In my version, I allow the user to set 3 parameters:

  • dictionary: A vector of possible words that the computer can choose from for the word to be guessed.
  • wordLength: The length of the word to be guessed. Wordle sets this parameter to 5.
  • nGuesses: Number of guesses the player is allowed. Wordle sets this parameter to 6.

While the user sees guesses and answers as strings (e.g. "early"), it’s much easier to work with vectors of characters in R (e.g. c("e", "a", "r", "l", "y")). Since these are going to be short words, converting between the two representations costs very little.
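As a small illustration of why character vectors are convenient (this snippet is not part of the original code, and the variable names are made up for the example), here is the round trip between the two representations, plus a position-by-position comparison that falls out for free:

```r
# Split a string into a vector of single characters.
guess <- "early"
guessVec <- strsplit(guess, "")[[1]]
guessVec
# [1] "e" "a" "r" "l" "y"

# Collapse the vector back into a single string.
paste(guessVec, collapse = "")
# [1] "early"

# With vectors, comparing two words position by position is one expression.
answerVec <- strsplit("later", "")[[1]]
guessVec == answerVec
# [1] FALSE  TRUE FALSE FALSE FALSE
```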

I only have one helper function, which takes a guess and the answer (as character vectors) and evaluates the color of the tiles (“G” for green, “Y” for yellow, “-” for neither). I use two passes over the guess vector: the first to evaluate which tiles should be green, and the second to evaluate which should be yellow. There might be a way to do it in one pass, but I wanted specific behavior when letters are repeated (either in the guess or in the answer): each letter in the answer can account for at most one colored tile, and green tiles take priority over yellow ones. That wasn’t easy to achieve in one pass. Again, there isn’t much of a performance hit since the words are short.

evaluateGuess <- function(guessVec, answerVec) {
  wordLength <- length(answerVec)

  resVec <- rep("-", wordLength)
  # first pass: mark exact matches (green)
  for (i in 1:wordLength) {
    if (guessVec[i] == answerVec[i]) {
      resVec[i] <- "G"
      answerVec[i] <- "-"  # mark unavailable for yellow
    }
  }
  
  # second pass: mark yellow
  for (i in 1:wordLength) {
    if (resVec[i] != "G") {
      idx <- match(guessVec[i], answerVec)
      if (!is.na(idx)) {
        resVec[i] <- "Y"
        answerVec[idx] <- "-"
      }
    }
  }
  
  resVec
}

# example
evaluateGuess(strsplit("early", "")[[1]], 
              strsplit("later", "")[[1]])
# [1] "Y" "G" "Y" "Y" "-"
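To see what the two passes do when letters repeat, here are a few more calls. The block repeats evaluateGuess from above so that it runs on its own; the helper evalWords and the example words are my own additions for illustration.

```r
evaluateGuess <- function(guessVec, answerVec) {
  wordLength <- length(answerVec)
  resVec <- rep("-", wordLength)
  # first pass: exact matches are green and use up that answer letter
  for (i in 1:wordLength) {
    if (guessVec[i] == answerVec[i]) {
      resVec[i] <- "G"
      answerVec[i] <- "-"
    }
  }
  # second pass: remaining guess letters claim an unused answer letter
  for (i in 1:wordLength) {
    if (resVec[i] != "G") {
      idx <- match(guessVec[i], answerVec)
      if (!is.na(idx)) {
        resVec[i] <- "Y"
        answerVec[idx] <- "-"
      }
    }
  }
  resVec
}

# Hypothetical convenience wrapper: takes strings, returns the tile string.
evalWords <- function(guess, answer) {
  paste(evaluateGuess(strsplit(guess, "")[[1]],
                      strsplit(answer, "")[[1]]), collapse = "")
}

evalWords("speed", "abide")  # two E's guessed, one in answer: only the first is yellow
# [1] "--Y-Y"
evalWords("erase", "speed")  # two E's guessed, two in answer: both are yellow
# [1] "Y--YY"
evalWords("geese", "there")  # greens use up both E's, so the third E gets nothing
# [1] "--G-G"
```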

Here is the main function for playing one round of Wordle:

playGame <- function(dictionary, wordLength = 5, nGuesses = 6) {
  # select an answer
  possibleAnswers <- dictionary[nchar(dictionary) == wordLength]
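  # guesses are upper-cased below, so upper-case the answer as well
  # (otherwise a lower-case dictionary can never be matched)
  answer <- toupper(sample(possibleAnswers, 1))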
  answerVec <- strsplit(answer, "")[[1]]
  
  print(paste("You have", nGuesses, "chances to guess a word of length", 
              wordLength))
  
  guessCnt <- 0
  lettersLeft <- LETTERS
  while (guessCnt < nGuesses) {
    # display "keyboard"
    print(paste(c("Letters left:", lettersLeft), collapse = " "))
    
    # read in guess
    guessCnt <- guessCnt + 1
    guess <- readline(paste0("Enter guess ", guessCnt, ": "))
    while (nchar(guess) != wordLength) {
      guess <- readline(paste0("Guess must have ", wordLength, " characters: "))
    }
    guess <- toupper(guess)
    guessVec <- strsplit(guess, "")[[1]]
    
    # evaluate guess
    resVec <- evaluateGuess(guessVec, answerVec)
    
    # update keyboard
    lettersLeft <- setdiff(lettersLeft, guessVec)
    
    # print result
    print(paste(guessVec, collapse = " "))
    print(paste(resVec, collapse = " "))
    if (all(resVec == "G")) {
      print("You won!")
      return(guessCnt)
    }
  }
  print(paste("Sorry, you lost! Answer was", answer))
  return(guessCnt)
}

Finally, we need to get a dictionary of words for our function playGame to choose words from. I found at least two possibilities: the Collins Scrabble Words list and the 10,000 most common English words according to Google’s Trillion Word Corpus. Here’s how you can read each of them into R after downloading the text file (you might need to change the file path):

# scrabble
dictionary <- read.csv("../Dictionaries/Collins Scrabble Words (2019).txt", 
                       header = FALSE, skip = 2)[, 1]

# google (alternative: running this overwrites the Scrabble list above)
dictionary <- read.csv("../Dictionaries/google-10000-english-usa-no-swears.txt",
                       header = FALSE)[, 1]
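Since these files are plain lists with one word per line, another option is to read them with readLines and tidy the result. The sketch below is only a suggestion: readWordList is a hypothetical helper, and its skip argument simply mirrors the skip = 2 used for the Scrabble file above. It is demonstrated on a temporary file rather than on the real word lists.

```r
# Hypothetical helper: read one word per line and normalize the list.
readWordList <- function(path, skip = 0) {
  words <- readLines(path, warn = FALSE)
  if (skip > 0) words <- words[-seq_len(skip)]   # drop header lines
  words <- toupper(trimws(words))                # consistent case, no stray spaces
  unique(words[grepl("^[A-Z]+$", words)])        # keep purely alphabetic entries
}

# Demo on a temporary file standing in for a downloaded word list.
tmp <- tempfile(fileext = ".txt")
writeLines(c("Header line", "", "early", "LATER", "later", "e-mail", "vesta "), tmp)
demoWords <- readWordList(tmp, skip = 2)
demoWords
# [1] "EARLY" "LATER" "VESTA"
```

Upper-casing at read time means the case of the answer no longer depends on which dictionary file was used.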

We’re now ready to play the game! Here’s an example of one call of playGame():

playGame(dictionary)
# [1] "You have 6 chances to guess a word of length 5"
# [1] "Letters left: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z"
# Enter guess 1: state
# [1] "S T A T E"
# [1] "Y - Y G Y"
# [1] "Letters left: B C D F G H I J K L M N O P Q R U V W X Y Z"
# Enter guess 2: Seat
# Guess must have 5 characters: seats
# [1] "S E A T S"
# [1] "Y G Y G -"
# [1] "Letters left: B C D F G H I J K L M N O P Q R U V W X Y Z"
# Enter guess 3: pesta
# [1] "P E S T A"
# [1] "- G G G G"
# [1] "Letters left: B C D F G H I J K L M N O Q R U V W X Y Z"
# Enter guess 4: vesta
# [1] "V E S T A"
# [1] "G G G G G"
# [1] "You won!"
# [1] 4

While this is playable, there are a number of ways this can be improved. If you are interested, perhaps you can implement these changes! (Some are much harder than others!)

  • For each guess, I only check that the input was of the correct length. There are a number of different checks that should also be implemented:
    • All the characters in the input string should be letters.
    • The guess should be a valid word.
    • In hard mode, any revealed hints must be used in all subsequent guesses.
  • One key UI component that makes Wordle fun is its keyboard: it tells you which letters you have used already and whether they resulted in yellow or green tiles (or neither). See the figure at the end of this post for an example. In my version, I only display letters that have never been used in past guesses.
    • If you’ve implemented hard mode, determining which letters on the keyboard should be green or yellow is easy.
    • If hard mode is not on, determining which letters should be green or yellow is not so straightforward because (i) past hints might not be used in the current guess, (ii) letters which were yellow can turn green, and (iii) the situation is trickier if letters are repeated.
  • How do I handle cases where the guess and/or the answer has repeated letters? How might you change the code if you want different behavior?
  • This version of Wordle is played in the console. Can you make a version that has a UI like the actual game (maybe a Shiny app)?
  • The playability of Wordle depends quite a lot on the dictionary you give it. Are there better dictionaries out there?
  • Train an AI algorithm to play (this version of) Wordle optimally.
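As a starting point for the input checks in the first bullet, here is a minimal sketch. Both function names (isValidGuess, satisfiesHints) are hypothetical, and the hard-mode rule encoded here is just one reading of “revealed hints must be used”: green letters must stay in place, and each yellow letter must reappear somewhere among the other positions.

```r
# Hypothetical helper: TRUE if `guess` is an acceptable input string.
isValidGuess <- function(guess, dictionary, wordLength = 5) {
  nchar(guess) == wordLength &&
    grepl("^[A-Za-z]+$", guess) &&             # letters only
    toupper(guess) %in% toupper(dictionary)    # a known word
}

# Hypothetical hard-mode check against ONE previous guess and its result.
satisfiesHints <- function(guessVec, prevGuessVec, prevResVec) {
  # green letters must be reused in the same position
  greenIdx <- prevResVec == "G"
  if (any(guessVec[greenIdx] != prevGuessVec[greenIdx])) return(FALSE)
  # each yellow letter must appear among the non-green positions
  remaining <- guessVec[!greenIdx]
  for (ch in prevGuessVec[prevResVec == "Y"]) {
    idx <- match(ch, remaining)
    if (is.na(idx)) return(FALSE)
    remaining <- remaining[-idx]               # each letter can satisfy one hint
  }
  TRUE
}

dict <- c("vesta", "state", "seats")
isValidGuess("state", dict)   # TRUE
isValidGuess("st4te", dict)   # FALSE: contains a digit
isValidGuess("pesta", dict)   # FALSE: not in this toy dictionary

prevGuess <- strsplit("STATE", "")[[1]]
prevRes   <- c("Y", "-", "Y", "G", "Y")
satisfiesHints(strsplit("SEATS", "")[[1]], prevGuess, prevRes)  # TRUE
satisfiesHints(strsplit("BOOTH", "")[[1]], prevGuess, prevRes)  # FALSE: S, A, E unused
```

In playGame, checks like these could go in the loop that currently re-prompts only when the length is wrong. Hard mode would need satisfiesHints to be applied to every earlier guess, not just the last one.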

Wordle keyboard from an unspecified round.
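For a keyboard like the one pictured, one possible approach (a sketch, not what my code above does) is to keep a status for every letter and only ever upgrade it: unused, then “-”, then “Y”, then “G”. That way a yellow key can turn green but never the reverse, and a repeated letter ends up with its best result. The names statusRank and updateKeyboard, and the “?” marker for unused letters, are assumptions made for this example.

```r
# Rank of each status; a key only ever moves up this ladder.
statusRank <- c("?" = 0, "-" = 1, "Y" = 2, "G" = 3)

# Hypothetical helper: fold one guess and its tile colors into the keyboard.
updateKeyboard <- function(keyboard, guessVec, resVec) {
  for (i in seq_along(guessVec)) {
    letter <- guessVec[i]
    if (statusRank[[resVec[i]]] > statusRank[[keyboard[[letter]]]]) {
      keyboard[[letter]] <- resVec[i]
    }
  }
  keyboard
}

# Start with every letter unused.
keyboard <- setNames(rep("?", 26), LETTERS)

# Guess STATE against answer VESTA gives "Y - Y G Y" (see the game above).
keyboard <- updateKeyboard(keyboard,
                           c("S", "T", "A", "T", "E"),
                           c("Y", "-", "Y", "G", "Y"))
keyboard[c("S", "T", "A", "E", "B")]
# S = "Y", T = "G", A = "Y", E = "Y", B = "?"
```

Note that T ends up green even though its first occurrence in the guess scored “-”: the later green tile outranks it.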
