• Word lists - Separating into different files according to word-length (and removing symbol words)

    Hello. I am posting this because I thought others might find it useful.

    I have written some VB code that takes a word list (a single text file with one word per line, OriginalList.txt), goes through the entire list, and separates the words into different files according to word length, i.e.:
    -2_letters.txt
    -3_letters.txt
    -4_letters.txt
    etc.

    Words that contain anything other than the letters A-Z (apostrophes, etc.) are left out of the above files and written to a separate file (invalid.txt). Words shorter than 2 letters or longer than 6 letters also go into this file.
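
    As a rough illustration of the rule just described, here is a minimal sketch of the decision made for each word. TargetFileName, minLen and maxLen are hypothetical names used only for this example, and it assumes each word is trimmed and lower-cased before it is checked.

    ```vb
    Imports System
    Imports System.Linq

    Module WordBuckets

        'Hypothetical helper: returns the name of the file a raw line belongs in.
        Function TargetFileName(rawLine As String,
                                Optional minLen As Integer = 2,
                                Optional maxLen As Integer = 6) As String

            Dim wrd As String = rawLine.Trim().ToLowerInvariant()

            'Only plain a-z counts as a letter here, so apostrophes, hyphens,
            'digits and accented characters all make the word invalid.
            Dim lettersOnly As Boolean = wrd.All(Function(ch) ch >= "a"c AndAlso ch <= "z"c)

            If wrd.Length < minLen OrElse wrd.Length > maxLen OrElse Not lettersOnly Then
                Return "invalid.txt"
            End If

            Return wrd.Length.ToString() & "_letters.txt"
        End Function

        Sub Main()
            Console.WriteLine(TargetFileName("Cat"))      'prints 3_letters.txt
            Console.WriteLine(TargetFileName("don't"))    'prints invalid.txt (apostrophe)
            Console.WriteLine(TargetFileName("a"))        'prints invalid.txt (too short)
            Console.WriteLine(TargetFileName("running"))  'prints invalid.txt (7 letters)
        End Sub

    End Module
    ```

    Keeping the limits as parameters means a different range of lengths only needs different arguments, e.g. TargetFileName(wrd, 2, 8).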

    The procedure takes a few minutes to complete, depending on the size of the list and how quick your PC is.
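
    One likely reason for the run time is that My.Computer.FileSystem.WriteAllText opens and closes the output file for every single word. The sketch below is an assumption about how that could be avoided, not a measured result: it applies the same rules but keeps one StreamWriter open per output file. SplitWordList, minLen and maxLen are hypothetical names, and the demo in Main writes to a throwaway temp folder rather than C:\newWordLists\.

    ```vb
    Imports System
    Imports System.Collections.Generic
    Imports System.IO
    Imports System.Linq

    Module FastSplit

        'Hypothetical routine: each output file is opened once (in append mode)
        'and stays open until the whole list has been read.
        Sub SplitWordList(pth As String,
                          Optional minLen As Integer = 2,
                          Optional maxLen As Integer = 6)

            Dim writers As New Dictionary(Of String, StreamWriter)

            Try
                For Each rawLine As String In File.ReadLines(Path.Combine(pth, "OriginalList.txt"))
                    Dim wrd As String = rawLine.Trim().ToLowerInvariant()

                    Dim ok As Boolean = wrd.Length >= minLen AndAlso wrd.Length <= maxLen AndAlso
                                        wrd.All(Function(ch) ch >= "a"c AndAlso ch <= "z"c)

                    Dim fileName As String = If(ok, wrd.Length.ToString() & "_letters.txt", "invalid.txt")

                    'Reuse the writer for this file if it is already open.
                    Dim writer As StreamWriter = Nothing
                    If Not writers.TryGetValue(fileName, writer) Then
                        writer = New StreamWriter(Path.Combine(pth, fileName), True) 'True = append
                        writers.Add(fileName, writer)
                    End If

                    writer.WriteLine(wrd)
                Next
            Finally
                'Close every file, even if something went wrong part-way through.
                For Each openWriter As StreamWriter In writers.Values
                    openWriter.Dispose()
                Next
            End Try
        End Sub

        Sub Main()
            'Demo on a temporary folder so no real word list is touched.
            Dim pth As String = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName())
            Directory.CreateDirectory(pth)
            File.WriteAllLines(Path.Combine(pth, "OriginalList.txt"),
                               New String() {"Cat", "don't", "apple", "a"})

            SplitWordList(pth)
            Console.WriteLine(File.ReadAllText(Path.Combine(pth, "3_letters.txt")))
        End Sub

    End Module
    ```

    The trade-off is a handful of file handles held open for the duration of the run, which should be fine for a small number of word lengths.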

    I originally wrote this code to prepare some word lists for use in my game "Covert Word" (winner of the API competition).

    I will be using it again once Oxford have released the new WordList endpoint that they are working on, which has all forms of words (run, runs, ran, running, etc.) instead of just the base forms.

    Here is the code. It handles words from 2 to 6 letters long, but could be modified for other lengths.

    Imports System.IO
    
    
    Public Class Form1
    
        Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
    
            Dim pth As String
            pth = "C:\newWordLists\"
    
            Using r As StreamReader = New StreamReader(pth & "OriginalList.txt")
    
                Dim wrd As String
    
                wrd = r.ReadLine
    
                Do While (Not wrd Is Nothing)
    
                    'tidy up every word, not just the first one
                    wrd = Trim(wrd).ToLower
    
                    'change this line if less than 2, or more than 6 letters is required:
    
                    If wrd.Length < 2 OrElse wrd.Length > 6 OrElse Not isAlpha(wrd) Then 'if word is invalid
    
                        'the line break goes after the word, so the file does not start with a blank line
                        My.Computer.FileSystem.WriteAllText(pth & "invalid.txt", wrd & vbCrLf, True)
    
                    Else 'word is valid
    
                        'the file name is built from the word length (2_letters.txt, 3_letters.txt, etc),
                        'so other lengths only need the limits in the If line above changing
                        My.Computer.FileSystem.WriteAllText(pth & wrd.Length.ToString() & "_letters.txt", wrd & vbCrLf, True)
    
    
    
    
                    End If
    
                    wrd = r.ReadLine
                Loop
    
            End Using
    
            MsgBox("finished")
    
        End Sub
    
    
    
    
        'returns True only if every character is a letter A-Z (upper or lower case)
        Function isAlpha(ByVal str As String) As Boolean
            For Each ch As Char In str.ToUpper
                If ch < "A"c OrElse ch > "Z"c Then Return False
            Next
    
            Return True
        End Function
    
    End Class