Hello. I am posting because I thought others may find this useful.
I have written some VB code that takes a word list (a single text file with one word per line, OriginalList.txt), goes through the entire list and separates the words into different files according to word length, i.e. 2_letters.txt for two-letter words, 3_letters.txt for three-letter words, and so on up to 6_letters.txt.
Words containing any character other than the letters A to Z (an apostrophe, hyphen, space, etc.) are not included in the above files; they are dumped into a separate file (invalid.txt). Words with fewer than 2 or more than 6 letters also go into that file.
The procedure takes a few minutes to complete, depending on how quick your PC is.
I originally wrote this code to prepare word lists for my game "Covert Word" (winner of the API competition).
I will be using it again once Oxford have released the new WordList endpoint they are working on, which will include all forms of words (run, runs, ran, running, etc.) rather than just the base forms.
Here is the code. It handles words from 2 to 6 letters long, but could be modified for other lengths.
```vbnet
Imports System.IO

Public Class Form1

    Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
        Dim pth As String = "C:\newWordLists\"

        Using r As StreamReader = New StreamReader(pth & "OriginalList.txt")
            Dim wrd As String = r.ReadLine()
            Do While wrd IsNot Nothing
                ' trim and lower-case every word before testing it
                wrd = wrd.Trim().ToLower()

                ' each word is appended followed by a line break, so no file starts with a blank line
                'change this line if less than 2, or more than 6 letters is required:
                If wrd.Length < 2 OrElse wrd.Length > 6 OrElse Not isAlpha(wrd) Then
                    'word is invalid
                    My.Computer.FileSystem.WriteAllText(pth & "invalid.txt", wrd & vbCrLf, True)
                ElseIf wrd.Length = 2 Then
                    My.Computer.FileSystem.WriteAllText(pth & "2_letters.txt", wrd & vbCrLf, True)
                ElseIf wrd.Length = 3 Then
                    My.Computer.FileSystem.WriteAllText(pth & "3_letters.txt", wrd & vbCrLf, True)
                ElseIf wrd.Length = 4 Then
                    My.Computer.FileSystem.WriteAllText(pth & "4_letters.txt", wrd & vbCrLf, True)
                ElseIf wrd.Length = 5 Then
                    My.Computer.FileSystem.WriteAllText(pth & "5_letters.txt", wrd & vbCrLf, True)
                ElseIf wrd.Length = 6 Then
                    My.Computer.FileSystem.WriteAllText(pth & "6_letters.txt", wrd & vbCrLf, True)
                    'add more of the above here if less than 2, or more than 6 letters is required
                End If

                wrd = r.ReadLine()
            Loop
        End Using

        MsgBox("finished")
    End Sub

    ' Returns True only if every character in str is a letter from A to Z (either case)
    Function isAlpha(ByVal str As String) As Boolean
        Dim iPos As Integer = 1
        Dim bolValid As Boolean = True
        While iPos <= Len(str) AndAlso bolValid
            Dim ch As String = UCase(Mid(str, iPos, 1))
            If Asc(ch) < Asc("A") OrElse Asc(ch) > Asc("Z") Then
                bolValid = False
            End If
            iPos = iPos + 1
        End While
        Return bolValid
    End Function

End Class
```
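A possible generalisation, for anyone who needs other word lengths: the sketch below takes the minimum and maximum length as parameters and builds the file name from the word length, so no extra `ElseIf` branches are needed. It also keeps one `StreamWriter` open per output file instead of calling `WriteAllText` for every word (that call opens and closes the file each time), which is likely to be noticeably faster on a large list, although this has not been measured. It is a minimal console-program sketch, assuming .NET Framework 4 or later; `SplitWordList`, `BucketFileName` and the temporary demo folder are hypothetical names for illustration, not part of the original code, and it applies the same A-to-Z rule as `isAlpha`.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO

' Illustrative sketch: all names below are hypothetical, not from the original post.
Module WordListSplitter

    ' Returns "<n>_letters.txt" for a word of n letters (minLen <= n <= maxLen)
    ' made only of a-z, otherwise "invalid.txt". Expects a trimmed, lower-cased word.
    Function BucketFileName(word As String, minLen As Integer, maxLen As Integer) As String
        If word.Length < minLen OrElse word.Length > maxLen Then Return "invalid.txt"
        For Each c As Char In word
            ' same rule as isAlpha: any character outside a-z makes the word invalid
            If c < "a"c OrElse c > "z"c Then Return "invalid.txt"
        Next
        Return word.Length.ToString() & "_letters.txt"
    End Function

    ' Reads inputPath line by line and appends each word to its bucket file in outDir.
    ' Each output file is opened once and kept open until the whole list is processed.
    Sub SplitWordList(inputPath As String, outDir As String, minLen As Integer, maxLen As Integer)
        Dim writers As New Dictionary(Of String, StreamWriter)
        Try
            For Each line As String In File.ReadLines(inputPath)
                Dim word As String = line.Trim().ToLowerInvariant()
                Dim fileName As String = BucketFileName(word, minLen, maxLen)
                Dim w As StreamWriter = Nothing
                If Not writers.TryGetValue(fileName, w) Then
                    ' True = append, matching the WriteAllText(..., True) calls in the original
                    w = New StreamWriter(Path.Combine(outDir, fileName), True)
                    writers.Add(fileName, w)
                End If
                w.WriteLine(word)
            Next
        Finally
            ' flush and close every output file, even if an exception occurred
            For Each writer As StreamWriter In writers.Values
                writer.Dispose()
            Next
        End Try
    End Sub

    Sub Main()
        ' Hypothetical demo: a small input file and a fresh temporary output folder
        Dim inputPath As String = Path.GetTempFileName()
        File.WriteAllLines(inputPath, {"Cat", "dog's", "a", "house", "running"})
        Dim demoDir As String = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName())
        Directory.CreateDirectory(demoDir)

        SplitWordList(inputPath, demoDir, 2, 8)

        ' print each output file and the words it received
        For Each f As String In Directory.GetFiles(demoDir)
            Console.WriteLine(Path.GetFileName(f) & ": " & String.Join(", ", File.ReadAllLines(f)))
        Next
    End Sub

End Module
```

To run it on the original list with, say, 2 to 10 letters, call `SplitWordList("C:\newWordLists\OriginalList.txt", "C:\newWordLists\", 2, 10)` instead of the demo in `Main`. Because the files are opened in append mode, delete any old output files before a fresh run.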