In my end-of-February report, I mentioned that I had built a database to help me create disyllabic roots while preserving Hamming distance. One big part of that was devising an Excel macro that works as follows:
- Range A is a bank of possible (according to the rules of my language) disyllabic roots. I generated this using Zompist’s Gen.
- Range B is where I input the roots I have chosen.
- Range C is a bank of roots that conflict with the roots in Range B. I also generated this using Gen with some really roundabout tricks.
- Rule 1: If a cell appears in Range A and Range C, it is highlighted yellow.
- Rule 2: If a cell appears in Range A and Range B, it is highlighted green.
- Rule 3: If a cell appears in Range B more than once, it is highlighted red.
- Rule 4: If a cell appears in Range B and Range C, it is highlighted red.
- Rule 5: Otherwise, a cell should not be highlighted.
Thus, any white cells in Range A were roots that could still be used because they didn’t conflict with anything else. Even though the highlighting was all done by a macro, there was still a significant portion of manual work that took a few hours. Much more time went into figuring out how to get the most efficient set of disyllabic roots. By that, I mean that I had to figure out how to get as many roots as possible that didn’t conflict with each other from the total bank of possible roots. It always comes back to Hamming distance!
[su_spoiler title=”Macro for disyllabic roots database” open=”no” style=”default” icon=”plus” anchor=”” class=””]
Sub HighlightDuplicates()
    'Keyboard Shortcut: Ctrl+Shift+D
    Application.ScreenUpdating = False
    Dim ws As Worksheet, t0 As Single, t1 As Single
    Set ws = ThisWorkbook.Sheets("Database")
    t0 = Timer
    'Rule 5: Otherwise, a cell should not be highlighted.
    ws.Cells.Interior.Color = xlNone
    Const RANGE_A As String = "B1:E2300"
    Const RANGE_B As String = "G1:G2300"
    Const RANGE_C As String = "I1:AH2300"
    Dim dictA As Object, dictB As Object, dictC As Object
    Set dictA = CreateObject("Scripting.Dictionary")
    Set dictB = CreateObject("Scripting.Dictionary")
    Set dictC = CreateObject("Scripting.Dictionary")
    Call buildDict(dictA, ws.Range(RANGE_A))
    Call buildDict(dictB, ws.Range(RANGE_B))
    Call buildDict(dictC, ws.Range(RANGE_C))
    'Rule 1: If a cell appears in Range A and Range C,
    'I want them highlighted yellow.
    'Rule 2: Then, if a cell appears in Range A and Range B,
    'I want them highlighted green.
    Dim cell As Range, key As String
    For Each cell In ws.Range(RANGE_A)
        If Len(cell.Value) > 0 Then
            key = CStr(cell.Value)
            If dictC.Exists(key) Then cell.Interior.Color = vbYellow
            If dictB.Exists(key) Then cell.Interior.Color = vbGreen
        End If
    Next
    For Each cell In ws.Range(RANGE_C)
        If Len(cell.Value) > 0 Then
            key = CStr(cell.Value)
            If dictA.Exists(key) Then cell.Interior.Color = vbYellow
        End If
    Next
    For Each cell In ws.Range(RANGE_B)
        If Len(cell.Value) > 0 Then
            key = CStr(cell.Value)
            If dictA.Exists(key) Then cell.Interior.Color = vbGreen
        End If
    Next
    'Rule 3: Then, if a cell appears in Range B more than once,
    'I want them highlighted red.
    'Rule 4: Then, if a cell appears in Range B and Range C,
    'I want them highlighted red.
    For Each cell In ws.Range(RANGE_B)
        If Len(cell.Value) > 0 Then
            key = CStr(cell.Value)
            If dictB.Exists(key) Then
                If dictB.Item(key) > 1 Then
                    cell.Interior.Color = vbRed
                End If
            End If
        End If
    Next
    For Each cell In ws.Range(RANGE_B)
        If Len(cell.Value) > 0 Then
            key = CStr(cell.Value)
            If dictC.Exists(key) Then
                If dictC.Item(key) > 0 Then
                    cell.Interior.Color = vbRed
                End If
            End If
        End If
    Next
    For Each cell In ws.Range(RANGE_C)
        If Len(cell.Value) > 0 Then
            key = CStr(cell.Value)
            If dictB.Exists(key) Then
                If dictC.Item(key) > 0 Then
                    cell.Interior.Color = vbRed
                End If
            End If
        End If
    Next
    t1 = Timer
    'MsgBox "Completed in " & Int(t1 - t0) & " seconds"
    Application.ScreenUpdating = True
End Sub
'ASSUMPTION: buildDict is called above but was not included in the post.
'This is a guess at a minimal version: it counts how often each non-empty
'value occurs in rng, which is what the "> 1" and "> 0" checks rely on.
Private Sub buildDict(dict As Object, rng As Range)
    Dim cell As Range, key As String
    For Each cell In rng
        If Len(cell.Value) > 0 Then
            key = CStr(cell.Value)
            dict(key) = dict(key) + 1
        End If
    Next
End Sub
[/su_spoiler]
You can click on the plus icon or the name to expand that section to see the macro. It’s a biggun, so I decided that it was better to default to it being collapsed and not immediately assaulting any eyeballs.
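For anyone who wants to follow the rule order without opening Excel, here is a rough Python sketch of the same precedence. It is not the macro itself: the function name `highlight` and the tiny example banks are made up for illustration, and the colors are plain strings instead of cell fills.

```python
from collections import Counter

def highlight(range_a, range_b, range_c):
    """Return {(range_name, root): color}, applying the rules in macro order."""
    set_a, set_c = set(range_a), set(range_c)
    count_b = Counter(range_b)
    colors = {}
    for root in set_a:
        color = None                      # Rule 5: no highlight by default
        if root in set_c:
            color = "yellow"              # Rule 1
        if root in count_b:
            color = "green"               # Rule 2 is applied after Rule 1
        colors[("A", root)] = color
    for root in count_b:
        color = "green" if root in set_a else None   # Rule 2
        if count_b[root] > 1 or root in set_c:
            color = "red"                 # Rules 3 and 4
        colors[("B", root)] = color
    for root in set_c:
        color = "yellow" if root in set_a else None  # Rule 1
        if root in count_b:
            color = "red"                 # Rule 4
        colors[("C", root)] = color
    return colors

# Hypothetical miniature banks, only to show the behavior.
range_a = ["myaman", "nyaman", "pyaman", "katya"]
range_b = ["myaman"]
range_c = ["nyaman", "pyaman"]
colors = highlight(range_a, range_b, range_c)

# Roots in Range A that are still free to use (the "white cells").
free = [root for root in range_a if colors[("A", root)] is None]
print(free)
```

With these example banks, only «katya» stays unhighlighted in Range A.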
Now, to even generate Range A, as I said, I used Zompist’s Gen tool. Making rules to create all possible disyllabic roots in greyfolk language was easy.
[su_spoiler title=”Categories for all possible disyllabic roots” open=”no” style=”default” icon=”plus” anchor=”” class=””]
C=mnptksylh
S=yl
A=a
T=mnl
[/su_spoiler]
[su_spoiler title=”Rewrite rules for all possible disyllabic roots” open=”no” style=”default” icon=”plus” anchor=”” class=””]
hl|h
hy|h
lh|l
ll|l
yy|y
mh|m
mm|m
nh|n
nn|n
[/su_spoiler]
[su_spoiler title=”Syllable types for all possible disyllabic roots” open=”no” style=”default” icon=”plus” anchor=”” class=””]
CACA CASA SACA SASA CSACA CSASA CATCA CATSA SATCA SATSA CACSA CACAT CASAT SACSA SACAT SASAT CSATCA CSATSA CACSAT SACSAT CSACSA CSACAT CSASAT CATCSA CATCAT CATSAT SATCSA SATCAT SATSAT CSACSAT CATCSAT SATCSAT CSATCSA CSATCAT CSATSAT
[/su_spoiler]
Of course, the output type was for all possible syllables.
To generate Range C in Gen, I had to figure out some really roundabout tricks, and, even then, I still had to generate it one chunk at a time. Think of each disyllabic root as a STUVWXYZ map where each letter corresponds to a phoneme. S and W are the first position of their respective syllable and can be «m, n, p, t, k, s, y, l, h». T and X are the second position of their respective syllable and can be «y, l» or ‘-’. U and Y are the third position of their respective syllable, but, when working with roots, both of them are always «a». Finally, V and Z are the fourth position of their respective syllable and can be «m, n, l» or ‘-’. For example, the root «myaman» would be ‘mya-m-an’ because the fourth position of the first syllable and the second position of the second syllable are open. If you look at the above Gen rules, it should be clear that STUVWXYZ is essentially CSATCSAT.
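As a small illustration of the STUVWXYZ map, the sketch below splits a dash-padded root into its eight slots and checks each phoneme against the values listed above. The function name `to_slots` and the constant names are hypothetical; only the slot letters and allowed phonemes come from the description.

```python
# Allowed values per slot, as described above ('-' marks an open position).
ONSETS = set("mnptksylh")        # S and W
GLIDES = set("yl") | {"-"}       # T and X
NUCLEUS = {"a"}                  # U and Y (always «a» in roots)
CODAS = set("mnl") | {"-"}       # V and Z

SLOT_NAMES = "STUVWXYZ"
SLOT_VALUES = dict(zip(SLOT_NAMES, [ONSETS, GLIDES, NUCLEUS, CODAS] * 2))

def to_slots(padded_root):
    """Map a dash-padded root such as 'mya-m-an' onto the slots S..Z."""
    if len(padded_root) != len(SLOT_NAMES):
        raise ValueError("a padded root needs exactly 8 positions")
    slots = dict(zip(SLOT_NAMES, padded_root))
    for name, phoneme in slots.items():
        if phoneme not in SLOT_VALUES[name]:
            raise ValueError(f"{phoneme!r} is not allowed in slot {name}")
    return slots

myaman = to_slots("mya-m-an")
print(myaman)
```

For ‘mya-m-an’ this gives S=m, T=y, U=a, V=-, W=m, X=-, Y=a, Z=n.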
Then, to figure out the roots that wouldn’t be compatible with other roots, which is what Range C is, I had to mirror the process with a second set of categories, ABCDEFGH. Each letter corresponds to the letter in the same position in STUVWXYZ, but each letter of ABCDEFGH takes the values of the phonemes that do not have enough Hamming distance from those in STUVWXYZ. So, if S=m, then A=mnp because «m, n, p» conflict with «m» according to my Hamming distance parameters. Oh, I also set a meaningless Q=xxxxxx so I could see the separation between ABCDEFGH and STUVWXYZ at a glance.
[su_spoiler title=”Categories for «myaman»” open=”no” style=”default” icon=”plus” anchor=”” class=””]
A=mnp
B=yl
C=a
D=mnl
E=mnp
F=yl
G=a
H=mnl
Q=xxxxxx
S=m
T=y
U=a
V=-
W=m
X=-
Y=a
Z=n
[/su_spoiler]
[su_spoiler title=”Syllable types for conflicting roots” open=”no” style=”default” icon=”plus” anchor=”” class=””]
ATUVWXYZ SBUVWXYZ STCVWXYZ STUDWXYZ STUVEXYZ STUVWFYZ STUVWXYH
[/su_spoiler]
Of course, the output type was for all possible syllables.
[su_spoiler title=”Output for «myaman»” open=”no” style=”default” icon=”plus” anchor=”” class=””]
mla-m-an mya-m-al mya-m-am mya-m-an mya-mlan mya-myan mya-n-an mya-p-an myalm-an myamm-an myanm-an nya-m-an pya-m-an
[/su_spoiler]
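The same single-slot substitution can be reproduced outside Gen. The sketch below is only an approximation of what Gen does with these inputs: it takes the categories and syllable types for «myaman» exactly as listed above, expands every template, and removes duplicates. The function name `expand` is made up for illustration.

```python
from itertools import product

# Categories for «myaman», copied from the Gen input above (Q is omitted
# because it only serves as a visual separator).
CATEGORIES = {
    "A": "mnp", "B": "yl", "C": "a", "D": "mnl",
    "E": "mnp", "F": "yl", "G": "a", "H": "mnl",
    "S": "m", "T": "y", "U": "a", "V": "-",
    "W": "m", "X": "-", "Y": "a", "Z": "n",
}

# Each template swaps exactly one slot for its "conflicting" category.
TEMPLATES = "ATUVWXYZ SBUVWXYZ STCVWXYZ STUDWXYZ STUVEXYZ STUVWFYZ STUVWXYH".split()

def expand(templates, categories):
    """Expand every template into all phoneme combinations, deduplicated."""
    results = set()
    for template in templates:
        choices = [categories[letter] for letter in template]
        for combo in product(*choices):
            results.add("".join(combo))
    return sorted(results)

conflicts = expand(TEMPLATES, CATEGORIES)
print(len(conflicts))        # 13
print(" ".join(conflicts))
```

Sorted, the 13 results match the output listed for «myaman» above.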
From there, it was a matter of removing the dashes. However, an interesting question popped back up. What is the Hamming distance between something like ‘mya-m-an’ and ‘myamm-an’? What about ‘myamh-an’? I went ahead and decided that they were all equivalent (which is actually why ‘myamh-an’ doesn’t generate, as I had already taken that into consideration). Because it was a problem I had faced before, I knew how to deal with it. However, another question popped up. What is the Hamming distance between something like ‘myamy-a-’ and ‘mya-mya-’? Same thing: I ended up deciding that they were equivalent as well, even though they do have enough Hamming distance between them. I just wanted to simplify things and continue to get rid of roots that sounded too alike.
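One way to see why those forms end up equivalent is to strip the dashes and then apply the rewrite rules from the first Gen setup. This is an assumption about how the pieces fit together, not a description of what Gen does internally, and the function name `surface` is hypothetical.

```python
# Rewrite rules copied from the Gen setup for all possible disyllabic roots.
REWRITE_RULES = [
    ("hl", "h"), ("hy", "h"), ("lh", "l"), ("ll", "l"), ("yy", "y"),
    ("mh", "m"), ("mm", "m"), ("nh", "n"), ("nn", "n"),
]

def surface(padded_root):
    """Remove the dashes, then apply each rewrite rule in order."""
    word = padded_root.replace("-", "")
    for old, new in REWRITE_RULES:
        word = word.replace(old, new)
    return word

for padded in ["mya-m-an", "myamm-an", "myamh-an", "myamy-a-", "mya-mya-"]:
    print(padded, "->", surface(padded))
```

The first three forms all come out as «myaman», and the last two both come out as «myamya», so treating each group as one root loses nothing on the surface.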
However, I did get some conflicts that weren’t actually conflicts. On the surface, «katya» and «kalya» might seem like they conflict because «t» and «l» conflict in the same position. However, «katya» is «k-a-tya-» (pronounced /ka.tja/) and «kalya» is «k-aly-a-» (pronounced /kal.ja/). So, the «t» and the «l» aren’t actually in conflicting positions.
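The difference between the two views can be shown with a short comparison. In this sketch (the function names are hypothetical), the dashless spellings differ in a single letter, while the dash-padded forms show that «t» sits in slot W and «l» sits in slot V.

```python
SLOT_NAMES = "STUVWXYZ"

def differing_letters(a, b):
    """Indexes where two equal-length spellings differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]

def differing_slots(padded_a, padded_b):
    """Slot names where two dash-padded roots differ."""
    return [name for name, x, y in zip(SLOT_NAMES, padded_a, padded_b) if x != y]

print(differing_letters("katya", "kalya"))        # [2]
print(differing_slots("k-a-tya-", "k-aly-a-"))    # ['V', 'W', 'X']
```

On the surface the roots look one letter apart, but slot by slot they differ in three positions, and the «t» and «l» never share a slot.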
Anyway, I had to run every planned disyllabic root through Gen, put the output in the database, then format it and remove dashes. It took quite a bit of time to do that manually for over 166 roots. Yes, it was more than 166, because after I had already processed some roots that later had to be removed, I had to add other altered roots at the cost of others and process those too. More than anything else, it was just really repetitive and boring, and I pushed myself so hard during this entire process that I ended up really fried, stressed, and anxious. Oops!
In the next part, I will finally reveal the disyllabic roots!