After explaining how I planned the roots and how I built a database for them, I’m happy to finally share the results!
The 4-phoneme disyllabic roots: «kaka, kana, lapa, mama, masa, naka, nana, pala, papa, saha, sama, sasa, tata, taya, yata».
The 5-phoneme disyllabic roots: «hahan, hakam, halan, halma, halta, hamam, hamla, hanal, hanpa, hanya, hapal, hapya, hasam, hatla, hayal, kalpa, kalsa, kalya, kampa, kamsa, kanta, kapan, kapla, kasan, kasla, katal, katam, katya, kayan, klata, kyapa, kyasa, kyaya, lalma, lalsa, laman, lamsa, lanam, lanta, lanya, lasan, lasla, latal, layal, layam, mahal, maham, malal, malam, malna, malpa, malya, mamna, mampa, manan, manla, mapan, mapla, matya, mayan, mlaha, mlala, myana, myapa, myaya, nahan, nalan, nalta, namal, namam, namta, namya, nanma, nansa, nasal, nasam, nasya, natla, nlama, nlasa, nlaya, nyaha, pakal, pakya, pamal, pamta, pamya, panam, panka, panma, pansa, pasal, pasya, patan, payam, plaka, plama, plasa, pyaha, sakal, sakam, sakya, salal, salam, salna, salpa, salya, samna, sampa, sanan, sanka, sanla, sapan, sapla, sayan, slaka, slala, syana, syapa, syaya, tahal, taham, takan, takla, talka, tamka, tlaha, tlana, tyaka, tyala, tyama, yahan, yakal, yakam, yakya, yalan, yalna, yamal, yamam, yamna, yamya, yanka, yanma, yansa, yasal, yasam, yasya».
Don’t mind my stupid jokes.
The 6-phoneme disyllabic roots: «hatyam, haklan, hasyal, hamyan».
As I mentioned, I expected a minimum total of 165 4-phoneme and 5-phoneme disyllabic roots, and I reached 166 after deciding to merge certain similar-sounding roots. In the end, though, I only scraped by with 162, because fitting in the numerals broke so many patterns. Those broken patterns will also ripple into which 6-phoneme disyllabic roots I’ll be able to use.
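For anyone who wants to double-check the tally, here is a minimal Python sketch. It assumes one letter per phoneme and that «a» is the only vowel, which holds for every root listed here but is my simplification for the sketch, not a rule of the language. The function name `shape` and the five-root sample are just for illustration; the count of 147 comes from counting the 5-phoneme list above.

```python
VOWELS = {"a"}  # assumption: "a" is the only vowel in these roots


def shape(root: str) -> tuple[int, int]:
    """Return (phoneme_count, syllable_count), assuming one letter per phoneme
    and one syllable per vowel."""
    phonemes = len(root)
    syllables = sum(1 for letter in root if letter in VOWELS)
    return phonemes, syllables


# The full 4-phoneme and 6-phoneme lists, plus a small sample of the 5-phoneme list.
four_phoneme = ("kaka kana lapa mama masa naka nana pala papa "
                "saha sama sasa tata taya yata").split()
five_phoneme_sample = "hahan kalpa klata myaya yasya".split()
six_phoneme = "hatyam haklan hasyal hamyan".split()

# Every root should have the expected phoneme count and exactly two syllables.
for group, expected_phonemes in ((four_phoneme, 4),
                                 (five_phoneme_sample, 5),
                                 (six_phoneme, 6)):
    for root in group:
        assert shape(root) == (expected_phonemes, 2), root

FIVE_PHONEME_COUNT = 147  # number of entries in the full 5-phoneme list
TOTAL = len(four_phoneme) + FIVE_PHONEME_COUNT
print(TOTAL)  # 162
```

The same helper also tells the two kinds of 6-phoneme root apart: `shape("hatyam")` gives `(6, 2)`, while a made-up trisyllabic string such as `"katana"` (not one of my roots) gives `(6, 3)`.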
However, I’m not dismayed. I’ll use as many 6-phoneme disyllabic roots as I can. My new secret weapon is 6-phoneme trisyllabic roots! Why? Because I did find the source that I was looking for:
Pellegrino, F., Coupé, C., & Marsico, E. (2011). A cross-language perspective on speech information rate. Language, 87(3), 539-558.
My takeaway is that the rate at which information gets across stays fairly constant, whatever the constituents (in my case, just phonemes) of a syllable: simpler syllables carry less information each, but they are spoken faster. What that means is that the number of phonemes is more important than the number of syllables in terms of how quickly a word can be spoken to get information across. For example, Spanish and Japanese have simpler syllables and tend to have longer words, yet they are spoken a bit faster, so the information rate stays about the same.
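To make that trade-off concrete, here is a tiny Python sketch of how I picture it: information per second is roughly syllables per second times information per syllable. The function name and all of the numbers are invented for illustration; they are not figures from the paper.

```python
def information_rate(syllables_per_second: float, info_per_syllable: float) -> float:
    """Rough model: information per second = syllable rate x information per syllable."""
    return syllables_per_second * info_per_syllable


# Invented, illustrative values in arbitrary units (NOT data from the paper).
# A language with simple syllables: each syllable carries less, but speech is faster.
simple_syllables = information_rate(syllables_per_second=8.0, info_per_syllable=0.5)
# A language with complex syllables: each syllable carries more, but speech is slower.
complex_syllables = information_rate(syllables_per_second=4.0, info_per_syllable=1.0)

print(simple_syllables, complex_syllables)  # 4.0 4.0
```

With these made-up values both languages land on the same rate, which is the pattern I’m counting on when I trade a dense disyllabic root for a trisyllabic one with the same number of phonemes.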
Previously, I had tried to put off using trisyllabic roots for as long as possible, but it seems like they might just be my secret weapon in terms of eking out efficient words in the wake of losing quite a few 5-phoneme roots.
For now, the plan is to take these roots and start creating words!