This follows from the Elision toki pona CLOS Mapping Experiment and is, in my view, much nicer. I made many mistakes with that experiment, one being that I avoided Common Lisp's array displacement functionality for silly reasons; it isn't reasonable to attempt to optimize the word storage for this toy. I'm also seeing a pattern: I've repeatedly jumped over the simpler and purer implementation, only to return to it in any case. My small APL program nicely implements an Elision targeting toki pona:
d←'aeijklmnopstuw ' ⍝ The d is the character domain of toki pona, and the tp is the word dictionary.
tp←(1⍴'a') 'akesi' 'ala' 'alasa' 'ale' 'ali' 'anpa' 'ante' 'anu' 'awen' (1⍴'e') 'en' 'esun' 'ijo'
tp←tp,'ike' 'ilo' 'insa' 'jaki' 'jan' 'jelo' 'jo' 'kala' 'kalama' 'kama' 'kasi' 'ken' 'kepeken'
tp←tp,'kili' 'kin' 'kiwen' 'ko' 'kon' 'kule' 'kulupu' 'kute' 'la' 'lape' 'laso' 'lawa' 'len' 'lete'
tp←tp,'li' 'lili' 'linja' 'lipu' 'loje' 'lon' 'luka' 'lukin' 'lupa' 'ma' 'mama' 'mani' 'meli' 'mi'
tp←tp,'mije' 'moku' 'moli' 'monsi' 'mu' 'mun' 'musi' 'mute' 'namako' 'nanpa' 'nasa' 'nasin' 'nena'
tp←tp,'ni' 'nimi' 'noka' (1⍴'o') 'oko' 'olin' 'ona' 'open' 'pakala' 'pali' 'palisa' 'pan' 'pana'
tp←tp,'pi' 'pilin' 'pimeja' 'pini' 'pipi' 'poka' 'poki' 'pona' 'pu' 'sama' 'seli' 'selo' 'seme'
tp←tp,'sewi' 'sijelo' 'sike' 'sin' 'sina' 'sinpin' 'sitelen' 'sona' 'soweli' 'suli' 'suno' 'supa'
tp←tp,'suwi' 'tan' 'taso' 'tawa' 'telo' 'tenpo' 'toki' 'tomo' 'tu' 'unpa' 'uta' 'utala' 'walo' 'wan'
tp←tp,'waso' 'wawa' 'weka' 'wile' (0⍴0) (0⍴0) (0⍴0) (0⍴0)
⍝ Transform a toki pona language string to its Elision representation, but with no magnitude limits.
∇ e←elision s;t;a;⎕IO ⍝ The first line ensures the parameter adheres to that domain of the function.
⎕IO←0◊s←,s◊⎕ES 0⊃(~∧/s∊d)/⊂'The argument contained characters not part of toki pona.'
t←(s≠' ')⊂s◊a←(128=tp⍳t)/t◊→3⍴⍨0=⍴a◊a←a[d⍋⊃a] ⍝ Separate words. Create that auxiliary dictionary.
e←(⊂a),t⍳⍨tp,a ⍝ Combine the dictionaries, search them again, and fuse this for that final result.
∇
This program is licensed under the GNU Affero General Public License version three.
The elision function uses a neat trick I learned to break a vector into a vector of vectors based on another vector, and then merely searches the dictionary to assemble the auxiliary dictionary, before then giving each word its end code. The result is a vector of the auxiliary dictionary and indices.
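For readers who don't read APL, the same steps can be sketched in Python. This is only an illustrative paraphrase and not part of the program: the names DOMAIN and WORDS, and the shape of the Python function, are assumptions of the sketch, and the sort key reflects one reading of how d⍋⊃a orders words once they are padded with spaces.

```python
# Python paraphrase of the APL elision function; an illustration, not the program itself.
DOMAIN = "aeijklmnopstuw "  # d: the character domain of toki pona

# tp: the 124 words of the dictionary, padded with four empty slots to 128 entries.
WORDS = """
a akesi ala alasa ale ali anpa ante anu awen e en esun ijo
ike ilo insa jaki jan jelo jo kala kalama kama kasi ken kepeken
kili kin kiwen ko kon kule kulupu kute la lape laso lawa len lete
li lili linja lipu loje lon luka lukin lupa ma mama mani meli mi
mije moku moli monsi mu mun musi mute namako nanpa nasa nasin nena
ni nimi noka o oko olin ona open pakala pali palisa pan pana
pi pilin pimeja pini pipi poka poki pona pu sama seli selo seme
sewi sijelo sike sin sina sinpin sitelen sona soweli suli suno supa
suwi tan taso tawa telo tenpo toki tomo tu unpa uta utala walo wan
waso wawa weka wile
""".split() + [""] * 4


def elision(s, words=WORDS, domain=DOMAIN):
    """Return [auxiliary_dictionary, index, index, ...] for a toki pona string."""
    if any(c not in domain for c in s):
        raise ValueError("The argument contained characters not part of toki pona.")
    t = s.split()  # separate words; runs of spaces yield no empty words
    # Words missing from the main dictionary become the auxiliary dictionary.
    a = [w for w in t if w not in words]
    # Assumed reading of the APL sort: words are padded to equal width with
    # spaces, and the space sorts last because it is last in the domain.
    width = max((len(w) for w in a), default=0)
    a.sort(key=lambda w: [domain.index(c) for c in w.ljust(width)])
    combined = words + a  # main dictionary followed by the auxiliary one
    return [a] + [combined.index(w) for w in t]


print(elision("mi pana e pelisimilitu"))
```

As in the APL, an unknown word receives an index of 128 or above, which is its position in the auxiliary dictionary offset by the size of the main one.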
I want only the simplest and barest algorithms in Elision; they must remain the most uninteresting, boring, and basest aspect of the system. Only the language tables, and human descriptions of how to use them, should be pretty and valuable. In my mind, the tables float in a void, with algorithms merely being the lines which connect them to one another, perhaps passing through another table along the way.
What follows is a simple example of how to retrieve the character vector from the Elision representation:
r←elision 'mi pana e pelisimilitu'◊r ⍝ I exude verisimilitude.
┌5────────────────────────────┐
│┌1─────────────┐ 54 80 10 128│
││┌12──────────┐│             │
│││pelisimilitu││             │
││└────────────┘│             │
│└∊─────────────┘             │
└∊∊───────────────────────────┘
¯1↓1↓⍕(tp,↑r)[1↓r] ⍝ I should've known to use ⍕ much earlier; the two drops merely remove spacing.
┌22────────────────────┐
│mi pana e pelisimilitu│
└──────────────────────┘
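The retrieval above amounts to appending the auxiliary dictionary to the main one and indexing the result. A minimal Python sketch of that idea follows; the name unelide and the sparse stand-in for tp are hypothetical, with only the three slots this example needs filled in, at the indices shown in the output above.

```python
def unelide(r, words):
    """Rebuild the text from [auxiliary_dictionary, index, index, ...]."""
    aux, indices = r[0], r[1:]
    combined = list(words) + list(aux)  # same order as tp,a in the APL
    return " ".join(combined[i] for i in indices)


# Hypothetical stand-in for the 128-entry tp dictionary: only the slots
# used by this example are populated.
tp_stub = [""] * 128
tp_stub[54], tp_stub[80], tp_stub[10] = "mi", "pana", "e"

r = [["pelisimilitu"], 54, 80, 10, 128]
print(unelide(r, tp_stub))
```

Joining with single spaces plays the role that ⍕ and the two drops play in the APL expression.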
While a much better fit than Common Lisp, even APL got in my way. The right arrow used for skipping the dictionary sorting is only necessary because the empty array generated has a numerical prototype and, while I could try to force it to have a character prototype, I can't rely on it, in good faith.
Knowing the general behaviour of APL, testing on categories of input was fairly easy; I noticed some flaws when testing with empty or scalar inputs, for example. Being able to craft the few algorithms needed precisely to suit Elision and nothing else will result in a much better implementation; I may find myself writing this again, with Ada, before finally moving to the English Elision. It was fun.
What follows is the elision function with unconditional code, which should work when using GNU APL:
∇ e←elision s;t;a;⎕IO
⎕IO←0◊s←,s◊⎕ES 0⊃(~∧/s∊d)/⊂'The argument contained characters not part of toki pona.'
t←(s≠' ')⊂s◊a←((128=tp⍳t)/t),''◊a←a[d⍋⊃a]◊e←(⊂a),t⍳⍨tp,a
∇