My programming so far along this year has disappointed me. I've decided I should release more small works, to regularly program again, rather than only think about it, and to release anything, at all.
I wanted to have a working English Elision for this, but the basic program doesn't require it. This algorithm provides a hash, or summary, intended for words, to make linearly comparing them feasible. Acknowledging the domain of the sequences analyzed makes this easy, whereas the more general problem could seem very hard. The algorithm simply provides a set of counts for the entirety of the domain.
These programs are licensed under the GNU Affero General Public License version three.
The APL is the simplest and easiest to write, too small to warrant a name; the domain is named by d:
d←'abcdefghijklmnopqrstuvwxyz ' +/d∘.='test' 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 2 0 0 0 0 0 0 0 ⍝ Further analysis, such as determing the total character differences, is trivial. +/|(+/d∘.='treat')-+/d∘.='test' 3
A primary purpose for this is efficiently detecting anagrams. As is noticed, a comparison tolerance can be introduced to make detection of near-anagrams easy, using varying ideas of what tolerance is.
This Common Lisp pleases me least, as coercion of types such as characters to integers is too vague. Due to this, it simply uses a hash table for count storage, but can handle a much more vague domain:
(defun summarize (sequence &key ((:key every) 'identity) ((:test get) 'eql) ((:size length))) "Return a hash-table summarizing the input sequence as counts of each element. The size argument is a performance hint determining the initial size thereof. The test argument must be a valid hash-table test: EQ, EQL, EQUAL, or EQUALP." (let ((hash-table (apply 'make-hash-table :test get (if length `(:size ,length))))) (map nil (lambda (elt) (incf (gethash (funcall every elt) hash-table 0))) sequence) hash-table)) (defun anagramp (&rest rest) "This predicate returns T if all arguments are anagrams, or else NIL." (prog1 t (mapl (lambda (list &aux (first (first list)) (second (second list))) (and second (or (equalp first second) (return-from anagramp)))) (mapcar 'summarize rest))))
The Ada is even more trivial than my previous generic Ada subprogram:
generic type Count_Type is (<>); type Index_Type is (<>); type Domain_Type is (<>); type Vector_Type is array ( Index_Type range <>) of Domain_Type; type Result_Type is array (Domain_Type) of Count_Type; procedure Summarize (Data : in Vector_Type; Output : out Result_Type);
procedure Summarize (Data : in Vector_Type; Output : out Result_Type) is begin Output := (others => Count_Type'First); for D in Data'Range loop Output(Data(D)) := Count_Type'Succ(Output(Data(D))); end loop; end Summarize;
Follows are two test programs, the latter massively complicated by type concerns. Using any modular type for Count_Type is not an option, and so overflow is possible; the Count_Type should be selected to make overflow impossible anyway, for the particulars. The latter program handles this edge case:
with Summarize, Ada.Text_IO; procedure Character_Test is type Count is array (Character) of Natural; procedure Wordplay is new Summarize ( Count_Type => Natural, Index_Type => Positive, Domain_Type => Character, Vector_Type => String, Result_Type => Count); package Integer_IO is new Ada.Text_IO.Integer_IO (Natural); C : Count; begin Wordplay(Ada.Text_IO.Get_Line, C); for I in C'Range loop if C(I) /= 0 then Ada.Text_IO.Put(I); Integer_IO.Put(C(I)); Ada.Text_IO.New_Line; end if; end loop; end Character_Test;
with Summarize, Ada.Text_IO; procedure Alphabet_Test is type Alphabet is ('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', ' '); type Alphabetical is array (Positive range <>) of Alphabet; type Count is array (Alphabet) of Natural; procedure Wordplay is new Summarize ( Count_Type => Natural, Index_Type => Positive, Domain_Type => Alphabet, Vector_Type => Alphabetical, Result_Type => Count); package Integer_IO is new Ada.Text_IO.Integer_IO (Natural); package Enumeration_IO is new Ada.Text_IO.Enumeration_IO (Alphabet); -- What an absurd bother this is; I currently see no other way for making these types compatible. function To_Alphabet (Data : in Character) return Alphabet is Conversion : array (Alphabet) of Character := "abcdefghijklmnopqrstuvwxyz "; begin for C in Conversion'Range loop if Conversion(C) = Data then return C; end if; end loop; end To_Alphabet; -- This conversion function is necessary, for using those nice predefined subprograms for String. function To_Alphabetical (Data : in String) return Alphabetical is A : Alphabetical(Data'Range); begin for C in A'Range loop A(C) := To_Alphabet(Data(C)); end loop; return A; end To_Alphabetical; C : Count; begin loop begin Wordplay(To_Alphabetical(Ada.Text_IO.Get_Line), C); exit; exception when others => Ada.Text_IO.Put_Line("The input had invalid characters or was too long."); end; end loop; for I in C'Range loop Enumeration_IO.Put(I); Integer_IO.Put(C(I)); Ada.Text_IO.New_Line; end loop; end Alphabet_Test;