Index  Comments

My programming so far along this year has disappointed me. I've decided I should release more small works, to regularly program again, rather than only think about it, and to release anything, at all.

I wanted to have a working English Elision for this, but the basic program doesn't require it. This algorithm provides a hash, or summary, intended for words, to make linearly comparing them feasible. Acknowledging the domain of the sequences analyzed makes this easy, whereas the more general problem could seem very hard. The algorithm simply provides a set of counts for the entirety of the domain.

These programs are licensed under the GNU Affero General Public License version three.

The APL is the simplest and easiest to write, too small to warrant a name; the domain is named by d:

  d←'abcdefghijklmnopqrstuvwxyz '
0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 2 0 0 0 0 0 0 0
  ⍝ Further analysis, such as determing the total character differences, is trivial.

A primary purpose for this is efficiently detecting anagrams. As is noticed, a comparison tolerance can be introduced to make detection of near-anagrams easy, using varying ideas of what tolerance is.

This Common Lisp pleases me least, as coercion of types such as characters to integers is too vague. Due to this, it simply uses a hash table for count storage, but can handle a much more vague domain:

(defun summarize (sequence &key ((:key every) 'identity) ((:test get) 'eql) ((:size length)))
  "Return a hash-table summarizing the input sequence as counts of each element.
The size argument is a performance hint determining the initial size thereof.
The test argument must be a valid hash-table test: EQ, EQL, EQUAL, or EQUALP."
  (let ((hash-table (apply 'make-hash-table :test get (if length `(:size ,length)))))
    (map nil (lambda (elt) (incf (gethash (funcall every elt) hash-table 0))) sequence)

(defun anagramp (&rest rest)
  "This predicate returns T if all arguments are anagrams, or else NIL."
  (prog1 t (mapl (lambda (list &aux (first (first list)) (second (second list)))
                   (and second (or (equalp first second) (return-from anagramp))))
                 (mapcar 'summarize rest))))

The Ada is even more trivial than my previous generic Ada subprogram:

   type  Count_Type is (<>);
   type  Index_Type is (<>);
   type Domain_Type is (<>);
   type Vector_Type is array ( Index_Type range <>) of Domain_Type;
   type Result_Type is array (Domain_Type) of Count_Type;
procedure Summarize (Data : in Vector_Type; Output : out Result_Type);


procedure Summarize (Data : in Vector_Type; Output : out Result_Type) is
   Output := (others => Count_Type'First);
   for D in Data'Range loop
      Output(Data(D)) := Count_Type'Succ(Output(Data(D)));
   end loop;
end Summarize;

Follows are two test programs, the latter massively complicated by type concerns. Using any modular type for Count_Type is not an option, and so overflow is possible; the Count_Type should be selected to make overflow impossible anyway, for the particulars. The latter program handles this edge case:


with Summarize, Ada.Text_IO;

procedure Character_Test is
   type Count is array (Character) of Natural;
   procedure Wordplay is new Summarize
     ( Count_Type => Natural, Index_Type => Positive, Domain_Type => Character,
      Vector_Type => String, Result_Type => Count);
   package Integer_IO is new Ada.Text_IO.Integer_IO (Natural);
   C : Count;
   Wordplay(Ada.Text_IO.Get_Line, C);
   for I in C'Range loop
      if C(I) /= 0 then
         Ada.Text_IO.Put(I); Integer_IO.Put(C(I)); Ada.Text_IO.New_Line;
      end if;
   end loop;
end Character_Test;


with Summarize, Ada.Text_IO;

procedure Alphabet_Test is
   type Alphabet is ('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm',
                     'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', ' ');
   type Alphabetical is array (Positive range <>) of Alphabet;
   type Count is array (Alphabet) of Natural;
   procedure Wordplay is new Summarize
     ( Count_Type =>      Natural,  Index_Type => Positive, Domain_Type => Alphabet,
      Vector_Type => Alphabetical, Result_Type => Count);
   package     Integer_IO is new Ada.Text_IO.Integer_IO (Natural);
   package Enumeration_IO is new Ada.Text_IO.Enumeration_IO (Alphabet);
   -- What an absurd bother this is; I currently see no other way for making these types compatible.
   function To_Alphabet (Data : in Character) return Alphabet is
      Conversion : array (Alphabet) of Character := "abcdefghijklmnopqrstuvwxyz ";
      for C in Conversion'Range loop
         if Conversion(C) = Data then
            return C;
         end if;
      end loop;
   end To_Alphabet;
   -- This conversion function is necessary, for using those nice predefined subprograms for String.
   function To_Alphabetical (Data : in String) return Alphabetical is
      A : Alphabetical(Data'Range);
      for C in A'Range loop
         A(C) := To_Alphabet(Data(C));
      end loop;
      return A;
   end To_Alphabetical;
   C : Count;
         Wordplay(To_Alphabetical(Ada.Text_IO.Get_Line), C);
         when others => Ada.Text_IO.Put_Line("The input had invalid characters or was too long.");
   end loop;
   for I in C'Range loop
      Enumeration_IO.Put(I); Integer_IO.Put(C(I)); Ada.Text_IO.New_Line;
   end loop;
end Alphabet_Test;