
I find myself at a loss for what to write lately, as none of my newer work is currently in a finished or otherwise presentable state; I believe I'll make this month one of rebuttals and of articles spurred by other articles I've read elsewhere.

This article was spurred by Beating C With 80 Lines Of Haskell: Wc.

The partner of this is my 2019-11-11 article which concerns this same basic program in Ada.

As I was writing the Ada implementation, I wrote one in Common Lisp as a reprieve, as I'm still more familiar with this language. I used the most naive approach that would work and, surprisingly, this implementation is competitive with the C implementation on my machines. I figure the buffering already done is more than sufficient to optimize this trivially. The program follows:

;WC - Count the characters, lines, and words from a file in Common Lisp.
;Copyright (C) 2019 Prince Trippy programmer@verisimilitudes.net .

;This program is free software: you can redistribute it and/or modify it under the terms of the
;GNU Affero General Public License version 3 as published by the Free Software Foundation.

;This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without
;even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
;See the GNU Affero General Public License for more details.

;You should have received a copy of the GNU Affero General Public License along with this program.
;If not, see <http://www.gnu.org/licenses/>.

(cl:defpackage #:wc
  (:documentation "This package implements a basic semantic count over streams and strings.")
  (:use #:common-lisp)
  (:export #:word-count #:word-count-from-string))

(defun word-count (&optional (stream *standard-input*) &aux (*standard-input* stream))
  "Return the line count, word count, and character length of STREAM in order as multiple values."
  (loop :with read-line := 0 :with word-count := 0 :with count := 0 :with truename := nil
        :for character := (read-char *standard-input* nil) :while character
        :do (case character
              (#\newline (setq truename nil) (incf read-line))
              ((#\space #.(name-char "Tab")) (setq truename nil))
              (t (and (graphic-char-p character) (not truename)
                      (incf word-count) (setq truename t))))
            (incf count)
        :finally (return (values read-line word-count count))))

(defun word-count-from-string (string &key ((:start first) 0) ((:end last)))
  "Return the line count, word count, and character length of STRING in order as multiple values."
  (with-input-from-string (*standard-input* string :start first :end last)
    (word-count)))

I still make the claim that my language of choice is better than C. This implementation took all of a few minutes to write and yet is already so competitive with the C that I feel no need to bother optimizing it in any way; a C programmer may claim that the C is ever so slightly faster or some other such thing and entirely miss the differences concerning ease of development and debugging. The Lisp will not suffer memory flaws or other such things and may be trivially improved, unlike the C.

It does help that wc is such a useless and trivial program that it doesn't rightly benefit from those interfaces exposed only to C, which languages such as Common Lisp avoid out of good taste and a sense of proper design. That is to write that POSIX and C don't conspire so effectively against other languages in implementing this program.

There may be differences between POSIX wc and this program in which characters are treated as words, and I'd rather argue the former is erroneous in its treatment of punctuation as ``words'', although it's largely irrelevant for my purposes. I've taken a glance at the standard and it seems my use of GRAPHIC-CHAR-P is largely sufficient with regard to this, however.
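To make the word rule concrete, here is a minimal sketch of the same classification applied to a string. The name SKETCH-WORD-COUNT is hypothetical and not part of the WC package; it only mirrors the CASE logic of the program above. Note that GRAPHIC-CHAR-P is true of punctuation, so a lone comma still counts as a word here, and that a form feed neither starts nor ends a word, whereas a wc that delimits words by white space would presumably split on it.

```lisp
;; Hypothetical helper, not part of the article's WC package: it applies the
;; same character classification as WORD-COUNT but returns only the word count.
(defun sketch-word-count (string)
  (let ((words 0)
        (in-word nil))
    (loop :for character :across string
          :do (case character
                ;; Newline, space, and tab end the current word.
                ((#\Newline #\Space #\Tab) (setq in-word nil))
                ;; Any other graphic character starts a word if none is open.
                (t (when (and (graphic-char-p character) (not in-word))
                     (incf words)
                     (setq in-word t)))))
    words))

;; Punctuation standing alone is counted as a word of its own.
(sketch-word-count "one , two")

;; A form feed (code 12, assuming ASCII-compatible character codes) is not
;; graphic and is not one of the three separators, so it joins its neighbours.
(sketch-word-count (concatenate 'string "one" (string (code-char 12)) "two"))
```

The first call returns 3 and the second returns 1.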

This should all make it rather clear that most POSIX utilities are not and that proper separation of functionality lies along subprogram and not program boundaries in this manner. A criticism would be pointing out that the C program must initialize and parse arguments and other such things, and yet I believe this counts more for the Common Lisp than against it, as it's entirely unreasonable to waste so many resources for such a trivial result. That is to write that the speed comparisons were performed by timing the already loaded Common Lisp function with the CL:TIME macro, whereas the C was timed with the time program.
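The Lisp side of such a measurement can be sketched as follows. This is only an illustration of timing an already loaded function over a file stream; the helper name TIME-ON-FILE and its parameters are hypothetical, and the article doesn't state the exact form that was evaluated.

```lisp
;; Hypothetical helper: open PATHNAME for character input and run FUNCTION on
;; the stream under CL:TIME, which prints its report to *TRACE-OUTPUT* and
;; returns whatever FUNCTION returns.
(defun time-on-file (function pathname)
  (with-open-file (stream pathname :direction :input)
    (time (funcall function stream))))
```

With the WC package loaded, one might evaluate (time-on-file #'wc:word-count "some-file"), where the file name is a placeholder.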

I've deigned to measure the performance of my program suitably for presentation here. The wc being measured is GNU wc and the Common Lisp used is SBCL; each gave the following for the line, word, and character counts:

746396 7948248 46636548
C:
real	0m1.260s
user	0m1.235s
sys	0m0.024s
Common Lisp:
1.601 seconds of real time
1.600694 seconds of total run time (1.588926 user, 0.011768 system)
100.00% CPU
4,257,184,792 processor cycles
4,232 bytes consed

I've also decided to see the performance if I declare the count variables as FIXNUMs:

1.377 seconds of real time
1.377323 seconds of total run time (1.365612 user, 0.011711 system)
100.00% CPU
3,663,241,188 processor cycles
3,912 bytes consed