The ``von Neumann architecture'' should be called the ``Alan Turing architecture''; in any case, the bottleneck of this design is the need to repeatedly transfer information between the memory and the processor, much of it not at all central to the calculations being performed. Review John Backus's ``Can Programming Be Liberated from the von Neumann Style?'', which originated the term ``von Neumann bottleneck''.
I've seen the solution to this bottleneck described as making processors more parallel, placing more operations within the memory, and so on, but never phrased in the simple way I know it: it's a lack of structure.
A machine that more intimately knows what it manipulates may process it with fewer instructions; this is obvious. Were there an ``each'' primitive in hardware, with knowledge of an array or of linked structures, it could trivially process every element in a predefined fashion without further instructions for doing so. The most structure a typical machine has is an implicit array.
This obviously extends to any knowledge of greater structure given to the machine; I wonder why I've never seen the solution phrased in this way. It's obvious, yet also a great leap, and those popular solutions merely add something extremely specific, and so limited, to the machine instead.