
Pophery

Version 0.1 or something
Chris Pressey, Cat's Eye Technologies

Introduction

Pophery is an imperative string-rewriting language. I know, right?

In Pophery, each program state is a single string, and a program is simply the initial program state. As execution proceeds, the string is rewritten based on instructions found within the string. Pophery is a "visible" programming language, in the sense that there is no program state that is not part of the string.

Pophery provides primitive instructions which allow the programmer to construct their own control flow mechanisms, including at least conventional backwards-branch looping, but possibly also permitting alternative techniques such as SMITH-style program-extension and Muriel-style quine-continuation.

As a reaction against the proliferation of stack-based esolangs, Pophery's design explicitly avoids having a stack, preferring instead register-like storage in the form of delimited substrings, called slots, which may be accessed directly and indirectly, updated, created, destroyed, and moved about.

Additionally, Pophery has, incidentally during the course of its design, become centrally oriented around the editing metaphors provided both by classic word processors and modern graphical user interfaces — the so-called "copy and paste" operations.

Program Structure

All program state (instructions and variables) is encoded in a single string, which is a finite but unbounded sequence of non-combining Unicode code points. The string may contain any number of locators, which take the form (α) where α is any string which does not contain ( or ) symbols. (In the sequel, Greek letters will denote variables for similar such strings.) Only the rightmost occurrence of the sequence (α) is regarded as the locator for the purposes of operations on that locator; any other occurrences are ignored.

A pair of locators of the form (^α) and (α$), where (^α) occurs to the left of (α$), is called a slot. In the sequence (^α)β(α$), α is called the name of the slot, and β is called the contents of the slot.
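The locator and slot rules above can be sketched in Python. This is a minimal illustration under stated assumptions, not the reference implementation; the helper names find_locator and get_slot are hypothetical. It applies the rightmost-occurrence rule to each locator separately and only treats the pair as a slot when (^α) lies to the left of (α$).

```python
def find_locator(s: str, name: str) -> int:
    """Index of the rightmost occurrence of the locator (name) in s, or -1 if absent."""
    return s.rfind("(" + name + ")")


def get_slot(s: str, name: str):
    """Contents of the slot (^name)...(name$), or None if there is no such slot.

    Only the rightmost occurrence of each locator counts, and the two form a
    slot only if (^name) lies to the left of (name$). Any other locators
    between them are part of the contents.
    """
    left = find_locator(s, "^" + name)
    right = find_locator(s, name + "$")
    if left == -1 or right == -1 or left > right:
        return None
    return s[left + len(name) + 3:right]  # skip past "(^" + name + ")"


print(get_slot("(^0)data(0$)", "0"))  # data
```

Because of the rightmost-occurrence rule, in (^0)a(0$)(^0)b(0$) the contents of slot 0 are b, not a.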

A slot may contain any number of other locators. In fact, slots can overlap, in the sense that a slot may contain one locator of another slot, but not the other.

A slot can also be referenced indirectly, in which case the contents of the slot gives the name of another slot which is the actual subject of the operation. For example, (^α)β(α$) might refer to a slot (^β)(β$) elsewhere in the same string. We use the terminology slot β to refer to direct access to the slot named β, and slot indirect by β to refer to access to a slot named by the contents of the slot named β.
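Indirect access is just two direct lookups in a row. A minimal Python sketch, assuming the contents of slot β are used verbatim as the target name; get_slot and get_slot_indirect are hypothetical names:

```python
def get_slot(s: str, name: str):
    """Contents of slot `name` (rightmost locators, (^name) left of (name$)), or None."""
    left = s.rfind("(^" + name + ")")
    right = s.rfind("(" + name + "$)")
    if left == -1 or right == -1 or left > right:
        return None
    return s[left + len(name) + 3:right]


def get_slot_indirect(s: str, name: str):
    """Contents of the slot named by the contents of slot `name` ("slot indirect by name")."""
    target = get_slot(s, name)
    return None if target is None else get_slot(s, target)


s = "(^a)b(a$)(^b)hello(b$)"
print(get_slot(s, "a"))           # b
print(get_slot_indirect(s, "a"))  # hello
```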

While the programmer may define, create, and destroy slots as they like, some slots have meaning to Pophery's execution model. Each of these built-in slots has a default name by which it is accessed. However, if a name slot for the built-in slot is present in the program, access is indirect by the name slot. The name slot of a built-in slot named β is named `β, that is, β prefixed with a backtick. (A clarifying example will appear shortly.)

A single locator can also sometimes be referenced indirectly; in this case, a slot contains the substring β identifying the locator (β). Locators also support an operation called sliding; they may slide leftward or slide rightward. When sliding rightward (resp. leftward), the character immediately to the right (resp. left) of the locator is transferred to immediately left (resp. right) of that locator. There are two exceptions: other locators are disregarded when sliding (they are slid over, and not counted as characters); and when there are no characters to the right of the locator when sliding rightward (resp. to the left of it when sliding leftward), neither the locator nor any character moves.

Examples follow. In the program J(X)A, if (X) were to slide leftward the result would be (X)JA and if it were to slide rightward the result would be JA(X).

In J(X)(C)A(D), if (X) were to slide rightward we would have J(C)A(D)(X).

In JA(X), if (X) were to slide rightward, we would still have JA(X).

Finally, in JA(X)(Y), if (X) were to slide rightward, we would still have JA(X)(Y).
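The sliding rule and its two exceptions can be sketched in Python. This is an interpretation rather than the reference behaviour. The text does not say exactly where a locator stops when further locators sit just beyond the character it passes, so the sketch follows the J(X)(C)A(D) example and lets the moving locator pass those locators too. The names TOKEN, is_locator and slide are hypothetical.

```python
import re

# A token is either a locator "(...)" with no parentheses inside, or one character.
TOKEN = re.compile(r"\([^()]*\)|.", re.DOTALL)


def is_locator(tok: str) -> bool:
    return len(tok) > 1  # characters have length 1; the shortest locator is "()"


def slide(s: str, locator: str, direction: int) -> str:
    """Slide the rightmost occurrence of `locator` rightward (+1) or leftward (-1).

    Other locators are slid over and not counted as characters. As in the
    J(X)(C)A(D) example, the moving locator also passes any locators that sit
    just beyond the character it passes (an interpretation).
    """
    toks = TOKEN.findall(s)
    hits = [n for n, t in enumerate(toks) if t == locator]
    if not hits:
        return s
    i = hits[-1]
    if direction > 0:
        j = i + 1
        while j < len(toks) and is_locator(toks[j]):
            j += 1
        if j == len(toks):  # no character to the right: nothing moves
            return s
        k = j + 1
        while k < len(toks) and is_locator(toks[k]):
            k += 1
        moved = toks[:i] + toks[i + 1:k] + [locator] + toks[k:]
    else:
        j = i - 1
        while j >= 0 and is_locator(toks[j]):
            j -= 1
        if j < 0:  # no character to the left: nothing moves
            return s
        k = j - 1
        while k >= 0 and is_locator(toks[k]):
            k -= 1
        moved = toks[:k + 1] + [locator] + toks[k + 1:i] + toks[i + 1:]
    return "".join(moved)


print(slide("J(X)A", "(X)", -1))        # (X)JA
print(slide("J(X)(C)A(D)", "(X)", +1))  # J(C)A(D)(X)
```

Under this sketch, a whole slot slides by calling slide on each of its two locators.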

An entire slot slides leftward (resp. rightward) when both of its locators slide leftward (resp. rightward).

Built-in Slots

The most central built-in slot is the instruction slot, from which is fetched the instruction to be executed on any particular rewrite step. The default name of the instruction slot is !. Therefore, in the program (^!)M(!$), the next instruction to be executed will be M. Further, in the program (^`!)k(`!$)(^k)b(k$), the instruction slot, accessed indirectly by `!, is named k, and the next instruction to be executed is b.
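Looking up a built-in slot through its optional name slot might be sketched like this in Python. The helper names builtin_slot_name and current_instruction are hypothetical, and the sketch checks both example programs just given.

```python
def get_slot(s: str, name: str):
    """Contents of slot `name` (rightmost locators, (^name) left of (name$)), or None."""
    left = s.rfind("(^" + name + ")")
    right = s.rfind("(" + name + "$)")
    if left == -1 or right == -1 or left > right:
        return None
    return s[left + len(name) + 3:right]


def builtin_slot_name(s: str, default: str) -> str:
    """If the name slot `default exists, its contents name the built-in slot;
    otherwise the built-in slot is accessed by its default name."""
    alias = get_slot(s, "`" + default)
    return default if alias is None else alias


def current_instruction(s: str):
    """Contents of the instruction slot (default name !), or None if there is none."""
    return get_slot(s, builtin_slot_name(s, "!"))


print(current_instruction("(^!)M(!$)"))              # M
print(current_instruction("(^`!)k(`!$)(^k)b(k$)"))   # b
```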

Other built-in slots are:

  • The accumulator, by default named ?;
  • The clipboard, by default named %; and
  • The selection, by default named /.

Regarding this last built-in slot: when we say a substring is selected, we mean that the selection locators are inserted on either side of it ((^/) on the left and (/$) on the right), and that all other occurrences of these locators elsewhere in the string are removed.
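Selecting can be sketched in Python as splitting the string at the chosen boundaries, stripping selection locators from every piece, and re-inserting one fresh pair. This assumes the selection uses its default name / (no `/ name slot), and that the chosen positions do not fall inside an existing locator; select is a hypothetical name.

```python
SEL_LEFT, SEL_RIGHT = "(^/)", "(/$)"  # default selection name "/" assumed


def _strip_selection(text: str) -> str:
    return text.replace(SEL_LEFT, "").replace(SEL_RIGHT, "")


def select(s: str, start: int, end: int) -> str:
    """Select s[start:end]: put (^/) before it and (/$) after it, and remove
    every other occurrence of those two locators from the string."""
    return (_strip_selection(s[:start]) + SEL_LEFT
            + _strip_selection(s[start:end]) + SEL_RIGHT
            + _strip_selection(s[end:]))


print(select("(^/)x(/$)hello", 9, 14))  # x(^/)hello(/$)
```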

Execution Model

At each rewriting step, the contents of the instruction slot, called the current instruction, are examined. The string is rewritten according to the current instruction. The instruction slot then slides rightward in the string.

Execution halts when there is no instruction slot in the program, or when the contents of the instruction slot have zero length.

When examining the current instruction to determine which command is executed and how the string will be rewritten, we interpret it as follows. We ignore any locators in the current instruction, and we assume it to be one character long; if it is longer, we only regard its leftmost character.
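Decoding the current instruction, together with the halting rule, might look like this in Python. The name command_of is hypothetical. The text does not cover contents that consist only of locators; this sketch treats that case like zero-length contents and halts, which is an assumption.

```python
import re

LOCATOR = re.compile(r"\([^()]*\)")  # (α) where α contains no parentheses


def command_of(contents):
    """Map instruction-slot contents to a command character, or None to halt."""
    if not contents:  # no instruction slot (None) or zero-length contents
        return None
    bare = LOCATOR.sub("", contents)  # locators inside the instruction are ignored
    # Contents made only of locators are unspecified; this sketch halts (assumption).
    return bare[0] if bare else None  # only the leftmost character counts


print(command_of("(^/)SV"))  # S
```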

Commands

  • 0 through 9 update the accumulator to the literal strings 0 through 9, respectively.
  • X ("cut") erases (updates with the zero-length string) the selection.
  • C ("copy") updates the contents of the clipboard with the contents of the selection.
  • V ("paste") updates the contents of the selection with the contents of the clipboard.
  • S ("select") selects the contents of the slot indirect by the accumulator.
  • A ("select all") selects the contents of the accumulator.
  • L ("left") slides the left locator of the selection leftward.
  • R ("right") slides the left locator of the selection rightward.
  • E ("end") moves the left locator of the selection to immediately left of the right locator of the selection, so that the selection contains the zero-length string.
  • F ("find") searches everywhere in the contents of the accumulator for the contents of the clipboard. If found, that substring is selected.
  • D ("drag-and-drop") updates the contents of the accumulator with the contents of the selection, then selects the contents of the accumulator.
  • I ("input") waits for a line to appear on standard input, then places it (sans newline) in the accumulator.
  • O ("output") outputs the string in the accumulator to standard output, followed by a newline.
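The commands can be tied together in a small, hypothetical Python interpreter. It covers only 0 through 9, X, C, V, S, A, E, D and O. L, R, F and I are left out, and any other character is treated as a no-op. Several details are assumptions rather than anything stated above:

  • Slot contents are read with embedded locators removed, so that S still works after D has placed selection locators inside the accumulator.
  • O appends to a Python list instead of writing to standard output.
  • The instruction slot advances using a reading of the sliding rule that is consistent with the examples under Program Structure.
  • Every built-in slot that a program touches must already exist in it.

```python
import re

# Hypothetical sketch of a partial Pophery interpreter (not the reference one).
TOKEN = re.compile(r"\([^()]*\)|.", re.DOTALL)  # one locator, or one character
LOCATOR = re.compile(r"\([^()]*\)")


def span(s, name):
    """(start, end) of the contents of slot `name`, or None if there is no slot."""
    left = s.rfind("(^" + name + ")")
    right = s.rfind("(" + name + "$)")
    if left == -1 or right == -1 or left > right:
        return None
    return left + len(name) + 3, right


def get(s, name):
    """Slot contents with embedded locators removed (an assumption), or None."""
    sp = span(s, name)
    return None if sp is None else LOCATOR.sub("", s[sp[0]:sp[1]])


def put(s, name, text):
    """Replace the contents of the existing slot `name` with `text`."""
    a, b = span(s, name)
    return s[:a] + text + s[b:]


def builtin(s, default):
    """Actual name of a built-in slot, via the name slot `default if present."""
    alias = get(s, "`" + default)
    return default if alias is None else alias


def select(s, a, b):
    """Wrap s[a:b] in selection locators, removing all other selection locators."""
    sel = builtin(s, "/")
    lo, ro = "(^" + sel + ")", "(" + sel + "$)"
    def strip(t):
        return t.replace(lo, "").replace(ro, "")
    return strip(s[:a]) + lo + strip(s[a:b]) + ro + strip(s[b:])


def slide_right(s, loc):
    """Slide the rightmost `loc` over locators and one character (if any)."""
    toks = TOKEN.findall(s)
    i = max(n for n, t in enumerate(toks) if t == loc)
    j = i + 1
    while j < len(toks) and len(toks[j]) > 1:  # tokens longer than 1 are locators
        j += 1
    if j == len(toks):
        return s  # no character to the right: nothing moves
    k = j + 1
    while k < len(toks) and len(toks[k]) > 1:
        k += 1
    return "".join(toks[:i] + toks[i + 1:k] + [loc] + toks[k:])


def run(s, out):
    """Run program `s` until it halts; O appends to the list `out`."""
    while True:
        ins = builtin(s, "!")
        contents = get(s, ins)
        if not contents:  # no instruction slot, or empty once locators are ignored
            return s
        c = contents[0]
        acc, clip, sel = builtin(s, "?"), builtin(s, "%"), builtin(s, "/")
        if c in "0123456789":
            s = put(s, acc, c)
        elif c == "X":
            s = put(s, sel, "")
        elif c == "C":
            s = put(s, clip, get(s, sel))
        elif c == "V":
            s = put(s, sel, get(s, clip))
        elif c == "S":  # assumes the slot named by the accumulator exists
            s = select(s, *span(s, get(s, acc)))
        elif c == "A":
            s = select(s, *span(s, acc))
        elif c == "E":
            a, b = span(s, sel)
            lo = "(^" + sel + ")"
            s = s[:a - len(lo)] + s[a:b] + lo + s[b:]
        elif c == "D":
            s = put(s, acc, get(s, sel))
            s = select(s, *span(s, acc))
        elif c == "O":
            out.append(get(s, acc))
        # Any other character (including L, R, F, I here) is a no-op in this sketch.
        s = slide_right(s, "(" + ins + "$)")  # right locator first,
        s = slide_right(s, "(^" + ins + ")")  # then the left one
```

Calling run on the "Hello, world!" program from the Idioms section leaves ["Hello, world!"] in the output list. The idioms there can be checked the same way. The sketch requires the accumulator, clipboard and data slots to sit to the left of the instruction slot, so that the instruction slot does not slide into them.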

Note that the concepts "standard input" and "standard output" are defined solely by the operating system.

Idioms

We pause to consider some useful idioms constructed from the commands presented thus far.

Assume the initial program defines some slots such as (^0)data(0$) to contain initial data. That data can then be loaded into the accumulator with the sequence 0SCAV, and new data, say the literal string 1, can be stored into slot 0 with 1AC0SV.

To copy from any arbitrary slot (say 0) to another (say 1), we can say 0SC1SV.

Accessing a slot with a longer name, such as (^123)xyz(123$), can be done with the help of a free slot like 0 and a program fragment such as 1AC0SV2AC0SEV3AC0SEV0SCAVSD.

To write data, say (^8)foo(8$), into a slot whose name is stored in another slot, such as (^9)jim(9$), we can say: 8SC9SDSV.

Finally, a complete, if simple, program: the ubiquitous "Hello, world!" can be written like so: (^?)Hello, world!(?$)(^!)O(!$).

Constructing Control Flow Mechanisms

To perform a conditional branch in the program, one would ensure there are slots at the start of each alternative block of code to execute: call them α and β. One would then update slot `! to contain either α or β, making that slot the new instruction slot.

Unfortunately, you can't do that in Pophery as it stands, basically because there aren't enough built-in slots to say "put the value from slot blah into the slot named by the accumulator." TODO: add another built-in slot, and an instruction to either swap its contents with, or copy its contents to or from, an existing slot.

The slots α (and of course β as well) could be anywhere in the program, so a backward branch, and thus a loop, may be effected. The only issue is that the α slot must be re-inserted each time, as, when it is used as the instruction slot, it will begin to move rightward through the program. Also, as it needs to have a different name from the instruction slot currently in use, switching back and forth between two instruction slot names would be a necessity of such a loop.

Pophery Carrier Format: "Tranzy"

Pophery also defines a file format for Pophery programs and their metadata; this file format is called Tranzy. A Tranzy file is a text file consisting of a number of lines. The encoding of characters is not specified. Each line may begin with a # character, or not. Lines which do begin with # are "comment" lines in which metadata may be embedded; these are lines which are being carried in the Tranzy file, but which do not form any part of the Pophery program. The non-comment lines are concatenated (sans newlines, but including other whitespace) to form the Pophery program.

Tranzy does not currently define any metadata which can reside in comment lines, but acknowledges and permits metadata defined by external standards (de facto or otherwise). An example Tranzy file is depicted below.

#!/usr/bin/my-pophery-interpreter -w
# encoding: UTF-8
0@SLX1@SL@SXS
(^0)(0$)(^1)!(1$)
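A Tranzy reader only needs to drop the comment lines and join the rest. Here is a Python sketch of that. It assumes lines end in "\n" (a trailing "\r" is also dropped as part of the line ending). Decoding bytes to text is left to the caller, since Tranzy does not specify an encoding. tranzy_to_program is a hypothetical name.

```python
def tranzy_to_program(text: str) -> str:
    """Concatenate the non-comment lines of a Tranzy file into a Pophery program.

    Lines beginning with '#' carry metadata and are skipped. Other lines are
    joined without their newlines, but all other whitespace is kept.
    """
    kept = []
    for line in text.split("\n"):
        if line.endswith("\r"):  # tolerate CRLF line endings (an assumption)
            line = line[:-1]
        if not line.startswith("#"):
            kept.append(line)
    return "".join(kept)


example = ("#!/usr/bin/my-pophery-interpreter -w\n"
           "# encoding: UTF-8\n"
           "0@SLX1@SL@SXS\n"
           "(^0)(0$)(^1)!(1$)\n")
print(tranzy_to_program(example))  # 0@SLX1@SL@SXS(^0)(0$)(^1)!(1$)
```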

Notes

Pophery came more or less into its present form on or about September 6th, 2010. It has lain about since then, collecting dust.

The name Pophery is a mutant hybrid of the ancient Greek Porphyry, meaning "purple", and Poreef, itself a mutant hybrid of pork and beef featured in a certain skit by The Kids in the Hall. (Arguably, my Unlikely language would have been a better candidate for the moniker "Poreef", but unfortunately, I missed that opportunity, and this is the way things turned out.)

While working up to the current design, a design plateau was reached; it had a more machine-language-like feel to it, with slots called the accumulator, the index slot, and the ancillary slot. (For posterity, I've called the language that follows this design Pophery version -1.0, and have retained its implementation in this distribution as minus-one.py, but it is not my focus of interest, and I will discuss it no further here.) The ancillary slot happened to have operations devised for it which resembled cut, copy, and paste, and was renamed the clipboard for this reason; then everything you see now kind of followed from that.

The "string rewriting" part of the description kind of has a double meaning now. Not only is the imperative execution described in terms of string rewriting (to the point where you could probably implement it straightforwardly, or nearly so, in a language like Thue), but the imperative instructions also perform operations which are recognizably text editing, which is just a politically correct way of saying "string rewriting", n'est-ce pas?

TODO: Find a way to manipulate data satisfactorily.

One of the original goals of the language design was to support the construction of multiple control flow mechanisms from simple primitives. Due to time and concentration constraints, only the conventional looping-by-backwards-branching mechanism of control flow was explored. However, we will speculate on two other mechanisms, which may be implementable in Pophery or a modest extension thereof, in the next two paragraphs.

To effect SMITH-style self-extension, one would need only make sure there is a slot named β located to the right of the currently-executing code, and then write instructions into it. The instruction slot will eventually slide rightward into it. For authentic SMITH-like behavior, the instructions would be written into β by having a slot ζ encompass the currently executing block of code, and copying the contents of ζ wholesale into β.

To effect Muriel-style quine-continuation, one would need to establish a buffer slot named β, write instructions into it piecemeal until it looks like the desired next leg of the program, and then replace the entire running program with it. This can be done by having a slot named ζ encompass the entire program, and copying β into it. The only subtlety is the instruction slot; once the contents of β have become the entire program, you will want the intended instruction slot in β to become the active instruction slot, which means switching the instruction slot at the same time β is copied into ζ. This would probably necessitate an extension to Pophery.

Happy re: righting!
Chris Pressey
September 6, 2010
Evanston, IL