
Syntactic Noise

  • Compare amount of punctuation used in natural language versus computational languages.

  • Readability issues.

  • Indentation-based versus curly brace-delimited.

  • Overuse of parentheses.

    • Lisps.

    • Nested function calls.

  • Case Study: Comments

    • Why #, //, or ; on every line?

    • Why special syntax for documentation comments? Follow the principle of making good things easy instead.

Existence-Oriented Programming

Let us start this conversation with some questions about your favorite or most familiar computational language.

  • Can a function accept optional arguments? If so, does a parameter, corresponding to an optional argument, need to have a default value assigned? If a default is required, can you unambiguously use this default as a sentinel to indicate that the argument was not supplied by the caller? If a default is not required, then what is the mechanism that you can use to detect whether the optional argument was or was not supplied by the caller?

  • Are generators available? If so, how does a generator indicate that it has no more values for its consumer?

  • More generally, can a function have optional return values? If so, then what is the mechanism that you can use to detect whether an optional return value was supplied by the callee?

  • What happens when you try to access an item that does not exist in a collection? (Possibly because the collection is empty.)

With those questions in mind, let us walk through some code samples from various languages to look at how they handle select use cases.

UPDATE (2023-04-29): Please see my conversation with ChatGPT for an alternative exposition on the subject. (The conversation got off to a rocky start since it hallucinated about this blog post in spite of not being able to access it. If you scroll down about a third of the way, I adopt a maieutic approach and things start to go well at that point. And, yes, I converse with it like it is a person, even though I know it is not.) A decent stack memory diagram emerged from that conversation and is being included in this post.

Illustrative Cases

Code snippets, each containing what is believed to be a safe and idiomatic way to perform a particular task in a given language, are used to illustrate each case. Links to the full code, which is runnable, can be found with each example. Furthermore, each piece of code attempts to use only the intrinsic types and standard library of its language. Also note that this is not an attempt to exhaustively catalog how to perform these tasks in every language, but rather to highlight various approaches and behaviors with popular or interesting representatives.

Retrieve Element by Positional Index

Suppose that we wish to do something with the last element from a collection, the elements of which can be accessed by positional index. Attempting to access the last element of an empty collection can lead to a panic, a raised exception, or undefined behavior, depending on the language and the implementation of the container. Thus, a programmer may need to defend against the empty collection case by explicitly testing for it.

C++ (Undefined Behavior)

Undefined behavior, quite possibly a segmentation violation, on attempt to use an out-of-bounds index, such as any index would be in the case of an empty vector.

cl-existence-oriented/lastpos.cxx (Source)

    if ( !items.empty( ) ) {
        std::cerr << "last item: " << items.back( ) << std::endl;
    }
  • No compilation checks are made about unguarded access to a vector. Burden is on programmer to remember to test that an index is within bounds before accessing an item from a vector.

  • Two separate operations which must be explicitly programmed: emptiness test and access.

  • Inconsistent item access interface compared with std::map type. Burden is on programmer to remember differences.

  • Returns a value which does not need to be unwrapped before use.

Go (Panic)

Panics on attempt to use an out-of-bounds index, such as any index would be in the case of an empty slice.

cl-existence-oriented/lastpos.go (Source)

        count := len(items)
        if count > 0 {
                lastItem := items[count-1]
                fmt.Fprintf(os.Stderr, "last item: %d\n", lastItem)
        }
  • No compilation checks are made about unguarded access to a slice. Burden is on programmer to remember to test that an index is within bounds before accessing an item from a slice.

  • Two separate operations which must be explicitly programmed: emptiness test and access.

  • Inconsistent interface compared with map type. Burden is on programmer to remember differences.

  • Returns a value which does not need to be unwrapped before use.

Python (Raised Exception)

Raises IndexError exception on attempt to use an out-of-bounds index, such as any index would be in the case of an empty sequence.

cl-existence-oriented/lastpos.py (Source)

    if items: print( "last item: {}".format( items[ -1 ] ), file = stderr )
  • Burden is on programmer to test that an index is within bounds before accessing an item from a sequence.

  • Two separate operations which must be explicitly programmed: emptiness test and access.

  • Returns a value which does not need to be unwrapped before use.

  • Reasonably consistent item access interface compared with collections.abc.Mapping types.
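
A slice can fold the emptiness test and the access into one expression, since slicing past the bounds of a sequence yields an empty result instead of raising IndexError. The following is a minimal sketch and not part of the linked source; the helper name last_items is made up for illustration, and it assumes a sequence type that supports slicing, such as list, tuple, or str.

```python
from sys import stderr

def last_items( items ):
    ''' Returns a slice holding the last item, or an empty slice.

        Hypothetical helper, for illustration only. '''
    # Slicing never raises IndexError: an empty sequence gives an empty slice.
    return items[ -1: ]

# The loop body runs once if a last item exists and not at all otherwise.
for last_item in last_items( [ 2, 3, 5 ] ):
    print( "last item: {}".format( last_item ), file = stderr )
```

The cost is that the value comes back inside a sequence of at most one element, so this trades the explicit emptiness test for a form of unwrapping.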

Rust (Wrapped Value)

Returns an Option, which must be unwrapped before the accessed value, if it exists, can be used.

cl-existence-oriented/lastpos.rs (Source)

    if let Some(last_item) = items.last() {
        writeln!(stderr(), "last item: {}", last_item).ok();
    }
  • Burden is on programmer to test whether wrapper contains a value.

  • Burden is on programmer to unwrap value.

  • Accidental use of raw Option is possible in some cases.

  • Single operation: both existence information and value returned together.

  • Consistent return value interface compared with rest of standard library.

  • Compile-time safety guarantee against panics and undefined behavior.

Retrieve Entry by Nominative Index

Suppose that we want to retrieve a particular entry from an association table (dictionary, map, etc.) but that we have no guarantee of its presence in that table. Attempting to access the entry without first testing for its existence can lead to various problems, depending on the language and the table implementation.

C++ (Wrapped Entry)

Returns an iterator, which must be unwrapped before the accessed value, if it exists, can be used.

cl-existence-oriented/nomassoc.cxx (Source)

    const auto wrapped_needle = haystack.find( "needle" );
    if ( wrapped_needle != haystack.end( ) ) {
        std::cerr << "needle: " << wrapped_needle->second << std::endl;
    }
  • Burden is on programmer to test whether wrapper contains a value.

  • Burden is on programmer to unwrap value.

  • Accidental use of raw iterator is possible in some cases.

  • One of three different ways to access a value from a map.

  • Single operation: both existence information and value returned together.

C++ (Zero-Initialized Value)

Creates and returns new entry with zero-initialized value if entry is absent.

cl-existence-oriented/nomassoc.cxx (Source)

    if ( haystack.contains( "needle" ) ) { // Requires C++20 std::map.
        std::map<std::string, int> mutable_haystack( haystack );
        std::cerr << "needle: " << mutable_haystack[ "needle" ] << std::endl;
    }
  • If the zero-initialized value can be valid data, then the burden is on programmer to test for presence to disambiguate a valid zero-initialized value from an absent entry.

  • Inconsistent item access interface compared with std::vector type. Burden is on programmer to remember differences.

  • Cannot work with const maps as it must be able to create missing entry (internal mutation of data structure).

  • Two separate operations: existence test and access.

  • Key must be referenced twice: once for the existence test and once for the access. This poses a software maintenance issue since a change of key literal or key variable name would need to happen in two different places.

  • One of three different ways to access a value from a map.

  • Returns a value which does not need to be unwrapped before use.

C++ (Raised Exception)

Raises std::out_of_range exception on attempt to access an absent entry, such as in the empty collection case.

cl-existence-oriented/nomassoc.cxx (Source)

    if ( haystack.contains( "needle" ) ) { // Requires C++20 std::map.
        std::cerr << "needle: " << haystack.at( "needle" ) << std::endl;
    }
  • Burden is on programmer to test whether the entry is present prior to access.

  • Two separate operations: existence test and access.

  • Key must be referenced twice: once for the existence test and once for the access. This poses a software maintenance issue since a change of key literal or key variable name would need to happen in two different places.

  • One of three different ways to access a value from a map.

  • Returns a value which does not need to be unwrapped before use.

Go (Zero Value)

Returns the zero value for the value type if the entry is absent.

cl-existence-oriented/nomassoc.go (Source)

        needle, ok := haystack["needle"]
        if ok {
                fmt.Fprintf(os.Stderr, "needle: %d\n", needle)
        }
  • If the zero value can be valid data, then the burden is on programmer to test the existence boolean to disambiguate a valid zero value from an absent entry.

  • Inconsistent item access interface compared with slice type. Burden is on programmer to remember differences.

  • Returns a value which does not need to be unwrapped before use.

  • Single operation: both existence information and value returned together.

Python (Raised Exception)

Raises KeyError exception on attempt to access an absent entry, such as in the empty collection case.

cl-existence-oriented/nomassoc.py (Source)

    if 'needle' in haystack:
        print( "needle: {}".format( haystack[ 'needle' ] ), file = stderr )
  • Burden is on programmer to test whether the entry is present prior to access.

  • Two separate operations: existence test and access.

  • Key must be referenced twice: once for the existence test and once for the access. This poses a software maintenance issue since a change of key literal or key variable name would need to happen in two different places.

  • Returns a value which does not need to be unwrapped before use.

  • Consistent interface compared with collections.abc.Sequence types.
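
Another common Python idiom is to attempt the access and handle the KeyError, which names the key only once. The following is a sketch under that idiom and is not taken from the linked source; the function name report_needle is hypothetical.

```python
from sys import stderr

def report_needle( haystack ):
    ''' Reports the 'needle' entry, if present. Returns whether it was present.

        Hypothetical helper, for illustration only. '''
    # Only the access sits inside the try block, so that a KeyError raised
    # by unrelated code is not silently swallowed.
    try: needle = haystack[ 'needle' ]
    except KeyError: return False
    print( "needle: {}".format( needle ), file = stderr )
    return True
```

Here the existence information travels through control flow instead of through a value, so an entry whose value is None is still reported as present. The access remains a runtime matter, though: nothing at compile time forces the programmer to write the except clause.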

Python (Sentinel Value)

Returns None if the entry is absent.

cl-existence-oriented/nomassoc.py (Source)

    needle = haystack.get( 'needle' )
    if needle is not None:
        print( f"needle: {needle}", file = stderr )
  • If None can be valid data, then there is an ambiguity problem which cannot be resolved with this approach to access.

  • Even if None is an unambiguous sentinel, a test is still needed against it before the entry value can be used. Burden is on programmer to perform this test.

  • Returns a value which does not need to be unwrapped before use.

  • Single operation: existence information is encoded as the returned value.

Rust (Wrapped Value)

Returns an Option, which must be unwrapped before the accessed value, if it exists, can be used.

cl-existence-oriented/nomassoc.rs (Source)

    if let Some(needle) = haystack.get("needle") {
        writeln!(stderr(), "needle: {}", needle).ok();
    }
  • Burden is on programmer to test whether wrapper contains a value.

  • Burden is on programmer to unwrap value.

  • Accidental use of raw Option possible in some cases.

  • Single operation: both existence information and value returned together.

  • Consistent return value interface compared with rest of standard library.

  • Compile-time safety guarantee against panics or undefined behavior.

Retrieve Once from Iterator

Suppose that we want to get an element from a set without seeking any specific element. Set implementations are usually not indexable by position, as they are not ordered by position, so notions such as "first" or "last" are not that meaningful. And, if we do not know or care about a particular element in the set, then we are not going to retrieve by value either. However, most set implementations provide iterators over themselves and we can take advantage of this... provided we can handle the empty set case properly.

C++ (Wrapped Value)

Returns an iterator, which must be unwrapped before the accessed value, if it exists, can be used.

cl-existence-oriented/next-set-item.cxx (Source)

    const auto wrapped_element = elements.cbegin( );
    if ( wrapped_element != elements.cend( ) ) {
        std::cerr << "set element: " << *wrapped_element << std::endl;
    }
  • Burden is on programmer to test whether wrapper contains a value.

  • Burden is on programmer to unwrap value.

  • Accidental use of raw iterator is possible in some cases.

  • Single operation: both existence information and value returned together.

Python (Raised Exception)

Raises StopIteration exception on attempt to get next value from an exhausted iterator.

cl-existence-oriented/next-set-item.py (Source)

    if elements:
        element = next( iter( elements ) )
        print( f"set element: {element}", file = stderr )
  • Burden is on programmer to test whether the underlying collection is empty prior to iteration over it.

  • Two separate operations: existence test and access.

  • Returns a value which does not need to be unwrapped before use.
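
A for loop that exits after its first pass is one way to avoid the separate emptiness test, because the loop machinery consumes StopIteration on our behalf. This is a sketch rather than the linked source, and report_any_element is a hypothetical name.

```python
from sys import stderr

def report_any_element( elements ):
    ''' Reports one arbitrary element, if any exists. Returns whether one did.

        Hypothetical helper, for illustration only. '''
    # The body runs at most once; an empty collection skips it entirely,
    # so StopIteration never reaches our code.
    for element in elements:
        print( f"set element: {element}", file = stderr )
        return True
    return False
```

The intent is less obvious than with an explicit test, since a reader must notice that the loop never reaches a second iteration.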

Rust (Wrapped Value)

Returns an Option, which must be unwrapped before the accessed value, if it exists, can be used.

cl-existence-oriented/next-set-item.rs (Source)

    let mut iterator = elements.iter();
    if let Some(element) = iterator.next() {
        writeln!(stderr(), "set element: {}", element).ok();
    }
  • Burden is on programmer to test whether wrapper contains a value.

  • Burden is on programmer to unwrap value.

  • Accidental use of raw Option possible in some cases.

  • Single operation: both existence information and value returned together.

  • Consistent return value interface compared with rest of standard library.

  • Compile-time safety guarantee against panics or undefined behavior.

Conditional Concatenation

Suppose that we want to write a function that will concatenate a base string with some optionally-supplied supplemental strings in a particular way. Most languages do not support optional arguments without the use of default values, wrapped values, or a mechanism that avoids parameter declarations.

Kotlin (Nullable Value)

The optional parameters have nullable types and are assigned null by default.

cl-existence-oriented/optional-arguments.kt (Source)

fun format_title(
    base: String, variant: String? = null, version: String? = null
): String {
    val output = mutableListOf(base)
    if (variant != null) {
        variant.length  // Compilation error if not wrapped in null check.
        output.add("[$variant]")
    }
    if (version != null) {
        output.add("($version)")
    }
    return output.joinToString(separator = " ")
}
  • Default value of null must be assigned to each parameter to make it optional.

  • If null can be valid data, then there is an ambiguity problem which cannot be resolved with this approach to optional arguments.

  • Compile-time safety guarantee only covers unguarded member access to a possibly null variable and not use of that variable itself.

  • Unused arguments do not need to be specified at invocation site.

  • Argument values do not need to be unwrapped prior to use.

  • Compile-time safety guarantee against unguarded member access to possibly null variable.

Python (Sentinel Value)

The default values of the optional parameters are sentinel values.

cl-existence-oriented/optional-arguments.py (Source)

def format_title_sentinels( base, variant = None, version = None ):
    output = [ base ]
    if variant is not None:
        output.append( f"[{variant}]" )
    if version is not None:
        output.append( f"({version})" )
    return ' '.join( output )
  • If sentinel value (None in above case) can be valid data, then there is an ambiguity problem which cannot be resolved with this approach to optional arguments.

  • Even if the sentinel is unambiguous, a test is still needed against it before the argument can be correctly used. Burden is on programmer to perform this test.

  • Unused arguments do not need to be specified at invocation site.

  • Argument values do not need to be unwrapped prior to use.
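
When None itself can be valid data, a frequent mitigation is a private, unique object as the sentinel. The sketch below assumes that convention; the name _ABSENT is invented here and does not appear in the linked source. It narrows the ambiguity but does not remove it in principle, since nothing stops a caller from passing the sentinel object explicitly.

```python
# Assumption: a module-private object that no caller is expected to pass.
_ABSENT = object( )

def format_title_private_sentinel( base, variant = _ABSENT, version = _ABSENT ):
    ''' Joins a base title with optional variant and version decorations. '''
    output = [ base ]
    # Identity comparison: only the sentinel object itself means "not supplied".
    if variant is not _ABSENT:
        output.append( f"[{variant}]" )
    if version is not _ABSENT:
        output.append( f"({version})" )
    return ' '.join( output )
```

With this version, an explicit None is treated as supplied data and is formatted like any other value. The test against the sentinel is still the programmer's burden.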

Python (Raised Exception)

No explicit declaration of optional parameters. Arguments are passed via dictionary of extra arguments that do not bind to any declared parameters. Attempt to access an unbound argument in dictionary will result in a KeyError exception.

cl-existence-oriented/optional-arguments.py (Source)

def format_title_args_dict( base, **nomargs ):
    output = [ base ]
    if 'variant' in nomargs:
        output.append( "[{}]".format( nomargs[ 'variant' ] ) )
    if 'version' in nomargs:
        output.append( "({})".format( nomargs[ 'version' ] ) )
    return ' '.join( output )
  • Burden is on interface maintainer to ensure that usable parameters are properly documented since they will likely not be inferred by an automatic documentation generator.

  • Burden is on interface user to discover usable parameters in code, if they are not properly documented.

  • Test for existence in dictionary of optional arguments needed before optional argument can be used. Burden is on function developer to perform this test.

  • Key must be referenced twice: once for the existence test and once for the access. This poses a software maintenance issue since a change of key literal or key variable name would need to happen in two different places.

  • Unused arguments do not need to be specified at invocation site.

  • Argument values do not need to be unwrapped prior to use.

Rust (Wrapped Value)

Each optional parameter is declared with an Option type. Arguments are either None or a value-bearing Some.

cl-existence-oriented/optional-arguments.rs (Source)

fn format_title(base: &str, variant: Option<&str>, version: Option<&str>) -> String {
    let mut output: Vec<String> = vec![base.to_string()];
    if let Some(data) = variant {
        output.push(format!("[{}]", data));
    }
    if let Some(data) = version {
        output.push(format!("({})", data));
    }
    output.join(" ")
}
  • Burden is on function developer to inspect Option and unwrap it into separate variable, if it exists, before use in function.

  • Accidental use of raw Option possible in some cases.

  • Burden is on function invoker to pass an Option variant for each argument with optional existence at each invocation site. A change of function signature could force an update of each invocation site, which is a code maintenance issue.

  • Consistent interface compared with other parts of the language.

  • Compile-time safety guarantee against panics or undefined behavior.

There is a nice blog post on additional approaches to optional arguments in Rust.

Contemplation

Determining whether a value exists prior to using it is a frequent and pervasive task. We do this again and again in different ways, depending on the language and data structures with which we are working. For a task so routine and so common, one would hope that it would be as facile and robust as possible. But, as demonstrated, the current state of affairs is contrary to that desire.

Requirements

Can we do better than the showcased mechanisms? Let us set forth some requirements, based on what we have seen, and then propose a solution from those:

  • No sentinel values. No default values which serve as sentinels for optional existence. (I.e., no ambiguity. Also, as a bonus, reduced dependence on nulls.)

  • No wrapped values. No explicit capture or unwrap of values from algebraic sum types ("enums", like Option in Rust) or nullable box types to use an optionally-existent value.

  • Retrieve and test in one operation. No more than one explicit runtime operation against a container to safely use an optionally-existent value. (I.e., no need to program separate existence test and access operations.)

  • Detect unprotected access during compilation. No panics, raised exceptions, or undefined behavior at runtime from attempting to access an optionally-existent value.

  • Consistency. Single, consistent way of working with optionally-existent values across language and standard library.

Critiques

Rust meets most of the criteria above, except for "no wrapped values". Unwrapping return values is an ergonomic issue, in spite of the availability of conveniences such as if let. Wrapping optional arguments is another. These are issues for a programmer, both in the sense that they require additional work to perform very routine operations and in the sense that they reduce the legibility of the code by obfuscating it with machinery not related to the problem that it is solving.

Zig is currently less consistent than Rust in its use of optional values, perhaps because of the relative immaturity of its standard library. More importantly, it conflates optional existence with nullability. However, to its credit, it has a ? type prefix and various bits of unwrapping shorthand, such as a capturing if and the orelse operator, which mitigate the ergonomic issues of unwrapping to some extent but do not eliminate them.

Like Zig, Kotlin conflates optional existence with nullability. And, similar to Zig, it has a ? type suffix and some syntactic sugar for handling nullable types, such as the ?. "safe call" operator and the ?: Elvis operator. One very nice feature of Kotlin is that the compiler will reject an attempt to access a member of a nullable object unless the access is guarded by an appropriate null check first.

The way that the Go map handles entry retrieval is a nice idea in the sense that it collapses two explicit operations (existence test and access) into one. However, it is inconsistently applied with other container types across the language. And, the fact that the existence boolean can be ignored and that the default return value may be a valid piece of data makes it dangerous.

In Python, the dictionary of nominative arguments (**) can be quite powerful and avoids both sentinel values and wrapped values for optional arguments. However, it requires two explicit operations to safely work. And, it loses the documentation that comes from explicit interface declaration.

General Proposal

  • Let programmer mark function parameters which can optionally accept arguments. (Similar to Kotlin and Zig, but without conflating nullability with optional existence.)

  • Let programmer mark which return value slots of a function can be optionally filled. (Similar to Kotlin and Zig, but without conflating nullability with optional existence.)

  • Provide an operator to test whether a variable is bound to a value or not.

  • Perform semantic analysis during compilation to ensure that access to any variable with an optional value has appropriate protection, such as being inside the scope of a conditional which tests for its existence.

    • This is reasonable and achievable using contemporary techniques. (Kotlin already does this.)

    • Need to treat logical disjunction (or) and negation (not) of existence conditions as false protection for all optionally-existent values under test by those conditions. Only single existence conditions or logical conjunction of existence conditions can guarantee safe runtime access for all optionally-existent values under test by those conditions.

  • Allow propagation of an optional argument from one function invocation into another, provided that the corresponding parameter can also accept an optional argument. Propagation is safe because the ultimate invocation target must either ignore the optional argument or else submit it to existence protection as a condition for access to it.

  • Allow propagation of an optional return value out of one function invocation through another, provided that corresponding return value slot can also be optionally filled. Propagation is safe because some invoker in the call chain must ultimately either ignore the optional return value or else submit it to existence protection as a condition for access to it.

  • Generate code such that there is a hidden value, which tracks optional existence in bit fields, pushed on the stack of each function invocation, for the purpose of satisfying existence tests.

    • The CPU cost of making a test against the bit field is almost certainly not more than the cost of the mechanisms implemented in contemporary languages, such as those showcased.

    • The memory overhead of the additional stack slot is almost certainly not more than that of nullable boxes or tagged unions.

    • See stack memory diagram below.

  • Implement generators, including iterators, with optional return values. (Similar to Rust, but without wrapped values.)

  • Implement standard consumers of generators, such as a for .. in loop head, to work with optional existence. (Similar to Rust, but without wrapped values.)

  • Implement indexed (and other more specialized) access to containers to provide optional return values, such that absence of a return value indicates absence of the item for which access was attempted. (Similar to Rust, but without wrapped values.)
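
The rule about conjunction, disjunction, and negation of existence conditions can be sketched as a small analysis over condition trees. This is an illustration under stated assumptions and not a compiler design: conditions are modeled as nested tuples, and the operator tags 'is?', 'and', 'or', and 'not', as well as the function name protected_names, are invented here.

```python
def protected_names( condition ):
    ''' Returns the names that are safe to access when the condition holds.

        A condition is a tuple: ( 'is?', name ), ( 'and', c1, c2, ... ),
        ( 'or', c1, c2, ... ), or ( 'not', c ). Hypothetical encoding. '''
    operator, *operands = condition
    if operator == 'is?':
        # A single existence test protects the name under test.
        return frozenset( operands )
    if operator == 'and':
        # Every operand of a conjunction must hold, so protections accumulate.
        names = frozenset( )
        for operand in operands:
            names |= protected_names( operand )
        return names
    # Disjunction, negation, and anything unrecognized protect nothing.
    return frozenset( )

condition = ( 'and', ( 'is?', 'variant' ), ( 'is?', 'version' ) )
assert protected_names( condition ) == { 'variant', 'version' }
```

A compiler following the proposal would then reject any access, inside the conditional body, to an optional variable whose name is not in the returned set.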

Stack memory diagram:

+-------------------+
|                   |
|  caller function  |
|                   |
+-------------------+
|   return address  |
+-------------------+
|     parameter a   | <--- 42 (4 bytes)
+-------------------+
|     parameter b   | <--- not set (4 bytes)
+-------------------+
|     parameter c   | <--- not set (4 bytes)
+-------------------+
|   bit vector for  |
| optional params   | <--- (2 bytes or more, depending on number of optional parameters)
+-------------------+
|                   |
|     local vars    |
|                   |
+-------------------+
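
The hidden bit field in the diagram can be simulated in a few lines. This is only a model of the idea, written in Python for readability; the function names and the assignment of bit positions to parameters (bit 0 for the first optional parameter, and so on) are assumptions made for the sake of the sketch.

```python
def pack_presence( supplied ):
    ''' Packs an ordered series of presence flags into an integer bit field.

        Bit N is set when the Nth optional parameter was supplied. '''
    bits = 0
    for position, present in enumerate( supplied ):
        if present: bits |= 1 << position
    return bits

def is_bound( bits, position ):
    ''' Tests a single bit, as an existence test would at runtime. '''
    return bool( ( bits >> position ) & 1 )

# Mirrors the diagram: parameter a is set; parameters b and c are not.
frame_bits = pack_presence( [ True, False, False ] )
assert is_bound( frame_bits, 0 )
assert not is_bound( frame_bits, 1 )
assert not is_bound( frame_bits, 2 )
```

Each existence test reduces to a shift and a mask against one word, which is the basis for the cost estimates given in the proposal.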

Exemplar Language

Below is an informal, partial language definition, which we will use to revisit the illustrative cases and see how a language satisfying our requirements might look in action.

  • Uses ? as a prefix to mark type constraints on optional parameters.

  • Provides is? as a prefix unary operator which tests whether a variable is bound to a value. When appearing in the head of a conditional clause, such as an if .. do clause, it denotes that the variable under test is safe to access within the clause. Logical negation or disjunction within the head of the conditional clause removes this guarantee of safety.

  • Provides a with? .. do clause which only executes the body of the clause if each variable declared in the head of the clause is bound to a value.

  • Indicial accesses to items in a collection produce optional return values. If an index is absent, then no value is produced on return.

  • Generators produce optional return values. If no more values can be generated, then no value is produced on return.

Retrieve Element by Positional Index

Application of "maybe do" semantics via a with? .. do clause, dependent on whether a transient variable is assigned from an optional return value. Will only do something with the last element if it exists.

cl-existence-oriented/lastpos.mylang (Source)

    with? last-item = items.[-1] do stderr "last item: {last-item}"
  • Returns a value which does not need to be unwrapped before use.

  • Single operation: both existence information (implicit) and value returned together.

  • Consistent interface for optional values across language.

  • Compile-time safety guarantee against unguarded access to possibly unbound variable.

Retrieve Entry by Nominative Index

Application of "maybe do" semantics via a with? .. do clause, dependent on whether a transient variable is assigned from an optional return value. Will only do something if the entry is present.

cl-existence-oriented/nomassoc.mylang (Source)

    with? needle = haystack.['needle'] do stderr "needle: {needle}"
  • Returns a value which does not need to be unwrapped before use.

  • Single operation: both existence information (implicit) and value returned together.

  • Consistent interface for optional values across language.

  • Compile-time safety guarantee against unguarded access to possibly unbound variable.

Retrieve Once from Iterator

Application of "maybe do" semantics via a with? .. do clause, dependent on whether a transient variable is assigned from an optional return value. Will only do something if the iterator returns a value.

cl-existence-oriented/next-set-item.mylang (Source)

    with? element = ( ( elements.as-iterator ).next )
    do stderr "set element: {element}"
  • Returns a value which does not need to be unwrapped before use.

  • Single operation: both existence information (implicit) and value returned together.

  • Consistent interface for optional values across language.

  • Compile-time safety guarantee against unguarded access to possibly unbound variable.

Conditional Concatenation

Application of unary existential test operator, is?, and specification of function parameters which take optional arguments. Will only execute the corpus for each if .. do clause if the corresponding optional argument has a value.

cl-existence-oriented/optional-arguments.mylang (Source)

let format-title`String` base variant`?Any` version`?Any` does:
    let output = ( Dynstring base )
    if is? variant do output.append "[{variant}]"
    if is? version do output.append "({version})"
    output.as-string ( separator : ' ' )
  • No need to wrap argument values.

  • No need to unwrap argument values.

  • Consistent interface for optional values across language.

  • Compile-time safety guarantee against unguarded access to possibly unbound variable.

Conclusion

There is a way to work with optionally-existent values that is less intrusive than the mechanisms in use by contemporary computational languages and which, in theory, has no more runtime overhead than those mechanisms. We can avoid exceptions, panics, sentinel values, undefined behavior, wrapped values, and zero values in our alternative, if we are willing to implement some additional semantic analysis during compilation and pay for an extra slot on the stack to store existence-tracking bit fields. And we can provide a clean, consistent interface for optional value access across a language and its standard library, unifying the way in which we handle optional arguments to functions and optionally-returned values from functions, including generators.