WWDC 2022: Swift Regex

This is a detailed summary of two videos, part of a series of WWDC session summaries.

The original videos are available on the official Apple website (session 110357, session 110358).

"Learn how you can process strings more effectively when you take advantage of Swift Regex. Come for concise literals but stay for Regex builders — a new, declarative approach to string processing. We'll also explore the Unicode models in String and share how Swift Regex can make Unicode-correct processing easy."

"Go beyond the basics of string processing with Swift Regex. We'll share an overview of Regex and how it works, explore Foundation's rich data parsers and discover how to integrate your own, and delve into captures. We'll also provide best practices for matching strings and wielding Regex-powered algorithms with ease."


This summary is organised as follows: Overview, Working mode, Basic elements, String & Unicode, Existing parsers and Example.


Most of the illustrations are taken from the Apple presentations and may be available in the Resources section of each video's Overview tab.

Below, the underlined elements link directly to the relevant moment in the WWDC video.

Overview

A regular expression (regex) in Swift can be authored in three different ways: as a regex literal, from a string at run time, or with the RegexBuilder DSL, which is more readable than the traditional regex syntax.

When the Regex string is known at compile-time, it may be wise to use a regex literal in order to allow the compiler to check for syntax errors.
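As a minimal sketch of the difference, the snippet below builds the same pattern twice: once as a literal and once from a string at run time. The sample text, the variable names and the pattern are illustrative assumptions, and the literal uses the extended #/.../# delimiters so that it compiles without the bare-slash compiler flag.

```swift
// Regex literal: the compiler parses the pattern, so a malformed regex
// is a build error and the capture is statically typed as Substring.
let sessionLiteral = #/session (\d+)/#

// Same pattern built from a string at run time: a malformed pattern is
// only reported when this line executes, hence the `try`.
let sessionRuntime = try Regex(#"session (\d+)"#)

let text = "WWDC22 session 110357"

var literalNumber = ""
if let match = text.firstMatch(of: sessionLiteral) {
    literalNumber = String(match.1)          // typed capture
}

var runtimeNumber = ""
if let match = text.firstMatch(of: sessionRuntime) {
    // Captures of a run-time regex are type-erased (AnyRegexOutput).
    runtimeNumber = match.output[1].substring.map { String($0) } ?? ""
}

// An unbalanced parenthesis is only detected when the initialiser runs.
let malformed = try? Regex(#"session (\d+"#)
let malformedIsRejected = (malformed == nil)
```

The same unbalanced pattern written as a literal would not compile at all, which is the main argument for literals when the pattern is fixed.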

Swift Regex is built around four main ideas.

The RegexBuilder library provides a domain-specific language (DSL), similar in style to SwiftUI code, in which regex literals can be embedded as components. The resulting extraction pattern is both complete and easy to read.

For instance, the NegativeLookahead regex component asserts that what immediately follows the current position in the string doesn't match the specified content.
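A short sketch of this component, modelled on the field-separator idea used in the sessions: the regex captures everything up to, but not including, the next separator. The sample line and the names fieldSeparator, firstField and name are assumptions made for illustration.

```swift
import RegexBuilder

// A separator is either a run of two or more whitespace characters or a tab.
let fieldSeparator = #/\s{2,}|\t/#

// Consume characters one at a time, but only while the text at the
// current position does NOT start a separator.
let firstField = Regex {
    Capture {
        OneOrMore {
            NegativeLookahead { fieldSeparator }
            CharacterClass.any
        }
    }
}

let line = "Doug's Dugout Dogs  $33.27"
let name = line.firstMatch(of: firstField).map { String($0.1) } ?? ""
```

Single spaces inside the field are consumed because they do not match the separator, while the double space ends the capture.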

Working mode

🎬

A regex can be defined as a program to be executed by its underlying Regex engine.

🎬

Some of the most common string-processing operations are available as Regex-powered algorithms in the Swift standard library. A regex can also be used with Swift's pattern-matching syntax, for example as a case in a switch statement.
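The sketch below exercises a few of these algorithms and the switch-based matching on a made-up log line. The input, the patterns and the classify helper are hypothetical. Note that a regex used as a switch case has to match the whole value.

```swift
let log = "2022-06-06 INFO start; 2022-06-07 WARN retry"
let date = #/\d{4}-\d{2}-\d{2}/#

// Regex-powered algorithms from the standard library.
let firstDate = log.firstMatch(of: date).map { String($0.output) } ?? ""
let allDates = log.matches(of: date).map { String($0.output) }
let redacted = log.replacing(date, with: "<date>")
let entries = log.split(separator: #/;\s*/#).map { String($0) }
let rest = String(log.trimmingPrefix(date))

// Pattern matching: each case succeeds only if the regex matches the
// entire token.
func classify(_ token: String) -> String {
    let number = #/\d+/#
    let word = #/\w+/#
    switch token {
    case number: return "number"
    case word: return "word"
    default: return "other"
    }
}
```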

🎬

The formatters and parsers provided by Foundation can be embedded in a regex builder to simplify string processing.


Basic elements

A few building blocks for writing efficient regexes are presented below.

🎬

The Capture struct extracts portions of the input for later processing. The TryCapture struct adds a transform closure that returns an optional: a nil result makes the match fail, which removes the optionality from the output type.

These two struct types represent an efficient way to extract the matched data from the parsed elements.
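A minimal sketch of both types, assuming a made-up test-log line: Capture keeps the matched text as a Substring, while TryCapture converts it to a Double and yields a non-optional value in the output tuple. The regex name testResult and the input are illustrative.

```swift
import RegexBuilder

let testResult = Regex {
    "Test "
    Capture { OneOrMore(.word) }          // output: Substring
    " took "
    TryCapture {
        OneOrMore(.digit)
        "."
        OneOrMore(.digit)
    } transform: { Double($0) }           // output: Double, not Double?
    " seconds"
}

let line = "Test testLaunch took 4.25 seconds"
var name = ""
var seconds = 0.0
if let match = line.wholeMatch(of: testResult) {
    name = String(match.1)
    seconds = match.2
}
```

Had the transform returned nil, the match itself would have failed instead of producing an optional capture.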

🎬

A sed-like transformation can be written with a regex literal that uses an extended delimiter (#/.../#).

Combined with named captures, this way of authoring a regex makes it possible to refer to captured values by name instead of by position.
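As an illustration, the sketch below rewrites US-style dates into year-month-day order. The multi-line form of the extended delimiter ignores whitespace inside the pattern and lets forward slashes appear unescaped. The capture names, the sample text and the output format are assumptions.

```swift
// Multi-line literal: whitespace inside the pattern is not significant.
let usDate = #/
  (?<month> \d{2}) /
  (?<day>   \d{2}) /
  (?<year>  \d{4})
/#

let text = "Paid on 03/14/2022, refunded on 04/01/2022."

// Captures are read by name rather than by position.
let reordered = text.replacing(usDate) { match -> String in
    "\(match.year)-\(match.month)-\(match.day)"
}
```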

🎬

The Local builder, also known as an atomic non-capturing group, never backtracks into its content once that content has matched, which makes it efficient for matching precisely specified elements. Global backtracking, the default for regex, is better suited to broad searches and fuzzy matching.
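The difference can be seen on a deliberately contrived input. In the sketch below, both regexes look for digits followed by a literal 5, but only the backtracking version can give back the last digit. The names and the input are illustrative.

```swift
import RegexBuilder

// Default behaviour: the repetition can give back the final "5".
let backtracking = Regex {
    OneOrMore(.digit)
    "5"
}

// Local: once the digits are consumed, they are never given back.
let atomic = Regex {
    Local { OneOrMore(.digit) }
    "5"
}

let input = "12345"
let backtrackingMatches = input.wholeMatch(of: backtracking) != nil   // true
let atomicMatches = input.wholeMatch(of: atomic) != nil               // false
```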


See the Example section for a deeper look at the different ways to build an efficient regex.

String & Unicode

A Character is made up of one or more Unicode scalars, which may include an invisible code point called variation selector-16 indicating that the preceding character should be displayed as an emoji.
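A small sketch of this, using a heart followed by variation selector-16 as an assumed example: the string holds a single Character made of two scalars.

```swift
// U+2764 followed by U+FE0F (variation selector-16).
let heart = "\u{2764}\u{FE0F}"

let characterCount = heart.count                             // one Character
let scalarValues = heart.unicodeScalars.map { $0.value }     // two scalars

// Each scalar exposes its Unicode properties, including its name.
let scalarNames = heart.unicodeScalars.map { $0.properties.name ?? "?" }
```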

String and Character comparisons obey Unicode canonical equivalence...

... while a Unicode scalar can also be referred to by name during string processing.
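Both points can be sketched with an accented letter written in two ways. The words and variable names below are assumptions. The \N{...} escape is the regex syntax for naming a scalar.

```swift
let precomposed = "caf\u{E9}"      // "é" as a single scalar
let decomposed = "cafe\u{301}"     // "e" followed by a combining acute accent

// Canonically equivalent strings compare equal and have the same length
// in Characters, even though their scalar counts differ.
let areEqual = (precomposed == decomposed)
let characterCounts = (precomposed.count, decomposed.count)
let scalarCounts = (precomposed.unicodeScalars.count,
                    decomposed.unicodeScalars.count)

// A scalar referred to by its Unicode name inside a regex literal.
let eAcute = #/\N{LATIN SMALL LETTER E WITH ACUTE}/#
let foundByName = precomposed.contains(eAcute)
```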

Useful instance methods are available to handle an element at the Unicode Scalar level.
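One such method, assumed here to be among those the sessions have in mind, is matchingSemantics, which switches a regex from the default Character-level matching to Unicode scalar matching. The sketch below shows the effect on a decomposed accented letter, with an illustrative input and names.

```swift
let accented = "e\u{301}"          // one Character, two Unicode scalars
let anyOne = #/./#
let anyTwo = #/../#

// Default semantics: "." matches one whole Character.
let matchesAsCharacter = accented.wholeMatch(of: anyOne) != nil

// Scalar semantics: "." matches a single scalar, so one "." is not enough
// to cover the whole string, but two are.
let oneScalar = anyOne.matchingSemantics(.unicodeScalar)
let twoScalars = anyTwo.matchingSemantics(.unicodeScalar)
let matchesAsOneScalar = accented.wholeMatch(of: oneScalar) != nil
let matchesAsTwoScalars = accented.wholeMatch(of: twoScalars) != nil
```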



Existing parsers

Foundation provides some useful date parsers that come in handy for string processing.
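As a hedged sketch in the spirit of the sessions' ledger example, a Foundation date parser can be used as a component inside a regex builder, and the capture is then a Date rather than text. The input line, the en_US locale, the UTC time zone and the variable names are assumptions. This requires a platform where Foundation's regex support is available.

```swift
import Foundation
import RegexBuilder

let utc = TimeZone(secondsFromGMT: 0)!
let usLocale = Locale(identifier: "en_US")

let entry = Regex {
    Capture { #/CREDIT|DEBIT/# }
    OneOrMore(.whitespace)
    // Foundation's date parser used directly as a regex component.
    Capture { One(.date(.numeric, locale: usLocale, timeZone: utc)) }
}

let line = "DEBIT    03/05/2022    Doug's Dugout Dogs"
var kind = ""
var day = DateComponents()
if let match = line.firstMatch(of: entry) {
    kind = String(match.1)
    var calendar = Calendar(identifier: .gregorian)
    calendar.timeZone = utc
    // match.2 is a Date; read it back in the same time zone.
    day = calendar.dateComponents([.year, .month, .day], from: match.2)
}
```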

To parse the duration, a floating-point number, an existing parser can be plugged into a regex by defining a parser type that conforms to the CustomConsumingRegexComponent protocol.
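The sketch below shows the shape of such a conformance. SimpleDoubleParser is a hypothetical pure-Swift stand-in for an existing parser. It delegates to the Double initialiser and handles only plain decimal notation, and the log line is made up.

```swift
import RegexBuilder

// Hypothetical parser type: consumes a decimal number at the given
// position and reports where it stopped.
struct SimpleDoubleParser: CustomConsumingRegexComponent {
    typealias RegexOutput = Double

    func consuming(
        _ input: String,
        startingAt index: String.Index,
        in bounds: Range<String.Index>
    ) throws -> (upperBound: String.Index, output: Double)? {
        // Longest run of ASCII digits and dots starting at `index`.
        let candidate = input[index..<bounds.upperBound].prefix(while: {
            $0.isASCII && ($0.isNumber || $0 == ".")
        })
        // Returning nil tells the engine that nothing matched here.
        guard let value = Double(candidate) else { return nil }
        return (candidate.endIndex, value)
    }
}

let duration = Regex {
    "("
    Capture { SimpleDoubleParser() }
    " seconds)"
}

let line = "Test Case 'testLaunch' passed (0.22 seconds)"
let seconds = line.firstMatch(of: duration)?.1 ?? -1
```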

Example

First, a useful tip: to detect and standardise the field separator between strings, a single instance method used with a regex literal gives the same result as chaining several methods.
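A minimal sketch of the two approaches, assuming a ledger-style line whose fields are separated by either a tab or at least two whitespace characters:

```swift
let transaction = "DEBIT     03/05/2022    Doug's Dugout Dogs         $33.27"
let separator = #/\s{2,}|\t/#

// Chaining two methods: split on the separator, then join with a tab.
let viaSplit = transaction.split(separator: separator).joined(separator: "\t")

// Single method: replace every separator with a tab.
let viaReplacing = transaction.replacing(separator, with: "\t")
```

Both produce the same tab-separated line. The single call avoids building the intermediate array.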

Then, a complete example walks through each step needed to obtain the expected results.

Finally, applying this elaborate regex produces the expected outcome.