4.1. Files🔗

The smallest unit of compilation in Lean is a single module. Modules correspond to source files, and are imported into other modules based on their filenames. In other words, the names and folder structures of files are significant in Lean code.

4.1.1. Modules🔗

Every Lean file defines a module. A module's name is derived from a combination of its filename and the way in which Lean was invoked: Lean has a root directory in which it expects to find code, and the module's name is the names of the directories from the root to the filename, with dots (.) interspersed and .lean removed. For example, if Lean is invoked with Projects/MyLib/src as its root, the file Projects/MyLib/src/Literature/Novel/SciFi.lean would contain a module named Literature.Novel.SciFi.

Describe case sensitivity/preservation for filenames here

4.1.1.1. Encoding and Representation🔗

Lean modules are Unicode text files encoded in UTF-8. Figure out the status of BOM and Lean Lines may end either with newline characters ("\n", Unicode 'LINE FEED (LF)' (U+000A)) or with a form feed and newline sequence ("\r\n", Unicode 'CARRIAGE RETURN (CR)' (U+000D) followed by 'LINE FEED (LF)' (U+000A)). However, Lean normalizes line endings when parsing or comparing files, so all files are compared as if all their line endings are "\n".

Marginal note: this is to make cached files and #guard_msgs and the like work even when git changes line endings. Also keeps offsets stored in parsed syntax objects consistent.

4.1.1.2. Concrete Syntax🔗

Lean's concrete syntax is extensible. In a language like Lean, it's not possible to completely describe the syntax once and for all, because libraries may define syntax in addition to new constants or inductive types. Rather than completely describing the language here, the overall framework is described, while the syntax of each language construct is documented in the section to which it belongs.

4.1.1.2.1. Whitespace🔗

Tokens in Lean may be separated by any number of whitespace character sequences. Whitespace may be a space (" ", Unicode 'SPACE (SP)' (U+0020)), a valid newline sequence, or a comment. xref Neither tab characters nor carriage returns not followed by newlines are valid whitespace sequences.

4.1.1.2.2. Comments🔗

Comments are stretches of the file that, despite not being whitespace, are treated as such. Lean has two syntaxes for comments:

Line comments

A -- that does not occur as part of a token begins a line comment. All characters from the initial - to the newline are treated as whitespace.

Block comments

A /- that does not occur as part of a token and is not immediately followed by a - character begins a block comment. The block comment continues until a terminating -/ is found. Block comments may be nested; a -/ only terminates the comment if prior nested block comment openers /- have been terminated by a matching -/.

/-- and /-! begin documentation xref rather than comments, which are also terminated with -/ and may contain nested block comments. Even though documentation resembles comments, they are their own syntactic category; their valid placement is determined by Lean's grammar.

4.1.1.2.3. Keywords and Identifiers🔗

An identifier consists of one or more identifier components, separated by '.'.

Identifier components consist of a letter or letter-like character or an underscore ('_'), followed by zero or more identifier continuation characters. Letters are English letters, upper- or lowercase, and the letter-like characters include a range of non-English alphabetic scripts, including the Greek script which is widely used in Lean, as well as the members of the Unicode letter-like symbol block, which contains a number of double-struck characters (including and ) and abbreviations. Identifier continuation characters consist of letters, letter-like characters, underscores ('_'), exclamation marks (!), question marks (?), subscripts, and single quotes ('). As an exception, underscore alone is not a valid identifier.

Identifiers components may also be surrounded by double guillemets ('«' and '»'). Such identifier components may contain any character at all aside from '»', even '«', '.', and newlines. The guillemets are not part of the resulting identifier component, so «x» and x denote the same identifier. «Nat.add», on the other hand, is an identifier with a single component, while Nat.add has two.

Some potential identifier components may be reserved keywords. The specific set of reserved keywords depends on the set of active syntax extensions, which may depend on the set of imported modules and the currently-opened xref/deftech for namespace namespaces; it is impossible to enumerate for Lean as a whole. These keywords must also be quoted with guillemets to be used as identifier components in most syntactic contexts. Contexts in which keywords may be used as identifiers without guillemets, such as constructor names in inductive types, are raw identifier contexts.

Identifiers that contain one or more '.' characters, and thus consist of more than one identifier component, are called hierarchical identifiers. Hierarchical identifiers are used to represent both module names and names in a namespace.

4.1.1.3. Structure🔗

syntax
Parser for a Lean module. We never actually run this parser but instead use the imperative definitions below that
return the same syntax tree structure, but add error recovery. Still, it is helpful to have a `Parser` definition
for it in order to auto-generate helpers such as the pretty printer. module ::=
    Parser for a Lean module. We never actually run this parser but instead use the imperative definitions below that
return the same syntax tree structure, but add error recovery. Still, it is helpful to have a `Parser` definition
for it in order to auto-generate helpers such as the pretty printer. header command*

A module consists of a module header followed by a sequence of commands.

4.1.1.3.1. Module Headers🔗

The module header consists of a sequence of import statements.

syntax
header ::=
    import*

An optional keyword prelude, for use in Lean's implementation, is also allowed:

header ::= ...
    | prelude import*
syntax
prelude ::=
    prelude

The prelude keyword indicates that the module is part of the implementation of the Lean prelude, which is the code that is available without explicitly importing any modules—it should not be used outside of Lean's implementation.

syntax
import ::= ...
    | import ident

Imports the module. Importing a module makes its contents available in the current module, as well as those from modules transitively imported by its imports.

Modules do not necessarily correspond to namespaces. Modules may add names to any namespace, and importing a module has no effect on the set of currently open namespaces.

The imported module name is translated to a filename by replacing dots ('.') in its name with directory separators and appending .lean or .olean. Lean searches its include path for the corresponding intermediate build product or importable module file.

4.1.1.3.2. Commands🔗

Commands are top-level statements in Lean. Some examples are inductive type declarations, theorems, function definitions, namespace modifiers like open or variable, and interactive queries such as #check. The syntax of commands is user-extensible. Specific Lean commands are documented in the corresponding chapters of this manual, rather than being listed here.

Make the index include links to all commands, then xref from here

4.1.1.4. Contents🔗

A module includes an def and xref environment, which includes both the inductive type and constant definitions from an environment and any data stored in xref its environment extensions. As the module is processed by Lean, commands add content to the environment. A module's environment can be serialized to a .olean file, which contains both the environment and a compacted heap region with the run-time objects needed by the environment. This means that an imported module can be loaded without re-executing all of its commands.

4.1.2. Packages, Libraries, and Targets🔗

Lean code is organized into packages, which are units of code distribution. A package may contain multiple libraries or executables.

Code in a package that is intended for use by other Lean packages is organized into libraries. Code that is intended to be compiled and run as independent programs is organized into executables. Together, libraries and executables are referred to as targets in Lake, the standard Lean build tool. section xref

Write Lake section, coordinate this content with it