Blame: 021d5bb6 2021-08-12 op (same commit, date and author for every line)


As part of the regression suite for a project I’m working on, I designed a simple scripting language (which is not even Turing-complete, by the way) to create specific situations and test how the program responds. I’ve almost finished the interpreter for it, so it’s time to start writing tests. How do you edit a file if you don’t have a proper major mode available? You write one!

A major mode is a Lisp program that manages how the user interacts with the content of a buffer. (Friendly reminder that a buffer may or may not be an actual file; things like dired or elpher are major modes after all, but they’re not the kind of modes I’m interested in now.)

Major modes for text files usually do at least three things:
* font-lock (i.e. syntax highlighting)
* set up the syntax table
* manage indentation (the hardest part)
and probably more, like providing useful keybindings and interactions with other packages.

I’d never had to deal with fontification or syntax tables, nor realised how difficult indentation can be, so it’s been lots of fun.

The difficulty of writing a major mode seems to be at least proportional to the “complexness” of the target language. In my case, the grammar of the language is dead simple and so the major mode is simple too. cc-mode, on the other hand, is probably at the other end of the spectrum (well, after all it manages C, C++, Java, AWK and more…).

Before describing the elisp implementation, here’s a look at the custom DSL, “nps”:

```
include "lib.nps"

# consts come in two flavors
const (
	one = 1
	two = 2
)
const foo = "hello there"

# procedures work as expected, ... is for the rest argument
proc message(type, ...) {
	send(type:u8, ...) # type casts
}

# it’s a DSL for regression tests after all
testing "cooking skills" {
	message(Make, "me", "a", "sandwich")
	m = recv()

	# asserts come in two flavors too
	assert (
		m.type == What
		m.content == "Make it yourself."
	)
	assert m.id = 5
}
```

Now let’s jump into the mode implementation.

The elisp file starts with the usual header. I’m enabling lexical-binding explicitly: as far as I can tell, Emacs 27 made it the default only for interactively evaluated code (‘M-:’, *scratch*), while a file still needs the cookie.

```elisp header
;;; nps-mode.el --- major mode for nps -*- lexical-binding: t; -*-
```

I’ll also make use of the rx library to write regexps, so
```
(eval-when-compile
  (require 'rx))
```

## fontification

i.e. syntax highlighting. There are probably different ways of doing this, but I’ll stick with the simplest one: a bunch of regexps.

```defining the font lock regexps
(defconst nps--font-lock-defaults
  (let ((keywords '("assert" "const" "include" "proc" "testing"))
        (types '("str" "u8" "u16" "u32")))
    `(((,(rx-to-string `(: (or ,@keywords))) 0 font-lock-keyword-face)
       ;; a name followed by optional blanks and an open paren; in an
       ;; elisp string "\s" is just a space, so spell the blanks out
       ("\\([[:word:]]+\\)[ \t]*(" 1 font-lock-function-name-face)
       (,(rx-to-string `(: (or ,@types))) 0 font-lock-type-face)))))
```

Yes, I got the number of parentheses wrong (multiple times) at first.

This value will later be assigned to the buffer-local font-lock-defaults variable. I’ve not yet wrapped my head around the different levels mentioned in the documentation, but the code seems to work. We’re using rx to build a regexp that matches the keywords and using the face ‘font-lock-keyword-face’ for the matches. The zero is there because we want to highlight the whole match: the regexp doesn’t have any numbered sub-groups.

The second entry is slightly more complex and interesting. It matches a symbol followed by an open paren and applies the face ‘font-lock-function-name-face’ to it. The regexp has a sub-group (the \\( and \\) bit) that matches only the symbol, and the number 1 tells font-lock to highlight only the first sub-group and not the whole match.

The third one is like the first: it highlights the “types”.
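One caveat with a plain alternation is that it also matches inside longer identifiers. Here is a minimal sketch of a stricter keyword regexp, assuming that behaviour is unwanted. The name ‘nps-sketch-keyword-regexp’ is made up for illustration and is not part of the mode described in this post:

```elisp
(require 'rx)

;; Hypothetical variant, not the definition used in the post: wrap the
;; alternation in symbol boundaries so keywords only match as whole symbols.
(defconst nps-sketch-keyword-regexp
  (rx symbol-start
      (or "assert" "const" "include" "proc" "testing")
      symbol-end))

;; Matches the keyword when it stands alone...
(string-match-p nps-sketch-keyword-regexp "proc foo(x) {")
;; ...but not when it is only the prefix of a longer name.
(string-match-p nps-sketch-keyword-regexp "process(x)")
```

The first call returns a match position and the second returns nil. Where a symbol starts and ends depends on the syntax table in effect, which is one more reason the next section matters.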

## syntax-table

This is pure black magic, I can assure you. Nah, just kidding. But it looks like it.

It’s a very important piece of the major mode. Various Lisp functions inspect the current syntax table to find out what kind of text point is on. It also interacts with font-lock and various other parts of Emacs.

This is also the part I’m least confident about. Some major modes I’ve seen add explicit entries for the braces and the quotes, others don’t. I’ve decided to be explicit and list all the characters I’m using, just to be sure.

The idea is to specify some properties for each character (or range of characters). These properties are expressed in a very terse notation using a string. To add entries to the syntax table you need to use ‘modify-syntax-entry’: it takes the character (or range), the string description of the properties and the syntax table.

The format of the specification is better explained in the elisp manual, but the gist is that it’s a sequence of characters with a special interpretation. The first character identifies the “class” (punctuation, word constituent, comment delimiter, parenthesis, …), the second, if not a space, specifies the matching character, and then there are further fields that I won’t use.

Just to provide an example before showing the code, in a programming language the syntax entry for the character ‘(’ probably looks like "()":
* it’s an opening parenthesis, as the first character is an open paren and
* the ‘)’ character is its matching character.
The syntax entry for ‘)’ instead will look like ")(" because
* it’s a closing parenthesis, as the first character is a close paren and
* its matching character is ‘(’
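The two descriptors can be checked directly. This is a small sketch on a throwaway table; the function name is hypothetical, while ‘char-syntax’ and ‘matching-paren’ are the standard functions that read the current syntax table:

```elisp
(defun nps-sketch-paren-entries ()
  "Return the class and partner of `(' and `)' in a throwaway table."
  (let ((st (make-syntax-table)))
    (modify-syntax-entry ?\( "()" st)
    (modify-syntax-entry ?\) ")(" st)
    (with-syntax-table st
      (list (string (char-syntax ?\())        ; class of the open paren
            (string (matching-paren ?\())     ; its partner
            (string (char-syntax ?\)))        ; class of the close paren
            (string (matching-paren ?\)))))))  ; its partner

(nps-sketch-paren-entries)
;; => ("(" ")" ")" "(")
```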

So, here’s the syntax table for nps in all its glory:

```
(defvar nps-mode-syntax-table
  (let ((st (make-syntax-table)))
    (modify-syntax-entry ?\{ "(}" st)
    (modify-syntax-entry ?\} "){" st)
    (modify-syntax-entry ?\( "()" st)
    (modify-syntax-entry ?\) ")(" st)

    ;; - and _ are word constituents
    (modify-syntax-entry ?_ "w" st)
    (modify-syntax-entry ?- "w" st)

    ;; both single and double quotes make strings; the string-quote
    ;; class is spelled with a double quote for both of them
    (modify-syntax-entry ?\" "\"" st)
    (modify-syntax-entry ?' "\"" st)

    ;; add comments.  lua-mode does something similar, so it shouldn't
    ;; be *too* wrong.
    (modify-syntax-entry ?# "<" st)
    (modify-syntax-entry ?\n ">" st)

    ;; '==' as punctuation (don't forget the table argument, otherwise
    ;; the current buffer's table gets modified instead)
    (modify-syntax-entry ?= "." st)
    st)
  "Syntax table used in `nps-mode' buffers.")
```
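A quick way to see whether such entries do what they should is to ask the parser about a position. This is a sketch that assumes a small stand-in table with only the string and comment entries. The helper name is hypothetical; ‘syntax-ppss’ is the real function, and elements 3 and 4 of its result say whether point is inside a string or a comment:

```elisp
(defun nps-sketch-state-at-end (text)
  "Return (IN-STRING IN-COMMENT) for the position at the end of TEXT.
Uses a small stand-in table, not the full `nps-mode-syntax-table'."
  (let ((st (make-syntax-table)))
    (modify-syntax-entry ?\" "\"" st)   ; string delimiter
    (modify-syntax-entry ?# "<" st)     ; comment starter
    (modify-syntax-entry ?\n ">" st)    ; comment ender
    (with-temp-buffer
      (set-syntax-table st)
      (insert text)
      (let ((state (syntax-ppss)))
        (list (and (nth 3 state) t)
              (and (nth 4 state) t))))))

(nps-sketch-state-at-end "m = recv() # waiting")   ; => (nil t)
(nps-sketch-state-at-end "const foo = \"hello")    ; => (t nil)
(nps-sketch-state-at-end "const foo = \"a # b\"")  ; => (nil nil)
```

The last call shows that a ‘#’ inside a string does not start a comment, which is exactly the kind of thing regexps alone tend to get wrong.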

## indentation

Indentation at first doesn’t seem like a difficult thing. After all, when we’re staring at code we don’t have the slightest doubt about how a certain line needs to be indented. Turns out, like most other “obvious” things, that coming up with a program that decides how to indent is not that straightforward.

In my case fortunately the logic is pretty simple. The level of the indentation is how deeply nested we are in parentheses multiplied by the tab-width (because yes, nps uses hard tabs), with the exception of a line starting with a closing parenthesis, which gets indented one level less. Take this snippet for instance:

```snippet of nps to show how indentation works
proc foo(x) {
	y = bar(x.id)
	assert (
		y.thingy = 3
	)
}
```

The first line, the ‘proc’ declaration, is indented at the zeroth column because we aren’t inside a nested pair of parentheses. The ‘y’ variable is indented one tab level because it’s inside the curly braces. The body of the assert is inside two nested pairs of parentheses, so it’s indented twice. The closing parenthesis of the assert is indented by only one level because of the special case: it should be two, but since it’s a closing paren we drop one indentation level.

The code for ‘nps-indent-line’ is probably not the prettiest, but seems to work nonetheless:

```
(defun nps-indent-line ()
  "Indent current line."
  (let (indent
        boi-p                           ;begin of indent
        move-eol-p
        (point (point)))                ;Lisp-2s are truly wonderful
    (save-excursion
      (back-to-indentation)
      ;; the car of `syntax-ppss' is the paren depth at point
      (setq indent (car (syntax-ppss))
            boi-p (= point (point)))
      ;; don't indent empty lines unless point is on them
      (when (and (eq (char-after) ?\n)
                 (not boi-p))
        (setq indent 0))
      ;; check whether we want to move to the end of line
      (when boi-p
        (setq move-eol-p t))
      ;; decrement the indent if the first character on the line is a
      ;; closer.
      (when (or (eq (char-after) ?\))
                (eq (char-after) ?\}))
        (setq indent (1- indent)))
      ;; indent the line
      (delete-region (line-beginning-position)
                     (point))
      (indent-to (* tab-width indent)))
    (when move-eol-p
      (move-end-of-line nil))))
```

The real workhorse is ‘syntax-ppss’, which tells us how deep in parens we are (the depth is the first element of the list it returns, hence the ‘car’). A better real-world example is probably the indent-line of go-mode: it’s obviously more complex, but it’s still manageable.
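To see the rule in isolation, here is a sketch that only computes the indentation level of each line, without touching the text. The function name and the stand-in syntax table are assumptions made for this example; only ‘syntax-ppss’ and the “closers get one level less” rule come from the code above:

```elisp
(defun nps-sketch-indent-levels (text)
  "Return the indentation level of every line of TEXT as a list."
  (let ((st (make-syntax-table))
        levels)
    ;; stand-in table: only the paren pairs matter here
    (modify-syntax-entry ?\( "()" st)
    (modify-syntax-entry ?\) ")(" st)
    (modify-syntax-entry ?\{ "(}" st)
    (modify-syntax-entry ?\} "){" st)
    (with-temp-buffer
      (set-syntax-table st)
      (insert text)
      (goto-char (point-min))
      (while (not (eobp))
        (back-to-indentation)
        (let ((level (car (syntax-ppss))))   ; paren depth at line start
          ;; a line starting with a closer drops one level
          (when (memq (char-after) '(?\) ?\}))
            (setq level (1- level)))
          (push level levels))
        (forward-line 1)))
    (nreverse levels)))

(nps-sketch-indent-levels
 "proc foo(x) {\ny = bar(x.id)\nassert (\ny.thingy = 3\n)\n}\n")
;; => (0 1 1 2 1 0)
```

Multiplying each level by ‘tab-width’ gives the column that ‘indent-to’ receives in ‘nps-indent-line’.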

## abbrev table

This is not strictly needed, but it’s nice to have. I use abbrev tables in various languages to automatically correct small typos (like ‘inculde’ instead of ‘include’).

```
(defvar nps-mode-abbrev-table nil
  "Abbreviation table used in `nps-mode' buffers.")

(define-abbrev-table 'nps-mode-abbrev-table
  '())
```
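The table above is empty. As a sketch of what typo-correcting entries could look like, assuming a hypothetical table name so the real one is left alone (both abbrevs are made-up examples):

```elisp
;; Hypothetical table, separate from `nps-mode-abbrev-table'.
(define-abbrev-table 'nps-sketch-abbrev-table
  '(("inculde" "include")
    ("tesitng" "testing")))

;; Look up what an abbrev would expand to, without editing any buffer.
(abbrev-expansion "inculde" nps-sketch-abbrev-table)   ; => "include"
```

The expansion only happens while typing if ‘abbrev-mode’ is enabled in the buffer.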

## Completing the mode

Now that we have all the pieces, let’s define the mode:

```
;;;###autoload
(define-derived-mode nps-mode prog-mode "nps"
  "Major mode for nps files."
  :abbrev-table nps-mode-abbrev-table
  (setq font-lock-defaults nps--font-lock-defaults)
  (setq-local comment-start "#")
  (setq-local comment-start-skip "#+[\t ]*")
  (setq-local indent-line-function #'nps-indent-line)
  (setq-local indent-tabs-mode t))
```

nps-mode derives from prog-mode, a generic mode used for programming languages. This way, users can easily define keybindings and options only for programming-related buffers and have a consistent experience. The body of the ‘define-derived-mode’ macro is just some code that gets executed when the mode is activated. There, we set font-lock-defaults to the value computed previously, define comment-start and comment-start-skip so functions like ‘comment-dwim’ (M-;) work as expected, and set up the ‘indent-line-function’. Finally, we enable indent-tabs-mode because nps uses real hard tabs. That’s it.
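What deriving from prog-mode buys can be seen with a tiny stand-in mode. Everything named ‘nps-sketch-…’ below is hypothetical and exists only for this illustration:

```elisp
(define-derived-mode nps-sketch-mode prog-mode "nps-sketch"
  "Hypothetical stand-in for `nps-mode', only to show the inheritance."
  (setq-local comment-start "#"))

(defvar nps-sketch-hook-ran nil
  "Set to t by the demonstration hook below.")

;; Something a user might have in their init file for all prog modes.
(add-hook 'prog-mode-hook (lambda () (setq nps-sketch-hook-ran t)))

(with-temp-buffer
  (nps-sketch-mode)
  (list (and (derived-mode-p 'prog-mode) t)   ; the parent is recognised
        nps-sketch-hook-ran                   ; the parent's hook was run
        comment-start))                       ; the body was evaluated
;; => (t t "#")
```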

Registering this mode for the ‘nps’ file extension ensures that Emacs will enable nps-mode automatically:

```
;;;###autoload
(add-to-list 'auto-mode-alist '("\\.nps\\'" . nps-mode))
```
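A small sketch of what an end-of-string anchor (the \\' bit) changes in an auto-mode-alist regexp. The file names are made up:

```elisp
;; Without the anchor the pattern matches anywhere in the file name.
(string-match-p "\\.nps" "notes.npsbackup")      ; => 5
;; With \\' the extension has to be the very end of the file name.
(string-match-p "\\.nps\\'" "notes.npsbackup")   ; => nil
(string-match-p "\\.nps\\'" "suite.nps")         ; => 5
```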

Sidebar: what are those ‘autoload’ comments? It’s a trick used by Emacs to cheat and not load all the code in a file until it’s needed. Emacs will only evaluate the ‘add-to-list’ and register a ‘nps-mode’ autoload, but won’t evaluate anything else until ‘nps-mode’ is called. The first time ‘nps-mode’ is called, Emacs loads the whole ‘nps-mode.el’ file and then calls ‘nps-mode’ again. This is how Emacs can start so quickly and still load TONS of emacs-lisp files.
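Roughly speaking, an autoload comment in front of a function boils down to a call to ‘autoload’ in a generated file. A sketch, with a hypothetical function and file name (the file doesn’t need to exist until the function is first called):

```elisp
;; Register a placeholder: "when this command is called, load that file".
(autoload 'nps-sketch-lazy-mode "nps-sketch-lazy"
  "Major mode loaded on demand (illustration only)." t)

(fboundp 'nps-sketch-lazy-mode)                       ; => t
(autoloadp (symbol-function 'nps-sketch-lazy-mode))   ; => t
```

The symbol is already callable, but its definition is only the placeholder until the real file gets loaded.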

Major modes usually also define some keys and/or integrate with other packages (flymake for example). I’m not going to do either, but it’s still pretty easy. To provide some keys, all you have to do is declare a ‘$mode-map’ variable that holds a keymap, then ‘define-derived-mode’ will take care of enabling it:

```example of a mode-map
(defvar nps-mode-map
  (let ((map (make-sparse-keymap)))
    ;; `kbd' is needed: a bare "C-c c" string would be read as the
    ;; literal characters, not as the key sequence
    (define-key map (kbd "C-c c") #'do-stuff)
    ;; ...
    map)) ; don’t forget to return the map here!
```
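A runnable sketch of the same idea, where both the command and the map name are hypothetical placeholders. ‘kbd’ converts the readable key description into the internal key sequence, and ‘lookup-key’ confirms the binding:

```elisp
(defun nps-demo-do-stuff ()
  "Placeholder command (hypothetical)."
  (interactive)
  (message "doing stuff"))

(defvar nps-demo-mode-map
  (let ((map (make-sparse-keymap)))
    (define-key map (kbd "C-c c") #'nps-demo-do-stuff)
    map)
  "Keymap for a hypothetical `nps-demo-mode'.")

(lookup-key nps-demo-mode-map (kbd "C-c c"))   ; => nps-demo-do-stuff
```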

## Wrapping up

Writing a major mode from scratch this way was really interesting in my opinion. The knowledge of how major modes work and how to write one will probably come in handy in the future, either to write more major modes for (hopefully) real programming languages or to tweak existing ones.

In retrospect, I ended up choosing the hardest possible way to build a major mode. For a project like this, where I’m only interested in basic font-locking, there were at least two other options to choose from:
* use generic-mode
* derive from cc-mode

generic-mode provides an easy, but limited, way to write major modes.
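For comparison, this is roughly what a generic-mode take on nps could look like. It’s only a sketch under my own assumptions: the mode name is hypothetical, it gives comments and font-locking but no custom indentation, and it deliberately registers no file extension:

```elisp
(require 'generic)

(define-generic-mode nps-generic-sketch-mode
  '("#")                                           ; comment starters
  '("assert" "const" "include" "proc" "testing")   ; keywords
  '(("\\_<\\(?:str\\|u8\\|u16\\|u32\\)\\_>" . font-lock-type-face))
  nil                                              ; no auto-mode-alist entries
  nil                                              ; no extra setup functions
  "Hypothetical generic-mode take on nps, for comparison only.")
```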

cc-mode is the mode that powers C, C++, Java and (at least) AWK. It’s pretty flexible and it was designed to handle “all” C-like programming languages.

However, writing nps-mode from scratch was a pleasant experience and I had some fun hacking in emacs lisp. The implementation is also not too bad and still pretty simple, so it has been worth the time.

I’m not sharing the file itself because it’s part of the aforementioned project, which is still being heavily worked on. The code in this post is everything I wrote in nps-mode.el anyway.

Some useful links:

=> https://www.emacswiki.org/emacs/GenericMode [https] generic-mode
=> https://nullprogram.com/blog/2020/01/22/ [https] A makefile for Emacs Packages