Notepad++ User Defined Languages

For editing regular text I do like Notepad++, a text editor with colour syntax highlighting for many different programming languages built in – and a lot of other languages for which people have written User Defined Language (UDL) plugins. Neat … only … when I came to try to make one I found little documentation and what there was was fractured, distributed and dammit just incomplete (incomplete = Can’t find even a snifter via Google in an hour). Indeed, in the UDL dialog of the latest notepad++ (Aug 2015) is a link to a temporary documentation site – which ceased in 2012 with a message “New, improved documentation site will replace this one in near future.”

So [cracks knuckles] let’s try and work it out.

I’m going to use a real script as a basis – a script used by a 1990s video game called Duke Nukem 3D which allowed you to customise game-play and later, in Eduke32 form, write similar scripts to do clever things in a game level editor. Yes, I’m using its “m32” script file as an example here, but the principles involved in setting up a UDL will apply to whatever language you are configuring too. First off, how do we load an existing UDL?

Loading An Existing UDL

You’ll find a collection of third party UDL files on the notepad++ wiki; download one, which will be an XML file or perhaps a zip with one or more XML files in it. Put the XML file in your /users/[user]/AppData/Roaming/Notepad++ folder, or if there are multiple XML files then put them in a subfolder. Now, in notepad++, select the menu [Language] -> [Define your language…]. Up pops a truly enormous dialog – fortunately you can dock it inside of the notepad app if you happen to have a monitor with less than 2000 pixels vertical resolution :-)

Anyhow, click the [Import…] button, select the XML file you downloaded and job done. (If there was more than one XML file, select the one that is not langs.xml or stylers.xml). The installed UDL kicks in when you next open a text file whose file extension matches the one in the UDL.
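
If you find yourself installing UDL files regularly, the copy step can be scripted. What follows is only a rough Python sketch: the function names are my own invention, and it assumes the folder really is %APPDATA%\Notepad++ as described above. It just copies the file; you still have to do the [Import…] step in the dialog yourself.

```python
import os
import shutil
from pathlib import Path


def notepadpp_config_dir(env=None):
    """Return the per-user Notepad++ folder (assumed to live under %APPDATA%)."""
    env = os.environ if env is None else env
    appdata = env.get("APPDATA")
    if not appdata:
        raise RuntimeError("APPDATA is not set; is this a Windows session?")
    return Path(appdata) / "Notepad++"


def copy_udl(xml_file, dest_dir):
    """Copy one downloaded UDL XML file into dest_dir and return the new path."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)  # create the folder if missing
    return Path(shutil.copy(xml_file, dest_dir))
```

On a Windows machine you would call something like copy_udl("m32.xml", notepadpp_config_dir()), where “m32.xml” stands in for whatever file you downloaded.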

Rolling Your Own

Now for creating a brand new one yourself. You’ll notice the UDL dialog also has a [Create New…] button. I think you can see where this is heading. Go on, create a new one; you’ll be prompted for the name you want to use for your language. Job done, but for the rest of this article regularly check the [User Language:] combo box on the left to ensure your new UDL is selected. I had instances where the selection changed – probably by accident by me, but changed nonetheless – which makes life very confusing when you are configuring things and nothing seems to be happening!

First thing to do is enter the file extension to associate with our language, which in this case is “m32”. Also, is your language case sensitive? Set the [Ignore Case] checkbox appropriately.

Right, at this point, open an example of your language text file – an m32 file in my case – in notepad++. It won’t look any different as we haven’t defined anything yet, but handily, as you modify the UDL dialog, its changes immediately (well, almost immediately) appear in your loaded text document.

Configuring Comments

I think the best place to start is comments; pretty much all scripts/languages support them, so I’m guessing your language will too and we can therefore quickly confirm our new language is recognised. These are configured on the [Comment & Number] tab of the UDL dialog (not the very first tab, [Folder & Default]).

For my example M32 file, comments are actually C-style, that is, ‘//’ is a comment that extends to the end of the line, whilst ‘/*’ starts a comment block that continues until a corresponding ‘*/’.
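
To make those two comment rules concrete, here is a small Python sketch of a scanner that picks out both kinds of comment. It is purely my own illustration of the rules, not how notepad++ does it internally, and it deliberately ignores wrinkles such as a ‘//’ appearing inside a string.

```python
def find_comments(text, line_open="//", block_open="/*", block_close="*/"):
    """Return (start, end, kind) spans for line and block comments in text."""
    spans = []
    i = 0
    while i < len(text):
        if text.startswith(line_open, i):
            # A line comment runs up to (not including) the next newline.
            end = text.find("\n", i)
            end = len(text) if end == -1 else end
            spans.append((i, end, "line"))
            i = end
        elif text.startswith(block_open, i):
            close = text.find(block_close, i + len(block_open))
            # An unterminated block comment runs to the end of the text.
            end = len(text) if close == -1 else close + len(block_close)
            spans.append((i, end, "block"))
            i = end
        else:
            i += 1
    return spans
```

Each span is what the styler would paint: green for “line” and gray italics for “block” in my setup.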

Setting up the comments fields

So, on the UDL dialog, select the [Comment & Number] tab and you’ll see two groups (among others) related to comments. For [Comment line style], set the [Open:] field to ‘//’. Now click the [Styler] button to configure how the text should appear. Set it to green and close the styler dialog. You should see your loaded M32 file immediately update comments to green.

If nothing seems to happen then I’ve found it is probably one of three things:

1) I warned you ! Triple check you selected the right user language in the [User Language] combo box on the UDL dialog.

2) Check you have the correct file extensions, both in the UDL dialog and your loaded file. Yes, I really was dumb enough to fall for that one (sigh)

3) Notepad++ not quite initialised. Close your text file in the editor, then close notepad++, then re-open notepad++, then reload your file.

Right, I’m now assuming it worked, so next we do the [Comment style] group, which is similar to the previous [Comment line style], only this time set [Open:] to ‘/*’ and [Close:] to ‘*/’ and configure its appearance via the [Styler] button – I chose gray italics, though you may prefer light green.

Below that is Number style – I’ll return to this later.

Highlighting Keywords

The Keywords tab

Next, click on the UDL dialog’s [Keywords List]. This is where you will basically type in groups of words that you style a particular way. Whooaa!, slow down, don’t enter any yet. Spend a few minutes thinking about this. You could literally group by colour you intend giving to a particular keyword. Fine. But, an alternative is to consider grouping by the keyword’s type, then styling by that; sure, you may find two groups end up defined with the same appearance, but makes it much easier to redefine colour schemes later. I’m a (semi)organised chap, so went with the latter, but bear in mind the maximum of eight groups will be a limiting factor (though it is more than enough if you choose the first method and simply group by appearance).

Anyhow, I chose the following eight groups; I’m sure some, but not all, will apply to your language. They are

1) Definitions. “Define”, “Include”, that sort of thing. If you are going to implement code folding (discussed later) then don’t include e.g. start and end of function blocks in this list.

2) Conditional statements. ‘if’, ‘while’, whatever. In my case there are dozens of varieties of conditional statements.

3) Arithmetic. add, div(ide), sin, cos, etc. Now, I included add and div keywords here due to the way my example script works, however for normal programming +, /, *, etc use the [Operators & Delimiters] tab (discussed later).

4) Variables definition (var, array etc)

5) This one will be peculiar to my requirements, but anyhow I used this for keywords related to building maps (a map being a space that a player can walk inside a video game).

6) General functions. Drawing lines, reading the keyboard, string copy, etc

7) Stuff related to sounds (hey, this language is related to video games)

8) Stuff that perhaps should not be in release code, such as keywords that have been marked as deprecated. I styled these with a different background colour (solid red so they really stand out!) . You could include debug-related keywords too – but I will handle these separately later on.

Anyhow, now it is just a process of putting your keywords in the appropriate group. In the screenshot above I put each keyword on a new line, though simply using a space character as a separator works just as well.

The more inquisitive of you may well be wondering what that [Prefix mode] checkbox is. Well, it means the keyword just needs to be the starting characters, e.g. if you untick this box and set a keyword of “dont” then only the exact word “dont” (give or take case sensitivity) will be treated as a keyword. However, if you tick this box then that one keyword will also match dontgivemegrief, dontmakemylifemisery, dontforcewindows10onme, etc.
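
If the difference is easier to see in code, here is a tiny Python sketch of the two matching rules. The function and its parameters are my own naming, modelled on the checkbox descriptions above rather than on anything inside notepad++.

```python
def is_keyword(word, keywords, prefix_mode=False, ignore_case=False):
    """Decide whether word counts as a keyword under the chosen options."""
    if ignore_case:
        word = word.lower()
        keywords = [k.lower() for k in keywords]
    if prefix_mode:
        # Prefix mode: the keyword only has to match the start of the word.
        return any(word.startswith(k) for k in keywords)
    # Otherwise the whole word has to match exactly.
    return word in keywords
```

So is_keyword("dontgivemegrief", ["dont"]) is False, but becomes True once prefix_mode=True is passed.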

Configuring Numbers

Now for a tricky bit. Numbers. By default, decimal numbers are already recognised. So, on the [Comment & Number] tab, go straight to the [Number Style]’s [Styler] button and set it accordingly – I chose red.

Pound to a penny you have hex numbers too, right ? Well, we need to do a bit of extra work there. First the wrong (or 90% right) way:

Like many languages, my script file precedes hex numbers with ‘0x’, so in the [Prefix 1] box, enter ‘0x’. Yay! Numbers such as 0x1234 and 0x9876 are now styled properly too. Ah, but wait a mo, 0x12EF doesn’t work, nor 0xabcd. Let’s guess that this is what the [Extras 1] box is for: characters that should also be considered part of a number. So, let’s put in ‘a b c d …’, including uppercase chars as well to be on the safe side. AND … (drum roll, ka-tisch!) … nothing happens :-(

After much faffing, googling and with ears still ringing with the sound of 10p coins dropping in the swear box, I stumbled across someone who had worked this one out (refer to this thread, comment #8 – thanks Sérgio!). Yes, you put the prefix ‘0x’ in the [Prefix 2] box, with the extra characters in the [Extras 1] box!
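
For what it’s worth, the end result we are after can be written down as a short Python sketch. This models only the outcome (plain decimals, plus a prefix followed by digits and the extra characters); it is my own illustration and says nothing about why notepad++ wants the prefix in [Prefix 2] rather than [Prefix 1].

```python
DIGITS = "0123456789"


def is_number(token, prefix="0x", extras="abcdefABCDEF"):
    """Return True if token should be styled as a number.

    Assumption: a number is either all decimal digits, or the prefix
    followed by at least one digit or extra character.
    """
    if token and all(c in DIGITS for c in token):
        return True
    if token.startswith(prefix):
        body = token[len(prefix):]
        return body != "" and all(c in DIGITS or c in extras for c in body)
    return False
```

With the extras left empty, 0x1234 would still pass but 0x12EF would not, which is the 90% right result described earlier.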

Configuration should look like this …

… or may look like this. You may need to dock the dialog in order to reach this part of the dialog!

Strings – or Let’s Play With Delimiters

Setting up C-Style string recognition

What more can we do? I know, strings. Most languages have strings contained in quotes, so let’s handle those. On the [Operators & Delimiters] tab, in the [Delimiter 1 style] group, specify the quote character in both the [Open:] and [Close:] fields. But strings are cleverer than this; in many languages, you may have a quote character inside the string and it is handled by a preceding escape character such as a backslash. Well, add \" to the [Escape:] field, job done!
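
Here is a rough Python sketch of what that escape entry buys you. It is my own model of the idea (treat \" as a unit that cannot end the string), not a description of notepad++’s internals, and the function name is made up for the example.

```python
def read_string(text, start, quote='"', escape_seq='\\"'):
    """Return the end index (exclusive) of the quoted string starting at start."""
    if not text.startswith(quote, start):
        raise ValueError("no opening quote at start")
    i = start + len(quote)
    while i < len(text):
        if text.startswith(escape_seq, i):
            i += len(escape_seq)       # skip the escaped quote as one unit
        elif text.startswith(quote, i):
            return i + len(quote)      # a genuine closing quote
        else:
            i += 1
    return len(text)                   # unterminated: run to the end of the text
```

Without the escape check, the scan would stop at the first inner quote and the rest of the string would be styled as ordinary code.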

Whilst we are on this tab, you should notice that you can style operators, that is the usual “+ – / * &” etc that pretty much every language uses (I think my script is the only example where there are no such things!).

Group Multiple Delimiters

Whilst on this page, time to deal with my language’s debug keywords. I decided to handle them differently from simple keywords, as in my case when a debug statement is found I wanted to override all other keywords on the remainder of the line. I’m also going to highlight their backgrounds, but in a different colour to the deprecated keywords I defined earlier (in my 8th group of keywords). That was my choice; your language may be different. Anyhow, this illustrates how to do multiple fields.

In the [Delimiter 2 style:] group, for the [Open:] field, I entered “addlogvar Addlog debug”, these being the three debug commands my script uses. I want this to affect all the way to the end of the line, so we need to specify end-of-line for the [Close:] field. How? That was a toughie, but it seems “((EOL))” is what you need.
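
The effect I’m after can be sketched in a few lines of Python. This is only my picture of the behaviour: the function name is invented, and matching whole words regardless of case is an assumption on my part, not something I have confirmed about notepad++.

```python
import re


def eol_span(line, openers=("addlogvar", "Addlog", "debug")):
    """Return (start, end) from the first debug word to the end of the line.

    Returns None if the line contains none of the opener words.
    Assumption: openers match whole words, ignoring case.
    """
    wanted = {w.lower() for w in openers}
    for match in re.finditer(r"\S+", line):
        if match.group().lower() in wanted:
            return (match.start(), len(line))
    return None
```

Everything inside the returned span gets the one delimiter style, so any keywords later on that line are overridden.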

Folding

“What’s folding?” you cry ? Well, it’s that really useful ability in notepad++ and, indeed, many other text editors to collapse blocks of code down, hiding the clutter whilst you concentrate on the bit you are working on.

Code collapsing in action

Fantastic. So, let’s click on the [Folder & Default] tab. There are three categories: [Folding in comment style], [Folding in code 1 style] and [Folding in code 2 style (separators needed)]. Let’s ignore comments for now and concentrate on code 1 and code 2. Although it depends on your language, you probably want the code 2 group.

In my particular script I have four nestable code blocks:

  • { and } (common to many languages)
  • defstate / ends
  • onevent / endevent
  • switch / endswitch

So, in the [Open:] field I put “{ defstate onevent switch” and in the [Close:] field I put “} endswitch ends endevent”. Important: I entered “endswitch” before “ends”, otherwise “endswitch” would have been recognised as “ends” and not look right.
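
One way to picture why the order could matter is a scanner that tries the close words in the order listed and takes the first one that fits. That is a guess on my part to explain what I saw, not a statement of how notepad++ works inside; the Python below is just that guess written down.

```python
def first_match(text, tokens):
    """Return the first token (in list order) that text starts with, else None."""
    for tok in tokens:
        if text.startswith(tok):
            return tok
    return None
```

With “ends” listed first, first_match("endswitch", ["}", "ends", "endswitch", "endevent"]) picks “ends”; put “endswitch” ahead of it and the longer word wins.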

The final settings

Perhaps you are wondering why use the code 2 group and not the code 1 group? Because code 2 matches exact words (give or take case sensitivity) whilst the code 1 group matches by start of word. For example, if I had a (in my case fictional) “endsound” keyword then using the code 1 group it would recognise the “ends” bit of “endsound”.
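
To see what that difference does to folding, here is a Python sketch that works out the nesting depth line by line, with a switch between whole-word and start-of-word matching. It is my own simplified model: it splits on whitespace (so ‘{’ and ‘}’ must stand alone) and uses the fictional “endsound” from above.

```python
def fold_levels(lines, openers, closers, whole_word=True):
    """Return the nesting depth in effect before each line is read."""
    def hits(word, tokens):
        if whole_word:
            return word in tokens                       # code 2 style: exact word
        return any(word.startswith(t) for t in tokens)  # code 1 style: prefix

    depth = 0
    levels = []
    for line in lines:
        levels.append(depth)
        for word in line.split():
            if hits(word, openers):
                depth += 1
            elif hits(word, closers):
                depth = max(depth - 1, 0)  # never go below the top level
    return levels
```

Run it over a defstate block containing an “endsound 3” line and the whole-word version keeps the block intact, whilst the prefix version closes a fold early at “endsound”.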

Backup – or Packup

You’ll no doubt want to back up your UDL, hey, even share your masterpiece with the rest of the world. Easy, there is an [Export…] button at the top of the UDL dialog.

There’s More

I think you can go further with UDLs, perhaps provide options such as white or black backgrounds and probably a lot more besides – some of the UDLs on the notepad++ wiki include additional stylers.xml and langs.xml files and there has to be a reason. Maybe I’ll do a part two to this post, but right now the kettle’s boiled and I’m gasping for a cuppa, see ya.

2 comments

  1. Daniel Atwill says:

    Thank you so much, especially for the ((EOL)) comment about delimiters. I’ve been wanting this for a while!

  2. Darin says:

    Thank you! Thank you! I don’t know how many times I’ve tried to create a UDL in the last 5 or so years and I always hit some part I just didn’t understand (Actually, I still don’t fully understand why we use “Prefix 2:” instead of “Prefix 1:”). Thanks to your work, I now have the best UDL I’ve made. Also, I want to make my own custom VBA UDL and discovered in “Folding in code 2” that setting “Open:” to ["Property Get" "Property Let" "Property Set" "Function" "Sub" "If " "For " "Do " "Select Case"] and “Close:” to ["End Property" "End Property" "End Property" "End Function" "End Sub" "End If" "Next " "Loop" "End Select"] works!
