Contents:

1. Introduction

When showing a plain text document (e.g. an email message) inside a HTML page, it is very often desirable to make links clickable, and do other adjustments. This program provides a function to do just that.

2. Overview

Overview

Parse rules to find URLs inside some text
Available options

text-to-html: func [
 "Prepare a plain text string for inclusion in HTML"
 text [string!]
 /with options [block!] "Specify options for the conversion"
 
 /local text-to-html's locals
] [
 Find URLs in the text and turn them into links, encode everything else to HTML
]

2.1 Available options

The /with refinement can be used to specify the following options:

utf8
set to true to use the html-utf8 encoding instead of the html-ascii encoding for all text.
custom-handler
you can set it to a function that will be called to handle all links in the text. The function must take one argument, an object, that the function can modify. The argument is the object returned by parse-uri (which is called on the link being processed), with the addition of the fields target and contents. The former can be used to set the "target" attribute of the resulting <a> tag; the latter can be used to change the link text, and by default it is the original URL as it appears on the text. Please note that you are responsible for the correct HTML encoding of the contents string, as you are allowed to put arbitrary HTML in there.

Available options

default-options: context [
 utf8: no
 custom-handler: none
]

3. Find URLs in the text and turn them into links, encode everything else to HTML

Not as efficient as I'd like it to be...

Find URLs in the text and turn them into links, encode everything else to HTML

options: make default-options any [options [ ]]
output: make string! length? text
encoding: either options/utf8 ['html-utf8] ['html-ascii]
custom-handler?: any-function? get in options 'custom-handler
parse/all text [
 copy str any [punct | space-char] (if str [encode-text/to str encoding output])
 any [
  copy str url-rule (
   either custom-handler? [
    options/custom-handler obj: make parse-uri join add-scheme str [
     target: none
     contents: encode-text str encoding
    ]
    append output {<a href="}
    encode-text/to form-uri obj encoding output
    append output #"^""
    if string? obj/target [
     append output { target="}
     encode-text/to obj/target encoding output
     append output #"^""
    ]
    repend output [
     #">" obj/contents </a>
    ]
   ] [
    insert insert tail output {<a href="} add-scheme
    encode-text/to str encoding output
    append output {">}
    encode-text/to str encoding output
    append output "</a>"
   ]
  )
  copy str any [punct | space-char] (if str [encode-text/to str encoding output])
  |
  copy str text-rule (encode-text/to str encoding output)
 ]
]
output

3.1 text-to-html's locals

text-to-html's locals

output str encoding custom-handler? obj

4. Parse rules to find URLs inside some text

Parse rules to find URLs inside some text

scheme: [
 "http://" | "https://" | "ftp://" | "mailto:"
]
punct-char: charset {.,!()[];:?{}'"<>}
punct: [punct-char | "¿" | "¡"] ; please note that ¿ and ¡ are strings because they are actually UTF-8 sequences
unreserved: union alpha-char union digit charset "-_~/$&*+="
unreserved+: union unreserved charset "@%"
name-or-host: [some unreserved any [some punct some unreserved]]
url-rule: [
 [scheme (add-scheme: "") | "www." (add-scheme: "http://") | "ftp." (add-scheme: "ftp://")]
 some unreserved+ any [some punct some unreserved+]
 |
 name-or-host #"@" name-or-host (add-scheme: "mailto:")
]
non-space: complement space-char
text-rule: [
 some non-space any [space-char | punct]
]