Prepares a string of plain text for inclusion into an HTML page; makes links clickable, encodes to HTML, and so on.
When showing a plain text document (e.g. an email message) inside a HTML page, it is very often desirable to make links clickable, and do other adjustments. This program provides a function to do just that.
〈Overview〉 ≡
〈Parse rules to find URLs inside some text〉
〈Available options〉
text-to-html: func [
"Prepare a plain text string for inclusion in HTML"
text [string!]
/with options [block!] "Specify options for the conversion"
/local 〈text-to-html's locals〉
] [
〈Find URLs in the text and turn them into links, encode everything else to HTML〉
]
The /with refinement can be used to specify the following options:
〈Available options〉 ≡
default-options: context [
utf8: no
custom-handler: none
]
Not as efficient as I'd like it to be...
〈Find URLs in the text and turn them into links, encode everything else to HTML〉 ≡
options: make default-options any [options [ ]]
output: make string! length? text
encoding: either options/utf8 ['html-utf8] ['html-ascii]
custom-handler?: any-function? get in options 'custom-handler
parse/all text [
copy str any [punct | space-char] (if str [encode-text/to str encoding output])
any [
copy str url-rule (
either custom-handler? [
options/custom-handler obj: make parse-uri join add-scheme str [
target: none
contents: encode-text str encoding
]
append output {<a href="}
encode-text/to form-uri obj encoding output
append output #"^""
if string? obj/target [
append output { target="}
encode-text/to obj/target encoding output
append output #"^""
]
repend output [
#">" obj/contents </a>
]
] [
insert insert tail output {<a href="} add-scheme
encode-text/to str encoding output
append output {">}
encode-text/to str encoding output
append output "</a>"
]
)
copy str any [punct | space-char] (if str [encode-text/to str encoding output])
|
copy str text-rule (encode-text/to str encoding output)
]
]
output
〈text-to-html's locals〉 ≡
output str encoding custom-handler? obj
〈Parse rules to find URLs inside some text〉 ≡
scheme: [
"http://" | "https://" | "ftp://" | "mailto:"
]
punct-char: charset {.,!()[];:?{}'"<>}
punct: [punct-char | "¿" | "¡"] ; please note that ¿ and ¡ are strings because they are actually UTF-8 sequences
unreserved: union alpha-char union digit charset "-_~/$&*+="
unreserved+: union unreserved charset "@%"
name-or-host: [some unreserved any [some punct some unreserved]]
url-rule: [
[scheme (add-scheme: "") | "www." (add-scheme: "http://") | "ftp." (add-scheme: "ftp://")]
some unreserved+ any [some punct some unreserved+]
|
name-or-host #"@" name-or-host (add-scheme: "mailto:")
]
non-space: complement space-char
text-rule: [
some non-space any [space-char | punct]
]