Strings are null-terminated sequences of bytes representing sequences of
characters. The usual ASCII characters are represented with a single
byte. Some characters are represented with multiple bytes. Most Lush
functions deal with strings as sequences of bytes without regard to
their character interpretation. Exceptions to this rule are indicated
when appropriate.
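The difference between byte length and character length can be seen outside Lush as well. The following is a small Python sketch (not Lush code), assuming the UTF-8 encoding; the variable names are illustrative only.

```python
# Python sketch (not Lush): byte length versus character length,
# assuming the UTF-8 encoding.
ascii_text = "A"
euro_text = "\u20ac"  # the euro sign

ascii_bytes = ascii_text.encode("utf-8")
euro_bytes = euro_text.encode("utf-8")

print(len(ascii_text), len(ascii_bytes))  # one character, one byte
print(len(euro_text), len(euro_bytes))    # one character, three bytes
```

A byte-oriented function sees the euro sign as three separate bytes, which is why the manual flags the functions that interpret characters instead.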
The textual representation of a string is composed of the characters
enclosed between double quotes. A string may contain macro-characters,
parentheses, semicolons, as well as any other character. A backslash at
the end of a line indicates a multi-line string.
The following ``C style'' escape sequences are recognized inside a
string:
- \\ for a single backslash,
- \" for a double quote,
- \n, \r, \t, \b, \f respectively for a linefeed character (ASCII LF),
a carriage return (ASCII CR), a tab character (ASCII TAB), a backspace
character (ASCII BS), and a formfeed character (ASCII FF),
- \e for an end-of-file character (Stdio's EOF),
- \^? for a control character control-?,
- \ooo for a byte whose octal representation is ooo,
- \xhh for a byte whose hexadecimal representation is hh,
- \uhhhh or \Uhhhhhh for the representation of Unicode character hhhh
or hhhhhh in the current locale. If no such representation exists, the
UTF-8 representation is used.
3.7.0. Basic String Functions
Like most Lush functions, the basic functions operating on strings do
not modify their arguments. Instead, they create a new string based on
their arguments.
See: (> n1 n2)
3.7.0.0. (concat s1 ... sn) [DX]
Concatenates strings s1 to sn.
Example:
? (concat "hello" " my friends")
= "hello my friends"
3.7.0.1. (len s)
Returns the number of bytes in string s.
Example:
? (len "abcd")
= 4
3.7.0.2. (mid s n [l]) [DX]
Returns a substring of s composed of l bytes starting at byte position
n. The position n is a number between 1 and the byte length of the
string minus 1. When argument l is omitted, function mid returns
everything up to the end of the string s.
Example:
? (mid "alphabet" 3 2)
= "ph"
? (mid "alphabet" 3)
= "phabet"
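The 1-based indexing used by mid is easy to get wrong when coming from 0-based languages. The following Python sketch (not Lush code) models the behavior described above on byte strings; the function name and the use of None for an omitted l are illustrative assumptions.

```python
# Python model (not Lush) of the byte-oriented, 1-based `mid`.
def mid(s, n, l=None):
    """Return l bytes of s starting at 1-based position n.

    When l is None, return everything up to the end of s.
    """
    start = n - 1  # convert to Python's 0-based index
    end = len(s) if l is None else start + l
    return s[start:end]

print(mid(b"alphabet", 3, 2))
print(mid(b"alphabet", 3))
```

The two calls mirror the manual's examples for (mid "alphabet" 3 2) and (mid "alphabet" 3).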
3.7.0.3. (right s n) [DX]
Returns a string composed of the n rightmost bytes of s.
Example:
? (right "alphabet" 3)
= "bet"
3.7.0.4. (left s n)
Returns a string composed of the n leftmost bytes of s.
Example:
? (left "alphabet" 3)
= "alp"
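As a cross-check of the left and right examples, here is a Python sketch (not Lush code) of the same byte slicing. The function names are illustrative; the sketch assumes n is between 0 and the length of the string.

```python
# Python model (not Lush) of `left` and `right` on byte strings.
def left(s, n):
    """Return the n leftmost bytes of s."""
    return s[:n]

def right(s, n):
    """Return the n rightmost bytes of s."""
    # len(s) - n is used instead of -n so that n == 0 yields an empty result.
    return s[len(s) - n:]

print(left(b"alphabet", 3))
print(right(b"alphabet", 3))
```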
3.7.0.5. (strins s1 n s2) [DX]
Inserts string s2 into the string s1 at byte offset n and returns the
result. When n is equal to 0, the strins function simply concatenates
s2 and s1.
Example:
? (strins "alphabet" 3 "***")
= "alp***habet"
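The example suggests that n counts the bytes of s1 that precede the inserted text. A Python sketch (not Lush code) of that reading, with an illustrative function name:

```python
# Python model (not Lush) of `strins`: s2 is placed after the first n bytes of s1.
def strins(s1, n, s2):
    """Insert s2 into s1 after its first n bytes."""
    return s1[:n] + s2 + s1[n:]

print(strins(b"alphabet", 3, b"***"))
print(strins(b"alphabet", 0, b"***"))  # n == 0: s2 followed by s1
```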
3.7.0.6. (strdel s1 n l) [DX]
Removes l bytes from string s1 starting at byte offset n.
Example:
? (strdel "alphabet" 3 2)
= "alabet"
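Judging from the example, the offset n in strdel counts from 1, as in mid: the bytes removed from "alphabet" are "ph". A Python sketch (not Lush code) under that assumption, with an illustrative function name:

```python
# Python model (not Lush) of `strdel`, assuming a 1-based offset n
# as implied by the manual's example.
def strdel(s1, n, l):
    """Remove l bytes from s1 starting at 1-based position n."""
    start = n - 1  # convert to Python's 0-based index
    return s1[:start] + s1[start + l:]

print(strdel(b"alphabet", 3, 2))
```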
3.7.0.7. (index s r [n]) [DX]
Searches for the first occurrence of the string s in the string r,
starting at byte position n. index returns the position of the first
match. If no such occurrence can be found, it returns the empty list.
Example:
? (index "pha" "alpha alphabet alphabetical" 4)
= 9
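The following Python sketch (not Lush code) reproduces the search above. It assumes that positions are 1-based and that n defaults to 1, both inferred from the example; None stands in for the empty list.

```python
# Python model (not Lush) of `index`, assuming 1-based positions.
def index(s, r, n=1):
    """Return the 1-based position of the first occurrence of s in r
    at or after position n, or None when there is none."""
    pos = r.find(s, n - 1)  # bytes.find works with 0-based offsets
    return None if pos < 0 else pos + 1

text = b"alpha alphabet alphabetical"
print(index(b"pha", text, 4))
print(index(b"pha", text))
print(index(b"xyz", text))
```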
3.7.0.8. (upcase s)
Returns string s with all characters converted to uppercase according
to the current locale.
Example:
? (upcase "alphabet")
= "ALPHABET"
3.7.0.9. (upcase1 s) [DX]
Returns string s with its first character converted to uppercase
according to the current locale.
Example:
? (upcase1 "alphabet")
= "Alphabet"
3.7.0.10. (downcase s) [DX]
Returns string s with all characters converted to lowercase according
to the current locale.
Example:
? (downcase "alPHABet")
= "alphabet"
3.7.0.11. (val s)
Returns the numerical value of s considered as a number. Returns the
empty list if s does not represent a decimal or hexadecimal number.
Example:
? (val "3.14")
= 3.14
? (val "abcd")
= ()
? (val "0xABCD")
= 43981
3.7.0.12. (str n)
Returns the decimal string representation of the number n.
Example:
? (str (2* 3.14))
= "6.28"
3.7.0.13. (strhex n) [DX]
Returns the hexadecimal string representation of integer number n.
Example:
? (strhex 18)
= "0x12"
3.7.0.14. (strgptr p) [DX]
Returns the hexadecimal string representation of pointer p preceded by
an ampersand.
3.7.0.15. (asc s)
Returns the value of the first byte of string s. This function causes
an error if s is an empty string.
Example:
? (asc "abcd")
= 97
3.7.0.16. (chr n)
Returns a string containing a single byte whose value is n. Integer n
must be in the range 0 to 255.
Example:
? (chr 48)
= "0"
3.7.0.17. (isprint s) [DX]
Returns t if string s contains only printable characters according to
the current locale.
Example:
? (isprint "alpha bet")
= t
? (isprint "alpha\^Cbet")
= ()
3.7.0.18. (pname l)
Returns a string representation for the lisp object l. pname is able to
give a string representation for numbers, strings, symbols, lists, etc.
Example:
? (pname (cons 'a '(b c)))
= "(a b c)"
3.7.0.19. (sprintf format ... args ... ) [DX]
Like the C language function sprintf, this function returns a string
similar to the format string format, except that the following escape
sequences are replaced by a representation of the corresponding
arguments of sprintf:
- "%%" is replaced by a single %.
- "%l" is replaced by a representation of a lisp object.
- "%[-][n]s" is replaced by a string, right justified in a field of
length n if n is specified. When the optional minus sign is present,
the string is left justified.
- "%[-][n]d" is replaced by an integer, right justified in a field of
n characters if n is specified. When the optional minus sign is
present, the integer is left justified.
- "%[-][n[.m]]c", where c is one of the characters e, f or g, is
replaced by a floating point number in a field of n characters, with m
digits after the decimal point. e specifies a format with an exponent,
f specifies a format without an exponent, and g uses whichever format
is more compact. When the optional minus sign is present, the number is
left justified.
Example:
? (sprintf "%5s(%3d) is equal to %6.3f\n" "sqrt" 2 (sqrt 2))
= " sqrt(  2) is equal to  1.414\n"
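The field widths are easier to see when the padding is made explicit. Python's % operator follows the same C conventions for %s, %d and %f, so the following sketch (Python, not Lush; it does not cover Lush's "%l") reproduces the example above:

```python
# Python sketch (not Lush): C-style field widths with the % operator.
import math

line = "%5s(%3d) is equal to %6.3f\n" % ("sqrt", 2, math.sqrt(2))
print(repr(line))

# A minus sign left-justifies the value within its field.
left_justified = "%-5s|" % "ab"
print(repr(left_justified))
```

"sqrt" is padded to 5 characters, 2 is padded to 3 characters, and 1.414 is padded to 6 characters, which accounts for the runs of spaces in the result.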
3.7.0.20. (strip s) [DE] (sysenv.lsh)
This function deletes the leading and trailing spaces in string s.
Example:
? (strip " This sentence is full of spaces. ")
3.7.0.21. (stripl s) [DE] (sysenv.lsh)
This function deletes the leading spaces in string s.
Example:
? (stripl " This sentence is full of spaces. ")
3.7.0.22. (stripr s) [DE] (sysenv.lsh)
This function deletes the trailing spaces in string s.
Example:
? (stripr " This sentence is full of spaces. ")
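The three stripping functions differ only in which end of the string they trim. A Python sketch (not Lush code) of the described behavior follows; it assumes that only the space character is removed, which is how the descriptions above read, and the function names are illustrative.

```python
# Python model (not Lush) of strip, stripl and stripr.
# Assumption: only the space character " " is removed.
def strip(s):
    return s.strip(" ")

def stripl(s):
    return s.lstrip(" ")

def stripr(s):
    return s.rstrip(" ")

sample = "   full of spaces   "
print(repr(strip(sample)))
print(repr(stripl(sample)))
print(repr(stripr(sample)))
```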
3.7.1. Regular Expressions (regex)
A regular expression describes a family of strings built according to
the same pattern. A regular expression is represented by a string which
``matches'' (using certain conventions) any string in the family. Lush
provides four regular expression primitives (regex-match,
regex-extract, regex-seek, and regex-subst) and several library
functions.
The conventions for describing regular expressions in Lush are quite
similar to those used by the egrep Unix utility:
- An ordinary character matches itself. Some characters, namely
( ) \ [ ] | . ? and *, have a special meaning and should be quoted by
prepending a backslash \. The string "\\\\" is actually composed of two
backslashes (because backslashes in strings should be escaped!), and
thus matches a single backslash.
- A dot . matches any byte.
- A caret ^ matches the beginning of the string.
- A dollar sign $ matches the end of the string.
- A range specification matches any specified byte. For example,
regular expression [YyNn] matches Y, y, N or n, regular expression
[0-9] matches any digit, regular expression [^0-9] matches any byte
that is not a digit, and regular expression []A-Za-z] matches a closing
bracket or any uppercase or lowercase letter.
- The concatenation of two regular expressions matches the
concatenation of two strings that each match the corresponding regular
expression. Regular expressions can be grouped with parentheses and
modified by the ?, + and * characters.
- A regular expression followed by a question mark ? matches 0 or 1
instance of that regular expression.
- A regular expression followed by a plus sign + matches 1 or more
instances of that regular expression.
- A regular expression followed by a star * matches 0 or more instances
of that regular expression.
- Finally, two regular expressions separated by a bar | match any
string matching the first or the second regular expression.
Parentheses can be used to group regular expressions. For instance, the
regular expression "(+|-)?[0-9]+(\.[0-9]*)?" matches a signed number
with an optional fractional part. Furthermore, there is a ``register''
associated with each parenthesized part of a regular expression. The
matching routines use these registers to keep track of the characters
matched by the corresponding part of the regular expression. This is
useful with functions regex-extract and regex-subst.
3.7.1.0. (regex-match r s) [DX]
Returns t if regular expression r exactly matches the entire string s.
Returns the empty list otherwise.
Example:
? (regex-match "(+|-)?[0-9]+(\\.[0-9]*)?" "-56")
= t
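Whole-string matching can be tried out with Python's re module, whose re.fullmatch also requires the pattern to cover the entire string. This is a Python sketch, not Lush code: Python's syntax differs slightly (the plus sign inside the group must be escaped), and the names NUMBER and regex_match are illustrative.

```python
# Python sketch (not Lush): whole-string matching with re.fullmatch.
import re

# Signed number with an optional fractional part.
# Unlike the Lush pattern, Python requires the literal + to be escaped.
NUMBER = r"(\+|-)?[0-9]+(\.[0-9]*)?"

def regex_match(r, s):
    """Return True when r matches the entire string s."""
    return re.fullmatch(r, s) is not None

print(regex_match(NUMBER, "-56"))
print(regex_match(NUMBER, "-56 apples"))
```

The second call fails because the trailing text is not covered by the pattern, which is the behavior the description above attributes to regex-match.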
3.7.1.1. (regex-extract r s) [DX]
If regular expression r matches the entire string s, this function
returns a list of strings representing the contents of each register,
that is to say the characters matched by each section of the regular
expression r delimited by parentheses. This is useful for extracting
specific segments of a string.
If the regular expression r does not match the string s, function
regex-extract returns the empty list. If the regular expression r
matches the string but does not contain parentheses, this function
returns a list containing the initial string s.
Example:
? (regex-extract "(+|-)?([0-9]+)(\\.[0-9]*)?" "-56.23")
= ("-" "56" ".23")
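Registers correspond closely to capture groups in other regular expression libraries. The following Python sketch (not Lush code) mimics the three cases described above. The function name is illustrative, None stands in for the empty list, and returning an empty string for an optional group that did not take part in the match is an assumption, since the manual does not say what Lush does in that case.

```python
# Python sketch (not Lush): registers modeled as capture groups.
import re

def regex_extract(r, s):
    """Return the text of each parenthesized group when r matches all of s."""
    m = re.fullmatch(r, s)
    if m is None:
        return None            # no match
    if m.re.groups == 0:
        return [s]             # no parentheses: the whole string
    # Assumption: groups that did not participate become empty strings.
    return [g if g is not None else "" for g in m.groups()]

print(regex_extract(r"(\+|-)?([0-9]+)(\.[0-9]*)?", "-56.23"))
print(regex_extract(r"[0-9]+", "42"))
```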
3.7.1.2. (regex-seek r s [start]) [DX]
Searches for the first substring in s that matches the regular
expression r, starting at position start in s. If the argument start is
not provided, string s is searched from the beginning.
If such a substring is found, regex-seek returns a list (begin length),
where begin is the index of the first character of the substring, and
length is the length of the substring. The instruction
(mid s begin length) may be used to extract this substring.
If no such substring exists, regex-seek returns the empty list.
Example:
? (regex-seek "(+|-)?[0-9]+(\\.[0-9]*)?," "a=56.2, b=57,")
= (3 5)
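The (begin length) pair can be checked against a 0-based regular expression library by shifting the start by one. Below is a Python sketch (not Lush code); it assumes that begin and start are 1-based, as the example and the reference to mid suggest, and it uses None for the empty list. The function name is illustrative.

```python
# Python sketch (not Lush): locating the first match as (begin, length),
# assuming 1-based positions.
import re

def regex_seek(r, s, start=1):
    """Return (begin, length) of the first match of r in s at or after start."""
    m = re.compile(r).search(s, start - 1)  # Python offsets are 0-based
    if m is None:
        return None
    return (m.start() + 1, m.end() - m.start())

pattern = r"(\+|-)?[0-9]+(\.[0-9]*)?,"
text = "a=56.2, b=57,"
print(regex_seek(pattern, text))
print(regex_seek(pattern, text, 8))
```

With the first result (3, 5), taking 5 characters from position 3 of the text yields "56.2,".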
3.7.1.3. (regex-subst r s str) [DX]
Replaces all substrings matching regular expression r in string str by
string s.
A ``register'' is associated with each piece of the regular expression
r enclosed within parentheses. Registers are numbered from %0 to %9.
During each match, the substring of str matching each piece of the
regular expression is stored into the corresponding register.
During the replacement process, the sequences %0 to %9 in the
replacement string s are replaced by the contents of the corresponding
register. (A single % is denoted as %%.)
Example:
? (regex-subst "([a-h])([1-8])" "%1%0" "e2-e4, d7-d5, d2-d4, d5xd4?")
= "2e-4e, 7d-5d, 2d-4d, 5dx4d?"
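Register substitution can be modeled with a replacement callback. The following Python sketch (not Lush code) expands %0 to %9 and %% in the replacement string for every match; register %k is mapped to Python's capture group k+1 because Python numbers groups from 1. The function name is illustrative, and the sketch assumes every register used in the replacement exists in the pattern.

```python
# Python sketch (not Lush): substitution with %0..%9 registers.
import re

def regex_subst(r, s, text):
    """Replace every match of r in text by s, expanding %0..%9 and %%."""
    def expand(m):
        out = []
        i = 0
        while i < len(s):
            c = s[i]
            if c == "%" and i + 1 < len(s):
                nxt = s[i + 1]
                if nxt == "%":                 # %% stands for a single %
                    out.append("%")
                    i += 2
                    continue
                if nxt in "0123456789":        # register %k is group k+1
                    out.append(m.group(int(nxt) + 1) or "")
                    i += 2
                    continue
            out.append(c)
            i += 1
        return "".join(out)
    return re.sub(r, expand, text)

print(regex_subst(r"([a-h])([1-8])", "%1%0", "e2-e4, d7-d5, d2-d4, d5xd4?"))
```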
3.7.1.4. (regex-rseek r s [n [gr]]) [DE] (sysenv.lsh)
This function recursively seeks the first occurrence of r in s and
returns the list of the locations found.
When argument n is provided, it seeks and returns the locations of the
first n occurrences, and it returns () on failure.
Optional regex gr defines the garbage allowed before and between
occurrences. When n is not provided, this function also checks the
garbage after the occurrences. If disallowed garbage is found, the
function returns (). By default, any garbage is allowed.
Since even empty garbage is checked, a caret "^" is often added to gr.
3.7.1.5. (regex-split r s [n [gr [neg]]]) [DE] (sysenv.lsh)
This function splits a string s into occurrences of r.
When integer n is provided, this function returns only the first n
occurrences.
When regex gr is provided, garbage is checked (see function
regex-rseek).
When neg is provided and non-nil, this function returns the garbage
instead. When both n and neg are provided and non-nil, the n pieces of
garbage before and between the first n occurrences are returned.
3.7.1.6. (regex-skip r s [n [gr [neg]]]) [DE] (sysenv.lsh)
This function skips the first n occurrences of regex r in a string s.
When n is equal to 0, it returns s. When n is lower than 0, it
generates an error. When n is either nil or undefined, it is set to 1.
When neg is either nil or undefined, it returns the right residual of s
just following the n-th occurrence.
When neg is not nil, it returns the right residual of s beginning with
the n-th occurrence.
When regex gr is provided, garbage is checked (see function
regex-rseek).
3.7.1.7. (regex-count r s) [DE] (sysenv.lsh)
This function recursively seeks the occurrences of regex r in string s
and returns the number of occurrences found.
3.7.1.8. (regex-tail r s [n [gr [neg]]]) [DE] (sysenv.lsh)
This function recursively seeks the occurrences of regex r in string s.
When neg is either nil or undefined, it returns the right residual of s
beginning before the n-th last occurrence.
When neg is non-nil, it returns the right residual of s beginning after
the n-th last occurrence (and thus beginning before the n-th garbage).
When n is either nil or undefined, it is set to 1.
When regex gr is provided, garbage is checked (see function
regex-rseek).
3.7.1.9. (regex-member rl s) [DE] (sysenv.lsh)
This function returns the first member of list rl that is a regex
matching string s.
3.7.2. International Strings
Lush contains partial support for multibyte strings using an encoding
specified by the locale. This is work in progress.
3.7.2.0. (locale-to-utf8 s) [DX]
Converts a string from locale encoding to UTF-8 encoding. This is a
best effort function: the unmodified string is returned if the
conversion is impossible, either because the string s is incorrect, or
because the system does not provide suitable conversion facilities.
3.7.2.1. (utf8-to-locale-to s)
Converts a string from UTF-8 encoding to locale encoding. This is a
best effort function: the unmodified string is returned if the
conversion is impossible, either because the string s is incorrect, or
because the system does not provide suitable conversion facilities.
3.7.2.2. (explode-chars s) [DX]
Returns a list of integers with the wide character codes of all
characters in the string. This function interprets multibyte sequences
according to the encoding specified by the current locale.
Example (under a UTF-8 locale):
? (explode-chars "\xe2\x82\xac")
= (8364)
3.7.2.3. (implode-chars l) [DX]
Returns a string composed of the characters whose wide character codes
are specified by the list of integers l. Multibyte characters are
generated according to the current locale.
Example (under a UTF-8 locale):
? (implode-chars '(8364 50 51 46 53 32 61 32 32 162 50 51 53 48))
= "€23.5 =  ¢2350"
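The character-level functions can be modeled by decoding the bytes before looking at code points. The following Python sketch (not Lush code) hard-codes UTF-8 where Lush would consult the current locale; the function names are illustrative.

```python
# Python model (not Lush) of explode-chars and implode-chars,
# assuming the locale encoding is UTF-8.
def explode_chars(data):
    """Decode the bytes and return the code point of each character."""
    return [ord(c) for c in data.decode("utf-8")]

def implode_chars(codes):
    """Build the byte string encoding the given code points."""
    return "".join(chr(c) for c in codes).encode("utf-8")

print(explode_chars(b"\xe2\x82\xac"))
codes = [8364, 50, 51, 46, 53, 32, 61, 32, 32, 162, 50, 51, 53, 48]
print(implode_chars(codes).decode("utf-8"))
```

The code list contains two consecutive 32 values, so the resulting string has two spaces between the equals sign and the cent sign.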
3.7.2.4. (explode-bytes s) [DX]
Returns a list of integers representing the sequence of bytes in string
s, regardless of their character interpretation.
Example:
? (explode-bytes "€")
= (226 130 172)
3.7.2.5. (implode-bytes l) [DX]
Assembles a string composed of the bytes whose values are specified by
the list of integers l, regardless of their multibyte representation.
Example:
? (implode-bytes '(226 130 172 50 51))
= "€23"
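The byte-level functions ignore the encoding entirely. A Python sketch (not Lush code) of the same round trip follows, with illustrative function names; UTF-8 is used only to produce and display the sample text.

```python
# Python model (not Lush) of explode-bytes and implode-bytes.
def explode_bytes(data):
    """Return the value of each byte, ignoring character boundaries."""
    return list(data)

def implode_bytes(values):
    """Assemble a byte string from integer values in the range 0 to 255."""
    return bytes(values)

euro = "\u20ac".encode("utf-8")  # the euro sign as UTF-8 bytes
print(explode_bytes(euro))
print(implode_bytes([226, 130, 172, 50, 51]).decode("utf-8"))
```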