Replaces underscores with spaces, replaces non-breaking spaces with normal spaces, removes exotic whitespace, collapses duplicate spaces, and trims whitespace (any char <= U+0020) from the start and end.
Also see WikiTitle.parse().
TODO: better treatment of U+20xx: remove some, replace some by space, others by LF
FIXME: There is no logic to our decoding / encoding of strings, URIs, etc. It's done in too many places. We must set a policy and use distinct classes, not generic strings.
string possibly using '_' instead of ' '
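The cleaning steps described above can be sketched roughly as follows. This is a minimal Python illustration, not the actual implementation: the function name `clean_space` and the set of "exotic" code points are assumptions chosen for the example, since the text does not list which characters are removed.

```python
# Assumption: this set of "exotic" characters (zero-width and directional
# marks) is illustrative only; the source does not enumerate them.
EXOTIC = {
    0x200B, 0x200C, 0x200D, 0x200E, 0x200F,
    0x202A, 0x202B, 0x202C, 0x202D, 0x202E,
    0xFEFF,
}


def clean_space(text: str) -> str:
    """Hypothetical sketch of the whitespace cleaning described above."""
    out = []
    prev_space = False
    for ch in text:
        if ord(ch) in EXOTIC:
            continue  # drop exotic characters entirely
        if ch in ("_", "\u00a0"):
            ch = " "  # underscore and non-breaking space become a plain space
        if ch == " ":
            if prev_space:
                continue  # collapse runs of spaces into one
            prev_space = True
        else:
            prev_space = False
        out.append(ch)
    result = "".join(out)
    # Trim every char <= U+0020 from both ends.
    start, end = 0, len(result)
    while start < end and result[start] <= " ":
        start += 1
    while end > start and result[end - 1] <= " ":
        end -= 1
    return result[start:end]
```

For example, `clean_space("  Émile__Zola\u00a0")` yields `Émile Zola`.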
Replacement string array for StringUtils.escape
SH: I added the characters ^ ? " (not sure why they were removed in the first place)
Removes Wiki emphasis.
The given text without the wiki emphasis
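As a rough illustration of removing wiki emphasis, the sketch below strips the apostrophe runs that MediaWiki uses for bold and italics. The function name `remove_wiki_emphasis` and the longest-run-first replacement order are assumptions made for this example; the real implementation may differ.

```python
def remove_wiki_emphasis(text: str) -> str:
    """Hypothetical sketch: strip '' (italic), ''' (bold) and ''''' (both)."""
    # Remove the longest marker first so that a bold+italic run is not
    # left behind as stray apostrophes.
    for marker in ("'''''", "'''", "''"):
        text = text.replace(marker, "")
    return text
```

A single apostrophe, as in `don't`, is left untouched because it is not an emphasis marker.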
encoded MediaWiki page name, e.g. '%C3%89mile_Zola'. Must not include the namespace (e.g. 'Template:').
Replaces multiple spaces (U+0020) with one, removes spaces from the start and end, replaces spaces with underscores, and percent-encodes the following characters:
"#%<>?[\]^{|}
TODO CENTRAL STRING MANAGEMENT
The result is usable in most parts of an IRI. Note that the ampersand '&' is not escaped.
Should only be used for canonical MediaWiki page names. Not for fragments, not for queries.
TODO: a canonical MediaWiki page name does not contain multiple spaces. We should not clean spaces but simply throw an exception if the name is not canonical.
Canonical MediaWiki page name, e.g. 'Émile Zola'
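The encoding rules described above can be sketched as follows. This is a minimal Python illustration under stated assumptions: the name `wiki_encode` is hypothetical, and the escaped character set is copied from the list as it appears in this text.

```python
import re

# Characters to percent-encode, as listed in the text above.
ESCAPE_CHARS = '"#%<>?[\\]^{|}'


def wiki_encode(name: str) -> str:
    """Hypothetical sketch of encoding a canonical MediaWiki page name."""
    # Collapse runs of U+0020 and trim spaces from both ends.
    collapsed = re.sub(r" +", " ", name).strip(" ")
    # Spaces become underscores.
    underscored = collapsed.replace(" ", "_")
    # Percent-encode only the listed ASCII characters; everything else,
    # including '&' and non-ASCII letters, passes through unchanged.
    return "".join(
        f"%{ord(c):02X}" if c in ESCAPE_CHARS else c for c in underscored
    )
```

For example, `wiki_encode("C# (language)")` yields `C%23_(language)`, while `wiki_encode("AT&T")` is returned unchanged because the ampersand is not escaped.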
Contains several utility functions related to WikiText.