org.dbpedia.extraction.util

WikiUtil

object WikiUtil

Contains several utility functions related to WikiText.

Linear Supertypes
AnyRef, Any

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  5. def cleanSpace(name: String): String

    Replaces underscores with spaces, replaces non-breaking spaces with normal spaces, removes exotic whitespace, collapses duplicate spaces, and trims whitespace (any char <= U+0020) from the start and end.

    Also see WikiTitle.parse().

    TODO: better treatment of U+20xx: remove some, replace some by space, others by LF

    FIXME: There is no logic to our decoding / encoding of strings, URIs, etc. It's done in too many places. We must set a policy and use distinct classes, not generic strings.

    name

    string possibly using '_' instead of ' '
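
The steps listed for cleanSpace can be sketched roughly as follows. This is a hypothetical re-implementation for illustration, not the DBpedia source: the object name `CleanSpaceSketch` and the set of characters treated as "exotic whitespace" are assumptions, since the documentation does not say which characters are meant.

```scala
object CleanSpaceSketch {
  // Assumption: the documentation does not list the "exotic whitespace"
  // characters; this small set (zero-width space, directional marks, BOM)
  // is illustrative only.
  private val exotic: Set[Char] = Set('\u200B', '\u200E', '\u200F', '\uFEFF')

  def cleanSpace(name: String): String = {
    // Underscores and non-breaking spaces (U+00A0) become normal spaces.
    val replaced = name.map {
      case '_' | '\u00A0' => ' '
      case c              => c
    }
    // Drop the characters assumed to be exotic whitespace.
    val filtered = replaced.filterNot(exotic)
    // Collapse runs of spaces, then trim. String.trim removes every
    // char <= U+0020 from both ends, matching the description above.
    filtered.replaceAll(" +", " ").trim
  }
}
```

Under these assumptions, `CleanSpaceSketch.cleanSpace("  Main__Page\u00A0 ")` yields `Main Page`.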

  6. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  7. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  8. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  9. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  10. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  11. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  12. def iriReplacements: Array[String]

    Replacement string array for StringUtils.escape.

    Note (SH): I added ^?" (not sure why they were removed in the first place).

  13. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  14. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  15. final def notify(): Unit

    Definition Classes
    AnyRef
  16. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  17. def removeWikiEmphasis(text: String): String

    Removes Wiki emphasis.

    text
    returns

    The given text without the wiki emphasis
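
As a rough illustration of what removing wiki emphasis can mean: in WikiText, italics and bold are marked with runs of two or more apostrophes. The sketch below simply deletes such runs. It is an assumption about the behaviour, not the actual implementation, which may treat ambiguous apostrophe runs differently; the object name `EmphasisSketch` is hypothetical.

```scala
object EmphasisSketch {
  // Simplified assumption: every run of two or more apostrophes is emphasis
  // markup and is removed. Single apostrophes (as in "it's") are kept.
  def removeWikiEmphasis(text: String): String =
    text.replaceAll("'{2,}", "")
}
```

With this simplification, `EmphasisSketch.removeWikiEmphasis("'''Bold''' and ''italic''")` yields `Bold and italic`.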

  18. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  19. def toString(): String

    Definition Classes
    AnyRef → Any
  20. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  21. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  22. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  23. def wikiDecode(name: String): String

    name

    encoded MediaWiki page name, e.g. '%C3%89mile_Zola'. Must not include the namespace (e.g. 'Template:').
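
The documentation gives only the input format for wikiDecode, so the following is a sketch under stated assumptions: that decoding means percent-decoding as UTF-8 and turning underscores back into spaces. The object name `WikiDecodeSketch` is hypothetical, and the real method may apply further cleaning.

```scala
import java.net.URLDecoder

object WikiDecodeSketch {
  def wikiDecode(name: String): String = {
    // URLDecoder follows form-encoding rules and would turn '+' into a space,
    // so a literal '+' is protected by percent-encoding it first.
    val protectedPlus = name.replace("+", "%2B")
    val decoded = URLDecoder.decode(protectedPlus, "UTF-8")
    // Assumption: underscores in the encoded page name stand for spaces.
    decoded.replace('_', ' ')
  }
}
```

For the example from the parameter description, `WikiDecodeSketch.wikiDecode("%C3%89mile_Zola")` yields `Émile Zola`.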

  24. def wikiEncode(name: String): String

    Replaces multiple spaces (U+0020) by one, removes spaces from start and end, replaces spaces by underscores, and percent-encodes the following characters:

    "#%<>?[\]^{|}

    TODO: central string management.

    The result is usable in most parts of an IRI. The ampersand '&' is not escaped though.

    Should only be used for canonical MediaWiki page names. Not for fragments, not for queries.

    TODO: a canonical MediaWiki page name does not contain multiple spaces. We should not clean spaces but simply throw an exception if the name is not canonical.

    name

    Canonical MediaWiki page name, e.g. 'Émile Zola'
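
The encoding steps described for wikiEncode can be sketched as below. This is an illustrative re-implementation, not the library code: the object name `WikiEncodeSketch` is hypothetical, and the escaped set is taken literally from the character list in the description.

```scala
object WikiEncodeSketch {
  // Characters the description lists as percent-encoded. All are ASCII,
  // so each one encodes to a single %XX triplet.
  private val escaped: Set[Char] = "\"#%<>?[\\]^{|}".toSet

  def wikiEncode(name: String): String = {
    // Splitting on U+0020 and dropping empty pieces collapses repeated
    // spaces and trims both ends; joining with '_' replaces the rest.
    val underscored = name.split(' ').filter(_.nonEmpty).mkString("_")
    // Percent-encode only the listed characters. Everything else,
    // including '&' and non-ASCII letters, is left untouched.
    underscored.map { c =>
      if (escaped(c)) "%%%02X".format(c.toInt) else c.toString
    }.mkString
  }
}
```

Under these assumptions, `WikiEncodeSketch.wikiEncode("  What  is  100%? ")` yields `What_is_100%25%3F`, while `A&B` is returned unchanged, consistent with the note that the ampersand is not escaped.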
