package org.dbpedia.extraction.sources


Type Members

  1. class CompositeSource extends Source

    A source which is composed of multiple child sources. Iterates through all pages of all child sources.

  2. class FileSource extends Source

    Reads wiki pages from text files in the file system.

    Exceptions thrown: FileNotFoundException if the given base could not be found.

  3. class MemorySource extends Source

    A source which yields pages from a user-defined container.

  4. trait Source extends Traversable[WikiPage] with Serializable

    A source of wiki pages.

    TODO: Do we need this class? Methods that take a parameter of this type are less flexible than they could be; for example, they cannot be called with List(page). The only advantage of this class seems to be the definition hasDefiniteSize = false, which can easily be added to any Traversable implementation that actually needs it. We should clearly document the cases in which hasDefiniteSize = false is useful or necessary; the Scala documentation doesn't help much here. Traversable also defines many methods, such as size(), that call foreach() and iterate over the whole collection, which is a huge waste for most of our sources. Even calling head() incurs significant overhead, for example because a file or a network connection is opened. Is there a more appropriate Scala type?

  5. class WikipediaDumpParser extends AnyRef
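
    The foreach-based design described for Source and CompositeSource can be illustrated with a minimal sketch. The names Page, PageSource, InMemorySource, CombinedSource and SourceSketch are hypothetical stand-ins, not the classes of this package, and the sketch avoids Traversable so that it does not depend on a particular Scala version. It shows why counting pages forces a full traversal when foreach is the only primitive.

```scala
// Hypothetical stand-ins for illustration; these are NOT the DBpedia classes.
final case class Page(title: String, text: String)

// A source that exposes only foreach, in the style of Traversable.
trait PageSource {
  def foreach[U](f: Page => U): Unit

  // Like Traversable.size, counting has to visit every page.
  def count: Int = {
    var n = 0
    foreach(_ => n += 1)
    n
  }
}

// Yields pages from a user-supplied container.
final class InMemorySource(pages: Seq[Page]) extends PageSource {
  def foreach[U](f: Page => U): Unit = pages.foreach(f)
}

// Visits all pages of each child source, in the order the children are given.
final class CombinedSource(children: Seq[PageSource]) extends PageSource {
  def foreach[U](f: Page => U): Unit = children.foreach(_.foreach(f))
}

object SourceSketch {
  // Collects the titles of all pages a source yields.
  def titles(source: PageSource): List[String] = {
    val builder = List.newBuilder[String]
    source.foreach(p => builder += p.title)
    builder.result()
  }
}
```

    For an in-memory container the full traversal in count is cheap; for a source backed by a file or a network connection, the same call would read the entire input, which is the cost the TODO above is concerned with.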

Value Members

  1. object WikiSource

    Fetches pages from a MediaWiki.

  2. object XMLSource extends Serializable

    Loads wiki pages from an XML stream using the MediaWiki export format.

    The MediaWiki export format is specified by http://www.mediawiki.org/xml/export-0.4, http://www.mediawiki.org/xml/export-0.5, http://www.mediawiki.org/xml/export-0.6, http://www.mediawiki.org/xml/export-0.8, etc.
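
    As a rough illustration of reading the export format as a stream, the sketch below collects page titles with the JDK's StAX reader. It is not the implementation of XMLSource; the object ExportSketch and the method pageTitles are hypothetical names. It assumes only that each page is a page element containing a title element, and it matches on local names so the export version in the namespace does not matter.

```scala
import java.io.StringReader
import javax.xml.stream.{XMLInputFactory, XMLStreamConstants}
import scala.collection.mutable.ListBuffer

// Hypothetical helper for illustration; not part of this package.
object ExportSketch {
  // Returns the <title> of every <page> element, in document order.
  def pageTitles(xml: String): List[String] = {
    val reader = XMLInputFactory.newInstance()
      .createXMLStreamReader(new StringReader(xml))
    val titles = ListBuffer.empty[String]
    var inPage = false
    try {
      while (reader.hasNext()) {
        reader.next() match {
          case XMLStreamConstants.START_ELEMENT =>
            reader.getLocalName() match {
              case "page" => inPage = true
              // getElementText consumes the text up to the closing tag.
              case "title" if inPage => titles += reader.getElementText()
              case _ =>
            }
          case XMLStreamConstants.END_ELEMENT if reader.getLocalName() == "page" =>
            inPage = false
          case _ =>
        }
      }
    } finally reader.close()
    titles.toList
  }
}
```

    Because the reader pulls one event at a time, pages can be handled as they arrive instead of loading the whole dump into memory.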
