package org.dbpedia.extraction.util

Type Members

  1. class ByteLogger extends (Long, Boolean) ⇒ Unit

    Logs read bytes. Meant to be used with CountingInputStream.

  2. class CountingInputStream extends InputStream

    Counts read bytes, sends number to callback function.
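
    The pairing of CountingInputStream with a (Long, Boolean) ⇒ Unit callback such as ByteLogger can be sketched roughly as follows. This is a minimal illustration, not the DBpedia implementation: the class name CountingStreamSketch is hypothetical, and the meaning of the Boolean argument (true once the stream is closed) is an assumption.

```scala
import java.io.{ByteArrayInputStream, InputStream}

// Hypothetical sketch: wraps a stream and reports the running byte total
// to a (Long, Boolean) => Unit callback after every successful read.
// Assumption: the Boolean flag is true only for the final call on close().
class CountingStreamSketch(in: InputStream, callback: (Long, Boolean) => Unit)
    extends InputStream {

  private var count = 0L

  override def read(): Int = {
    val b = in.read()
    if (b >= 0) { count += 1; callback(count, false) }
    b
  }

  override def read(buf: Array[Byte], off: Int, len: Int): Int = {
    val n = in.read(buf, off, len)
    if (n > 0) { count += n; callback(count, false) }
    n
  }

  override def close(): Unit = {
    callback(count, true) // final report
    in.close()
  }
}
```

    A logging callback would typically print only every few megabytes rather than on every call, to keep the output readable.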

  3. class CssConfigurationMap extends JsonConfig

    Created by Chile on 1/30/2017.

  4. class Date extends Ordered[Date]

  5. class DateFinder[T] extends AnyRef

  6. class ExtractionRecorder[T] extends AnyRef

    Created by Chile on 11/3/2016.

  7. abstract class FileLike[T] extends AnyRef

    Allows common handling of java.io.File and java.nio.file.Path

  8. class FileProcessor extends AnyRef

    Recursively iterates through a directory and calls a user-defined function on each file.

    Exceptions thrown: FileNotFoundException if the given base could not be found.
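
    The behaviour described for FileProcessor can be illustrated with a short sketch. The object name FileWalkSketch and the method processFiles are hypothetical, and visiting entries in name order is an assumption made only to keep the output deterministic.

```scala
import java.io.{File, FileNotFoundException}

// Hypothetical sketch, not the FileProcessor API.
object FileWalkSketch {
  // Calls `action` on every regular file below `base`, depth first.
  // Throws FileNotFoundException if `base` does not exist.
  def processFiles(base: File)(action: File => Unit): Unit = {
    if (!base.exists) throw new FileNotFoundException(s"base not found: $base")

    def walk(f: File): Unit =
      if (f.isDirectory) {
        // listFiles returns null on I/O errors, so guard against that.
        val children = Option(f.listFiles).getOrElse(Array.empty[File])
        children.sortBy(_.getName).foreach(walk)
      } else action(f)

    walk(base)
  }
}
```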

  9. class Finder[T] extends AnyRef

    Helps to find files and directories in a directory structure as used by the Wikipedia dump download site, for example baseDir/enwiki/20120403/enwiki-20120403-pages-articles.xml.bz2

    TODO: wikiNameSuffix doesn't belong here, it should be part of the Language class (which should be renamed to WikiCode or so)
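
    The directory layout in the example path can be sketched as a small path builder. The object DumpPathSketch, its method and its parameter names are hypothetical and are not taken from Finder.

```scala
import java.io.File

// Hypothetical helper illustrating the layout
// baseDir/<wikiName>/<date>/<wikiName>-<date>-<suffix>.
object DumpPathSketch {
  def dumpFile(baseDir: File, wikiName: String, date: String, suffix: String): File = {
    val wikiDir = new File(baseDir, wikiName) // e.g. baseDir/enwiki
    val dateDir = new File(wikiDir, date)     // e.g. baseDir/enwiki/20120403
    new File(dateDir, s"$wikiName-$date-$suffix")
  }
}
```

    For example, dumpFile(new File("baseDir"), "enwiki", "20120403", "pages-articles.xml.bz2") yields the path shown in the description above.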

  10. class JsonConfig extends AnyRef

  11. class Language extends Serializable

    Represents a MediaWiki instance and the language used on it. Initially, this class was only used for xx.wikipedia.org instances, but now we also use it for mappings.dbpedia.org and www.wikidata.org. For each language, there is only one instance of this class.

    TODO: rename this class to WikiCode or so, distinguish between enwiki / enwiktionary etc.
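
    The "one instance per language" guarantee, combined with a companion object that is itself a String ⇒ Language function, could be achieved along these lines. This is only a sketch under assumptions: WikiCodeSketch is a hypothetical stand-in, and the real Language class may build its instances differently (for example from a configuration file).

```scala
import java.util.concurrent.ConcurrentHashMap

// Hypothetical stand-in for a class with exactly one instance per code.
final class WikiCodeSketch private (val code: String) extends Serializable

// The companion is a String => WikiCodeSketch function backed by a cache.
object WikiCodeSketch extends (String => WikiCodeSketch) {
  private val instances = new ConcurrentHashMap[String, WikiCodeSketch]()

  def apply(code: String): WikiCodeSketch = {
    val fresh = new WikiCodeSketch(code)
    // putIfAbsent returns the previously stored instance, or null if none.
    val existing = instances.putIfAbsent(code, fresh)
    if (existing == null) fresh else existing
  }
}
```

    Because the companion is a function, it can be passed directly to collection methods, as in List("en", "de").map(WikiCodeSketch).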

  12. class LazyWikiCaller extends WikiCaller

  13. class MediaWikiConnector extends AnyRef

    The MediaWiki API connector.

  14. class ProxyAuthenticator extends Authenticator

  15. class RecordEntry[T] extends AnyRef

    Provides the attributes needed to record either a successful or a failed extraction.

  16. class RichFile extends FileLike[File]

    Defines additional methods on Files, which are missing in the standard library.

  17. class RichPath extends FileLike[Path]

  18. class RichReader extends AnyRef

  19. class RichStartElement extends AnyRef

    Wrapper class for StartElement so we can use attr and getAttr.

  20. class RichString extends AnyRef

    Defines additional methods on strings, which are missing in the standard library.

  21. class RichWebResource extends FileLike[IRI]

    Provides the same flexibility as RichFile for web resources. No output stream available!

    Created by Chile on 1/30/2017.

  22. class StringPlusser extends AnyRef

  23. class TransitiveClosure[T] extends AnyRef

    Resolves transitive relations in a graph and removes cycles.
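
    One way to read "resolves transitive relations and removes cycles" is sketched below for the simple case where every node has at most one successor (as with page redirects). That restriction, the object name ClosureSketch and the method resolve are all assumptions; the actual TransitiveClosure class may handle more general graphs.

```scala
import scala.annotation.tailrec

// Hypothetical sketch for single-successor graphs only.
object ClosureSketch {
  // Maps each key to the end of its chain; keys whose chain runs into a
  // cycle are dropped from the result.
  def resolve[T](edges: Map[T, T]): Map[T, T] =
    edges.keys.flatMap { start =>
      @tailrec
      def follow(node: T, seen: Set[T]): Option[T] = edges.get(node) match {
        case None                     => Some(node) // end of chain
        case Some(next) if seen(next) => None       // cycle detected
        case Some(next)               => follow(next, seen + next)
      }
      follow(start, Set(start)).map(start -> _)
    }.toMap
}
```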

  24. class TurtleEscaper extends AnyRef

    Escapes a Unicode string according to Turtle / N-Triples format.

    TODO: allow StringBuilder to be null, create one if necessary.
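
    As an illustration of this kind of escaping, the sketch below follows the older ASCII-only N-Triples convention, in which non-ASCII code points are written as \uXXXX or \UXXXXXXXX. The object NTriplesEscapeSketch and its signature are hypothetical, and which convention TurtleEscaper actually applies is not stated here.

```scala
// Hypothetical sketch, assuming ASCII-only output with \u / \U escapes.
object NTriplesEscapeSketch {
  // Appends the escaped form of `input` to `sb` and returns `sb`.
  def escape(sb: StringBuilder, input: String): StringBuilder = {
    var i = 0
    while (i < input.length) {
      val cp = input.codePointAt(i)
      if (cp == '\\') sb.append("\\\\")
      else if (cp == '"') sb.append("\\\"")
      else if (cp == '\n') sb.append("\\n")
      else if (cp == '\r') sb.append("\\r")
      else if (cp == '\t') sb.append("\\t")
      else if (cp >= 0x20 && cp < 0x7F) sb.append(cp.toChar) // printable ASCII
      else if (cp <= 0xFFFF) sb.append("\\u").append(f"$cp%04X")
      else sb.append("\\U").append(f"$cp%08X")               // outside the BMP
      i += Character.charCount(cp) // skip both halves of a surrogate pair
    }
    sb
  }
}
```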

  25. class WikiApi extends AnyRef

    Executes queries to the MediaWiki API.

    TODO: replace this class by code adapted from WikiDownloader.

  26. class WikiCaller extends AnyRef

    Calls a Wikipedia URL, handles redirects to a different language version, processes the response.

  27. class WikiDisambigReader extends AnyRef

    Reads result of the api.php query above.

  28. class WikiDownloader extends AnyRef

    Downloads all pages for a given list of namespaces from api.php and transforms them into the format of the dump files (because XMLSource understands that format).

    TODO: extend this class a bit and replace the XML-handling code in WikiApi.

  29. class WikiInfo extends AnyRef

    Information about a Wikipedia.

  30. class WikiPageEntry extends RecordEntry[WikiPage]

  31. class WikiSettings extends AnyRef

  32. class WikiSettingsReader extends AnyRef

    Reads result of the api.php query above.

    Note: we use linked sets and maps to preserve order. Scala currently has no immutable linked collections, so we use mutable ones (which should also improve performance). Calling .toMap to make them immutable would destroy the order, so we simply return them, but as an immutable interface. Malicious users could still downcast and mutate. Meh.
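
    The trade-off in this note can be shown in a few lines. The sketch below builds a mutable LinkedHashMap and returns it typed as the read-only scala.collection.Map; the object OrderedMapSketch and its method are hypothetical, and using scala.collection.Map as the return type is an assumption about what "immutable interface" means here.

```scala
import scala.collection.mutable

// Hypothetical sketch of "order-preserving map behind a read-only type".
object OrderedMapSketch {
  def read(pairs: Seq[(String, String)]): scala.collection.Map[String, String] = {
    val result = mutable.LinkedHashMap[String, String]()
    for ((k, v) <- pairs) result(k) = v // insertion order is preserved
    result // callers see no mutators, but a downcast would still succeed
  }
}
```

    Iterating over the returned map yields entries in insertion order, which is the property the note wants to keep.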

  33. trait Worker[T <: AnyRef] extends AnyRef

  34. class Workers[T <: AnyRef] extends Closeable

    A simple fixed size thread-pool.

    TODO: If a worker thread dies because of an uncaught exception, it just goes away and we may not fully use all CPUs. Maybe we should start a new worker thread? Or use a thread pool that does that for us? On the other hand - what about worker.init() and worker.destroy()? We probably don't want to call them twice. No, I guess it's better to let the thread die. Users can always catch Throwable in their implementation of Worker.process().

    FIXME: If all worker threads die because of uncaught exceptions, the master thread will probably still add tasks to the queue and block forever. When a worker thread dies, it should count down the number of live threads and if none are left interrupt the master thread if it is blocking in process(). But what if there are multiple master threads? Ough. We need more ways to communicate between masters and workers...
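
    To make the discussion above concrete, here is a minimal fixed-size pool built on a bounded queue. It is a sketch, not the Workers implementation: PoolSketch and its parameters are hypothetical, there is no init() / destroy() lifecycle, and it sidesteps the dying-thread problem by catching Throwable inside the worker loop, which is a different choice from the one the TODO settles on.

```scala
import java.io.Closeable
import java.util.concurrent.ArrayBlockingQueue

// Hypothetical sketch of a fixed-size pool fed through a bounded queue.
class PoolSketch[T <: AnyRef](threads: Int, queueLength: Int)(work: T => Unit)
    extends Closeable {

  private val queue = new ArrayBlockingQueue[AnyRef](queueLength)
  private val stop = new Object // sentinel telling one worker to exit

  private val pool: Seq[Thread] = (1 to threads).map { i =>
    val t = new Thread(s"sketch-worker-$i") {
      override def run(): Unit = {
        var task = queue.take()
        while (task ne stop) {
          // Catching Throwable keeps the worker alive after a failed task.
          try work(task.asInstanceOf[T])
          catch { case e: Throwable => e.printStackTrace() }
          task = queue.take()
        }
      }
    }
    t.start()
    t
  }

  // Blocks the calling (master) thread while the queue is full.
  def process(task: T): Unit = queue.put(task)

  // Sends one sentinel per worker, then waits for all workers to finish.
  override def close(): Unit = {
    pool.foreach(_ => queue.put(stop))
    pool.foreach(_.join())
  }
}
```

    With this design the master thread can still block in process() if every worker has exited, which is the situation the FIXME describes.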

  35. class XMLEventAnalyzer extends AnyRef

    Wraps an XMLEventReader in a fluent API.

  36. class XMLEventBuilder extends AnyRef

    Wraps an XMLEventWriter in a fluent API.

Value Members

  1. object Date

  2. object ExtractorUtils

    Various utils for loading Extractors. I don't like this so much, but it's the only way to reuse extraction configuration code on multiple modules (dump / server).

    User: Dimitris Kontokostas. Created: 5/19/14 11:06 AM.

  3. object IOUtils

    TODO: modify the bzip code such that there are no run-time dependencies on commons-compress. Users should be able to use .gz files without having commons-compress on the classpath. Even better, look for several different bzip2 implementations on the classpath...

  4. object InfoboxMappingsUtils

    Created by aditya on 6/21/16.

  5. object JsonConfig

  6. object Language extends (String) ⇒ Language with Serializable

  7. object MappingsDownloader

    Download mapping pages for all namespaces from http://mappings.dbpedia.org/ and transform them into the format of the dump files (because XMLSource understands that format).

  8. object NumberUtils

  9. object OntologyDownloader

    Download ontology classes and properties from http://mappings.dbpedia.org/ and transform them into the format of the dump files (because XMLSource understands that format). Also save the result as OWL.

  10. object RecordSeverity extends Enumeration

  11. object ResourceWorkers

  12. object RichFile

  13. object RichPath

    This class requires the java.nio.file package, which is available since JDK 7.

    If you want to compile and run DBpedia with an earlier JDK version, delete or blank these two files:

    core/src/main/scala/org/dbpedia/extraction/util/RichPath.scala
    dump/src/main/scala/org/dbpedia/extraction/dump/clean/Clean.scala

    The launchers 'purge-download' and 'purge-extract' in the dump/ module won't work, but they are not vitally necessary.

  14. object RichReader

  15. object RichStartElement

  16. object RichString

    Defines additional methods on strings, which are missing in the standard library.

  17. object RichWebResource

  18. object SimpleWorkers

  19. object StringUtils

  20. object TurtleUtils

    Helper methods to escape / unescape Turtle / N-Triples.

    TODO: most of these methods could be much more efficient - they should only create a StringBuffer if the input actually needs to be changed. Otherwise, they should simply return the input string. See StringUtils.escape.

  21. object WikiApi

  22. object WikiDisambigReader

  23. object WikiInfo

    Helper methods to create WikiInfo objects.

  24. object WikiSettingsReader

  25. object WikiUtil

    Contains several utility functions related to WikiText.

  26. object WikidataUtil

    Created by ali on 2/1/15.

  27. object WorkerState extends Enumeration

    Provides the overall state of a worker

  28. object Workers

    Constants for workers.

  29. object XmlUtils
