Class DocumentConverter


  • public class DocumentConverter
    extends Object
    The class that does the heavy lifting of converting a jsoup Document into valid Markdown.
    Author:
    Phil DeJarnett
    • Constructor Detail

      • DocumentConverter

        public DocumentConverter(Options options)
        Creates a DocumentConverter with the given options.
        Parameters:
        options - Options for this converter.
    • Method Detail

      • getOptions

        public Options getOptions()
      • setOutput

        public void setOutput(BlockWriter output)
      • addInlineNode

        public void addInlineNode(NodeHandler handler,
                                  String tagnames)
        Customize the processing for a node. The handler is added to both the inline list and the block list. The inline list is used for nodes that do not contain line breaks, such as <em> or <strong>. The tagnames parameter is a comma-delimited list of tag names to which this handler should be applied.
        Parameters:
        handler - The handler for the nodes
        tagnames - One or more tagnames
      • addBlockNode

        public void addBlockNode(NodeHandler handler,
                                 String tagnames)
        Customize the processing for a node. The handler is added to the block list only, and it should use the BlockWriter.startBlock() and BlockWriter.endBlock() methods as appropriate. The tagnames parameter is a comma-delimited list of tag names to which this handler should be applied.
        Parameters:
        handler - The handler for the nodes
        tagnames - One or more tagnames
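The difference between addInlineNode and addBlockNode comes down to which lookup tables a handler lands in. The sketch below is a minimal illustration under stated assumptions, not Remark's implementation: HandlerRegistry, splitTags and the one-method NodeHandler interface are hypothetical stand-ins, and lower-casing the tag names is an assumption based on jsoup normalizing tag names to lower case by default.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for Remark's NodeHandler; only used as a map value here.
interface NodeHandler {
    String describe();
}

// Hypothetical registry illustrating the inline/block split described above.
class HandlerRegistry {
    // Handlers valid inside inline content (no line breaks).
    final Map<String, NodeHandler> inlineNodes = new HashMap<>();
    // Handlers valid at block level; inline handlers are registered here too.
    final Map<String, NodeHandler> blockNodes = new HashMap<>();

    // Like addInlineNode: register the handler in both maps.
    void addInlineNode(NodeHandler handler, String tagnames) {
        for (String tag : splitTags(tagnames)) {
            inlineNodes.put(tag, handler);
            blockNodes.put(tag, handler);
        }
    }

    // Like addBlockNode: register the handler in the block map only.
    void addBlockNode(NodeHandler handler, String tagnames) {
        for (String tag : splitTags(tagnames)) {
            blockNodes.put(tag, handler);
        }
    }

    // "em, I,strong" -> ["em", "i", "strong"]; blank entries are dropped and
    // names are lower-cased (assumed, matching jsoup's default normalization).
    static String[] splitTags(String tagnames) {
        return Arrays.stream(tagnames.split(","))
                .map(String::trim)
                .map(t -> t.toLowerCase(Locale.ROOT))
                .filter(t -> !t.isEmpty())
                .toArray(String[]::new);
    }
}
```

Registering "em, i" inline makes both tags resolvable in inline and block contexts, while a block-only "p" handler is invisible to inline lookups, which keeps line-breaking handlers out of inline content.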
      • convert

        public void convert(org.jsoup.nodes.Document doc,
                            Writer out)
        Convert a document to the given writer.

        Note: It is up to the calling class to handle closing the writer!

        Parameters:
        doc - Document to convert
        out - Writer to receive the final output
      • convert

        public void convert(org.jsoup.nodes.Document doc,
                            OutputStream out)
        Convert a document to the given output stream.

        Note: It is up to the calling class to handle closing the stream!

        Parameters:
        doc - Document to convert
        out - OutputStream to receive the final output
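Both convert overloads leave closing to the caller. The sketch below illustrates that ownership rule under assumptions: StreamConvertSketch is hypothetical, a ready-made markdown String stands in for the real Document conversion, UTF-8 is an assumed output charset, and rethrowing I/O failures as UncheckedIOException is an assumption based only on the signatures above declaring no checked exceptions.

```java
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; `markdown` replaces the real Document-to-Markdown step.
class StreamConvertSketch {
    // Like convert(Document, Writer): write, flush, but never close `out`.
    static void convert(String markdown, Writer out) {
        try {
            out.write(markdown);
            out.flush(); // push buffered text through; `out` stays open for the caller
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Like convert(Document, OutputStream): wrap the stream and delegate.
    // The wrapper is deliberately not closed, because closing it would close `out`.
    static void convert(String markdown, OutputStream out) {
        Writer writer = new BufferedWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8));
        convert(markdown, writer);
    }

    public static void main(String[] args) throws IOException {
        // The caller opened the stream, so the caller closes it.
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream()) {
            convert("# Title\n", bytes);
            System.out.print(new String(bytes.toByteArray(), StandardCharsets.UTF_8));
        }
    }
}
```

Flushing without closing matters for the OutputStream case: the buffered wrapper would otherwise hold back output, yet closing it would also close a stream the converter does not own.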
      • convert

        public String convert(org.jsoup.nodes.Document doc)
        Convert a document and return a string. Use this method whenever the final output is needed as a string; it attempts to calculate the buffer size necessary to hold the entire output.
        Parameters:
        doc - Document to convert
        Returns:
        The Markdown-formatted string.
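The String-returning convert pre-sizes its buffer. The sizing rule is not documented here, so the sketch below uses a hypothetical heuristic: the source HTML length, with a floor of 256 characters, on the assumption that stripping tags usually makes the Markdown shorter than the HTML. StringConvertSketch, estimateCapacity and convertToString are illustrative names only.

```java
import java.io.StringWriter;
import java.util.function.Consumer;

// Hypothetical sketch of a pre-sized, String-returning conversion.
class StringConvertSketch {
    // Hypothetical floor so tiny documents still get a reasonable buffer.
    static final int MIN_CAPACITY = 256;

    // Hypothetical heuristic: use the HTML length as the initial buffer size.
    static int estimateCapacity(int htmlLength) {
        return Math.max(MIN_CAPACITY, htmlLength);
    }

    // Pre-size the buffer, let a Writer-based conversion fill it, return the text.
    // If the guess is too small, StringWriter simply grows; the estimate only saves copying.
    static String convertToString(int htmlLength, Consumer<StringWriter> writerConversion) {
        StringWriter out = new StringWriter(estimateCapacity(htmlLength));
        writerConversion.accept(out);
        return out.toString();
    }
}
```

A wrong estimate never affects the result, only how often the internal buffer is reallocated, which is why this overload is preferable to collecting the output from the Writer variant yourself.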
      • walkNodes

        public void walkNodes(NodeHandler currentNode,
                              org.jsoup.nodes.Element el)
        Loops over the children of an HTML Element, handling TextNodes and child Elements.
        Parameters:
        currentNode - The default node handler for TextNodes and IgnoredHTMLElements.
        el - The parent HTML Element whose children are being looked at.
      • walkNodes

        public void walkNodes(NodeHandler currentNodeHandler,
                              org.jsoup.nodes.Element el,
                              Map<String, NodeHandler> nodeList)
        Loops over the children of an HTML Element, handling TextNodes and child Elements.
        Parameters:
        currentNodeHandler - The default node handler for TextNodes and IgnoredHTMLElements.
        el - The parent HTML Element whose children are being looked at.
        nodeList - The map of valid nodes at this level. Should be either blockNodes or inlineNodes.
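The dispatch rule in walkNodes can be sketched as follows: text children and elements missing from nodeList go to the current handler, and every other element goes to its registered handler. This is a minimal sketch under assumptions, not Remark's code: Node replaces jsoup's Element and TextNode, the NodeHandler interface and its method names are simplified and hypothetical, and the `<tag/>` placeholder for ignored elements exists only to make the demo output visible.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for jsoup nodes: tag == null marks a text node.
class Node {
    final String tag;
    final String text;
    final List<Node> children;

    private Node(String tag, String text, List<Node> children) {
        this.tag = tag;
        this.text = text;
        this.children = children;
    }

    static Node text(String t) { return new Node(null, t, Collections.<Node>emptyList()); }

    static Node el(String tag, Node... kids) { return new Node(tag, null, Arrays.asList(kids)); }
}

// Hypothetical, simplified handler contract (not Remark's real NodeHandler).
interface NodeHandler {
    // Called for elements that have a registered handler at this level.
    void handleNode(Node el, StringBuilder out);

    // The current (default) handler receives text nodes...
    default void handleTextNode(Node text, StringBuilder out) { out.append(text.text); }

    // ...and elements not valid at this level (placeholder output for the demo).
    default void handleIgnoredElement(Node el, StringBuilder out) {
        out.append('<').append(el.tag).append("/>");
    }
}

class WalkSketch {
    static void walkNodes(NodeHandler current, Node el,
                          Map<String, NodeHandler> nodeList, StringBuilder out) {
        for (Node child : el.children) {
            if (child.tag == null) {
                current.handleTextNode(child, out);
            } else {
                NodeHandler handler = nodeList.get(child.tag);
                if (handler != null) {
                    handler.handleNode(child, out);
                } else {
                    current.handleIgnoredElement(child, out);
                }
            }
        }
    }

    public static void main(String[] args) {
        Map<String, NodeHandler> inline = new HashMap<>();
        inline.put("em", (e, o) -> o.append('*').append(e.children.get(0).text).append('*'));
        NodeHandler fallback = (e, o) -> { };
        Node p = Node.el("p", Node.text("a "), Node.el("em", Node.text("b")),
                Node.text(" "), Node.el("span"));
        StringBuilder sb = new StringBuilder();
        walkNodes(fallback, p, inline, sb);
        System.out.println(sb); // a *b* <span/>
    }
}
```

Passing inlineNodes instead of blockNodes is what keeps block-level handlers from firing inside inline content: an unregistered tag such as span falls back to the current handler.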
      • getInlineContent

        public String getInlineContent(NodeHandler currentNode,
                                       org.jsoup.nodes.Element el)
        Recursively processes child nodes and returns the potential output string.
        Parameters:
        currentNode - The default node handler for TextNodes and IgnoredHTMLElements.
        el - The parent HTML Element whose children are being looked at.
        Returns:
        The potential output string.
      • getInlineContent

        public String getInlineContent(NodeHandler currentNode,
                                       org.jsoup.nodes.Element el,
                                       boolean undoLeadingEscapes)
        Recursively processes child nodes and returns the potential output string.
        Parameters:
        currentNode - The default node handler for TextNodes and IgnoredHTMLElements.
        el - The parent HTML Element whose children are being looked at.
        undoLeadingEscapes - If true, leading escapes are removed.
        Returns:
        The potential output string.
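What counts as a "leading escape" is not defined here. The sketch below assumes it means a backslash the converter placed before a character that would otherwise begin Markdown block syntax at the start of a line (#, -, +, > or *). It also assumes that inline contexts such as link text do not need that protection. InlineContentSketch and undoLeadingEscapes are hypothetical names, and the escape set is an assumption.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of the undoLeadingEscapes option.
class InlineContentSketch {
    // Assumed escape set: a backslash at the start of a line followed by a
    // character that could start a heading, list item, or blockquote.
    private static final Pattern LEADING_ESCAPE = Pattern.compile("(?m)^\\\\([-#+>*])");

    // Remove only the escape backslash, keeping the escaped character.
    static String undoLeadingEscapes(String s) {
        return LEADING_ESCAPE.matcher(s).replaceAll("$1");
    }
}
```

Escapes in the middle of a line are left untouched, since only the start-of-line position can trigger block syntax.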
      • addLink

        public String addLink(String url,
                              String recommendedName,
                              boolean image)
        Adds a link to the link set and returns the actual ID for the link.
        Parameters:
        url - URL for link
        recommendedName - A recommended name for non-simple link IDs. This might be modified.
        image - If true, use "img-" instead of "link-" for simple link IDs.
        Returns:
        The actual link ID for this URL.
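The parameter notes above imply two ID schemes: numbered "link-"/"img-" IDs when simple IDs are in use, and otherwise a recommended name that may be altered. The sketch below illustrates one way this could work. The details are assumptions: a repeated URL reuses its existing ID, simple IDs are counted separately per kind, and recommended names are slugified and suffixed with a counter on collision. LinkSet, simpleIds, slug and uniqueId are hypothetical names.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical link set; field names and numbering scheme are assumptions.
class LinkSet {
    private final boolean simpleIds;                    // assumed option: numbered IDs
    private final Map<String, String> idsByUrl = new HashMap<>();
    private final Set<String> usedIds = new HashSet<>();
    private int linkCount = 0;
    private int imageCount = 0;

    LinkSet(boolean simpleIds) { this.simpleIds = simpleIds; }

    String addLink(String url, String recommendedName, boolean image) {
        String existing = idsByUrl.get(url);
        if (existing != null) {
            return existing;                            // assumed: reuse ID for a repeated URL
        }
        String id = simpleIds
                ? (image ? "img-" + (++imageCount) : "link-" + (++linkCount))
                : uniqueId(slug(recommendedName));
        usedIds.add(id);
        idsByUrl.put(url, id);
        return id;
    }

    // Lower-case, collapse runs of non-alphanumerics into "-", trim edge dashes.
    private static String slug(String name) {
        if (name == null) {
            return "link";
        }
        String s = name.toLowerCase(Locale.ROOT)
                .replaceAll("[^a-z0-9]+", "-")
                .replaceAll("^-|-$", "");
        return s.isEmpty() ? "link" : s;
    }

    // The "might be modified" case: append a counter until the ID is unused.
    private String uniqueId(String base) {
        String id = base;
        for (int n = 2; usedIds.contains(id); n++) {
            id = base + "-" + n;
        }
        return id;
    }
}
```

For example, with named IDs, "Home Page" and then "Home page!" for a different URL would yield "home-page" and "home-page-2". With simple IDs, an image followed by two links yields "img-1", "link-1" and "link-2".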