Package com.overzealous.remark.convert
Class DocumentConverter
java.lang.Object
    com.overzealous.remark.convert.DocumentConverter
-
public class DocumentConverter extends Object
The class that does the heavy lifting for converting a JSoup Document into valid Markdown.
Author:
Phil DeJarnett
-
-
Constructor Summary
Constructor - Description
DocumentConverter(Options options) - Creates a DocumentConverter with the given options.
-
Method Summary
Modifier and Type, Method - Description
void addBlockNode(NodeHandler handler, String tagnames) - Customize the processing for a node.
void addInlineNode(NodeHandler handler, String tagnames) - Customize the processing for a node.
String addLink(String url, String recommendedName, boolean image) - Adds a link to the link set, and returns the actual ID for the link.
String convert(org.jsoup.nodes.Document doc) - Convert a document and return a string.
void convert(org.jsoup.nodes.Document doc, OutputStream out) - Convert a document to the given output stream.
void convert(org.jsoup.nodes.Document doc, Writer out) - Convert a document to the given writer.
Map<String,NodeHandler> getBlockNodes()
TextCleaner getCleaner()
String getInlineContent(NodeHandler currentNode, org.jsoup.nodes.Element el) - Recursively processes child nodes and returns the potential output string.
String getInlineContent(NodeHandler currentNode, org.jsoup.nodes.Element el, boolean undoLeadingEscapes) - Recursively processes child nodes and returns the potential output string.
Map<String,NodeHandler> getInlineNodes()
Options getOptions()
BlockWriter getOutput()
void setOutput(BlockWriter output)
void walkNodes(NodeHandler currentNode, org.jsoup.nodes.Element el) - Loops over the children of an HTML Element, handling TextNode and child Elements.
void walkNodes(NodeHandler currentNodeHandler, org.jsoup.nodes.Element el, Map<String,NodeHandler> nodeList) - Loops over the children of an HTML Element, handling TextNode and child Elements.
-
-
-
Constructor Detail
-
DocumentConverter
public DocumentConverter(Options options)
Creates a DocumentConverter with the given options.
Parameters:
options - Options for this converter.
-
-
Method Detail
-
getOptions
public Options getOptions()
-
getCleaner
public TextCleaner getCleaner()
-
getBlockNodes
public Map<String,NodeHandler> getBlockNodes()
-
getInlineNodes
public Map<String,NodeHandler> getInlineNodes()
-
getOutput
public BlockWriter getOutput()
-
setOutput
public void setOutput(BlockWriter output)
-
addInlineNode
public void addInlineNode(NodeHandler handler, String tagnames)
Customize the processing for a node. This node is added to both the inline list and the block list. The inline list is used for nodes that do not contain line breaks, such as <em> or <strong>. The tagnames argument is a comma-delimited list of tag names to which this handler should be applied.
Parameters:
handler - The handler for the nodes
tagnames - One or more tagnames
-
addBlockNode
public void addBlockNode(NodeHandler handler, String tagnames)
Customize the processing for a node. This node is added to the block list only. The node handler should properly use the BlockWriter.startBlock() and BlockWriter.endBlock() methods as appropriate. The tagnames argument is a comma-delimited list of tag names to which this handler should be applied.
Parameters:
handler - The handler for the nodes
tagnames - One or more tagnames
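The registration rules described for addInlineNode and addBlockNode can be illustrated with a small stand-alone sketch. It does not use the library: TagHandler and HandlerRegistry are hypothetical stand-ins for NodeHandler and DocumentConverter, and trimming whitespace around each tag name is an assumption the documentation does not state.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for NodeHandler; the real interface defines its own methods.
interface TagHandler {
    String describe();
}

// Hypothetical registry that mirrors the documented inline/block behaviour.
class HandlerRegistry {
    private final Map<String, TagHandler> blockNodes = new LinkedHashMap<>();
    private final Map<String, TagHandler> inlineNodes = new LinkedHashMap<>();

    // Inline handlers are registered in both the inline list and the block list.
    void addInlineNode(TagHandler handler, String tagnames) {
        for (String tag : split(tagnames)) {
            inlineNodes.put(tag, handler);
            blockNodes.put(tag, handler);
        }
    }

    // Block handlers are registered in the block list only.
    void addBlockNode(TagHandler handler, String tagnames) {
        for (String tag : split(tagnames)) {
            blockNodes.put(tag, handler);
        }
    }

    Map<String, TagHandler> getBlockNodes() { return blockNodes; }

    Map<String, TagHandler> getInlineNodes() { return inlineNodes; }

    // Assumption: whitespace around each comma-delimited tag name is ignored.
    private static String[] split(String tagnames) {
        return tagnames.trim().split("\\s*,\\s*");
    }
}

class HandlerRegistryDemo {
    public static void main(String[] args) {
        HandlerRegistry registry = new HandlerRegistry();
        registry.addInlineNode(() -> "emphasis", "em,i");
        registry.addBlockNode(() -> "paragraph", "p");
        System.out.println(registry.getInlineNodes().keySet()); // [em, i]
        System.out.println(registry.getBlockNodes().keySet());  // [em, i, p]
    }
}
```

Because inline handlers also land in the block list, a lookup at block level can still resolve tags such as em, while a lookup at inline level never sees block-only tags such as p.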
-
convert
public void convert(org.jsoup.nodes.Document doc, Writer out)
Convert a document to the given writer.
Note: It is up to the calling class to handle closing the writer!
Parameters:
doc - Document to convert
out - Writer to receive the final output
-
convert
public void convert(org.jsoup.nodes.Document doc, OutputStream out)
Convert a document to the given output stream.
Note: It is up to the calling class to handle closing the stream!
Parameters:
doc - Document to convert
out - OutputStream to receive the final output
-
convert
public String convert(org.jsoup.nodes.Document doc)
Convert a document and return a string. Use this method whenever the final result is needed as a string; it attempts to calculate the size of the buffer necessary to hold the entire output.
Parameters:
doc - Document to convert
Returns:
The Markdown-formatted string.
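The ownership rule shared by the three convert overloads (the caller closes the writer or stream; the string variant manages its own buffer) can be sketched without the library. ConvertOverloadsSketch is a hypothetical stand-in that takes ready-made text instead of a Document. The UTF-8 encoding and the use of the input length as the buffer estimate are assumptions, not documented behaviour.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in: "converts" plain text so the sketch runs without jsoup.
class ConvertOverloadsSketch {

    // Writes to the caller's writer and flushes it, but never closes it.
    static void convert(String text, Writer out) {
        try {
            out.write(text);
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Wraps the caller's stream; flushing the wrapper pushes the bytes through.
    // Assumption: UTF-8 output. The stream itself is left open for the caller.
    static void convert(String text, OutputStream out) {
        Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8);
        convert(text, writer);
    }

    // Owns its buffer, pre-sized from an estimate (here simply the input length).
    static String convert(String text) {
        StringWriter buffer = new StringWriter(Math.max(16, text.length()));
        convert(text, buffer);
        return buffer.toString();
    }

    public static void main(String[] args) throws IOException {
        // The caller opens and closes the writer; try-with-resources handles that.
        try (Writer out = new StringWriter()) {
            convert("# Title\n", out);
            System.out.print(out);
        }

        // Same rule for streams: the caller is responsible for closing.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (OutputStream out = bytes) {
            convert("Some *text*\n", out);
        }
        System.out.print(new String(bytes.toByteArray(), StandardCharsets.UTF_8));

        // No resource management is needed when a string is all that is wanted.
        System.out.print(convert("Done\n"));
    }
}
```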
-
walkNodes
public void walkNodes(NodeHandler currentNode, org.jsoup.nodes.Element el)
Loops over the children of an HTML Element, handling TextNode and child Elements.
Parameters:
currentNode - The default node handler for TextNodes and IgnoredHTMLElements.
el - The parent HTML Element whose children are being looked at.
-
walkNodes
public void walkNodes(NodeHandler currentNodeHandler, org.jsoup.nodes.Element el, Map<String,NodeHandler> nodeList)
Loops over the children of an HTML Element, handling TextNode and child Elements.
Parameters:
currentNodeHandler - The default node handler for TextNodes and IgnoredHTMLElements.
el - The parent HTML Element whose children are being looked at.
nodeList - The list of valid nodes at this level. Should be one of blockNodes or inlineNodes.
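The traversal described for walkNodes (text goes to the current handler, child elements are looked up in the supplied node list) can be sketched on a toy tree. Everything here is hypothetical: Node replaces jsoup's Element and TextNode, Handler replaces NodeHandler, and handing unregistered tags to the current handler is a simplifying assumption rather than the library's documented rule.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class WalkNodesSketch {

    // Hypothetical minimal node model: tag == null marks a text node.
    static final class Node {
        final String tag;
        final String text;
        final List<Node> children = new ArrayList<>();

        private Node(String tag, String text) {
            this.tag = tag;
            this.text = text;
        }

        static Node text(String text) {
            return new Node(null, text);
        }

        static Node element(String tag, Node... kids) {
            Node node = new Node(tag, null);
            node.children.addAll(Arrays.asList(kids));
            return node;
        }
    }

    // Hypothetical stand-in for NodeHandler.
    interface Handler {
        void handleText(Node textNode, StringBuilder out);
        void handleElement(Node el, StringBuilder out, Map<String, Handler> nodeList);
    }

    // Loops over the children: text nodes go to the current handler, elements go
    // to the handler registered for their tag, falling back to the current one.
    static void walkNodes(Handler current, Node el, StringBuilder out,
                          Map<String, Handler> nodeList) {
        for (Node child : el.children) {
            if (child.tag == null) {
                current.handleText(child, out);
            } else {
                Handler handler = nodeList.getOrDefault(child.tag, current);
                handler.handleElement(child, out, nodeList);
            }
        }
    }

    // A handler that surrounds an element's content with a marker such as "*".
    static Handler wrapping(String marker) {
        return new Handler() {
            public void handleText(Node textNode, StringBuilder out) {
                out.append(textNode.text);
            }

            public void handleElement(Node el, StringBuilder out,
                                      Map<String, Handler> nodeList) {
                out.append(marker);
                walkNodes(this, el, out, nodeList); // recurse with this handler as default
                out.append(marker);
            }
        };
    }

    public static void main(String[] args) {
        Map<String, Handler> inlineNodes = new HashMap<>();
        inlineNodes.put("em", wrapping("*"));

        Node paragraph = Node.element("p",
                Node.text("Hello "),
                Node.element("em", Node.text("world")),
                Node.element("span", Node.text("!")));

        StringBuilder out = new StringBuilder();
        walkNodes(wrapping(""), paragraph, out, inlineNodes);
        System.out.println(out); // Hello *world*!
    }
}
```

Passing the node list explicitly is what lets a handler restrict its children to inline tags only, which matches the note that nodeList should be one of blockNodes or inlineNodes.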
-
getInlineContent
public String getInlineContent(NodeHandler currentNode, org.jsoup.nodes.Element el)
Recursively processes child nodes and returns the potential output string.
Parameters:
currentNode - The default node handler for TextNodes and IgnoredHTMLElements.
el - The parent HTML Element whose children are being looked at.
Returns:
The potential output string.
-
getInlineContent
public String getInlineContent(NodeHandler currentNode, org.jsoup.nodes.Element el, boolean undoLeadingEscapes)
Recursively processes child nodes and returns the potential output string.
Parameters:
currentNode - The default node handler for TextNodes and IgnoredHTMLElements.
el - The parent HTML Element whose children are being looked at.
undoLeadingEscapes - If true, leading escapes are removed.
Returns:
The potential output string.
-
addLink
public String addLink(String url, String recommendedName, boolean image)
Adds a link to the link set, and returns the actual ID for the link.
Parameters:
url - URL for link
recommendedName - A recommended name for non-simple link IDs. This might be modified.
image - If true, use "img-" instead of "link-" for simple link IDs.
Returns:
The actual link ID for this URL.
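One way to picture the addLink contract is the sketch below. It is not the library's implementation: LinkSetSketch, its simpleIds flag, the counter-based numbering, the reuse of an ID for a repeated URL, and the "-2" suffix for clashing names are all assumptions chosen for illustration. Only the "img-" and "link-" prefixes and the idea that the recommended name may be modified come from the description above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical link set; the real class keeps this state inside DocumentConverter.
class LinkSetSketch {
    private final Map<String, String> idsByUrl = new LinkedHashMap<>();
    private final boolean simpleIds; // assumption: stands in for a converter option

    LinkSetSketch(boolean simpleIds) {
        this.simpleIds = simpleIds;
    }

    String addLink(String url, String recommendedName, boolean image) {
        // Assumption: a URL that was already added keeps its existing ID.
        String existing = idsByUrl.get(url);
        if (existing != null) {
            return existing;
        }

        String id;
        if (simpleIds) {
            // Simple IDs: a prefix chosen by the image flag plus a running number.
            id = (image ? "img-" : "link-") + (idsByUrl.size() + 1);
        } else {
            // Non-simple IDs: start from the recommended name and modify it
            // until it no longer clashes with an ID already in use.
            id = recommendedName;
            int suffix = 2;
            while (idsByUrl.containsValue(id)) {
                id = recommendedName + "-" + suffix++;
            }
        }
        idsByUrl.put(url, id);
        return id;
    }
}

class LinkSetDemo {
    public static void main(String[] args) {
        LinkSetSketch simple = new LinkSetSketch(true);
        System.out.println(simple.addLink("https://example.com/a", "A", false));      // link-1
        System.out.println(simple.addLink("https://example.com/pic.png", "P", true)); // img-2
        System.out.println(simple.addLink("https://example.com/a", "A", false));      // link-1

        LinkSetSketch named = new LinkSetSketch(false);
        System.out.println(named.addLink("https://example.com/a", "docs", false)); // docs
        System.out.println(named.addLink("https://example.com/b", "docs", false)); // docs-2
    }
}
```

The point of returning the actual ID is that callers cannot assume the recommended name was used; they should emit whatever the method hands back.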
-
-