public class TextExtractionService extends Object
| Constructor and Description |
|---|
| TextExtractionService() |

| Modifier and Type | Method and Description |
|---|---|
| boolean | canHandle(InputStream stream, org.apache.tika.metadata.Metadata metadata)<br>Checks whether the provided content can be handled by the currently configured parsers. |
| void | extractMetadata(InputStream stream, org.apache.tika.metadata.Metadata metadata)<br>Performs the metadata extraction for the specified document stream. |
| boolean | isEnabled()<br>Returns true if the text extraction service is activated. |
| String | parse(InputStream stream, org.apache.tika.metadata.Metadata metadata)<br>Performs the content extraction and fills in related metadata for the specified document stream. |
| String | parse(InputStream stream, org.apache.tika.metadata.Metadata metadata, int characterLimit)<br>Performs the content extraction and fills in related metadata for the specified document stream. |
| String | parse(InputStream stream, String contentType)<br>Performs the content extraction for the specified document stream. |
| void | setAutoDetectType(boolean autoDetectType) |
| void | setConfig(org.springframework.core.io.Resource config)<br>Provides the Tika configuration resource. |
| void | setConfigMetadata(org.springframework.core.io.Resource configMetadata) |
| void | setEnabled(boolean enabled)<br>Set this flag to true to enable the text extraction service. |
| void | setMaxExtractedCharacters(int maxExtractedCharacters) |

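
The parse(InputStream, String) overload is documented as returning null when the service is disabled, so callers probably want to guard against that. The following is a minimal sketch of one such guard, not the library's own code. To stay compilable without Jahia or Tika on the classpath it uses a hypothetical TextExtractor interface that mirrors only isEnabled() and parse(InputStream, String); the names TextExtractor, extractOrDefault, and stub are assumptions made for illustration, and the stub's UTF-8 decoding stands in for real content extraction.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class ExtractionCallSketch {

    /** Hypothetical stand-in for two of the documented methods; not the real class. */
    public interface TextExtractor {
        boolean isEnabled();

        // The documented method also declares SAXException and TikaException.
        String parse(InputStream stream, String contentType) throws IOException;
    }

    /** Returns the extracted text, or the fallback when the service is disabled or yields null. */
    public static String extractOrDefault(TextExtractor extractor, InputStream stream,
                                          String contentType, String fallback) throws IOException {
        if (!extractor.isEnabled()) {
            return fallback;
        }
        String text = extractor.parse(stream, contentType);
        return text != null ? text : fallback;
    }

    /** Toy implementation for the demo: decodes the stream as UTF-8 when enabled. */
    public static TextExtractor stub(boolean enabled) {
        return new TextExtractor() {
            @Override
            public boolean isEnabled() {
                return enabled;
            }

            @Override
            public String parse(InputStream stream, String contentType) throws IOException {
                if (!enabled) {
                    return null; // mirrors the documented null-when-disabled contract
                }
                return new String(stream.readAllBytes(), StandardCharsets.UTF_8);
            }
        };
    }

    public static void main(String[] args) throws IOException {
        byte[] doc = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.println(extractOrDefault(stub(true), new ByteArrayInputStream(doc), "text/plain", ""));
        System.out.println(extractOrDefault(stub(false), new ByteArrayInputStream(doc), "text/plain", "(disabled)"));
    }
}
```

With the real service, the same guard would wrap a TextExtractionService instance, and the caller would additionally need to handle SAXException and org.apache.tika.exception.TikaException.
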
public boolean canHandle(InputStream stream, org.apache.tika.metadata.Metadata metadata) throws IOException

Parameters:
- stream - the document stream to be parsed; can be null, in which case only the content type is used to detect the appropriate parser
- metadata - the metadata containing parser-specific information, such as content type, encoding, etc.

Returns:
- true if there is a parser that can handle the provided content

Throws:
- IOException - in case of read/write errors

public void extractMetadata(InputStream stream, org.apache.tika.metadata.Metadata metadata) throws IOException, SAXException, org.apache.tika.exception.TikaException

Parameters:
- stream - the document stream to be parsed
- metadata - the metadata containing parser-specific information, such as content type, encoding, etc.

Throws:
- IOException - in case of read/write errors
- SAXException - in case of parsing errors
- org.apache.tika.exception.TikaException - in case of parsing errors

public boolean isEnabled()

Returns:
- true if the text extraction service is activated

public String parse(InputStream stream, org.apache.tika.metadata.Metadata metadata) throws IOException, SAXException, org.apache.tika.exception.TikaException

Parameters:
- stream - the document stream to be parsed
- metadata - the metadata containing parser-specific information, such as content type, encoding, etc.

Returns:
- null if the service is disabled

Throws:
- IOException - in case of read/write errors
- SAXException - in case of parsing errors
- org.apache.tika.exception.TikaException - in case of parsing errors

public String parse(InputStream stream, org.apache.tika.metadata.Metadata metadata, int characterLimit) throws IOException, SAXException, org.apache.tika.exception.TikaException

Parameters:
- stream - the document stream to be parsed
- metadata - the metadata containing parser-specific information, such as content type, encoding, etc.
- characterLimit - the maximum number of characters to extract, or -1 to extract the full document content

Returns:
- null if the service is disabled

Throws:
- IOException - in case of read/write errors
- SAXException - in case of parsing errors
- org.apache.tika.exception.TikaException - in case of parsing errors

public String parse(InputStream stream, String contentType) throws IOException, SAXException, org.apache.tika.exception.TikaException

Parameters:
- stream - the document stream to be parsed
- contentType - the content type of the provided document

Returns:
- null if the service is disabled

Throws:
- IOException - in case of read/write errors
- SAXException - in case of parsing errors
- org.apache.tika.exception.TikaException - in case of parsing errors

public void setAutoDetectType(boolean autoDetectType)

Parameters:
- autoDetectType - the autoDetectType to set

public void setConfig(org.springframework.core.io.Resource config)

Parameters:
- config - the Tika configuration resource

public void setConfigMetadata(org.springframework.core.io.Resource configMetadata)

Parameters:
- configMetadata - the configMetadata to set

public void setEnabled(boolean enabled)

Set this flag to true to enable the text extraction service.

Parameters:
- enabled - the flag to set

public void setMaxExtractedCharacters(int maxExtractedCharacters)

Parameters:
- maxExtractedCharacters - the maxExtractedCharacters to set

Copyright © 2004–2020 Jahia Solutions Group SA. All rights reserved.