@Generated(value="software.amazon.awssdk:codegen")
public final class SeedUrlConfiguration extends Object implements SdkPojo, Serializable, ToCopyableBuilder<SeedUrlConfiguration.Builder,SeedUrlConfiguration>
Provides the configuration information for the seed or starting point URLs to crawl.
When selecting websites to index, you must adhere to the Amazon Acceptable Use Policy and all other Amazon terms. Remember that you must only use Amazon Kendra Web Crawler to index your own web pages, or web pages that you have authorization to index.
| Modifier and Type | Class and Description |
|---|---|
| static interface | SeedUrlConfiguration.Builder |
| Modifier and Type | Method and Description |
|---|---|
| static SeedUrlConfiguration.Builder | builder() |
| boolean | equals(Object obj) |
| boolean | equalsBySdkFields(Object obj) |
| <T> Optional<T> | getValueForField(String fieldName, Class<T> clazz) |
| int | hashCode() |
| boolean | hasSeedUrls(): For responses, this returns true if the service returned a value for the SeedUrls property. |
| List<SdkField<?>> | sdkFields() |
| List<String> | seedUrls(): The list of seed or starting point URLs of the websites you want to crawl. |
| static Class<? extends SeedUrlConfiguration.Builder> | serializableBuilderClass() |
| SeedUrlConfiguration.Builder | toBuilder() |
| String | toString(): Returns a string representation of this object. |
| WebCrawlerMode | webCrawlerMode(): The crawl mode, one of HOST_ONLY, SUBDOMAINS, or EVERYTHING. |
| String | webCrawlerModeAsString(): The crawl mode as the raw string value returned by the service. |
Methods inherited from class Object: clone, finalize, getClass, notify, notifyAll, wait, wait, wait
Methods inherited from interface ToCopyableBuilder: copy
public final boolean hasSeedUrls()
For responses, this returns true if the service returned a value for the SeedUrls property. This does not check that the value is non-empty (for that, call the isEmpty() method on the property). This is useful because the SDK will never return a null collection or map, but you may need to differentiate between the service returning nothing (or null) and the service returning an empty collection or map. For requests, this returns true if a value for the property was specified in the request builder, and false if a value was not specified.
public final List<String> seedUrls()
The list of seed or starting point URLs of the websites you want to crawl.
The list can include a maximum of 100 seed URLs.
Attempts to modify the collection returned by this method will result in an UnsupportedOperationException.
This method will never return null. If you would like to know whether the service returned this field (so that
you can differentiate between null and empty), you can use the hasSeedUrls() method.
Returns: The list of seed or starting point URLs of the websites you want to crawl. The list can include a maximum of 100 seed URLs.
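The two constraints described for seedUrls() (at most 100 entries, and a returned list that rejects modification) can be illustrated with plain Java collections. The following is a minimal sketch only: `SeedUrlListSketch`, `checkedSeedUrls`, and `rejectsModification` are hypothetical names and are not part of the AWS SDK, and the size check here runs on the client side purely for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for illustration; not part of the AWS SDK.
final class SeedUrlListSketch {
    // Limit stated in the seedUrls() documentation.
    static final int MAX_SEED_URLS = 100;

    // Returns an unmodifiable copy of the input, or throws if the list exceeds the limit.
    static List<String> checkedSeedUrls(List<String> urls) {
        if (urls.size() > MAX_SEED_URLS) {
            throw new IllegalArgumentException(
                "At most " + MAX_SEED_URLS + " seed URLs are allowed, got " + urls.size());
        }
        return Collections.unmodifiableList(new ArrayList<>(urls));
    }

    // Returns true if the list throws UnsupportedOperationException on add(),
    // which is the behavior documented for the list returned by seedUrls().
    // Note: if the list is modifiable, the probe element is actually added.
    static boolean rejectsModification(List<String> urls) {
        try {
            urls.add("https://example.com/extra");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> seeds = checkedSeedUrls(Collections.singletonList("https://abc.example.com"));
        System.out.println("seed count: " + seeds.size());
        System.out.println("rejects modification: " + rejectsModification(seeds));
    }
}
```

Callers that need to change the list would typically copy it into a new ArrayList first rather than mutate the returned collection.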
public final WebCrawlerMode webCrawlerMode()
You can choose one of the following modes:
- HOST_ONLY: crawl only the website host names. For example, if the seed URL is "abc.example.com", then only URLs with host name "abc.example.com" are crawled.
- SUBDOMAINS: crawl the website host names with subdomains. For example, if the seed URL is "abc.example.com", then "a.abc.example.com" and "b.abc.example.com" are also crawled.
- EVERYTHING: crawl the website host names with subdomains and other domains that the web pages link to.

The default mode is set to HOST_ONLY.
If the service returns an enum value that is not available in the current SDK version, webCrawlerMode
will return WebCrawlerMode.UNKNOWN_TO_SDK_VERSION. The raw value returned by the service is available
from webCrawlerModeAsString().
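To make the difference between the three modes concrete, the host-matching rules described above can be sketched as a small predicate. This is an illustration under stated assumptions, not the Kendra crawler's implementation: `CrawlScopeSketch`, its nested `Mode` enum, and `inScope` are hypothetical names; URLs are assumed to be absolute (with a scheme such as `https://`); and in EVERYTHING mode the candidate is assumed to be a link found on an already crawled page.

```java
import java.net.URI;
import java.util.Locale;

// Hypothetical illustration of the three modes; not the Kendra crawler's implementation.
final class CrawlScopeSketch {
    // Local stand-in for the mode names used in the documentation.
    enum Mode { HOST_ONLY, SUBDOMAINS, EVERYTHING }

    // Extracts the lower-cased host from an absolute URL such as "https://abc.example.com/docs".
    static String hostOf(String url) {
        String host = URI.create(url).getHost();
        if (host == null) {
            throw new IllegalArgumentException("No host in URL: " + url);
        }
        return host.toLowerCase(Locale.ROOT);
    }

    // Decides whether candidateUrl falls within the crawl scope implied by seedUrl and mode.
    static boolean inScope(Mode mode, String seedUrl, String candidateUrl) {
        String seedHost = hostOf(seedUrl);
        String host = hostOf(candidateUrl);
        switch (mode) {
            case HOST_ONLY:
                // Only the exact seed host.
                return host.equals(seedHost);
            case SUBDOMAINS:
                // The seed host plus any host nested under it.
                return host.equals(seedHost) || host.endsWith("." + seedHost);
            case EVERYTHING:
                // Any linked host, including other domains.
                return true;
            default:
                throw new AssertionError(mode);
        }
    }

    public static void main(String[] args) {
        String seed = "https://abc.example.com";
        System.out.println(inScope(Mode.HOST_ONLY, seed, "https://a.abc.example.com/page"));
        System.out.println(inScope(Mode.SUBDOMAINS, seed, "https://a.abc.example.com/page"));
    }
}
```

The leading dot in the SUBDOMAINS check keeps a host such as "xabc.example.com" from being treated as a subdomain of "abc.example.com".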
Returns: The crawl mode, one of HOST_ONLY, SUBDOMAINS, or EVERYTHING, as described above. The default mode is HOST_ONLY.
See Also: WebCrawlerMode
public final String webCrawlerModeAsString()
You can choose one of the following modes:
- HOST_ONLY: crawl only the website host names. For example, if the seed URL is "abc.example.com", then only URLs with host name "abc.example.com" are crawled.
- SUBDOMAINS: crawl the website host names with subdomains. For example, if the seed URL is "abc.example.com", then "a.abc.example.com" and "b.abc.example.com" are also crawled.
- EVERYTHING: crawl the website host names with subdomains and other domains that the web pages link to.

The default mode is set to HOST_ONLY.
If the service returns an enum value that is not available in the current SDK version, webCrawlerMode
will return WebCrawlerMode.UNKNOWN_TO_SDK_VERSION. The raw value returned by the service is available
from webCrawlerModeAsString().
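The relationship between the typed accessor and the raw-string accessor can be sketched with a self-contained enum that follows the same pattern. Everything here is a hypothetical stand-in for illustration: `CrawlModeSketch` and `CrawlModeHolderSketch` are not SDK types, and returning null for a null raw value is an assumption of this sketch.

```java
// Hypothetical stand-in for an SDK enum, to show the unknown-value pattern only.
enum CrawlModeSketch {
    HOST_ONLY("HOST_ONLY"),
    SUBDOMAINS("SUBDOMAINS"),
    EVERYTHING("EVERYTHING"),
    // Placeholder for values this (sketched) SDK version does not recognize.
    UNKNOWN_TO_SDK_VERSION(null);

    private final String value;

    CrawlModeSketch(String value) {
        this.value = value;
    }

    // Maps a raw service string to a constant, falling back to UNKNOWN_TO_SDK_VERSION.
    static CrawlModeSketch fromValue(String raw) {
        if (raw == null) {
            return null; // assumption of this sketch: no value means no mode
        }
        for (CrawlModeSketch mode : values()) {
            if (raw.equals(mode.value)) {
                return mode;
            }
        }
        return UNKNOWN_TO_SDK_VERSION;
    }
}

// Hypothetical holder that keeps the raw string and derives the enum on demand.
final class CrawlModeHolderSketch {
    private final String rawMode;

    CrawlModeHolderSketch(String rawMode) {
        this.rawMode = rawMode;
    }

    // Typed view: unrecognized values collapse to UNKNOWN_TO_SDK_VERSION.
    CrawlModeSketch mode() {
        return CrawlModeSketch.fromValue(rawMode);
    }

    // Raw view: always the exact string that was stored.
    String modeAsString() {
        return rawMode;
    }

    public static void main(String[] args) {
        CrawlModeHolderSketch future = new CrawlModeHolderSketch("SOME_FUTURE_MODE");
        System.out.println(future.mode());
        System.out.println(future.modeAsString());
    }
}
```

Code that branches on the typed value would usually treat UNKNOWN_TO_SDK_VERSION as a signal to consult the raw string, for example for logging.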
Returns: The crawl mode as a string, one of HOST_ONLY, SUBDOMAINS, or EVERYTHING, as described above. The default mode is HOST_ONLY.
See Also: WebCrawlerMode
public SeedUrlConfiguration.Builder toBuilder()
Specified by: toBuilder in interface ToCopyableBuilder<SeedUrlConfiguration.Builder,SeedUrlConfiguration>
public static SeedUrlConfiguration.Builder builder()
public static Class<? extends SeedUrlConfiguration.Builder> serializableBuilderClass()
public final boolean equalsBySdkFields(Object obj)
Specified by: equalsBySdkFields in interface SdkPojo
public final String toString()
Copyright © 2023. All rights reserved.