Packages

package analysis

Provides a logical query plan Analyzer and supporting classes for performing analysis. Analysis consists of translating UnresolvedAttributes and UnresolvedRelations into fully typed objects using information in a schema Catalog.

Linear Supertypes
AnyRef, Any

Type Members

  1. case class AnalysisContext(catalogAndNamespace: Seq[String] = Nil, nestedViewDepth: Int = 0, maxNestedViewDepth: Int = -1, relationCache: Map[(Seq[String], Option[TimeTravelSpec]), LogicalPlan] = mutable.Map.empty, referredTempViewNames: Seq[Seq[String]] = Seq.empty, referredTempFunctionNames: Set[String] = mutable.Set.empty, referredTempVariableNames: Seq[Seq[String]] = Seq.empty, outerPlan: Option[LogicalPlan] = None) extends Product with Serializable

    Provides a way to keep state during the analysis, mostly for resolving views and subqueries. This enables us to decouple the concerns of analysis environment from the catalog and resolve star expressions in subqueries that reference the outer query plans. The state that is kept here is per-query.

    Note this is thread local.

    catalogAndNamespace

    The catalog and namespace used in the view resolution. This overrides the current catalog and namespace when resolving relations inside views.

    nestedViewDepth

    The nested depth in the view resolution; this enables us to limit the depth of nested views.

    maxNestedViewDepth

    The maximum allowed depth of nested view resolution.

    relationCache

    A mapping from qualified table names and time travel spec to resolved relations. This can ensure that the table is resolved only once if a table is used multiple times in a query.

    referredTempViewNames

    All the temp view names referred by the current view we are resolving. It's used to make sure the relation resolution is consistent between view creation and view resolution. For example, if t was a permanent table when the current view was created, it should still be a permanent table when resolving the current view, even if a temp view t has been created.

    outerPlan

    The query plan from the outer query that can be used to resolve star expressions in a subquery.
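
    The caching behaviour described for relationCache can be illustrated with a minimal, self-contained sketch. It does not use Spark's classes: ResolvedRelation, RelationCacheSketch, lookup and catalogLookups are hypothetical names, and the time travel spec is simplified to an Option[String].

```scala
import scala.collection.mutable

// Hypothetical stand-in for a resolved LogicalPlan; not a Spark class.
final case class ResolvedRelation(name: Seq[String])

object RelationCacheSketch {
  // Key shape mirrors relationCache: (qualified name, optional time travel spec).
  // The time travel spec is simplified to Option[String] for illustration.
  type Key = (Seq[String], Option[String])

  private val cache = mutable.Map.empty[Key, ResolvedRelation]

  // Counts how often the pretend catalog is consulted.
  var catalogLookups = 0

  // Resolve a relation, consulting the pretend catalog only on a cache miss.
  def lookup(name: Seq[String], timeTravel: Option[String] = None): ResolvedRelation =
    cache.getOrElseUpdate((name, timeTravel), {
      catalogLookups += 1
      ResolvedRelation(name)
    })
}
```

    Looking up the same qualified name twice touches the pretend catalog once; the same name with a different time travel spec is a separate cache entry.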

  2. implicit class AnalysisErrorAt extends QueryErrorsBase
  3. class Analyzer extends RuleExecutor[LogicalPlan] with CheckAnalysis with SQLConfHelper with ColumnResolutionHelper

    Provides a logical query plan analyzer, which translates UnresolvedAttributes and UnresolvedRelations into fully typed objects using information in a SessionCatalog.

  4. case class AsOfTimestamp(timestamp: Long) extends TimeTravelSpec with Product with Serializable
  5. case class AsOfVersion(version: String) extends TimeTravelSpec with Product with Serializable
  6. class CannotReplaceMissingTableException extends AnalysisException
  7. trait CastSupport extends AnyRef

    Mix-in trait for constructing valid Cast expressions.

  8. trait CheckAnalysis extends PredicateHelper with LookupCatalog with QueryErrorsBase

    Throws user facing errors when passed invalid queries that fail to analyze.

  9. trait ColumnResolutionHelper extends Logging
  10. trait EmptyFunctionRegistryBase[T] extends FunctionRegistryBase[T]

    A trivial catalog that returns an error when a function is requested. Used for testing when all functions are already filled in and the analyzer needs only to resolve attribute references.

  11. trait ExpressionBuilder extends FunctionBuilderBase[Expression]

    This is a trait used for scalar valued functions that defines how their expression representations are constructed in FunctionRegistry.

  12. case class ExpressionWithUnresolvedIdentifier(identifierExpr: Expression, exprBuilder: (Seq[String]) ⇒ Expression) extends UnaryExpression with Unevaluable with Product with Serializable

    An expression placeholder that holds the identifier clause string expression. It will be replaced by the actual expression with the evaluated identifier string.

  13. sealed trait FieldName extends LeafExpression with Unevaluable
  14. sealed trait FieldPosition extends LeafExpression with Unevaluable
  15. trait FunctionRegistry extends FunctionRegistryBase[Expression]
  16. trait FunctionRegistryBase[T] extends AnyRef

    A catalog for looking up user defined functions, used by an Analyzer.

    Note: 1) The implementation should be thread-safe to allow concurrent access. 2) The database name is always case-sensitive here; callers are responsible for formatting the database name according to the case-sensitivity config.

  17. trait GeneratorBuilder extends FunctionBuilderBase[LogicalPlan]

    This is a trait used for table valued functions that defines how their expression representations are constructed in TableFunctionRegistry.

  18. case class GetColumnByOrdinal(ordinal: Int, dataType: DataType) extends LeafExpression with Unevaluable with NonSQLExpression with Product with Serializable
  19. case class GetViewColumnByNameAndOrdinal(viewName: String, colName: String, ordinal: Int, expectedNumCandidates: Int, viewDDL: Option[String]) extends LeafExpression with Unevaluable with NonSQLExpression with Product with Serializable
  20. trait LeafNodeWithoutStats extends LogicalPlan with LeafNode

    A resolved leaf node whose statistics have no meaning.

  21. case class MultiAlias(child: Expression, names: Seq[String]) extends UnaryExpression with NamedExpression with Unevaluable with Product with Serializable

    Used to assign new names to a Generator's output, such as a Hive UDTF. For example, the SQL expression "stack(2, key, value, key, value) as (a, b)" could be represented as follows: MultiAlias(stack_function, Seq(a, b))

    child

    the computation being performed

    names

    the names to be associated with each output of computing child.

  22. trait MultiInstanceRelation extends AnyRef

    A trait that should be mixed into query operators where a single instance might appear multiple times in a logical query plan. It is invalid to have multiple copies of the same attribute produced by distinct operators in a query tree as this breaks the guarantee that expression ids, which are used to differentiate attributes, are unique.

    During analysis, operators that include this trait may be asked to produce a new version of themselves with globally unique expression ids.

  23. case class NameParameterizedQuery(child: LogicalPlan, argNames: Seq[String], argValues: Seq[Expression]) extends ParameterizedQuery with Product with Serializable

    The logical plan representing a parameterized query with named parameters.

    child

    The parameterized logical plan.

    argNames

    Argument names.

    argValues

    A sequence of argument values matched to argument names argNames.
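
    The positional pairing of argNames and argValues can be sketched as follows. This is a toy model under stated assumptions: NamedParamsSketch and bind are hypothetical names, values are plain strings rather than Expressions, and the length check is an assumption of this sketch, not documented Spark behaviour.

```scala
object NamedParamsSketch {
  // Pair each argument name with the value at the same position.
  // Values are plain strings here; Spark uses Expression values.
  def bind(argNames: Seq[String], argValues: Seq[String]): Map[String, String] = {
    // Assumption of this sketch: every name has exactly one value.
    require(argNames.length == argValues.length, "each name needs exactly one value")
    argNames.zip(argValues).toMap
  }
}
```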

  24. case class NamedParameter(name: String) extends LeafExpression with Parameter with Product with Serializable

    The expression represents a named parameter that should be replaced by a literal or collection constructor functions such as map(), array(), struct().

    name

    The identifier of the parameter without the marker.

  25. trait NamedRelation extends LogicalPlan
  26. class NoSuchPartitionException extends AnalysisException
  27. class NoSuchPartitionsException extends AnalysisException
  28. sealed trait Parameter extends LeafExpression with Unevaluable
  29. abstract class ParameterizedQuery extends LogicalPlan with UnresolvedUnaryNode

    The logical plan representing a parameterized query. It will be removed during analysis after the parameters are bound.

  30. class PartitionAlreadyExistsException extends AnalysisException
  31. sealed trait PartitionSpec extends LeafExpression with Unevaluable
  32. class PartitionsAlreadyExistException extends AnalysisException
  33. case class PlanWithUnresolvedIdentifier(identifierExpr: Expression, planBuilder: (Seq[String]) ⇒ LogicalPlan) extends LogicalPlan with UnresolvedLeafNode with Product with Serializable

    A logical plan placeholder that holds the identifier clause string expression. It will be replaced by the actual logical plan with the evaluated identifier string.

  34. case class PosParameter(pos: Int) extends LeafExpression with Parameter with Product with Serializable

    The expression represents a positional parameter that should be replaced by a literal or by collection constructor functions such as map(), array(), struct().

    pos

    A unique position of the parameter in a SQL query text.

  35. case class PosParameterizedQuery(child: LogicalPlan, args: Seq[Expression]) extends ParameterizedQuery with Product with Serializable

    The logical plan representing a parameterized query with positional parameters.

    child

    The parameterized logical plan.

    args

    The literal values or collection constructor functions such as map(), array(), struct() of positional parameters.

  36. case class RelationTimeTravel(relation: LogicalPlan, timestamp: Option[Expression], version: Option[String]) extends LogicalPlan with UnresolvedLeafNode with Product with Serializable

    A logical node used to time travel the child relation to the given timestamp or version. The child must support time travel, e.g. a v2 source, and cannot be a view, subquery or stream. The timestamp expression cannot refer to any columns.

  37. case class RelationWrapper(cls: Class[_], outputAttrIds: Seq[Long]) extends Product with Serializable

    A helper class used to detect duplicate relations fast in DeduplicateRelations. Two relations are duplicated if:

    1. they are the same class.
    2. they have the same output attribute IDs.

    The first condition is necessary because the CTE relation definition node and reference node have the same output attribute IDs but they are not duplicated.
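
    The two conditions map naturally onto case class equality, which is what makes set-based duplicate detection fast. The sketch below is self-contained and only copies the constructor signature shown above; DedupSketch and duplicates are hypothetical names, and this is not how DeduplicateRelations itself is implemented.

```scala
import scala.collection.mutable

// Simplified copy of the signature shown above; not Spark's actual class.
final case class RelationWrapper(cls: Class[_], outputAttrIds: Seq[Long])

object DedupSketch {
  // Returns the wrappers that were already seen earlier in the sequence.
  def duplicates(relations: Seq[RelationWrapper]): Seq[RelationWrapper] = {
    val seen = mutable.HashSet.empty[RelationWrapper]
    // HashSet.add returns false when an equal element is already present,
    // so filterNot keeps exactly the repeated wrappers.
    relations.filterNot(seen.add)
  }
}
```

    Two wrappers with the same attribute IDs but different classes compare as unequal, which corresponds to the CTE definition/reference case described above.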

  38. class ResolveCatalogs extends Rule[LogicalPlan] with LookupCatalog

    Resolves the catalog of the name parts for table/view/function/namespace.

  39. class ResolveColumnDefaultInCommandInputQuery extends SQLConfHelper with ColumnResolutionHelper

    A virtual rule to resolve column "DEFAULT" in Project and UnresolvedInlineTable under InsertIntoStatement and SetVariable. It's only used by the real rule ResolveReferences.

    This virtual rule is triggered if:

    1. The column "DEFAULT" can't be resolved normally by ResolveReferences. This is guaranteed as ResolveReferences resolves the query plan bottom up. This means that when we reach here to resolve the command, its child plans have already been resolved by ResolveReferences.
    2. The plan nodes between Project and command are all unary nodes that inherit the output columns from their child.
    3. The plan nodes between UnresolvedInlineTable and command are either Project, or Aggregate, or SubqueryAlias.

  40. abstract class ResolveInsertionBase extends Rule[LogicalPlan]
  41. class ResolveReferencesInAggregate extends SQLConfHelper with ColumnResolutionHelper with AliasHelper

    A virtual rule to resolve UnresolvedAttribute in Aggregate. It's only used by the real rule ResolveReferences. The column resolution order for Aggregate is:

    1. Resolves the columns to AttributeReference with the output of the child plan. This includes metadata columns as well.
    2. Resolves the columns to a literal function which is allowed to be invoked without braces, e.g. SELECT col, current_date FROM t.
    3. If aggregate expressions are all resolved, resolves GROUP BY alias and GROUP BY ALL.
       3.1. If the grouping expressions contain an unresolved column whose name matches an alias in the SELECT list, resolves that unresolved column to the alias. This is to support SQL pattern like SELECT a + b AS c, max(col) FROM t GROUP BY c.
       3.2. If the grouping expressions only have one single unresolved column named 'ALL', expands it to include all non-aggregate columns in the SELECT list. This is to support SQL pattern like SELECT col1, col2, agg_expr(...) FROM t GROUP BY ALL.
    4. Resolves the columns in aggregate expressions to LateralColumnAliasReference if it references the alias defined previously in the SELECT list. The rule ResolveLateralColumnAliasReference will further resolve LateralColumnAliasReference and rewrite the plan. This is to support SQL pattern like SELECT col1 + 1 AS x, x + 1 AS y, y + 1 AS z FROM t.
    5. Resolves the columns to outer references with the outer plan if we are resolving subquery expressions.
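
    Steps 3.1 and 3.2 can be illustrated with a toy model that works on strings instead of Catalyst expressions. Everything here is hypothetical (GroupBySketch, SelectItem, resolveGrouping), and the sketch assumes the grouping names passed in are the ones still unresolved after steps 1 and 2. Matching ALL case-insensitively is also an assumption of the sketch.

```scala
object GroupBySketch {
  // Hypothetical model of a SELECT-list item: expression text, optional alias,
  // and whether the expression contains an aggregate function.
  final case class SelectItem(expr: String, alias: Option[String], isAggregate: Boolean)

  // Step 3.2: a single grouping column named ALL expands to every
  // non-aggregate SELECT item.
  // Step 3.1: otherwise a grouping name matching a SELECT alias resolves to
  // that item's expression; other names are returned unchanged.
  def resolveGrouping(select: Seq[SelectItem], groupBy: Seq[String]): Seq[String] =
    groupBy match {
      case Seq(single) if single.equalsIgnoreCase("ALL") =>
        select.filterNot(_.isAggregate).map(_.expr)
      case names =>
        names.map { n =>
          select.find(_.alias.contains(n)).map(_.expr).getOrElse(n)
        }
    }
}
```

    For SELECT a + b AS c, max(col) FROM t, both GROUP BY c and GROUP BY ALL resolve to the single grouping expression a + b in this model.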

  42. class ResolveReferencesInSort extends SQLConfHelper with ColumnResolutionHelper

    A virtual rule to resolve UnresolvedAttribute in Sort. It's only used by the real rule ResolveReferences. The column resolution order for Sort is:

    1. Resolves the column to AttributeReference with the output of the child plan. This includes metadata columns as well.
    2. Resolves the column to a literal function which is allowed to be invoked without braces, e.g. SELECT col, current_date FROM t.
    3. If the child plan is Aggregate, resolves the column to TempResolvedColumn with the output of Aggregate's child plan. This is to allow Sort to host grouping expressions and aggregate functions, which can be pushed down to the Aggregate later. For example, SELECT max(a) FROM t GROUP BY b ORDER BY min(a).
    4. Resolves the column to AttributeReference with the output of a descendant plan node. Spark will propagate the missing attributes from the descendant plan node to the Sort node. This is to allow users to ORDER BY columns that are not in the SELECT clause, which is widely supported in other SQL dialects. For example, SELECT a FROM t ORDER BY b.
    5. If the order by expressions only have one single unresolved column named ALL, expands it to include all columns in the SELECT list. This is to support SQL pattern like SELECT col1, col2 FROM t ORDER BY ALL. This should also support specifying asc/desc, and nulls first/last.
    6. Resolves the column to outer references with the outer plan if we are resolving subquery expressions.

    Note, 3 and 4 are actually orthogonal. If the child plan is Aggregate, 4 can only resolve columns as the grouping columns, which is completely covered by 3.

  43. class ResolveReferencesInUpdate extends SQLConfHelper with ColumnResolutionHelper

    A virtual rule to resolve UnresolvedAttribute in UpdateTable. It's only used by the real rule ResolveReferences. The column resolution order for UpdateTable is:

    1. Resolves the column to AttributeReference with the output of the child plan. This includes metadata columns as well.
    2. Resolves the column to a literal function which is allowed to be invoked without braces, e.g. SELECT col, current_date FROM t.
    3. Resolves the column to the default value expression, if the column is the assignment value and the corresponding assignment key is a top-level column.

  44. class ResolveSetVariable extends Rule[LogicalPlan] with ColumnResolutionHelper

    Resolves the target SQL variables that we want to set in SetVariable, and add cast if necessary to make the assignment valid.

  45. case class ResolvedFieldName(path: Seq[String], field: StructField) extends LeafExpression with FieldName with Product with Serializable
  46. case class ResolvedFieldPosition(position: ColumnPosition) extends LeafExpression with FieldPosition with Product with Serializable
  47. case class ResolvedIdentifier(catalog: CatalogPlugin, identifier: Identifier) extends LogicalPlan with LeafNodeWithoutStats with Product with Serializable

    A plan containing resolved identifier with catalog determined.

  48. case class ResolvedNamespace(catalog: CatalogPlugin, namespace: Seq[String]) extends LogicalPlan with LeafNodeWithoutStats with Product with Serializable

    A plan containing resolved namespace.

  49. case class ResolvedNonPersistentFunc(name: String, func: UnboundFunction) extends LogicalPlan with LeafNodeWithoutStats with Product with Serializable

    A plan containing resolved non-persistent (temp or built-in) function.

  50. case class ResolvedPartitionSpec(names: Seq[String], ident: InternalRow, location: Option[String] = None) extends LeafExpression with PartitionSpec with Product with Serializable
  51. case class ResolvedPersistentFunc(catalog: FunctionCatalog, identifier: Identifier, func: UnboundFunction) extends LogicalPlan with LeafNodeWithoutStats with Product with Serializable

    A plan containing resolved persistent function.

  52. case class ResolvedPersistentView(catalog: CatalogPlugin, identifier: Identifier, viewSchema: StructType) extends LogicalPlan with LeafNodeWithoutStats with Product with Serializable

    A plan containing resolved persistent views.

  53. case class ResolvedStar(expressions: Seq[NamedExpression]) extends Star with Unevaluable with Product with Serializable

    Represents all the resolved input attributes to a given relational operator. This is used in the data frame DSL.

    expressions

    Expressions to expand.

  54. case class ResolvedTable(catalog: TableCatalog, identifier: Identifier, table: Table, outputAttributes: Seq[Attribute]) extends LogicalPlan with LeafNodeWithoutStats with Product with Serializable

    A plan containing resolved table.

  55. case class ResolvedTempView(identifier: Identifier, viewSchema: StructType) extends LogicalPlan with LeafNodeWithoutStats with Product with Serializable

    A plan containing resolved (global) temp views.

  56. type Resolver = (String, String) ⇒ Boolean

    Resolver should return true if the first string refers to the same entity as the second string. For example, by using case insensitive equality.
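
    A minimal sketch of two functions with the Resolver shape and a lookup that uses them. ResolverSketch, caseSensitive, caseInsensitive and findColumn are hypothetical names for illustration; Spark's own caseSensitiveResolution and caseInsensitiveResolution values may be implemented differently.

```scala
object ResolverSketch {
  // Same shape as the Resolver type alias documented above.
  type Resolver = (String, String) => Boolean

  // Illustrative resolvers, not Spark's own values.
  val caseSensitive: Resolver = (a, b) => a == b
  val caseInsensitive: Resolver = (a, b) => a.equalsIgnoreCase(b)

  // Find the first column name that the resolver treats as matching `name`.
  def findColumn(columns: Seq[String], name: String, resolver: Resolver): Option[String] =
    columns.find(resolver(_, name))
}
```

    With columns Id and name, looking up id succeeds under the case-insensitive resolver and fails under the case-sensitive one.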

  57. trait RewriteRowLevelCommand extends Rule[LogicalPlan]
  58. class SimpleFunctionRegistry extends SimpleFunctionRegistryBase[Expression] with FunctionRegistry
  59. trait SimpleFunctionRegistryBase[T] extends FunctionRegistryBase[T] with Logging
  60. class SimpleTableFunctionRegistry extends SimpleFunctionRegistryBase[LogicalPlan] with TableFunctionRegistry
  61. abstract class Star extends LeafExpression with NamedExpression

    Represents all of the input attributes to a given relational operator, for example in "SELECT * FROM ...". A Star gets automatically expanded during analysis.

  62. trait TableFunctionRegistry extends FunctionRegistryBase[LogicalPlan]

    A catalog for looking up table functions.

  63. case class TempResolvedColumn(child: Expression, nameParts: Seq[String], hasTried: Boolean = false) extends UnaryExpression with Unevaluable with Product with Serializable

    An intermediate expression to hold a resolved (nested) column. Some rules may need to undo the column resolution and use this expression to keep the original column name, or redo the column resolution with a different priority if the analyzer has tried to resolve it with the default priority before but failed (i.e. hasTried is true).

  64. sealed trait TimeTravelSpec extends AnyRef
  65. trait TypeCheckResult extends AnyRef

    Represents the result of Expression.checkInputDataTypes. We will throw AnalysisException in CheckAnalysis if isFailure is true.

  66. abstract class TypeCoercionBase extends AnyRef
  67. trait TypeCoercionRule extends Rule[LogicalPlan] with Logging
  68. case class UnresolvedAlias(child: Expression, aliasFunc: Option[(Expression) ⇒ String] = None) extends UnaryExpression with NamedExpression with Unevaluable with Product with Serializable

    Holds the expression that has yet to be aliased.

    child

    The computation that needs to be resolved during analysis.

    aliasFunc

    The function, if specified, to be called to generate an alias to associate with the result of computing child.

  69. case class UnresolvedAttribute(nameParts: Seq[String]) extends Attribute with Unevaluable with Product with Serializable

    Holds the name of an attribute that has yet to be resolved.

  70. case class UnresolvedDeserializer(deserializer: Expression, inputAttributes: Seq[Attribute] = Nil) extends UnaryExpression with Unevaluable with NonSQLExpression with Product with Serializable

    Holds the deserializer expression and the attributes that are available during the resolution for it. Deserializer expression is a special kind of expression that is not always resolved by children output, but by given attributes, e.g. the keyDeserializer in MapGroups should be resolved by groupingAttributes instead of children output.

    deserializer

    The unresolved deserializer expression

    inputAttributes

    The input attributes used to resolve deserializer expression, can be empty if we want to resolve deserializer by children output.

  71. class UnresolvedException extends AnalysisException

    Thrown when an invalid attempt is made to access a property of a tree that has yet to be fully resolved.

  72. case class UnresolvedExtractValue(child: Expression, extraction: Expression) extends BinaryExpression with Unevaluable with Product with Serializable

    Extracts a value or values from an Expression

    child

    The expression to extract value from, can be Map, Array, Struct or array of Structs.

    extraction

    The expression to describe the extraction, can be key of Map, index of Array, field name of Struct.

  73. case class UnresolvedFieldName(name: Seq[String]) extends LeafExpression with FieldName with Product with Serializable
  74. case class UnresolvedFieldPosition(position: ColumnPosition) extends LeafExpression with FieldPosition with Product with Serializable
  75. case class UnresolvedFunction(nameParts: Seq[String], arguments: Seq[Expression], isDistinct: Boolean, filter: Option[Expression] = None, ignoreNulls: Boolean = false) extends Expression with Unevaluable with Product with Serializable

    Represents an unresolved function that is being invoked. The analyzer will resolve the function arguments first, then look up the function by name and arguments, and return an expression that can be evaluated to get the result of this function invocation.

  76. case class UnresolvedFunctionName(multipartIdentifier: Seq[String], commandName: String, requirePersistent: Boolean, funcTypeMismatchHint: Option[String], possibleQualifiedName: Option[Seq[String]] = None) extends LogicalPlan with UnresolvedLeafNode with Product with Serializable

    Holds the name of a function that has yet to be looked up. It will be resolved to ResolvedPersistentFunc or ResolvedNonPersistentFunc during analysis of function-related commands such as DESCRIBE FUNCTION name.

  77. case class UnresolvedGenerator(name: FunctionIdentifier, children: Seq[Expression]) extends Expression with Generator with Product with Serializable

    Represents an unresolved generator, which will be created by the parser for the org.apache.spark.sql.catalyst.plans.logical.Generate operator. The analyzer will resolve this generator.

  78. case class UnresolvedHaving(havingCondition: Expression, child: LogicalPlan) extends LogicalPlan with UnresolvedUnaryNode with Product with Serializable

    Represents unresolved having clause, the child for it can be Aggregate, GroupingSets, Rollup and Cube. It is turned by the analyzer into a Filter.

  79. case class UnresolvedIdentifier(nameParts: Seq[String], allowTemp: Boolean = false) extends LogicalPlan with UnresolvedLeafNode with Product with Serializable

    Holds the name of a table/view/function identifier that we need to determine the catalog. It will be resolved to ResolvedIdentifier during analysis.

  80. case class UnresolvedInlineTable(names: Seq[String], rows: Seq[Seq[Expression]]) extends LogicalPlan with UnresolvedLeafNode with Product with Serializable

    An inline table that has not been resolved yet. Once resolved, it is turned by the analyzer into a org.apache.spark.sql.catalyst.plans.logical.LocalRelation.

    names

    list of column names

    rows

    expressions for the data

  81. trait UnresolvedLeafNode extends LogicalPlan with LeafNode with UnresolvedNode

    Parent trait for unresolved leaf node types

  82. case class UnresolvedNamespace(multipartIdentifier: Seq[String]) extends LogicalPlan with UnresolvedLeafNode with Product with Serializable

    Holds the name of a namespace that has yet to be looked up in a catalog. It will be resolved to ResolvedNamespace during analysis.

  83. trait UnresolvedNode extends LogicalPlan

    Parent trait for unresolved node types

  84. case class UnresolvedOrdinal(ordinal: Int) extends LeafExpression with Unevaluable with NonSQLExpression with Product with Serializable

    Represents unresolved ordinal used in order by or group by.

    For example:

    select a from table order by 1
    select a from table group by 1

    ordinal

    ordinal starts from 1, instead of 0
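
    Because ordinals start from 1, resolving one against a SELECT list means subtracting one before indexing. The sketch below models the SELECT list as plain strings; OrdinalSketch and resolveOrdinal are hypothetical names, and returning None for an out-of-range ordinal is a choice of this sketch, not a description of Spark's error handling.

```scala
object OrdinalSketch {
  // Resolve a 1-based ordinal against a SELECT list; None when out of range.
  def resolveOrdinal(selectList: Seq[String], ordinal: Int): Option[String] =
    if (ordinal >= 1 && ordinal <= selectList.length) Some(selectList(ordinal - 1))
    else None
}
```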

  85. case class UnresolvedPartitionSpec(spec: TablePartitionSpec, location: Option[String] = None) extends LeafExpression with PartitionSpec with Product with Serializable
  86. case class UnresolvedRegex(regexPattern: String, table: Option[String], caseSensitive: Boolean) extends Star with Unevaluable with Product with Serializable

    Represents all of the input attributes to a given relational operator, for example in "SELECT (id)?+.+ FROM ...".

    table

    an optional table that should be the target of the expansion. If omitted all tables' columns are produced.

  87. case class UnresolvedRelation(multipartIdentifier: Seq[String], options: CaseInsensitiveStringMap = CaseInsensitiveStringMap.empty(), isStreaming: Boolean = false) extends LogicalPlan with UnresolvedLeafNode with NamedRelation with Product with Serializable

    Holds the name of a relation that has yet to be looked up in a catalog.

    multipartIdentifier

    table name

    options

    options to scan this relation.

  88. case class UnresolvedStar(target: Option[Seq[String]]) extends Star with Unevaluable with Product with Serializable

    Represents all of the input attributes to a given relational operator, for example in "SELECT * FROM ...".

    This is also used to expand structs. For example: "SELECT record.* from (SELECT struct(a,b,c) as record ...)".

    target

    an optional name that should be the target of the expansion. If omitted all targets' columns are produced. This can either be a table name or struct name. This is a list of identifiers that is the path of the expansion.

  89. case class UnresolvedSubqueryColumnAliases(outputColumnNames: Seq[String], child: LogicalPlan) extends LogicalPlan with UnresolvedUnaryNode with Product with Serializable

    Aliased column names resolved by positions for subquery. We could add alias names for output columns in the subquery:

    // Assign alias names for output columns
    SELECT col1, col2 FROM testData AS t(col1, col2);

    outputColumnNames

    the alias names to assign, by position, to the output columns of the subquery.

    child

    the logical plan of this subquery.

  90. case class UnresolvedTVFAliases(name: Seq[String], child: LogicalPlan, outputNames: Seq[String]) extends LogicalPlan with UnresolvedUnaryNode with Product with Serializable

    A table-valued function with output column aliases, e.g.

    // Assign alias names
    select t.a from range(10) t(a);

    name

    user-specified name of the table-valued function

    child

    logical plan of the table-valued function

    outputNames

    alias names of function output columns. The analyzer adds Project to rename the output columns.

  91. case class UnresolvedTable(multipartIdentifier: Seq[String], commandName: String, suggestAlternative: Boolean = false) extends LogicalPlan with UnresolvedLeafNode with Product with Serializable

    Holds the name of a table that has yet to be looked up in a catalog. It will be resolved to ResolvedTable during analysis.

  92. case class UnresolvedTableOrView(multipartIdentifier: Seq[String], commandName: String, allowTempView: Boolean) extends LogicalPlan with UnresolvedLeafNode with Product with Serializable

    Holds the name of a table or view that has yet to be looked up in a catalog. It will be resolved to ResolvedTable, ResolvedPersistentView or ResolvedTempView during analysis.

  93. case class UnresolvedTableValuedFunction(name: Seq[String], functionArgs: Seq[Expression]) extends LogicalPlan with UnresolvedLeafNode with Product with Serializable

    A table-valued function, e.g.

    select id from range(10);

    name

    user-specified name of this table-value function

    functionArgs

    list of function arguments

  94. trait UnresolvedUnaryNode extends LogicalPlan with UnaryNode with UnresolvedNode

    Parent trait for unresolved unary node types

  95. case class UnresolvedView(multipartIdentifier: Seq[String], commandName: String, allowTemp: Boolean, suggestAlternative: Boolean) extends LogicalPlan with UnresolvedLeafNode with Product with Serializable

    Holds the name of a view that has yet to be looked up. It will be resolved to ResolvedPersistentView or ResolvedTempView during analysis.

  96. sealed trait ViewType extends AnyRef

    ViewType is used to specify the expected view type when we want to create or replace a view in CreateViewStatement.

Value Members

  1. val caseInsensitiveResolution: (String, String) ⇒ Boolean
  2. val caseSensitiveResolution: (String, String) ⇒ Boolean
  3. def withPosition[A](t: TreeNode[_])(f: ⇒ A): A

    Catches any AnalysisExceptions thrown by f and attaches t's position if any.
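
    Both resolution values listed above have the type (String, String) ⇒ Boolean, that is, a predicate deciding whether two identifiers name the same thing. The sketch below shows how such predicates might look and be used. The object ResolverSketch, the helper findColumn, and the predicate bodies are illustrative assumptions, not the package's actual definitions.

```scala
object ResolverSketch {
  // Same shape as the package's resolution values: (String, String) => Boolean.
  type Resolver = (String, String) => Boolean

  // Assumed bodies, for illustration only.
  val caseInsensitive: Resolver = (a, b) => a.equalsIgnoreCase(b)
  val caseSensitive: Resolver = (a, b) => a == b

  // Hypothetical helper: first column whose name matches under the given resolver.
  def findColumn(columns: Seq[String], name: String, resolver: Resolver): Option[String] =
    columns.find(c => resolver(c, name))
}
```

    Passing the predicate as a value lets the same lookup code serve both case-sensitive and case-insensitive sessions.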

  4. object AnalysisContext extends Serializable
  5. object AnsiTypeCoercion extends TypeCoercionBase

    In Spark ANSI mode, the type coercion rules are based on the type precedence lists of the input data types. As per the section "Type precedence list determination" of "ISO/IEC 9075-2:2011 Information technology - Database languages - SQL - Part 2: Foundation (SQL/Foundation)", the type precedence lists of primitive data types are as follows:

    * Byte: Byte, Short, Int, Long, Decimal, Float, Double
    * Short: Short, Int, Long, Decimal, Float, Double
    * Int: Int, Long, Decimal, Float, Double
    * Long: Long, Decimal, Float, Double
    * Decimal: Float, Double, or any wider Numeric type
    * Float: Float, Double
    * Double: Double
    * String: String
    * Date: Date, Timestamp
    * Timestamp: Timestamp
    * Binary: Binary
    * Boolean: Boolean
    * Interval: Interval

    For complex data types, Spark determines the precedence list recursively based on their sub-types and nullability.

    With the definition of the type precedence list, the general type coercion rules are as follows:

    * Data type S is allowed to be implicitly cast to type T iff T is in the precedence list of S.
    * Comparison is allowed iff the data type precedence lists of both sides have at least one common element. When evaluating the comparison, Spark casts both sides to the tightest common data type of their precedence lists.
    * There should be at least one common data type among all the children's precedence lists for the following operators. The data type of the operator is the tightest common precedence data type.
        * In
        * Except
        * Intersect
        * Greatest
        * Least
        * Union
        * If
        * CaseWhen
        * CreateArray
        * Array Concat
        * Sequence
        * MapConcat
        * CreateMap
    * For complex types (struct, array, map), Spark recursively looks into the element type and applies the rules above.

    Note: this type coercion system allows implicitly converting String type to other primitive types, to avoid breaking too many existing Spark SQL queries. This is a special rule and it is not from the ANSI SQL standard.
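
    The first two coercion rules above can be sketched with plain lookups over the precedence lists. This is a simplified model, not Spark's implementation: type names are plain strings, Decimal is reduced to a single entry without precision and scale, and AnsiPrecedenceSketch and its helpers are hypothetical names.

```scala
object AnsiPrecedenceSketch {
  // Precedence lists for some primitive types, taken from the list above.
  // Simplification: Decimal ignores precision/scale and lists itself first.
  val precedence: Map[String, Seq[String]] = Map(
    "Byte"      -> Seq("Byte", "Short", "Int", "Long", "Decimal", "Float", "Double"),
    "Short"     -> Seq("Short", "Int", "Long", "Decimal", "Float", "Double"),
    "Int"       -> Seq("Int", "Long", "Decimal", "Float", "Double"),
    "Long"      -> Seq("Long", "Decimal", "Float", "Double"),
    "Decimal"   -> Seq("Decimal", "Float", "Double"),
    "Float"     -> Seq("Float", "Double"),
    "Double"    -> Seq("Double"),
    "Date"      -> Seq("Date", "Timestamp"),
    "Timestamp" -> Seq("Timestamp")
  )

  // Types not in the map are assumed to precede only themselves.
  private def listOf(t: String): Seq[String] = precedence.getOrElse(t, Seq(t))

  // S may be implicitly cast to T iff T is in the precedence list of S.
  def canImplicitlyCast(from: String, to: String): Boolean =
    listOf(from).contains(to)

  // Tightest common type: first entry of the left list that also occurs in the right list.
  def tightestCommonType(left: String, right: String): Option[String] = {
    val rightList = listOf(right)
    listOf(left).find(t => rightList.contains(t))
  }
}
```

    A comparison between Int and Float would, under this model, cast both sides to Float, while Date and Int share no common type and the comparison is rejected.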

  6. object AssignmentUtils extends SQLConfHelper with CastSupport
  7. object BindParameters extends Rule[LogicalPlan] with QueryErrorsBase

    Finds all named parameters in ParameterizedQuery and substitutes them with literals or with collection constructor functions such as map(), array(), struct() from the user-specified arguments.

  8. object CTESubstitution extends Rule[LogicalPlan]

    Analyze WITH nodes and substitute child plan with CTE references or CTE definitions depending on the conditions below:

    1. If in legacy mode, replace with CTE definitions, i.e., inline CTEs.
    2. Otherwise, replace with CTE references CTERelationRefs. The decision to inline or not inline will be made later by the rule InlineCTE after query analysis.

    All the CTE definitions that are not inlined after this substitution will be grouped together under one WithCTE node for each of the main query and the subqueries. The main query or any subquery that contains no CTEs, or has had all its CTEs inlined, will not have a WithCTE node. Where a WithCTE node does exist, it is placed where the outermost With node once was.

    The CTE definitions in a WithCTE node are kept in the order in which they have been resolved. That means the CTE definitions are guaranteed to be in topological order based on their dependencies for any valid CTE query (i.e., given CTE definitions A and B with B referencing A, A is guaranteed to appear before B). Otherwise, it must be an invalid user query, and an analysis exception will be thrown later by relation resolving rules.

    If the query is a SQL command or DML statement (extends CTEInChildren), place WithCTE into their children.
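
    The ordering guarantee described above (a definition appears after every definition it references) can be expressed as a small check. This is an illustrative model only: CteDef and isTopologicallyOrdered are hypothetical names, and a CTE is reduced to its name plus the names it references.

```scala
object CteOrderSketch {
  // Hypothetical model: a CTE definition and the names of the CTEs it references.
  final case class CteDef(name: String, references: Set[String])

  // True if every definition references only definitions that appear before it.
  def isTopologicallyOrdered(defs: Seq[CteDef]): Boolean =
    defs.indices.forall { i =>
      val earlier = defs.take(i).map(_.name).toSet
      defs(i).references.subsetOf(earlier)
    }
}
```

    With A defined first and B referencing A, the check passes; swapping the two makes it fail, which corresponds to the invalid-query case.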

  9. object CleanupAliases extends Rule[LogicalPlan] with AliasHelper

    Cleans up unnecessary Aliases inside the plan. Basically we only need Alias as a top level expression in Project(project list) or Aggregate(aggregate expressions) or Window(window expressions). Notice that if an expression has other expression parameters which are not in its children, e.g. RuntimeReplaceable, the transformation for Aliases in this rule can't work for those parameters.

  10. object DecimalPrecision extends Rule[LogicalPlan] with TypeCoercionRule

    Calculates and propagates precision for fixed-precision decimals. Hive has a number of rules for this based on the SQL standard and MS SQL:

    https://cwiki.apache.org/confluence/download/attachments/27362075/Hive_Decimal_Precision_Scale_Support.pdf
    https://msdn.microsoft.com/en-us/library/ms190476.aspx

    In particular, if we have expressions e1 and e2 with precision/scale p1/s1 and p2/s2 respectively, then the following operations have the following precision / scale:

    Operation      Result Precision                    Result Scale
    ------------------------------------------------------------------------
    e1 union e2    max(s1, s2) + max(p1-s1, p2-s2)     max(s1, s2)

    To implement the rules for fixed-precision types, we introduce casts to turn them to unlimited precision, do the math on unlimited-precision numbers, then introduce casts back to the required fixed precision. This allows us to do all rounding and overflow handling in the cast-to-fixed-precision operator.

    In addition, when mixing non-decimal types with decimals, we use the following rules:

    - BYTE gets turned into DECIMAL(3, 0)
    - SHORT gets turned into DECIMAL(5, 0)
    - INT gets turned into DECIMAL(10, 0)
    - LONG gets turned into DECIMAL(20, 0)
    - FLOAT and DOUBLE cause fixed-length decimals to turn into DOUBLE
    - Literals INT and LONG get turned into DECIMAL with the precision strictly needed by the value
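
    The union rule and the integral-to-decimal mapping above amount to simple arithmetic, sketched here. DecimalT, unionType and integralAsDecimal are hypothetical names, and the sketch applies no upper bound on precision, which a real implementation would need.

```scala
object DecimalUnionSketch {
  // A fixed-precision decimal type: total digits (precision) and fractional digits (scale).
  final case class DecimalT(precision: Int, scale: Int)

  // e1 union e2: scale = max(s1, s2), precision = max(s1, s2) + max(p1 - s1, p2 - s2).
  def unionType(e1: DecimalT, e2: DecimalT): DecimalT = {
    val scale = math.max(e1.scale, e2.scale)
    val integerDigits = math.max(e1.precision - e1.scale, e2.precision - e2.scale)
    DecimalT(scale + integerDigits, scale)
  }

  // Integral types as decimals, as listed above.
  val integralAsDecimal: Map[String, DecimalT] = Map(
    "BYTE"  -> DecimalT(3, 0),
    "SHORT" -> DecimalT(5, 0),
    "INT"   -> DecimalT(10, 0),
    "LONG"  -> DecimalT(20, 0)
  )
}
```

    For example, the union of DECIMAL(10, 2) and DECIMAL(5, 4) keeps 8 integer digits and 4 fractional digits, giving DECIMAL(12, 4).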

  11. object DeduplicateRelations extends Rule[LogicalPlan]
  12. object EliminateEventTimeWatermark extends Rule[LogicalPlan]

    Ignore event time watermark in batch query, which is only supported in Structured Streaming. TODO: add this rule into analyzer rule list.

  13. object EliminateSubqueryAliases extends Rule[LogicalPlan]

    Removes SubqueryAlias operators from the plan. Subqueries are only required to provide scoping information for attributes and can be removed once analysis is complete.

  14. object EliminateUnions extends Rule[LogicalPlan]

    Removes a Union operator from the plan if it has only one child.

  15. object EliminateView extends Rule[LogicalPlan] with CastSupport

    This rule removes View operators from the plan. The operator is respected till the end of analysis stage because we want to see which part of an analyzed logical plan is generated from a view.

  16. object EmptyFunctionRegistry extends EmptyFunctionRegistryBase[Expression] with FunctionRegistry
  17. object EmptyTableFunctionRegistry extends EmptyFunctionRegistryBase[LogicalPlan] with TableFunctionRegistry
  18. object ExtractDistributedSequenceID extends Rule[LogicalPlan]

    Extracts DistributedSequenceID in logical plans and replaces it with AttachDistributedSequence, because this expression requires a shuffle to generate a sequence that needs the context of the whole data, e.g., org.apache.spark.rdd.RDD.zipWithIndex.

  19. object FakeSystemCatalog extends CatalogPlugin
  20. object FakeV2SessionCatalog extends TableCatalog with FunctionCatalog
  21. object FunctionRegistry
  22. object FunctionRegistryBase
  23. object GlobalTempView extends ViewType

    GlobalTempView means cross-session global temporary views. Its lifetime is the lifetime of the Spark application, i.e. it will be automatically dropped when the application terminates. It's tied to a system-preserved database global_temp, and we must use the qualified name to refer to a global temp view, e.g. SELECT * FROM global_temp.view1.

  24. object HintErrorLogger extends HintErrorHandler with Logging

    The hint error handler that logs warnings for each hint error.

  25. object KeepLegacyOutputs extends Rule[LogicalPlan]

    A rule for keeping the SQL command's legacy outputs.

  26. object LocalTempView extends ViewType

    LocalTempView means session-scoped local temporary views. Its lifetime is the lifetime of the session that created it, i.e. it will be automatically dropped when the session terminates. It's not tied to any databases, i.e. we can't use db1.view1 to reference a local temporary view.

  27. object NameParameterizedQuery extends Serializable
  28. object PersistedView extends ViewType

    PersistedView means cross-session persisted views. Persisted views stay until they are explicitly dropped by user command. A persisted view is always tied to a database, defaulting to the current database if none is specified.

    Note that an existing persisted view with the same name is not visible to the current session while a local temporary view with that name exists, unless the view name is qualified by database.
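
    The visibility rules for the three view types can be summarised as a name lookup. The sketch below is a toy model under stated assumptions: Catalog and resolve are hypothetical names, and views are reduced to sets of names.

```scala
object ViewLookupSketch {
  // Hypothetical catalog: temp views keyed by name, persisted views by (database, name).
  final case class Catalog(
      localTemp: Set[String],
      globalTemp: Set[String],
      persisted: Set[(String, String)],
      currentDatabase: String)

  def resolve(nameParts: Seq[String], c: Catalog): Option[String] = nameParts match {
    // An unqualified name hits a local temp view first, shadowing persisted views.
    case Seq(name) if c.localTemp(name) => Some("LocalTempView")
    case Seq(name) if c.persisted((c.currentDatabase, name)) => Some("PersistedView")
    // Global temp views are only reachable through the global_temp qualifier.
    case Seq("global_temp", name) if c.globalTemp(name) => Some("GlobalTempView")
    // A database-qualified name bypasses local temp views.
    case Seq(db, name) if c.persisted((db, name)) => Some("PersistedView")
    case _ => None
  }
}
```

    With a local temp view v and a persisted view db1.v, the unqualified name v resolves to the temp view, while db1.v still reaches the persisted one.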

  29. object PullOutNondeterministic extends Rule[LogicalPlan]

    Pulls out nondeterministic expressions from a LogicalPlan that is not a Project or Filter, puts them into an inner Project, and finally projects them away in the outer Project.

  30. object RemoveTempResolvedColumn extends Rule[LogicalPlan]

    The rule ResolveReferences in the main resolution batch creates TempResolvedColumn in UnresolvedHaving/Filter/Sort to hold the temporarily resolved column with agg.child.

    If the expression hosting TempResolvedColumn is fully resolved, the rule ResolveAggregationFunctions will

    - Replace TempResolvedColumn with AttributeReference if it's inside aggregate functions or grouping expressions.
    - Mark TempResolvedColumn as hasTried if not inside aggregate functions or grouping expressions, hoping other rules can re-resolve it.

    ResolveReferences will re-resolve TempResolvedColumn if hasTried is true, and keep it unchanged if the resolution fails. We should turn it back to UnresolvedAttribute so that the analyzer can report missing column error later.

    If the expression hosting TempResolvedColumn is not resolved, TempResolvedColumn will remain with hasTried as false. We should strip TempResolvedColumn, so that users can see the reason why the expression is not resolved, e.g. type mismatch.

  31. object ResolveCommandsWithIfExists extends Rule[LogicalPlan]

    A rule for handling commands when the table or temp view is not resolved. These commands support a flag, "ifExists", so that they do not fail when a relation is not resolved. If the "ifExists" flag is set to true, the plan is resolved to NoopCommand.

  32. object ResolveExpressionsWithNamePlaceholders extends Rule[LogicalPlan]

    Resolves expressions if they contain NamePlaceholders.

  33. object ResolveHints

    Collection of rules related to hints. The only hint currently available is join strategy hint.

    Note that this is separated into two rules because in the future we might introduce new hint rules that have different ordering requirements from join strategies.

  34. object ResolveIdentifierClause extends Rule[LogicalPlan] with AliasHelper with EvalHelper

    Resolves the identifier expressions and builds the original plans/expressions.

  35. object ResolveInlineTables extends Rule[LogicalPlan] with CastSupport with AliasHelper with EvalHelper

    An analyzer rule that replaces UnresolvedInlineTable with LocalRelation.

  36. object ResolveLambdaVariables extends Rule[LogicalPlan]

    Resolve the lambda variables exposed by a higher-order function.

    This rule works in two steps:

    [1]. Bind the anonymous variables exposed by the higher-order function to the lambda function's arguments; this creates named and typed lambda variables. The argument names are checked for duplicates and the number of arguments is checked during this step.
    [2]. Resolve the lambda variables used in the lambda function's function expression tree. Note that we allow the use of variables from outside the current lambda; this can either be a lambda function defined in an outer scope, or an attribute produced by the plan's child. If names are duplicated, the name defined in the innermost scope is used.
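
    The two steps above can be sketched as an argument check followed by a scoped name lookup. This is a simplified model, not the rule's implementation: LambdaScopeSketch, checkArguments and resolve are hypothetical names, and a binding is reduced to a descriptive string.

```scala
object LambdaScopeSketch {
  // Step 1 (simplified): validate argument count and reject duplicate names.
  def checkArguments(names: Seq[String], expected: Int): Either[String, Seq[String]] =
    if (names.size != expected) Left(s"expected $expected arguments but got ${names.size}")
    else if (names.distinct.size != names.size) Left("duplicate lambda argument names")
    else Right(names)

  // Step 2 (simplified): scopes are ordered innermost first, so the first match wins.
  def resolve(name: String, scopes: List[Map[String, String]]): Option[String] =
    scopes.collectFirst { case scope if scope.contains(name) => scope(name) }
}
```

    Ordering the scopes innermost first is what makes a name defined in the inner lambda shadow the same name from an outer lambda or from the child's attributes.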

  37. object ResolveLateralColumnAliasReference extends Rule[LogicalPlan]

    This rule is the second phase to resolve lateral column alias.

    Resolve lateral column alias, which references the alias defined previously in the SELECT list. Plan-wise, it handles two types of operators: Project and Aggregate.

    - In Project, it pushes down the referenced lateral alias into a newly created Project and resolves the attributes referencing these aliases.
    - In Aggregate, it inserts a Project node above and falls back to the resolution of Project.

    The whole process is generally divided into two phases:

    1) Recognize resolved lateral aliases and wrap the attributes referencing them with LateralColumnAliasReference.
    2) When the whole operator is resolved, or contains Window but has everything else resolved:
       For Project, it unwraps LateralColumnAliasReference, further resolves the attributes and pushes down the referenced lateral aliases.
       For Aggregate, it goes through the whole aggregation list, extracts the aggregation expressions and grouping expressions to keep them in this Aggregate node, and adds a Project above with the original output. It doesn't do anything on LateralColumnAliasReference, but leaves it completely to the Project in future rounds of this rule.

    ** Example for Project:

    Before rewrite:

        Project [age AS a, 'a + 1]
        +- Child

    After phase 1:

        Project [age AS a, lca(a) + 1]
        +- Child

    After phase 2:

        Project [a, a + 1]
        +- Project [child output, age AS a]
           +- Child

    ** Example for Aggregate:

    Before rewrite:

        Aggregate [dept#14] [dept#14 AS a#12, 'a + 1, avg(salary#16) AS b#13, 'b + avg(bonus#17)]
        +- Child [dept#14,name#15,salary#16,bonus#17]

    After phase 1:

        Aggregate [dept#14] [dept#14 AS a#12, lca(a) + 1, avg(salary#16) AS b#13, lca(b) + avg(bonus#17)]
        +- Child [dept#14,name#15,salary#16,bonus#17]

    After phase 2:

        Project [dept#14 AS a#12, lca(a) + 1, avg(salary)#26 AS b#13, lca(b) + avg(bonus)#27]
        +- Aggregate [dept#14] [avg(salary#16) AS avg(salary)#26, avg(bonus#17) AS avg(bonus)#27,dept#14]
           +- Child [dept#14,name#15,salary#16,bonus#17]

    Now the problem falls back to the lateral alias resolution in Project. After future rounds of this rule:

        Project [a#12, a#12 + 1, b#13, b#13 + avg(bonus)#27]
        +- Project [dept#14 AS a#12, avg(salary)#26 AS b#13]
           +- Aggregate [dept#14] [avg(salary#16) AS avg(salary)#26, avg(bonus#17) AS avg(bonus)#27, dept#14]
              +- Child [dept#14,name#15,salary#16,bonus#17]

    ** Example for Window:

    Query:

        select dept as d, sum(salary) as s, avg(s) over (partition by s order by d) as avg from employee group by dept

    After phase 1:

        'Aggregate [dept#17], [dept#17 AS d#15, sum(salary#19) AS s#16L, avg(lca(s#16L)) windowspecdefinition(lca(s#16L), lca(d#15) ASC NULLS FIRST, specifiedwindowframe(..)) AS avg#25]
        +- Relation spark_catalog.default.employee[dept#17,name#18,salary#19,bonus#20,properties#21]

    It is similar to a regular Aggregate. All expressions in it are resolved, but the operator itself is unresolved due to the Window expression. The rule allows application in this case.

    After phase 2:

        'Project [dept#17 AS d#15, sum(salary)#26L AS s#16L, avg(lca(s#16L)) windowspecdefinition(lca(s#16L), lca(d#15) ASC NULLS FIRST, specifiedwindowframe(..)) AS avg#25]
        +- Aggregate [dept#17], [dept#17, sum(salary#19) AS sum(salary)#26L]
           +- Relation spark_catalog.default.employee[dept#17,name#18,salary#19,bonus#20,properties#21]

    Same as Aggregate, it extracts grouping expressions and aggregate functions. Window expressions are completely lifted up to the upper Project, free from the current Aggregate.

    Then this rule will apply on the Project, adding another Project below. By this phase, all lateral column alias references have been resolved and removed. Finally, rule ExtractWindowExpressions will apply on the top Project with window expressions. It is guaranteed that ResolveLateralColumnAliasReference is applied before ExtractWindowExpressions.
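
    The phase 2 rewrite for Project can be illustrated on a toy expression tree. This is a sketch under stated assumptions, not the rule itself: the Expr hierarchy and pushDown are hypothetical, the "child output" part of the inner project list is omitted, and chained aliases (an alias that itself references a lateral alias) are not handled.

```scala
object LcaProjectSketch {
  sealed trait Expr
  final case class Col(name: String) extends Expr              // resolved column
  final case class Lca(name: String) extends Expr              // lateral column alias reference
  final case class Lit(value: Int) extends Expr
  final case class Add(left: Expr, right: Expr) extends Expr
  final case class Alias(child: Expr, name: String) extends Expr

  // Returns (outer project list, inner project list).
  def pushDown(projectList: Seq[Expr]): (Seq[Expr], Seq[Expr]) = {
    // Names of all aliases that are referenced laterally.
    def refs(e: Expr): Set[String] = e match {
      case Lca(n)      => Set(n)
      case Add(l, r)   => refs(l) ++ refs(r)
      case Alias(c, _) => refs(c)
      case _           => Set.empty
    }
    // Replace each lateral reference with a plain column reference.
    def unwrap(e: Expr): Expr = e match {
      case Lca(n)      => Col(n)
      case Add(l, r)   => Add(unwrap(l), unwrap(r))
      case Alias(c, n) => Alias(unwrap(c), n)
      case other       => other
    }
    val referenced = projectList.flatMap(e => refs(e)).toSet
    // Referenced aliases move into the inner Project ...
    val inner = projectList.collect { case a @ Alias(_, n) if referenced(n) => a }
    // ... and the outer Project refers to them as plain columns.
    val outer = projectList.map {
      case Alias(_, n) if referenced(n) => Col(n)
      case e                            => unwrap(e)
    }
    (outer, inner)
  }
}
```

    Applied to [age AS a, lca(a) + 1], the sketch yields the outer list [a, a + 1] and the inner list [age AS a], matching the Project example above.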

  38. object ResolvePartitionSpec extends Rule[LogicalPlan]

    Resolve UnresolvedPartitionSpec to ResolvedPartitionSpec in partition related commands.

  39. object ResolveRowLevelCommandAssignments extends Rule[LogicalPlan]

    A rule that resolves assignments in row-level commands.

    Note that this rule must be run before rewriting row-level commands into executable plans. This rule does not apply to tables that accept any schema. Such tables must inject their own rules to resolve assignments.

  40. object ResolveTableSpec extends Rule[LogicalPlan]

    This object is responsible for processing unresolved table specifications in commands with OPTIONS lists. The parser produces such lists as maps from strings to unresolved expressions. After otherwise resolving such expressions in the analyzer, here we convert them to resolved table specifications wherein these OPTIONS list values are represented as strings instead, for convenience.

  41. object ResolveTimeZone extends Rule[LogicalPlan]

    Replace TimeZoneAwareExpression without timezone id by its copy with session local time zone.

  42. object ResolveUnion extends Rule[LogicalPlan]

    Resolves different children of Union to a common set of columns.

  43. object ResolveWindowTime extends Rule[LogicalPlan]

    Resolves the window_time expression which extracts the correct window time from the window column generated as the output of the window aggregating operators. The window column is of type struct { start: TimestampType, end: TimestampType }. The correct representative event time of a window is window.end - 1.
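
    The "window.end - 1" computation can be shown with a tiny model. The sketch assumes timestamps are stored as microseconds since the epoch and that windows are half-open intervals [start, end); Window and windowTime are hypothetical names.

```scala
object WindowTimeSketch {
  // A window as a pair of timestamps, assumed to be microseconds since the epoch.
  final case class Window(start: Long, end: Long)

  // With a half-open window [start, end), the last instant inside it is end - 1.
  def windowTime(w: Window): Long = w.end - 1
}
```

    Using end - 1 rather than end keeps the representative event time inside the window it came from, so a later windowing step does not assign it to the next window.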

  44. object ResolveWithCTE extends Rule[LogicalPlan]

    Updates CTE references with the resolved output attributes of the corresponding CTE definitions.

  45. object ResolvedTable extends Serializable
  46. object RewriteDeleteFromTable extends Rule[LogicalPlan] with RewriteRowLevelCommand

    A rule that rewrites DELETE operations using plans that operate on individual or groups of rows.

    If a table implements SupportsDeleteV2 and SupportsRowLevelOperations, this rule will still rewrite the DELETE operation but the optimizer will check whether this particular DELETE statement can be handled by simply passing delete filters to the connector. If so, the optimizer will discard the rewritten plan and will allow the data source to delete using filters.

  47. object RewriteMergeIntoTable extends Rule[LogicalPlan] with RewriteRowLevelCommand with PredicateHelper

    A rule that rewrites MERGE operations using plans that operate on individual or groups of rows.

    This rule assumes the commands have been fully resolved and all assignments have been aligned.

  48. object RewriteUpdateTable extends Rule[LogicalPlan] with RewriteRowLevelCommand

    A rule that rewrites UPDATE operations using plans that operate on individual or groups of rows.

    This rule assumes the commands have been fully resolved and all assignments have been aligned.

  49. object SessionWindowing extends Rule[LogicalPlan]

    Maps a time column to a session window.

  50. object SimpleAnalyzer extends Analyzer

    A trivial Analyzer with a dummy SessionCatalog and EmptyTableFunctionRegistry. Used for testing when all relations are already filled in and the analyzer needs only to resolve attribute references.

    Built-in function registry is set for Spark Connect project to test unresolved functions.

  51. object StreamingJoinHelper extends PredicateHelper with Logging

    Helper object for stream joins. See StreamingSymmetricHashJoinExec in SQL for more details.

  52. object SubstituteUnresolvedOrdinals extends Rule[LogicalPlan]

    Replaces ordinal in 'order by' or 'group by' with UnresolvedOrdinal expression.

  53. object TableFunctionRegistry
  54. object TableOutputResolver
  55. object TimeTravelSpec
  56. object TimeWindowing extends Rule[LogicalPlan]

    Maps a time column to multiple time windows using the Expand operator. Since it's non-trivial to figure out how many windows a time column can map to, we over-estimate the number of windows and filter out the rows where the time column is not inside the time window.
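
    The over-estimate-then-filter idea can be sketched for a single timestamp. This is not the rule's implementation (which works on plans through the Expand operator): windowsFor is a hypothetical helper, durations are plain Long values in the same unit as the timestamp, and the sketch assumes a non-negative timestamp and no start offset.

```scala
object TimeWindowingSketch {
  final case class Window(start: Long, end: Long)

  def windowsFor(ts: Long, windowDuration: Long, slideDuration: Long): Seq[Window] = {
    // Upper bound on how many sliding windows can contain a single instant.
    val maxOverlapping = math.ceil(windowDuration.toDouble / slideDuration).toInt
    // Start of the latest window that begins at or before ts.
    val lastStart = ts - (ts % slideDuration)
    (0 until maxOverlapping)
      // Generate the over-estimated set of candidate windows ...
      .map { i =>
        val start = lastStart - i * slideDuration
        Window(start, start + windowDuration)
      }
      // ... then keep only those that actually contain ts.
      .filter(w => w.start <= ts && ts < w.end)
  }
}
```

    With a window of 7 and a slide of 5, two candidates are generated for timestamp 8, but only [5, 12) survives the filter; [0, 7) is discarded.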

  57. object TypeCheckResult
  58. object TypeCoercion extends TypeCoercionBase

    A collection of Rules that can be used to coerce differing types that participate in operations into compatible ones.

    Notes about type widening / tightest common types: Broadly, there are two cases when we need to widen data types (e.g. union, binary comparison). In case 1, we are looking for a common data type for two or more data types, and in this case no loss of precision is allowed. Examples include type inference in JSON (e.g. what's the column's data type if one row is an integer while the other row is a long?). In case 2, we are looking for a widened data type with some acceptable loss of precision (e.g. there is no common type for double and decimal because double's range is larger than decimal, and yet decimal is more precise than double, but in union we would cast the decimal into double).

  59. object UnresolvedAttribute extends AttributeNameParser with Serializable
  60. object UnresolvedFunction extends Serializable
  61. object UnresolvedRelation extends Serializable
  62. object UnresolvedSeed extends LeafExpression with Unevaluable with Product with Serializable

    A placeholder expression used in random functions; it is replaced after analysis.

  63. object UnresolvedTVFAliases extends Serializable
  64. object UnresolvedTableValuedFunction extends Serializable
  65. object UnsupportedOperationChecker extends Logging

    Analyzes the presence of unsupported operations in a logical plan.

  66. object UpdateAttributeNullability extends Rule[LogicalPlan]

    Updates nullability of Attributes in a resolved LogicalPlan by using the nullability of corresponding Attributes of its children output Attributes. This step is needed because users can use a resolved AttributeReference in the Dataset API and outer joins can change the nullability of an AttributeReference. Without this rule, a nullable column's nullable field can actually be set as non-nullable, which causes illegal optimization (e.g., NULL propagation) and wrong answers. See SPARK-13484 and SPARK-13801 for the concrete queries of this case.
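
    The update itself can be pictured as copying the nullable flag from the child's output onto matching references. The sketch below is a toy model: Attr and update are hypothetical names, and attributes are matched by a plain integer id.

```scala
object NullabilitySketch {
  // Hypothetical attribute: identified by id, carrying a nullable flag.
  final case class Attr(id: Int, name: String, nullable: Boolean)

  // Replace each reference's nullability with that of the child output attribute with the same id.
  def update(references: Seq[Attr], childOutput: Seq[Attr]): Seq[Attr] = {
    val nullableById = childOutput.map(a => a.id -> a.nullable).toMap
    references.map(a => nullableById.get(a.id).fold(a)(n => a.copy(nullable = n)))
  }
}
```

    A reference captured as non-nullable before an outer join is thereby corrected once the join's output reports the same attribute as nullable.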

  67. object UpdateOuterReferences extends Rule[LogicalPlan]

    The aggregate expressions from a subquery referencing the outer query block are pushed down to the outer query block for evaluation. This rule updates such outer references as AttributeReference referring to attributes from the parent/outer query block.

    For example (SQL):

    SELECT l.a FROM l GROUP BY 1 HAVING EXISTS (SELECT 1 FROM r WHERE r.d < min(l.b))

    Plan before the rule.

        Project [a#226]
        +- Filter exists#245 [min(b#227)#249]
           :  +- Project [1 AS 1#247]
           :     +- Filter (d#238 < min(outer(b#227)))       <-----
           :        +- SubqueryAlias r
           :           +- Project [_1#234 AS c#237, _2#235 AS d#238]
           :              +- LocalRelation [_1#234, _2#235]
           +- Aggregate [a#226], [a#226, min(b#227) AS min(b#227)#249]
              +- SubqueryAlias l
                 +- Project [_1#223 AS a#226, _2#224 AS b#227]
                    +- LocalRelation [_1#223, _2#224]

    Plan after the rule.

        Project [a#226]
        +- Filter exists#245 [min(b#227)#249]
           :  +- Project [1 AS 1#247]
           :     +- Filter (d#238 < outer(min(b#227)#249))   <-----
           :        +- SubqueryAlias r
           :           +- Project [_1#234 AS c#237, _2#235 AS d#238]
           :              +- LocalRelation [_1#234, _2#235]
           +- Aggregate [a#226], [a#226, min(b#227) AS min(b#227)#249]
              +- SubqueryAlias l
                 +- Project [_1#223 AS a#226, _2#224 AS b#227]
                    +- LocalRelation [_1#223, _2#224]
