public class Execution extends Object implements AccessExecution, org.apache.flink.api.common.Archiveable<ArchivedExecution>, LogicalSlot.Payload
A single execution of a vertex. While an ExecutionVertex can be executed multiple times
(for recovery, re-computation, re-configuration), this class tracks the state of a single
execution of that vertex and its resources.
At several points in the code, we need to deal with possible concurrent state changes and actions. For example, the task may get cancelled while the call to deploy it (send it to the TaskManager) is still in progress.
We could lock the entire portion of the code (decision to deploy, deploy, set state to running) so that any "cancel command" is guaranteed to take effect only after deployment is done and can never overtake the deploying call.
However, this would block threads for long periods, because the remote calls may take long. Depending on their locking behavior, it may even result in distributed deadlocks (unless carefully avoided). We therefore use atomic state updates and occasional double-checking to ensure that the state after a completed call is as expected, and trigger correcting actions if it is not. Many actions are also idempotent (like canceling).
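The lock-free approach described above can be illustrated with a minimal sketch. It is not the actual implementation of this class: the `State` enum, the class name `AtomicStateSketch`, and the `Runnable` standing in for the remote deployment call are all hypothetical. It only shows the pattern of an atomic compare-and-set, a double-check after the slow call, a correcting action, and an idempotent cancel.

```java
import java.util.concurrent.atomic.AtomicReference;

public class AtomicStateSketch {

    // Hypothetical, simplified state set; not Flink's ExecutionState.
    public enum State { CREATED, DEPLOYING, RUNNING, CANCELING, CANCELED }

    private final AtomicReference<State> state = new AtomicReference<>(State.CREATED);

    /** Atomically moves from expected to target; false if the state changed in the meantime. */
    boolean transition(State expected, State target) {
        return state.compareAndSet(expected, target);
    }

    public State current() {
        return state.get();
    }

    /** Deploys without holding a lock across the (simulated) remote call. */
    public boolean deploy(Runnable remoteCall) {
        if (!transition(State.CREATED, State.DEPLOYING)) {
            return false; // another action (for example a cancel) got there first
        }
        remoteCall.run(); // stands in for a slow RPC; a cancel may arrive meanwhile
        // Double-check: only move to RUNNING if the state is still DEPLOYING.
        if (transition(State.DEPLOYING, State.RUNNING)) {
            return true;
        }
        // Correcting action: the state changed during the call, so complete the cancellation.
        transition(State.CANCELING, State.CANCELED);
        return false;
    }

    /** Idempotent: calls after the first successful one have no further effect. */
    public void cancel() {
        while (true) {
            State s = state.get();
            if (s == State.CANCELING || s == State.CANCELED) {
                return;
            }
            State target = (s == State.CREATED) ? State.CANCELED : State.CANCELING;
            if (transition(s, target)) {
                return;
            }
        }
    }

    public static void main(String[] args) {
        AtomicStateSketch e = new AtomicStateSketch();
        boolean running = e.deploy(e::cancel); // the cancel arrives during the "remote call"
        System.out.println(running + " " + e.current()); // prints "false CANCELED"
    }
}
```

In this sketch no thread ever waits on a lock while the remote call is in flight; a concurrent cancel simply wins the compare-and-set, and the deploying thread notices that afterwards and cleans up.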
| Constructor | Description |
|---|---|
| Execution(Executor executor, ExecutionVertex vertex, int attemptNumber, long startTimestamp, org.apache.flink.api.common.time.Time rpcTimeout) | Creates a new Execution attempt. |

| Modifier and Type | Method | Description |
|---|---|---|
| ArchivedExecution | archive() | |
| void | cancel() | |
| void | deploy() | Deploys the execution to the previously assigned resource. |
| void | fail(Throwable t) | This method fails the vertex due to an external condition. |
| AllocationID | getAssignedAllocationID() | |
| LogicalSlot | getAssignedResource() | |
| TaskManagerLocation | getAssignedResourceLocation() | Returns the TaskManagerLocation for this execution. |
| ExecutionAttemptID | getAttemptId() | Returns the ExecutionAttemptID for this Execution. |
| int | getAttemptNumber() | Returns the attempt number for this execution. |
| Optional<ErrorInfo> | getFailureInfo() | Returns the exception that caused the job to fail. |
| CompletableFuture<?> | getInitializingOrRunningFuture() | Gets a future that completes once the task execution reaches one of the states ExecutionState.INITIALIZING or ExecutionState.RUNNING. |
| IOMetrics | getIOMetrics() | |
| org.apache.flink.core.io.InputSplit | getNextInputSplit() | |
| int | getParallelSubtaskIndex() | Returns the subtask index of this execution. |
| CompletableFuture<?> | getReleaseFuture() | Gets the release future which is completed once the execution reaches a terminal state and the assigned resource has been released. |
| Optional<ResultPartitionDeploymentDescriptor> | getResultPartitionDeploymentDescriptor(IntermediateResultPartitionID id) | |
| ExecutionState | getState() | Returns the current ExecutionState for this execution. |
| long | getStateTimestamp(ExecutionState state) | Returns the timestamp for the given ExecutionState. |
| long[] | getStateTimestamps() | Returns the timestamps for every ExecutionState. |
| CompletableFuture<TaskManagerLocation> | getTaskManagerLocationFuture() | |
| JobManagerTaskRestore | getTaskRestore() | |
| CompletableFuture<ExecutionState> | getTerminalStateFuture() | Gets a future that completes once the task execution reaches a terminal state. |
| Map<String,org.apache.flink.api.common.accumulators.Accumulator<?,?>> | getUserAccumulators() | |
| StringifiedAccumulatorResult[] | getUserAccumulatorsStringified() | Returns the user-defined accumulators as strings. |
| ExecutionVertex | getVertex() | |
| String | getVertexWithAttempt() | |
| boolean | isFinished() | |
| void | markFinished() | |
| void | notifyCheckpointAborted(long abortCheckpointId, long latestCompletedCheckpointId, long timestamp) | Notify the task of this execution about an aborted checkpoint. |
| void | notifyCheckpointOnComplete(long completedCheckpointId, long completedTimestamp, long lastSubsumedCheckpointId) | Notify the task of this execution about a completed checkpoint and the last subsumed checkpoint id if possible. |
| CompletableFuture<Void> | registerProducedPartitions(TaskManagerLocation location, boolean notifyPartitionDataAvailable) | |
| CompletableFuture<Acknowledge> | sendOperatorEvent(OperatorID operatorId, org.apache.flink.util.SerializedValue<OperatorEvent> event) | Sends the operator event to the Task on the Task Executor. |
| void | setAccumulators(Map<String,org.apache.flink.api.common.accumulators.Accumulator<?,?>> userAccumulators) | Update accumulators (discarded when the Execution has already been terminated). |
| void | setInitialState(JobManagerTaskRestore taskRestore) | Sets the initial state for the execution. |
| CompletableFuture<?> | suspend() | |
| String | toString() | |
| void | transitionState(ExecutionState targetState) | |
| CompletableFuture<Acknowledge> | triggerCheckpoint(long checkpointId, long timestamp, CheckpointOptions checkpointOptions) | Trigger a new checkpoint on the task of this execution. |
| CompletableFuture<Acknowledge> | triggerSynchronousSavepoint(long checkpointId, long timestamp, CheckpointOptions checkpointOptions) | Trigger a new checkpoint on the task of this execution. |
| boolean | tryAssignResource(LogicalSlot logicalSlot) | Tries to assign the given slot to the execution. |

public Execution(Executor executor, ExecutionVertex vertex, int attemptNumber, long startTimestamp, org.apache.flink.api.common.time.Time rpcTimeout)
Parameters:
executor - The executor used to dispatch callbacks from futures and asynchronous RPC calls.
vertex - The execution vertex to which this Execution belongs.
attemptNumber - The execution attempt number.
startTimestamp - The timestamp that marks the creation of this Execution.
rpcTimeout - The timeout for RPC calls like deploy/cancel/stop.
public ExecutionVertex getVertex()
public ExecutionAttemptID getAttemptId()
Description copied from interface: AccessExecution
Returns the ExecutionAttemptID for this Execution.
Specified by: getAttemptId in interface AccessExecution
public int getAttemptNumber()
Description copied from interface: AccessExecution
Returns the attempt number for this execution.
Specified by: getAttemptNumber in interface AccessExecution
public ExecutionState getState()
Description copied from interface: AccessExecution
Returns the current ExecutionState for this execution.
Specified by: getState in interface AccessExecution
@Nullable public AllocationID getAssignedAllocationID()
public CompletableFuture<TaskManagerLocation> getTaskManagerLocationFuture()
public LogicalSlot getAssignedResource()
public Optional<ResultPartitionDeploymentDescriptor> getResultPartitionDeploymentDescriptor(IntermediateResultPartitionID id)
public boolean tryAssignResource(LogicalSlot logicalSlot)
Parameters:
logicalSlot - to assign to this execution
public org.apache.flink.core.io.InputSplit getNextInputSplit()
public TaskManagerLocation getAssignedResourceLocation()
Description copied from interface: AccessExecution
Returns the TaskManagerLocation for this execution.
Specified by: getAssignedResourceLocation in interface AccessExecution
public Optional<ErrorInfo> getFailureInfo()
Description copied from interface: AccessExecution
Returns the exception that caused the job to fail.
Specified by: getFailureInfo in interface AccessExecution
Returns: Optional of ErrorInfo containing the Throwable and the time it was registered if an error occurred. If no error occurred, an empty Optional will be returned.
public long[] getStateTimestamps()
Description copied from interface: AccessExecution
Returns the timestamps for every ExecutionState.
Specified by: getStateTimestamps in interface AccessExecution
public long getStateTimestamp(ExecutionState state)
Description copied from interface: AccessExecution
Returns the timestamp for the given ExecutionState.
Specified by: getStateTimestamp in interface AccessExecution
Parameters:
state - state for which the timestamp should be returned
public boolean isFinished()
@Nullable public JobManagerTaskRestore getTaskRestore()
public void setInitialState(JobManagerTaskRestore taskRestore)
Sets the initial state for the execution. The serialized state is then shipped via the TaskDeploymentDescriptor to the TaskManagers.
Parameters:
taskRestore - information to restore the state
public CompletableFuture<?> getInitializingOrRunningFuture()
Gets a future that completes once the task execution reaches one of the states ExecutionState.INITIALIZING or ExecutionState.RUNNING. If this task never reaches these states (for example because the task is cancelled before it was properly deployed and restored), then this future will never complete.
The future is completed already in the ExecutionState.INITIALIZING state, because various running actions are already possible in that state (the task already accepts and sends events and network data for task recovery). (Note that in earlier versions, the INITIALIZING state was not separate but part of the RUNNING state.)
This future is always completed from the job master's main thread.
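Because this future may never complete, a caller that needs a definite outcome could pair it with a second signal instead of waiting on it alone. The following is a minimal sketch under stated assumptions: the two plain `CompletableFuture` arguments merely stand in for the futures returned by `getInitializingOrRunningFuture()` and `getTerminalStateFuture()`, and the class name `FirstSignalSketch`, the helper `firstOf`, and the string labels are hypothetical. Whether the terminal-state future completes in every cancellation path is also an assumption of this sketch.

```java
import java.util.concurrent.CompletableFuture;

public class FirstSignalSketch {

    /**
     * Returns a future that completes with "STARTED" if the first argument completes first,
     * or with "TERMINATED" if the second one does. Nothing blocks; later signals are ignored.
     */
    public static CompletableFuture<String> firstOf(
            CompletableFuture<?> initializingOrRunning, CompletableFuture<?> terminal) {
        CompletableFuture<String> result = new CompletableFuture<>();
        // complete() only takes effect for the first caller, so the earlier signal wins.
        initializingOrRunning.thenRun(() -> result.complete("STARTED"));
        terminal.thenRun(() -> result.complete("TERMINATED"));
        return result;
    }

    public static void main(String[] args) {
        // Stand-ins for the two futures of one execution.
        CompletableFuture<Void> started = new CompletableFuture<>();
        CompletableFuture<Void> terminated = new CompletableFuture<>();

        CompletableFuture<String> outcome = firstOf(started, terminated);

        // Simulate a task that is cancelled before it was ever deployed.
        terminated.complete(null);
        System.out.println(outcome.join()); // prints "TERMINATED"
    }
}
```

The callbacks are registered without blocking, which matters here because the original future is completed from the job master's main thread.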
public CompletableFuture<ExecutionState> getTerminalStateFuture()
Specified by: getTerminalStateFuture in interface LogicalSlot.Payload
public CompletableFuture<?> getReleaseFuture()
public CompletableFuture<Void> registerProducedPartitions(TaskManagerLocation location, boolean notifyPartitionDataAvailable)
public void deploy() throws JobException
Throws:
JobException - if the execution cannot be deployed to the assigned resource
public void cancel()
public CompletableFuture<?> suspend()
public void fail(Throwable t)
Specified by: fail in interface LogicalSlot.Payload
Parameters:
t - The exception that caused the task to fail.
public void notifyCheckpointOnComplete(long completedCheckpointId, long completedTimestamp, long lastSubsumedCheckpointId)
Parameters:
completedCheckpointId - of the completed checkpoint
completedTimestamp - of the completed checkpoint
lastSubsumedCheckpointId - of the last subsumed checkpoint; a value of CheckpointStoreUtil.INVALID_CHECKPOINT_ID means no checkpoint has been subsumed.
public void notifyCheckpointAborted(long abortCheckpointId, long latestCompletedCheckpointId, long timestamp)
Parameters:
abortCheckpointId - of the aborted checkpoint
latestCompletedCheckpointId - of the latest completed checkpoint
timestamp - of the aborted checkpoint
public CompletableFuture<Acknowledge> triggerCheckpoint(long checkpointId, long timestamp, CheckpointOptions checkpointOptions)
Parameters:
checkpointId - of the checkpoint to trigger
timestamp - of the checkpoint to trigger
checkpointOptions - of the checkpoint to trigger
public CompletableFuture<Acknowledge> triggerSynchronousSavepoint(long checkpointId, long timestamp, CheckpointOptions checkpointOptions)
Parameters:
checkpointId - of the checkpoint to trigger
timestamp - of the checkpoint to trigger
checkpointOptions - of the checkpoint to trigger
public CompletableFuture<Acknowledge> sendOperatorEvent(OperatorID operatorId, org.apache.flink.util.SerializedValue<OperatorEvent> event)
@VisibleForTesting public void markFinished()
public void transitionState(ExecutionState targetState)
public String getVertexWithAttempt()
public void setAccumulators(Map<String,org.apache.flink.api.common.accumulators.Accumulator<?,?>> userAccumulators)
Parameters:
userAccumulators - the user accumulators
public Map<String,org.apache.flink.api.common.accumulators.Accumulator<?,?>> getUserAccumulators()
public StringifiedAccumulatorResult[] getUserAccumulatorsStringified()
Description copied from interface: AccessExecution
Returns the user-defined accumulators as strings.
Specified by: getUserAccumulatorsStringified in interface AccessExecution
public int getParallelSubtaskIndex()
Description copied from interface: AccessExecution
Returns the subtask index of this execution.
Specified by: getParallelSubtaskIndex in interface AccessExecution
public IOMetrics getIOMetrics()
Specified by: getIOMetrics in interface AccessExecution
public ArchivedExecution archive()
Specified by: archive in interface org.apache.flink.api.common.Archiveable<ArchivedExecution>
Copyright © 2014–2023 The Apache Software Foundation. All rights reserved.