What is Validation?

Some questions keep popping up about validation and I thought I'd try to
clarify it a bit.

What is Validation

I've never seen a good definition anywhere of what validation in IS
(Integration Services) is supposed to be or what it's supposed to do. There is
a lot of discussion about what components do when validating, but before
launching into that, I thought it would be helpful to give some history and
explain the philosophy behind validation. This will likely be more useful than
a straight definition because, hopefully, you'll understand the evolution and
reasoning behind it rather than just a static description.

Early Design Iterations

Early on, I remember walking into perhaps the first design review of what would
later become the runtime. Among all the proposals in that review was a new
method on the task interfaces called Custom or something similar. (I can't
remember; after all, it _has_ been 5 years now.) Nobody liked the name, but
everyone liked the concept. The idea was that the runtime would call this
method "sometime" before execution to give the task a chance to "do stuff"
before executing. Pretty vague, yes. I wish all ideas and designs would just
spring into my mind fully formed on the first day of a project. It just doesn't
seem to happen that way. If you know of anyone who does this, let us know.
We're hiring. 🙂

As I recall, Gert Draper, who was our PUM (product unit manager) at that time,
suggested that we call it Validate. Hmmm, has a nice ring to it. It makes sense,
because typically what folks want to do before executing is make sure that
execution will succeed. Gert asked why we needed to pass in variables,
connections, etc. I answered, somewhat unsure of myself (after all, this was
Gert 🙂), "Well, because the tasks may need to use them for the work they do
during validation." Knees knocking, hands shaking, teeth chattering. Gert says,
"Oh, OK." Boom! A new method is born, and with it the start of a new concept
that needed to be developed. It's interesting how one little old method can
spawn so many practical and philosophical discussions.

Validation caused a whole lot of confusion and discussion. We talked about
"deep validation" vs. "light validation", and that debate raged for months. What
type of validation should components do? We also talked about what the other
kinds of components should do. At the time, we still didn't have the notion of
extensible log providers, connections, or enumerators. Later, when those were
"componentized", we passed them only variables. The logic went, "Why would
anyone need connections for a log provider?" Hmmmm…

The light validation discussion was very interesting. There was a lot of talk
about how to tell the task that a validation was just preliminary rather than
pre-execution validation. The thought was that you wouldn't want to validate a
task strictly while it was still being configured or if it had property
mappings. We didn't have the notion of warnings or information events either,
only errors, which were inflexible and, for purposes of validation, terminating.
Out of these discussions grew the notion of warnings; information events came
later. Eventually, we scrapped the whole idea of light vs. deep validation.
Warnings enabled tasks to report an issue and yet still say they could function
in spite of it. Slowly we migrated to a point where validation was just
validation, with no variants. Then we arrived at the definition we have today:

Validation is what a component does to detect any issues that would cause
it to fail during execution.

This led to a few other "rules":

  • When a component validates, it should always check that, given its
    current property settings, it will succeed during execution.
  • Components should not return when they find the first error. They should
    continue validating until they have found all errors. Seeing the whole
    stack of errors at once gives a much better picture of what's wrong.
  • Components should emit warnings for issues that are not fatal but could
    cause problems; for example, when the Send Mail task doesn't have a
    subject. Warnings are, in effect, non-terminating errors.
  • Some others I can't remember right now…
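The first three rules can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the SSIS API: `ExecResult`, `ValidationReport`, `fire_error`/`fire_warning`, and the `SendMailTask` properties (`smtp_connection`, `to_line`, `subject`) are all hypothetical names. The task judges its current property settings, keeps checking after the first error, reports the missing subject only as a warning, and fails validation only if at least one error was raised.

```python
from dataclasses import dataclass, field
from enum import Enum


class ExecResult(Enum):
    SUCCESS = "success"
    FAILURE = "failure"


@dataclass
class ValidationReport:
    """Collects every issue raised during one validate() call (hypothetical)."""
    errors: list = field(default_factory=list)
    warnings: list = field(default_factory=list)

    def fire_error(self, message):
        self.errors.append(message)

    def fire_warning(self, message):
        self.warnings.append(message)


@dataclass
class SendMailTask:
    """Hypothetical task; only three properties are modeled."""
    smtp_connection: str = ""
    to_line: str = ""
    subject: str = ""

    def validate(self, report):
        # Judge the *current* property settings: would execute() succeed?
        if not self.smtp_connection:
            report.fire_error("No SMTP connection is specified.")
        # No early return: keep checking so every error is reported at once.
        if not self.to_line:
            report.fire_error("No recipient is specified.")
        elif "@" not in self.to_line:
            report.fire_error(f"'{self.to_line}' is not an e-mail address.")
        if not self.subject:
            # The mail can still be sent, so this is only a warning.
            report.fire_warning("The message has no subject.")
        # Warnings alone never fail validation; any error does.
        return ExecResult.FAILURE if report.errors else ExecResult.SUCCESS


report = ValidationReport()
result = SendMailTask().validate(report)
print(result.name, len(report.errors), len(report.warnings))  # prints: FAILURE 2 1
```

With nothing configured, a single call reports both errors and the warning. A task whose only issue is the missing subject still validates successfully, which is exactly the distinction warnings were introduced to make.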

Now the problem is, when do components get validated? Essentially, we look at
validation in different ways depending on the situation. If you're designing a
package, you'd like to find out about its errors while you're designing it, not
when you go to execute it. So the designer takes some liberties here and
validates a component whenever it is changed through its UI. When a package is
opened, the designer validates it as well. These design-time validations should
not be confused with execution-time validations; they are done to alert the
package writer to problems with the package as soon as possible.

Execution Validation

Execution validation is perhaps where most folks get tripped up. Execution-time
validation happens at two key points: when the package is executed and when the
runtime executes each task in the package. In the designer, this can be
confusing. Because the designer validates all the time, and because the package
appears to run in the same process as the designer, there seems to be no clear
distinction between design-time and execution-time validation. But the designer
doesn't run the package in its own process. It actually runs it out of process
(for a number of reasons I won't go into here), so a separate host process
loads the package and executes it. When the designer calls Execute() on the
package, the package validates. Everything in the package gets validated, from
the package down through the containers to the components. This is general
validation. Then, when the runtime calls Execute() on each task, the TaskHost
calls Validate() again. This is component validation.
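To make the two execution-time validation points concrete, here is a minimal sketch with hypothetical stand-in classes (`CountingTask`, `TaskHost`, `Package` are illustrative, not SSIS types). `Package.execute()` first validates everything it contains (general validation), and then the `TaskHost`-style wrapper validates each task again right before executing it (component validation). A counter shows that each task ends up validated twice in a successful run.

```python
class CountingTask:
    """Hypothetical task that counts how often it is validated."""

    def __init__(self, name):
        self.name = name
        self.validate_calls = 0
        self.executed = False

    def validate(self):
        self.validate_calls += 1
        return True  # this toy task is always ready to run

    def execute(self):
        self.executed = True
        return True


class TaskHost:
    """Hypothetical wrapper playing the TaskHost role for one task."""

    def __init__(self, task):
        self.task = task

    def validate(self):
        return self.task.validate()

    def execute(self):
        # Component validation: check again right before executing.
        if not self.task.validate():
            return False
        return self.task.execute()


class Package:
    """Hypothetical package holding a flat list of task hosts."""

    def __init__(self, hosts):
        self.hosts = hosts

    def validate(self):
        # General validation: check every host (no short-circuit).
        results = [host.validate() for host in self.hosts]
        return all(results)

    def execute(self):
        if not self.validate():  # pass 1: the whole package, up front
            return False
        for host in self.hosts:  # pass 2: each task, just before it runs
            if not host.execute():
                return False
        return True


tasks = [CountingTask("Extract"), CountingTask("Load")]
package = Package([TaskHost(t) for t in tasks])
print(package.execute(), [t.validate_calls for t in tasks])  # prints: True [2, 2]
```

The list comprehension in `Package.validate()` is deliberate: it validates every host before combining the results, so one failing task doesn't hide problems in the others.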

Early vs. Late Validation

Why? This is the confusing part. We want folks to always do strict validation
(remember, no light validation). Packages have the notion of late or dynamic
configuration via property expressions, Foreach loops, etc. And we assume you
would rather have a task fail validation than corrupt data or otherwise fail
execution in a destructive way. So we validate twice: we validate the whole
package, and we validate each individual task right before it executes. In
fact, if you were to look at the TaskHost code, you'd see something like this:

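ExecResult TaskHost::Execute(Parameters params)
{
    // Late validation: check the task right before running it
    ExecResult result = task->Validate(params);
    if (FAILED(result))
        return result;    // never execute a task that failed validation
    return task->Execute(params);
}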

Now, some tasks aren't going to be ready to execute when the package's
Execute() method gets called. They may rely on a variable value that gets set by
another task that runs earlier in the package. They may be waiting for a file to
be generated or dropped, etc. So, if there were no way to validate such tasks
after Execute() gets called on the package, the package would never run
successfully. However, the runtime needs to be told when this is the case; on
its own, it has no way of knowing whether to validate a task early or late.
Enter the "DelayValidation" property.

DelayValidation

This property tells the runtime, "Don't validate me until the very last
instant." It's on all containers and all hosted objects, and it's a simple flag.
When the early, package-level validation happens, the runtime checks this flag.
If it is set to true, the runtime skips validating that part of the package.
Think scope here: if a container has multiple child containers, which in turn
have multiple grandchild containers, and the top container has DelayValidation
set to true, none of it gets validated early. The whole thing gets skipped.
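The DelayValidation behavior, and the problem it solves, can be sketched as follows. Everything here is hypothetical: a plain dictionary stands in for package variables, `Container`, `SetVariableTask`, `NeedsVariableTask`, `validate_early`, and `run` are illustrative names rather than SSIS classes, and `FileName`/`input.csv` are made-up values. Early validation walks the container tree and skips any node whose `delay_validation` flag is set, together with its whole subtree; late validation still checks each task right before it runs, by which time the earlier task has set the variable.

```python
class SetVariableTask:
    """Hypothetical task that sets a variable when it executes."""

    def __init__(self, name, value):
        self.name, self.value = name, value
        self.delay_validation = False

    def validate(self, variables):
        return True

    def execute(self, variables):
        variables[self.name] = self.value


class NeedsVariableTask:
    """Hypothetical task that can only succeed once a variable has a value."""

    def __init__(self, name):
        self.name = name
        self.delay_validation = False

    def validate(self, variables):
        return variables.get(self.name) is not None

    def execute(self, variables):
        pass


class Container:
    def __init__(self, children, delay_validation=False):
        self.children = children
        self.delay_validation = delay_validation


def validate_early(node, variables):
    """Package-level validation; a delayed node and its subtree are skipped."""
    if node.delay_validation:
        return True
    if isinstance(node, Container):
        # Validate every child (no short-circuit) so all problems surface.
        results = [validate_early(child, variables) for child in node.children]
        return all(results)
    return node.validate(variables)


def run(node, variables):
    """Execute in order, stopping at the first failure."""
    if isinstance(node, Container):
        return all(run(child, variables) for child in node.children)
    if not node.validate(variables):  # late validation, flag or no flag
        return False
    node.execute(variables)
    return True


def execute_package(root):
    variables = {}
    if not validate_early(root, variables):
        return False
    return run(root, variables)


def build(delay):
    inner = Container([NeedsVariableTask("FileName")], delay_validation=delay)
    return Container([SetVariableTask("FileName", "input.csv"), inner])


print(execute_package(build(delay=False)), execute_package(build(delay=True)))  # prints: False True
```

Without the flag, early validation rejects the package before the task that sets the variable ever runs. With the flag on the enclosing container, the same task passes its late validation, and a task whose variable is never set is still caught at that last instant.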

Late Validation

Later in the package execution, the runtime calls each task's Execute() method,
and in doing so it implicitly calls the task's Validate() method first. Now, if
the task fails validation, you can be sure that it would also have failed
execution, because there is very little chance that anything will change
between the time the runtime calls Validate() and Execute().

Validation is important because it gives early warning of critical issues.
It's how all the nifty little icons pop up in the designer when there's an
error. When you move your mouse cursor over the little red X on a task and the
error message shows up in the tooltip, that's the result of validation.
Validation also keeps your system safer, because components check that an
operation will succeed, or at least has a good chance of succeeding, before
performing potentially invasive or damaging operations. Validation is to
packages what compiling is to source code.

Reproduced by kind permission of Kirk Haselden (Microsoft).