The agent observes input-output pairs and learns a function that this learning maps from input to output. For example, the inputs could be camera images, each one accompanied by an output saying "bus" or "pedestrian," etc. This type of learning is known as: