Definition: Metrics widely used in sequence-to-sequence tasks (such as machine translation and summarization) that measure n-gram overlap between a generated output and a target output. Higher BLEU and ROUGE-1 scores indicate greater overlap between the generated and target outputs.
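Calculation: Both are measures of n-gram overlap. BLEU computes clipped n-gram precision for n = 1 to 4, combines the four precisions with a geometric mean, and multiplies the result by a brevity penalty that penalizes outputs shorter than the target. ROUGE-1 counts overlapping unigrams (single tokens) and is usually reported as recall, precision, or their F1 score. A more detailed explanation of BLEU is provided here. A more detailed explanation of ROUGE-1 is provided here.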
Usefulness: Quantifies how closely model outputs match target outputs, providing a metric that can guide model improvement and help identify cases where the model struggles to produce the expected output.
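The calculations described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: it assumes whitespace-tokenized input, a single target output, sentence-level scoring, and no smoothing (BLEU was originally defined at the corpus level). The helper names `ngrams`, `bleu`, and `rouge_1` are hypothetical. Established libraries handle tokenization, multiple references, and smoothing in their own ways, so their scores will not match this sketch exactly.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU against one reference, without smoothing (illustrative)."""
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        ref_counts = ngrams(reference, n)
        # Clip each candidate n-gram count at its count in the reference.
        overlap = sum(min(count, ref_counts[g]) for g, count in cand_counts.items())
        total = sum(cand_counts.values())
        if total == 0 or overlap == 0:
            # Without smoothing, any zero precision makes the geometric mean 0.
            return 0.0
        log_precisions.append(math.log(overlap / total))
    c, r = len(candidate), len(reference)
    # Brevity penalty: 1 if the candidate is longer than the reference, else exp(1 - r/c).
    bp = 1.0 if c > r else math.exp(1 - r / c)
    # Geometric mean of the n-gram precisions, scaled by the brevity penalty.
    return bp * math.exp(sum(log_precisions) / max_n)


def rouge_1(candidate, reference):
    """Unigram overlap reported as precision, recall, and F1 (illustrative)."""
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((Counter(candidate) & Counter(reference)).values())
    precision = overlap / len(candidate) if candidate else 0.0
    recall = overlap / len(reference) if reference else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


reference = "the cat sat on the mat".split()
candidate = "the cat sat on a mat".split()
print(bleu(candidate, reference))
print(rouge_1(candidate, reference))
```

Here the candidate shares five of its six unigrams with the target, so all three ROUGE-1 values are 5/6. BLEU is lower because the single substituted word also breaks several of the bigrams, trigrams, and 4-grams, which shows how BLEU rewards matching longer spans while ROUGE-1 only rewards matching individual tokens.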