guccihat 2 days ago

The article concludes that the overall translation score of Llama 4 is below that of Llama 3.3. However, the included table shows that Llama 4 scores better on all subcategories included in the test - coherence, idiomaticity and accuracy.

Something does not add up. The conclusion just states "...downgrade from LLama 3.3 in every respect" without further explanation.

  • smallerize 2 days ago

    Looking at the individual language pages, it does fall behind pretty often. In Japanese, for example, it has higher scores but also a much higher refusal rate. The summary page doesn't show a refusal-rate column, so not all of the data is represented there.
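Here is a rough illustration of how a model can beat another on every subcategory and still lose overall once refusals are counted. The article doesn't say how its overall score is aggregated. This sketch assumes, hypothetically, that each refusal counts as a score of 0. The numbers are made up for illustration and do not come from the article. `overall_score` is a hypothetical helper name.

```python
# Hypothetical toy numbers, NOT taken from the article. Assumes the overall
# score treats each refusal as a score of 0 (the real aggregation is unknown).

def overall_score(answered_scores, refusal_rate):
    """Mean quality over all prompts, with each refused prompt scored as 0."""
    mean_answered = sum(answered_scores) / len(answered_scores)
    return mean_answered * (1 - refusal_rate)

# Averages on answered prompts: (coherence, idiomaticity, accuracy)
model_a = {"scores": [8.0, 7.5, 8.0], "refusal_rate": 0.02}  # hypothetical
model_b = {"scores": [8.5, 8.0, 8.5], "refusal_rate": 0.15}  # hypothetical

for name, m in [("A", model_a), ("B", model_b)]:
    print(name, round(overall_score(m["scores"], m["refusal_rate"]), 2))
# prints:
# A 7.68
# B 7.08
```

Model B is ahead on all three subcategories, but the higher refusal rate pulls its overall score below A's. A summary table that shows only the subcategory averages, with no refusal column, would hide exactly this.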