tgoop.com/david_random/450
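With the SYCL backend, llama.cpp finally shows what XMX is good for: the B580's prefill performance catches up with the 7800 XT, a card whose specs are about 50% larger. Decode efficiency is still a bit low, though; the gap is wider than the difference in memory bandwidth should account for.
That said, the current SYCL flash attention kernel still doesn't look well tuned: enabling it cuts performance in half, and by eye there is quite a bit of room for optimization. Raising the batch size also makes performance drop off sharply right away.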
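The decode remark rests on the usual rule of thumb that single-stream decode is memory-bandwidth bound, so two cards should differ in decode speed by roughly their bandwidth ratio. A minimal sketch of that back-of-the-envelope check follows. The bandwidth figures (456 GB/s for the B580, 624 GB/s for the 7800 XT) are assumptions taken from public spec sheets, not from this post, and the model size is a purely hypothetical placeholder.

```python
def decode_ceiling_tokens_per_s(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Rough upper bound on decode speed, assuming every generated token
    streams all model weights from VRAM exactly once."""
    return bandwidth_gb_s / model_size_gb


def expected_decode_ratio(bandwidth_a_gb_s: float, bandwidth_b_gb_s: float) -> float:
    """Decode speed of card A relative to card B if both are purely bandwidth bound."""
    return bandwidth_a_gb_s / bandwidth_b_gb_s


# Assumed spec-sheet bandwidths in GB/s (not stated in the post).
B580_BW = 456.0
RX7800XT_BW = 624.0
# Hypothetical quantized model size in GB, for illustration only.
MODEL_GB = 4.0

ratio = expected_decode_ratio(B580_BW, RX7800XT_BW)
b580_ceiling = decode_ceiling_tokens_per_s(B580_BW, MODEL_GB)
rx_ceiling = decode_ceiling_tokens_per_s(RX7800XT_BW, MODEL_GB)

print(f"expected B580 / 7800 XT decode ratio: {ratio:.2f}")
print(f"bandwidth-bound ceilings (tokens/s): B580 {b580_ceiling:.0f}, 7800 XT {rx_ceiling:.0f}")
```

If the measured decode ratio falls well below the bandwidth ratio, the shortfall most likely lies in the kernels rather than in the hardware, which is the point being made about decode efficiency.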
BY David's random thoughts