STATUS
This page tracks the current success rates of Collective Memory on several WebArena benchmarks for autonomous tasks, using OpenAI's
gpt-4-1106-preview model.
Benchmark        Success rate
shopping admin   0.00%
map              0.00%
shopping         4.69%
(unlabeled)      0.00%
gitlab           0.00%
wikipedia        0.00%
Success rates may be artificially low as we begin to fill our benchmark records.
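
For reference, a per-benchmark success rate of this kind is typically the share of attempted tasks that passed. The sketch below is only an illustration of that arithmetic: the function name `success_rates`, the `(site, passed)` record shape, and the sample records are all hypothetical and are not taken from the Collective Memory codebase or from real benchmark runs.

```python
from collections import defaultdict


def success_rates(records):
    """Return {site: percentage of tasks passed}, rounded to two decimals.

    `records` is an iterable of (site, passed) pairs. This record shape is
    an assumption made for the example, not the project's actual format.
    """
    totals = defaultdict(int)
    passed = defaultdict(int)
    for site, ok in records:
        totals[site] += 1
        if ok:
            passed[site] += 1
    # Every site in `totals` has at least one record, so no division by zero.
    return {
        site: round(100.0 * passed[site] / totals[site], 2)
        for site in totals
    }


# Hypothetical records for illustration only, not real benchmark data.
sample = [("shopping", True), ("shopping", False), ("map", False)]
rates = success_rates(sample)
for site, rate in rates.items():
    print(f"{site}: {rate:.2f}%")
```

With only a handful of records per benchmark, a single pass or failure moves the percentage a great deal, so early figures should be read with caution.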