{
    "1": {
        "caption": "Figure 1: Overview of this work. We address two core challenges in scientific poster generation: Left: How to create a poster from a paper -we propose PosterAgent (Sec. 4), a framework that transforms long-context scientific papers (20K+ tokens) into structured visual posters; and Right: How to evaluate poster quality -weintroduce the Paper2Poster benchmark (Sec. 3), which enables systematic comparison between agent-generated and author-designed posters.",
        "image_path": "<gpt-5_gpt-5>_images_and_tables/paper/paper-picture-1.png",
        "width": 239,
        "height": 271,
        "figure_size": 64769,
        "figure_aspect": 0.8819188191881919
    },
    "3": {
        "caption": "Paper ( 20K tokens )",
        "image_path": "<gpt-5_gpt-5>_images_and_tables/paper/paper-picture-3.png",
        "width": 398,
        "height": 265,
        "figure_size": 105470,
        "figure_aspect": 1.5018867924528303
    },
    "6": {
        "caption": "Figure 2: Data Statistics of Paper2Poster. (a) Word cloud illustrating the diversity of research topics. (b) Textual Token statistics and Figure count statistics for input papers vs. posters provided by authors. Overall, these statistics highlight that Paper2Poster is a multimodal context compression task, requiring effective abstraction of both textual and visual content.",
        "image_path": "<gpt-5_gpt-5>_images_and_tables/paper/paper-picture-6.png",
        "width": 564,
        "height": 557,
        "figure_size": 314148,
        "figure_aspect": 1.0125673249551166
    },
    "7": {
        "caption": "Figure 3: Left : Overview of the evaluation framework in Paper2Poster. Middle : We automatically generate multiple-choice questions from each paper using an LLM (o3), forming the our PaperQuiz evaluation. Right : In PaperQuiz, we simulate multiple reader by allowing VLMs-representing different expertise levels ( e.g., student, professor)-to read each generated poster and answer the quiz. The poster that achieves the highest average score is considered the most effective in conveying the paper's content.",
        "image_path": "<gpt-5_gpt-5>_images_and_tables/paper/paper-picture-7.png",
        "width": 1983,
        "height": 394,
        "figure_size": 781302,
        "figure_aspect": 5.032994923857868
    },
    "8": {
        "caption": "Figure 4: Illustration of the PosterAgent pipeline. Given an input paper, PosterAgent generates a structured academic poster through three modules: 1. Parser: Extracts key textual and visual assets using a combination of tools and LLM-based summarization, resulting in a structured asset library. 2. Planner: Matches assets and arranges them into coherent layouts, iteratively generating panels with a zoom-in operation. 3. Painter-Commenter: The Painter generates panel-level bullet-content along with executable code, and renders the visual output, while the Commenter-a VLM with in-context reference-provides feedback to ensure layout coherence and prevent content overflow.",
        "image_path": "<gpt-5_gpt-5>_images_and_tables/paper/paper-picture-8.png",
        "width": 1972,
        "height": 969,
        "figure_size": 1910868,
        "figure_aspect": 2.0350877192982457
    },
    "9": {
        "caption": "Figure 5: PaperQuiz's Avg. scores across different Reader VLMs (x-axis) for each poster type (legend lines). Refer to Append. Tab. 3 for full model names.",
        "image_path": "<gpt-5_gpt-5>_images_and_tables/paper/paper-picture-9.png",
        "width": 769,
        "height": 505,
        "figure_size": 388345,
        "figure_aspect": 1.5227722772277228
    },
    "10": {
        "caption": "Figure 7 presents the average token cost per poster across different methods. Our PosterAgent achieves great token efficiency, using only 101 . 1 K (4o-based) and 47 . 6 K (Qwen-based) tokens-reducing cost by 60% -87% compared to OWL-4o [6]. This translates to just $0 . 55 for 4o and $0 . 0045 for Qwen per poster, highlighting its effectiveness, (see Append. E.2 for further details).",
        "image_path": "<gpt-5_gpt-5>_images_and_tables/paper/paper-picture-10.png",
        "width": 1948,
        "height": 1100,
        "figure_size": 2142800,
        "figure_aspect": 1.770909090909091
    },
    "11": {
        "caption": "Figure 7: Average token consumptions for different methods. Details are provided in Appendix E.1.",
        "image_path": "<gpt-5_gpt-5>_images_and_tables/paper/paper-picture-11.png",
        "width": 701,
        "height": 505,
        "figure_size": 354005,
        "figure_aspect": 1.388118811881188
    },
    "12": {
        "caption": "Figure 6: PaperQuiz's Avg scores across different types of posters (x-axis) for readers (colored lines) on human evaluation subset.",
        "image_path": "<gpt-5_gpt-5>_images_and_tables/paper/paper-picture-12.png",
        "width": 661,
        "height": 428,
        "figure_size": 282908,
        "figure_aspect": 1.544392523364486
    },
    "13": {
        "caption": "Figure 10: Posters for MuSc: Zero-Shot Industrial Anomaly Classification and Segmentation with Mutual Scoring of the Unlabeled Images.",
        "image_path": "<gpt-5_gpt-5>_images_and_tables/paper/paper-picture-13.png",
        "width": 960,
        "height": 521,
        "figure_size": 500160,
        "figure_aspect": 1.8426103646833014
    },
    "15": {
        "caption": "(b) PosterAgent-generated poster.(a) Author-designed poster.",
        "image_path": "<gpt-5_gpt-5>_images_and_tables/paper/paper-picture-15.png",
        "width": 1993,
        "height": 810,
        "figure_size": 1614330,
        "figure_aspect": 2.460493827160494
    },
    "16": {
        "caption": "(a) Author-designed poster.",
        "image_path": "<gpt-5_gpt-5>_images_and_tables/paper/paper-picture-16.png",
        "width": 945,
        "height": 680,
        "figure_size": 642600,
        "figure_aspect": 1.3897058823529411
    },
    "17": {
        "caption": "(b) PosterAgent-generated poster.",
        "image_path": "<gpt-5_gpt-5>_images_and_tables/paper/paper-picture-17.png",
        "width": 957,
        "height": 708,
        "figure_size": 677556,
        "figure_aspect": 1.3516949152542372
    },
    "18": {
        "caption": "Figure 11: Posters for Neuroformer: Multimodal and Multitask Generative Pretraining for Brain Data.(a) Author-designed poster.",
        "image_path": "<gpt-5_gpt-5>_images_and_tables/paper/paper-picture-18.png",
        "width": 938,
        "height": 620,
        "figure_size": 581560,
        "figure_aspect": 1.5129032258064516
    },
    "19": {
        "caption": "Figure 12: Posters for Conformal Semantic Keypoint Detection with Statistical Guarantees.(a) Author-designed poster.",
        "image_path": "<gpt-5_gpt-5>_images_and_tables/paper/paper-picture-19.png",
        "width": 1176,
        "height": 596,
        "figure_size": 700896,
        "figure_aspect": 1.9731543624161074
    },
    "20": {
        "caption": "Figure 13: Posters for Neural Tangent Kernels for Axis-Aligned Tree Ensembles.",
        "image_path": "<gpt-5_gpt-5>_images_and_tables/paper/paper-picture-20.png",
        "width": 790,
        "height": 598,
        "figure_size": 472420,
        "figure_aspect": 1.3210702341137124
    },
    "22": {
        "caption": "(a) Author-designed poster.",
        "image_path": "<gpt-5_gpt-5>_images_and_tables/paper/paper-picture-22.png",
        "width": 929,
        "height": 583,
        "figure_size": 541607,
        "figure_aspect": 1.5934819897084047
    },
    "23": {
        "caption": "Figure 16: Posters for Identifying the Context Shift between Test Benchmarks and Production Data.",
        "image_path": "<gpt-5_gpt-5>_images_and_tables/paper/paper-picture-23.png",
        "width": 958,
        "height": 646,
        "figure_size": 618868,
        "figure_aspect": 1.4829721362229102
    },
    "24": {
        "caption": "(a) Author-designed poster.",
        "image_path": "<gpt-5_gpt-5>_images_and_tables/paper/paper-picture-24.png",
        "width": 1190,
        "height": 567,
        "figure_size": 674730,
        "figure_aspect": 2.0987654320987654
    },
    "29": {
        "caption": "(a) Direct.",
        "image_path": "<gpt-5_gpt-5>_images_and_tables/paper/paper-picture-29.png",
        "width": 896,
        "height": 323,
        "figure_size": 289408,
        "figure_aspect": 2.7739938080495357
    },
    "30": {
        "caption": "(b) Tree.(c) Tree + Commenter.",
        "image_path": "<gpt-5_gpt-5>_images_and_tables/paper/paper-picture-30.png",
        "width": 899,
        "height": 644,
        "figure_size": 578956,
        "figure_aspect": 1.3959627329192548
    },
    "31": {
        "caption": "Figure 17: Ablation study on Neuro-Symbolic Language Modeling with Automaton-augmented Retrieval. Text overflow areas are highlighted with red bounding boxes.",
        "image_path": "<gpt-5_gpt-5>_images_and_tables/paper/paper-picture-31.png",
        "width": 897,
        "height": 679,
        "figure_size": 609063,
        "figure_aspect": 1.321060382916053
    },
    "33": {
        "caption": "Figure 18: Ablation study on Visual Correspondence Hallucination. Text overflow areas are highlighted with red bounding boxes.",
        "image_path": "<gpt-5_gpt-5>_images_and_tables/paper/paper-picture-33.png",
        "width": 895,
        "height": 274,
        "figure_size": 245230,
        "figure_aspect": 3.2664233576642334
    },
    "34": {
        "caption": "(b) Tree.",
        "image_path": "<gpt-5_gpt-5>_images_and_tables/paper/paper-picture-34.png",
        "width": 900,
        "height": 511,
        "figure_size": 459900,
        "figure_aspect": 1.761252446183953
    },
    "35": {
        "caption": "(c) Tree + Commenter.",
        "image_path": "<gpt-5_gpt-5>_images_and_tables/paper/paper-picture-35.png",
        "width": 901,
        "height": 513,
        "figure_size": 462213,
        "figure_aspect": 1.756335282651072
    },
    "37": {
        "caption": "Figure 19: Ablation study on DARTFormer: Finding The Best Type Of Attention. Text overflow areas are highlighted with red bounding boxes, large blank regions are highlighted with purple bounding boxes.",
        "image_path": "<gpt-5_gpt-5>_images_and_tables/paper/paper-picture-37.png",
        "width": 895,
        "height": 747,
        "figure_size": 668565,
        "figure_aspect": 1.1981258366800536
    },
    "39": {
        "caption": "(c) Tree + Commenter.",
        "image_path": "<gpt-5_gpt-5>_images_and_tables/paper/paper-picture-39.png",
        "width": 899,
        "height": 1187,
        "figure_size": 1067113,
        "figure_aspect": 0.7573715248525695
    },
    "41": {
        "caption": "Figure 20: Ablation study on CW-ERM: Improving Autonomous Driving Planning with Closed-loop Weighted Empirical Risk Minimization. Text overflow areas are highlighted with red bounding boxes, and large blank regions are highlighted with purple bounding boxes.",
        "image_path": "<gpt-5_gpt-5>_images_and_tables/paper/paper-picture-41.png",
        "width": 898,
        "height": 1345,
        "figure_size": 1207810,
        "figure_aspect": 0.6676579925650558
    },
    "43": {
        "caption": "(c) Tree + Commenter.",
        "image_path": "<gpt-5_gpt-5>_images_and_tables/paper/paper-picture-43.png",
        "width": 908,
        "height": 1341,
        "figure_size": 1217628,
        "figure_aspect": 0.6771066368381805
    },
    "45": {
        "caption": "Figure 21: Ablation study on DeepJoint: Robust Survival Modelling Under Clinical Presence Shift. Text overflow areas are highlighted with red bounding boxes.",
        "image_path": "<gpt-5_gpt-5>_images_and_tables/paper/paper-picture-45.png",
        "width": 894,
        "height": 1234,
        "figure_size": 1103196,
        "figure_aspect": 0.7244732576985413
    },
    "48": {
        "caption": "(c) Tree + Commenter.",
        "image_path": "<gpt-5_gpt-5>_images_and_tables/paper/paper-picture-48.png",
        "width": 902,
        "height": 1266,
        "figure_size": 1141932,
        "figure_aspect": 0.7124802527646129
    },
    "49": {
        "caption": "(a) A poster generated by 4o-Image , where substantial corrupted text is generated.",
        "image_path": "<gpt-5_gpt-5>_images_and_tables/paper/paper-picture-49.png",
        "width": 949,
        "height": 1409,
        "figure_size": 1337141,
        "figure_aspect": 0.673527324343506
    },
    "50": {
        "caption": "(b) A poster generated by PPTAgent , where meaningless template placeholder text is remained.",
        "image_path": "<gpt-5_gpt-5>_images_and_tables/paper/paper-picture-50.png",
        "width": 956,
        "height": 1433,
        "figure_size": 1369948,
        "figure_aspect": 0.6671318911374738
    },
    "51": {
        "caption": "Figure 22: Examples of posters with corrupted text.(a) A poster generated by 4o-Image , where the poster is cutoff horizontally due to incomplete generation.",
        "image_path": "<gpt-5_gpt-5>_images_and_tables/paper/paper-picture-51.png",
        "width": 966,
        "height": 887,
        "figure_size": 856842,
        "figure_aspect": 1.0890642615558062
    },
    "52": {
        "caption": "Figure 23: Examples of posters with cutoff.",
        "image_path": "<gpt-5_gpt-5>_images_and_tables/paper/paper-picture-52.png",
        "width": 948,
        "height": 962,
        "figure_size": 911976,
        "figure_aspect": 0.9854469854469855
    },
    "53": {
        "caption": "(a) A poster produced by 4o-Image , featuring a figure that is low-resolution, visually corrupted, and unintelligible.",
        "image_path": "<gpt-5_gpt-5>_images_and_tables/paper/paper-picture-53.png",
        "width": 968,
        "height": 951,
        "figure_size": 920568,
        "figure_aspect": 1.017875920084122
    },
    "54": {
        "caption": "(b) A poster generated by PPTAgent , where figures are rendered too small to be legible.",
        "image_path": "<gpt-5_gpt-5>_images_and_tables/paper/paper-picture-54.png",
        "width": 958,
        "height": 1277,
        "figure_size": 1223366,
        "figure_aspect": 0.750195771339076
    },
    "55": {
        "caption": "Figure 24: Examples of posters with obscure figures.(a) A poster generated by OWL-4o , where there are large blanks on the poster.",
        "image_path": "<gpt-5_gpt-5>_images_and_tables/paper/paper-picture-55.png",
        "width": 954,
        "height": 680,
        "figure_size": 648720,
        "figure_aspect": 1.4029411764705881
    },
    "56": {
        "caption": "Figure 25: Examples of posters with large blanks.",
        "image_path": "<gpt-5_gpt-5>_images_and_tables/paper/paper-picture-56.png",
        "width": 955,
        "height": 723,
        "figure_size": 690465,
        "figure_aspect": 1.3208852005532503
    },
    "57": {
        "caption": "(a) A poster generated by OWL-4o , where no figures are inserted into poster.",
        "image_path": "<gpt-5_gpt-5>_images_and_tables/paper/paper-picture-57.png",
        "width": 959,
        "height": 549,
        "figure_size": 526491,
        "figure_aspect": 1.7468123861566485
    },
    "58": {
        "caption": "Figure 26: Examples of posters without figures.",
        "image_path": "<gpt-5_gpt-5>_images_and_tables/paper/paper-picture-58.png",
        "width": 962,
        "height": 1435,
        "figure_size": 1380470,
        "figure_aspect": 0.670383275261324
    },
    "59": {
        "caption": "(a) A poster generated by PosterAgent-Qwen , where there is text overflowing outside textbox.",
        "image_path": "<gpt-5_gpt-5>_images_and_tables/paper/paper-picture-59.png",
        "width": 957,
        "height": 1277,
        "figure_size": 1222089,
        "figure_aspect": 0.7494126859827721
    },
    "60": {
        "caption": "Figure 27: Examples of posters with textual overflow.",
        "image_path": "<gpt-5_gpt-5>_images_and_tables/paper/paper-picture-60.png",
        "width": 956,
        "height": 640,
        "figure_size": 611840,
        "figure_aspect": 1.49375
    },
    "61": {
        "caption": "Figure 29: In-context references for the commenter help the VLM better identify whether the current panel falls into a failure case.",
        "image_path": "<gpt-5_gpt-5>_images_and_tables/paper/paper-picture-61.png",
        "width": 1199,
        "height": 828,
        "figure_size": 992772,
        "figure_aspect": 1.4480676328502415
    },
    "63": {
        "caption": "Figure 28: Failure generation examples by Stable Diffusion Ultra model [28].",
        "image_path": "<gpt-5_gpt-5>_images_and_tables/paper/paper-picture-63.png",
        "width": 1193,
        "height": 785,
        "figure_size": 936505,
        "figure_aspect": 1.5197452229299364
    }
}