Yes they can. The size of many codebases is much larger and LLMs can handle thos...

falloutx · 2026-01-14T00:06:54 1768349214

A novel is different from a codebase. In code, files are related to each other, but most of them can be ignored depending on what you're doing. A novel is sequential: in most cases A leads to B, B leads to C, and so on.

> Re: movies. Get YouTube premium and ask YouTube to summarize a 2hr video for you.

This is different from watching a movie. Can it tell what suit the actor was wearing? Can it tell what the actor's face looked like? Summarising and watching are two different things.

pigpop · 2026-01-14T03:01:42 1768359702

Yes, it is possible to do those things, and there are benchmarks for testing multimodal models on their ability to do so. Context length is the major limitation, but longer videos can be processed in small chunks whose descriptions are then composed into larger scenes.

https://github.com/JUNJIE99/MLVU

https://huggingface.co/datasets/OpenGVLab/MVBench

Ovis and Qwen3-VL are examples of models that can work with multiple frames from a video at once to produce both visual and temporal understanding.

https://huggingface.co/AIDC-AI/Ovis2.5-9B

https://github.com/QwenLM/Qwen3-VL
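
To make the chunking idea concrete, here is a minimal sketch of composing chunk descriptions into larger scenes. It is not how Ovis or Qwen3-VL work internally. `describe_frames` and `compose` are hypothetical stand-ins for calls to whatever multimodal model and summarizer you use, and the chunk and group sizes are arbitrary assumptions.

```python
from typing import Callable, List, Sequence


def chunk(items: Sequence, size: int) -> List[list]:
    """Split a sequence into consecutive pieces of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [list(items[i:i + size]) for i in range(0, len(items), size)]


def describe_video(
    frames: Sequence,
    describe_frames: Callable[[list], str],  # hypothetical: model call on a few frames
    compose: Callable[[List[str]], str],     # hypothetical: merges several descriptions
    chunk_size: int = 8,
    group_size: int = 4,
) -> str:
    """Describe small chunks of frames, then merge descriptions level by level."""
    if group_size < 2:
        raise ValueError("group_size must be at least 2 so each level shrinks")

    # Level 1: each chunk is small enough to fit in the model's context.
    descriptions = [describe_frames(c) for c in chunk(frames, chunk_size)]

    # Higher levels: merge neighbouring descriptions until one remains.
    while len(descriptions) > 1:
        descriptions = [compose(g) for g in chunk(descriptions, group_size)]

    return descriptions[0] if descriptions else ""
```

Each merge step only ever sees `group_size` descriptions, so no single call needs the whole video in context. The trade-off is that details dropped at a lower level cannot be recovered higher up.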

cmcaleer · 2026-01-14T01:34:25 1768354465

You’re moving the goalposts. Gary Marcus’ proposal was being able to ask: Who are the characters? What are their conflicts and motivations? etc.

Which is a relatively trivial task for a current LLM.

daveguy · 2026-01-14T02:11:03 1768356663

The Gary Marcus proposal you refer to was about a novel, and not a codebase. I think GP's point is that motivations require analysis outside of the given (or derived) context window, which LLMs are essentially incapable of doing.