• LLMs can’t reason


    #2729748

    AI · By Michael A. Covington · The word is out — large language models, systems like ChatGPT, can’t reason. That’s a problem, because reasoning is what we…
    [See the full post at: LLMs can’t reason]

    • #2729777

      It’s 4:30 am as I read “LLMs can’t reason.” Mr. Covington’s submission greatly intrigued me on the topics of ChatGPT and LLMs, and led me to read about half of Mr. Wolfram’s treatise early this morning.

      Thank you, Mr. Covington! I plan to read the remainder of Mr. Wolfram’s treatise later today.

      Alan M. Hoffberg, CPA, MBA
      Retired from many vocations in the computer field about 40 years ago
      1 user thanked author for this post.
    • #2729935

      I can’t praise this article enough. I am a big believer that machine-learning-based AI already has, and will continue to have, huge power in the fields to which it is suited. But the hype around LLMs as some universal mind that will surpass humans any day now (a promise made for decades) seems fundamentally overblown to me, and this is probably the best summary I have read of why that is the case. It laid out my thoughts on the matter more clearly than I ever could. A really important article, and really well done.

      2 users thanked author for this post.
    • #2729987

      I fed both examples through Perplexity.ai, the favorite of one or more AskWoody authors. Without hesitation, it provided the correct math computation of the kiwis and even stated that the 5 smaller ones would still be included.

      I then fed it the “If it were true that all corgis are reptiles, and all reptiles are plants, does it follow that all corgis are plants?” logic question. At first it rejected the premise as “biologically impossible” and asked if I wanted to rephrase it in a way that maintains biological accuracy.

      I instructed it, “No, analyze it as if it were true.”

      It then “apologized” and provided the correct logic: if all A are B, and all B are C, then all A are C.
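
      That conclusion is just transitivity of set inclusion; a quick Python sketch (hypothetical sets standing in for corgis, reptiles, and plants, obviously not biology) shows why it must follow:

      ```python
      # If corgis are a subset of reptiles, and reptiles a subset of plants,
      # then corgis are a subset of plants: the conclusion follows from the
      # premises whether or not the premises are true in reality.
      corgis = {"Ein", "Rover"}               # hypothetical members
      reptiles = corgis | {"iguana"}          # premise 1: all corgis are reptiles
      plants = reptiles | {"fern"}            # premise 2: all reptiles are plants

      assert corgis <= reptiles <= plants     # premises hold by construction
      assert corgis <= plants                 # conclusion: all corgis are plants
      ```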

      Now, if only it could reason “the Ultimate Question of Life, the Universe, and Everything.”  And do it in less than seven million years…

      6 users thanked author for this post.
      • #2730406

        What a great article. I also fed the math question into Perplexity. The answer came back as 185 for me. I then asked “Why did you subtract 5 smaller kiwis? He picked them and has them.” Answer: “You’re absolutely right, and I apologize for the mistake in my previous answer. I misinterpreted the information about the 5 smaller kiwis. Let’s correct the calculation: Oliver picked a total of 190 kiwis.” and so on.

        1 user thanked author for this post.
        • #2730470

          I put the kiwi question to my offline LLM.  I got this:

          llama3.1:latest
          Let’s break it down step by step:

          On Friday, Oliver picked 44 kiwis.
          On Saturday, Oliver picked 58 kiwis.
          On Sunday, he picked double the number of kiwis he did on Friday, which is:
          2 x 44 = 88 kiwis
          However, since 5 of them were a bit smaller than average, we need to subtract those from the total:
          88 – 5 = 83

          So far, Oliver has:
          Friday: 44
          Saturday: 58
          Sunday: 83

          Adding all these together, Oliver has a total of:
          44 + 58 + 83 = 185 kiwis.

          Is that the answer you were looking for?

          I replied: “The main question asked is, ‘How many kiwis does Oliver have?’ You made the implication that the smaller kiwis were not to be included in the total; however, Oliver does, indeed, have them. There was no mention in the question of discarding the smaller kiwis.”

          llama3.1:latest
          Since there’s no indication that Oliver should discard the smaller kiwis, we should count all the kiwis he picked.

          So, on Sunday, Oliver picked double the number of kiwis he did on Friday, which is:
          2 x 44 = 88
          This includes the 5 smaller kiwis.

          Therefore, the total number of kiwis Oliver has is:
          Friday: 44
          Saturday: 58
          Sunday: 88

          And adding all these together gives us:
          44 + 58 + 88 = 190

          I then asked this:

          Are you able to tell me why, in your first answer, you assumed that the five smaller than average kiwis should not be counted?

          llama3.1:latest

          I made an assumption based on a common mathematical convention: when we’re counting items and some of them don’t meet certain criteria (in this case, being “smaller than average”), it’s often implied that those items shouldn’t be included in the total count. This is especially true in problems involving averages or statistics.

          However, as you pointed out, there was no explicit instruction to exclude the smaller kiwis from the count. In retrospect, I should have counted all 88 kiwis picked on Sunday, including the 5 smaller ones.

          It’s a great reminder that assumptions can be problematic, and it’s essential to stick strictly to the problem statement when providing solutions!
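
          For the record, the two readings differ only in whether those 5 kiwis get subtracted; a few lines of Python (just the arithmetic, nothing model-related) confirm both totals:

          ```python
          # Kiwi word problem: Friday 44, Saturday 58, Sunday double Friday's count.
          friday, saturday = 44, 58
          sunday = 2 * friday                      # 88 picked, 5 merely smaller than average

          print(friday + saturday + sunday)        # 190: the small kiwis still count
          print(friday + saturday + sunday - 5)    # 185: the trap of subtracting them
          ```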

           

          Always create a fresh drive image before making system changes/Windows updates; you may need to start over!
          We all have our own reasons for doing the things that we do with our systems; we don't need anyone's approval, and we don't all have to do the same things.
          We were all once "Average Users".

          3 users thanked author for this post.
          • #2730515

            I proposed the same math problem to Llama3.1:70b. Here is its solution:

            llama3.1:70b
            “What a delightful problem!

            Let’s break it down step by step:

            Friday: Oliver picks 44 kiwis.
            Saturday: He adds 58 more kiwis to his collection.

            So far, he has:
            44 (Friday) + 58 (Saturday) = 102 kiwis

            Now, on Sunday, he picks double the number of kiwis he did on Friday. Since he picked 44 kiwis on Friday, he’ll pick twice that amount on Sunday, which is:
            2 x 44 = 88 kiwis

            However, we’re told that 5 of these kiwis are a bit smaller than average. This doesn’t affect the total count, though – they still count as kiwis!

            So, Oliver now has:
            102 (kiwis picked on Friday and Saturday) + 88 (kiwis picked on Sunday) = 190 kiwis

            There you have it! Oliver has a grand total of 190 kiwis.”

            I find this quite interesting, particularly, “However, we’re told that 5 of these kiwis are a bit smaller than average. This doesn’t affect the total count, though – they still count as kiwis!”

            I also put the same question to Copilot online, and got this response:

            “Let’s break it down step-by-step:

            Friday: Oliver picks 44 kiwis.

            Saturday: Oliver picks 58 kiwis.

            Sunday: He picks double the number of kiwis he did on Friday, which is 2 × 44 = 88 kiwis. But out of these, 5 were a bit smaller than average, which doesn’t change the total number.

            Now, we add up the total number of kiwis:
            44 (Friday) + 58 (Saturday) + 88 (Sunday) = 190

            So, Oliver has 190 kiwis in total. That’s quite the haul!”

            Copilot also got the correct answer, but used some unorthodox formatting/grammar in its answer (its math notation didn’t paste cleanly).

            I would say that not all LLMs can be lumped together in any particular category, and/or that math word problems aren’t a particularly daunting task in all cases. My smaller model (8 billion parameters) made an assumption, but my larger model (70 billion parameters) made no such assumption, and even clarified that smaller kiwis are still kiwis, as did Copilot.

            Always create a fresh drive image before making system changes/Windows updates; you may need to start over!
            We all have our own reasons for doing the things that we do with our systems; we don't need anyone's approval, and we don't all have to do the same things.
            We were all once "Average Users".

            1 user thanked author for this post.
    • #2730002

      Thanks for this great article, and not just because it confirms my preconceptions!

      A thought that occurs to me: I wonder what differences there are in LLMs trained using languages other than English.  There’s substantially less material on the Internet written in French or Spanish than in English (and still less in Dutch or Finnish), but perhaps the structure of the language will change the way these systems respond to our questions.  What about Chinese?!

      1 user thanked author for this post.
      • #2730008

        You’ve just asked my favorite computational linguistics question, the question nobody dares to ask.

        Obviously there is much less training material in languages other than English.  But there’s also another problem.

        LLMs treat language as consisting of discrete words in fixed order.  That works well for English and should work well for Chinese, because words have only a few forms in these languages, and the order is fixed.  In Chinese, each word has only 1 form.  In English, a verb can have 5 forms but usually has only 4 (base, -s, -ed, -ing), and nouns have 2 forms (singular and plural).  In French a verb has dozens of forms.  In Russian the nouns also have multiple forms, and the word order is appreciably more variable because noun suffixes, rather than word order, indicate subject and object.  So even if you had the same amount of training material in Russian as in English, it would be less effective because there’s more for the LLM to model.
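
        A toy illustration of that last point: one verb lemma fragments into far more surface forms in French than in English, so a same-sized corpus gives the model fewer examples of each form. A back-of-the-envelope Python sketch, with hand-picked (and deliberately incomplete) form lists:

        ```python
        # Surface forms of one verb lemma: English "speak" vs. French "parler".
        # The French list is abridged; the full paradigm is larger still.
        english = {"speak", "speaks", "speaking", "spoke", "spoken"}
        french = {
            "parler", "parle", "parles", "parlons", "parlez", "parlent",
            "parlais", "parlait", "parlions", "parliez", "parlaient",
            "parlerai", "parleras", "parlera", "parlerons", "parlerez", "parleront",
            "parlé", "parlée", "parlés", "parlées",
        }

        # With a fixed number of corpus occurrences of the lemma, more forms
        # means fewer training examples of each individual form.
        occurrences = 10_000
        for name, forms in (("English", english), ("French", french)):
            print(f"{name}: {len(forms)} forms, ~{occurrences // len(forms)} per form")
        ```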

        Thank you for your interest!

        6 users thanked author for this post.
    • #2730076

      I dual boot, and I have an LLM on each side running on WSL; both are conversational Llama LLMs. On one side I have Llama 3.1 8B, with 8 billion parameters, and on the other Llama 3.1 70B, with 70 billion parameters. I don’t have a GPU, so these run on CPU only (Intel i5-12600K, not overclocked).

      Both use ~60% CPU with bumps to 70% (idle with only Firefox open is 2-4%). As for RAM, I have 64GB: the 8B model uses ~40% pretty consistently, and the 70B model uses a bit over 90% (idle with only Firefox open is 16%). I use Open WebUI to converse, so Firefox is running.

      The models’ footprint grows with use.  The 8B started (with WSL/Ubuntu 24.04 overhead) at 10GB, and is now at 21GB.  The 70B started at 100GB, and is now at 110GB.  I’ve asked both models to optimize their caching and help control their growth.  Once I got Ubuntu and Docker installed, I had the models help me with the command line instructions to install Open WebUI.  They’re very good with that.

      Both churn away for a bit after a prompt from me before replying. The 70B model takes quite a while, and its responses are typed at a rate slower than my normal typing speed. The 8B model types much faster, and with long responses I can’t keep up; I have to wait for it to finish, then scroll down to pick back up where I lost it.

      They are both good conversationalists. I have asked each model to critique some writing, and in their critiques both have made minor contextual errors, but not many. I started with the 8B model, just following my curiosity. It ran successfully enough that I wanted to see if my no-GPU hardware could handle the larger 70B. It can, just a good bit slower; while it’s ‘pondering’ my prompt, I can browse the internet.

      I find AI interesting and a bit useful, but not at all a necessity.

       

      Always create a fresh drive image before making system changes/Windows updates; you may need to start over!
      We all have our own reasons for doing the things that we do with our systems; we don't need anyone's approval, and we don't all have to do the same things.
      We were all once "Average Users".

      5 users thanked author for this post.
      • #2731431

        Of course, I have no idea what you are talking about. Maybe some commentary would help.

        🍻

        Just because you don't know where you are going doesn't mean any road will get you there.
        • #2731485

          The Llama LLMs are built for Linux and can run in Windows Subsystem for Linux, WSL (my installation particulars are here). To get more direct access to the LLM, I install Ubuntu 24.04 in WSL, then Docker (basically a container manager) in Ubuntu, then install Ollama, a Linux program that can run Llama models, in a Docker container, and finally use Ollama to pull a Llama model into the container.

          Once all that is set up, I install Open WebUI, another Linux program, in another Docker container, and I can converse with my LLM via Open WebUI in Firefox using a Localhost port.  From then on, it’s just like posting right here in AskWoody and getting replies to my posts.  In other words, a chatbot, only it’s completely local, running on a VHD and cut off from the internet.

          Open WebUI keeps track of my conversations with the LLM, so I can go back to a previous conversation which allows the LLM to call up the particulars of that conversation through its algorithms and cache mechanisms.
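
          For anyone who wants to poke at the model without Open WebUI: Ollama also exposes a local HTTP API. A minimal Python sketch, assuming Ollama’s default port 11434 and the llama3.1 model tag (adjust both to your own container setup):

          ```python
          import json
          import urllib.request

          # One-shot query against a local Ollama instance; everything stays on-machine.
          payload = {
              "model": "llama3.1",              # whichever tag "ollama pull" fetched
              "prompt": "How many kiwis does Oliver have?",
              "stream": False,                  # one complete JSON reply, not chunks
          }
          req = urllib.request.Request(
              "http://localhost:11434/api/generate",
              data=json.dumps(payload).encode("utf-8"),
              headers={"Content-Type": "application/json"},
          )
          with urllib.request.urlopen(req) as resp:
              print(json.loads(resp.read())["response"])
          ```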

          The larger LLM presented a good semblance of reasoning: “However, we’re told that 5 of these kiwis are a bit smaller than average. This doesn’t affect the total count, though – they still count as kiwis!”

          Was that more than you wanted to know, @wavy?

          Always create a fresh drive image before making system changes/Windows updates; you may need to start over!
          We all have our own reasons for doing the things that we do with our systems; we don't need anyone's approval, and we don't all have to do the same things.
          We were all once "Average Users".

          1 user thanked author for this post.
    • #2730280

      “Machine learning is inherently approximate and always will be.”

      Likewise, human thinking and comprehension.

      1 user thanked author for this post.
      • #2730303

        So when you say 2 + 2 = 4 you are only being approximate?

        Human reasoning is limited.  But humans can perform explicit symbolic reasoning (such as arithmetic or highly explicit logic) that is not approximate.  It is not the easiest kind of thinking, but it is something we do.
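
        One way to make that distinction concrete in code: exact symbolic arithmetic never rounds, while approximate representations do. A small Python illustration (the fractions module standing in for explicit symbolic reasoning):

        ```python
        from fractions import Fraction

        # Exact, symbolic-style arithmetic: no approximation anywhere.
        assert 2 + 2 == 4
        assert Fraction(1, 3) * 3 == 1       # exactly 1, no rounding error

        # Approximate arithmetic: binary floats can only approximate 0.1 and 0.2.
        print(0.1 + 0.2 == 0.3)              # False
        print(0.1 + 0.2)                     # 0.30000000000000004
        ```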

        6 users thanked author for this post.
    • #2730503

      So when you say 2 + 2 = 4 you are only being approximate?

      Human reasoning is limited.  But humans can perform explicit symbolic reasoning (such as arithmetic or highly explicit logic) that is not approximate.  It is not the easiest kind of thinking, but it is something we do.

      The example you offer is calculation, not reasoning, thinking, or comprehension. Machines can already do that. It’s not relevant to my comment.

      I wasn’t disagreeing with you, but simply pointing out that there is nothing precise about human thinking. If there were, then perhaps machines WOULD eventually learn to “reason.” That’s what’s so magical and mysterious about human thought.

      I hope and trust that your premise is correct: that machines will never be capable of reason as we know it. But that doesn’t mean they will not gain capabilities that we cannot imagine at this time, just as they already have: capabilities that could be just as problematic or dangerous.

      Thanks for the article and setting the record straight.

       

      2 users thanked author for this post.
    • #2731477

      Well, yes, there has been plenty written on the limitations of LLMs and how they process data.

      The question I’m interested in is: what is the next stage in the development of AI? What does it entail? How will it be different from LLMs? Researchers are surely working on such technologies. How do we find out more about this? Anyone have any links?

      Second, we are in the very early stages of developing AI, but too many people seem to think that we are much further along than is realistic. We are at a Model T level of technology, with a long way to go until we get to the automotive technology of the 1970s, let alone the 2020s.

      I think we first need to understand how our brains work better if we want to take a stab at making “intelligent” and perhaps even conscious AI’s in the same way our 3lb brains are.

      I find it amazing that with hundreds of years of brain research, billions upon billions of dollars spent in this area, and all the highly technical tools we have today, we STILL don’t know how data is stored or retrieved in the brain, cannot read such data like a computer memory, and have no idea what consciousness is or where it comes from (the sperm, the egg, the union of the two?).
