AI and Summaries: Nuances, Limits and the Future of Augmented Intelligence

AI and Complex Summarization: Limits and Future

In the current technological landscape, Artificial Intelligence (AI) has established itself as a transformative force, promising to revolutionize every aspect of our professional and personal lives. Among its many applications, the ability to quickly summarize long and complex documents has captured the imagination of companies and users, offering the prospect of unprecedented information management. The idea of delegating to an algorithm the task of distilling mountains of text into concise, usable summaries is undeniably tempting, promising significant savings of time and resources. However, as often happens with emerging technologies, the reality of their impact and capabilities can be more complex and nuanced than it initially appears. Recent studies and field trials are beginning to unveil the profound challenges AI has yet to face, especially in tasks that require deep understanding, critical analysis and the ability to grasp the subtlest nuances of human language. Despite the enthusiasm and the promises, it has become evident that AI, in its present form, does not always live up to expectations when the context is complex, the meaning is implicit or factual accuracy is crucial. This article explores these challenges in depth, analyzing why AI struggles to summarize complex content, how it compares with human capabilities in this area, and what paths lie ahead, between the evolution of models and the art of prompt engineering, to make the most of the potential of augmented intelligence.

Beyond the Gist: Why Artificial Intelligence Struggles with Nuance and Complex Context

The experiment conducted by the Australian Securities and Investments Commission (ASIC) highlighted one of the biggest gaps in current large language models (LLMs) when generating summaries: their limited ability to analyze and synthesize complex content that requires a deep understanding of context, subtle nuance or implied meaning. This is not a problem isolated to Llama2-70B, the model used in the study, but a challenge intrinsic to the very nature of how LLMs are built and operate. They are essentially text-prediction machines, excellent at recognizing and reproducing language patterns from vast amounts of training data. This skill yields impressive grammatical fluency and coherence, but not necessarily a real "understanding" of the world or of the intentions underlying a text. An LLM can identify keywords and relevant phrases, but it struggles to interpret the relative weight of those elements, to discern veiled criticisms, implicit recommendations or underlying concerns that an experienced human reader of the domain would grasp instantly. Nuance often resides not in what is said explicitly but in how it is said: in the tone, in the position of a sentence, in the choice of a particular synonym, elements that LLMs struggle to weigh outside a statistical framework. For example, a constructive criticism worded with extreme caution may be statistically less salient than a blunt statement, yet its importance in the context of a parliamentary inquiry could be far greater. In addition, models tend to excel at *extractive* summarization, i.e. identifying and collecting key sentences from the original text, rather than at *abstractive* summarization, which requires conceptual reformulation and the creation of new sentences that capture the essential meaning without reproducing the original wording. The latter is a cognitively more demanding task, requiring reasoning, inference and a capacity for abstraction that goes beyond simple pattern recognition. The human ability to read between the lines, to connect scattered information and to reconstruct a broader meaning from knowledge of the world and of the specific domain remains an unmatched strength, which is why AI summaries were often "wordy and pointless, merely repeating what was in the submission", as the ASIC evaluators themselves observed. This gap is not only a matter of efficiency, but of effectiveness and reliability, crucial in contexts where the stakes are high.
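To make the extractive/abstractive distinction concrete, here is a minimal sketch of a frequency-based extractive summarizer; all names are illustrative and the scoring is deliberately naive. Abstractive summarization, by contrast, requires a generative model and cannot be reduced to sentence selection, which is precisely where the weaknesses discussed above appear.

```python
import re
from collections import Counter

def extractive_summary(text: str, max_sentences: int = 3) -> str:
    """Naive extractive summarization: rank sentences by word frequency.

    Selects existing sentences verbatim. It cannot rephrase, infer, or
    weigh a cautiously worded criticism above a blunt statement, which
    is exactly where abstractive (LLM-based) summarization and human
    judgment are needed.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(re.findall(r"\w+", text.lower()))

    def score(sentence: str) -> float:
        # Total word frequency, normalized by length so long sentences
        # are not automatically favored.
        tokens = re.findall(r"\w+", sentence.lower())
        return sum(freq[t] for t in tokens) / (len(tokens) or 1)

    top = set(sorted(sentences, key=score, reverse=True)[:max_sentences])
    # Preserve the original order of the selected sentences.
    return " ".join(s for s in sentences if s in top)
```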

The Anatomy of AI's Limits: Hallucinations, Irrelevance and Factual Reliability

The ASIC study's observations, which found incorrect information, missing relevant details and emphasis on irrelevant facts, in addition to so-called hallucinations, paint a clear picture of the challenges surrounding the reliability of AI-generated content. Hallucinations in particular are among the most insidious problems of LLMs: the model generates text that is grammatically correct and plausible, but inaccurate or entirely invented. The phenomenon stems from the probabilistic nature of LLMs, which, in trying to predict the most likely sequence of words, can deviate from factual reality when they lack concrete knowledge or when the training data is ambiguous or insufficient. Imagine a model that, in summarizing a technical document, invents a parameter or an experimental result because it statistically "fits" the linguistic context, even though it appears nowhere in the original text. For an organization like ASIC, which handles audit and consultation documents with significant legal and financial implications, the inclusion of incorrect information can have disastrous consequences, undermining trust and leading to decisions based on false data. Similarly, AI's difficulty in distinguishing relevant from irrelevant information stems from its inability to understand the *purpose* of the summary in a deep, human sense. While a prompt may specify that references to ASIC or recommendations should be highlighted, the model may not grasp *why* that information matters, treating it on the same level as other, less critical mentions. The result is output that contains the required keywords but lacks the conceptual hierarchy that only a human with a clear understanding of the objectives can impose: a summary that may be overloaded with secondary details or, worse, omit crucial insights that, although not explicitly requested in the prompt, are fundamental to an informed evaluation. The need to "fact check outputs", and the finding that "the original source material actually presented information better", not only cancel the alleged time savings but increase the workload, turning the AI from a help into an obstacle: it demands an even more careful and costly human review, focused not only on validation but on correction and integration, which can make the entire process longer than writing the summary manually from the start.
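As a hedged illustration of why "fact checking outputs" adds work, a lightweight automated pre-screen can at least flag numeric claims in a summary that never appear in the source. This is a sketch under the assumption that invented figures are the hallucination of interest; the function name is hypothetical and it catches only the crudest cases, so human review remains necessary.

```python
import re

def flag_unsupported_figures(source: str, summary: str) -> list[str]:
    """Flag numbers in the summary that do not appear in the source.

    A crude pre-screen: it catches an invented parameter or figure,
    but not a subtly wrong paraphrase or a misattributed claim.
    """
    number_pattern = r"\d+(?:\.\d+)?%?"
    source_numbers = set(re.findall(number_pattern, source))
    summary_numbers = re.findall(number_pattern, summary)
    return [n for n in summary_numbers if n not in source_numbers]

# Example: "12%" appears only in the summary, so it is flagged for review.
print(flag_unsupported_figures(
    "Revenue grew by 8% in 2023.",
    "Revenue grew by 12% in 2023.",
))  # -> ['12%']
```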

The Human Touch: Soft Skills, Critical Thinking and the Irreplaceable Value of Experience

The comparison between AI and human summaries in the ASIC study unequivocally highlighted the superiority of the human approach, with an average score of 12.2 versus 7 on a 15-point scale. This gap is not random; it is rooted in the unique cognitive abilities of human beings, which go far beyond mere linguistic processing. A human reviewer, especially a domain expert such as an ASIC employee, brings to the summarization task a set of soft skills and a level of contextual understanding that an LLM cannot replicate. First, there is domain knowledge: a professional understands the legal, economic and regulatory implications of the information contained in the submissions. They do not merely identify a mention of ASIC; they evaluate its context, its tone (critical, constructive, descriptive) and its potential impact, distinguishing between a generic reference and a specific recommendation that requires attention. This expertise allows them to filter out noise and focus on the elements critical to the purpose of the summary. Then there are critical thinking and the capacity for inference. A human can read between the lines, identify implicit arguments, detect intentional bias or omissions, and even anticipate the questions a reader might ask. For example, if a company's report is overly optimistic, a human expert might notice and insert a note of caution into the summary, a capacity that an LLM, lacking critical judgment, would hardly develop. Moreover, human summarization is a creative process. It is not just a matter of extracting sentences, but of rearticulating ideas, recasting complex concepts in simpler and more accessible terms, and building a coherent, logical narrative that serves the specific purpose of the summary. This includes adapting style and level of detail to the audience (a summary for a manager, for example, will differ from one for a technician). Finally, there is the assessment of the reliability of sources and information. A human can cross-reference information with their experience and prior knowledge, or identify potential conflicts of interest, elements that directly affect the validity of the content and that an LLM is not equipped to handle on its own. All these capabilities give human summaries a depth, relevance and completeness that algorithms still struggle to match, making them irreplaceable for tasks of high complexity and responsibility.

The Evolution of Language Models: A Quality Leap Beyond Llama2-70B

It is essential to recognize that LLM technology is evolving constantly and rapidly, and the limitations observed in the ASIC study, which used Llama2-70B in January-February 2024, may not reflect the capabilities of current cutting-edge models. The AI sector moves at dizzying speed, and a model considered "state of the art" six months ago may already have been surpassed. Indeed, the report notes that Llama2-70B has been superseded by larger models such as ChatGPT-4o, Claude 3.5 Sonnet and Llama3.1-405B, "which achieve better results in many generalized quality assessments". These newer models bring not only more parameters (Llama3.1-405B is a colossus of 405 billion parameters, an order of magnitude more than Llama2-70B), but also significant architectural and methodological improvements. One of the most important advances is the extension of context windows. The context window is the amount of text the model can "see" and process at once. Llama2-70B had a limited context window, which made it difficult for the model to maintain coherence over very long documents and to identify specific references or nuances located far apart in the text. The most recent models, such as Claude 3.5 Sonnet or GPT-4o, boast context windows extending to hundreds of thousands of tokens, allowing them to process entire submissions or books in a single pass and drastically improving the ability to "find references in larger documents", as the study's authors noted. This not only reduces the risk of losing relevant information, but also allows a more holistic understanding of the interconnections between different sections of a document. In addition, the latest models have improved reasoning abilities, often instilled through training techniques that encourage the model to "think" step by step (e.g. Chain-of-Thought prompting) or to explore different reasoning paths. Multimodal capabilities, such as those of GPT-4o, which integrates text, images and audio, are also opening new frontiers, making it possible to summarize content that includes charts, tables or other visual information and increasing the richness and accuracy of summaries. These advances suggest that if the ASIC study were replicated today with top-tier models, the results would probably be very different, underscoring the need not only to consider updated models but also to invest time in optimization and prompt engineering to exploit their full potential.
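A common workaround when a document exceeds the context window is map-reduce summarization: split the document, summarize each chunk, then summarize the summaries. The sketch below assumes a hypothetical `call_llm(prompt)` wrapper around whatever model API is in use; it is illustrative, not a specific vendor's interface. Larger context windows make this workaround less necessary, but the pattern remains useful for very long archives.

```python
def call_llm(prompt: str) -> str:
    """Hypothetical wrapper around an LLM API call (assumption: plug in
    your own provider here)."""
    raise NotImplementedError("connect this to your model of choice")

def chunk(text: str, max_chars: int = 8000) -> list[str]:
    """Split on paragraph boundaries so chunks stay semantically coherent."""
    parts, current = [], ""
    for para in text.split("\n\n"):
        if current and len(current) + len(para) > max_chars:
            parts.append(current)
            current = ""
        current += para + "\n\n"
    if current:
        parts.append(current)
    return parts

def map_reduce_summary(document: str) -> str:
    # Map step: summarize each chunk independently.
    partials = [
        call_llm(f"Summarize the key points of this excerpt:\n\n{c}")
        for c in chunk(document)
    ]
    # Reduce step: merge the partial summaries into one coherent summary.
    joined = "\n".join(partials)
    return call_llm(f"Combine these partial summaries into one summary:\n\n{joined}")
```

The trade-off is that cross-chunk references can be lost at the map step, which is exactly the weakness that single-pass processing with a large context window avoids.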

The Art of Prompt Engineering: Guiding Models Toward Extraordinary Results

If the LLM is the engine, prompt engineering is the steering wheel that guides the output to the desired destination. The ASIC study pointed out that "adequate prompt engineering, i.e. the careful crafting of the questions and tasks presented to the model, is crucial for optimal results." This point has become a mantra in conversational and generative AI, since the quality of an LLM's output is directly proportional to the clarity, precision and completeness of the input prompt. It is no longer a matter of asking a simple question, but of articulating detailed instructions that guide the model to perform a specific task with maximum accuracy and relevance. Prompt engineering techniques have evolved rapidly, turning into almost a discipline in its own right. One fundamental technique is Few-Shot Prompting, in which a few complete input-output examples are provided to teach the model the desired style, format or type of reasoning. This is particularly effective for summaries, showing the AI what "good" summaries should look like compared to "bad" ones for that particular context. Another crucial technique is Chain-of-Thought (CoT) Prompting, which encourages the model to spell out its reasoning step by step before giving the final answer. For summarization, this means asking the model first to identify the key points, then to assess their importance, then to connect them, and finally to generate the summary. This approach increases not only accuracy but also transparency, allowing users to understand how the model reached a given conclusion. Role-Playing, or Persona Prompting, is another powerful tool: the model is asked to assume the persona of an expert, for example, "Act as an ASIC financial analyst and summarize this document, highlighting compliance risks and recommendations". This channels the model toward a specific focus and tone, partly replicating human domain knowledge. Finally, negative constraints (e.g. "Do not include information about X") and feedback iterations (successive refinements) are essential for perfecting the output. Prompt engineering is therefore not a single act, but an iterative process of experimentation, evaluation and optimization. It requires a deep understanding of both the model's capabilities and the specific needs of the task, transforming the user from a simple consumer of AI into a strategic co-creator of the desired output, which is fundamental to overcoming the limitations of generic, nuance-free summaries.
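The sketch below composes the techniques just described (persona, one few-shot example, a chain-of-thought instruction and a negative constraint) into a single prompt template. The exact wording is illustrative, not a recipe validated by the ASIC study, and the function name is hypothetical.

```python
def build_summary_prompt(document: str, example_doc: str,
                         example_summary: str) -> str:
    """Compose a summarization prompt that layers several techniques."""
    return (
        # Persona prompting: channel the model toward a specific focus.
        "Act as an ASIC financial analyst.\n\n"
        # Few-shot: one worked example of the desired output style.
        f"Example document:\n{example_doc}\n"
        f"Example summary:\n{example_summary}\n\n"
        # Chain-of-thought: ask for explicit intermediate steps.
        "First list the key points, then rank them by importance, "
        "then write a summary that highlights compliance risks and "
        "any recommendations directed at ASIC.\n"
        # Negative constraint: suppress a known failure mode.
        "Do not include general background about ASIC itself.\n\n"
        f"Document:\n{document}\n"
    )
```

In practice this template would be the starting point of the iterative loop described above: run it, evaluate the output, tighten the instructions, and repeat.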

Implementing AI Summarization in Enterprise Environments: Challenges, Best Practices and Integration Strategies

Integrating AI summarization into an enterprise environment, such as a government agency or a large company, presents a complex set of challenges that go well beyond choosing the right model or mastering prompt engineering. To move from a proof of concept to a scalable, reliable solution, organizations must take a holistic approach. One of the most significant challenges is the validation and quality control of outputs. As ASIC demonstrated, even with well-engineered prompts, AI summaries can contain factual errors or miss crucial nuances. This makes it necessary to implement solid *Human-in-the-Loop* (HITL) workflows, in which AI outputs are systematically reviewed and corrected by human experts before being used. This does not cancel the value of AI, but turns it into a powerful pre-processing tool that accelerates human work rather than replacing it entirely. Another critical concern is data security and privacy. Feeding internal, often sensitive or confidential documents to LLMs hosted on public clouds raises regulatory compliance issues (such as GDPR or CCPA) and exposure risks. Companies should explore solutions such as models hosted in private environments (on-premises or virtual private clouds), the *tokenization* of sensitive data, or models *fine-tuned* on their own data but managed under strict security policies. Scalability and cost management are further practical considerations. Generating summaries for thousands or millions of documents can quickly become expensive in computational resources and API costs, especially with very large models. Organizations need to balance precision requirements against economic sustainability, choosing models sized appropriately for the task and optimizing API usage. It is essential to identify the specific use cases where AI summarization offers maximum value: first drafts of non-critical document summaries, extraction of specific information from large archives, automatic categorization of customer feedback, or preliminary syntheses for legal analysis. Implementation must be accompanied by a robust change-management strategy and staff training. Employees must be educated about AI's capabilities and limits, about how to interact effectively with models (prompt engineering) and about how to integrate these tools into their existing workflows. Finally, the ethical and legal implications of using AI-generated content, especially in regulated sectors, require attention. Who is responsible if an AI summary leads to a legal or financial error? Business policies must address these questions, establishing clear guidelines for assigning responsibility and verifying outputs. AI summarization is a powerful ally, but only if implemented with careful planning, secure infrastructure and thoughtful integration into the existing organizational context.
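A minimal sketch of what a Human-in-the-Loop workflow could look like in code follows; the type and status names are assumptions for illustration. The essential property is that no draft leaves the pipeline while its status is still pending review.

```python
from dataclasses import dataclass, field
from enum import Enum, auto

class ReviewStatus(Enum):
    PENDING = auto()    # generated by AI, not yet reviewed
    APPROVED = auto()   # reviewer accepted the draft as-is
    CORRECTED = auto()  # reviewer replaced or amended the draft

@dataclass
class DraftSummary:
    """An AI-generated draft that must pass human review before use."""
    document_id: str
    ai_draft: str
    status: ReviewStatus = ReviewStatus.PENDING
    final_text: str = ""
    reviewer_notes: list[str] = field(default_factory=list)

def human_review(draft: DraftSummary, approved: bool,
                 corrected_text: str = "", note: str = "") -> DraftSummary:
    """Record the reviewer's decision; nothing ships while PENDING."""
    if note:
        draft.reviewer_notes.append(note)
    if approved:
        draft.status = ReviewStatus.APPROVED
        draft.final_text = draft.ai_draft
    else:
        draft.status = ReviewStatus.CORRECTED
        draft.final_text = corrected_text
    return draft
```

Audit trails like `reviewer_notes` also help with the responsibility questions raised above, since every published summary carries a record of who approved or corrected it.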

The Future of Cognitive Collaboration: Towards Augmented Intelligence and Hybrid Models

The ASIC experience, far from being an end point, marks a crucial stage on the path toward a more mature and conscious adoption of AI. The clear message is that the goal is not the complete replacement of human cognitive abilities, but rather their *augmentation*. We are entering the era of augmented intelligence, in which AI acts as an intelligent co-pilot, enhancing human capabilities rather than supplanting them. Imagine a future in which a professional does not start from scratch when summarizing a complex document, but receives a preliminary draft generated by the AI, with the key points already highlighted and the most relevant sections annotated with page references. The human's task then shifts from laborious extraction and initial drafting to the role of *critical reviewer, fact validator and refiner of nuances*. This hybrid approach leverages AI's speed and data-processing capacity to handle repetitive, high-volume activities, freeing humans to focus on high-level analysis, strategic thinking, ethical judgment and decisions that require a deep understanding of the cultural and organizational context. *Hybrid models* are another fundamental aspect of this future. These systems could combine the statistical power of LLMs with more traditional rule-based approaches or with *knowledge graphs*. Such graphs make it possible to incorporate verified facts and domain-specific semantic relationships, offering solid ground on which to anchor LLM outputs and reduce hallucinations. Imagine an LLM that generates a summary, after which a rule-based system validates it by cross-checking the facts against a certified corporate knowledge base, flagging discrepancies. This not only improves accuracy but also increases the interpretability and explainability of the AI, making it possible to understand *why* certain information was included or excluded. In addition, continuous learning and customization will be key. Models can be continually fine-tuned on user feedback and specific business data (handled under strict security measures), adapting their summarization capabilities to the changing needs of the organization and of individuals. The creation of customized "summarization agents", trained on the style preferences and goals of individual teams or departments, could bring a level of accuracy and relevance unimaginable today. In this vision, AI is not a panacea that solves every summarization problem, but a sophisticated tool that, in the hands of human experts, amplifies their efficiency and their ability to produce high-quality results in record time, ushering in an era of true cognitive collaboration.
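As a toy illustration of the hybrid validation idea, the sketch below cross-checks claims extracted from an LLM summary against a small fact store standing in for a certified knowledge graph. Everything here is an assumption for illustration: a real system would extract claims automatically and query a graph database rather than a dictionary.

```python
# Toy fact store standing in for a certified corporate knowledge graph.
KNOWN_FACTS = {
    "2023 revenue": "8% growth",
    "regulator": "ASIC",
}

def validate_claims(claims: dict[str, str]) -> list[str]:
    """Cross-check claims extracted from an LLM summary against the store.

    Returns human-readable discrepancy reports rather than silently
    correcting anything, so a reviewer decides what to do with each one.
    """
    discrepancies = []
    for key, value in claims.items():
        expected = KNOWN_FACTS.get(key)
        if expected is not None and expected != value:
            discrepancies.append(
                f"Summary says {key!r} = {value!r}; "
                f"knowledge base says {expected!r}"
            )
    return discrepancies

# Example: the summary claims 12% growth, the knowledge base says 8%.
print(validate_claims({"2023 revenue": "12% growth"}))
```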

Conclusions: Balancing Potential and Prudence in the Age of AI

The detailed analysis of the challenges AI faces in summarizing complex content, highlighted by ASIC's rigorous study, offers a crucial perspective on the current and future landscape of Artificial Intelligence. Despite alluring promises and rapid technological advancement, it is clear that AI is not yet an infallible substitute for the human capacity to understand, interpret and synthesize information that requires a profound mastery of context, nuance and implied meaning. Hallucinations, difficulty in discerning relevance and the inability to apply true critical thinking remain significant obstacles, especially in contexts where precision and reliability are paramount. However, it would be myopic to ignore the exponential progress AI is making. The evolution of language models, with expanded context windows, improved reasoning capabilities and the emergence of multimodal architectures, promises to overcome many of the limitations observed only a few months ago. At the same time, the refinement of prompt engineering is establishing itself as an indispensable skill, transforming mere interaction with AI into a true art that guides the model toward increasingly accurate and relevant outputs. The future of AI in summarization, and more generally in cognitive automation, does not lie in a complete replacement of the human brain, but in a synergistic collaboration between human and machine. Organizations will need to adopt a strategic, measured approach, implementing Human-in-the-Loop systems, establishing rigorous validation frameworks and investing in staff training. AI will excel at handling volume, extracting raw data and providing initial drafts, freeing human beings for the irreplaceable role of critical reviewers, strategic analysts and final decision makers. Ultimately, the ASIC study reminds us that while AI continues to evolve at surprising speed, its adoption must be guided not only by enthusiasm for what it can do, but also by a deep understanding of its intrinsic limits. Only by balancing AI's vast potential with a prudent awareness of human capabilities can we forge a future in which technology does not merely automate, but *augments* collective intelligence, leading to more efficient, accurate and deeply meaningful results. The road is still long, but the direction is clear: toward an augmented intelligence that brings out the best of both worlds.
