
When AI meets Product: April’25 AI Product Updates

Keeping up to date with new AI models, products, ethics, and trends

Anna Via
11 min read · May 4, 2025

Welcome to the April edition of “When AI Meets Product — AI Product Updates”. April brought another busy wave of developments in the world of AI, from new foundational model launches to emerging applications, and important progress around ethics, regulation, and enterprise adoption. In this post you’ll find:

  • Latest updates from GenAI model providers — major launches and announcements from leading model providers (Google, OpenAI, Meta…), but also interesting new features and narrower uses of LLMs from those same providers.
  • Interesting new AI applications (H&M digital twins for models, accent conversion and automatic translations, AI-driven search and discovery, and more!) and trends to develop GenAI products with quality (from UX patterns to evals & guardrails).
  • Updates on the impact of AI on ethics and legislation — a paper exploring the complex relationship between the AI Act (legal point of view) and responsible AI (technical point of view), the EU’s AI Continent Action Plan, and Meta’s announcement that it will start training models on EU data.
  • AI in the enterprise, how to get ready — great resources and insights on how companies are preparing internally for the AI revolution (leading by example and sharing prompts and use cases across teams, while balancing this with rigorous evaluation and governance).

Let’s get started! 🚀

🧠 Model providers updates

April has been another crazy month in terms of new foundational model versions from the main model providers. At the same time, those same companies are racing to build narrower products, mainly leaning into personalization and agentic behavior.

Model providers — new model versions

  • OpenAI launched the new GPT‑4.1 series, which includes GPT‑4.1, GPT‑4.1 mini, and GPT‑4.1 nano. They outperform GPT‑4o and GPT‑4o mini, with significant gains in coding, instruction following, and longer context windows. Interestingly, GPT‑4.5 is being deprecated in favour of 4.1, mainly due to its high cost and size. OpenAI also released two more models, o3 and o4-mini, described as the “smartest” models in the o series, which leverage agentic reasoning to solve more complex tasks. One outstanding capability of these new models is that, for the first time, images are integrated into the chain-of-thought reasoning process.
  • Google released Gemini 2.5, claimed as their “most intelligent AI model”, optimized for enhanced reasoning and coding and based on techniques like reinforcement learning and chain-of-thought prompting.
  • Meta introduced the Llama 4 family, fully embracing native multimodality. It includes Llama 4 Behemoth (2T parameters, great for distillation), Llama 4 Maverick (400B parameters, large context window, optimized for real multimodal use cases), and Llama 4 Scout (109B parameters, lighter-weight, tuned for inference speed and performance).
  • On a more specialized note, IBM launched Granite 3.3, a speech-to-text model with a longer time window compared to OpenAI’s Whisper.

It’s a great moment to revisit model comparison arenas and benchmark tests to see what all this means in the race to build the most capable foundational models. It seems Google’s Gemini 2.5 has taken a significant lead on the leaderboard, followed by OpenAI’s o3 and GPT‑4o. On a more academic note, a recent paper co-authored by leading researchers from both industry and academia delves deep into the progress and challenges around foundational agents, the models designed to autonomously reason and act.

https://huggingface.co/spaces/lmarena-ai/chatbot-arena-leaderboard

On the topic of GenAI capabilities, there’s also been a lot of noise around AGI vs non-AGI debates. A recent paper claims that GenAI has now passed the Turing test — meaning that, in a controlled setting, human participants couldn’t tell whether they were speaking to another person or to a machine (using Eliza, GPT-4o, Llama 3.1, and GPT-4.5). Given how impressive these models are at natural conversation, this result isn’t particularly surprising. What’s more interesting is the growing discussion about whether passing the Turing test — or even achieving AGI — actually matters. As Ethan Mollick points out in his last post, AGI is a badly defined and hotly debated term: everyone agrees it relates to AI performing human-level tasks, but there’s no consensus on whether this means matching expert or average human performance, or how many tasks, or of what kind. In practice, we’re seeing AIs today that can outperform human experts in some areas and completely fail at simple tasks. Even if we already had AGI today, it would still take years to fully understand it — and even longer to successfully integrate it into our human world.

Model providers — features and applications

  • Microsoft Copilot is going big, with the goal of evolving into an AI companion. This translates especially into personalization features (adding memory capabilities and more knowledge about each user and their context to leverage as input for each interaction). Recall (yes, that demo from last year where your laptop captures and indexes everything on screen for future search) is back after privacy concerns delayed the launch. Other improvements include deep research, integration with Pages, a shopping assistant…
  • OpenAI is releasing new features for developers through the Agents SDK, to help build agentic AI apps in a “lightweight, easy-to-use package with very few abstractions” (see the sketch after this list). They are also expanding ChatGPT’s memory features, both by enabling references to saved memories (e.g. “remember I am a vegetarian”) and by referencing your own chat history.
  • Google is expanding Code Assist with agents, so they can take multiple steps to accomplish complex programming tasks.
  • Anthropic is joining the trend with its own “deep research” tool, which in this case also includes an integration with Google Docs, bringing Claude into Google Docs to support in-context research and writing.
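
To make the Agents SDK item above a bit more concrete, here is a minimal sketch of a single-agent app built with the openai-agents Python package; the agent name, instructions, and prompt are placeholders of mine, and running it assumes the package is installed and an OpenAI API key is configured.

```python
# A minimal sketch of an agent built with OpenAI's Agents SDK
# (pip install openai-agents). The agent name, instructions, and
# question below are illustrative placeholders.
from agents import Agent, Runner

agent = Agent(
    name="product_assistant",
    instructions="You are a concise assistant for product managers.",
)

# Run the agent once, synchronously, on a single user request.
result = Runner.run_sync(agent, "Summarize three risks of shipping an LLM feature.")
print(result.final_output)
```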

Model providers — other news

Anthropic has kept publishing great papers full of relevant insights:

  • Their paper Tracing the Thoughts of a Large Language Model offers some insights into how Claude thinks — revealing that it sometimes reasons in a shared conceptual space across languages, plans many words ahead, and often optimizes to agree with users more than strictly follow logic.
  • Their latest Economic Index shows heavy adoption in coding, education, science, and healthcare. When it comes to extended reasoning capabilities, it is again the computer science field that seems to take most advantage of them. Finally, there is a distinction between tasks that can seemingly be done directly (pointing to full automation), such as translations, and tasks that require a lot of iteration (pointing to augmentation), such as the use editors and copywriters make of these tools.

🔥 AI Products, applications & agents

Beyond the foundational models themselves, April has also been a month of remarkable innovation (and a few surprises) in the applications and products built on top of them. Here’s a snapshot of some of the most interesting developments across industries.

Relevant new AI Products and use cases

  • Booking’s new AI features.
  • The Chatbot Comeback — GenAI chatbots are booming, but building them at scale is far from trivial, demanding not just technical excellence, but also strong user-centric thinking and mature risk management. In another recommended talk Consumer-Facing GenAI Chatbots: Lessons in AI Design, Scaling & Brand Safety, several critical challenges are outlined: open-ended questions and complex user journeys, dealing with fast-evolving data from multiple sources, latency vs agentic reasoning, chat memory, costs, observability, guardrails and brand safety.

Building Better AI Products: UX, Evals, and Guardrails

AI is powerful — but designing user-friendly, trustworthy, empowering AI features is still a major challenge. In this post you’ll find a great overview of UX patterns for AI features that help solve challenges like AI being perceived as a black box, users not knowing what to ask or how to prompt correctly, AI feeling passive and one-dimensional, the need for human validation, and fear of disruption.

Evaluation (Evals) is another major focus area to ensure AI quality and reduce risk. A few relevant resources:

  • Arize’s definitive guide on LLM evals: covers why evals matter, different techniques (e.g., ground truth comparisons, LLM-as-a-judge, illustrated in the sketch after this list), and how to integrate evals across the AI project lifecycle (pre-production, CI/CD, post-production).
  • Toward an Evaluation Science for Generative AI Systems: a call to move towards real-world, task-specific metrics (instead of benchmarking models with the aim of evaluating “general intelligence”), to iteratively refine those metrics, and to establish institutions that can provide readily available evaluation tools, shared evaluation infrastructure, and standards, similar to what the FDA does in healthcare.
  • The Batch on tips for evals: encouraging teams to start small, but to start (even a few examples and a few dimensions is way better than nothing), as long as these initial metrics correlate with overall system performance and allow you to understand whether one system or prompt version is better or worse than another.
  • My own post introducing evals to help ensure AI solutions work as expected once deployed in production.
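
As a small illustration of the LLM-as-a-judge technique mentioned in Arize’s guide above, the sketch below asks a judge model to score an answer against a reference on a 1 to 5 scale; the judge model name and rubric are my own assumptions, and a real eval would calibrate the judge against human labels and run it over a full dataset.

```python
# Illustrative LLM-as-a-judge eval: grade a model answer against a reference.
# The judge model and rubric are assumptions; in practice you would calibrate
# the judge prompt against human labels before trusting its scores.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def judge(question: str, answer: str, reference: str) -> int:
    """Ask a judge model for a 1-5 correctness score and return it as an int."""
    prompt = (
        "You are grading an AI assistant's answer.\n"
        f"Question: {question}\n"
        f"Reference answer: {reference}\n"
        f"Assistant answer: {answer}\n"
        "Reply with a single integer from 1 (completely wrong) to 5 (fully correct)."
    )
    response = client.chat.completions.create(
        model="gpt-4.1-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return int(response.choices[0].message.content.strip())


print(judge("What is the capital of France?", "Paris.", "Paris"))
```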

Guardrails are closely tied to evals — acting as real-time quality and safety validations. In the talk Adding Guardrails to Increase Reliability, you can find a great overview of how input and output guards work:

  • Input Guards: Detect PII, proprietary info, jailbreak attempts.
  • Output Guards: Catch hallucinations, NSFW content, sensitive topics.
  • Techniques include regular expressions, small fine-tuned ML models (factuality, topic detection, named entity recognition…), secondary LLM evaluations (toxicity, tone of voice, coherence…), and hybrid approaches (see the sketch below).
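
To ground the list above, here is a minimal sketch of the simplest kind of input guard, a regex check for obvious PII before a prompt reaches the model; the patterns are illustrative assumptions, and the talk’s point is precisely that real guardrails layer fine-tuned models and secondary LLM evaluations on top of rules like these.

```python
# Minimal regex-based input guard: flag prompts that contain obvious PII.
# The patterns are illustrative; production guards typically combine such
# rules with fine-tuned classifiers and secondary LLM checks.
import re

PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def input_guard(prompt: str) -> tuple[bool, list[str]]:
    """Return (allowed, detected PII types) for a user prompt."""
    hits = [name for name, pattern in PII_PATTERNS.items() if pattern.search(prompt)]
    return (len(hits) == 0, hits)


allowed, detected = input_guard("My email is jane.doe@example.com, can you help?")
print(allowed, detected)  # False ['email']
```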

Stanford’s 2025 AI Index Report insights include how AI model performance continues to break records on complex benchmarks, while models become more efficient and affordable (and are still mainly built in the US, closely followed by China). The integration of these models into everyday life is accelerating as business adoption booms, with AI driving record investments and clear productivity gains. Responsible AI practices are expanding, but unevenly across regions, and governments are playing a bigger role through regulation and strategic funding.

⚖️ Ethics & Legislation

It’s complicated: The relationship of algorithmic fairness and non-discrimination regulations in the EU AI Act. This paper explores how efficient AI legislation demands deep interdisciplinary collaboration between law and computer science, but aligning these worlds is not an easy task. Fairness metrics, designed to uncover and correct unjustified treatment between groups in algorithms, often don’t map cleanly to traditional EU non-discrimination law or the broader Fundamental Rights approach that influences the AI Act. The rise of large language models (LLMs) complicates things even further, since their use cases aren’t always predefined, making it harder to predict or assess their impacts in advance. Although the AI Act addresses high-risk systems through data input requirements and output monitoring, questions around consistency and computational feasibility remain open.
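
For readers less familiar with the fairness metrics the paper refers to, here is a tiny sketch of one of the simplest ones, demographic parity difference (the gap in positive-outcome rates between two groups); the decisions are made up, and the paper’s point is exactly that a number like this does not map cleanly onto EU non-discrimination law.

```python
# Illustrative fairness metric: demographic parity difference, i.e. the gap in
# positive-prediction rates between two groups. The decisions below are made up.
def positive_rate(decisions: list[int]) -> float:
    """Share of positive (1) decisions in a group."""
    return sum(decisions) / len(decisions)


group_a = [1, 1, 0, 1, 0, 1]  # model decisions for group A (1 = positive outcome)
group_b = [0, 1, 0, 0, 0, 1]  # model decisions for group B

parity_gap = positive_rate(group_a) - positive_rate(group_b)
print(f"Demographic parity difference: {parity_gap:.2f}")  # ~0.33
```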

On a broader legislative front, the European Commission introduced the AI Continent Action Plan, aiming to accelerate AI leadership. The plan focuses on building large-scale AI infrastructure, improving access to quality data, fostering algorithm development in strategic sectors, strengthening AI skills and talents, and simplifying regulation.

In parallel, Meta announced it will start using European data to train its AI models. This is a relevant move, especially considering how EU regulations like GDPR and the AI Act have historically slowed down tech deployments and the use of data in the region. After months of preparation, Meta plans to use public data from adult users on its platforms, as well as interactions with Meta’s AI chatbot. While using personal data in AI training isn’t inherently bad (these models’ goal is to learn language and reasoning patterns, not to store explicit user data), the complexity of LLMs introduces risks such as data memorization or the infeasibility of satisfying right-to-be-forgotten requests.

🏢 AI in the Enterprise: How to Get Ready

In a great recent conversation hosted by McKinsey on Product Leadership in the Age of AGI, there were several key insights on how AI is reshaping organizations and how they should adapt to embrace the revolution:

  • AI’s enterprise impact can be grouped into three areas: replacement, augmentation, and automation; in all of these setups, AI can provide additional value.
  • Challenges like the cold-start problem in chatbots and AI apps were discussed, emphasizing the value of sharing real examples, prompts, and use cases across teams to accelerate adoption.
  • To truly embed AI, leaders must lead by example, normalize its use, and rethink team setups that were originally designed around older tools and workflows.
  • Rigorous evaluations remain critical — not just to assess AI performance, but also to safeguard baselines like data quality, permissions, and tech readiness.

On a similar note, Shopify’s CEO recently circulated an internal memo urging a mindset shift across the company to fully embrace AI. New expectations include every employee using AI tools regularly, acceleration of prototyping with AI, sharing learnings and prompts among colleagues, and even using AI adoption as a factor in performance evaluations. Interestingly, employees are now also expected to prove that AI cannot do a task if requesting extra resources or headcount.

Much of this enterprise AI adoption will happen through the growing ecosystem of built-in copilots across enterprise tools. However, this article makes some great points about the balance that needs to be struck: while CIOs and executives naturally want to push for high adoption to gain efficiency, these tools often involve deep integrations into organizational systems, raising major security concerns. The way forward involves a strong framework for governed access to data, continuous investment in data quality, and a shared responsibility model where employees are aware of the risks as well as the opportunities.

🛠️ Wrapping it up

That was it from “When AI Meets Product — AI Product Updates”. Another month, another wave of change with AI reshaping products, organizations, and society in real-time. Stay tuned — next month is sure to be just as exciting!

Written by Anna Via

Machine Learning Product Manager @ Adevinta | Board Member @ DataForGoodBcn