Bringing a Product Mindset to an Infrastructure Platform Team

Stéphane Di Cesare discusses DKB's experience introducing a product mindset within their platform team. The discussion covers their definition of a platform team, the rationale behind the shift, and their journey, including challenges, goals, and key learnings around user value, platform definition, maturity models, and effective communication strategies. It is intended for senior software developers and engineering leaders. By Stéphane Di Cesare

Beyond Chatbots: Architecting Domain-Specific Generative AI for Operational Decision-Making

This article explores the use of domain-specific generative AI: models that understand operational constraints, real-world dynamics, and business rules well enough to generate executable strategies, not just text descriptions. These models require significantly smaller datasets and fewer parameters, making them cost-effective while enabling AI-driven core business decision intelligence at scale. By Abhishek Goswami

OpenSearch Cluster Topologies for Cost Saving Autoscaling

Amitai Stern discusses cost-saving autoscaling topologies for OpenSearch. He explains the inherent challenges in autoscaling unstructured data systems like OpenSearch and Elasticsearch, using analogies to illustrate the complexities beyond simply adding nodes. He shares architectural patterns (burst indexes, burst clusters) to optimize resource utilization and handle fluctuating loads effectively. By Amitai Stern

Navigating LLM Deployment: Tips, Tricks, and Techniques

Meryem Arik shares best practices for self-hosting LLMs in corporate environments, highlighting the importance of cost efficiency and performance optimization. She details quantized models, batching, and workload optimizations to improve LLM serving. Insights cover model selection and infrastructure consolidation, emphasizing the differences between enterprise and large-scale AI lab deployments. By Meryem Arik

DiRMA: Measuring How Your Organization Manages Chaos

Elevate your disaster recovery strategy with DiRMA—an innovative framework for assessing and enhancing Disaster Recovery Testing (DiRT) maturity across people, processes, and tools. As chaos engineering becomes essential for resilience, DiRMA guides organizations through structured improvement, addressing cultural hurdles and ensuring robust recovery readiness in the face of modern challenges. By Yury Niño Roa

Act One: From Chatbots to AI Agents

In the "Act One: From Chatbots to AI Agents" eMag, we’ve curated a collection of articles that explore the transition from the familiar realm of chatbots to the more dynamic and autonomous world of AI agents. The eMag offers both practical insights and forward-looking perspectives on the challenges and opportunities that lie ahead. By InfoQ

AI in the Age of Climate Change

Nischal HP discusses the critical role of data credibility in combating greenwashing and enabling effective climate action. He shares how technology can be used to create verifiable data on carbon sequestration, empowering farmers and corporations to participate in carbon markets. By Nischal HP

Building Efficient Mobile Streaming Apps

This article explores efficient preloading systems for mobile video streaming apps, balancing user experience with technical constraints. We will dive into practical implementation strategies that leverage network intelligence, buffer management techniques, AI-driven preloading, and real-world testing methodologies to enhance video delivery in mobile environments. By Ankit Awasthi

Understanding What Really Matters for Developer Productivity: A Conversation with Lizzie Matusov

In this podcast, Michael Stiefel spoke with Lizzie Matusov about how effective, productive, and satisfied teams depend on good software architecture. Understanding this relationship requires understanding exactly what software productivity really is, how modern software engineering research has become more rigorous and practical, and how to apply that research to software development. By Lizzie Matusov

Checklist for Kubernetes in Production: Best Practices for SREs

This article provides SREs with a checklist for managing Kubernetes in production environments. It identifies common challenges including resource management, workload placement, high availability, health probes, storage, monitoring, and cost optimization. By implementing consistent GitOps automation across these areas, teams can significantly reduce complexity and prevent downtime. By Utku Darilmaz
