Search
20 results for "llm"
- ai · arxiv/cs.AI · 8 min
Safe Bilevel Delegation: Runtime Safety Control for Multi-Agent LLM Systems
A formal framework that dynamically adjusts safety-efficiency trade-offs when delegating tasks to specialized AI sub-agents during execution.
May 2, 2026

- ai · arxiv/cs.AI · 8 min
Benchmark Rubrics Shift LLM Scores in Financial NLP Tasks
How wording changes in evaluation criteria and metric selection alter model rankings on financial text benchmarks, requiring governance over gold-label assumptions.
May 2, 2026

- ai · arxiv/cs.AI · 3 min
Multi-agent framework automates recommendation system tuning
AgenticRecTune uses specialized LLM agents to optimize configuration across pre-ranking, ranking, and re-ranking pipelines without manual tuning.
May 1, 2026

- ai · arxiv/cs.AI · 8 min
LLMs Withhold Help When They Misread Intent, Not Lack Knowledge
A new benchmark reveals that language models often refuse benign requests due to misinterpreting user intent, and their ability to recover utility through clarification varies widely.
May 1, 2026

- ai · arxiv/cs.AI · 8 min
LLMs Need Feedback Loops to Keep Code and Theory Aligned
Researchers propose Comet-H, a system that orchestrates language models through iterative cycles to prevent hallucination and desynchronization in research software development.
May 1, 2026

- ai · hackernoon · 2 min
HackerNoon's April 2026 Digest: AI Costs, Data Pipelines, and Local Models
A structured pass through HackerNoon's April 29 roundup, surfacing the signal on AI tooling costs, data sourcing, and LLM deployment tradeoffs.
Apr 30, 2026

- ai · arxiv/cs.AI · 4 min
Evergreen: Cost-Efficient Verification of LLM-Generated Claims
A system that recasts claim verification as semantic queries, reducing LLM costs by 3.2x while maintaining accuracy on aggregated data.
Apr 30, 2026

- ai · arxiv/cs.LG · 4 min
Efficient Rationale Retrieval via Student-Teacher Distillation
Rabtriever reduces the computational cost of LLM-based document ranking by distilling cross-encoder knowledge into independent query-document encoders.
Apr 28, 2026

- ai · arxiv/cs.AI · 8 min
Poisoned Pretraining: Hidden Attacks Embedded in LLM Training Data
Researchers demonstrate how adversaries can plant dormant malicious logic in large language models by seeding poisoned content across obscure websites, evading detection until triggered.
Apr 27, 2026

- ai · arxiv/cs.LG · 4 min
LLMs use hidden confidence signals to detect and fix their own errors
Research shows large language models maintain a second-order evaluative signal that predicts error detection and self-correction beyond what their output probabilities reveal.
Apr 27, 2026

- ai · arxiv/cs.AI · 8 min
FHIR Format Choice Shifts LLM Medication Safety by 19 Points
How you serialize patient data to language models dramatically changes reconciliation accuracy, with smaller models favoring narrative text and large models preferring raw JSON.
Apr 25, 2026

- ai · arxiv/cs.AI · 6 min
LLM Safety Filters Fail Differently Across Dialects and Explicit Identity
Research shows language models refuse requests more often when users state their identity explicitly, but bypass safety guardrails when using dialect signals like AAVE.
Apr 24, 2026

- engineering · arxiv/cs.AI · 8 min
Automated SysML generation bridges text to engineering models
Hendricks and Cicirello propose a five-step pipeline using NLP and LLMs to convert unstructured documents into SysML diagrams and executable dynamical system models.
Apr 23, 2026

- ai · arxiv/cs.AI · 8 min
Junk Data Degrades LLM Reasoning; Twitter Study Shows Lasting Harm
Continual training on low-quality social media text causes measurable cognitive decline in language models, with reasoning and safety capabilities dropping significantly.
Apr 23, 2026

- ai · arxiv/cs.LG · 8 min
Simpler Optimizers Make LLM Unlearning More Robust
Research shows that using lower-order optimization methods during LLM unlearning produces forgetting that resists post-training attacks better than sophisticated gradient-based approaches.
Apr 21, 2026

- ai · arxiv/cs.LG · 4 min
LLMs complement but don't replace classical hyperparameter optimization
A study comparing LLM agents to classical algorithms like CMA-ES and TPE finds hybrid approaches work best for tuning model hyperparameters under compute constraints.
Apr 21, 2026

- ai · arxiv/cs.LG · 6 min
Automating Dataset Creation with LLMs and Search Engines
Researchers propose ADC, a method to build large labeled datasets automatically using language models and web search, reducing manual annotation work and cost.
Apr 21, 2026

- engineering · arxiv/cs.LG · 4 min
Kernel-Level LLM Safety via Logit Inspection
ProbeLogits reads token probabilities before generation to enforce safety policies at the OS level, achieving parity with learned classifiers at 2.5x speed.
Apr 21, 2026

- ai · arxiv/cs.AI · 4 min
Interpretable Traces Don't Guarantee Better LLM Reasoning
Research shows Chain-of-Thought traces improve model performance but confuse users, and correctness of intermediate steps barely predicts final accuracy.
Apr 20, 2026

- ai · arxiv/cs.AI · 5 min
LLMs Can Infer Unspoken Intent in Collaborative Tasks
Researchers tested whether large language models can interpret incomplete instructions by reasoning about a human partner's mental state, and found they match human performance.
Apr 20, 2026