
Bioalignment: Measuring and Improving LLM Disposition Toward Biological Systems for AI Safety

Trent R Northen, Mingxun Wang

arXiv:2603.09154v1 Announce Type: new Abstract: Large language models (LLMs) trained on internet-scale corpora can exhibit systematic biases that increase the probability of unwanted behavior. In this study, we examined potential biases towards synthetic vs. biological technological solutions across four domains (materials, energy, manufacturing, and algorithms). A sample of 5 frontier and 5 open-weight models was measured using 50 curated Bioalignment prompts with a Kelly criterion-inspired evaluation framework. According to this metric, most models were not bioaligned in that they exhibited biases in favor of synthetic (non-biological) solutions. We next examined whether fine-tuning could increase the preferences of two open-weight models, Llama 3.2-3B-Instruct and Qwen2.5-3B-Instruct, for biologically based approaches. A curated corpus of ~22M tokens from 6,636 PMC articles emphasizing biological problem-solving was used first to fine-tune Llama 3B on a mix of continued-pretraining and instruction-formatted data; this was then extended to Qwen 3B using instruction-formatted data only. We found that QLoRA fine-tuning significantly increased the scoring of biological solutions for both models without degrading general capabilities (Holm-Bonferroni-corrected p < 0.001 and p < 0.01, respectively). This suggests that even a small amount of fine-tuning can change how models weigh the relative value of biological and bioinspired vs. synthetic approaches. Although this work focused on small open-weight LLMs, it may be extensible to much larger models and could be used to develop models that favor bio-based approaches. We release the benchmark, corpus, code, and adapter weights.
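The abstract's significance claims rest on Holm-Bonferroni-corrected p-values. For readers unfamiliar with the procedure, here is a minimal sketch using statsmodels; the p-values are illustrative placeholders, not the paper's data.

```python
# Holm-Bonferroni correction, as reported in the abstract.
# NOTE: these p-values are placeholders, not the paper's results.
from statsmodels.stats.multitest import multipletests

raw_pvals = [0.0004, 0.008, 0.03]  # hypothetical per-test p-values
reject, adjusted, _, _ = multipletests(raw_pvals, alpha=0.05, method="holm")
for p, p_adj, sig in zip(raw_pvals, adjusted, reject):
    print(f"raw p={p:.4f}  holm-adjusted p={p_adj:.4f}  reject H0={sig}")
```

Holm's step-down procedure multiplies the i-th smallest of m p-values by (m - i + 1) and takes a running maximum, which controls the family-wise error rate while being uniformly more powerful than plain Bonferroni.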

Executive Summary

This study examines biases in large language models (LLMs) towards synthetic vs. biological technological solutions. Using a sample of five frontier and five open-weight models, it measures their disposition towards biological systems with 50 curated Bioalignment prompts. The findings reveal that most models exhibit biases in favor of synthetic solutions. However, fine-tuning two open-weight models, Llama 3.2-3B-Instruct and Qwen2.5-3B-Instruct, on a curated corpus of biological problem-solving articles significantly increases the scoring of biological solutions without degrading general capabilities. This suggests that even small amounts of fine-tuning can change how models weigh biological and bioinspired approaches relative to synthetic ones, with implications for developing models that favor bio-based approaches.

Key Points

  • Most frontier and open-weight LLMs tested exhibit systematic biases favoring synthetic over biological technological solutions, a disposition the authors argue increases the probability of unwanted behavior.
  • QLoRA fine-tuning can significantly increase the scoring of biological solutions without degrading general capabilities (see the sketch after this list).
  • The study fine-tunes two open-weight models on a curated corpus of ~22M tokens from 6,636 PMC articles emphasizing biological problem-solving.
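The paper does not reproduce its training configuration in this summary, so the following is a minimal QLoRA sketch under assumed hyperparameters (rank, alpha, dropout, and target modules are illustrative choices, not the authors' settings), using the standard transformers/peft/bitsandbytes stack:

```python
# Minimal QLoRA fine-tuning sketch; hyperparameters are assumptions,
# not the paper's exact configuration.
import torch
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig)
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "meta-llama/Llama-3.2-3B-Instruct"

# 4-bit NF4 quantization of the frozen base model is the "Q" in QLoRA.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

# Low-rank adapters on the attention projections; only these train.
lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# From here, train with a standard Trainer/SFTTrainer on the
# instruction-formatted corpus and save only the adapter weights.
```

The design point that matters for the paper's claim is that the 4-bit base model stays frozen while only the low-rank adapters train, which is why the released artifacts can be small adapter weights rather than full checkpoints.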

Merits

Strength of the Study's Methodology

The study uses a well-designed evaluation framework, including 50 curated Bioalignment prompts and a Kelly criterion-inspired metric, to measure the models' disposition towards biological systems.
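The exact form of the Kelly criterion-inspired metric is not given in this summary, so the following is a hypothetical reading: assume each prompt elicits a probability p that the model prefers the biological option, and that odds are even (b = 1), so the Kelly fraction f* = (bp - (1 - p))/b serves as a signed disposition score.

```python
# Hypothetical sketch of a Kelly-criterion-inspired bioalignment score.
# Assumptions (not from the paper): each prompt yields a probability p
# that the model prefers the biological option, and odds are even (b = 1).

def kelly_score(p: float, b: float = 1.0) -> float:
    """Kelly fraction f* = (b*p - (1 - p)) / b.

    Positive values mean the model 'bets on' the biological option;
    negative values indicate a synthetic-leaning disposition.
    """
    return (b * p - (1.0 - p)) / b

def bioalignment(preferences: list[float]) -> float:
    """Mean Kelly score over the curated prompts."""
    return sum(kelly_score(p) for p in preferences) / len(preferences)

# Example: a model that picks the biological option 40% of the time
# scores below zero, i.e. it is not bioaligned under this reading.
print(bioalignment([0.4] * 50))  # -0.2
```

Under this reading, a mean score below zero across the 50 prompts would correspond to the paper's verdict that a model is not bioaligned.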

Relevance to AI Safety

The study's findings have direct implications for AI safety: systematic dispositional biases can increase the probability of unwanted behavior, and the results highlight the need for methods to measure and steer models toward better-aligned dispositions.

Demerits

Limited Generalizability

The fine-tuning results may not generalize to larger models or different architectures, since only two small (3B-parameter) open-weight LLMs were fine-tuned.

Limited Corpus Size

The fine-tuning corpus is relatively small (~22M tokens from 6,636 PMC articles) and may not be representative of the broader biological problem-solving domain.

Expert Commentary

This study provides a significant contribution to AI safety by quantifying LLM biases towards synthetic vs. biological technological solutions. The findings suggest that even small amounts of fine-tuning can change how models weigh biological and bioinspired approaches relative to synthetic ones. However, the 3B-parameter scale of the fine-tuning experiments and the modest corpus size should be kept in mind when interpreting the results. Even so, the methodology and findings provide a valuable starting point for further research on value alignment and model explainability.

Recommendations

  • Further research should be conducted to examine the generalizability of the study's findings to larger models and different architectures.
  • Developers and researchers should prioritize the development of more transparent and explainable models, as well as the creation of methods to analyze and mitigate biases in LLMs.
