pathfinder: A Semantic Framework for Literature Review and Knowledge Discovery in Astronomy / Iyer K.G.; Yunus M.; O'Neill C.; Ye C.; Hyk A.; McCormick K.; Ciuca I.; Wu J.F.; Accomazzi A.; Astarita S.; Chakrabarty R.; Cranney J.; Field A.; Ghosal T.; Ginolfi M.; Huertas-Company M.; Jablonska M.; Kruk S.; Liu H.; Marchidan G.; Mistry R.; Naiman J.P.; Peek J.E.G.; Polimera M.; Rodriguez Mendez S.J.; Schawinski K.; Sharma S.; Smith M.J.; Ting Y.-S.; Walmsley M. - In: ASTROPHYSICAL JOURNAL SUPPLEMENT SERIES. - ISSN 0067-0049. - ELECTRONIC. - 275:(2024), pp. 38-38. [10.3847/1538-4365/ad7c43]
pathfinder: A Semantic Framework for Literature Review and Knowledge Discovery in Astronomy
Ginolfi M.
2024
Abstract
The exponential growth of astronomical literature poses significant challenges for researchers navigating and synthesizing general insights or even domain-specific knowledge. We present pathfinder, a machine learning framework designed to enable literature review and knowledge discovery in astronomy, focusing on semantic searching with natural language instead of syntactic searches with keywords. Utilizing state-of-the-art large language models (LLMs) and a corpus of 385,166 peer-reviewed papers from the Astrophysics Data System, pathfinder offers an innovative approach to scientific inquiry and literature exploration. Our framework couples advanced retrieval techniques with LLM-based synthesis to search astronomical literature by semantic context as a complement to currently existing methods that use keywords or citation graphs. It addresses complexities of jargon, named entities, and temporal aspects through time-based and citation-based weighting schemes. We demonstrate the tool’s versatility through case studies, showcasing its application in various research scenarios. The system’s performance is evaluated using custom benchmarks, including single-paper and multipaper tasks. Beyond literature review, pathfinder offers unique capabilities for reformatting answers in ways that are accessible to various audiences (e.g., in a different language or as simplified text), visualizing research landscapes, and tracking the impact of observatories and methodologies. This tool represents a significant advancement in applying artificial intelligence to astronomical research, aiding researchers at all career stages in navigating modern astronomy literature.
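The abstract mentions ranking by semantic context with time-based and citation-based weighting but does not state the formulas. The following is only a minimal sketch of how such a re-ranking could look, not the authors' implementation: the `Paper` record, the `rank` function, the half-life recency decay, and the logarithmic citation boost are all hypothetical choices made for illustration, and the embeddings are assumed to come from some external text-embedding model.

```python
import math
from dataclasses import dataclass


@dataclass
class Paper:
    # Hypothetical record; the embedding is assumed to be precomputed
    # by some text-embedding model (not specified in the abstract).
    title: str
    embedding: list
    year: int
    citations: int


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(query_vec, papers, now=2024, time_weight=0.0, cite_weight=0.0,
         half_life=10.0):
    """Order paper titles by similarity, optionally boosted by recency
    and citation count. The weighting forms are illustrative assumptions."""
    scored = []
    for p in papers:
        sim = cosine(query_vec, p.embedding)
        # Recency factor in (0, 1]: halves every `half_life` years.
        recency = 0.5 ** (max(now - p.year, 0) / half_life)
        # Citation factor grows slowly so highly cited papers do not dominate.
        impact = math.log1p(p.citations)
        score = sim * (1 + time_weight * recency) * (1 + cite_weight * impact)
        scored.append((score, p.title))
    scored.sort(key=lambda s: s[0], reverse=True)
    return [title for _, title in scored]
```

With both weights at zero the ordering reduces to plain cosine similarity; raising `time_weight` favors recent papers and raising `cite_weight` favors highly cited ones, while a paper with zero similarity to the query stays at the bottom regardless of either boost.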



