IBM Releases Comprehensive 6.48 TB Dataset for Granite 13B Model

In May, IBM open-sourced its Granite 13B LLM, aimed at enterprise use cases. Now Armand Ruiz, IBM's VP of Product for its AI platform, has revealed the full 6.48 TB dataset used to train Granite 13B. After rigorous pre-processing, the dataset was reduced to 2.07 TB, a 68% reduction. It was carefully curated from a variety of sources and passed through key filtering steps to ensure the final dataset was of the highest quality for model training. IBM has also released four variants of its Granite code models, ranging from 3 to 34 billion parameters, which outperform comparable models on a range of tasks.
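The stated 68% figure follows directly from the two dataset sizes reported above; a quick sketch of the arithmetic (only the 6.48 TB and 2.07 TB figures come from the article):

```python
# Figures from the article: raw vs. pre-processed dataset size, in TB
raw_tb = 6.48
processed_tb = 2.07

# Percentage reduction achieved by pre-processing
reduction_pct = (1 - processed_tb / raw_tb) * 100
print(f"{reduction_pct:.0f}% reduction")  # → 68% reduction
```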