The Stack v2 Training Data
This organization contains the full datasets used to train StarCoder2:
- the-stack-v2-train-full: the training data for StarCoder2-15B, covering 600+ programming languages, with files concatenated per repository
- the-stack-v2-train-full-files: same as the-stack-v2-train-full but without repository concatenation, which makes filtering by file or license easier
- the-stack-v2-train-smol: the training data for StarCoder2-3B and StarCoder2-7B, covering 17 programming languages, with files concatenated per repository
- the-stack-v2-train-smol-files: same as the-stack-v2-train-smol but without repository concatenation, which makes filtering by file or license easier
See the tech report for all the details on the dataset.
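
As a rough illustration of why the per-file variants are easier to filter, the sketch below keeps only the records that match a set of languages and license categories. It runs on a small in-memory sample rather than the real data. The field names `path`, `language`, and `license_type`, the sample values, and the helper `filter_files` are assumptions made for this example; check the dataset viewer for the actual schema before relying on them.

```python
from typing import Dict, Iterable, Iterator, Set


def filter_files(
    records: Iterable[Dict[str, str]],
    languages: Set[str],
    license_types: Set[str],
) -> Iterator[Dict[str, str]]:
    """Yield records whose language and license category are both allowed.

    The keys "language" and "license_type" are assumed for illustration;
    records that lack either key are skipped.
    """
    for record in records:
        if (
            record.get("language") in languages
            and record.get("license_type") in license_types
        ):
            yield record


# Hypothetical per-file records standing in for rows of a "-files" dataset.
sample_records = [
    {"path": "a.py", "language": "Python", "license_type": "permissive"},
    {"path": "b.rs", "language": "Rust", "license_type": "no_license"},
    {"path": "c.py", "language": "Python", "license_type": "no_license"},
]

kept = list(filter_files(sample_records, {"Python"}, {"permissive"}))
print([record["path"] for record in kept])  # prints ['a.py']
```

Because the function accepts any iterable of dictionaries, the same generator could in principle be applied to rows streamed from one of the per-file datasets. With the repository-concatenated variants, the files would first have to be split apart again before a per-file filter like this could be applied.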
