cstr committed on
Commit a689cd6 · verified · 1 Parent(s): 94cd031

Update README.md

Files changed (1)
  1. README.md +78 -6
README.md CHANGED
@@ -1,13 +1,85 @@
1
  ---
2
- title: Conceptnet Normalized
3
- emoji: 👁
4
- colorFrom: indigo
5
- colorTo: pink
6
  sdk: gradio
7
  sdk_version: 5.49.1
8
  app_file: app.py
9
- pinned: false
10
  license: cc-by-sa-4.0
11
  ---
12
 
13
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
1
  ---
2
+ title: Normalized ConceptNet Explorer
3
+ emoji:
4
+ colorFrom: green
5
+ colorTo: blue
6
  sdk: gradio
7
  sdk_version: 5.49.1
8
  app_file: app.py
9
+ pinned: true
10
  license: cc-by-sa-4.0
11
+ tags:
12
+ - conceptnet
13
+ - knowledge-graph
14
+ - sqlite
15
+ - normalized
16
+ - gradio
17
+ - fast-queries
18
  ---
19
 
20
+ # Normalized ConceptNet Explorer (V7)
21
+
22
+ This application is a high-performance explorer for a normalized, filtered, and optimized version of the ConceptNet 5.5 knowledge graph.
23
+
24
+ It is designed to be **extremely fast**, returning query results in milliseconds instead of minutes. It queries a 1.78 GB optimized SQLite database with integer-based joins, not the 23.6 GB unnormalized file.
25
+
26
+ ## Features
27
+
28
+ This app provides a full suite of tools to explore the normalized database:
29
+
30
+ - **⚡ Semantic Profile**: Explore relations for any word in real time. This now runs in ~4 fast SQL queries instead of 24+ slow ones.
31
+ - **⚡ Query Builder**: Build custom queries (start node, relation, end node) that are executed with fast, integer-based joins.
32
+ - **⚡ Raw SQL**: Execute SQL queries directly against the new, normalized database schema (see schema below).
33
+ - **⚡ Schema**: Browse the new, efficient database schema, including all tables, indexes, and row counts.
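
As a rough illustration of how a start/relation/end Query Builder can be turned into one parameterized SQL statement, here is a minimal sketch. The `edge_norm` and `node` table names come from this README; the `relation` table, every column name (`start_id`, `rel_id`, `end_id`, `id`, `label`), and the `build_edge_query` helper are assumptions made for the example, not the app's actual schema or code.

```python
def build_edge_query(start=None, relation=None, end=None, limit=50):
    """Return (sql, params) for an optional start/relation/end lookup.

    All table and column names are hypothetical placeholders.
    """
    sql = (
        "SELECT s.label, r.label, e.label "
        "FROM edge_norm AS en "
        "JOIN node AS s ON s.id = en.start_id "      # integer join
        "JOIN relation AS r ON r.id = en.rel_id "    # integer join
        "JOIN node AS e ON e.id = en.end_id"         # integer join
    )
    conditions, params = [], []
    if start is not None:
        conditions.append("s.label = ?")
        params.append(start)
    if relation is not None:
        conditions.append("r.label = ?")
        params.append(relation)
    if end is not None:
        conditions.append("e.label = ?")
        params.append(end)
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    # Bind the limit as a parameter too, so no user input is spliced into SQL.
    sql += " LIMIT ?"
    params.append(limit)
    return sql, params


sql, params = build_edge_query(start="dog", relation="IsA")
```

Only the filters the user actually fills in become `WHERE` conditions, and every value is passed as a bound parameter rather than formatted into the string.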
34
+
35
+ ## How It Works: The Normalized Database
36
+
37
+ This app's speed and correctness come from the new database it queries: [cstr/conceptnet-normalized-multi](https://huggingface.co/datasets/cstr/conceptnet-normalized-multi).
38
+
39
+ This database was created by a V7 normalization script that fixed critical issues found in the original data:
40
+
41
+ 1. **Normalization (Speed & Size)**: The original 23.6 GB `edge` table (34M rows) was bloated with text URLs. The new 1.78 GB `edge_norm` table replaces these with tiny integer foreign keys.
42
+
43
+ 2. **Data Correctness (V7 Fix)**: The original `node` table (28M rows) was used as the source of truth. We migrated all 28M nodes and their authoritative `language` values.
44
+
45
+ 3. **Preserves Cross-Language Links**: The 34M edges were filtered to keep any edge where at least one node (start or end) was in our 11 target languages (`en`, `de`, `fr`, `it`, `es`, `ar`, `fa`, `grc`, `he`, `la`, `hbo`). This is critical, as it correctly preserves cross-language connections (e.g., `犬 (ja) -> hund (de)`), which were broken in previous attempts.
46
+
47
+ The result is a clean, fast, and data-correct database that contains all relevant connections for our target languages.
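
The normalization idea described above can be sketched on a toy in-memory SQLite database. This is not the V7 script itself: only the `node`, `edge`, and `edge_norm` table names come from this README, while `node_norm`, `relation_norm`, all column names, and the three sample rows are assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Toy stand-ins for the original tables (column names are assumptions)
    CREATE TABLE node (uri TEXT PRIMARY KEY, language TEXT);
    CREATE TABLE edge (start_uri TEXT, rel_uri TEXT, end_uri TEXT);
    INSERT INTO node VALUES
        ('/c/en/dog', 'en'), ('/c/de/hund', 'de'), ('/c/en/animal', 'en');
    INSERT INTO edge VALUES
        ('/c/en/dog', '/r/IsA', '/c/en/animal'),
        ('/c/en/dog', '/r/Synonym', '/c/de/hund');

    -- Each text URI is stored once and gets a small integer id
    CREATE TABLE node_norm (
        id INTEGER PRIMARY KEY, uri TEXT UNIQUE NOT NULL, language TEXT);
    CREATE TABLE relation_norm (
        id INTEGER PRIMARY KEY, uri TEXT UNIQUE NOT NULL);
    CREATE TABLE edge_norm (
        start_id INTEGER NOT NULL,
        rel_id INTEGER NOT NULL,
        end_id INTEGER NOT NULL);

    -- The node table is the source of truth for nodes and languages
    INSERT INTO node_norm (uri, language) SELECT uri, language FROM node;
    INSERT INTO relation_norm (uri) SELECT DISTINCT rel_uri FROM edge;

    -- Replace the text URIs on every edge with integer keys
    INSERT INTO edge_norm (start_id, rel_id, end_id)
    SELECT s.id, r.id, e.id
    FROM edge
    JOIN node_norm AS s ON s.uri = edge.start_uri
    JOIN relation_norm AS r ON r.uri = edge.rel_uri
    JOIN node_norm AS e ON e.uri = edge.end_uri;

    CREATE INDEX idx_edge_norm_start ON edge_norm (start_id);
    CREATE INDEX idx_edge_norm_end ON edge_norm (end_id);
""")

# Reading edges back joins on integers only
rows = conn.execute("""
    SELECT s.uri, r.uri, e.uri
    FROM edge_norm AS en
    JOIN node_norm AS s ON s.id = en.start_id
    JOIN relation_norm AS r ON r.id = en.rel_id
    JOIN node_norm AS e ON e.id = en.end_id
    ORDER BY r.uri
""").fetchall()
```

Because every URI string is stored once in a lookup table and each edge row holds three integers, the edge table shrinks and joins compare integers instead of long text values.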
48
+
49
+ ## Supported Languages
50
+
51
+ This normalized version includes edges for 11 languages:
52
+ - English (en)
53
+ - German (de)
54
+ - French (fr)
55
+ - Italian (it)
56
+ - Spanish (es)
57
+ - Arabic (ar)
58
+ - Persian (fa)
59
+ - Ancient Greek (grc)
60
+ - Hebrew (he)
61
+ - Latin (la)
62
+ - Biblical Hebrew (hbo)
63
+
64
+ Cross-language connections from other languages to these target languages are preserved.
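
The "at least one endpoint in a target language" rule can be written as a small predicate. This is a sketch of the rule as stated in this README, not the actual filtering code; the `keep_edge` helper, the tuple layout, and the sample edges other than the `犬 (ja) -> hund (de)` pair are made up for illustration.

```python
TARGET_LANGUAGES = frozenset(
    {"en", "de", "fr", "it", "es", "ar", "fa", "grc", "he", "la", "hbo"}
)


def keep_edge(start_lang, end_lang, targets=TARGET_LANGUAGES):
    """Keep an edge if its start OR its end node is in a target language."""
    return start_lang in targets or end_lang in targets


# (start label, start language, end label, end language), illustrative rows
edges = [
    ("犬", "ja", "hund", "de"),     # cross-language: kept because of "de"
    ("dog", "en", "animal", "en"),  # both endpoints are target languages
    ("犬", "ja", "狗", "zh"),        # neither endpoint is a target language
]
kept = [edge for edge in edges if keep_edge(edge[1], edge[3])]
```

A stricter rule that required both endpoints to be in the target set would drop the `ja -> de` edge, which is the kind of cross-language link the filter is meant to preserve.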
65
+
66
+ ## Original Dataset Information
67
+
68
+ This work includes data from ConceptNet 5, which was compiled by the Commonsense Computing Initiative. ConceptNet 5 is freely available under the Creative Commons Attribution-ShareAlike license (CC BY SA 4.0) from http://conceptnet.io.
69
+
70
+ For a full list of licenses and attributions for included resources such as WordNet, Open Multilingual WordNet, and Wikimedia projects, please see the original dataset card.
71
+
72
+ ## Citation Information
73
+
74
+ If you use this data in your work, please cite the original ConceptNet 5.5 paper:
75
+
76
+ ```bibtex
77
+ @inproceedings{speer2017conceptnet,
78
+ author = {Robyn Speer and Joshua Chin and Catherine Havasi},
79
+ title = {ConceptNet 5.5: An Open Multilingual Graph of General Knowledge},
80
+ booktitle = {Proceedings of the AAAI Conference on Artificial Intelligence},
81
+ year = {2017},
82
+ pages = {4444--4451},
83
+ url = {http://aaai.org/ocs/index.php/AAAI/AAAI17/paper/view/14972}
84
+ }
85
+ ```