Bag of Words - BOW algorithm
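The BOW algorithm converts texts into representative vectors: each text becomes a vector of word counts over a vocabulary shared by all texts. Word order and grammar are discarded and only word frequencies are kept, which gives a fixed-length numeric representation that is much easier to work with than raw text.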
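To make the idea concrete, here is a minimal from-scratch sketch in plain Python. It assumes the same preprocessing as scikit-learn's CountVectorizer defaults (lowercasing, tokens of two or more word characters, vocabulary sorted alphabetically); `tokenize` and `bag_of_words` are hypothetical helper names, not part of any library.

```python
import re
from collections import Counter


def tokenize(text):
    # Lowercase, then keep tokens of 2+ word characters
    # (approximates CountVectorizer's default token_pattern)
    return re.findall(r"(?u)\b\w\w+\b", text.lower())


def bag_of_words(texts):
    # Vocabulary: every distinct token, sorted alphabetically, mapped to a column index
    vocab = sorted({tok for text in texts for tok in tokenize(text)})
    index = {word: i for i, word in enumerate(vocab)}
    # One count vector per text, one column per vocabulary word
    vectors = []
    for text in texts:
        counts = Counter(tokenize(text))
        vectors.append([counts[word] for word in vocab])
    return index, vectors


if __name__ == "__main__":
    index, vectors = bag_of_words(["the cat sat on the mat", "the dog sat"])
    print(index)    # {'cat': 0, 'dog': 1, 'mat': 2, 'on': 3, 'sat': 4, 'the': 5}
    for v in vectors:
        print(v)    # [1, 0, 1, 1, 1, 2] then [0, 1, 0, 0, 1, 1]
```

Because only counts are kept, "dog bites man" and "man bites dog" end up with identical vectors; that loss of word order is the main trade-off of the bag-of-words representation.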
Example - BowTest.py
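from sklearn.feature_extraction.text import CountVectorizer
# Corpus: two example sentences
sentences = ['Bag of Words model is very useful NLP technique.', 'Bag of Words model is used to extract the features from text.']
# Learn the vocabulary, then count how often each word occurs in each sentence
vector_count = CountVectorizer()
text_feature = vector_count.fit_transform(sentences).toarray()
# Mapping word -> column index in the feature matrix
print(vector_count.vocabulary_)
# One row per sentence, one column per vocabulary word
print(text_feature)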
-------------------------------------------------------------------------
{'bag': 0, 'of': 7, 'words': 15, 'model': 5, 'is': 4, 'very': 14, 'useful': 13, 'nlp': 6, 'technique': 8, 'used': 12, 'to': 11, 'extract': 1, 'the': 10, 'features': 2, 'from': 3, 'text': 9}
[[1 0 0 0 1 1 1 1 1 0 0 0 0 1 1 1]
 [1 1 1 1 1 1 0 1 0 1 1 1 1 0 0 1]]