Let’s just look for bigrams that start with “she” or “he”. The second word will sometimes be an adverb or another modifier, but mostly it will be a verb, which is the main thing we are interested in.
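A minimal sketch of that bigram filter, assuming plain lowercase tokenization with a regular expression. The function name `pronoun_bigrams` and the sample sentence are hypothetical and are not taken from the original post:

```python
import re
from collections import Counter

def pronoun_bigrams(text, pronouns=("he", "she")):
    """Count (pronoun, next_word) pairs for bigrams whose first word is a pronoun."""
    # Lowercase and keep runs of letters and straight apostrophes as tokens.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    # Pair every token with its successor and keep pairs starting with a pronoun.
    for first, second in zip(words, words[1:]):
        if first in pronouns:
            counts[(first, second)] += 1
    return counts

# Hypothetical sample text, for illustration only.
sample = "She ran home. He quickly ran after her, and she laughed."
counts = pronoun_bigrams(sample)
for (pronoun, word), n in counts.most_common():
    print(pronoun, word, n)
```

As the excerpt warns, the second word is not always a verb: here “he quickly” pairs the pronoun with an adverb.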
New Gensim feature: Author-topic modeling. LDA with metadata. | RaRe Technologies
A new extension for #gensim that could be very useful for corpora such as #SPIP: once the topics have been modeled with #LDA, they can be associated not only with articles but also with tags (keywords, authors), which makes it possible to find out which authors are close to one another, which themes are similar, etc.
Text analysis of Trump’s tweets confirms he writes only the (angrier) Android half – Variance Explained
This weekend I saw a hypothesis about Donald Trump’s Twitter account that simply begged to be investigated with data:
Every non-hyperbolic tweet is from iPhone (his staff).
Every hyperbolic tweet is from Android (from him).
Applying Data Science to the Supreme Court: Topic Modeling Over Time with #NMF (and a #D3.js bonus) — Emily Barry
LDA was the obvious choice to do first, as is evident when you google “#topic_modeling algorithm.” (...)
Then I read about Non-negative Matrix Factorization (NMF) and found that in uses similar to mine, its robustness far surpassed LDA. NMF extracts latent features via matrix decomposition, and you can use TFIDF which is a huge plus.
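To make “latent features via matrix decomposition” concrete, here is a small pure-Python sketch, not the author’s code. It builds a TF-IDF matrix V and factors it as V ≈ WH using the classic multiplicative update rules. The toy documents, the function names, and the smoothed IDF formula are all assumptions made for illustration. In practice one would more likely reach for scikit-learn’s `TfidfVectorizer` and `NMF`.

```python
import math
import random

def tfidf_matrix(docs):
    """Build a documents x terms matrix of TF-IDF weights (smoothed IDF, an assumption)."""
    tokenized = [d.lower().split() for d in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    n = len(docs)
    df = {w: sum(1 for toks in tokenized if w in toks) for w in vocab}
    idf = {w: math.log((1 + n) / (1 + df[w])) + 1 for w in vocab}
    return vocab, [[toks.count(w) * idf[w] for w in vocab] for toks in tokenized]

def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def frobenius_error(V, W, H):
    """Frobenius norm of V - WH, i.e. the reconstruction error."""
    WH = matmul(W, H)
    return math.sqrt(sum((v - a) ** 2 for rv, ra in zip(V, WH) for v, a in zip(rv, ra)))

def nmf(V, k, iters=200, seed=0, eps=1e-9):
    """Factor non-negative V (n x m) into W (n x k) and H (k x m) by multiplicative updates."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H)
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(n)]
    return W, H

# Hypothetical toy corpus, for illustration only.
docs = ["court ruling appeal judge", "judge court appeal",
        "tax income revenue", "income tax law"]
vocab, V = tfidf_matrix(docs)
W, H = nmf(V, k=2)
for topic in H:
    top = sorted(zip(topic, vocab), reverse=True)[:3]
    print([word for _, word in top])
```

Each row of H weights the vocabulary for one topic, and each row of W says how strongly a document expresses each topic. Both stay non-negative because the updates only ever multiply by non-negative ratios.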
NIFTY is a system that finds mutations of a single piece of information across the daily news cycle. Each day, the system parses through 3.5 million news articles and 2 million mentioned quotes to find the top clusters of quotes through a process called incremental clustering.
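The excerpt does not say how NIFTY’s incremental clustering works, so the following is only a generic single-pass illustration of the idea and not NIFTY’s algorithm. Each incoming quote either joins the most similar existing cluster or starts a new one. The Jaccard word-overlap measure, the 0.5 threshold, and the sample quotes are all assumptions.

```python
def tokens(quote):
    """Represent a quote as its set of lowercase words."""
    return set(quote.lower().split())

def jaccard(a, b):
    """Share of words two sets have in common (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def incremental_cluster(quotes, threshold=0.5):
    """Assign quotes to clusters one at a time, never revisiting earlier decisions."""
    clusters = []  # each cluster: {"words": set of words seen, "members": list of quotes}
    for quote in quotes:
        words = tokens(quote)
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = jaccard(words, cluster["words"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            # Close enough to an existing cluster: treat as a variant of that quote.
            best["members"].append(quote)
            best["words"] |= words
        else:
            # Nothing similar yet: start a new cluster.
            clusters.append({"words": set(words), "members": [quote]})
    return clusters

# Hypothetical quotes, for illustration only.
quotes = [
    "we will build the wall",
    "we will build a wall",
    "the economy is strong",
    "we will build the great wall",
]
clusters = incremental_cluster(quotes)
for cluster in sorted(clusters, key=lambda c: len(c["members"]), reverse=True):
    print(len(cluster["members"]), cluster["members"][0])
```

Because each quote is processed once, new articles can be folded in as they arrive without re-clustering the whole day’s stream.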
#Blast.js separates text in order to facilitate typographic manipulation. It has four delimiters built in: character, word, sentence, and element. Alternatively, Blast can match custom regular expressions and phrases.
Blast’s uses include typographic animation, juxtaposition, styling, search, and analysis.
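Blast.js itself is a JavaScript library that operates on DOM elements. The Python sketch below only illustrates the idea of splitting text by a named delimiter or a custom regular expression and does not reproduce Blast’s API. The `blast` function, the `DELIMITERS` table, and its patterns are hypothetical, and the DOM-specific “element” delimiter is left out.

```python
import re

# Hypothetical patterns approximating three of the delimiters described above.
DELIMITERS = {
    "character": r"\S",              # every non-whitespace character
    "word": r"\S+",                  # runs of non-whitespace
    "sentence": r"[^.!?]+[.!?]*",    # text up to and including end punctuation
}

def blast(text, delimiter="word"):
    """Split text into pieces by a named delimiter or a custom regex pattern."""
    # Unknown names are treated as custom regular expressions.
    pattern = DELIMITERS.get(delimiter, delimiter)
    pieces = (m.group().strip() for m in re.finditer(pattern, text))
    return [p for p in pieces if p]

text = "Hi there. Bye!"
print(blast(text, "word"))
print(blast(text, "sentence"))
print(blast("a1b22", r"\d+"))
```

In the browser, each piece would be wrapped in its own element so that it can be styled or animated individually. Here the pieces are simply returned as a list.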