Hello my friends, today I am presenting a very short article on how natural language processing and web scraping can help us stay tuned to market news and trends.
It’s not news to any of you that I have been scraping news websites for a long time; since 2012/2013 my algos have been collecting news from different sources on the internet. This week I decided to show you how I use this in my favor, along with some of the code used to do it.
The first thing I did was prepare my database for FULLTEXT search, with the following command:
ALTER TABLE noticias ADD FULLTEXT(titulo, resumo);
The table name is “noticias” (“noticias” is “news” in Portuguese). This command creates a special kind of index on the title and summary columns that makes text search queries much faster.
Then I defined two extra functions to search the database. I will not display the MySQL getQuery function; it’s just a wrapper around MySQL, so nothing interesting to see.
This first function searches for a “term” within the last “lookback” “timeframe” intervals, where the timeframe can be [hour, day, month, year], and returns a list of links ordered by date.
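Just so the snippets below make sense, here is roughly what such a wrapper could look like. This is only a sketch and not the code I actually run; it assumes the PyMySQL driver and placeholder credentials.

# Hypothetical sketch of the getQuery wrapper (the real one is not shown in this article).
import pymysql

def getQuery(query, database):
    # Open a connection, run the query and return all rows as a list of tuples
    conn = pymysql.connect(host="localhost", user="user",
                           password="password", db=database)
    try:
        with conn.cursor() as cur:
            cur.execute(query)
            return cur.fetchall()
    finally:
        conn.close()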
Check this:
def pesquisarnoticia(term, lookback=1, field="link", timeframe="day"):
    # Full-text search on the news table, restricted to the lookback window
    query = "select {} from noticias where data > now() - interval {} {} and match(titulo, resumo) against('{}' IN BOOLEAN MODE) order by data"
    busca = query.format(field, lookback, timeframe, term)
    return [str(x[0]) for x in getQuery(busca, 'forex')]
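For example, to grab the links of last week’s news mentioning the euro (a made-up query just to show the call):

# Hypothetical usage: links of news from the last 7 days mentioning "euro"
links = pesquisarnoticia('euro', lookback=7, field="link", timeframe="day")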
This other function first uses the paresforex() function to get all the symbols I track in my database and then searches for news that contains the symbol in the title or in the news body.
from datetime import datetime

def morningnews(field="link"):
    # Search the news published in the last few hours (since midnight) for every symbol
    termos = paresforex()
    noticias = {}
    for termo in termos:
        noticia = pesquisarnoticia(termo, datetime.now().hour, field, "hour")
        if len(noticia) > 0:
            noticias[termo] = noticia
    return noticias
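I am not showing paresforex() here either; it just returns the list of symbols I track. A minimal, hypothetical sketch of it, assuming the symbols live in a table called “pares” with a column “simbolo” (names made up for illustration):

# Hypothetical sketch of paresforex(); assumes a table "pares" with a column "simbolo"
def paresforex():
    return [str(x[0]) for x in getQuery("select simbolo from pares", 'forex')]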
The function returns a dictionary with the symbols as keys and the matching news as values. Check the results I got running it just now!
a = morningnews()
a
Out[8]:
{u'$aapl': ['https://www.investing.com/news/stock-market-news/wework-chief-neumanns-top-lieutenants-step-up-as-successors-1985526',
'https://seekingalpha.com/article/4293410-indias-trump-card-tax-cuts',
'https://www.investing.com/news/stock-market-news/stocks--us-futures-slip-on-trump-impeachment-woes-1985667',
'https://seekingalpha.com/article/4293409-wall-street-breakfast-next-stocks',
'https://seekingalpha.com/article/4293434-korea-h-w-implications-better-expected-fold-sales',
'https://seekingalpha.com/article/4293443-smaller-large-caps-play',
'https://seekingalpha.com/article/4293454-first-trump-un-pelosi-impeachment-kiboshed-rally-excuses-piece-yesterday-quite-subpar',
'https://seekingalpha.com/article/4293462-apple-buybacks-ending'],
u'$qqq': ['https://seekingalpha.com/article/4293396-u-s-equities-take-care-leverage',
'https://seekingalpha.com/article/4293411-market-update-stock-highs-repo-rates',
'https://seekingalpha.com/article/4293409-wall-street-breakfast-next-stocks',
'https://talkmarkets.com/content/us-markets/the-us-dollar-is-declining-amid-political-news?post=235085',
'https://seekingalpha.com/article/4293415-risk-appetite-stymied-dollar-recovers-stocks-slide',
'https://seekingalpha.com/article/4293417-continue-give-bulls-benefit-doubt',
'https://seekingalpha.com/article/4293419-u-s-q3-growth-estimate-ticks-modest-2_2-percent-pace',
'https://seekingalpha.com/article/4293437-consumer-confidence-falls-september',
'https://seekingalpha.com/article/4293446-topical-questions-amid-political-economy',
'https://seekingalpha.com/article/4293448-technically-speaking-safely-navigate-late-stage-bull-market',
'https://seekingalpha.com/article/4293460-data-vs-perception'],
u'$spy': ['https://seekingalpha.com/article/4293396-u-s-equities-take-care-leverage',
'https://seekingalpha.com/article/4293411-market-update-stock-highs-repo-rates',
'https://seekingalpha.com/article/4293409-wall-street-breakfast-next-stocks',
'https://talkmarkets.com/content/us-markets/the-us-dollar-is-declining-amid-political-news?post=235085',
'https://seekingalpha.com/article/4293415-risk-appetite-stymied-dollar-recovers-stocks-slide',
'https://seekingalpha.com/article/4293417-continue-give-bulls-benefit-doubt',
'https://seekingalpha.com/article/4293419-u-s-q3-growth-estimate-ticks-modest-2_2-percent-pace',
'https://seekingalpha.com/article/4293437-consumer-confidence-falls-september',
'https://seekingalpha.com/article/4293441-trump-safe-now-market-strategy-emerges',
'https://seekingalpha.com/article/4293444-momentum-value-switching-strategy',
'https://seekingalpha.com/article/4293446-topical-questions-amid-political-economy',
'https://seekingalpha.com/article/4293448-technically-speaking-safely-navigate-late-stage-bull-market',
'https://www.bloomberg.com/news/articles/2019-09-25/credit-suisse-spy-probe-is-set-to-decide-fate-of-top-executives?srnd=markets-vp',
'https://seekingalpha.com/article/4293460-data-vs-perception'],
u'audnzd': ['https://www.home.saxo/insights/content-hub/articles/2019/09/25/fx-update-triple-whammy-dents-global-sentiment'],
u'audusd': ['https://talkmarkets.com/content/us-markets/audusd-daily-analysis--wednesday-sept-25?post=235055',
'https://www.home.saxo/insights/content-hub/articles/2019/09/25/fx-update-triple-whammy-dents-global-sentiment'],
u'dxy': ['https://www.marketwatch.com/story/heres-how-a-move-towards-impeaching-trump-might-hurt-investors-says-allianz-2019-09-25?mod=currencies'],
u'eurchf': ['https://www.home.saxo/insights/content-hub/articles/2019/09/25/fx-update-triple-whammy-dents-global-sentiment'],
u'eurusd': ['https://www.home.saxo/insights/content-hub/articles/2019/09/25/fx-update-triple-whammy-dents-global-sentiment',
'https://talkmarkets.com/content/us-markets/equities-lower-on-trump-impeachment-news?post=235093'],
u'gbpusd': ['https://talkmarkets.com/content/us-markets/risk-off-sentiment-back-on-trump-impeachment-inquiry?post=235051',
'https://talkmarkets.com/content/us-markets/equities-lower-on-trump-impeachment-news?post=235093'],
u'usdcad': ['https://talkmarkets.com/content/us-markets/usdcad-daily-analysis--wednesday-sept-25?post=235068']}
Remember our trade of the week on Orange Juice? This is how I found the articles related to it:
noticia = pesquisarnoticia('"ORANGE JUICE"', field="link", lookback=30)
noticia
Out[19]:
['https://seekingalpha.com/article/4288855-wall-street-breakfast-florida-prepares-dorian-disaster',
'https://seekingalpha.com/article/4288907-hurricane-dorian-amazon-burning-potential-commodity-impacts',
'https://seekingalpha.com/article/4289052-wall-street-breakfast-moved-markets-week',
'https://www.independent.co.uk/news/business/news/us-china-trade-war-trump-new-tariffs-xi-jinping-expensive-a9087391.html',
'https://www.economist.com/finance-and-economics/2019/07/06/a-new-trade-deal-has-fomo-as-its-secret-sauce',
'https://seekingalpha.com/article/4291220-world-burning-hurricanes-developing-coffee-dryness-brazil',
'https://www.independent.co.uk/news/world/americas/us-politics/trump-trade-war-iowa-state-fair-china-pork-soybeans-a9058916.html',
'https://talkmarkets.com/content/us-markets/toppy-tuesday-as-usual--why-did-china-go-home?post=234960']
This function is a little bit more interesting: it basically counts which words appear most often alongside the term I searched for. Let’s try it with “dollar”:
import itertools
import nltk
from nltk import word_tokenize
from nltk.corpus import stopwords

# Punkt sentence tokenizer used to split the news text into sentences
stok = nltk.data.load('tokenizers/punkt/english.pickle')

def wordsfreq(termo, lookback=1, termo2=None, noticia=None):
    if noticia is None:
        # Fetch the news text (the "resumo" column) instead of the default links
        noticia = pesquisarnoticia(termo, lookback, field="resumo")
    noticiaunica = ' '.join(n.lower() for n in noticia)
    sentences = stok.sentences_from_text(noticiaunica)
    if termo2 is not None:
        # Keep only the sentences that also mention the second term
        sentences = [sen for sen in sentences if termo2 in sen]
    d = []
    for sent in sentences:
        for w in word_tokenize(sent):
            if w not in stopwords.words('english') and len(w) > 2:
                d.append(w)
    bot = nltk.FreqDist(d)
    both_most_common = bot.most_common()
    # Sort ties (words with the same count) alphabetically, keeping the count order
    d = list(itertools.chain(*(sorted(ys) for k, ys in itertools.groupby(both_most_common, key=lambda t: t[1]))))
    return d
Let’s run it and see the 20 words that appear most often in the news that contains the word “dollar”:
words = wordsfreq("dollar")
words[:20]
Out[5]:
[(u'trade', 201),
(u'trump', 190),
(u'said', 189),
(u'market', 173),
(u'dollar', 152),
(u'president', 144),
(u'u.s.', 141),
(u'gold', 122),
(u'bank', 112),
(u'china', 106),
(u'impeachment', 106),
(u'would', 100),
(u'fell', 97),
(u'2019', 93),
(u'company', 92),
(u'stocks', 91),
(u'tuesday', 87),
(u'markets', 84),
(u'per', 84),
(u'also', 83)]
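The termo2 parameter can narrow the count even further, keeping only the sentences that also mention a second term. A hypothetical example, counting what shows up next to “dollar” in sentences that also contain “china”:

# Hypothetical: word counts for "dollar" news, restricted to sentences mentioning "china"
words = wordsfreq("dollar", termo2="china")
words[:20]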
That’s it, my friends. This week I am launching a feature inside the chatroom where users will be able to run this kind of news search with a command like
/newssearch "words I want"
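Under the hood that command is just a thin layer on top of pesquisarnoticia(). A rough sketch of how such a handler could work; the actual chatroom integration is not shown here and the names below are made up:

# Hypothetical chat command handler; the real chatroom plumbing is not shown.
def handle_message(message):
    if message.startswith('/newssearch '):
        term = message[len('/newssearch '):].strip()
        # Quoted phrase search in BOOLEAN MODE over the last 7 days
        links = pesquisarnoticia('"{}"'.format(term), lookback=7, field="link")
        return '\n'.join(links) if links else 'No news found for {}'.format(term)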
When I say coding is powerful, it is no joke at all; it can leverage your trading a lot. If you need assistance coding a trading idea you have, just leave a reply and I will contact you.
I hope you all liked it, and if you have suggestions for articles please leave a comment.
My best regards
Leo Hermoso
Mate, I just wanted to say thank you. I’m getting passionate about Python and coding thanks to you; being able to analyze this type of data can be very helpful in trading. Please keep posting content, I’m loving it, and I will soon join your chat too.
Hey bro! Nice to hear that! Check this link, you will like it:
www.nltk.org/howto/