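It has been a while since I wrote a practical post. I have actually kept writing Python scripts for SEO work all along, and today I want to share one idea: when spinning an article (producing "pseudo-original" content), you need to extract its high-frequency words. This comes up when doing SEO for foreign-trade sites.
Here is the code: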
import nltk
from nltk.probability import FreqDist

binfo = '''
shibang Mibile Crusher offers optimum set-up flexibility, from coarse to fine bashing, and is cost efficient. The application improves working safety, reduces the need for quarry highway maintenance, and gives coal minng significantly better access to material information. A further benefit is this : waste material can be separated on-site. shibang Mibile Crusher can be arranged to provide a two-stage crushing and evaluating system, as a three-stage abrasive, secondary and tertiary mashing and screening method, or as three independent units. Getting crushing and evaluating process on small wheels really boosts approach efficiency. shibang Mibile Crusher can be used for all mobile mashing applications, opening up clients opportunities. shibang Collection Cell Crusher is wholly adaptable to all cellular crushing needs.The item sets up a new variety of business opprtunities for trades-people,quarry operators, recycling along with mining applications.
'''
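# One-time setup: word_tokenize and pos_tag need NLTK data packages
# (fetched with nltk.download; package names depend on the NLTK version).
textinfo = nltk.word_tokenize(binfo)  # tokenize
tagged = nltk.pos_tag(textinfo)  # part-of-speech tags
# Count lower-cased tokens so the keys match the lower-cased nouns below
fdist1 = FreqDist(w.lower() for w in textinfo)
info = list(set(k.lower() for k, v in tagged if v == 'NN'))  # all singular common nouns
kinfo = [(k, fdist1[k]) for k in info]  # (noun, count) pairs
kinfo.sort(key=lambda k: k[1], reverse=True)  # most frequent first
print(",".join(m[0] for m in kinfo[:5]))  # top 5 nouns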

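When I ran it, the output was: shibang,crushing,material,mashing,process
These are the five most frequent nouns in the text; the exact list can vary with the NLTK version and tagger model.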
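The counting step can also be pulled out of the script so that it is reusable and can be tried without downloading any NLTK models. The following is only a minimal sketch: `top_nouns` and its `tags` parameter are hypothetical names, and it assumes the input is a list of `(token, tag)` pairs in the shape that `nltk.pos_tag` returns. Unlike the script above, it counts only the occurrences that were tagged as nouns, and by default it also includes plural nouns (`NNS`).

```python
from collections import Counter


def top_nouns(tagged, n=5, tags=("NN", "NNS")):
    """Return the n most frequent nouns from (token, tag) pairs.

    `tagged` is assumed to look like the output of nltk.pos_tag,
    e.g. [("crusher", "NN"), ("is", "VBZ")]. Hypothetical helper,
    not part of NLTK.
    """
    # Count only tokens whose tag is in `tags`, ignoring case.
    counts = Counter(word.lower() for word, tag in tagged if tag in tags)
    # Sort by count (descending), then alphabetically so ties are stable.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:n]]


# Hand-written sample pairs, standing in for real pos_tag output.
sample = [
    ("Crusher", "NN"), ("crusher", "NN"), ("offers", "VBZ"),
    ("material", "NN"), ("screens", "NNS"), ("material", "NN"),
    ("crusher", "NN"),
]
print(",".join(top_nouns(sample, n=2)))
```

With real text, `top_nouns(nltk.pos_tag(nltk.word_tokenize(binfo)))` would be the equivalent call; breaking ties alphabetically keeps the result the same from run to run.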
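In fact, when a search engine works out what an article is mainly about, it may well rely on high-frequency words too, much like the concept of tags.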
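To make the tag analogy a bit more concrete, here is a rough sketch that turns raw counts into each word's share of the text, using only the standard library. `keyword_share`, the regex tokenizer and the tiny stop-word list are all illustrative assumptions; how a real search engine weighs term frequency is not public and is not modeled here.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, far from complete).
STOP_WORDS = {"a", "an", "and", "the", "is", "to", "of", "for", "can", "be"}


def keyword_share(text, n=3):
    """Return the n most frequent non-stop words with their share of all such words."""
    # Lower-case the text and keep alphabetic runs that are not stop words.
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]
    total = len(words)
    if total == 0:
        return []
    # Most frequent first; ties are broken alphabetically.
    ranked = sorted(Counter(words).items(), key=lambda kv: (-kv[1], kv[0]))
    return [(word, count / total) for word, count in ranked[:n]]


text = "The crusher is mobile. The mobile crusher can be used for crushing."
for word, share in keyword_share(text):
    print(f"{word}: {share:.0%}")
```

Words with the largest share are natural candidates for tags, and they are also the ones to check after rewriting an article to see whether its main topic has drifted.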
