The Educator's PLN

The personal learning network for educators

Before using Big Data, you need to extract Web Data

I want to share an interesting article about data scraping that you might find useful in your business. The article below is mainly a reprint.

Big data is nothing new these days. Some of us use it almost every day, but how do you extract high-volume web data in a short time? This article discusses that question.

 

“Advances in data gathering, computing power and connectivity mean that we have more information than ever before at our fingertips. IBM estimates that by 2020 there will be 300 times more information in the world than there was in 2005.” – John Hsu, Guardian Journalist

 

A large volume of data lives on websites and in apps. Web data capture is therefore part of a big data architecture, because it supplies that architecture's basic data source.

 

When we want to build a text corpus, we need artificial intelligence to fetch the data we need.

When we analyze consumer behavior, we need to collect comments from social media platforms.

When we set marketing and pricing strategies, we need to track prices and collect the data.

When we want to win at betting, we need to extract enough historical gambling data to analyze.

 

To accomplish the tasks above, we need hundreds of thousands of data records. But most of the data on the Internet is unstructured, and extracting it sounds like a lot of trouble. In that case you need someone who is good at writing web crawlers, a developer for example, to build one that extracts the web data you need. You also need to test the code once it is written, before you put most of your time and energy into collecting data, perhaps for a whole day and over several cups of good coffee. Doesn't that sound tedious?
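To give a sense of what writing such a crawler involves, here is a minimal sketch of the extraction step only, using Python's standard library. The HTML snippet, the class names (`product`, `name`, `price`) and the helper `extract_products` are all hypothetical and chosen for illustration. A real crawler would also have to download pages, follow links, and cope with markup that changes over time.

```python
from html.parser import HTMLParser

# Hypothetical markup: a real page will use different tags and class names.
SAMPLE_HTML = """
<ul>
  <li class="product"><span class="name">Kettle</span><span class="price">$24.99</span></li>
  <li class="product"><span class="name">Toaster</span><span class="price">$31.50</span></li>
</ul>
"""


class ProductParser(HTMLParser):
    """Collect {'name': ..., 'price': ...} records from the sample markup."""

    def __init__(self):
        super().__init__()
        self.records = []
        self._field = None  # which field the next text node belongs to, if any

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "li" and css_class == "product":
            # Each product item starts a new, empty record.
            self.records.append({})
        elif tag == "span" and css_class in ("name", "price") and self.records:
            self._field = css_class

    def handle_data(self, data):
        if self._field is None:
            return  # text outside the fields we care about
        text = data.strip()
        if self._field == "price":
            # Assumes prices look like "$24.99" in this sample.
            self.records[-1]["price"] = float(text.lstrip("$"))
        else:
            self.records[-1]["name"] = text
        self._field = None

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None


def extract_products(html):
    """Turn the unstructured HTML text into a list of structured records."""
    parser = ProductParser()
    parser.feed(html)
    return parser.records


if __name__ == "__main__":
    for record in extract_products(SAMPLE_HTML):
        print(record)
```

Even this small example depends on the exact structure of the page, which is why hand-written crawlers need testing and maintenance before they can be trusted to collect data at volume.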

 

We can go online and look for help. Search Google for "web data extractor" and you will find many useful tools for different needs. Most of them require you to pay for the service or purchase a package. You may already be familiar with import.io, Mozenda or other tools, but now is the time to try Octoparse, a free yet powerful program. It charges only a small fee when you need many cloud servers to help you gather information, and it provides adequate support for users. I love this software because it extracts what I want from web pages, and I recommend it if you need to capture high-volume web data.

About

Thomas Whitby created this Ning Network.

© 2020 Created by Thomas Whitby.