SAP Blog
Sharknado Social Media Analysis with SAP HANA and Predictive Analysis

Mining social media data for customer feedback is perhaps one of the greatest untapped opportunities for customer analysis in many organizations today. Social media data is freely available and allows organizations to identify individual customers and interact with them directly to resolve any potential dissatisfaction. In today’s blog post, I’ll discuss using SAP Data Services, SAP HANA, and SAP Predictive Analysis to collect, process, visualize, and analyze social media data related to the recent social media phenomenon Sharknado.

Collecting Social Media Data with SAP Data Services 

While this blog post focuses primarily on analyzing social media data, that data can be collected from any source with an open API by using Python scripting within a User-Defined Transform in SAP Data Services. In this example, I collected Twitter data using the basic outline provided by SAP in the Data Services Text Data Processing Blueprints, available on the SAP Community Network, and updated it for version 1.1 of the Twitter REST API. This process consists of two dataflows. The first tracks the search terms, constructs a Twitter search query (Get_Search_Tasks transform), and executes that query (Search_Twitter transform) to store the data pictured below. In addition to the raw text of each tweet, some metadata is available, including the user name, time, and location information (if the user has made it publicly available).
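For readers adapting the collection step, the Python below is a minimal sketch of what a search transform has to do: build a version 1.1 search URL for a tracked term and flatten the JSON response into the user name, time, location, and text fields described above. It is not the Blueprint code. The function names, the optional since_id bookkeeping, and the selected fields are assumptions, and the sketch deliberately leaves out the OAuth-authenticated HTTP request and the Data Services User-Defined Transform record API.

```python
"""Hypothetical sketch: build a Twitter REST API v1.1 search URL for a tracked
term and flatten a search response into rows (user, time, location, text)."""
import json
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://api.twitter.com/1.1/search/tweets.json"


def build_search_url(term, since_id=None, count=100):
    """Construct the GET URL for one tracked search term (hypothetical helper).

    since_id is optional: passing the newest tweet ID already stored asks the
    API only for newer tweets, one way a task table could avoid re-fetching.
    """
    params = {"q": term, "count": count, "result_type": "recent"}
    if since_id is not None:
        params["since_id"] = since_id
    # urlencode percent-encodes the term, e.g. "#" becomes "%23".
    return SEARCH_ENDPOINT + "?" + urlencode(params)


def flatten_statuses(response_text):
    """Turn a search/tweets JSON body into flat dicts ready for a table load."""
    payload = json.loads(response_text)
    rows = []
    for status in payload.get("statuses", []):
        user = status.get("user", {})
        rows.append({
            "tweet_id": status.get("id_str"),
            "user_name": user.get("screen_name"),
            "created_at": status.get("created_at"),
            # Profile location is free text; store empty strings as None.
            "location": user.get("location") or None,
            "text": status.get("text"),
        })
    return rows
```

In a real User-Defined Transform, the URL would be fetched with an authenticated request and each dict mapped to an output record; both of those steps depend on the environment and are omitted here.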


Once the raw tweet data has been collected, I can parse it with either the Text Data Processing transform in SAP Data Services or the Voice of Customer text analysis process in SAP HANA. Both approaches produce the same result, but SAP Data Services can also perform preliminary summarization and transformations on the parsed data within the same dataflow. In this case, I will perform the text analysis in SAP HANA by executing the command below in SAP HANA Studio.

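CREATE FULLTEXT INDEX "VOC" ON "<SCHEMA>"."<TWEET_TABLE>" ("<TWEET_TEXT_COLUMN>")
CONFIGURATION 'EXTRACTION_CORE_VOICEOFCUSTOMER'
TEXT ANALYSIS ON;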



This results in a table called $TA_VOC in the same schema as the source table, as shown below.
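Once $TA_VOC exists, a natural next step is to roll its rows up into one sentiment figure per tweet. The sketch below assumes the rows have already been read out of SAP HANA into Python dicts and that the source table's key column is named ID (hypothetical). It uses an arbitrary -2 to +2 weighting of the sentiment TA_TYPE values that the Voice of Customer configuration typically emits; the weights are an illustration, not necessarily how the full analysis scores sentiment.

```python
"""Hypothetical sketch: tally Voice of Customer sentiment per tweet from rows
read out of a $TA_VOC table. The "ID" key stands in for whatever primary-key
column the source tweet table actually uses (an assumption)."""
from collections import Counter, defaultdict

# Sentiment TA_TYPE values from the EXTRACTION_CORE_VOICEOFCUSTOMER
# configuration, with illustrative (hypothetical) weights. Other TA_TYPE
# values, such as problems, requests, or topics, are ignored.
SENTIMENT_SCORES = {
    "StrongPositiveSentiment": 2,
    "WeakPositiveSentiment": 1,
    "NeutralSentiment": 0,
    "WeakNegativeSentiment": -1,
    "StrongNegativeSentiment": -2,
}


def sentiment_by_tweet(ta_rows, key="ID"):
    """Return {tweet key: net sentiment score}; tweets with no sentiment rows are omitted."""
    scores = defaultdict(int)
    for row in ta_rows:
        weight = SENTIMENT_SCORES.get(row["TA_TYPE"])
        if weight is not None:
            scores[row[key]] += weight
    return dict(scores)


def sentiment_counts(ta_rows):
    """Count how often each sentiment TA_TYPE appears across all tweets."""
    return Counter(r["TA_TYPE"] for r in ta_rows if r["TA_TYPE"] in SENTIMENT_SCORES)
```

The same roll-up could be written as a GROUP BY over $TA_VOC directly in SAP HANA; doing it in Python is shown here only to make the aggregation logic explicit.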



About Hillary Bliss
Hillary is the Analytics Practice Lead at Decision First Technologies, and specializes in data warehouse design, ETL development, statistical analysis, and predictive modeling. She works with clients and vendors to integrate business analysis and predictive modeling solutions into the organizational data warehouse and business intelligence environments based on their specific operational and strategic business needs. She has a master’s degree in statistics and an MBA from Georgia Tech.