6 Ways Data Science and IT Developers Can Work in Tandem

Data science empowers software engineers and IT developers to extract meaningful insights from the processes and information they encounter during application development. But it also involves communicating the results of data analysis to other stakeholders in a software project – many of whom won’t have the technical understanding or expertise of an IT administrator or professional programmer.

So, at all stages of the software lifecycle, it’s necessary for data scientists and developers to be able to work together, understand each other’s points of view, and communicate effectively.

Data Science in Software Development

When developers are incubating their initial ideas for a new piece of software or an IT system, data science can be there to explore the ramifications and likely outcomes – the effect of incorporating a particular feature, how one piece of functionality plays off against another, or even what data will be produced and how it can be leveraged.

During programming and testing, the work of data scientists helps in collating results and making sense of the figures of merit. Coupled with appropriate visualization techniques, data science can mold these results and insights from streams of numbers into stories that can be leveraged by financiers, marketers, sales personnel, and other stakeholders in the software ecosystem who hail from non-technical backgrounds. The user base for a particular system or product will likely also include people from a range of cultural and educational backgrounds.

For stakeholders, data scientists can provide tangible evidence of how revenue and business value are being generated, and insight into how and where actions are necessary to sustain good levels of performance or to make improvements.

Clearly, there’s a role for data science throughout the software development lifecycle – so it makes operational and economic sense to have data scientists and developers working together amicably at all stages of the process.

Finding A Common Language or Environment

Harmony and collaboration between software engineers and data scientists may be desirable, but it’s not necessarily that easy to achieve. In part, this derives from actual differences in the way the two disciplines operate, and perceptions that the two groups may have about the way their counterparts think and work. This graphic from CodeMentor sums it up neatly:

(Image source: codementor.io)

You’ll notice that big data frameworks are a tool common to both disciplines – and in nearly all industries, data-driven intelligence is now an essential part of day-to-day operations, whether it’s used for supply chain management or personalized marketing. It therefore makes sense to establish and use a shared set of tools and languages for developers and data scientists.

Setting Up A Data Lake

The construction of a data lake is one way of making production data from the development process readily available to data scientists and software engineers alike. This lake is a common pool of information, set up in an environment separate from the production platform. Because it will be a repository for information generated throughout the lifecycle, the data lake must have the potential to store vast quantities of records – so a dedicated data center or cloud environment is best.

The data scientists will decide on the best way for information to be stored and optimized for the queries they expect to run in the near term, and as future ideas develop. Since much of the data will come from the main application that the developers are working on, the teams will need to collaborate on finding the best ways for data to flow into the lake in its raw form.

This design process should take into account factors like the data, schema, level of data compression (if any), and whether information flows will stream in real-time or enter via scheduled dumps. Each team’s level of responsibility for monitoring data flows should also be established.
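To make the design discussion concrete, here is a minimal sketch of the two ingestion paths described above – real-time streaming and scheduled dumps – written in Python against a hypothetical S3-backed lake. The bucket name, key layout, and record shapes are illustrative assumptions, not prescriptions.

```python
import gzip
import json
from datetime import datetime, timezone

import boto3  # AWS SDK for Python

s3 = boto3.client("s3")
BUCKET = "example-data-lake"  # hypothetical bucket


def stream_event(event: dict) -> None:
    """Real-time path: land each raw event as its own small object."""
    ts = datetime.now(timezone.utc)
    key = f"raw/events/{ts:%Y/%m/%d}/{ts.timestamp()}.json"
    s3.put_object(Bucket=BUCKET, Key=key, Body=json.dumps(event).encode())


def scheduled_dump(records: list) -> None:
    """Batch path: compress a period's records into one gzipped object."""
    ts = datetime.now(timezone.utc)
    key = f"raw/dumps/{ts:%Y-%m-%d}.json.gz"
    body = gzip.compress("\n".join(json.dumps(r) for r in records).encode())
    s3.put_object(Bucket=BUCKET, Key=key, Body=body)
```

Either path keeps the data in its raw form, leaving the data scientists free to decide later how it should be structured and compressed for the queries they expect to run.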

Making the Right Tools Available

Creating a common environment for developers and data scientists requires tools that enable them to work on the same data sets simultaneously, writing and sharing code or rich text. Notebooks make this possible.

For online operations, open source platforms like PixieDust (a helper library for Jupyter notebooks) enable developers to explore data analysis models without having to learn or code in the statistical languages favored by data science. Jupyter notebooks – which began life as a home for data scientists’ one-off scripts – also allow for the offline analysis of data sets and algorithms.
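As a sketch of what this looks like in practice – assuming PixieDust is installed and the code runs in a Jupyter notebook cell, with made-up sample figures – a developer can explore a data set interactively without writing any plotting code:

```python
import pandas as pd
import pixiedust  # importing pixiedust registers its display() helper in the notebook

# Illustrative sample data
sales = pd.DataFrame({
    "month": ["Jan", "Feb", "Mar", "Apr"],
    "units": [120, 135, 160, 150],
})

# Opens PixieDust's interactive UI: switch between table, bar chart,
# and other views with no further code
display(sales)
```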

Monitoring and Evaluation

Throughout the software development lifecycle, data science algorithms must trace the path from raw data to interpreted information to some kind of value. The work of both the data scientists and the developers has to be assessed and observed at all stages, and this observation and evaluation need to be built into the development environment from the beginning.

The very process of setting up this scenario creates opportunities for collaboration in and of itself. On the one hand, the software engineers get a chance to build a framework that embeds the work of the data scientists in a pipeline combining various datasets and algorithms. On the other, data scientists play an integral part in its construction by setting parameters and framing the right kinds of questions.
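A minimal sketch of such a framework, assuming the teams agree on a simple named-stage interface (the stage names and toy data below are illustrative): the engineers own the instrumented pipeline, while the data scientists plug in the stages and parameters.

```python
import logging
from typing import Any, Callable, List, Tuple

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pipeline")


def run_pipeline(data: Any, stages: List[Tuple[str, Callable[[Any], Any]]]) -> Any:
    """Run each named stage, logging progress so both teams can observe it."""
    for name, stage in stages:
        log.info("starting stage=%s", name)
        data = stage(data)
        log.info("finished stage=%s output=%r", name, data)
    return data


# Engineers own the framework above; data scientists supply the stages below.
raw = [1, 5, None, 9, 3]
run_pipeline(raw, [
    ("clean", lambda xs: [x for x in xs if x is not None]),
    ("score", lambda xs: sum(xs) / len(xs)),  # toy stand-in for an algorithm
])
```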

Using Data Scientists and Developers in Cross-Functional Teams

The final piece of the collaboration puzzle comes with the formation of cross-functional teams consisting of representatives from both camps.

For one thing, having data scientists embedded in a development team (or developers attached to a data science unit) fosters debate and active communication between the members. It also promotes understanding, allowing software engineers and data scientists to better appreciate the needs of each other. In addition, having mixed groups of professionals within the same unit enables those practitioners to step in immediately with their particular skill sets if issues or opportunities arise.

Business units with experience of working with cross-functional teams also stress the importance of allowing a degree of flexibility for the data science members (who may occasionally need to branch out and explore particular topics in isolation for a while), and of creating a forum where various teams can meet to share ideas and knowledge.

At the end of the day, the aim is to enable data science professionals and software developers to use their unique skills to the best advantage of the team – in an environment that promotes creativity, and where trust and respect can build up between the team members as new knowledge and insights are acquired and invested back into the product.

August 15, 2019

Storytelling with Data: Using Visualizations to Help Stakeholders Understand Data Findings

The job of the data scientist is to acquire data, clean it, analyze it, make sense of it, and most crucially of all, communicate its meaning to an audience of (usually) non-data scientists. Effective communication is critical in data science.

Data scientists are highly trained, highly skilled individuals – and they need to be. The language of data science is complex and esoteric. Data modeling and analysis is complicated, and datasets are difficult to understand – especially for non-technical people. Furthermore, the data a data scientist works with may be stored as comma-separated values (CSV) files or Excel files, or held in SQL and NoSQL databases, the Hadoop Distributed File System (HDFS), and the like. As with all data science work, it’s not how the data is stored – nor the actual data itself – that is inherently valuable. Rather, the value is found in the insights that can be drawn from it.

Data isn’t easy to decipher – especially when dealing with large volumes of the stuff. This is where data visualization comes in. Data visualization is the process of presenting data in a visual context – using pictures to understand data, in other words. This is vital even for the data scientist’s own comprehension, let alone for effective communication between the data science team and relevant stakeholders. Indeed, for data scientists to produce truly actionable insights, the findings and observations have to be made available to the stakeholders tasked with acting on them.

Data visualization enables different stakeholders and decision-makers to understand the significance of data by presenting it not as volumes and volumes of records, but in easy-to-interpret graphs, charts, maps, dashboards, and other visualizations. Visualizations provide a consumable way to see and understand trends, patterns, correlations and outliers in data. In this way, data visualizations don’t only reveal insights – they help make those insights actionable.

 

Why Is Data Visualization So Effective in Data Science?

The simple answer to this question is: because of the way the human brain processes information. Data science projects usually involve far more information than the brain can comfortably handle. According to Dell EMC, organizations managed an average of 9.7 petabytes of data in 2018, a 569% increase compared with the 1.45 petabytes they handled in 2016. With so much data, it’s practically impossible to wade through it all line by line and pick out patterns and trends – but with data visualization tools and techniques, insights are much easier to see and grasp. The reason is that our brains process visual information much faster than text-based information – 60,000 times faster, in fact, according to oft-cited estimates. Here’s a visual to help that data sink in quicker…

(Image source: killervisualstratergies.com)

 

…And here’s an example exercise from Study.com to prove the point.

Question: Looking at the following table of monthly sales figures (not reproduced here), which month recorded the highest sales?

Obviously, it’s December – but it took you a few seconds to read through the figures to find the answer. By comparison, look at this simple visual representation of the same data and ask yourself the same question…

(Image source: study.com)

 

… You got the answer almost instantaneously, right? And you can also see the peaks and troughs throughout the year – the larger story the data tells.
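A chart like the one above takes only a handful of lines of code. As a minimal sketch with matplotlib – using invented monthly figures, since the original table isn’t reproduced here – the same effect is easy to achieve:

```python
import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
sales = [210, 190, 240, 230, 260, 250, 280, 270, 300, 320, 350, 420]  # made up

plt.bar(months, sales)
plt.title("Monthly Sales")
plt.ylabel("Units sold")
plt.show()  # December's bar stands out at a glance – no table-reading required
```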

We all know that time is money in business. Organizations that can make sense of their data more quickly are more competitive in the marketplace. Why? Because they can spot trends and patterns, and make informed, evidence-based decisions, sooner than their rivals. Data visualization helps this to happen.

Consider a marketing team working across 20 or more ad and social media platforms. The team needs to know the effectiveness of its various campaigns so it can optimize spend and targeting – and it needs this information quickly to remain competitive. The process could be completed manually by going into each system, pulling out the various reports, combining the data, and then analyzing it in a spreadsheet – but it would take an age to pore through all the metrics and draw any meaningful conclusions. With data visualization tools, however, all sources of data can be connected automatically and visualizations produced immediately for the team, allowing its members to draw on-the-spot comparisons and conclusions about each campaign’s performance.
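A minimal sketch of the automated alternative with pandas, assuming each platform’s report has been exported to CSV with hypothetical "campaign", "spend", and "conversions" columns:

```python
import glob

import pandas as pd

# Combine every exported report into one table
frames = [pd.read_csv(path) for path in glob.glob("reports/*.csv")]
combined = pd.concat(frames, ignore_index=True)

# One cost-per-conversion figure per campaign, ready to visualize
summary = combined.groupby("campaign")[["spend", "conversions"]].sum()
summary["cost_per_conversion"] = summary["spend"] / summary["conversions"]
summary["cost_per_conversion"].plot(kind="bar", title="Cost per conversion")
```

In practice a dashboarding tool would sit on top of this, but the principle is the same: connect the sources once, and the comparisons draw themselves.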

 

It’s All About Fast and Clear Communication

There are many different types of visualizations – line plots, scatter plots, histograms, box plots, bar charts, the list goes on. They may seem simple – but it is precisely this simplicity that makes them so valuable when presenting data science findings to stakeholders.

(Image source: blog.qlik.com)

 

In a recent interview with Dataquest, Kristen Sosulski – Clinical Associate Professor of Information, Operations, and Management Sciences at New York University Stern School of Business, and author of Data Visualization Made Simple – makes the point that while very few people can look at a spreadsheet and draw quick and accurate conclusions about what the data says, anyone can compare the size of bars on a bar chart, or follow the trend on a line graph.

Sosulski explains that while data visualization is a key skill at every stage of the data science process, it becomes critical at the point of communication. “There are a lot of angles that you can take with visualization, and ways to look at it,” she says. “I think about data visualization as something that we have in the toolkit to help people better understand our insights and our data. Just on a human level, visualizations allow us to perceive information a lot more clearly when they’re well designed.”

 

Conclusion

Using visual representations, data scientists can open the eyes of key stakeholders and decision-makers, allowing them to understand clearly and quickly what a dataset is revealing, how a model will help solve a business problem, and what impact the scientist’s proposals and discoveries will have on the organization. Emerging trends – both in the business and in the market – can be pinpointed quickly, outliers spotted, relationships and patterns identified, and the whole data story communicated engagingly in a way that gets the message across without unnecessary delay.

Without visualizations, all you have is data on a spreadsheet. All insights remain buried. The beauty of data science is that it reveals the true value of all those petabytes of data organizations are now managing. But without using data visualizations to communicate the important insights a data science project discovers, that value will be forever lost.

August 2, 2019

Disrupting the Cloud with Serverless

 

Serverless may indeed be the new black. It made Gartner’s 2019 list of the top 10 trends in infrastructure and operations. An annual growth rate of 75 percent qualified serverless as the fastest-growing cloud service model in RightScale’s 2018 State of the Cloud report. And deployment of serverless computing technologies among global enterprises is expected to hit 20 percent by 2020, up from the current 5 percent.

Despite the nomenclature, computing will continue to involve servers. Developers, however, will no longer have to be involved with provisioning, deploying, and monitoring them. Those administrative tasks are now handled by a new breed of services such as AWS Lambda, Google Cloud Functions, Microsoft Azure Functions, and IBM OpenWhisk.

 

Serverless, FaaS & BaaS

Serverless broadly refers to applications that circumvent the need for an always-on server component by using third-party hosted services to manage server-side logic and state. There are currently two types of serverless deployments: Functions-as-a-Service (FaaS), provided by the new breed of services mentioned above, and Backend-as-a-Service (BaaS), offered by providers like Google Firebase, AWS DynamoDB, BaaSBox, and Backendless.

BaaS provides a complete online backend service, such as storage, social networking integration, or location services, so that developers don’t have to build a separate backend for each service their applications use or access.

FaaS, on the other hand, is a serverless approach to application development that enables developers to write and deploy code without having to manage their own infrastructure. The developer defines events and sets triggers, and the service provider ensures that the right amount of infrastructure is delivered to execute the code. The code executes only when backend functions are invoked by events triggered by users, is torn down once the task is completed, and the customer is charged only for the duration of the execution.
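As a minimal sketch of the FaaS model, here is a function written to AWS Lambda’s Python handler convention. The event shape assumed below is a hypothetical API Gateway request; the trigger, memory, and routing configuration would all live with the provider rather than in the code:

```python
import json


def handler(event, context):
    """Runs only when its trigger fires; billed for execution time alone."""
    params = event.get("queryStringParameters") or {}
    name = params.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }
```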

SOURCE: https://www2.deloitte.com/content/dam/Deloitte/tr/Documents/technology-media-telecommunications/Serverless%20Computing.pdf

 

The serverless model represents a significant shift even from the traditional cloud service model where customers had to reserve and pay for a predetermined amount of bandwidth or server space, irrespective of usage. It is therefore easy to see the value in a model that delivers on-demand execution, dynamic scaling based on application loads and pay-per-use pricing.

As a fully-managed service, serverless ensures that all infrastructure related issues are shifted from the developer to the vendor. Let’s now take a look at some key benefits of the serverless model.

 

Benefits of serverless

Efficient utilization: The serverless model offers near-limitless scalability and built-in high availability, and its pay-per-use pricing eliminates idle time, as customers pay only when functions are executed. Dynamic provisioning ensures that the infrastructure scales automatically based on application loads, making it especially attractive in the case of applications with wildly inconsistent usage patterns.

 

Enhanced developer productivity: Freed from the demands of infrastructure management, developers can focus fully on the functions they are expected to deliver without having to worry about server capacity. Each function that makes up an application can be updated independently, or all at one go, without having to change the entire application. This makes it much quicker and easier to update, patch, fix, or add new features to an application. Serverless also accelerates go-to-market times by drastically simplifying the process of writing, testing, and deploying code.

 

Reduced latency: A serverless architecture makes it possible for developers to reduce latency by running code closer to the end user. Since application functions are not tied to a centralized origin server, they can now be shifted to servers closer to the end user to decrease latency.

 

Improved security: It is also argued that constraining the developer to using only code constructs that work within the serverless context can produce code that is aligned with security, governance, and best practice protocols.

In addition to all this, there is also the idea floated by Gartner that serverless computing can improve the productivity of infrastructure and operations (I&O) administrators. According to the analyst firm, the event-driven architecture of FaaS is perfectly suited to automating cloud infrastructure operations.

 

SOURCE: https://www2.deloitte.com/content/dam/Deloitte/tr/Documents/technology-media-telecommunications/Serverless%20Computing.pdf

 

Notwithstanding all these advantages, there are still some key challenges associated with serverless that may not make it the ideal choice for all applications and situations.

Key serverless challenges

Development challenges: The programming approach for serverless applications can be quite different even from that of traditional cloud services. Moreover, there is currently a lot of variation in the programming languages and frameworks supported by different serverless providers. Then there are the limitations imposed by some vendors on code size, memory, and the like, all of which seriously restrict the type of applications that can be built for this environment.

Testing and debugging can also be challenging in this model. The primary reason for this is the dearth of testing tools for developers that can exactly emulate cloud events in a local development environment. Even debugging gets complicated because of the limited visibility into backend processes.

 

Performance challenges: A traditional always-on server can process user queries instantly. Since serverless code is shut down once a function has been invoked and its task completed, it has to be booted up again the next time it is triggered. Functions that have been inactive for a while must be cold started, which can slow down response times and increase latency. Hence, this model may not be suitable for applications that are used rarely yet still need to respond extremely quickly to events. This ephemerality, combined with the resource limitations imposed by vendors, can also render serverless unsuitable for applications with larger processing requirements.
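A minimal sketch of why cold starts hurt, using the same hypothetical Python handler convention as above: work done at module level runs once per container, so a cold invocation pays for it while warm invocations do not.

```python
import time

_start = time.time()
# Stand-in for heavy initialization: large imports, model loads, DB connections
MODEL = {"weights": [0.1] * 1_000_000}
INIT_SECONDS = time.time() - _start  # this cost is paid only on a cold start


def handler(event, context):
    # Warm invocations reuse MODEL; only a cold start re-runs the setup above
    return {"init_seconds": INIT_SECONDS}
```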

 

Security challenges: Several critical security risks have been identified for serverless applications, chief among them vendor security and multi-tenancy. With vendors taking complete control of the backend where the actual execution occurs, developers have little visibility into the security protocols in place. This can be of particular concern for applications that use sensitive data. Multi-tenancy raises similar concerns, as serverless applications in all likelihood run on shared infrastructure, which can result in data being exposed.

Finally, there is the challenge of vendor lock-in. With every vendor offering their own combination of programming languages, workflows and features, it can become extremely cumbersome to migrate applications from one platform to another.

Serverless represents the latest in a long trend of abstractions that have transitioned enterprise IT from physical servers to virtual machines and containers. Now, infrastructure itself is being abstracted away.

Conclusion

Today, serverless is the hottest cloud computing trend, with Microsoft’s CEO declaring it the core of the future of distributed computing. But serverless is still a nascent technology, its promise of cost transparency, agility, and productivity counterbalanced by several significant challenges that have yet to be addressed. Even so, adoption predictions are through the roof – with the rider that, extraordinary as serverless may be, it may just not be appropriate for every use case.

June 6, 2019

Leveraging the Full Potential of the Cloud Computing Model with Cloud-Native Development

 

Cloud-native or cloud-hosted, that is the question. And depending on where you stand, cloud-native is either just the flavor of the season or the future of software development.

The buzz around cloud-native applications is definitely rising, but cloud-hosted applications remain the norm, at least for the time being. A Capgemini study estimated a 15 percent adoption rate for cloud-native applications, with a projected rise to 32 percent by 2020. Even among cloud-native leaders, the study found, only 15 percent develop new applications in a cloud-native environment, and only 20 percent of these new applications take a cloud-native approach.

The cloud-hosted model is a middle-ground approach that combines traditional on-premise application development with a more contemporary preference for cloud deployments. This approach does have its advantages – in the case of security-conscious industries that handle large volumes of sensitive data, for instance. However, it also impairs an organization’s ability to fully leverage the potential and the possibilities afforded by the cloud computing model.

 

Defining cloud-native

By contrast, cloud-native applications are built, tested, and staged in the cloud. According to the Cloud Native Computing Foundation (CNCF), an open source software foundation driving the development of cloud-native computing, this modern approach enables organizations to run scalable applications in modern, dynamic environments such as public, private, and hybrid clouds. Since these applications are purpose-built and optimized for the cloud, they offer greater agility, resilience, and portability across clouds.

The CNCF defines three essential criteria for cloud-native development:

  1. It is container-based, so every component is packaged in its own container to facilitate reproducibility, transparency, and resource isolation.
  2. It is dynamically orchestrated, so the containers are actively scheduled and managed to optimize resource utilization.
  3. It is microservices-oriented, so applications are segmented into microservices to enhance the overall agility and maintainability of applications.

 

Containers, microservices, and dynamic orchestration

Containers: A container is a self-contained package of software that includes everything required for the application to run isolated from and independent of its operating environment. This makes container-based applications easy to deploy across diverse target environments such as a private data center or a public cloud. Container technologies such as Docker, CoreOS rkt, Mesos and LXC make it much easier for companies to develop, deploy and migrate applications across a hybrid computing environment. Plus, they offer some real business benefits, including accelerated deployment times, faster time to market, lower infrastructure costs, and higher operational efficiency, and they simplify the continuous deployment of microservices.

Microservices: More and more organizations are moving from large monolithic application architectures to a microservices architecture that is faster, more agile, and easier to maintain. In cloud-native development, multi-functional applications are broken down into several smaller independent modules, each with a specific purpose. These microservices can work together through APIs to create more agile, scalable applications.

There are several advantages to this approach. For one, building a small module with one specific function is much easier than developing a large monolithic and multi-functional application. This modularity enables developers to choose the language or technology that best facilitates the required functionality. It also simplifies maintenance, as each service can be independently modified or updated without having to create an entirely new application. And since a microservices architecture isolates functions, even security issues are confined to specific modules and can be addressed without affecting the functionality of the entire application.
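As a minimal sketch of what "one specific function" looks like in practice, here is a single-purpose HTTP service written with Flask. The endpoint, port, and data are illustrative, and in a cloud-native deployment this service would be packaged in its own container:

```python
from flask import Flask, jsonify

app = Flask(__name__)


@app.route("/prices/<sku>")
def get_price(sku: str):
    # A real service would query this module's own datastore
    return jsonify({"sku": sku, "price_usd": 9.99})


if __name__ == "__main__":
    # Each microservice runs, scales, and ships independently of the others
    app.run(port=8080)
```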

Dynamic orchestration: A microservices architecture enables businesses to leverage cloud functionalities more efficiently, allowing for rapid scaling up or down and deployment across data centers and cloud platforms. But as the architecture becomes more complex, all the microservices have to be orchestrated so that they work together as an application. The coordination of commands across multiple services is managed by orchestrators or orchestration engines such as Kubernetes and Docker Swarm. Dynamic orchestration automates stages of the software lifecycle, allowing software versions to be created, updated, and removed based on predefined criteria, including real-time traffic demands. Service instances can be automatically scaled up or back as requirements change.

In addition to all this, the cloud-native model involves almost every modern application development tool, framework, and technology available including DevOps, Agile, and continuous delivery. However, leveraging all these modern technologies to completely reinvent application development for the cloud paradigm is not without its challenges.

The security challenge

Security, alongside culture, complexity, lack of training, and monitoring, is a top-five challenge facing cloud-native development efforts, according to a survey from CNCF. Though security concerns continue to drop across editions of the survey, it makes sense to take a closer look at the issue, especially since security is one of the key reasons organizations opt for cloud-hosted solutions.

There are several inherently cloud-native characteristics that may actually make these environments easier to secure, like the immutability of container images, for example. In fact, early cloud-native adopters cite improved data security as one of the top organizational benefits of this new approach.

Nevertheless, the dynamic nature of cloud-native development introduces some new challenges that cannot be addressed by conventional security strategies. Organizations need to adopt DevSecOps practices, ensuring that security is integrated across the application development lifecycle. Controls have to be implemented at the application level to ensure that service behavior is always consistent with intent.

For many industry commentators, cloud-native means using the cloud as it was intended to be used. But as mentioned earlier, culture, complexity, and training continue to be issues that can slow down enterprise adoption of this innovative new approach. One solution, proposed by Oracle, is to build a bigger tent where the emphasis will be on reducing the complexity of cloud-native while ensuring that all enterprises, both modern and traditional, get all the support they need on their cloud-native journey.

June 6, 2019

How PWAs Can Disrupt the Native Mobile App Ecosystem

The increasing sophistication, functionality, and convenience of web apps have by and large relegated desktop applications to specialized functions. The question now is whether web apps can do something similar to native mobile apps.

Interestingly enough, web apps, rather than native apps, seem to have been Steve Jobs’ first choice of app development framework at the launch of the iPhone in 2007. Though that never panned out, web apps may finally be ready to take on native mobile apps.

But before we get to the future of native apps, let’s take a quick look at a significant milestone in the development arc of web apps themselves – single-page applications or SPAs.

 

SPAs and MPAs

SPAs do exactly what it says on the tin: they comprise a single view page that includes all the presentation markup required by the application. The first click loads the entire page from the server, and the page is then modified dynamically based on subsequent user requests and interactions. Each interaction sends an AJAX call to the server, which returns only the data required to render the components relevant to the request, rather than entire HTML pages. In short, once a SPA has loaded, all interactions result in data transactions rather than HTML transactions.

This is in marked contrast to the request-response architecture of traditional multi-page applications (MPAs), where entire HTML pages have to be fetched from the server for every user interaction.
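A minimal sketch of that contrast from the server’s side, using Flask with illustrative routes and data: the MPA-style endpoint returns a full HTML page per interaction, while the SPA-style endpoint returns only the data the already-loaded page needs to render.

```python
from flask import Flask, jsonify

app = Flask(__name__)


@app.route("/inbox")  # MPA style: a whole HTML page per request
def inbox_page():
    return "<html><body><h1>Inbox</h1><ul><li>Hello</li></ul></body></html>"


@app.route("/api/messages")  # SPA style: data only, rendered client-side
def inbox_data():
    return jsonify([{"id": 1, "subject": "Hello"}])
```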

 

Source:https://www.mindk.com/blog/single-page-applications-the-definitive-guide/

 

This contrast translates into some fundamental benefits for SPAs over MPAs, including quicker server response, lighter server loads, and limited network activity. There are, however, several other user- and development-side advantages to the SPA model over MPAs.

 

Benefits of SPAs

On the user front, eliminating the need for constant reloading enables a seamless and more responsive user experience. There is also significantly less lag between request and response, as all components of a SPA are loaded up front. Most importantly, SPAs can even function offline, as the first click downloads the application’s assets and data from the server and stores them locally.

On the development side of things, SPAs can streamline development and simplify deployment, compared to traditional MPAs that depend on server-side rendering. SPAs are also easily ported from web to mobile, as the same backend code can be reused and the single-page UI means that even the design can be transitioned without having to make too many alterations.

There are some limitations, though. Chief among them are the difficulty search engines have indexing JavaScript-rendered content and vulnerability to cross-site scripting (XSS) attacks.

Some of the most popular web destinations today are SPAs, including Netflix, Gmail, Facebook, Twitter, and GitHub. And there are several modern JavaScript frameworks that can be used to build complex, functional and responsive single-page applications for the web, such as Angular, React, Knockout, Meteor, and Vue.

Though SPAs may have disrupted web apps, they are not, in spite of being mobile-adaptable, really qualified to challenge the dominance of native mobile apps. For that, we turn to a mobile-first approach called Progressive Web Applications (PWAs).

 

What are PWAs?

PWAs are essentially mobile apps delivered through the web rather than an app store. A Forbes article from March 2018 advised businesses without a mobile presence to simply opt for – and businesses with a mobile presence to quickly migrate to – PWAs. A good PWA, the authors counseled, could at one stroke replace a company’s desktop site, mobile site, and native app.

There has been a lot of similar buzz surrounding PWAs ever since Google publicized the concept in 2015 accompanied by a three-point performance manifesto. According to the manifesto, PWAs had to be reliable even in low connectivity or offline conditions, they had to be fast and responsive to user interactions, and they had to deliver the engagement and experience of a native app.

Service workers have been key to ensuring the reliability and functionality of PWAs even in low-connectivity or offline conditions. Service workers are background scripts that sit between the PWA and the server, can cache data for offline use, and act as a proxy that retrieves elements from either the server or the cache, depending on the network. Currently, around 90 percent of the global user base is on browsers that support service workers, opening up the theoretical possibility for PWAs to mount a challenge against the virtual monopoly of native apps.

PWAs are positioned in the sweet spot between mobile websites, which are easy to access but short on experience, and native apps, which definitely deliver on experience once the adoption barriers are overcome. And those barriers are real: according to a 2017 U.S. Mobile Apps Report, a majority of users do not download any apps in a given month. Then there is the issue of discoverability in an ever-proliferating mobile app ecosystem. And finally, there’s the process of downloading, activating, and accessing the app, where every additional step can lose another 20 percent of prospective users.

 

PWAs — Advantages

Progressive Web Apps address many of the issues associated with native apps. PWAs do not have to be installed to be experienced as they can run in a web browser. But they can be installed, if the user chooses, without having to take a detour to an OS-specific app store. PWAs do not take up too much space, as there are no humongous APKs to be downloaded, and they are thrifty with memory and data use. From a development perspective, they are relatively cheaper and quicker to build and deploy, as building one cross-platform application can extend the same functionality and experience to all OSes and devices.

 

Source:http://webagility.com/posts/how-progressive-web-apps-make-the-web-great-again

 

On top of all this is the return on investment. For instance, the user acquisition cost for web apps is estimated at one-tenth of that for native apps. Then there are the performance stats in terms of load times, time spent on site, conversions, engagement, and so on.

PWAs combine the best of what the mobile web and native apps have to offer, while at the same time eliminating many of the limitations associated with these formats. Even some of the biggest beneficiaries of the native app model are now backing the PWA trend. For smaller businesses without the resources to build a desktop site, a mobile site and a couple of native apps, PWAs present a simple and economic route to be a part of the mobile-first economy. True, native apps will still have their own utility, but the future may just be more progressive than native.

 

June 3, 2019

Analytics and Insight: Problem Solving & The Essence of Data Science

What is Data Science? What does a Data Scientist actually do? What do you look for in a Data Scientist? Where did you get those shoes? Of the myriad questions I am asked as a Data Scientist, these stand out, both in how frequently they come up and in the earnestness with which they are asked. While businesses (and job seekers, and training programs) want simple and straightforward answers, reality is complicated by the fact that the answers are all highly nuanced and somewhat esoteric.

Well, all except that last question.  There the simple answer is Fluevog.

While putting together a presentation in which I was asked to answer these questions for a group of students, I came across an infographic created by the IBM® BigInsight™ Team that summed up a Data Scientist as an amalgamation of Analytics and Insight. This resonated with how I have often thought about Data Science in my own experience. Certainly there are other features one would look for in a Data Scientist, but these two stand out, in what I see as orthogonal ways.

 

What do you look for in a Data Scientist?

Analytics, statistics, math, plots – these are all tools used as a means to an end, and like most tools, they come with manuals. The instructions may look like gibberish to some, but they exist. Dictionaries exist to define the terms and help a user grasp the meanings; they are available to all. There is a marvelous democratization of knowledge here. In particular, in a business setting the derivation of new laws of mathematics or the creation of novel statistical tests is rarely necessary. While not everyone may enjoy math, or find it intuitive, the mechanics can be learned and employed by a significant number of people. In other words, given an investment of time, analytics are an open book.

On the other hand, insight and intuition are incredibly difficult to teach. At some level, they rely upon the innate curiosity and thought patterns of an individual. Hiring managers and proto-Data Scientists alike are often stymied here. Why? The underlying question they want to answer is essentially: how can I know whether this person can solve complex problems in a convincing manner, and then implement that solution so that others benefit? I think back to my undergraduate years taking freshman physics, a class required as a prerequisite by many majors. While the material clicked in my mind, I was drafted into a large study group where, despite my best efforts, I never successfully helped some people understand the problems we were working on. These were not unintelligent people; they would go on to become successful doctors, chemists, advertisers, and a myriad of other professionals. They simply did not think in the same manner. They were unable to lay hold of the elusive insight that unraveled the twisted knot of the problems.

 

What does a Data Scientist actually do?  

All people use data to some extent. Not all people have the wherewithal to rapidly place raw data in a heretofore uncontemplated context and apply it to solve a problem that, to this point, did not have an answer. Insight combined with the ability to use data to answer questions is a rare talent. That is what a Data Scientist does, and why they are so valuable. Note that this is a very general statement. It does not qualify the type of data. It does not limit the field of inquiry. It does not prescribe the statistical tests that should be known. As a result, Data Scientists come in a variety of guises.

One Data Scientist may be more attuned to the nuances of statistical tests, whereas another may know the intricacies of neural networks. Some Data Scientists may be better coders, and others might be great communicators. The common factor that unites them in a single category of humanity is the tenacious curiosity that leads them to find the insights that solve the problem before them. Wherever an individual Data Scientist’s strengths may lie, they are marked by an ability to quickly learn and grow – but learning, growth, and insight alone do not themselves define a job field.

 

What is Data Science?  

There are many ways to define Data Science as a domain; likely as many ways as there are people to whom the question is posed at any given moment. I would posit that this comes from the fact that to define the field of Data Science we must combine skills, insight, and problem-solving in a business context. The exact mixture of these depends on the individuals involved and the questions that must be answered – or, more to the point, the questions that must be asked. In general, data science combines the mastery of new domains and techniques with a critical process that enables its practitioners to solve problems and to understand how things work. Simply put, data science is problem-solving in a digital environment.

 

 

Dr. Kevin Croxall, Expeed Software

 

Kevin Croxall is Director of Data Science for Expeed Software. He is a data and research scientist with more than a decade of comprehensive experience in data science project design and implementation. He has a broad range of experience in software development geared toward pipeline development, statistical analysis, and data visualization and presentation.

 

May 9, 2019
