This is the first time that we at Catalyze are attending AWS re:Invent. We always knew the event was large, but we were surprised to see and hear how much it has grown since last year. AWS expects about 32,000 attendees this year, with profiles ranging from engineers to CIOs across a broad range of industries.
This is a recap of observations during the first half of the conference. We'll publish another recap at the end of the conference.
From what we understand, this is the first year with a dedicated track of presentations focused on healthcare and life sciences, with some big names presenting. The track ran on Monday, before the formal kickoff on Tuesday when the keynotes begin. Presenters included the following:
- Cambia: who spoke on large scale machine learning and data science / big data
- Cerner: who spoke about having an on-demand DR strategy for their HealtheIntent platform
- Merck: who spoke about remaining in continuous compliance with life sciences and other regulatory standards
- UCSC Genomics Institute: who spoke about large scale cancer genomic data analysis
- Claritas Genomics: who participated in a panel with Motorola and Coursera discussing how to remain in compliance with regulatory requirements in education (FERPA), healthcare (HIPAA) and legal (CJIS).
These sessions were extremely popular. The lines to get in were long, and even the overflow lines wound around entire hallways.
As you can gather from the talk topics themselves, a lot of the focus was on regulatory requirements and big data. We saw compliance emerge as potentially the most important topic among healthcare organizations.
Healthcare, being more conservative than most industries, is embarking on this cloud journey with hybrid cloud deployments as the initial strategy. Organizations want to move to the cloud, but not immediately. They are looking for more of a transition.
The move to the cloud is being looked at as an opportunity for various reasons, such as:
- Cost of Elasticity: as mentioned earlier, a lot of the initial efforts seem to be focused around big data and analytics. The numbers thrown out during those sessions were invariably in the multiple petabytes range. At that scale, the cloud is an ideal option with immediate scale available as needed.
- Clean slate: While some organizations have been in the cloud for a little while, the move to the cloud gives these organizations a chance to re-do and re-architect their solutions, for example by making them more service oriented or moving toward microservices.
- Repeatability and reusability: While there is a heavy initial focus on the US, the organizations are also global in scope. With the added regulations around data sovereignty that are being imposed by governments, any IT strategy must be globally scalable. The cloud with multiple points of presence globally allows for addressing this need in a much more cost effective and faster manner.
- Building DevOps competency: This is increasingly important. Companies are realizing that scaling locally or globally requires automation, especially because repeatability is a core requirement for regulatory purposes. Managing separate global deployments becomes cost effective only if they're not snowflakes.
- Availability of toolsets: With a plethora of services such as EMR (Elastic MapReduce), which includes support for Hadoop, Spark, etc., RDS, and prebuilt AMIs to support analytics tools such as Tableau and Qlik, initial setup and learning curves are much shorter.
- Simplified logging and audit trails: A lot of the regulatory requirements come down to detailed audit trails and data extraction from logs. The cloud provides these raw data spouts quickly. Some effort is required, of course, to connect, parse and present this raw data in a usable format.
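To illustrate the "connect, parse and present" effort mentioned in the last bullet, here is a minimal Python sketch that flattens CloudTrail-style JSON records into simple audit rows. It is only a sketch under assumptions: the sample records, ARNs, IP addresses, and the function names `parse_audit_records` and `actions_per_actor` are made up for illustration, and a real pipeline would read the log files from wherever they are delivered rather than from an inline string.

```python
import json
from collections import Counter

# Hypothetical sample shaped like a CloudTrail log file ("Records" array).
# All values below are invented for illustration.
SAMPLE_LOG = json.dumps({
    "Records": [
        {
            "eventTime": "2015-10-06T18:05:00Z",
            "eventName": "GetObject",
            "eventSource": "s3.amazonaws.com",
            "sourceIPAddress": "198.51.100.7",
            "userIdentity": {"arn": "arn:aws:iam::123456789012:user/alice"},
        },
        {
            "eventTime": "2015-10-06T17:59:00Z",
            "eventName": "ConsoleLogin",
            "eventSource": "signin.amazonaws.com",
            "sourceIPAddress": "198.51.100.7",
            "userIdentity": {"arn": "arn:aws:iam::123456789012:user/alice"},
        },
        {
            "eventTime": "2015-10-06T18:10:00Z",
            "eventName": "GetObject",
            "eventSource": "s3.amazonaws.com",
            "sourceIPAddress": "203.0.113.9",
            "userIdentity": {"arn": "arn:aws:iam::123456789012:user/bob"},
        },
    ]
})


def parse_audit_records(raw_log):
    """Flatten CloudTrail-style records into rows sorted by event time."""
    records = json.loads(raw_log).get("Records", [])
    rows = []
    for rec in records:
        identity = rec.get("userIdentity", {})
        rows.append({
            "time": rec.get("eventTime"),
            "actor": identity.get("arn", "unknown"),
            "action": rec.get("eventName"),
            "service": rec.get("eventSource"),
            "source_ip": rec.get("sourceIPAddress"),
        })
    # ISO 8601 UTC timestamps in the same format sort correctly as strings.
    rows.sort(key=lambda row: row["time"] or "")
    return rows


def actions_per_actor(rows):
    """Count how often each actor performed each action."""
    return Counter((row["actor"], row["action"]) for row in rows)


if __name__ == "__main__":
    audit_rows = parse_audit_records(SAMPLE_LOG)
    for row in audit_rows:
        print(row["time"], row["actor"], row["action"], row["source_ip"])
    print(actions_per_actor(audit_rows))
```

The point of the sketch is that the raw data is readily available, but turning it into something an auditor can read (who did what, when, and from where) is still work the organization has to do.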
While all that is great, there still are some challenges.
- Volume of data: There is no simple answer here, especially given the large volumes of data mentioned earlier. Things like Snowball help get the initial load done, but ongoing needs like DR (disaster recovery) remain a challenge, especially if they require a 2x investment in infrastructure. Cerner has approached this through an on-demand DR strategy in which the raw data is always replicated, but the infrastructure needed to run analytics etc. is spun up only when needed and the data is reprocessed. The drawback is that a full 100% recovery can take up to 12 hours, but it is still an impressive effort.
- Networking and encryption in transit: Networking is still a bit esoteric and challenging, especially when compliance requirements are combined with enterprise components. Together, these two call for full encryption in transit using rotating TLS keys, 2FA (two-factor authentication), ADFS (Active Directory Federation Services) and IAM (AWS Identity and Access Management) integrations, and ongoing management of ACLs. Additional measures, like bastion servers gating access and separate VPCs for "management" and compute / data, are also recommended as best practice.
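As a rough illustration of what "ongoing management of ACLs" with a bastion server can look like in practice, the following Python sketch flags inbound rules that would let administrative traffic bypass the bastion. Everything here is an assumption made for the example: the rule format, the rule names, the bastion address, and the function `rules_bypassing_bastion` are hypothetical and not tied to any AWS API.

```python
import ipaddress

# Hypothetical bastion address; a real deployment would have its own.
BASTION_CIDR = ipaddress.ip_network("10.0.0.10/32")

# Hypothetical inbound rules in a simplified, made-up format.
RULES = [
    {"name": "ssh-from-bastion", "port": 22, "source": "10.0.0.10/32"},
    {"name": "ssh-from-anywhere", "port": 22, "source": "0.0.0.0/0"},
    {"name": "https-public", "port": 443, "source": "0.0.0.0/0"},
]


def rules_bypassing_bastion(rules, bastion=BASTION_CIDR, admin_ports=(22,)):
    """Return names of admin-port rules whose source is not the bastion."""
    violations = []
    for rule in rules:
        if rule["port"] not in admin_ports:
            continue  # Only administrative ports must go through the bastion.
        source = ipaddress.ip_network(rule["source"])
        # The source must fall entirely within the bastion's address range.
        if not source.subnet_of(bastion):
            violations.append(rule["name"])
    return violations


if __name__ == "__main__":
    print(rules_bypassing_bastion(RULES))
```

Running a check like this on every change, rather than once, is one way to keep access rules from drifting out of compliance over time.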